oh my god i had missed this in the postmortem, but the way they validated the test queries they ran (asking the LLM for explanations of code samples) was... TO FEED THE RESULTS INTO GPT-4 AND ASK IT TO CLASSIFY THEM AS "ACCURATE" OR "INACCURATE". THEY ONLY REVIEWED SOME OF THE ONES LABELLED AS "INACCURATE".
i want to scream.
at least, i feel vindicated for engaging in the thread despite not being a heavy MDN user, because it is extremely apparent that they have no idea what they're doing.