i got so angry after reading this paper on LLMs and African American English that i literally had to stand up and go walk around the block to cool off https://www.nature.com/articles/s41586-024-07856-5 it's a very compelling paper, with a super clever methodology, and (i'm paraphrasing/extrapolating) it shows that "alignment" strategies like RLHF only work to ensure that the model never sounds like a white person saying something overtly racist, rather than addressing the actual prejudice baked into the model
every day i wake up in utter disbelief that people continue to take these products seriously as tools, especially in the realm of education. end rant. FOR NOW
and what's ADDITIONALLY infuriating is that some engineer or product team at openai (or whatever) is going to read this paper and think they can "fix" the problem by applying human feedback alignment blalala to this particular situation (or even this particular corpus!), instead of recognizing that there are an infinite number of ways (both overt and subtle) that language can enact prejudice, and that the system they've made necessarily amplifies that prejudice
what's especially infuriating is that this outcome is *totally obvious* to anyone who knows the first thing about language, i.e., that even the tiniest atom of language encodes social context, so of course any machine learning model based on language becomes a social category detector (see Rachael Tatman's "What I Won't Build" https://slideslive.com/38929585/what-i-wont-build) & any model put to use in the world becomes a social category *enforcer* (see literally any paper in the history of the study of algorithmic bias)