My family really does get me. Depending on how you count, this is either one big gift or 1,132 little gifts. ;) Now I have to clear some table space. #LEGO #perseverance #Mars 🚀
@inthehands I agree that a focus on output is the strongest part of the NYT’s argument (and what’s been missing from most of these cases), but there is something to be said for countering the impression that these systems just spit out copies of others’ work without a lot of effort. One has to work to get those nearly identical outputs; otherwise you’re left with paraphrased content, which isn’t a copy. @mmasnick
If you only read one article on the NYT suing OpenAI, make it this one from @mmasnick
This isn't about copyright in the traditional sense. At best it's a negotiating tactic to get better terms in an agreement with OpenAI; at worst it's a push to create a bunch of "rights" out of thin air. Copyright is not the way to deal with "AI"!
(11/23) For a collection of answers in such a space, we can think of their average as their center, just as we did with the bullseye, & we can infer that this average is close to the “right” answer.
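To make that concrete, here's a minimal sketch of the averaging step (my own illustration, not code from the paper), assuming each answer has already been embedded as a 300-dimensional vector:

```python
import numpy as np

# Hypothetical stand-ins: 25 answers, each already embedded as a 300-dim vector.
answer_vectors = np.random.rand(25, 300)

# The "center" of the collection is just the element-wise mean of the vectors.
centroid = answer_vectors.mean(axis=0)  # shape: (300,)
```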
But the question is, "Even with this need for review, does our machine ranking help us out somehow?" And I think the answer is, "yes." We just need to ask, "compared to what?"
(16/23) Keep these suggestions in mind: the methods we use to turn words into numbers have a lot of limitations, and those limitations point to common ways this approach can go awry. E.g., they don’t do well with idioms.
(12/23) The closer an answer is to this centroid, the “better” it is in some sense. So we can rank answers by seeing how far they are from the centroid. In this way we can score a set of exam answers without ever having to define a correct answer, based only on the text of all answers...
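A minimal sketch of that ranking step (my own illustration of the idea, not the paper's code; Euclidean distance is one choice, and cosine distance would work similarly):

```python
import numpy as np

def rank_by_centroid(vectors: np.ndarray) -> np.ndarray:
    """Return answer indices ordered nearest-to-farthest from the centroid."""
    centroid = vectors.mean(axis=0)                         # the "average" answer
    distances = np.linalg.norm(vectors - centroid, axis=1)  # each answer's distance from it
    return np.argsort(distances)                            # closest first = "best" first

# Hypothetical: 25 answers already embedded as 300-dim vectors.
vectors = np.random.rand(25, 300)
print(rank_by_centroid(vectors)[:5])  # indices of the five most central answers
```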
A paper of mine was just included in a collection at the MIT Computational Law Report.¹ So it seemed like a good time to tell you about its novel #ML method (M) for scoring free-response questions. Given the text of student answers, AND NOTHING MORE, M can produce a ranked list of answers that more closely matches the order given by a human grader (H) than a random shuffle (R) does. Learn how, a 🧵 (1/23) ___ ¹ https://law.mit.edu/smu-past-present-future (Unsupervised Machine Scoring...)
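For a sense of how a claim like that gets checked, here's a toy sketch (my framing, with made-up data; the paper may use a different statistic): compare M's ranking and a random shuffle R against the human order H using Kendall's tau.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

human = np.arange(25)                    # H: the grader's order, as ranks 0..24
machine = human + rng.normal(0, 3, 25)   # M: hypothetical scores that roughly track H
shuffled = rng.permutation(25)           # R: a random ordering

tau_m, _ = kendalltau(human, machine)
tau_r, _ = kendalltau(human, shuffled)
print(f"M vs H: {tau_m:.2f}; R vs H: {tau_r:.2f}")  # M should correlate much better
```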
(2/23) "Wait, you can 'grade' essays based on just the text of answers w/o a model answer or any external context and no LLM? Impossible," you say. By the end of this thread, if I've done my job right, you'll be saying, "Well, of course that works."
(4/23) To understand how this is possible, consider: if you’re running a competition to guess the number of candies in a jar—or the weight of a cow—the average of the guesses is probably close to the right answer. This is often attributed to the wisdom of crowds. See https://npr.org/sections/money
(5/23) Human judgment can go wrong in a lot of ways, and there are more ways to be wrong than right. This is sometimes called the Anna Karenina principle, named for the 1st line of the Tolstoy novel. “Happy families are all alike; every unhappy family is unhappy in its own way.”
(8/23) But how the heck do we do this with text answers? First, remember that when people think of the average answer to a test, they often mean an answer with the average grade. That’s not the average of the answers, which is what we are looking for.
(7/23) We can see how it works w/ 2D targets like a bullseye. Bias pulls some folks to the side, noise scatters things… but if there are enough ppl w/ different biases, different ways of being wrong cancel each other out such that the “middle”/average of hits is close to the bullseye.
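A quick toy simulation of that picture (my own example, not from the paper): give every shooter a different bias plus some noise, and the mean of the hits lands near the bullseye at the origin.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500  # number of shooters, each with their own systematic pull

bias = rng.normal(0, 1.0, size=(n, 2))   # per-shooter bias, pointing in diverse directions
noise = rng.normal(0, 0.5, size=(n, 2))  # shot-to-shot scatter
hits = bias + noise                      # the bullseye is at (0, 0)

print(hits.mean(axis=0))  # close to [0, 0]: the different wrongs cancel out
```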
(6/23) It is easy to see how this helps us with numeric predictions, like counting candies or guessing a cow's weight. Some folks guess too high, others too low, and if there is a diversity of wrong answers, they cancel each other out when averaged.
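In code, that cancellation looks like this (a toy demo with made-up numbers, not data from any real contest):

```python
import numpy as np

rng = np.random.default_rng(7)
true_count = 750                                # actual number of candies in the jar
guesses = true_count + rng.normal(0, 150, 200)  # 200 guesses, some high, some low

print(round(float(guesses.mean())))  # lands close to 750
```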
(10/23) The point is we can map words, and collections of words (like essay answers), to points in some many-dimensional space (e.g., 300 dimensions for word2vec). Mapped into such spaces, words/answers with similar content are close to each other. See e.g., https://projector.tensorflow.org
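A minimal sketch of that mapping (my own illustration; I'm assuming gensim's downloadable word2vec vectors, which match the 300 dimensions mentioned above, and a simple average-of-word-vectors embedding, which may differ from the paper's exact method):

```python
import numpy as np
import gensim.downloader as api

# Assumption: pretrained 300-dim word2vec vectors via gensim's downloader (~1.6 GB).
wv = api.load("word2vec-google-news-300")

def embed(answer: str) -> np.ndarray:
    """Map an answer to a point in the space by averaging its word vectors."""
    words = [w for w in answer.lower().split() if w in wv]  # skip out-of-vocabulary words
    return np.mean([wv[w] for w in words], axis=0)

a = embed("the contract requires consideration")
b = embed("a valid contract needs consideration")
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cos), 2))  # similar answers land close together (cosine near 1)
```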
Co-Director of Suffolk University Law School's Legal Innovation & Technology (LIT) Lab—@SuffolkLITLab. Attorney & science educator by training and practice. Creator of @LOLSCOTUS & @icymilaw. Data scientist, craftsman, and writer by experience. See eponymous website for more. He/him. No manels! #AccessToJustice (#a2j) work: https://papers.ssrn.com/sol3/papers.cfm?abstractid=3911381 (#LegalTech) & https://spot.suffolklitlab.org (#LegalTech + #AI)