PSA: LLMs are not trained on a knowledge base, they are trained on a text corpus - a collection of strings. What you get out is another collection of strings, generated using a prompt, which is itself another text string. The output only contains "knowledge" to the extent that a human edits and curates it
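(The "strings in, strings out" point can be made concrete with a toy sketch. Everything below - the corpus, the prompt, the function names - is invented for illustration; a real LLM learns vastly richer statistics than a bigram table, but the shape of the operation is the same: it ingests text and emits more text, with no knowledge base anywhere in the loop.)

```python
import random
from collections import defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count which word follows which in the training text."""
    words = corpus.split()
    follows = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(follows: dict, prompt: str, length: int = 5, seed: int = 0) -> str:
    """Continue the prompt by sampling from the learned word statistics."""
    rng = random.Random(seed)
    out = prompt.split()
    for _ in range(length):
        choices = follows.get(out[-1])
        if not choices:  # no data for this word: stop generating
            break
        out.append(rng.choice(choices))
    return " ".join(out)

# A made-up miniature "corpus" and "prompt":
corpus = "the cat sat on the mat the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the dog"))
```

The output is a plausible-looking continuation of the prompt, assembled purely from statistical remixing of the corpus - which is the thread's point in miniature.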
Amazing how many brilliant scientists will look at an LLM "passing" a standardized test and think, "wow, this computer is very smart" and not "standardized tests are very bad at measuring intelligence"
Like, any lawyer will tell you how little what they did to pass the bar exam resembles what they do at their jobs. The bar exam is easy for a computer to pass because it is designed to be easy to grade with computer assistance.
Teachers don't use multiple choice questions because they are the best way to measure knowledge or skill, they use them because teachers are underpaid and overworked and multiple choice is easy to grade
These are people with PhDs. What's their excuse for looking at an LLM's scores on a test and thinking anything other than, "this test sucks, our entire system of deciding who gets access to higher education is horrible" OH SNAP
MAYBE the unwillingness to question the basis of the entire academic hierarchy they are perched on top of makes researchers embarrassingly credulous when it comes to the "knowledge" of computer programs that lightly remix a few hundred answers to very similar questions?
@bookwar @vaurora Jeff Schmidt's _Disciplined Minds_ got me thinking that the tests filter out people who refuse to think like machines. It's like people have to accommodate themselves to sociopaths for a while as they jump through hoops. And then if they want to get paid well for their trouble it's best not to snap out of the accommodation.
@vaurora Most of the time when I complain about tests, I get replies like: but tests are objective, and uniform, they help us fight biases and remove the human factor. That is a good thing, right?
The problem with this answer is that, even if you believe it, it doesn't matter how objective and unbiased your measurements are if they don't measure the correct thing. And tests simply don't work as a measure of understanding.
@apodoxus @vaurora a softer form of this is confirmation bias: I’m objectively a smart person, doing well on this test was important to my career & most of my peers’, therefore it must be measuring something real.
I used to support neuroscientists & one thing they’d remind everyone of is that humans are notoriously prone to believing we’re being rational when we’re constructing a story that explains the present.
@vaurora Another way to put the same thing is to say they would be discrediting themselves by doing so. Their ego and reputation rely on the fact that they passed these tests, so the tests must be good, or else their ego and reputation are less secure than they thought. Misaligned incentives here.
@vaurora Also from this perspective, an LLM is actually a good vulnerability scanner for various grading and review processes.
It is just that the reaction to it should not be "Let's ban ChatGPT from doing X" - rather, the generic rule should be: "If ChatGPT can do this job/pass this test, the job/test is not worth doing"
@vaurora I’m always amused at still having multiple choice questions for quizzes I need to do occasionally for work, some of which are made painfully obvious because they really don’t want you to fail stuff so basic.
It sometimes feels like:
A coworker has had their foot caught under the wheel of a one-ton power lifting equipment device. Do you:
@vaurora if an LLM CAN’T pass standardized tests easily I’d wonder how badly it was designed, honestly. It’s all rote, exactly where spicy autocomplete should excel.
@UlrikeHahn there is a loooong messy history of essay answer grading, filled with attempts to standardize human grading using simple heuristics and computer assistance. What we are learning is that essay questions that can be easily and uniformly graded are also often questions that can be answered by remixing the answers to similar essay questions
but the essay parts of the bar exam are *not* easy for computers to pass, they are not graded by computer, and they do actually reflect well what lawyers do in their job: take descriptions of things that happened and work out which legal rules apply to them in order to seek a legal resolution
@reagle it's genuinely cool that computers can output plausible-sounding remixes of existing essays and analysis. I'm just noting that this doesn't mean they're doing the things that humans did to produce the training text corpus
@vaurora I’m reminded of how James Randi realized that scientists were bad at examining claims of the paranormal, because the scientists weren’t expecting to be deceived, so they weren’t setting up their experiments to account for the possibility of deception.
@45xiatai what are you trying to measure and why? Because in this society, "intelligence test" means "plausible reason to steer resources to people who already have the most"
@45xiatai "can recall facts in a specific knowledge area selected by the test designer" is a thing yes. One purpose for this is "find people who can compete with each other on TV to see who is best at this skill for the purpose of selling ads to the people watching" :)
@vaurora Your reply here (for which I thank you) kind of goes in a different direction from where I thought the discussion might go when I saw your first comment to which I replied. Supposing, for example, there are a whole bunch of people who grew up with similarly privileged backgrounds, would there be any reason that someone (who?) might want to know who has the better knowledge base among those people? How would someone find out about that?
From the bird site, a professor interrogates the difference between the skills he taught and the skills he graded for: "My exams, however, rewarded discursive fluency and verbal glibness over diligent study." https://twitter.com/alfiekohn/status/1640684775873576961?s=20
I am usually cautious about generic claims that rules or hierarchy turn you into a mindless gear in The Machine.
It is not that black and white and, honestly, self-taught independent thinkers are more likely to reinvent the Perpetuum Mobile than to make a breakthrough in String Theory.
But passing tests is indeed a very specific muscle to train. And it has nothing to do with reasoning or researching - the skills we are supposedly looking for.
@acdha @apodoxus @vaurora My unified theory of the Less Wrong Rationalist AI Doomer complex is that these are people whose whole sense of self worth derives from being good test takers in school, which led them to invest a lot in the idea of intelligence as a single, innate, quantifiable thing. Source: this was almost me.
@misc @acdha @apodoxus same. If you're young and the only affirmation you receive is for your performance on intellectual tasks, it's easy to go down the IQ rabbit hole. My escape was just noticing how annoying those people were and deciding to look for another option
@denzilferreira it's no accident that performing well on the test requires the expenditure of hundreds of hours of mind-numbing practice and tutoring. Who has those resources?