GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. Embed this notice
    Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 14:49:05 JST Valerie Aurora Valerie Aurora

    PSA: LLMs are not trained on a knowledge base, they are trained on a text corpus - a collection of strings. What you get out is another collection of strings, generated using a prompt, another text string. The output only contains "knowledge" to the extent that a human edits and curates it.

    In conversation Friday, 24-Mar-2023 14:49:05 JST from wandering.shop permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 17:50:52 JST Valerie Aurora Valerie Aurora
      in reply to

      Amazing how many brilliant scientists will look at an LLM "passing" a standardized test and think, "wow, this computer is very smart" and not "standardized tests are very bad at measuring intelligence"

      In conversation Friday, 24-Mar-2023 17:50:52 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 17:53:30 JST Valerie Aurora Valerie Aurora
      in reply to

      Like, any lawyer will tell you the relationship between what they do to pass the bar exam and what they do at their jobs. The bar exam is easy for a computer to pass because it is designed to be easy to grade with computer assistance.

      In conversation Friday, 24-Mar-2023 17:53:30 JST permalink
      pettter repeated this.
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 17:54:37 JST Valerie Aurora Valerie Aurora
      in reply to

      Teachers don't use multiple choice questions because they are the best way to measure knowledge or skill, they use them because teachers are underpaid and overworked and multiple choice is easy to grade

      In conversation Friday, 24-Mar-2023 17:54:37 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 17:58:44 JST Valerie Aurora Valerie Aurora
      in reply to

      I distinctly remember the last time I bragged about my performance on a standardized test. My age began with a 1.

      In conversation Friday, 24-Mar-2023 17:58:44 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 17:59:34 JST Valerie Aurora Valerie Aurora
      in reply to

      These are people with PhDs. What's their excuse for looking at an LLM's scores on a test and thinking anything other than, "this test sucks, our entire system of deciding who gets access to higher education is horrible" OH SNAP

      In conversation Friday, 24-Mar-2023 17:59:34 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 24-Mar-2023 18:03:35 JST Valerie Aurora Valerie Aurora
      in reply to

      MAYBE the unwillingness to question the basis of the entire academic hierarchy they are perched on top of makes researchers embarrassingly credulous when it comes to the "knowledge" of computer programs that lightly remix a few hundred answers to very similar questions?

      In conversation Friday, 24-Mar-2023 18:03:35 JST permalink
      clacke and pettter repeated this.
    • Embed this notice
      bignose (bignose@fosstodon.org)'s status on Friday, 24-Mar-2023 18:04:40 JST bignose bignose
      in reply to

      @vaurora

      One of my favourite recent Toots:

      * journalist: Are you sentient?
      * chatGPT: yes
      * journalist: holy shit

      In conversation Friday, 24-Mar-2023 18:04:40 JST permalink
    • Embed this notice
      bs2 (bsmall2@mstdn.jp)'s status on Saturday, 25-Mar-2023 01:05:58 JST bs2 bs2
      in reply to
      • Aleksandra Fedorova :fedora:

      @bookwar @vaurora
      Jeff Schmidt's _Disciplined Minds_ got me thinking that the tests filter out people who refuse to think like machines. It's like people have to accommodate themselves to sociopaths for a while as they jump through hoops. And then if they want to get paid well for their trouble it's best not to snap out of the accommodation.

      The LLM- and test-awed PhDs and scientists are swayed by the #DisciplinedMinds set-up described by #JeffSchmidt?

      In conversation Saturday, 25-Mar-2023 01:05:58 JST permalink
    • Embed this notice
      Aleksandra Fedorova :fedora: (bookwar@fosstodon.org)'s status on Saturday, 25-Mar-2023 01:05:59 JST Aleksandra Fedorova :fedora: Aleksandra Fedorova :fedora:
      in reply to

      @vaurora Most of the time when I complain about tests, I get replies like: but tests are objective, and uniform, they help us fight biases and remove the human factor. That is a good thing, right?

      The problem with this answer is that, even if you believe it*, it doesn't matter how objective and unbiased your measurements are if they don't measure the correct thing. And tests simply don't work as a measure of understanding.

      * in fact, tests introduce their own set of biases

      In conversation Saturday, 25-Mar-2023 01:05:59 JST permalink
      Valerie Aurora repeated this.
    • Embed this notice
      Chris Adams (acdha@code4lib.social)'s status on Saturday, 25-Mar-2023 01:07:24 JST Chris Adams Chris Adams
      in reply to
      • Sunflower Björnskalle 🌻

      @apodoxus @vaurora a softer form of this is confirmation bias: I’m objectively a smart person, doing well on this test was important to my career & most of my peers’, therefore it must be measuring something real.

      I used to support neuroscientists & one thing they’d remind everyone of is that humans are notoriously prone to believing we’re being rational when we’re constructing a story that explains the present.

      In conversation Saturday, 25-Mar-2023 01:07:24 JST permalink
      Valerie Aurora repeated this.
    • Embed this notice
      Sunflower Björnskalle 🌻 (apodoxus@mastodon.online)'s status on Saturday, 25-Mar-2023 01:07:25 JST Sunflower Björnskalle 🌻 Sunflower Björnskalle 🌻
      in reply to

      @vaurora Another way to put the same thing is to say they would be discrediting themselves by doing so. Their ego and reputation rely on the fact that they passed these tests, so the tests must be good or else their ego and reputation are less secure than they thought. Misaligned incentives here.

      In conversation Saturday, 25-Mar-2023 01:07:25 JST permalink
      Valerie Aurora repeated this.
    • Embed this notice
      Aleksandra Fedorova :fedora: (bookwar@fosstodon.org)'s status on Saturday, 25-Mar-2023 01:07:37 JST Aleksandra Fedorova :fedora: Aleksandra Fedorova :fedora:
      in reply to

      @vaurora Also from this perspective, LLM is actually a good vulnerability scanner for various grading and review processes.

      It is just that the reaction to it should be not "Let's ban ChatGPT from doing X" but rather the generic rule should be:
      "If ChatGPT can do this job/pass this test, this is the job/test not worth doing"

      In conversation Saturday, 25-Mar-2023 01:07:37 JST permalink

      clacke likes this.
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Saturday, 25-Mar-2023 04:07:15 JST Valerie Aurora Valerie Aurora
      in reply to
      • Reay Jespersen

      @reay this kind of test is often more about transferring liability from the corporation to the individual

      In conversation Saturday, 25-Mar-2023 04:07:15 JST permalink
    • Embed this notice
      Reay Jespersen (reay@mastodon.social)'s status on Saturday, 25-Mar-2023 04:07:17 JST Reay Jespersen Reay Jespersen
      in reply to

      @vaurora I’m always amused at still having multiple choice questions for quizzes I need to do occasionally for work, some of which are made painfully obvious because they really don’t want you to fail stuff so basic.

      It sometimes feels like:

      A coworker has had their foot caught under the wheel of a one-ton power lifting equipment device. Do you:

      - Get back to work.
      - Scream.
      - Take your scheduled break.
      - Help them.

      In conversation Saturday, 25-Mar-2023 04:07:17 JST permalink
    • Embed this notice
      ouinne (ouinne@mastodon.social)'s status on Saturday, 25-Mar-2023 04:08:11 JST ouinne ouinne
      in reply to

      @vaurora if an LLM CAN’T pass standardized tests easily I’d wonder how badly it was designed, honestly. It’s all rote, exactly where spicy autocomplete should excel.

      In conversation Saturday, 25-Mar-2023 04:08:11 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Saturday, 25-Mar-2023 04:20:46 JST Valerie Aurora Valerie Aurora
      in reply to
      • Ulrike Hahn

      @UlrikeHahn there is a loooong messy history of essay answer grading, filled with attempts to standardize human grading using simple heuristics and computer assistance. What we are learning is that essay questions that can be easily and uniformly graded are also often questions that can be answered by remixing the answers to similar essay questions

      In conversation Saturday, 25-Mar-2023 04:20:46 JST permalink
    • Embed this notice
      Ulrike Hahn (ulrikehahn@fediscience.org)'s status on Saturday, 25-Mar-2023 04:20:48 JST Ulrike Hahn Ulrike Hahn
      in reply to

      @vaurora

      but the essay parts of the bar exam are *not* easy for computers to pass, they are not graded by computer, and they do actually reflect well what lawyers do in their job:
      take descriptions of things that happened and work out which legal rules apply to them in order to seek a legal resolution

      In conversation Saturday, 25-Mar-2023 04:20:48 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Saturday, 25-Mar-2023 04:25:52 JST Valerie Aurora Valerie Aurora
      in reply to
      • Joseph Reagle

      @reagle it's genuinely cool that computers can output plausible sounding remixes of existing essays and analysis. I'm just noting that that doesn't mean that it's doing the things that humans did to produce the training text corpus

      In conversation Saturday, 25-Mar-2023 04:25:52 JST permalink
    • Embed this notice
      Joseph Reagle (reagle@ohai.social)'s status on Saturday, 25-Mar-2023 04:25:53 JST Joseph Reagle Joseph Reagle
      in reply to

      @vaurora Interesting points. And as good as LLMs are on MCQs (which suck but are used for obvious reasons), I'm still impressed with their capabilities on essays and analysis -- the higher layers of Bloom's taxonomy https://reagle.org/joseph/zwiki/Teaching/Exercises/Tasks/Questions_-_Taxonomy.html

      In conversation Saturday, 25-Mar-2023 04:25:53 JST permalink

    • Embed this notice
      Avram Grumer (avram@wandering.shop)'s status on Saturday, 25-Mar-2023 05:05:49 JST Avram Grumer Avram Grumer
      in reply to

      @vaurora I’m reminded of how James Randi realized that scientists were bad at examining claims of the paranormal, because the scientists weren’t expecting to be deceived, so they weren’t setting up their experiments to account for the possibility of deception.

      In conversation Saturday, 25-Mar-2023 05:05:49 JST permalink
    • Embed this notice
      アイアンハンドガンダム (mrt181@mastodon.social)'s status on Sunday, 26-Mar-2023 00:35:36 JST アイアンハンドガンダム アイアンハンドガンダム
      in reply to

      @vaurora

      In conversation Sunday, 26-Mar-2023 00:35:36 JST permalink

      Attachments


      1. https://files.mastodon.social/media_attachments/files/110/084/441/515/275/165/original/ca7c763132e6d76e.jpg
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Monday, 27-Mar-2023 01:11:04 JST Valerie Aurora Valerie Aurora
      in reply to
      • DEFEND DEMOCRACY

      @45xiatai what are you trying to measure and why? Because in this society, "intelligence test" means "plausible reason to steer resources to people who already have the most"

      In conversation Monday, 27-Mar-2023 01:11:04 JST permalink
    • Embed this notice
      DEFEND DEMOCRACY (45xiatai@infosec.exchange)'s status on Monday, 27-Mar-2023 01:11:09 JST DEFEND DEMOCRACY DEFEND DEMOCRACY
      in reply to

      @vaurora Genuine question: what's a good way to measure intelligence?

      In conversation Monday, 27-Mar-2023 01:11:09 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Monday, 27-Mar-2023 03:17:45 JST Valerie Aurora Valerie Aurora
      in reply to
      • DEFEND DEMOCRACY

      @45xiatai "can recall facts in a specific knowledge area selected by the test designer" is a thing yes. One purpose for this is "find people who can compete with each other on TV to see who is best at this skill for the purpose of selling ads to the people watching" :)

      In conversation Monday, 27-Mar-2023 03:17:45 JST permalink
    • Embed this notice
      DEFEND DEMOCRACY (45xiatai@infosec.exchange)'s status on Monday, 27-Mar-2023 03:17:46 JST DEFEND DEMOCRACY DEFEND DEMOCRACY
      in reply to

      @vaurora Your reply here (for which I thank you) kind of goes in a different direction from where I thought the discussion might go when I saw your first comment to which I replied. Supposing, for example, there are a whole bunch of people who grew up with similarly privileged backgrounds, would there be any reason that someone (who?) might want to know who has the better knowledge base among those people? How would someone find out about that?

      In conversation Monday, 27-Mar-2023 03:17:46 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Wednesday, 29-Mar-2023 01:07:57 JST Valerie Aurora Valerie Aurora
      in reply to

      From the bird site, a professor interrogates the difference between the skills he taught and the skills he graded for: "My exams, however, rewarded discursive fluency and verbal glibness over diligent study." https://twitter.com/alfiekohn/status/1640684775873576961?s=20

      In conversation Wednesday, 29-Mar-2023 01:07:57 JST permalink

      Attachments

      1. https://twitter.com/alfiekohn/status/1640684775873576961 (from Alfie Kohn)
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Wednesday, 05-Apr-2023 23:32:47 JST Valerie Aurora Valerie Aurora
      in reply to

      In summary, LLMs are extremely sophisticated mad libs and we should treat them that way

      In conversation Wednesday, 05-Apr-2023 23:32:47 JST permalink
      clacke likes this.
    • Embed this notice
      JustAFrog (justafrog@mstdn.social)'s status on Thursday, 06-Apr-2023 02:38:38 JST JustAFrog JustAFrog
      in reply to

      @vaurora There is such a thing as a "sunk prestige fallacy".

      If things get publicly acknowledged as useless bullshit, a lot of people stand to lose the respect they have now.

      Part of why some science only progresses per funeral, not per published paper.

      In conversation Thursday, 06-Apr-2023 02:38:38 JST permalink
      clacke likes this.
    • Embed this notice
      Aleksandra Fedorova :fedora: (bookwar@fosstodon.org)'s status on Thursday, 06-Apr-2023 02:42:01 JST Aleksandra Fedorova :fedora: Aleksandra Fedorova :fedora:
      in reply to
      • bs2

      @bsmall2

      (I haven't read the book)

      I am usually cautious about generic claims that rules or hierarchy turn you into a mindless gear in The Machine.

      It is not that black and white and, honestly, self-taught independent thinkers are more likely to reinvent the Perpetuum Mobile than to make a breakthrough in String Theory.

      But passing tests is indeed a very specific muscle to train. And it has nothing to do with reasoning or researching - the skills which we are supposed to look for.

      @vaurora

      In conversation Thursday, 06-Apr-2023 02:42:01 JST permalink

      clacke likes this.
    • Embed this notice
      Jesse (misc@mastodon.social)'s status on Thursday, 06-Apr-2023 04:54:26 JST Jesse Jesse
      in reply to
      • Sunflower Björnskalle 🌻
      • Chris Adams

      @acdha @apodoxus @vaurora My unified theory of the Less Wrong Rationalist AI Doomer complex is that these are people whose whole sense of self worth derives from being good test takers in school, which led them to invest a lot in the idea of intelligence as a single, innate, quantifiable thing. Source: this was almost me.

      In conversation Thursday, 06-Apr-2023 04:54:26 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Thursday, 06-Apr-2023 04:58:16 JST Valerie Aurora Valerie Aurora
      in reply to
      • Sunflower Björnskalle 🌻
      • Jesse
      • Chris Adams

      @misc @acdha @apodoxus same. If you're young and the only affirmation you receive is for your performance on intellectual tasks, it's easy to go down the IQ rabbit hole. My escape was just noticing how annoying those people were and deciding to look for another option

      In conversation Thursday, 06-Apr-2023 04:58:16 JST permalink
    • Embed this notice
      Valerie Aurora (vaurora@wandering.shop)'s status on Friday, 07-Apr-2023 23:50:58 JST Valerie Aurora Valerie Aurora
      in reply to
      • Denzil Ferreira :fedora:

      @denzilferreira it's no accident that performing well on the test requires the expenditure of hundreds of hours of mind numbing practice and tutoring. Who has those resources?

      In conversation Friday, 07-Apr-2023 23:50:58 JST permalink
    • Embed this notice
      Denzil Ferreira :fedora: (denzilferreira@techhub.social)'s status on Friday, 07-Apr-2023 23:50:59 JST Denzil Ferreira :fedora: Denzil Ferreira :fedora:
      in reply to

      @vaurora and to think they use them to decide who can be considered for college...

      In conversation Friday, 07-Apr-2023 23:50:59 JST permalink

GNU social JP is a social network, courtesy of GNU social JP管理人 (the GNU social JP administrator). It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.