Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://phire.place/users/eaton/statuses/113139770629726785">Eaton (eaton@phire.place)'s status on Sunday, 15-Sep-2024 13:51:49 JST</a><a href="https://phire.place/@eaton" title="eaton@phire.place"><img src="https://gnusocial.jp/avatar/46616-48-20221202184816.webp" width="48" height="48" alt="Eaton" style="position: absolute; left: 0; top: 0;">Eaton</a></section><article><p>So, I’m doing some automated comparison testing with various publicly available LLMs: classifying posts in a subreddit against a fixed list of flair categories and seeing how well different models do.</p><p>It's hit or miss in many cases, but about 1 out of 8 posts just makes certain models WIG OUT. Instead of responding with the name of one of the set categories, phi3.5 started regurgitating the summary of a paper on gene polymorphism in dopamine receptors. Another responded with a snippet of Python.</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/3670170#notice-7191688">In conversation</a><time datetime="2024-09-15T13:51:49+09:00" title="Sunday, 15-Sep-2024 13:51:49 JST">about 9 months ago</time> <span>from <span><a href="https://phire.place/@eaton/113139770629726785" rel="external" title="Sent from phire.place via ActivityPub">phire.place</a></span></span><a href="https://phire.place/@eaton/113139770629726785">permalink</a><h4>Attachments</h4><ol><li><label><a rel="external" href="https://gnusocial.jp/attachment/805134">Untitled attachment</a></label><br></li></ol></footer></blockquote>

Corresponding Notice

Eaton (eaton@phire.place)'s status on Sunday, 15-Sep-2024 13:51:49 JST
So, I’m doing some automated comparison testing with various publicly available LLMs: classifying posts in a subreddit against a fixed list of flair categories and seeing how well different models do.
It's hit or miss in many cases, but about 1 out of 8 posts just makes certain models WIG OUT. Instead of responding with the name of one of the set categories, phi3.5 started regurgitating the summary of a paper on gene polymorphism in dopamine receptors. Another responded with a snippet of Python.
In conversation · about 9 months ago from phire.place · permalink
Attachments
1. Untitled attachment
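The post doesn't describe the test harness, but the failure it reports (a reply that isn't any of the allowed categories at all) can be measured separately from ordinary misclassification. Below is a minimal Python sketch, assuming each model's raw reply has already been collected alongside the expected flair. The flair names, the sample replies, and the helpers `normalize`, `match_flair`, and `score` are all hypothetical; the sample is simply built so that one reply in eight is off-list, mirroring the post's rough "1 out of 8" figure.

```python
from typing import List, Optional, Tuple

# Hypothetical flair list: the post does not say which categories were used.
FLAIRS: List[str] = ["Question", "Discussion", "News", "Meta"]


def normalize(reply: str) -> str:
    """Trim whitespace, then any surrounding quotes and periods, from a raw reply."""
    return reply.strip().strip("\"'.").strip()


def match_flair(reply: str, flairs: List[str]) -> Optional[str]:
    """Return the flair the reply names (case-insensitive), or None if it is off-list."""
    cleaned = normalize(reply).lower()
    for flair in flairs:
        if cleaned == flair.lower():
            return flair
    return None


def score(results: List[Tuple[str, str]], flairs: List[str]) -> Tuple[float, float]:
    """Given (expected_flair, raw_reply) pairs, return (accuracy, off_list_rate).

    Off-list replies (prose, code, anything not in `flairs`) count as wrong
    and are also tallied separately, since they are a different failure
    from picking the wrong category.
    """
    if not results:
        raise ValueError("no results to score")
    correct = 0
    off_list = 0
    for expected, reply in results:
        label = match_flair(reply, flairs)
        if label is None:
            off_list += 1
        elif label == expected:
            correct += 1
    n = len(results)
    return correct / n, off_list / n


# Hypothetical sample: eight replies from one model, one of which wanders off.
SAMPLE_RESULTS: List[Tuple[str, str]] = [
    ("Question", "Question"),
    ("News", "news."),
    ("Discussion", "Discussion"),
    ("Meta", "Question"),  # on-list but wrong
    ("Question", '"Question"'),
    ("News", "News"),
    ("Discussion", "Discussion"),
    ("Meta", "This paper examines dopamine receptor gene polymorphisms..."),  # off-list
]

if __name__ == "__main__":
    print(score(SAMPLE_RESULTS, FLAIRS))  # prints (0.75, 0.125)
```

The sketch uses an exact match after light cleanup rather than a substring search: a substring check would let a rambling reply that happens to mention "News" count as a valid label, hiding exactly the behavior the post is describing.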

