snap (snappler@poa.st)'s status on Thursday, 30-Jan-2025 03:54:48 JST

@IAMAL_PHARIUS Some context no one asked for since fuck you, I'm spamming everywhere about LLM shit today:
DeepSeek V3 is the non-thinking version of R1. It has pretty severe repetition issues in multi-turn (chat) settings but is pretty good overall. I say that for context.
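(If you want to poke at the multi-turn repetition thing yourself, here's a rough sketch of a chat loop against an OpenAI-compatible endpoint. The base URL, model name, and penalty value are my assumptions about DeepSeek's hosted API, not gospel; frequency_penalty is just one knob people reach for to tamp repetition down.)

```python
# Minimal sketch: multi-turn chat against an OpenAI-compatible endpoint.
# base_url and model name assume DeepSeek's hosted API; swap in whatever
# provider or proxy you actually use.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                     # your key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    resp = client.chat.completions.create(
        model="deepseek-chat",    # V3, per DeepSeek's naming (assumption)
        messages=history,
        frequency_penalty=0.6,    # one knob for curbing repetition
        temperature=0.8,
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("Give me a quick rundown of MMLU-Pro."))
print(ask("Now compare it to plain MMLU."))  # repetition tends to show up on later turns
```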
The benchmarks here should always be taken with a grain of salt because they poorly reflect real-world use, but, broadly speaking, they give a rough sense of a model's baseline ability. Of the benchmarks listed, MMLU-Pro correlates best with end-user experience, but it's still a rough-fit kind of thing. Benchmarks are automated, not evaluated by humans with an eye for detail, so they can only do so much.
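(For the "automated" point: roughly all an MMLU-style harness does is pull a letter out of the model's output and compare it to the gold answer. Toy sketch of that general shape below; this is not the actual MMLU-Pro harness.)

```python
# Toy sketch of how an automated multiple-choice benchmark gets scored.
# General shape only, not MMLU-Pro's real evaluation code.
import re

def extract_choice(model_output: str) -> str | None:
    """Grab the first standalone A-J letter the model emits as its answer."""
    m = re.search(r"\b([A-J])\b", model_output.strip())
    return m.group(1) if m else None

def score(examples: list[dict], get_model_answer) -> float:
    """examples: [{'question': ..., 'options': [...], 'gold': 'B'}, ...]"""
    correct = 0
    for ex in examples:
        prompt = ex["question"] + "\n" + "\n".join(
            f"{chr(65 + i)}. {opt}" for i, opt in enumerate(ex["options"])
        )
        pred = extract_choice(get_model_answer(prompt))
        correct += int(pred == ex["gold"])
    return correct / len(examples)

if __name__ == "__main__":
    fake = [{"question": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "gold": "B"}]
    # A regex match against a gold letter is all the "evaluation" there is;
    # nobody reads the model's reasoning, which is why scores only loosely
    # track real-world usefulness.
    print(score(fake, lambda prompt: "The answer is B."))  # -> 1.0
```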
THAT BEING SAID, this is very good performance for a non-thinking/non-reasoning model, so it's very promising, and you can try it out yourself at chat.qwenlm.ai.
Qwen models tend to be pretty strongly "aligned" (cucked for "safety" purposes), so you're likely going to have less fun with Qwen 2.5 Max than with DeepSeek R1.