snap (snappler@poa.st)'s status on Thursday, 30-Jan-2025 03:54:48 JST

@IAMAL_PHARIUS Some context no one asked for since fuck you, I'm spamming everywhere about LLM shit today:
DeepSeek V3 is the non-thinking version of R1. It has pretty severe repetition issues in multi-turn (chat) settings but is pretty good overall. I say that for context.
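(If you want to poke at the multi-turn repetition thing yourself, here's a rough sketch of a chat loop against an OpenAI-compatible endpoint. The base URL, model name, and penalty value are my assumptions about DeepSeek's hosted API, not gospel; frequency_penalty is just one knob people reach for to tamp repetition down.)

```python
# Minimal sketch: multi-turn chat against an OpenAI-compatible endpoint.
# base_url and model name assume DeepSeek's hosted API; swap in whatever
# provider or proxy you actually use.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                     # your key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    resp = client.chat.completions.create(
        model="deepseek-chat",    # V3, per DeepSeek's naming (assumption)
        messages=history,
        frequency_penalty=0.6,    # one knob for curbing repetition
        temperature=0.8,
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("Give me a quick rundown of MMLU-Pro."))
print(ask("Now compare it to plain MMLU."))  # repetition tends to show up on later turns
```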
The benchmarks here should always be taken with a grain of salt because they poorly reflect real-world use, but, broadly speaking, they give a rough sense of a model's baseline ability. Of the benchmarks listed, MMLU-Pro correlates best with end-user experience, but it's still a rough-fit kind of thing. Benchmarks are automated, not evaluated by humans with an eye for detail, so they can only do so much.
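(For the "automated" point: roughly all an MMLU-style harness does is pull a letter out of the model's output and compare it to the gold answer. Toy sketch of that general shape below; this is not the actual MMLU-Pro harness.)

```python
# Toy sketch of how an automated multiple-choice benchmark gets scored.
# General shape only, not MMLU-Pro's real evaluation code.
import re

def extract_choice(model_output: str) -> str | None:
    """Grab the first standalone A-J letter the model emits as its answer."""
    m = re.search(r"\b([A-J])\b", model_output.strip())
    return m.group(1) if m else None

def score(examples: list[dict], get_model_answer) -> float:
    """examples: [{'question': ..., 'options': [...], 'gold': 'B'}, ...]"""
    correct = 0
    for ex in examples:
        prompt = ex["question"] + "\n" + "\n".join(
            f"{chr(65 + i)}. {opt}" for i, opt in enumerate(ex["options"])
        )
        pred = extract_choice(get_model_answer(prompt))
        correct += int(pred == ex["gold"])
    return correct / len(examples)

if __name__ == "__main__":
    fake = [{"question": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "gold": "B"}]
    # A regex match against a gold letter is all the "evaluation" there is;
    # nobody reads the model's reasoning, which is why scores only loosely
    # track real-world usefulness.
    print(score(fake, lambda prompt: "The answer is B."))  # -> 1.0
```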
THAT BEING SAID, this is very good performance for a non-thinking/non-reasoning model, so it's very promising, and you can try it out yourself at chat.qwenlm.ai.
Qwen models tend to be pretty strongly "aligned" (cucked for "safety" purposes), so you're likely going to have less fun with Qwen 2.5 Max than with DeepSeek R1.