Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://chaos.social/users/djh/statuses/113900535510603862">Daniel (djh@chaos.social)'s status on Monday, 27-Jan-2025 22:24:06 JST</a><a href="https://chaos.social/@djh" title="djh@chaos.social"><img src="https://gnusocial.jp/avatar/105177-48-20230307163817.webp" width="48" height="48" alt="Daniel" style="position: absolute; left: 0; top: 0;">Daniel</a><div><ul><li></ul></div></section><article><p><a href="https://chaos.social/@obrhoff">@obrhoff</a> It's all open research</p><p><a href="https://arxiv.org/search/cs?searchtype=author&amp;query=DeepSeek-AI" rel="nofollow noreferrer">https://arxiv.org/search/cs?searchtype=author&amp;query=DeepSeek-AI</a></p><p>For details on deepseek-r1 and the qwen / llama distilled models, see</p><p><a href="https://arxiv.org/pdf/2501.12948" rel="nofollow noreferrer">https://arxiv.org/pdf/2501.12948</a></p><p>for the distilled model benchmark see table 5.</p><p>They're qwen / llama model architectures and different compared to their main contribution.</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/4457123#notice-8724273">In conversation</a><time datetime="2025-01-27T22:24:06+09:00" title="Monday, 27-Jan-2025 22:24:06 JST">about 3 months ago</time> <span>from <span><a href="https://chaos.social/@djh/113900535510603862" rel="external" title="Sent from chaos.social via ActivityPub">chaos.social</a></span></span><a href="https://chaos.social/@djh/113900535510603862">permalink</a><h4>Attachments</h4><ol><li><label><a rel="external" href="https://gnusocial.jp/attachment/4026772">table 5 comparison of deepseek distilled models</a></label><br><a href="https://assets.chaos.social/media_attachments/files/113/900/526/392/664/519/original/b0af5ea8b0f56df9.png" rel="external">https://assets.chaos.social/media_attachments/files/113/900/526/392/664/519/original/b0af5ea8b0f56df9.png</a></li></ol></footer></blockquote>

Corresponding Notice

Daniel (djh@chaos.social)'s status on Monday, 27-Jan-2025 22:24:06 JST
- obrhoff
@obrhoff It's all open research
https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
For details on deepseek-r1 and the qwen / llama distilled models, see
https://arxiv.org/pdf/2501.12948
For the distilled model benchmark, see Table 5.
The distilled models use qwen / llama model architectures, which differ from that of their main contribution.
In conversation, about 3 months ago, from chaos.social
Attachments
1. table 5 comparison of deepseek distilled models
  https://assets.chaos.social/media_attachments/files/113/900/526/392/664/519/original/b0af5ea8b0f56df9.png

Public
