I just gave DeepSeek (7B) a go with ollama. Works pretty fast on my 32GB M1 Pro.
Coding-wise it was just delivering me bullshit (rough sketch of how I called it below).
#deepseek #ollama
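For reference, roughly how I called it, a minimal sketch assuming the ollama Python client (pip install ollama) and ollama's deepseek-r1:7b tag; the prompt is just a made-up example:

```python
import ollama  # assumes the ollama Python client and a local ollama server running

# Send one coding prompt to the 7B tag that ollama serves locally.
response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Write a Python function that parses an ISO 8601 date."}],
)
print(response["message"]["content"])
```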
@obrhoff because it's not deepseek-r1.
Only the 671B / 404GB model is deepseek-r1 and most folks won't be able to run that locally.
All other variants are the qwen / llama distilled models. It's highly confusing that folks like ollama name them deepseek-r1. They're completely different models.
@djh Ah good to know! Is there a benchmark of distilled models?
@obrhoff It's all open research
https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
For details on deepseek-r1 and the qwen / llama distilled models, see
https://arxiv.org/pdf/2501.12948
For the distilled model benchmark see Table 5.
The distilled ones use the qwen / llama model architectures, so they're different from the paper's main contribution.
@djh Btw I'm not an expert, but if I use the 7/14B model, there is always a "thinking" paragraph in the output. I assumed that was the reasoning part?
I need to dig through it, but DeepSeek-R1-Distill-Qwen-14B looks like a sweet spot? I gave it a try and my MacBook is just able to run it, and the coding results also look better.
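Rough sketch of what I mean by the thinking paragraph, assuming the distilled models wrap the reasoning in <think>...</think> markers like the output I got (using the ollama Python client and ollama's deepseek-r1:14b tag for DeepSeek-R1-Distill-Qwen-14B):

```python
import re

import ollama  # assumes the ollama Python client and a local ollama server running

# Ask the distilled 14B model a coding question.
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
raw = response["message"]["content"]

# Split the output into the "thinking" paragraph and the final answer,
# assuming the reasoning is wrapped in <think>...</think> markers.
thinking = re.findall(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print("reasoning:", thinking[0].strip() if thinking else "(none)")
print("answer:", answer)
```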
@obrhoff Yeah, the smaller distill models you can run locally are great, too; it's just important to understand that you won't get the proper deepseek-r1 everyone is hyped about.
Which model to pick for running locally depends on: what you can fit in memory, whether you run on CPU or GPU, whether you want to optimize for latency (e.g. for interactive use cases), and what kind of output quality you're looking for.
Just try a bunch and see what happens 😛
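e.g. a quick-and-dirty latency comparison, a sketch assuming the ollama Python client and that you've already pulled the tags you want to compare (the list here is just a hypothetical shortlist):

```python
import time

import ollama  # assumes the ollama Python client and a local ollama server running

# Hypothetical shortlist of locally pulled tags; swap in whatever fits your memory.
candidates = ["deepseek-r1:7b", "deepseek-r1:14b"]
prompt = "Write a Python function that merges two sorted lists."

for tag in candidates:
    start = time.monotonic()
    response = ollama.chat(
        model=tag,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.monotonic() - start
    print(f"{tag}: {elapsed:.1f}s, {len(response['message']['content'])} chars")
```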