I just gave DeepSeek (7B) a go with ollama. Works pretty fast on my 32GB M1 Pro.
Coding-wise it was just delivering me bullshit (rough sketch of how I called it below).
#deepseek #ollama
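For reference, roughly how I called it, a minimal sketch assuming the ollama Python client (pip install ollama) and ollama's deepseek-r1:7b tag; the prompt is just a made-up example:

```python
import ollama  # assumes the ollama Python client and a local ollama server running

# Send one coding prompt to the 7B tag that ollama serves locally.
response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Write a Python function that parses an ISO 8601 date."}],
)
print(response["message"]["content"])
```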
@obrhoff because it's not deepseek-r1.
Only the 671B / 404GB model is deepseek-r1 and most folks won't be able to run that locally.
All other variants are the qwen / llama distilled models. It's highly confusing that folks like ollama name them deepseek-r1. They're completely different models.
@djh Ah good to know! Is there a benchmark of distilled models?
@obrhoff It's all open research
https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
For details on deepseek-r1 and the qwen / llama distilled models, see
https://arxiv.org/pdf/2501.12948
For the distilled model benchmark see Table 5.
The distilled ones use the qwen / llama model architectures, so they're different from the paper's main contribution.
@djh Btw I'm not an expert, but if I use the 7/14B model, there is always a "thinking" paragraph in the output. I assumed that was the reasoning part?
I need to dig through it, but DeepSeek-R1-Distill-Qwen-14B looks like a sweet spot? I gave it a try and my MacBook is just able to run it, and the coding results also look better.
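Rough sketch of what I mean by the thinking paragraph, assuming the distilled models wrap the reasoning in <think>...</think> markers like the output I got (using the ollama Python client and ollama's deepseek-r1:14b tag for DeepSeek-R1-Distill-Qwen-14B):

```python
import re

import ollama  # assumes the ollama Python client and a local ollama server running

# Ask the distilled 14B model a coding question.
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
raw = response["message"]["content"]

# Split the output into the "thinking" paragraph and the final answer,
# assuming the reasoning is wrapped in <think>...</think> markers.
thinking = re.findall(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print("reasoning:", thinking[0].strip() if thinking else "(none)")
print("answer:", answer)
```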
@obrhoff Yeah, the smaller distill models you can run locally are great, too; it's just important to understand that you won't get the proper deepseek-r1 everyone is hyped about.
Which model to pick for running locally depends on: what you can fit in memory, whether you run on CPU or GPU, whether you want to optimize for latency (e.g. for interactive use cases), and what kind of output quality you're looking for.
Just try a bunch and see what happens 😛
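e.g. a quick-and-dirty latency comparison, a sketch assuming the ollama Python client and that you've already pulled the tags you want to compare (the list here is just a hypothetical shortlist):

```python
import time

import ollama  # assumes the ollama Python client and a local ollama server running

# Hypothetical shortlist of locally pulled tags; swap in whatever fits your memory.
candidates = ["deepseek-r1:7b", "deepseek-r1:14b"]
prompt = "Write a Python function that merges two sorted lists."

for tag in candidates:
    start = time.monotonic()
    response = ollama.chat(
        model=tag,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.monotonic() - start
    print(f"{tag}: {elapsed:.1f}s, {len(response['message']['content'])} chars")
```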