@obrhoff It's all open research:
https://arxiv.org/search/cs?searchtype=author&query=DeepSeek-AI
For details on DeepSeek-R1 and the Qwen / Llama distilled models, see
https://arxiv.org/pdf/2501.12948
For the distilled model benchmarks, see Table 5.
The distilled models use the Qwen / Llama architectures, so they differ architecturally from DeepSeek's main contribution.