Conversation

Blaise Pabón - controlpl4n3 (blaise@hachyderm.io)'s status on Wednesday, 07-May-2025 01:48:30 JST
@chikim @ZBennoui I have 256 GB of RAM, but only 20 cores and no GPU. Does MoE still help in that case?
Chi Kim (chikim@mastodon.social)'s status on Wednesday, 07-May-2025 01:48:31 JST
@ZBennoui Yea, try them on a local machine. Especially if you have enough RAM, qwen3:30b is really fast because of the MoE architecture.
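For anyone following along, here is a minimal sketch of what "try it on a local machine" can look like in Python, assuming an Ollama-style server on its default port (11434). Only the qwen3:30b model tag comes from the post above; the endpoint and payload shape are assumptions, not something stated in the thread.

```python
# Minimal sketch: query a locally served qwen3:30b (MoE: ~30B total parameters,
# ~3B active per token) through an Ollama-style HTTP API. The endpoint and
# payload shape are assumptions for illustration.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:30b",
        "prompt": "In two sentences, why can a mixture-of-experts model run fast on CPU?",
        "stream": False,  # ask for a single JSON response instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```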
Zach Bennoui (zbennoui@dragonscave.space)'s status on Wednesday, 07-May-2025 01:48:32 JST
@chikim I tried the really big model on the HF demo; too scared to make an account on the Qwen Chat site lol. Haven't tried the smaller local versions yet, but the big one is really quite good.
Chi Kim (chikim@mastodon.social)'s status on Wednesday, 07-May-2025 01:48:33 JST
Qwen3 was released right before LlamaCon tomorrow! lol 32K context length, tool calling, a way to turn reasoning on/off with /think and /no_think in the prompt, and support for 119 languages. 6 dense models (0.6b, 1.7b, 4b, 8b, 14b, 32b) and 2 MoE models (30b-a3b, 235b-a22b). #LLM #AI #ML https://qwenlm.github.io/blog/qwen3/
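A minimal sketch of the /think and /no_think soft switch mentioned in this post, again assuming a local Ollama-style server and the qwen3:30b tag from the reply further up; the ask() helper and the payload shape are hypothetical, added only for illustration.

```python
# Sketch of the /think vs. /no_think toggle described in the post: the switch
# is appended to the user prompt to turn the model's reasoning mode on or off.
# The server URL, payload shape, and ask() helper are assumptions.
import requests

def ask(prompt: str, thinking: bool = True) -> str:
    switch = "/think" if thinking else "/no_think"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen3:30b", "prompt": f"{prompt} {switch}", "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Reasoning on: the model works through its chain of thought before answering.
print(ask("Which is larger, 9.11 or 9.9?", thinking=True))
# Reasoning off: a direct answer, which is usually faster.
print(ask("Which is larger, 9.11 or 9.9?", thinking=False))
```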