GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

    Blaise Pabón - controlpl4n3 (blaise@hachyderm.io)'s status on Wednesday, 07-May-2025 01:48:30 JST
    in reply to
    • Zach Bennoui
    • Chi Kim

    @chikim @ZBennoui
    I have 256 GB of RAM, but only 20 cores and no GPU; does MoE still help in that case?

    In conversation about 8 months ago from gnusocial.jp
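    The thread never spells out why the answer here is usually "yes", so here is a rough, hedged sketch of the standard reasoning: CPU-only token generation tends to be memory-bandwidth bound, and a MoE model keeps all ~30B parameters resident in RAM (which 256 GB covers comfortably) while reading only the ~3B activated parameters per token. The bandwidth and quantization figures below are illustrative assumptions, not measurements.

      # Back-of-envelope estimate only, not a benchmark. Assumes CPU decoding is
      # memory-bandwidth bound, ~Q4 quantization (~0.55 bytes/parameter), and an
      # assumed sustained memory bandwidth; adjust the numbers for your machine.

      def est_tokens_per_sec(active_params_billions: float,
                             bytes_per_param: float = 0.55,
                             mem_bandwidth_gb_s: float = 80.0) -> float:
          bytes_per_token = active_params_billions * 1e9 * bytes_per_param
          return mem_bandwidth_gb_s * 1e9 / bytes_per_token

      # Dense 32B model: every weight is read for every generated token.
      print(f"dense 32b   : ~{est_tokens_per_sec(32):.1f} tok/s")
      # Qwen3-30B-A3B: ~30B weights sit in RAM, but only ~3B are read per token.
      print(f"moe 30b-a3b : ~{est_tokens_per_sec(3):.1f} tok/s")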
      Chi Kim (chikim@mastodon.social)'s status on Wednesday, 07-May-2025 01:48:31 JST
      in reply to
      • Zach Bennoui

      @ZBennoui Yeah, try them on a local machine. Especially if you have enough RAM, qwen3:30b is really fast because of the MoE architecture.

      In conversation about 8 months ago
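      The "qwen3:30b" tag above looks like an Ollama model tag; assuming that runtime and its Python client, a minimal way to try the 30B MoE model locally could look like the sketch below. The tag and the client call are assumptions based on that naming, not something stated in the thread.

        # Minimal sketch, assuming the Ollama runtime and its Python client
        # (pip install ollama) and that the model has been pulled first:
        #   ollama pull qwen3:30b
        import ollama

        response = ollama.chat(
            model="qwen3:30b",  # tag taken from the post above
            messages=[{"role": "user",
                       "content": "Explain mixture-of-experts in two sentences."}],
        )
        print(response["message"]["content"])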
      Zach Bennoui (zbennoui@dragonscave.space)'s status on Wednesday, 07-May-2025 01:48:32 JST
      in reply to
      • Chi Kim

      @chikim I tried the really big model on the HF demo, too scared to make an account on the QWEN Chat site lol. Haven't tried the smaller local versions yet, but the big one is really quite good.

      In conversation about 8 months ago
      Chi Kim (chikim@mastodon.social)'s status on Wednesday, 07-May-2025 01:48:33 JST

      Qwen3 is released right before LlamaCon tomorrow! lol 32K context length, tool calling, a way to turn on/off reasoning with /think /no_think in prompt, 119 languages support. 6 dense models (0.6b, 1.7b, 4b, 8b, 14b, 32b) and 2 MoE models (30b-a3b, 235b-a22b). #LLM #AI #ML https://qwenlm.github.io/blog/qwen3/

      In conversation about 8 months ago

      Attachments

      1. Qwen3: Think Deeper, Act Faster
        from Qwen Team
        QWEN CHAT GitHub Hugging Face ModelScope Kaggle DEMO DISCORD Introduction Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B with only a tenth of the activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.
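      The announcement above mentions toggling reasoning with /think and /no_think in the prompt. Below is a minimal sketch of that per-turn switch, assuming the same Ollama-served model as in the earlier example (the model tag and client are assumptions); the switch is simply appended to the user turn.

        # Sketch of the per-turn /think and /no_think switches, assuming an
        # Ollama-served Qwen3 model; append the switch to the user message.
        import ollama

        def ask(prompt: str, thinking: bool) -> str:
            switch = " /think" if thinking else " /no_think"
            reply = ollama.chat(
                model="qwen3:30b",  # same assumed tag as in the earlier sketch
                messages=[{"role": "user", "content": prompt + switch}],
            )
            return reply["message"]["content"]

        print(ask("Is 1009 prime?", thinking=True))   # reasoning trace typically included
        print(ask("Is 1009 prime?", thinking=False))  # direct answer only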
