Wanderer über dem Nebelmeer (wandereruber@poa.st)'s status on Thursday, 11-Jun-2026 11:42:55 JST

update:
Running beellama (CUDA) on the same config is faster than llama-cpp Vulkan, which is already faster than vanilla llama-cpp CUDA.

I can't use TurboQuant because it's slower. It needs cpu-moe = true and apparently my cpu is NOT moe. It costs me ~15% t/s.

I have not had ANY success with the dflash drafting. It's slow, and the logs show a lot of rejections, which may be why.

The absolute kicker, and the reason I will keep using it:
a 3x increase in prompt processing speed, on top of the inference speed increase. I have no idea what they did to achieve this.

In conversation about a month ago from gnusocial.jp
