i have @actuallybot running on a mini PC using one of the 8b Llama models. yeah, it takes a few seconds to generate the response, but when latency isn't an issue, small models can perfectly run on a CPU.
uh-oh!! so this is something i heard Gary Marcus talking about on a podcast. the real recent progress in "AI" has been on the deterministic computing components **around** the LLMs, not the models themselves. building a good harness (think Claude Code) unlocks possibilities.