UX idea for local LLMs:
Speed and responsiveness are highly desirable when chatting with LLMs, but edge devices don't have that kind of computing horsepower at their disposal. So why not borrow the tactics humans use in normal conversation: linguistic fillers, signals that we're formulating our thoughts (like the … typing animation in chat), even asking clarifying questions? These tactics cost very little compute and can run while the real answer is generated in the background.
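
A minimal sketch of the idea, assuming an async chat loop: `slow_llm_answer` is a hypothetical stand-in for the expensive local-model call, and a cheap filler plus a typing indicator are shown instantly while it runs in the background.

```python
import asyncio
import random

FILLERS = [
    "Hmm, let me think about that...",
    "Good question...",
    "One moment...",
]

async def slow_llm_answer(prompt: str) -> str:
    # Stand-in for the expensive local-LLM call; assume it takes seconds.
    await asyncio.sleep(3.0)
    return f"(real answer to: {prompt!r})"

async def respond(prompt: str) -> None:
    # Kick off the real generation in the background immediately.
    answer_task = asyncio.create_task(slow_llm_answer(prompt))

    # Emit a near-free filler right away so the user sees an instant
    # reaction, then keep a "typing" signal alive until the answer lands.
    print(random.choice(FILLERS))
    while not answer_task.done():
        print("...", flush=True)
        await asyncio.sleep(0.5)

    print(await answer_task)

if __name__ == "__main__":
    asyncio.run(respond("Why is the sky blue?"))
```

The same pattern could pick a clarifying question instead of a filler when the prompt is ambiguous, buying even more background time.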