Conversation

Notices

Alfred M. Szmidt (amszmidt@mastodon.social)'s status on Wednesday, 09-Apr-2025 21:05:48 JST

Since LLMs are Markov chains .. then LLMs aren't AI.

In conversation about 2 months ago from mastodon.social permalink
- Alfred M. Szmidt (amszmidt@mastodon.social)'s status on Thursday, 10-Apr-2025 04:16:09 JST
  in reply to
  - Karsten Johansson
  @ksaj I wouldn’t even say it is a complex one .. big, yea. https://arxiv.org/abs/2410.02724
  In conversation about 2 months ago permalink
  Attachments
  1. Large Language Models as Markov Chains
    
    Large language models (LLMs) have proven to be remarkably efficient, both across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the origins of their impressive performance remains elusive. In this paper, we approach this challenging task by drawing an equivalence between generic autoregressive language models with vocabulary of size $T$ and context window of size $K$ and Markov chains defined on a finite state space of size $\mathcal{O}(T^K)$. We derive several surprising findings related to the existence of a stationary distribution of Markov chains that capture the inference power of LLMs, their speed of convergence to it, and the influence of the temperature on the latter. We then prove pre-training and in-context generalization bounds and show how the drawn equivalence allows us to enrich their interpretation. Finally, we illustrate our theoretical guarantees with experiments on several recent LLMs to highlight how they capture the behavior observed in practice.
- Karsten Johansson (ksaj@infosec.exchange)'s status on Thursday, 10-Apr-2025 04:16:10 JST
  in reply to
  
  @amszmidt If you try to explain LLMs in a Markov chain context, an LLM is a very complex Markov chain where the "states" are words or tokens, and the "transitions" are the probabilities of predicting the next word based on the current state and context.
  Note how many qualifiers take it far beyond a Markov chain. The similarity is only centimeters deep and kilometers wide.
  One could argue that a person editing a document is doing exactly this.
  
  In conversation about 2 months ago permalink
- Alfred M. Szmidt (amszmidt@mastodon.social)'s status on Thursday, 10-Apr-2025 04:37:11 JST
  in reply to
  - Karsten Johansson
  @ksaj LLMs have been proven to be equivalent to Markov chains, only of very much larger size (too large to simulate within the memory limits we have today).
  
  In conversation about 2 months ago permalink
- Karsten Johansson (ksaj@infosec.exchange)'s status on Thursday, 10-Apr-2025 04:37:12 JST
  in reply to
  
  @amszmidt Even if you make a small model (kinda goes against the name, but you certainly can do it), it is not as simple as just Markov chains.
  Just because it has the surface appearance doesn't mean it's the same thing. A duck has legs. So do humans. Therefore humans fly?
  
  In conversation about 2 months ago permalink
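
For readers following the thread, here is a minimal Python sketch of the kind of equivalence the linked abstract describes: an autoregressive model with vocabulary size T and context window K, viewed as a Markov chain whose states are token sequences. It is an illustration under stated assumptions, not the paper's construction. The state set is assumed to be all non-empty sequences of length at most K, and `toy_next_token_probs` is a hypothetical stand-in for a real model.

```python
from itertools import product
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def count_states(T: int, K: int) -> int:
    """Count non-empty token sequences of length at most K (grows like T**K)."""
    return sum(T ** k for k in range(1, K + 1))


def toy_next_token_probs(context: State, T: int) -> List[float]:
    """Hypothetical stand-in for a model: any function of the context would do."""
    weights = [1 + ((sum(context) + tok) % T) for tok in range(T)]
    total = sum(weights)
    return [w / total for w in weights]


def build_chain(T: int, K: int) -> Tuple[List[State], List[List[float]]]:
    """Build the transition matrix over all contexts of length 1..K."""
    states: List[State] = [
        s for k in range(1, K + 1) for s in product(range(T), repeat=k)
    ]
    index: Dict[State, int] = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for tok, p in enumerate(toy_next_token_probs(s, T)):
            # Append the sampled token, then keep only the last K tokens,
            # which mimics a sliding context window.
            nxt = (s + (tok,))[-K:]
            P[index[s]][index[nxt]] += p
    return states, P


states, P = build_chain(T=2, K=2)
print(len(states), count_states(2, 2))
```

The next state depends only on the current context, which is the Markov property; the cost is the size of the state space. With illustrative values T = 50,000 and K = 2, `count_states` already gives 2,500,050,000 states, so an explicit transition matrix is out of reach long before K approaches a realistic context length.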
