GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. Alfred M. Szmidt (amszmidt@mastodon.social)'s status on Wednesday, 09-Apr-2025 21:05:48 JST

    Since LLMs are Markov chains, LLMs aren't AI.

    In conversation about 2 months ago from mastodon.social
    • Alfred M. Szmidt (amszmidt@mastodon.social)'s status on Thursday, 10-Apr-2025 04:16:09 JST
      in reply to
      • Karsten Johansson

      @ksaj I wouldn’t even say it is a complex one. Big, yeah. https://arxiv.org/abs/2410.02724

      In conversation about 2 months ago

      Attachments

      1. Large Language Models as Markov Chains
        Large language models (LLMs) have proven to be remarkably efficient, both across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the origins of their impressive performance remains elusive. In this paper, we approach this challenging task by drawing an equivalence between generic autoregressive language models with vocabulary of size $T$ and context window of size $K$ and Markov chains defined on a finite state space of size $\mathcal{O}(T^K)$. We derive several surprising findings related to the existence of a stationary distribution of Markov chains that capture the inference power of LLMs, their speed of convergence to it, and the influence of the temperature on the latter. We then prove pre-training and in-context generalization bounds and show how the drawn equivalence allows us to enrich their interpretation. Finally, we illustrate our theoretical guarantees with experiments on several recent LLMs to highlight how they capture the behavior observed in practice.
    • Karsten Johansson (ksaj@infosec.exchange)'s status on Thursday, 10-Apr-2025 04:16:10 JST
      in reply to
      • Alfred M. Szmidt

      @amszmidt If you try to explain LLMs in a Markov chain context, an LLM is a very complex Markov chain where the "states" are words or tokens, and the "transitions" are the probabilities of predicting the next word based on the current state and context.

      Note how many qualifiers take it far beyond a Markov chain. The similarity is kilometres wide but only centimetres deep.

      One could argue that a person editing a document is doing exactly this.

      In conversation about 2 months ago
    • Alfred M. Szmidt (amszmidt@mastodon.social)'s status on Thursday, 10-Apr-2025 04:37:11 JST
      in reply to
      • Karsten Johansson

      @ksaj LLMs have been proven to be equivalent to Markov chains, only of a much larger size (one that cannot be simulated within the memory limits we have today).

      In conversation about 2 months ago
    • Karsten Johansson (ksaj@infosec.exchange)'s status on Thursday, 10-Apr-2025 04:37:12 JST
      in reply to
      • Alfred M. Szmidt

      @amszmidt Even if you make a small model (which kind of goes against the name, but you certainly can do it), it is not as simple as just Markov chains.

      Just because it has the surface appearance doesn't mean it's the same thing. A duck has legs. So do humans. Therefore humans fly?

      In conversation about 2 months ago
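For readers who want the attached paper's central construction in concrete form, here is a minimal Python sketch of how an autoregressive model with vocabulary size T and context window K can be read as a Markov chain. It is a toy under stated assumptions, not the paper's code: `toy_logits` is a hypothetical stand-in for a real model that merely favours repeating its last token, T = K = 2 are arbitrary, and the states are only full windows of exactly K tokens, a simplification of the O(T^K) state space in the abstract. Each state moves to one of T successors by dropping its oldest token and appending the sampled one, and power iteration from a single starting state approximates the stationary distribution.

```python
import itertools
import math

# Hypothetical toy sizes: vocabulary of T tokens, context window of K tokens.
T, K = 2, 2


def toy_logits(context):
    # Hypothetical stand-in for an LLM's next-token scores:
    # it simply favours repeating the most recent token.
    return [1.0 if tok == context[-1] else 0.0 for tok in range(T)]


def softmax(logits, temperature):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(temperature):
    # Each state is a full window of K tokens, so there are T**K states.
    states = list(itertools.product(range(T), repeat=K))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for tok, p in enumerate(softmax(toy_logits(s), temperature)):
            nxt = s[1:] + (tok,)  # drop the oldest token, append the new one
            P[index[s]][index[nxt]] += p
    return states, P


def stationary(P, tol=1e-12, max_steps=100_000):
    # Power iteration from a point mass on state 0: pi <- pi P until stable.
    # Returns the final distribution and the number of steps taken.
    n = len(P)
    pi = [1.0] + [0.0] * (n - 1)
    for step in range(1, max_steps + 1):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new, step
        pi = new
    return pi, max_steps


for temp in (0.5, 1.0, 2.0):
    states, P = transition_matrix(temp)
    pi, steps = stationary(P)
    print(f"temperature={temp}: {steps} steps, pi={[round(x, 4) for x in pi]}")
```

In this particular toy chain, lowering the temperature strengthens the repeat-last-token bias, so probability mass shifts toward the windows (0, 0) and (1, 1) and power iteration needs more steps to settle; that is a property of this example, not a general result.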

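The point about memory limits can be made concrete with back-of-the-envelope arithmetic: a chain whose states are full K-token windows over a T-token vocabulary has T^K states, that is, about 10^(K·log10 T). The sketch below works in logarithms so the huge numbers are never built. The (T, K) pairs are illustrative assumptions of a plausible order of magnitude, not figures for any specific model, and the sketch counts only the memory for one float32 per state, ignoring the transition matrix entirely.

```python
import math


def log10_states(vocab_size, context_len):
    # log10(T**K) = K * log10(T); avoids building the huge integer.
    return context_len * math.log10(vocab_size)


def log10_bytes_for_distribution(vocab_size, context_len, bytes_per_entry=4):
    # Memory to store just one float32 probability per state
    # (the stationary distribution alone, not the transition matrix).
    return log10_states(vocab_size, context_len) + math.log10(bytes_per_entry)


# Illustrative (assumed) sizes: a tiny toy, then larger vocabularies/contexts.
for vocab, ctx in [(2, 2), (50_000, 8), (50_000, 2_048)]:
    print(f"T={vocab:>6}, K={ctx:>5}: about 10^{log10_states(vocab, ctx):.1f} states, "
          f"10^{log10_bytes_for_distribution(vocab, ctx):.1f} bytes at one float32 each")
```

Even before storing a single transition, the exponent for the largest assumed sizes runs into the thousands.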

GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.