Conversation

Notices

  1. iced depresso (icedquinn@blob.cat)'s status on Saturday, 22-Jun-2024 06:32:50 JST
    in reply to Tero Keski-Valkama
    @tero It might just be working because most networks are too dense anyway. Stanford's butterfly matrices capture more or less every standard transformation matrix (sketched below). Then there's the work on sparsity: Numenta's complementary sparsity, or TopKAST. GMDH grew smaller networks incrementally. Numenta's cortical models used to note how they could subset engrams and they still mostly worked.
    In conversation about a year ago from blob.cat
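
    For reference on the butterfly remark above: the Stanford line of work it alludes to (Dao et al.'s butterfly factorizations) writes a dense N-by-N map as a product of log2(N) sparse, FFT-style stages, so the whole map takes roughly 2*N*log2(N) parameters instead of N^2, and the DFT, Hadamard transform and permutations all fit this form exactly. A minimal NumPy sketch, with random 2x2 mixing blocks standing in for learned factors:

    import numpy as np

    rng = np.random.default_rng(0)

    def butterfly_factor(n, block):
        """One butterfly stage: inside every block of size `block`, index i is mixed
        with its partner i + block//2 through a random 2x2 matrix (4 nonzeros per pair)."""
        F = np.zeros((n, n))
        half = block // 2
        for start in range(0, n, block):
            for i in range(start, start + half):
                j = i + half
                a, b, c, d = rng.standard_normal(4)
                F[i, i], F[i, j] = a, b
                F[j, i], F[j, j] = c, d
        return F

    def butterfly_matrix(n):
        """Product of log2(n) butterfly stages: a dense-looking n x n linear map
        parameterized by about 2*n*log2(n) numbers instead of n^2."""
        M = np.eye(n)
        block = 2
        while block <= n:
            M = butterfly_factor(n, block) @ M
            block *= 2
        return M

    n = 256
    M = butterfly_matrix(n)
    print("fraction of nonzero entries in the product:", np.count_nonzero(M) / M.size)
    print(f"dense params: {n*n}, butterfly params: {int(2 * n * np.log2(n))}")

    After log2(n) stages every output depends on every input, so the product behaves like a dense matrix while the factorization itself stays very sparse.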
  2. Tero Keski-Valkama (tero@rukii.net)'s status on Saturday, 22-Jun-2024 06:32:52 JST

      I read this article and oh my god, are people doing PCA for reducing the dimensions of #LLM embeddings? I don't have any more polite way of saying it; that is pure stupidity.

      No, these embeddings do not have principal dimensions! They span practically all the dimensions. Your dataset will just create an illusion that some dimensions are correlated when in reality they aren't.
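
      To make the "illusion" point concrete, here is a hedged sketch on synthetic isotropic data (not real LLM embeddings): the true covariance is the identity, so by construction there are no principal directions at all, yet PCA on a modest sample still appears to find dominant components.

      import numpy as np

      rng = np.random.default_rng(0)
      dim = 1024

      # True covariance is the identity: no direction is privileged.
      for n_samples in (200, 2_000, 20_000):
          x = rng.standard_normal((n_samples, dim))
          eig = np.sort(np.linalg.eigvalsh(np.cov(x, rowvar=False)))[::-1]
          top10 = eig[:10].sum() / eig.sum()
          print(f"n={n_samples:6d}  top eigenvalue={eig[0]:5.2f}  "
                f"'variance explained' by top 10 PCs: {top10:.1%}")
      # With few samples relative to the dimension, the top sample eigenvalues look
      # dominant; the apparent structure comes from the dataset, not from the space.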

      Using PCA just shows people don't understand what these embeddings are.

      Furthermore, people are using way too long embeddings. Using embeddings of over 1k dimensions will make all distances approximately equal, and rounding errors will start to dominate.
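
      The distance-concentration effect behind that claim is easy to reproduce on synthetic unit vectors (an idealized isotropic case, not real embeddings):

      import numpy as np

      rng = np.random.default_rng(0)

      # As dimensionality grows, the nearest and farthest neighbours end up almost
      # equally far away, so small numerical errors can reorder the ranking.
      for dim in (16, 128, 1_024, 8_192):
          x = rng.standard_normal((500, dim))
          x /= np.linalg.norm(x, axis=1, keepdims=True)    # unit vectors, cosine-style search
          d2 = np.clip(2.0 - 2.0 * (x @ x.T), 0.0, None)   # squared Euclidean distances
          d = np.sqrt(d2[np.triu_indices_from(d2, k=1)])
          spread = (d.max() - d.min()) / d.mean()
          print(f"dim={dim:5d}  (max - min) / mean pairwise distance = {spread:.3f}")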

      They compare their method with learning-to-hash methods and all kinds of misinformed methods which probably also use overly long embedding vectors.

      Separately they tested 8-bit quantization of their thousand-dimensional embedding vectors and found it performs better. I could have told them this beforehand; it's roughly equivalent to dimensionality reduction with a random projection matrix. And this works, better than PCA, because LLM embeddings are holographic. Reducing the dimensionality with a random projection is analogous to decreasing the resolution which is analogous to quantization.
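
      A sketch of that resolution analogy on synthetic unit vectors (assumed isotropic data, not real LLM embeddings): compare exact top-k cosine retrieval against a Johnson-Lindenstrauss-style random projection to 256 dimensions, and against int8 quantization of the full 1024-dimensional vectors. Both coarsen the representation while approximately preserving inner products.

      import numpy as np

      rng = np.random.default_rng(1)
      n_docs, dim, k = 5_000, 1_024, 10

      def unit(v):
          return v / np.linalg.norm(v, axis=1, keepdims=True)

      docs = unit(rng.standard_normal((n_docs, dim)).astype(np.float32))
      queries = unit(rng.standard_normal((100, dim)).astype(np.float32))

      def topk(q, d):
          """Indices of the k documents with the largest inner products per query."""
          return np.argsort(-(q @ d.T), axis=1)[:, :k]

      exact = topk(queries, docs)

      # (a) Random projection down to 256 dimensions (Johnson-Lindenstrauss style).
      P = rng.standard_normal((dim, 256)).astype(np.float32) / np.sqrt(256)
      proj = topk(queries @ P, docs @ P)

      # (b) Symmetric int8 quantization of the original 1024-dim vectors.
      scale = 127.0 / np.abs(docs).max()
      q8 = lambda x: np.clip(np.round(x * scale), -127, 127).astype(np.int8).astype(np.float32)
      quant = topk(q8(queries), q8(docs))

      def overlap(a, b):
          """Average fraction of the exact top-k recovered by the approximate search."""
          return np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)])

      print(f"top-{k} overlap after random projection to 256 dims: {overlap(exact, proj):.2f}")
      print(f"top-{k} overlap after int8 quantization at {dim} dims: {overlap(exact, quant):.2f}")

      On purely random data the exact ranking is close to arbitrary, so the projection looks worse here than it would on real, clustered embeddings; the contrast only shows that quantizing in place disturbs inner products far less than projecting away three quarters of the dimensions.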

      But it works better if you have a supervised training set that ranks results against queries.

      And in any case you don't want to vector-search match queries against documents, like everyone still keeps doing. You want to build oranges-to-oranges indices: generate example queries for each document and match query embeddings to example-query embeddings. Oranges to oranges.
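
      A sketch of that oranges-to-oranges index. Here embed and generate_example_queries are hypothetical placeholders (a real system would use an embedding model and an LLM prompt), so the retrieved results are arbitrary; the point is the structure: the index stores embeddings of generated example queries, each mapped back to its source document.

      import numpy as np

      def embed(texts):
          """Placeholder embedding model (assumption; swap in a real encoder)."""
          vecs = np.stack([
              np.random.default_rng(abs(hash(t)) % 2**32).standard_normal(384)
              for t in texts
          ])
          return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

      def generate_example_queries(doc, n=3):
          """Placeholder for an LLM that writes n questions this document answers."""
          return [f"example question {i} about: {doc[:40]}" for i in range(n)]

      def build_index(docs):
          """Index embeddings of *example queries*, each pointing back to its document."""
          texts, owners = [], []
          for doc_id, doc in enumerate(docs):
              for q in generate_example_queries(doc):
                  texts.append(q)
                  owners.append(doc_id)
          return embed(texts), np.array(owners)

      def search(query, index_vecs, owners, docs, k=2):
          """Match the query embedding against example-query embeddings (like against
          like), then return the documents owning the best-matching example queries."""
          sims = embed([query])[0] @ index_vecs.T
          hits, seen = [], set()
          for i in np.argsort(-sims):
              if owners[i] not in seen:
                  seen.add(owners[i])
                  hits.append(docs[owners[i]])
              if len(hits) == k:
                  break
          return hits

      docs = [
          "Notes on PCA applied to LLM embeddings.",
          "Notes on int8 quantization of embedding vectors.",
          "Notes on random projections and the JL lemma.",
      ]
      index_vecs, owners = build_index(docs)
      print(search("does int8 quantization hurt retrieval quality?", index_vecs, owners, docs))

      With a real encoder and query generator plugged in, the index holds several embeddings per document, one per generated query, so a document can be reached through several phrasings of the same question.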

      https://arxiv.org/abs/2205.11498?ref=cohere-ai.ghost.io

      In conversation about a year ago

