in terms of "finding things in large texts", for example "find a page in this pdf that mentions both shutdown mode and reg18", are there interesting alternatives to all that llm stuff beyond regex search? are there natural language processing systems that are precise/reliable and understandable? i imagine something like a fuzzy parser with stemming and some sort of ontologies, synonyms and logical inference
lucie lukas "minute" hartmann (mntmn@mastodon.social)'s status on Tuesday, 02-Dec-2025 23:09:55 JST
i don't like llms because they consume a lot of power and are tied to all the ai greed hype, they have to be trained in strange ways, their representations are not introspectable, they make tons of errors/are not reliable at all etc. i'd rather have a sharp, more mechanistic tool that just clearly says "error" when it can't do the job. grep is such a tool. it would be nice to have a grep that can clean up and normalize messy human language a bit
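A minimal sketch of what such a "normalizing grep" could look like: lowercase the input, apply crude suffix-stripping, and expand each search term through a small synonym table before matching. The stemmer and the synonym/ontology table here are toy assumptions for illustration; a real tool would use a proper stemmer (e.g. Porter/Snowball) and a real lexical resource such as WordNet.

```python
import re

# Toy synonym table (assumption; a real ontology might come from WordNet
# or a domain-specific glossary).
SYNONYMS = {
    "shutdown": {"shutdown", "halt", "poweroff"},
    "reg18": {"reg18", "register18", "register 18"},
}

def stem(word):
    """Very crude suffix-stripping stemmer (a stand-in, not Porter)."""
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def normalize(text):
    """Lowercase, tokenize, and stem a line into a set of word stems."""
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}

def ngrep(terms, lines):
    """Return lines that contain (a synonym of) every search term."""
    expanded = []
    for t in terms:
        group = SYNONYMS.get(t, {t})
        # split multi-word synonyms like "register 18" into stems
        expanded.append({stem(w) for phrase in group for w in phrase.split()})
    hits = []
    for line in lines:
        norm = normalize(line)
        if all(norm & group for group in expanded):
            hits.append(line)
    return hits

lines = [
    "The device halts when register 18 is set.",
    "Shutdown mode ignores reg18 entirely.",
    "Bananas are yellow.",
]
print(ngrep(["shutdown", "reg18"], lines))
```

Both of the first two lines match, even though neither contains the literal strings "shutdown" and "reg18" together; the banana line does not. Like grep, the behavior is fully inspectable: every match can be traced back to a specific table entry.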
Filip M. Nowak (fmn@mastodon.social)'s status on Tuesday, 02-Dec-2025 23:11:57 JST
@mntmn "are there natural language processing systems that are precise/reliable and understandable?" - let me wear my noam chomsky hat for a second: there are no such systems and never will be. natural language is ever-changing and ambiguous, and the parties involved often lack the common context required for precise communication. this is why people talking, or even reading, have so many back-and-forths.
Wolf480pl (wolf480pl@mstdn.io)'s status on Tuesday, 02-Dec-2025 23:52:10 JST
@jannem @mntmn
AFAIK the way these LLM tools work is they have an embedding of words into a vector space: they index text by converting every word in every document to a vector and storing it in a database together with the ID of the document it came from. When you search, they turn each query word into a vector and look up its K nearest neighbors in the vector space. Then they feed the documents they found to an LLM.
What if you skipped the last step?
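The retrieval step without the final LLM can be sketched in a few lines. The tiny hand-made embedding table below is a placeholder assumption (a real system would use a trained embedding model and an approximate nearest-neighbor index), but the ranking mechanics are the same: embed the query, embed each document, rank by cosine similarity.

```python
import numpy as np

# Toy word vectors (assumption; real systems use learned embeddings).
EMB = {
    "shutdown": np.array([1.0, 0.1, 0.0]),
    "halt":     np.array([0.9, 0.2, 0.1]),   # deliberately near "shutdown"
    "register": np.array([0.0, 1.0, 0.2]),
    "reg18":    np.array([0.1, 0.9, 0.3]),   # deliberately near "register"
    "banana":   np.array([0.0, 0.0, 1.0]),
}

def embed(text):
    """Mean of the known word vectors; None if no word is known."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    return np.mean(vecs, axis=0) if vecs else None

def search(query, docs, k=2):
    """Return the k documents nearest to the query by cosine similarity."""
    q = embed(query)
    if q is None:
        # fail loudly, grep-style, instead of hallucinating an answer
        raise ValueError("error: no known words in query")
    scored = []
    for doc in docs:
        d = embed(doc)
        if d is None:
            continue
        sim = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((sim, doc))
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

docs = ["halt the register", "banana", "shutdown reg18 procedure"]
print(search("shutdown register", docs, k=2))
```

With k=2 the two relevant documents surface and "banana" is excluded; the results are the raw documents themselves, with a numeric score you can inspect, rather than an LLM's paraphrase of them.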
lucie lukas "minute" hartmann (mntmn@mastodon.social)'s status on Tuesday, 02-Dec-2025 23:52:11 JST
@jannem i somehow find that hard to believe
Janne Moren (jannem@fosstodon.org)'s status on Tuesday, 02-Dec-2025 23:52:11 JST
@mntmn
I mean, there have been many attempts, especially for constrained applications such as a corporate document store. As far as I know, none of those systems were ever a success.
Janne Moren (jannem@fosstodon.org)'s status on Tuesday, 02-Dec-2025 23:52:12 JST
@mntmn
No, not really. And that's a reason why small LLMs as language processors (not chatbots) are exciting.