in terms of "finding things in large texts", for example "find a page in this pdf that mentions both shutdown mode and reg18", are there interesting alternatives to all that llm stuff beyond regex search? are there natural language processing systems that are precise/reliable and understandable? i imagine something like a fuzzy parser with stemming and some sort of ontologies, synonyms and logical inference
lucie lukas "minute" hartmann (mntmn@mastodon.social)'s status on Tuesday, 02-Dec-2025 23:09:55 JST
i don't like llms because they consume a lot of power and are tied to all the ai greed hype, they have to be trained in strange ways, their representations are not introspectable, they make tons of errors/are not reliable at all etc. i'd rather have a sharp, more mechanistic tool that just clearly says "error" when it can't do the job. grep is such a tool. it would be nice to have a grep that can clean up and normalize messy human language a bit
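A minimal sketch of what such a "normalizing grep" could look like: lowercase the input, apply crude suffix-stripping, and expand each search term through a small synonym table before matching. The stemmer and the synonym/ontology table here are toy assumptions for illustration; a real tool would use a proper stemmer (e.g. Porter/Snowball) and a real lexical resource such as WordNet.

```python
import re

# Toy synonym table (assumption; a real ontology might come from WordNet
# or a domain-specific glossary).
SYNONYMS = {
    "shutdown": {"shutdown", "halt", "poweroff"},
    "reg18": {"reg18", "register18", "register 18"},
}

def stem(word):
    """Very crude suffix-stripping stemmer (a stand-in, not Porter)."""
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def normalize(text):
    """Lowercase, tokenize, and stem a line into a set of word stems."""
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}

def ngrep(terms, lines):
    """Return lines that contain (a synonym of) every search term."""
    expanded = []
    for t in terms:
        group = SYNONYMS.get(t, {t})
        # split multi-word synonyms like "register 18" into stems
        expanded.append({stem(w) for phrase in group for w in phrase.split()})
    hits = []
    for line in lines:
        norm = normalize(line)
        if all(norm & group for group in expanded):
            hits.append(line)
    return hits

lines = [
    "The device halts when register 18 is set.",
    "Shutdown mode ignores reg18 entirely.",
    "Bananas are yellow.",
]
print(ngrep(["shutdown", "reg18"], lines))
```

Both of the first two lines match, even though neither contains the literal strings "shutdown" and "reg18" together; the banana line does not. Like grep, the behavior is fully inspectable: every match can be traced back to a specific table entry.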
Filip M. Nowak (fmn@mastodon.social)'s status on Tuesday, 02-Dec-2025 23:11:57 JST
@mntmn "are there natural language processing systems that are precise/reliable and understandable?" - let me wear my noam chomsky hat for a second: there are no such systems and never will be. natural language is ever-changing and ambiguous, and the parties involved often lack the common context required for precise communication. this is why people talking, or even reading, have so many back-and-forths.
Wolf480pl (wolf480pl@mstdn.io)'s status on Tuesday, 02-Dec-2025 23:52:10 JST
@jannem @mntmn
AFAIK the way these LLM tools work is they have an embedding of words into a vector space: they index text by converting every word in every document to a vector and storing it in a database together with the ID of the document it came from. When you search, they turn each query word into a vector and look up its K nearest neighbors in the vector space. Then they feed the documents they found to an LLM.
What if you skipped the last step?
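The retrieval step without the final LLM can be sketched in a few lines. The tiny hand-made embedding table below is a placeholder assumption (a real system would use a trained embedding model and an approximate nearest-neighbor index), but the ranking mechanics are the same: embed the query, embed each document, rank by cosine similarity.

```python
import numpy as np

# Toy word vectors (assumption; real systems use learned embeddings).
EMB = {
    "shutdown": np.array([1.0, 0.1, 0.0]),
    "halt":     np.array([0.9, 0.2, 0.1]),   # deliberately near "shutdown"
    "register": np.array([0.0, 1.0, 0.2]),
    "reg18":    np.array([0.1, 0.9, 0.3]),   # deliberately near "register"
    "banana":   np.array([0.0, 0.0, 1.0]),
}

def embed(text):
    """Mean of the known word vectors; None if no word is known."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    return np.mean(vecs, axis=0) if vecs else None

def search(query, docs, k=2):
    """Return the k documents nearest to the query by cosine similarity."""
    q = embed(query)
    if q is None:
        # fail loudly, grep-style, instead of hallucinating an answer
        raise ValueError("error: no known words in query")
    scored = []
    for doc in docs:
        d = embed(doc)
        if d is None:
            continue
        sim = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((sim, doc))
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

docs = ["halt the register", "banana", "shutdown reg18 procedure"]
print(search("shutdown register", docs, k=2))
```

With k=2 the two relevant documents surface and "banana" is excluded; the results are the raw documents themselves, with a numeric score you can inspect, rather than an LLM's paraphrase of them.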
lucie lukas "minute" hartmann (mntmn@mastodon.social)'s status on Tuesday, 02-Dec-2025 23:52:11 JST
@jannem i somehow find that hard to believe
Janne Moren (jannem@fosstodon.org)'s status on Tuesday, 02-Dec-2025 23:52:11 JST
@mntmn
I mean, there have been many attempts, especially for constrained applications such as a corporate document store. As far as I know, none of those systems were ever a success.
Janne Moren (jannem@fosstodon.org)'s status on Tuesday, 02-Dec-2025 23:52:12 JST
@mntmn
No, not really. And that's a reason why small LLMs as language processors (not chatbots) are exciting.