@freemo I'm not even saying that. Something simpler.
LLMs typically work on subword tokens (roughly syllable-like pieces), not whole words. That's good for LLMs, because they then learn patterns of conjugation and pluralization naturally from seeing those patterns used in their training data.
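Rough toy sketch of what I mean (a made-up mini vocabulary, not any real tokenizer): with subword pieces, "human" and "humans" share a common token, so the "+s" pattern is right there in the input.

```python
# Toy subword vocabulary (hypothetical, for illustration only).
subword_vocab = {"human": 0, "s": 1, "ization": 2}

def subword_tokenize(word: str) -> list[int]:
    """Greedy longest-match split of a word into known subword pieces."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in subword_vocab:
                pieces.append(subword_vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no piece matches {word[i:]!r}")
    return pieces

print(subword_tokenize("human"))   # [0]
print(subword_tokenize("humans"))  # [0, 1] -- shares the "human" token
```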
But what if you ran it on whole words, not subword pieces, and fed it a well-structured initial dataset (i.e. a dictionary) to seed its initial tokens? It would understand that "humans" and "human" are related words, and how.
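And a toy sketch of the word-level idea (the dictionary seed format here is made up): every surface form gets its own token, but the seed data records how forms relate, so "humans" is explicitly tagged as the plural of "human".

```python
# Hypothetical word-level vocabulary: one token per surface form.
word_vocab = {"human": 0, "humans": 1, "run": 2, "ran": 3}

# Dictionary-style seed (hypothetical format): surface form -> (lemma, relation).
dictionary_seed = {
    "humans": ("human", "plural"),
    "ran":    ("run", "past tense"),
}

def describe(word: str) -> str:
    """Report a word's token id plus its dictionary-seeded relation to its lemma."""
    token_id = word_vocab[word]
    lemma, relation = dictionary_seed.get(word, (word, "base form"))
    return f"token {token_id}: {word!r} ({relation} of {lemma!r})"

print(describe("human"))   # token 0: 'human' (base form of 'human')
print(describe("humans"))  # token 1: 'humans' (plural of 'human')
```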
1/2