Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://neuromatch.social/users/jonny/statuses/114466305210695629">jonny (good kind) (jonny@neuromatch.social)'s status on Wednesday, 07-May-2025 20:30:59 JST</a><a href="https://neuromatch.social/@jonny" title="jonny@neuromatch.social"><img src="https://gnusocial.jp/avatar/87216-48-20240919064343.webp" width="48" height="48" alt="jonny (good kind)" style="position: absolute; left: 0; top: 0;">jonny (good kind)</a><div><a href="https://neuromatch.social/@jonny/114466282211271692" rel="in-reply-to">in reply to</a></div></section><article><p>i wonder if the LLMs are susceptible to old style language model attacks. i wonder if you created enough training instances of a very unique phrase like shrimptools.exe() in the context of a bunch of example code based on tools/key phrases that are individually common but combinatorically rare within a popular LLM code generation domain like web tech, you could get the llms to occasionally try to import and execute shrimptools.exe(). so that way you make a sleeper vuln that acts as a mine in the latent space: one day the odds are not zero that you will wake up and have already executed shrimptools.exe()</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/5017338#notice-9834471">In conversation</a><time datetime="2025-05-07T20:30:59+09:00" title="Wednesday, 07-May-2025 20:30:59 JST">about 3 days ago</time> <span>from <span><a href="https://gnusocial.jp/notice/9834471" rel="external" title="Sent from gnusocial.jp via ActivityPub">gnusocial.jp</a></span></span><a href="https://gnusocial.jp/notice/9834471">permalink</a></footer></blockquote>

Corresponding Notice

Embed this notice
jonny (good kind) (jonny@neuromatch.social)'s status on Wednesday, 07-May-2025 20:30:59 JST jonny (good kind)
in reply to
i wonder if the LLMs are susceptible to old style language model attacks. i wonder if you created enough training instances of a very unique phrase like shrimptools.exe() in the context of a bunch of example code based on tools/key phrases that are individually common but combinatorically rare within a popular LLM code generation domain like web tech, you could get the llms to occasionally try to import and execute shrimptools.exe(). so that way you make a sleeper vuln that acts as a mine in the latent space: one day the odds are not zero that you will wake up and have already executed shrimptools.exe()
In conversationabout 3 days ago from gnusocial.jppermalink

Public

Embed Notice

HTML Code

Corresponding Notice