Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://neuromatch.social/users/jonny/statuses/116667122145727604">jonny (nonvenomous) (jonny@neuromatch.social)'s status on Sunday, 31-May-2026 13:45:12 JST</a><a href="https://neuromatch.social/@jonny" title="jonny@neuromatch.social"><img src="https://gnusocial.jp/avatar/87216-48-20240919064343.webp" width="48" height="48" alt="jonny (nonvenomous)" style="position: absolute; left: 0; top: 0;">jonny (nonvenomous)</a><div><a href="https://gnusocial.jp/notice/12680448" rel="in-reply-to">in reply to</a></div></section><article><p>To a person, the whole purpose of the test is for it to fail when it should. That's an elemental part of writing good tests: they must fail before the patch, or else they provide no protection. We want protection from failure, that is good for us. We need tests to protect us because we can't possibly evaluate all the other parts of a complex system when we try to fix one part of it.</p><p>LLM slot machines change what tests mean - of course we still want the code to work good, but if we're not evaluating the code or the tests, then what the slot machine turns them into is just a high score and the jackpot condition. 130 new tests added, that means it's good. They pass, that means I win.</p><p>The bugfix loop with LLMs defeats the purpose of automated tests and renders it no better than manual testing: you notice a bug, you yell at the LLM to fix it, you keep looking at the specific thing that's broken until it's fixed, good robot, ship it. The changes don't have meaningful tests, and nothing else does either, so the slot machine loop repeats, bug-&gt;fix-&gt;win. Very velocity. Rocket fuel even.</p></article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/6436931#notice-12680449">In conversation</a><time datetime="2026-05-31T13:45:12+09:00" title="Sunday, 31-May-2026 13:45:12 JST">about 6 hours ago</time> <span>from <span><a href="https://neuromatch.social/@jonny/116667122145727604" rel="external" title="Sent from neuromatch.social via ActivityPub">neuromatch.social</a></span></span><a href="https://neuromatch.social/@jonny/116667122145727604">permalink</a></footer></blockquote>

Corresponding Notice

jonny (nonvenomous) (jonny@neuromatch.social)'s status on Sunday, 31-May-2026 13:45:12 JST
in reply to
To a person, the whole purpose of the test is for it to fail when it should. That's an elemental part of writing good tests: they must fail before the patch, or else they provide no protection. We want protection from failure, that is good for us. We need tests to protect us because we can't possibly evaluate all the other parts of a complex system when we try to fix one part of it.
LLM slot machines change what tests mean - of course we still want the code to work good, but if we're not evaluating the code or the tests, then what the slot machine turns them into is just a high score and the jackpot condition. 130 new tests added, that means it's good. They pass, that means I win.
The bugfix loop with LLMs defeats the purpose of automated tests and renders it no better than manual testing: you notice a bug, you yell at the LLM to fix it, you keep looking at the specific thing that's broken until it's fixed, good robot, ship it. The changes don't have meaningful tests, and nothing else does either, so the slot machine loop repeats, bug->fix->win. Very velocity. Rocket fuel even.
In conversation · about 6 hours ago · from neuromatch.social · permalink
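
The point in the notice that a test "must fail before the patch" can be made concrete with a small sketch. Everything here is hypothetical and not from the post: the `median_buggy` / `median_fixed` pair stands in for code before and after a bugfix, and `guards_the_fix` is an illustrative helper, not part of any test framework. It only shows one possible way to check that a test would actually have caught the bug.

```python
# Hypothetical illustration: none of these names come from the original post.


def median_buggy(values):
    """Pre-patch version: forgets to sort, so unsorted input gives a wrong answer."""
    return values[len(values) // 2]


def median_fixed(values):
    """Post-patch version: sorts before picking the middle element."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def regression_test(median):
    """Exercises the reported bug (unsorted input)."""
    assert median([3, 1, 2]) == 2


def weak_test(median):
    """Passes on both versions, so it would not have caught the bug."""
    assert median([1, 2, 3]) == 2


def passes(test, implementation):
    """Run a test against an implementation; report success as a bool."""
    try:
        test(implementation)
    except AssertionError:
        return False
    return True


def guards_the_fix(test, before, after):
    """True only if the test fails before the patch and passes after it."""
    return (not passes(test, before)) and passes(test, after)


if __name__ == "__main__":
    print(guards_the_fix(regression_test, median_buggy, median_fixed))
    print(guards_the_fix(weak_test, median_buggy, median_fixed))
```

Under these assumptions, `regression_test` fails against `median_buggy` and passes against `median_fixed`, so it offers the protection the notice describes. `weak_test` passes against both, so adding it raises the test count without guarding anything.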
