the fact that some people find LLMs useful for writing code is not a credit to LLMs but an indictment of the average signal-to-noise ratio of code: it means that most code is confusing boilerplate -- boilerplate because a statistical model can only reliably reproduce patterns that recur many times across its training corpus, and confusing because otherwise-intelligent people capable of convincing a hiring manager that they are competent programmers find it easier to ask a statistical model for an incorrect prototype they must then debug than to write the code themselves. we all know from experience that most code is bad, but the fact that LLMs can write more-or-less working code at all indicates that code is much worse than we imagine, and that even what we consider 'good' code is, from a broader perspective, totally awful. (in my opinion, it is forced to be this way by how we design programming languages and libraries.)
> and confusing because otherwise-intelligent people capable of convincing a hiring manager that they are competent programmers find it easier to ask a statistical model to produce an incorrect prototype they must debug than write the code themselves.
@feld what part of it isn't true? that some professional programmers are trying to use LLMs, or that LLMs produce incorrect code? because I can verify both from experience.
> we're supposed to all be refactoring code to avoid duplication, using third party libraries to avoid duplication of effort, writing code that's dense enough that it can be understood and consulted without putting too much load on short-term memory but spread out enough that new maintainers can reason about it.
idk about you but 3rd party libraries are quite often a tech debt liability, as they're full of so much extra crap you don't need. If you can, you're better off just pulling the pieces you need directly into your own project and forgetting about the rest, unless it's something Really Important^TM like a TLS stack.
@feld maybe not yet. or maybe you're asking it to produce exceedingly trivial code. either way, you must be at least reading and checking it for bugs -- something that's a lot easier to do with code one has written oneself.
that said, you've completely missed the point of my post.
the fact that LLMs perform well enough on code generation that *anybody* wants to use them instead of coding means that we are doing coding in a fundamentally antihuman, gatekeepy way.
statistical models like LLMs work well in a narrow novelty range (since they are novelty-minimizing engines). that novelty range is well below the one we would want programming to live in.
we're supposed to all be refactoring code to avoid duplication, using third party libraries to avoid duplication of effort, writing code that's dense enough that it can be understood and consulted without putting too much load on short-term memory but spread out enough that new maintainers can reason about it. if we were doing that, LLMs couldn't write code from arbitrary prompts, because most lines of code would be so specific to their top-level requirements that the only patterns an LLM could learn would be generic -- it would do no better than a typeahead that checked the current token against a list of reserved words.
Code-monkey ecosystems are extremely complex, but they are complex in a trivial way that could be solved by better tooling. Better tooling would necessarily obviate the existing code in the organization, but such organizations IMO should not exist.
So, I think we're talking about different scales.
Like, working for a company, we're all trapped writing bad code. There's no choice. And a company would not be able to adopt humane tooling, because capitalization is inherently inhumane. If you're not willing to blur the boundary between non-technical end user and expert developer, then tooling improvements are always going to be incremental. Somebody who's working for a paycheck is justified in not caring about more than management's metrics. It drives me nuts for aesthetic reasons, but in a case like mine (writing code that shouldn't exist, for a task that should never be performed, to support an industry that should be abolished) it's probably better for the world if the code I write is broken.
My main concern is that, if using LLMs to code becomes widespread, it's going to make it even harder for new, convivial tools to gain traction. Already, the most interesting work in tooling is being done by individuals and very small groups, mostly in isolation, and the people who would benefit most from these tools are unable to use them because they never get the opportunity to hear about them, are discouraged from learning about them, and would be forbidden from using them for most of their waking hours. LLMs promise to multiply how often the most common ways of doing the most common tasks are performed with the most common third party libraries.
Generating code is very tempting, when you're stuck with a language and a set of libraries that is tedious to work with. But the generated code still takes up bytes, and still needs to be maintained; on top of that, you've burned a bunch of cycles in generating code that wouldn't have been burned writing it. It is better to structure the tools in such a way that your tasks are not tedious, or use tools that are a better fit for the task.

But it's easier to ignore the severity of these problems when code generators are used. This is not a problem specific to LLMs: IDEs that generate boilerplate for you, the layers of languages and template generators in autotools, and now even a lot of web development technology, all follow the general pattern of standardizing on a language that is unnecessarily bloated and awkward and then using code generators to hide the bloat rather than eliminating it. And it's much faster to write a code generator than to do the design & architectural work necessary to figure out how to make a tool that is a joy to use.

Generating code also solves the compatibility problem: for social (rather than technical) reasons, you're stuck doing certain things in certain ways, and code generators can hide that too (until it breaks). So, it's extremely tempting!
@enkiv2 @clacke I think the term “code-monkey ecosystem” is reductive and denies the complexity of “code-monkey work and languages”. These things are baroque because they are the work of humans working in corporate environments that for better or worse operate at scales that require that amount of redundancy and legacy.
Here’s a concrete example of an llm giving me joy from a couple of days ago. I need to test an app with a lot of barcodes for different SKUs, actions and logins.
The thing is, we already have constraint solvers and optimizing compilers.
The *common* solutions (like template expansion systems) are a nightmare, sure: nobody wants to write m4 code. But if you have a requirement, and you want to generate code to satisfy the requirement that actually works without writing any code yourself, the right way to do this is to use a constraint solver.
Normal code-monkey ecosystems have almost no interaction with the kind of tooling that solves these problems reliably -- it isn't covered in most university undergraduate CS programs, and it was too slow and awkward to use for this purpose when it was originally invented in the late 60s -- but there's no theoretical reason why we can't just write formal requirements as prolog & generate javascript or whatever, provided somebody writes an ontology for doing it.
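To make the "write the requirement, let the machine find the program state" workflow concrete, here is a minimal generate-and-test sketch in Python (since most readers won't have prolog handy). The `solve` helper and the port/worker scenario are purely illustrative, not any real library's API -- the point is just that you state constraints declaratively and the search does the rest, which is the prolog workflow in miniature.

```python
from itertools import product

def solve(variables, domains, constraints):
    """Naive generate-and-test constraint solver: declare what a
    solution looks like, and let the search find it. Yields every
    variable binding that satisfies all constraints."""
    names = list(variables)
    for values in product(*(domains[n] for n in names)):
        binding = dict(zip(names, values))
        if all(c(binding) for c in constraints):
            yield binding

# a requirement, stated declaratively: pick a port and a worker count
# such that memory stays under budget and the port number is even
# (hypothetical numbers, chosen only for the example)
solutions = solve(
    ["port", "workers"],
    {"port": range(1024, 1030), "workers": range(1, 9)},
    [lambda b: b["workers"] * 512 <= 2048,  # 512 MB per worker, 2 GB budget
     lambda b: b["port"] % 2 == 0],
)
print(next(solutions))  # → {'port': 1024, 'workers': 1}
```

Real constraint solvers use propagation and backtracking instead of brute-force enumeration, but the interface -- requirements in, satisfying program state out -- is the same.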
Retaining compatibility with popular and widely-used technical stacks is a killer in this respect, though: prolog is slow because optimizations intended for early-70s hardware (now incorporated into the standard) prevent certain modern hardware features from being used, algol68-like languages (i.e., basically every 'mainstream' language) are structured in a way that makes most tasks unnecessarily verbose but people don't want to learn the 'weird' languages that make these tasks easier, etc. Languages and development tools, outside of the esolang context and the occasional weird experiment like smalltalk, are designed to resemble and interact with the rest of the development ecosystem (and thereby inherit the ecosystem's attitudes and problems).
@enkiv2 @clacke Programming with llms is however a skill that needs to be practiced, and currently there are no teachers and no real body of work. It’s not about telling the machine to write a program and hoping a program comes out, since that obviously doesn’t work. I prefer to think of them as “language transformers”. I can paste in a badly formatted html table of api documentation and ask for it to come back as a go package. Do I have to fix and verify? 3/
@enkiv2 @clacke Because of that cognitive exhaustion of doing “inhumane” work like looking up wtf the correct field in the api for aws elastic whatever is, we don’t have time or energy to do actual human work like designing software that actually is built by and for humans.
That’s where I see llms as actually liberating me from that toil, because I can now program using my language, not the machine’s.
@enkiv2 @clacke I think a lot of programming for better or for worse is indeed almost trivial gluing of apis together. The fact that it is almost trivial is what makes it incredibly hard and cognitively exhausting for humans, yet any attempt to formalize things down to a level that would make it automatable with “normal” software is just adding one more layer of “inhumane” (indeed, since we are adding it for the machine!) bureaucracy, and we’re back to where we started.
There's definitely room for improvement in terms of the interaction between coding & testing frameworks. Like, if you can write a specification for input, output, and side effects, then unit tests can be generated from that (and so can a functioning-but-not-necessarily-optimized implementation). In constraint languages like prolog, you basically write the unit tests & it produces the implementation from that.
I've run into a lot of situations where the repetitive nature of unit tests actually caused problems (e.g., modifying the wrong unit test because a whole bunch of them looked nearly identical, then changing the code because I thought the test failure was meaningful; or having important changes delayed because, while they didn't require substantial modifications to the real code, they did involve fixing 300 unit tests).
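For what it's worth, the stdlib already has a partial fix for the "300 nearly identical tests" failure mode: table-driven tests with `unittest`'s `subTest`, so the duplication lives in one data table instead of many copy-pasted methods. A minimal sketch (the test class and the cases are invented for the example):

```python
import unittest

class IntParsingTests(unittest.TestCase):
    # one table instead of a pile of near-identical test methods;
    # each row is (input text, expected value)
    CASES = [
        ("42", 42),
        ("-7", -7),
        ("  3 ", 3),  # int() tolerates surrounding whitespace
    ]

    def test_parsing(self):
        for text, expected in self.CASES:
            # subTest reports each row's failure separately, so one bad
            # row doesn't mask the rest
            with self.subTest(text=text):
                self.assertEqual(int(text), expected)
```

Run with `python -m unittest`. This doesn't help when the duplicated tests differ in subtle, meaningful ways -- which is exactly when editing the wrong one hurts most -- but it shrinks the lookalike surface considerably.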
@enkiv2 I feel like for me copilot was barely useful for writing new code, but super useful for writing unit tests, which are 90% boilerplate and repetitive (but I think that's in a good way)