@zenkat@dngrs Personally I haven't had these sorts of at-scale experiences, and all of my small experiments to dip my toe in have been such total failures that it's hard to approach a claim like this one and square it with my own personal experiences.
However, given that most of my work is open source, the IP concerns surrounding that, and the impossibility of finding what Simon Willison would describe as a "vegan" model, it's difficult for me to do a realistic test.
> all of my small experiments to dip my toe in have been such total failures that it's hard to approach a claim like this one and square it with my own personal experiences.
If this is your conclusion, then you simply have not used the tools enough. They do not work on their own. They are not magic. You have to spend hours playing with them to figure out what they're good at and what they're not, and you also have to explore things like how Cursor lets you provide rules to the agent. This means you need to sit down and write a text file explaining how you want the LLM to behave. Explain what the codebase is, what the language is, what tools and libraries you want to use, where the latest docs and API schemas are, and define very explicitly all of your best practices, including code formatting, linting, and static analysis. You also have to tell it not to make up any functions or APIs, and to ask if it doesn't know what to do. You have to tell it not to modify tests that shouldn't be modified, because the code changes should still produce the same result. You have to instruct it not to waste time changing code docs for no reason, and to avoid unnecessary code duplication. You have to tell it not to jam all the code into a single file/module, etc.
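For a sense of what that file looks like, here's a sketch (the project details here are made up for illustration; yours will differ):

```
# Rules for the agent (hypothetical project, for illustration only)
This is a Python 3.12 web service built on FastAPI and SQLAlchemy.
- The latest API schemas live in docs/openapi.yaml; read them before writing client code.
- Format with black, lint with ruff, type-check with mypy; all three must pass.
- Never invent functions or APIs. If you don't know what to do, stop and ask.
- Don't modify existing tests unless the task explicitly changes behavior.
- Don't touch docstrings or comments unrelated to the change.
- Reuse existing helpers instead of duplicating code; don't pile everything into one module.
```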
It is your responsibility to steer the LLM to obey the best practices that you expect. It will not read your mind or do it by default. (yet)
Once you do this you can get Cursor to operate in agent mode on your codebase and produce pretty high-quality code that is free of defects, formatted, linted, and passing static code analysis.
When you experience it doing this it is ridiculous magic how it can turn an idea prompt into actual code.
@zenkat@dngrs I would like to explicitly take the temperature down, assuming you have not blocked each other or me yet:
this was in response to an explicit invitation on my part to discuss the benefits of this technology, irrespective of its costs. it's very unfair to yell at zenkat for doing what I asked here.
I can respect needing to disengage from this conversation (I, too, find myself upset by LLMs with some regularity) but there's no call to be rude.
@zenkat@dngrs dngrs, as an example of how you may be experiencing some cognitive distortion, I should just point out that the character in question's proper name in the show is "demon cat", and that his main function in the episode he appears in is to promise to eat Finn's (the main character's) eyes, skin him, and rip out his heart. this is not a sympathetic comparison to be making.
@glyph@dngrs Thanks for taking the temperature down, Glyph. Much appreciated.
Interested in your thoughts on the refactoring example.
Also, for the record ... I personally am on the skeptical side of these technologies. The major use cases for LLMs appear to be propaganda, disinfo, and spam at scale. I generally agree with Timnit Gebru's TESCREAL critique, and I find the Singularity crew to be somewhere between laughable and horrifying.
A year ago, I would have wholeheartedly agreed with many of the points here. But the technologies are advancing rapidly, and the quality+utility of the results is getting pretty amazing. Many of my colleagues are doing a rapid about-face on the utility of these tools, especially in the coding domain. Blanket dismissal is no longer viable.
@dngrs "I don't have any interest in thinking or engaging critically, so I'll just throw out some ad hominem attacks and tribal signifiers to stake out my territory. Oh, and read my blog."
@zenkat I have zero interest in tiring myself out with a debate bro over putting (RAG, etc.) lipstick on an LLM pig. Also your style of arguing, e.g. prompting sympathy by invoking a beloved cartoon series is frankly nauseating. If you want to read my thoughts on using word extruders in software development, I've written about it here: https://anatol.versteht.es/blog/llmnt/
@dngrs@glyph LOL, so eloquent. Internet dialog at its best.
Tell me, would you give a thumbs-down if I had used a regex in that scenario? What's the diff except an ability to be slightly more expressive in the types of transforms you can do?
@dngrs@glyph Here's another interesting use case -- large scale code refactorings. Our code base is millions of lines, and scattered throughout it are thousands of calls to a legacy macro that is no longer compatible with our monitoring systems. The translation to the updated version is relatively mechanical, but too complex to be encoded as regex or AST transforms. Updating it everywhere will literally be months of boring shit-work that even a new grad will grumble about, so it sits unfinished.
But LLM-based tools are getting to the point where they can do these transforms. Search the code, feed the localized snippet plus an example fix to the LLM, and it generates the change. Automate that and it provides patches for the whole codebase.
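Concretely, the loop is something like this. It's only a sketch, not our actual pipeline; find_macro_calls, call_llm, and LEGACY_MONITOR are stand-ins for the real infra and macro:

```python
# Rough sketch of the refactor loop (hypothetical names throughout).
import subprocess
from pathlib import Path

EXAMPLE_FIX = """\
Before: LEGACY_MONITOR("rpc_latency", latency_ms);
After:  monitoring::Record(kRpcLatency, latency_ms);
"""

def find_macro_calls(root: str) -> list[tuple[Path, int]]:
    """Grep the tree for the legacy macro; return (file, line) hits."""
    out = subprocess.run(["grep", "-rn", "LEGACY_MONITOR(", root],
                         capture_output=True, text=True).stdout
    hits = []
    for line in out.splitlines():
        if line.count(":") < 2:
            continue  # skip e.g. "Binary file ... matches"
        path, lineno, _ = line.split(":", 2)
        hits.append((Path(path), int(lineno)))
    return hits

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client you actually use."""
    raise NotImplementedError

def patch_for(path: Path, lineno: int, context: int = 10) -> str:
    """Feed the localized snippet plus an example fix; get back a diff."""
    lines = path.read_text().splitlines()
    snippet = "\n".join(lines[max(0, lineno - context):lineno + context])
    prompt = ("Rewrite the legacy macro call in this snippet to the new API.\n"
              f"Example transformation:\n{EXAMPLE_FIX}\n"
              f"Snippet from {path} around line {lineno}:\n{snippet}\n"
              "Return a unified diff only.")
    return call_llm(prompt)

for path, lineno in find_macro_calls("src/"):
    print(patch_for(path, lineno))  # collect the patches for human review
```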
Now you still need to review all those changes, run the tests, do evals to show it doesn't affect result quality, go through launch processes. But it's just automated away a huge amount of shitty grunt-work that literally nobody wanted to do.
@zenkat@glyph all your verbose LLM apologia does not change the fact that LLMs work statistically, always, and *don't know what facts are*. They can't not hallucinate, because that's the *only* thing they're capable of - even when the result looks correct.
@dngrs@glyph State of the art LLM summarization is quite good, far better than you give it credit for.
"Approximate knowledge" is an apt descriptor. Did you ever watch Adventure Time? There was this great side character called "Approximate Cat", who had "approximate knowledge of many things". This gave him great powers, but also glaring weaknesses.
Having a good LLM by your side when working through a massive pile of code & documentation is like having a tame Approximate Cat companion. There are times when approximate knowledge is a great power. And there are times when you better be on top of the fucking details or you're going to get smashed. The trick is knowing which situation you're in.
@dngrs@glyph Also note that I'm talking about systems that use RAG and large context windows to feed results directly into the prompt for summarization. In my experience, LLMs are much less likely to hallucinate in those cases, because they aren't reliant on latent information stored during training.
E.g., if you ask an LLM, "tell me about the Floozbor API", most will happily make up bullshit for you. But if you take search results for [Floozbor API], feed them into your context window, and then ask for a summary, they'll do a decent job, because the key information is there in the input tokens instead of being extracted from the model weights.
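In code, the difference is roughly this (a sketch; web_search and call_llm are stand-ins for whatever search backend and LLM client you actually use):

```python
# Grounding a summary in retrieved text instead of model weights.

def web_search(query: str, k: int = 5) -> list[str]:
    """Return the top-k result snippets for a query."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def grounded_summary(topic: str) -> str:
    """Put the facts in the input tokens, then ask for the summary."""
    context = "\n\n".join(web_search(topic))
    return call_llm(
        f"Using ONLY the sources below, summarize what the {topic} does. "
        "If the sources don't cover it, say so.\n\n"
        f"Sources:\n{context}"
    )

# Ungrounded: call_llm("tell me about the Floozbor API")  # happily invents
# Grounded:   grounded_summary("Floozbor API")            # tied to the sources
```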
@glyph I run a large software team. I find LLMs useful for summarizing related docs that are mentioned in designs and project plans I'm reviewing.
E.g., there's a service I've never heard of that gets mentioned in passing. To understand the passage I need to know basically what it does, but exact details do not matter. Approximate knowledge is enough to get me unblocked.
Previously I would have needed to follow a link, skim a design, summarize in my head, jump back to the doc I care about.
Now I just ask for a summary of what it does. Instantly unblocked, and I stay in the flow of the design I'm reviewing. And if it's slightly mangled, who cares? It's not the design I'm thinking about, and I'd have been just as likely to misunderstand it digging through the docs myself.
@zenkat@glyph but... it's not accurate to call LLM summaries "approximate" - they might be entirely wrong. If you make decisions based on that, I'd call this a pretty careless attitude
There are like nine people in hank’s mentions at various points in this explosion of replies saying stuff like “but they are super useful, for me! They make me so much more efficient at my job!” And then someone inevitably says “oh? What is your job?” And none of them have answered yet