This is a perfect case study in how LLMs (don’t) work.
Please consider carefully what human processes a system like this could actually replace. https://toot.cat/@devopscats/112445057997076822
It’s perhaps not obvious that in the example above, the LLM •does• actually do something useful! It conveys information about what’s typical: “When people talk about a goat and a boat and a river, there’s usually a cabbage too. Here are words that typically appear in the ‘answer’ position in such a context.”
What the LLM doesn’t do is actually solve the problem — or even understand the question in any meaningful way. Its answer is garbage. Garbage has clues, like a detective story. But garbage.
@inthehands @devopscats when I first saw this yesterday, I tried it myself and ChatGPT actually gave a great response (essentially “just take the goat across, since there is no cabbage or wolf constraint”). Just now I tried again, and got a different response, somewhere between the two in logical soundness. This raises another issue with automations based on these systems. You can test it, and get a perfectly reasonable output, then expose it to users and get garbage. You can never be sure.
I’ve noticed developers often express excitement about LLM assistants when working with unfamiliar tools, and express horror about them when working with tools they know well. That pattern repeats in other domains as well.
It makes sense: “garbage with clues” can be helpful when you’re learning something unfamiliar. It’s truly helpful to hear “When people import [e.g.] Hibernate and say `SessionFactory`, code like this typically appears next.” That’s useful! Also probably wrong!
Two thoughts:
1. Folks could design and market these ML tools around the idea of •identifying patterns• (the thing machine learning is actually good at) instead of •providing answers•. Pure fantasy at this point; too much collective investor mania around the wet dream of the magic answer box. Just noting that a better choice is on the table.
2. CS / software education / developer training and mentorship needs to redouble its emphasis on •critical reading• of existing code, not just producing code. By critical reading, I mean: “What does this code do? Does it •really• do that? What is its context? How can it break? Does it do what we •want• it to do? What •do• we want it to do? What is our goal? Why? Is that really our goal? What is the context of our goal? How can it break?” etc.
@inthehands Just had a really good experience using ChatGPT 4.0 to help me learn AWS services and set up an architecture to a particular spec. I didn't run into many nonsensical answers, but I did need to verify everything and check that the produced responses would actually satisfy the requirements. The main error was incorrect order of operations: some commands depended on certain resources already being created. ML-aided generation of IaC would be a very good use case to focus on.
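The “order of operations” class of error described above is mechanically checkable: treat the resources as a dependency graph and derive a valid creation order before running anything. A minimal Python sketch (the resource names and dependencies here are hypothetical illustrations, not output from the thread):

```python
from graphlib import TopologicalSorter

# Hypothetical AWS-style resources mapped to the resources
# they depend on (i.e., what must exist before creating them).
deps = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "instance": {"subnet", "security_group"},
}

# static_order() yields each resource only after all of its
# dependencies, and raises CycleError on circular dependencies.
order = list(TopologicalSorter(deps).static_order())
```

This is the kind of verification step a human (or a deterministic tool) still has to supply around LLM-generated IaC: the model emits plausible commands, but nothing in it guarantees the sequencing is valid.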
@mattly
I mean, to some extent you’re describing static analysis. And static analysis is both helpful and limited in all the ways that any automated coding assistant will be: what you described requires an understanding of goals, expectations, the larger human systems in which the code will function and the humans who operate in those systems. (“Oh, that’ll never fly because….”) Considering all that requires the social understanding of a human embedded in that larger context.
@suzannealdrich
Yep. Imagine how much more useful LLM coding assistance would be if it didn’t require us humans to constantly, actively remind ourselves, “Don’t trust any of this; it looks like a definitive answer but it’s not, so verify everything.” There’s very much a social aspect to how these systems present their output to us.
@thias
Maybe so. The pendulum does swing back and forth: in 2004, it seemed like all code was going to be written in Java.
The CS program where I teach does make a concerted point of exposing students to different tools and languages repeatedly, eventually creating a context where they’re learning them in self-directed, project-driven ways.
@inthehands one important aspect of reading code is understanding old/weird languages/systems. I feel the current computing monoculture is a large problem in that respect.