AI coding tools are your interns, not your replacement

By Matt Asay

“AI models currently shine at helping so-so coders get more stuff done that works in the time they have,” argues engineer David Showalter. But is that right? Showalter was responding to Santiago Valdarrama’s contention that large language models (LLMs) are untrustworthy coding assistants. Valdarrama says, “Until LLMs give us the same guarantees [as programming languages, which consistently get computers to respond to commands], they’ll be condemned to be eternal ‘cool demos,’ useless for most serious applications.”

He is correct that LLMs are decidedly inconsistent in how they respond to prompts. The same prompt will yield different LLM responses. And Showalter is quite possibly incorrect: AI models may “shine” at helping average developers generate more code, but that’s not the same as generating usable code.

The trick with AI and software development is to know where the rough edges are. Many developers don’t, and they rely too much on an LLM’s output. As one Hacker News commentator puts it, “I wonder how much user faith in ChatGPT is based on examples in which the errors are not apparent ... to a certain kind of user.” To be able to use AI effectively in software development, you need sufficient experience to know when you’re getting garbage from the LLM.

No simple solutions

Even as I type this, plenty of developers will disagree. Just read through the many comments on the Hacker News thread referenced above. In general, the counterarguments boil down to “of course you can’t put complete trust in LLM output, just as you can’t completely trust code you find on Stack Overflow, your IDE, etc.”

This is true, as far as it goes. But sometimes it doesn’t go quite as far as you’d hope. For example, while it’s fair to say developers shouldn’t put absolute faith in their IDE, we can safely assume it won’t “prang your program.” And what about basic things like not screwing up Lisp brackets? ChatGPT may well get those wrong, but your IDE? Not likely.

What about Stack Overflow code? Surely some developers copy and paste unthinkingly, but a savvy developer will first check the votes and comments around the code. An LLM gives no such signals. You take it on faith. Or not. As one developer suggests, it’s smart to “treat both [Stack Overflow and LLM output as] probably wrong [and likely written by an] inexperienced developer.” But even in error, such code can “at least move me in the right direction.”

Again, this requires the developer to be skilled enough to recognize that the Stack Overflow code sample or the LLM code is wrong. Or perhaps she needs to be wise enough to only use it for something like a “200-line chunk of boilerplate for something mundane like a big table in a React page.” Here, after all, “you don’t need to trust it, just test it after it’s done.”
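What “just test it” might look like in practice: here is a minimal TypeScript sketch, assuming a hypothetical LLM-generated <UserTable> component and a standard Jest plus React Testing Library setup. The component name, props, and data are illustrative, not from the article.

// UserTable.test.tsx -- sanity-check LLM-generated table boilerplate
// rather than trusting it. (UserTable and its props are hypothetical.)
import { render, screen } from "@testing-library/react";
import "@testing-library/jest-dom";
import { UserTable } from "./UserTable";

const users = [
  { id: 1, name: "Ada Lovelace", role: "Engineer" },
  { id: 2, name: "Grace Hopper", role: "Rear Admiral" },
];

test("renders a header row plus one row per user", () => {
  render(<UserTable users={users} />);
  expect(screen.getAllByRole("row")).toHaveLength(3); // 1 header + 2 data rows
});

test("renders the expected cell contents", () => {
  render(<UserTable users={users} />);
  expect(screen.getByText("Ada Lovelace")).toBeInTheDocument();
  expect(screen.getByText("Rear Admiral")).toBeInTheDocument();
});

If the generated component fails checks like these, you’ve learned something useful without having read all 200 lines of it.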

In short, as one developer concludes, “Trust it in the same way I trust a junior developer or intern. Give it tasks that I know how to do, can confirm whether it’s done right, but I don’t want to spend time doing it. That’s the sweet spot.” The developers who get the most from AI are going to be those who are smart enough to know when it’s wrong but still somewhat beneficial.

You’re holding it wrong

Back to Datasette founder Simon Willison’s earlier contention that “getting the best results out of [AI] actually takes a whole bunch of knowledge and experience” because “a lot of it comes down to intuition.” He advises experienced developers to test the limits of different LLMs to gauge their relative strengths and weaknesses and to assess how to use them effectively even when they don’t work.

What about more junior developers? Is there any hope for them to use AI effectively? Doug Seven, general manager of Amazon CodeWhisperer and director of software development for Amazon Q, believes so. As he told me, coding assistants such as CodeWhisperer can be helpful even for less experienced developers. “They’re able to get suggestions that help them figure out where they’re going, and they end up having to interrupt other people [e.g., to ask for help] less often.”

Perhaps the right answer is, as usual, “It depends.”

And, importantly, the right answer to software development is generally not “write more code, faster.” Quite the opposite, as I’ve argued. The best developers spend less time writing code and more time thinking about the problems they’re trying to solve and the best way to approach them. LLMs can help here, as Willison has suggested: “ChatGPT (and GitHub Copilot) save me an enormous amount of ‘figuring things out’ time. For everything from writing a for loop in Bash to remembering how to make a cross-domain CORS request in JavaScript—I don’t need to even look things up anymore, I can just prompt it and get the right answer 80% of the time.”
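The CORS case Willison mentions is exactly the sort of small, forgettable incantation an LLM can produce on demand. Here is a minimal TypeScript sketch of a credentialed cross-origin request, assuming a hypothetical API endpoint; the URL, payload, and function name are illustrative, and the server still has to return the matching Access-Control-Allow-* headers.

// corsRequest.ts -- a cross-origin POST with cookies: the kind of boilerplate
// that is easy to forget and easy to prompt for. Endpoint and payload are hypothetical.
export async function postAcrossOrigins(): Promise<unknown> {
  const response = await fetch("https://api.example.com/widgets", {
    method: "POST",
    mode: "cors",           // explicit; fetch already uses CORS for cross-origin URLs
    credentials: "include", // send cookies; requires Access-Control-Allow-Credentials: true
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: "widget" }),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

The point is not that this code is hard; it’s that verifying it takes seconds, while recalling it cold does not, which is where that “80% of the time” caveat comes in.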

Knowing where to draw the line on that “80% of the time” is, as noted, a skill that comes with experience. But the practice of using LLMs to get a general idea of how to write something in, say, Scala, can be helpful to all, as long as you keep a critical eye on the LLM’s output.

© InfoWorld