40 votes

AI coding tools make developers slower but they think they're faster, study finds

11 comments

  1. [4]
    smores
    Link

    This study has much more limited context than the title of the article suggests (though they do mention several other relevant studies as well), but I still thought it was interesting. It's limited to open source software maintainers, and it captures several reasons that LLMs may be particularly poorly suited to authoring code in that context. But far more interesting to me than the fact that AI usage slowed down development (speed isn't everything, anyway) is that developers still felt that they had worked faster after having been slowed down by AI usage.

    18 votes
    1. raze2012
      (edited )
      Link Parent

      is that developers still felt that they had worked faster after having been slowed down by AI usage.

      Makes sense to me. If you are using LLMs "correctly", they will do a lot of the boilerplate plumbing for you. When you are removing tedium, you can feel more productive, even if you are in fact moving slower.

      The "taking groceries in" metaphor is a good example. It's probably faster to use 2 or 3 trips if your car is close to your house. But it will feel better and more efficient moving all of it at once. Even if you are moving physically slower, taking stops to readjust bags, and exerting more energy in your muscles.

      All to avoid the tedium of needing to take a longer walking path to accomplish your goal. Walking feels slow as a low-energy task; exerting your strength with a heavier load feels more exhilarating and active.

      19 votes
    2. [2]
      SloMoMonday
      Link Parent

      Looking at the breakdown of time per task in the study, it appears that people spend considerably less time writing actual code and significantly more time idle. Given what a slog testing, research and from-scratch solutions are, I can understand why some people feel like a prompt response is more efficient. Considering that, and the fact that this technology will obviously see performance gains going forward, I think idle time could be considered a sort of productivity gain, even if it's time that will inevitably be filled by juggling other tasks.

      However, I personally feel like the added time spent on the code, research and testing could be considered the biggest non-financial investments every software company should be making, simply because they are a key part of institutional knowledge, skills development and refining critical thinking processes.

      Research is a good way of learning not just where to find good info, but how to sort good info from bad, how to organize important data for future recall, and how to expose yourself to related information that may come up in the future. LLMs calculate value based on frequency bias and overt metrics. Niche problems, however, are often poorly documented or involve multiple steps to diagnose while requiring novel solutions.

      Similarly, from-scratch coding or refactoring may not be particularly fun or rewarding. But an important part of skills mastery is repetition, which forces your brain not just to find optimizations but also to develop an intuitive sense of correct methods and outcomes. Everyone who has worked in a field long enough has that story about the problem that couldn't be solved by entire teams over months, but there's that one senior guy who comes in and fixes it in an hour. That guy has already made all the mistakes and seen all the problems, so having that knowledge on hand is a true efficiency. It's the type of experience I don't think you can really capture in LLMs, but it can be passed on from person to person.

      And lastly, I'm a big believer in testing. While I'm not as aggressively anti-code-LLM anymore, one of my hard lines is against letting autonomous systems develop and run test cases, even though it's one of the selling points of this tech.

      A good test case is a sign that you understand what your code is supposed to do. To get that, you need to have clear outcomes and designs. For that, you need to know exactly what problems you are hoping to solve. It's a line that goes all the way up to leadership and a critical preventative measure against catastrophic problems. It's easy to think that your code is just a small cog in a massive machine, but software systems are information systems. Decisions are made based on the information we have, and developers have a duty to provide it at the highest quality.
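
      To make that concrete (a made-up sketch on my part; the function, the framework and the rounding rule are purely illustrative), even a tiny test forces a design decision that someone has to own:

          import pytest

          def split_bill(total_cents: int, people: int) -> list[int]:
              """Split a bill so every cent is accounted for; earlier people absorb the remainder."""
              if people <= 0:
                  raise ValueError("people must be positive")
              base, remainder = divmod(total_cents, people)
              return [base + (1 if i < remainder else 0) for i in range(people)]

          def test_split_bill_accounts_for_every_cent():
              # Writing this down forces the question: who gets the extra cent?
              assert split_bill(1000, 3) == [334, 333, 333]
              assert sum(split_bill(1000, 3)) == 1000

          def test_split_bill_rejects_zero_people():
              with pytest.raises(ValueError):
                  split_bill(1000, 0)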

      Think back to the UK Post Office scandal: system errors that led to untold suffering and the deaths of multiple innocent people. Or improper social media algorithms that lead people down dark paths. Or an untested security update downing global services. Or a lack of UAT/training leading to an aircraft crash. I believe a big reason we face so many global issues is that important indicators have been completely debased to protect the feelings of investors rather than accurately reflect reality.

      There's also the chance that I've completely missed the mark here and I'm just another person clinging to traditional virtue in the face of progress. I'm sure people in every industry said the same thing when machines came along and replicated decades of collective effort in seconds. My argument against that would be that the value of automation was in its precision. Same input, same output. I don't think I've seen any autonomous system as wildly inconsistent as natural language models.

      13 votes
      1. post_below
        Link Parent

        There's also the chance that [...] I'm just another person clinging to traditional virtue in the face of progress

        I don't think you are. The value of spending time in the backend building, fixing, fucking up, improvising, and so on, is objectively real. Your senior dev who immediately fixes a problem that had defeated everyone else is one example, but there are so many other ways it translates to quantifiable benefits.

        Collections of big applications, doing non-trivial things, are at the heart of many aspects of daily life. It may seem, from a managerial perspective, that you ask for code to do a thing and then, if you get code that does the thing, everything is great. When in reality there are 800 ways to do the thing and only 3 of them are reasonably close to right once you take into consideration interactions with other parts of the system and the ways usage will likely change over time.

        To put it another way, a large application is like an ecosystem. Ideally you understand all the parts of that ecosystem, the ways they interact and how they need to evolve together. If you don't understand it deeply then you're almost definitely going to do things that seem fine at the time but eventually won't be.

        Technical debt is easy to think of as something that eventually needs to be paid back and then everything's OK, but in the interim it often costs real money in hardware, employee time and customer retention. You can't ever get those costs back.

        I suspect that AI coding will cause a lot of extra technical debt to be accrued and, worse, some companies aren't going to replace those outgoing seniors with people that have been engaged enough to have the same level of understanding. So there won't be anyone in the room who can pay the debt back.

        Not that AI can't or shouldn't be used for coding. But at the moment the use cases that aren't self-defeating from an organizational perspective are pretty narrow.

        4 votes
  2. [3]
    skybrian
    Link

    Simon Willison blogged about this study. Also, one of the participants in the study discussed it on Twitter.

    it's super easy to get distracted in the downtime while LLMs are generating. The social media attention economy is brutal, and I think people spend 30 mins scrolling while "waiting" for their 30-second generation.

    This is very relatable and was sometimes a problem for me even without AI.

    17 votes
    1. [2]
      winther
      Link Parent

      That is an interesting side effect. Every developer knows that context switching comes with a cost. Sounds like heavy LLM use for coding could lead to less deep focus work.

      9 votes
      1. skybrian
        Link Parent

        I've also read about some developers having five agents going at once in separate VMs. That's a lot of task switching!

        4 votes
  3. Raspcoffee
    Link

    While I think more research is needed, especially into what the LLMs are used for, I can't say I'm very surprised. For things like unit tests and small utilities I've found that it can be useful, but only when it's small functions where I can immediately confirm they do what I want. The deeper it goes, though, the more I don't just have to check the validity of the code but also have to understand how it works for further development.

    Writing code is part of my job, but it's not even the core of it: the core is ensuring that the product/service I contribute to does its job satisfactorily and that it's easy to maintain in the future.

    This all said, again, I do want to see more research, both because 16 is a small sample size and because I want to see where it can be used well.

    9 votes
  4. feanne
    Link

    Makes sense to me. It's consistent with the saying "reading code is harder than writing code". If you didn't write the code yourself (or even if you wrote it but it's been a while so you've forgotten how you set it all up), you have to spend some time deciphering it before you can modify, extend, or debug it.

    9 votes
  5. krellor
    Link

    I think it really boils down to what you are using the tool for. Bug fixes seem like a particularly difficult task for AI to handle, given how these models operate under the hood.

    When I use AI tools, it is usually to create discrete functions where I can give it well-defined inputs and outputs, like data parsing, or to help me rough out or prototype features. I can only think of a few times where I used AI to help with fixing a bug, and I gave it very explicit instructions.
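
    For instance (a made-up sketch, not something I actually prompted for), a small, self-contained parser with a clear contract is the kind of thing where the result is easy to verify at a glance:

        def parse_env_lines(text: str) -> dict[str, str]:
            """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
            result: dict[str, str] = {}
            for line in text.splitlines():
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                key, sep, value = line.partition("=")
                if sep:  # skip malformed lines that have no '='
                    result[key.strip()] = value.strip()
            return result

        # Easy to confirm against a handful of known inputs:
        assert parse_env_lines("# comment\nFOO=1\nBAR = two\n") == {"FOO": "1", "BAR": "two"}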

    Maybe an agent model that automatically reviews a bug report and appends a cursory analysis for the developer would be more helpful than trying to write the fix. Honestly, an agent model that analyzes bug reports and prompts the submitter for more details if they left things out would be awesome.

    But I don't do much software work anymore, mostly personal projects and one-offs to expedite/augment my non-IT work, so my experience lags behind early adopters of new tooling.

    3 votes
  6. vingtcinqunvingtcinq
    Link

    It surprised me. I'm definitely prone to the "if I'm doing something it feels fast" fallacy. If I keep on clicking, typing, doing, whatever, I always feel like I'm "going." On public transit, I prefer walking instead of taking a transfer to another bus/train, because waiting is excruciatingly dull (among other issues, like sometimes that bus just does not come). I would have expected someone to tell me that incessant googling for debugging and whatnot feels busy but is just outdated. At the same time, the study says the participants were pretty familiar with their domain (general experience + "High developer familiarity with repositories"), so it sounds like the comparison here is more about vibe coding than being tripped up by the unfamiliar?

    I started using Gemini at work for questions about the language of the old monolith (stuff like "How do you mock things in $mock_library for $test_library in $ancient_version" -- I read the docs sometimes but I'm not invested in learning the language much) and it is kind of excruciatingly slow. It has helped me, but every time I expect it to take 3 seconds, and I just change tasks in the meantime.

    2 votes