18 votes

How AI assistance impacts the formation of coding skills

5 comments

  1. [3]
    saturnV
    Link
    Evidence from Anthropic on the tradeoffs of different ways of using AI: the more you delegate, the more your skills atrophy. However, using AI to generate code snippets can still be net positive. n=52 isn't super conclusive, but it's still an interesting topic, and I'm looking forward to future research.

    It's also interesting that this is something Anthropic researchers feel comfortable publishing when it could pretty easily be spun into something anti-AI.

    11 votes
    1. GOTO10
      Link Parent
      where the more you delegate the more your skills will atrophy

      You basically turn into midlevel management? ;)

      5 votes
    2. Turtle42
      Link Parent
      If generative AI and LLMs are here to stay, I want Anthropic to be the poster child for how an AI company should do business. I know they’re not perfect, but they don’t seem as intent on destroying the world as OpenAI does.

      4 votes
  2. [2]
    lonbar
    Link
    The bottom of the page links to the preprint on arXiv, which goes into more detail. I'm not in a position to quickly evaluate the methods and results, but an immediate worry would be that Anthropic is not a disinterested party in this project. I wonder how much that has affected their design choices for this experiment and their interpretation of the results.

    9 votes
    1. sparksbet
      Link Parent
      I'm still reading through it to form my thoughts, but I do think the authors are being forthright researchers throughout, and I'm less inclined to dismiss their results as pure propaganda because, honestly, the quantitative results don't even show a statistically significant improvement in task completion time for those in the AI condition!

      This is in part due to the sample size and distribution -- their results pretty clearly show that those in the AI condition had better task completion times only among the less experienced programmers (1-3 years of experience), and that there wasn't any difference among the more experienced ones. The small number of inexperienced programmers in their sample is a big part of why the improved task completion time in the AI condition didn't reach statistical significance. Since a larger gap between the AI and no-AI groups in task completion time is what I would have expected going into a study like this, I think a study more deliberately designed to make their AI products look good would have had a larger share of inexperienced programmers in the sample.
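      To make that sample-size point concrete, here's a toy simulation (the means, spreads, and group sizes are entirely made up, not taken from the paper) showing how a real speedup confined to a small junior subgroup can fail to reach significance overall, while the identical per-person effect becomes easy to detect once juniors dominate the sample:

      ```python
      # Toy illustration only -- all numbers below are invented,
      # not drawn from the Anthropic study.
      import numpy as np
      from scipy.stats import ttest_ind

      rng = np.random.default_rng(42)

      def completion_times(n_junior, n_senior, with_ai):
          # Assume AI shaves ~15 minutes off juniors' times
          # and does nothing for seniors.
          junior_mean = 50 if with_ai else 65
          juniors = rng.normal(junior_mean, 15, n_junior)
          seniors = rng.normal(40, 10, n_senior)
          return np.concatenate([juniors, seniors])

      # Mostly seniors (a skewed sample, ~26 per condition): the
      # junior-only speedup gets diluted and typically p > 0.05.
      print(ttest_ind(completion_times(5, 21, True),
                      completion_times(5, 21, False)))

      # Mostly juniors: the same per-junior effect is now
      # typically significant.
      print(ttest_ind(completion_times(21, 5, True),
                      completion_times(21, 5, False)))
      ```

      With roughly 26 people per condition (matching their n=52) and only a handful of juniors, the power to detect a junior-only effect is tiny, which is exactly the pattern they report.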

      I really liked the part where they discussed their pilot studies -- though it's a little depressing that they had so many troubles with noncompliance (those in the no-AI group using AI anyway) and with participants' grasp of basic Python syntax. But I think it was smart to run these pilots beforehand to eliminate issues like this (and I wish I knew which platforms they used so I could speculate more about why one of them had such a high rate of noncompliance).

      I think some people might rush to criticize the qualitative assessment of participants' use of AI, but I think they did a good job establishing their typology and they were diligent about reporting how the higher-scoring typologies tended to also take more time than the lower-scoring ones. Even in the section where they attribute much of the lack of improvement in task completion time to the time spent interacting with the AI, they don't shy away from the idea that the time spent on this part of the task is actually a big contributor to learning, and that understanding and time-saving are at odds when using AI here:

      A more significant difference in completion time due to shorter interactions with AI assistance would likely translate to an even larger negative impact on skill formation.

      They even investigated differences based on copy-pasting vs. manually typing, which indicated that a lot of the time improvement came simply from not needing to type the code out yourself:

      Participants in the AI group who directly pasted (n = 9) AI code finished the tasks the fastest while participants who manually copied (n = 9) AI generated code or used a hybrid of both methods (n = 4) finished the task at a speed similar to the control condition (No AI).

      There was a smaller group of participants in the AI condition who mostly wrote their own code without copying or pasting the generated code (n = 4); these participants were relatively fast and demonstrated high proficiency by only asking AI assistant clarification questions. These results demonstrate that only a subset of AI-assisted interactions yielded productivity improvements.

      My biggest complaint with the qualitative side is that they don't go into how it interacts with programming experience. I highly suspect, for instance, that the programmers who exclusively used AI for conceptual help and then wrote their own code were more experienced. That could account for their being faster than even the copy-pasters despite writing their own code. Regardless of whether that's true, I think the relationship between experience and the way AI is used would be super interesting to delve into, and I'm kinda disappointed they didn't do that in the paper.

      Ultimately, though, I think their methodology was sound, and the way they frame their results allays a lot of my concerns about the research being fluff propaganda, as it even warns about the potential future negative effects of AI coding assistants on our body of skilled programmers:

      Together, our results suggest that the aggressive incorporation of AI into the workplace can have negative impacts on the professional development of workers if they do not remain cognitively engaged. Given time constraints and organizational pressures, junior developers or other professionals may rely on AI to complete tasks as fast as possible at the cost of real skill development. Furthermore, we found that the biggest difference in test scores is on the debugging questions. This suggests that as companies transition to more AI code writing with human supervision, humans may not possess the necessary skills to validate and debug AI-written code if their skill formation was inhibited by using AI in the first place.

      5 votes