Part of me wishes it wasn't true but: AI coding is legit
I stay current on tech for both personal and professional reasons but I also really hate hype. As a result I've been skeptical of AI claims throughout the historic hype cycle we're currently in. Note that I'm using AI here as shorthand for frontier LLMs.
So I'm sort of a late adopter when it comes to LLMs. At each new generation of models I've spent enough time playing with them to feel like I understand where the technology is and can speak about its viability for different applications. But I haven't really incorporated it into my own work/life in any serious way.
That changed recently when I decided to lean all the way in to agent assisted coding for a project after getting some impressive boilerplate out of one of the leading models (I don't remember which one). That AI can do a competent job on basic coding tasks like writing boilerplate code is nothing new, and that wasn't the part that impressed me. What impressed me was the process, especially the degree to which it modified its behavior in practical ways based on feedback. In previous tests it was a lot harder to get the model to go against patterns that featured heavily in the training data, and then get it to stay true to the new patterns for the rest of the session. That's not true anymore.
Long story short, add me to the long list of people whose minds have been blown by coding agents. You can find plenty of articles and posts about what that process looks like so I won't rehash all the details. I'll only say that the comparisons to having your own dedicated junior or intern who is at once highly educated and dumb are apt. Maybe an even better comparison would be to having a team of tireless, emotionless, junior developers willing to respond to your requests at warp speed 24/7 for the price of 1/100th of one developer. You need the team comparison to capture the speed.
You've probably read, or experienced, that AI is good at basic tasks, boilerplate, writing tests, finding bugs and so on. And that it gets progressively worse as things get more complicated and the LoCs start to stack up. That's all true but one part that has changed, in more recent models, is the definition of "basic".
The bit that's difficult to articulate, and I think leads to the "having a nearly free assistant" comparisons, is what it feels like to have AI as a coding companion. I'm not going to try to capture it here, I'll just say it's remarkable.
The usual caveats apply: if you rely on agents to do extensive coding, or handle complex problems, you'll end up regretting it unless you go over every line with a magnifying glass. They will cheerfully introduce subtle bugs that are hard to catch and harder to fix when you finally do stumble across them. And that's assuming they can do the thing you're asking them to do at all. Beyond the basics they still abjectly fail a lot of the time. They'll write humorously bad code, they'll break unrelated code for no apparent reason, they'll freak out and get stuck in loops (that one surprised me in 2025). We're still a long way from agents that can actually write software on their own, despite the hype.
But wow, it's liberating to have an assistant that can do hundreds of basic tasks you'd rather not be distracted by, answer questions accurately and knowledgeably, scan and report clearly about code, find bugs you might have missed and otherwise soften the edges of countless engineering pain points. And brainstorming! A pseudo-intelligent partner with an incomprehensibly wide knowledge base and unparalleled pattern matching abilities is guaranteed to surface things you wouldn't have considered.
AI coding agents are no joke.
I still agree with the perspectives of many skeptics. Execs and middle managers are still out of their minds when they convince themselves that they can fire 90% of their teams and just have a few seniors do all the work with AI. I will read gleefully about the failures of that strategy over the coming months and years. The fallout from their shortsightedness and the cost to their organizations won't make up for the human cost of their decisions, but at least there will be consequences.
When it comes to AI in general I have all the mixed feelings. As an artist, I feel the weight of what AI is doing, and will do, to creative work. As a human I'm concerned about AI becoming another tool to funnel ever more wealth to the top. I'm concerned about it ruining the livelihoods of huge swaths of people living in places where there aren't systems that can handle the load of taking care of them. Or aren't even really designed to try. There are a lot of legitimate dystopian outcomes to be worried about.
Despite all that, actually using the technology is pretty exciting, which is the ultimate point of this post: What's your experience? Are you using agents for coding in practical ways? What works and what doesn't? What's your setup? What does it feel like? What do you love/hate about it?
LLMs are great for rubber ducking, helping you figure out what to do next, and writing generic boilerplate code that's been in their training data a million times (which is like half of your tasks in web development FWIW).
I genuinely don't believe it's good at anything else. I keep seeing posts like this that are shocked and amazed at how wonderful and efficient it is and I can't help but think of the study that found it actually caused more problems than it solved, despite the people using it believing otherwise: https://arstechnica.com/ai/2025/07/study-finds-ai-tools-made-open-source-software-developers-19-percent-slower/
We're 3 years into the LLM craze and I've yet to see where the benefits are. If they're so good, why aren't they being used to easily fix open source issues? Even if it's just AI assisted? Because almost any code touched by AI that isn't ultra-generic is full of hard-to-find bugs and so many subtle issues that the person reviewing your PR would rather block you than deal with time-wasting hallucinations.
There are like a million FOSS apps and tools out there and every time AI has a hand in "helping", it's obvious even when not mentioned. Because it produces bad code that looks right. And there hasn't been any major case of it actually helping fix major issues; the only thing I can see is that a lot of companies boasting about using AI all the time coincidentally had the quality of their code/apps go down. See: Microsoft bragging about 30% of their code being written by AI, and suddenly new Windows 11 updates having weird and crazy bugs (like the task manager nonsense).
I mean, if your criteria when looking for AI code is “bad code that looks right”, definitionally you’ll only find bad code.
This feels like a CGI case, where all the CGI people remember from movies is bad… because they don’t notice the good examples to begin with, since the intention for most CGI is to be invisible.
The break in this reasoning is that we have strong CGI advocates. We have people who believe that CGI is the future and who would hold up as shining beacons any good examples. If AI had the capability of Silicon Valley's claims, they'd be right to. It would be revolutionary. It would be obvious from the rapidly growing collection of examples. We would have entire projects with thousands or millions of downstream users, maintained in whole or in large part by AI. Those projects are nowhere to be found. The only growing cries are from maintainers who are being overloaded in entirely new ways. The existence of bad CGI isn't the problem here, any tool can be misused; it's the lack of good CGI.
If that is the case, then you would still be able to find good examples once you start looking for them. If the majority of AI work is supposed to be done by agents capable of handling the entire process, including making PRs, and if that process is as revolutionary as the claims, then why isn't there an explosion of such PRs on a large number of open source projects? Specifically, why am I not seeing these PRs on AI-related open source projects? If I need to target it even more directly, why am I not seeing hints of this being applied on code agent repositories?
Call me naïve, but you'd think that these companies specifically would want to demonstrate how well their product works, making an effort to distinguish PRs that are largely the work of their own agents. Yet I am not seeing that; mostly I see these "secondary" sources and a lot of "trust me, it is there and it is amazing".
And I am seeing something similar on this post. I think the title of the post is overselling what OP is actually getting out of it.
So far this tracks with my experiences as well and seems to hint at what the study also found. Using agentic AI more or less shifts the sort of work you are doing, but it doesn't really speed up the process all that much, if any. Which can still be a net benefit if you are the sort of person who doesn't like doing a lot of the basic tasks. Certainly if you often start fresh greenfield projects rather than working in larger established code bases.
This reminded me of the recent drama with Google using AI to detect bugs and file them on open source projects. Specifically, if the AI is so good why is it dumping problems on people rather than solutions?
For additional context, although still very reduced, my understanding of the drama I saw was with ffmpeg and was surrounding the filing of a bug in a code path that only applies to some obscure format
that you have to opt in to by compiling ffmpeg yourself with flags to enable it. On one hand there's the "it's good to have all bugs tracked", but on the other is the "a trillion dollar company is dumping noisy work onto a project that doesn't have the manpower to deal with a flood of AI generated stuff related to edge cases that no real humans even care about anyway." The reason your comment reminded me of this though is that there was a lot of commentary along the lines of, "if the AI is so good why didn't it also submit a PR? As it stands this tool is not automating away tedious human work, but instead automating dumping more manual work on humans."

Edit: struck through a section that @gary identified as incorrect.
I think that's a bit like saying "if your compiler is so good at pointing out the errors in my code, why doesn't it just correct them?". AI is not some magical monolithic tool that can do everything all at once. Some people might try and claim that, but some people are idiots - it's the same with microservices and NoSQL and serverless and all the rest, in that some people will promise you the world and be completely wrong, but that doesn't make the underlying tool useless.
In this case, Google specifically trained a model to detect security bugs. I don't believe their system includes a general LLM, at least based on how old the project is. My impression is that it's classical ML stuff with a huge amount of training data. That system cannot fix bugs, but it can make finding them a lot easier. That is a useful task! You can't fix bugs without knowing where they are, and a lot of these old tools have a huge amount of very subtle code that is very difficult to analyse through conventional means.
Beyond that, I think Google's approach here seems fair enough. They are not demanding that work be done for them - they're just creating CVEs, which let people know about issues but don't necessarily mean that something needs to be fixed immediately. They're also sponsoring ffmpeg's development, as well as contributing to the project themselves. And the tools they're using here can be used (and I believe have been used) to find more serious issues as well, ones that have been useful and important to fix.
Having watched a video about the issue, I think you're fully correct here. There is a real CVE, and it's possible that a Google engineer will eventually fix it on Google's dime. It seems like a lot of people are complaining because "AI" when this kind of testing and reporting is nothing new for open source libraries.
There might be a better video on the topic, but here's the one I watched: https://youtu.be/fxtnI407djY
I left much of that out because it isn't that related to the complaint. The selling point of AI is that it isn't a monotool, so it seems a bit unfair to try to compare it to one when the AI companies are very clearly branding it as the opposite.
There's no reason whatsoever to suggest that they should use the same model to both generate tickets and generate PRs. But AI companies are claiming they have agents good enough to make PRs from tickets all on their own, so why aren't they? Surely Google is smart enough to use more than one tool, invoking additional tools that can work in entirely independent ways is the basis of the entire agent model, yet they don't.
On a personal note I don't really take that much issue with what Google did, although they definitely could have handled it better. But, when confronted with the question "where's the good AI generated code actually solving problems?" it's difficult to not think back to cases like this and wonder why code generation isn't integrated into toolchains like this one. The intuitive answer is that the claims of how good it is at actually generating good PRs are puffery.
People have forgotten that all work on open source projects is on a volunteer basis. A good bug report is a gift. You can take it or leave it. There’s no obligation for anyone to fix it.
That doesn’t change just because it’s Google. The idea that they have some additional obligations to an open source project because they’re a big company is just something people have made up.
Regarding creating pull requests, anyone could try using AI tools to fix bugs and see if they work. But these tools are still unreliable, so there’s no guarantee that they will work for any particular bug.
I think most of what drove it to become drama at all is that people don't consider AI-generated tickets as gifts. People are getting sick of spending their time triaging AI generated tickets that are now being referred to as "CVE slop". I'd wager that if this exact scenario happened 10 years ago it wouldn't have generated any drama at all, but now maintainers of many projects are seeing AI tickets wasting their time and are just so very sick of it.
That does not make this specific ticket slop. But it exists in a larger ecosystem where people are expending effort wading through piles of AI generated noise to uncover what's real. In such an ecosystem I can definitely see how people would get to the point of being like, "unless you're a real person reporting an issue impacting real people please just go away because we don't have time to deal with it."
Yeah, some "gifts" are spam, if they're low quality.
I agree that a lot of companies selling AI are overselling their capabilities. But that's always been true of people trying to sell you things. If you'd listened to MongoDB selling their database software when they first started out, you'd be amazed that anyone would use anything else at all, because it could apparently do everything you wanted, and bring about world peace as a side-hustle. Obviously that was just sales nonsense, but it doesn't necessarily mean that MongoDB can't be useful for specific use-cases.
I agree that claims that AI can generate an entire PR for you are mostly puffery, but I also don't think there are many people who seriously believe that, at least not without clear caveats about the nature and quality of those PRs.
It's actually on by default; at least it appears that the codec is enabled on Debian/Ubuntu. Being an obscure format has nothing to do with security and it was only mentioned by the ffmpeg developer that wanted to stress that Google shouldn't have disclosed the issue. But if the vulnerability is present, the vulnerability doesn't care if the format is obscure or not so long as your ffmpeg install will process the file (and it will).
Edit: relevant twitter thread
Admittedly this is why I hid behind "my understanding". I don't use Twitter so I saw this third hand somewhere where it was focused on the drama. I'm reasonably confident that I saw it claimed that you had to opt in at compilation time, but that doesn't make that claim I saw accurate. The fun question is if what I saw was intentionally misleading to hype up the drama or if it was an oversight.
The study you mentioned comes up often in conversations about AI. I think it's valuable but limited. In part because of the small sample size, but more because of the conditions. There's a learning curve to figuring out what AI can do and what it can't. There's an even bigger learning curve in figuring out how to make it work effectively through MCP, rules, skills and other automatic prompt additions. If the study groups consisted of engineers instructed not to use AI versus engineers using AI who already had extensive experience with it (and the .md files to go with it), I suspect the results would be different. You can absolutely waste a lot of time trying to get AI to do everything for you, and therefore things it's just not good at, but once you know what not to use it for, and have built guardrails against its most common mistakes, the experience changes.
Do we know they aren't? In a practical workflow that uses AI assistance but isn't using AI to actually write all the code, I'm not sure you'd be able to tell that AI was involved at all.
I don't disagree with you that AI produces bad code that looks right, or that it isn't as great as some people have been saying it is. That's been my experience as well. But it's also been my experience that, used thoughtfully, it's incredibly helpful.
Looking forward, it's hard to imagine any future that doesn't involve AI as an integral part of software development. But there will also be carnage along the way. Those subtle bugs we've been talking about are silently building up in codebases everywhere, and that will only get worse. Not to mention unnecessary, difficult-to-maintain code that doesn't technically contain bugs. I chuckle when I hear people talk about how many lines of code they've written in the last X days with AI that would have taken them weeks or months otherwise. X lines that should have actually been X/4 lines. Get back to us in 6 months when you're wading through that mess trying to figure out how to maintain it without scrapping it completely.
Caveat to that though: though it will no doubt plateau at some point, right now models are rapidly getting better with each iteration, and their capabilities will most likely be dramatically better in a year.
You place similar caveats in your post as well. I agree with them strongly, but I also think that you are closer to agreeing with the research than you might realize. If I am reading your post correctly, using agentic AI mostly shifted the work you did from writing certain basics to double-checking those basics. There is probably some back and forth happening, prompt refinement, etc.
I firmly believe you are seeing a mental net benefit of not having to do certain tasks from scratch anymore. At the same time, it is entirely possible that you are not seeing much, if any, of a time benefit.
I am also not entirely sure if it is the agentic part rather than just general model improvements. Then again, "agentic" in this context is a bit of a fuzzy concept. A lot of the IDE tools that were around before the term agentic AI was ever coined are now retconned to be called agentic.
To be clear, I don't think LLMs are entirely useless. I am a happy user of them myself, with all the same caveats and awareness of limitations you also mention. I personally just classify it as useful but underwhelming compared to the hype and I stand firmly by that.
Since I started a project from scratch as an AI-assist test, I can say for sure that I've saved time, knowing how long it would have otherwise taken. Even with the time I've wasted figuring out the quirks and limitations of the current top tier models, I'm still way ahead. However, the larger part of that saved time was in the early stages (more boilerplate, fewer ways for the AI to make mistakes, easier to review code) and the returns definitely diminish as the codebase grows and you get into the more complicated stuff. I'm ok with that though, the mental benefit is still pretty great. We'll see what the time comparison looks like at the end; I'll be happy to admit it if it doesn't save as much time as I'm expecting.
Edit: thinking more about this, I'd say the chances of a net savings are pretty good. I'm well past the point in the project where it's become mostly a waste of time to try to use AI for anything other than things like small tweaks, some testing, brainstorming, technical searching and autocomplete. So now it will proceed at the usual pace of me writing code, but with some of the boring and time consuming bits offloaded to the AI assistant. I think my odds of carrying my current lead across the finish line are solid.
Because open source people by and large are quite public about hating code generation, there is too much stigma for most people to admit that they are using code generation in open source projects. However, I can tell you with absolute certainty that there are in fact patches being submitted and accepted that were done with code generation tools. They just aren’t publicly announced as being done with code generation tools.
For some people and some projects, but that's very far from universal. It's not getting anyone blacklisted from the industry if they allow AI PRs. If AI is the future, some eager maintainer would give it a try, find it true, and start advertising that fact. There wouldn't be a stigma if it could back up its words with action.
That's not how stigma works. This is primarily a social issue.
It's not how stigma works, but I don't think my statement is inaccurate. The stigma would not evaporate from the face of the earth, but it'd be hugely diminished if it was more capable. People roll their eyes when an AI authored PR comes in because everyone can make a reliable, highly accurate assumption that it will not be worth the time it takes to finish reading the emoji-riddled description, let alone the PR itself. The stigma would be a lot less if that wasn't the case.
And even if I'm incorrect about all of the above, I think we can both agree that there's more than enough support, excitement, and bubble to insulate anyone who would want to defect and embrace the AI lifestyle, even under a pseudonym, while they build up and advertise enough inarguable evidence to convince outsiders of the way, the truth, and the light.
Very apt. The models will happily add dependencies willy-nilly. I am constantly telling it to make a seam and separate the concerns into smaller classes.
"Don't add a dependency on NPC to our Sensor Service just to get the range of the currently active weapon/ability. Make an IRangeProvider and pass that in." If I could knock it upside the head, I would. It's good about doing those minor corrections, though, which makes it useful.
I have a little armchair-psychologist theory that using language models tickles the same pathways that addictive activities such as gambling or video games do. There's anticipation of the model's next response, the novelty of the response, near-misses as the model gets it so close but not quite right, and the variable reinforcement that all of that brings together. I'm not suggesting LLMs are addicting, just that I've felt that my use of it tends to feel similar to how I feel when playing gambling type games (slots, roguelikes). The other week I caught myself eagerly anticipating using a language model to solve some problem, rather than eagerly anticipating the solution to the problem itself. I'm still a wee bit wary about the endgame of these tools.
That's valid, and I'd say they are addicting. There's a lot to be wary of. Always true with new tech, but maybe never more true than it is now.
And yeah they love dependencies, which is a symptom of how much their training data (people) love them.
It often seems like that. Usually the model's first take on a problem is quite decent, but more often than not, once I start asking for tweaks, improvements or fixes, things start to break down. The code gets overly convoluted, stuff that worked starts to break, and I can spend a lot of time trying to work around that - only to get back to mostly the first version. I had to remind myself to just stop at its starting point and take over from there.
I've had similar experiences with asking AI to implement tweaks or improvements. It's often better to do it yourself from the start rather than waste time trying to fix what the AI comes up with.
Something I've learned is that most models appear to have different "modes". They seem to often interpret requests for tweaks or small upgrades as a "quick fix" and will slack on fully understanding the context (codebase) in those instances.
Exactly what causes the behavior is black boxed. It could be system prompts or reinforcement training, as far as I know the AI companies aren't telling.
But it makes sense, having the full codebase context uses a lot of tokens and eats up a lot of the context window. Both of which have to be managed. Even with all the latest advances the models still get dumber when the context window is full, and you don't want people to think your model sucks. If a request is using more tokens (more expensive) and producing worse results, that's a big PR problem.
I'm not saying they're doing it right, and I have no way of knowing what's really going on behind the scenes. My halfhearted attempts at jailbreaking newer models have been unsuccessful. But I know for sure that token use and context window size are big issues in the AI world.
For what it's worth you can force the model to read the necessary context. Whatever their system prompts might be they'll still mostly do what you ask if you preempt the tendencies you don't want. That being said, I use the same strategy as you much of the time: AI as a starting point rather than a real coder.
If it is the case that they are trying to mask problems, it has the exact opposite effect on me at least, as this behaviour makes the tools less useful in my opinion. I adopted the use of LSPs in my editor because they provided reliable and, most of all, deterministic help when writing code. Meaning I could count on the suggestions being correct and the shortcuts working all the time. It is incredibly weird to me that we as developers are somehow supposed to accept the unreliability of these tools and apparently have to find odd workarounds or jailbreaks in the hope that they will sometimes work. That is not a productivity booster, that is just adding frustration into the everyday workflow.
It just depends on whether people keep in mind that it's just a tool. At work I see people all the time following it blindly like the answers it gives are gospel. It's like the issues we used to have with Stack Overflow, on steroids. Imagine someone wholesale copying something they found on Stack Overflow, proposing it as their own, and then just copying the original comment from Stack Overflow as their description of the change. That's effectively what I see at work all the time, and my leadership encourages usage of the tooling like this. They don't understand what they're talking about, because nobody in leadership is going to have any experience with AI at all since it's so relatively new. They're all full of shit and I just have to deal with this and try to hold the fort down so the building doesn't burn down.
I've seen this at my work too, copilot can do a pretty good job at a lot of things but sometimes produces some garbage output, which my coworkers happily regurgitate. Just because copilot says something works some way doesn't mean that's actually correct.
I hate reading a lot of technical writing nowadays. I have to ask myself, "Is this highly detailed system explanation LLM slop? Does this system even exist?"
It's crazy how often the slop slips in. Even in my personal usage, I've had LLM nonsense slip into my rough drafts when I'm trying to figure out how to rephrase something. It's a useful tool, but it's also exhausting at times.
I now trust random decade-old documentation written by an intern more than some senior engineers' "brain dumps" or "AI documentation" that are full of falsehoods.
Depending on the time and place, I think this can be fine. If it's to fix a very specific bug or follow best practices, it's useful to cite a highly rated answer on Stack Overflow.
If anything, I wish my coworkers were more willing to use references!
(If you were talking about someone that blindly copies an irrelevant answer, without attribution, I totally get your point. I blindly copied only once, and I improved a lot after someone explained why that's bad practice!)
The most profound article I've read on the topic, which supports my own findings, is that AI is just a tech-debt machine.
Yes, it will give you a functional prototype. It will get the bare minimum to work, without considering readability, maintainability or upgradability. You won't understand it, and anyone you pay to understand it will prefer to write it their own way.
It writes what looks like working code, and is great for brainstorming, but it won't replace an engineer.
I don't understand what kind of models people use if this is their reality. The models I use consistently write better code than I do with my 20+ years of experience. Am I a better prompter, luckier, or just a very lousy manual coder?
When you've worked with, and directed, actual junior humans you get a feel for how to describe the issue at hand so that it doesn't leave as much ambiguity.
I don't like saying it, but LLM prompting in many cases is a skill issue first and a tooling issue second.
It's just basic project management from stuff we figured out 30 years ago. Have a clear project goal, split it into small bits you can implement, have documentation available.
I have been coding for about 20 years myself and I also agree the stuff I see from just chatting with the AI is pretty good. A big part of it is that probably everything I learned and coded in the first 10 years of my career is not even used anymore, or the processes have evolved so much at this point it might as well be another language.
I've been learning a lot of Python this year, starting to use AI more to ask it questions. I taught myself Python a few years back and so I got a broad overview of the language, but ultimately settled in on my own solutions to problems. If I had a function or way of doing something that I'd researched and that worked well, I'd just use that forever as long as it was applicable. I don't engage with developer social networks (I generally have found them unpleasant in the past) so I don't have a finger on the pulse of the industry at large. I've worked at a mid-to-small-sized company for a long while, so once again there aren't a lot of people around unearthing new things.
All this is to say that as time goes on I get better at the specific things I solve problems on, but my knowledge as a whole doesn't really improve. Since I started asking AI questions about things, I've learned a good amount each day about things I didn't know existed, or didn't know how to apply to my work. When the AI shows me something new I find myself going off on mini learning trips to see if I want to use the thing or not, what it does, how it works, etc. It is very relevant to what my mind is working on and so the lessons stick a lot more too.
The other thing people don't mention is that if I have a task in another programming language, rather than needing a week or so to learn the language, with AI I can lean on a base of understanding greater than my own and learn specifically what I need to do, rather than broadly learning the language and figuring it out on my own.
I think LLMs have some deficiencies for sure, but they are really amazing search engines through our language and it would be foolish imho to dismiss their practical uses just because business is businessing and over promising/hyping them up.
And which models are you using, please?
Usually what Cursor selects for me is good enough, but sometimes I force it to GPT-5 Codex to get a second opinion, so to speak.
Just to clarify, a lot of people won't understand the code, but I definitely do. I've been writing code for a very long time.
I agree that it won't replace engineers, but that won't stop companies from trying. I have to correct the first bit though, it absolutely writes code that works. There are specific circumstances where that's true and others where it isn't, and guardrails help quite a lot. The days of "everything AI writes will need to be rewritten or scrapped" are over though. Now we're in "a lot of what AI writes will need to be rewritten or scrapped, especially when it's prompted by non-coders".
I do wonder how bad AI is going to be for future language use.
I use F#. I use F# because I think it's a very "modern" language that has done A LOT to ensure you get strong typing while getting rid of the obnoxious boilerplate that comes with it. You get very, very reliable code that can be trusted to run as expected if it compiles (ignoring IO). That, on top of the entire dotnet world, is wonderful.
Naturally, AI is not exactly stellar with it. It's better than I expected, but it still either:
or
2 speaks for itself, but 1 is a very interesting problem.
One of the best things to do with F# is domain driven design, or basically "build your types, then build your functions".
AI, sorta kinda doesn't do this. It still codes in a very standard/imperative style, which means you're losing out on the whole "if it compiles, it runs" joy of functional. It'll sprinkle in mutable variables where they aren't needed and prefers for-each loops to map, which sounds like it should just be a style choice, but it can have long reaching effects on maintaining code.
The point to all this though, is that of COURSE AI is less good on the language that is less used (by several significant margins). However what does that mean for the future language development?
Are we "version locked" into JS/Python/C#/Java/etc now? New language adoption is already hellishly hard, and often requires some "killer app" style moment (ruby on rails) or paradigm shift style moment (welcome to the web, here's your JS), but you still see a LOT of innovation in languages.
F# is far from the only thing playing with functional, or as I like to think of it, "diet functional" that isn't going to force a monad for side effect debugging/features. And yet as more and more AI's train on more and more regurgitated data, you've added yet another hurdle to clear on new adoption. Coders are already borderline superstitious when it comes to defending their lang of choice, but "oh well your new thing doesn't work with the AI" looks like it's going to be a huge one.
Edit-
Oh and
I do think THAT'S the big part. Everything right now is grossly subsidized. We're in the "get them addicted" phase of the product still where companies are hyperscaling or whatever and just shoving money out the airlock into getting adoption.
I expect that number to jump SUBSTANTIALLY once the market starts to really understand what does and doesn't work.
That's a good question. On the current course the question isn't whether AI only being really useful at certain languages will suppress adoption of newer ones, because of course it will; the question is how much of an impact that will have.
Definitely. I expect it will push adoption of open source models. We could be proactive and start crowdfunding them now. Aside: an AI would understand that 'substantially' in your comment is important because of the all caps. It might even inspire it to use a unicode emoji 📈
But it’s worse than that isn’t it? Because it also means limited success with new language features and paradigms. I’m old enough to remember seeing things like lambda expressions and other functional programming niceties being added to major languages, as well as JavaScript being upgraded from a “joke language” to one of the most popular programming languages out there after literally thousands of changes. If nobody uses new features because AI is writing all the code, then AI will never learn how to write it and those features will become a waste of time to implement.
I suppose if you're building something generic that's already been done a million times in the training data, and using a ubiquitous language like Javascript, the experience may be different. I've mostly tried using it for building things that are way outside my wheelhouse, uncommon, and tough to research (for example I recently tried rubber ducking off an LLM to help me build a Pebble smartwatch app in C), and the experience has been uniformly hot garbage: it confidently steers me in the wrong direction time after time, suggests solutions that can't and would never work, and writes code that doesn't compile and sends me on wild goose chases. Maybe AI coding has a place in certain kinds of work, but it's worse than useless for anything I've ever tried leaning on it for.
Coding with AI is easy and perfectly valid. We've had tools to check for program correctness for decades, linters, type checkers, unit tests etc.
Who cares if the "stochastic parrot" "hallucinates", when what it produces passes all possible tests and does what I asked? =)
It matters a lot, depending on the completeness of your tests. And since I also see people happily letting LLMs generate both the tests and the code, it tends to happen a lot more.
Even before the advent of LLMs this was such a weird argument to me, since I have seen companies that started to dictate code coverage percentages and teams that maliciously complied with things like `assert(true.equals(true))` in the tests themselves.

Even if we assume good, well-written unit tests, passing those tests is the bare minimum in a good quality process. It doesn't say much about the integration, it doesn't tell you anything about the used dependencies, etc, etc.
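To make that malicious compliance concrete, here's a minimal Python/pytest sketch (the function and its bug are invented for illustration): the test executes every line, so the coverage number looks great, but nothing about the behavior is actually checked.

```python
def apply_discount(price: float, percent: float) -> float:
    # Deliberately wrong: adds the discount instead of subtracting it.
    return price * (1 + percent / 100)

def test_apply_discount_is_covered():
    apply_discount(100.0, 20.0)  # executes the line -> counts as covered
    assert True                  # the assert(true.equals(true)) move

# A test that actually pinned the behavior down would fail immediately:
#   assert apply_discount(100.0, 20.0) == 80.0
```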
Some of my greatest successes and funniest failures with generated code were using TDD. AI is pretty good at generating tests to match a spec, including being pretty decent at finding good edge cases to add to the test suite. It's also really good at overfitting the implementation to the tests or removing tests that it's having trouble getting working.
The overfitting can usually be solved by making it generate more tests. As an example, I was letting it generate a parser for a reasonably simple language for me. Parsing is reasonably easy and boilerplate heavy, so it seemed like a good fit. Well, if all your tests use naming like `struct Test`, the parser may, and in my case did, believe it could just discard the identifier "Test" since it seems to be the only legal value anyway. A few more tests that explicitly checked that other names worked made it fix that (there's a sketch of this below).

Also, don't let it generate tests in the same context window as the code.
Write code -> reset -> ask it to write comprehensive tests.
Or better yet, use a different agent/LLM for writing the tests, as they have different priorities and features.
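For what it's worth, here's a rough Python rendition of that overfitting trap (the grammar, names, and regex approach are all invented for illustration). Every original test parsed `struct Test { ... }`, so hardcoding the name would have passed; the one extra test with a different name is what forces the model to actually capture the identifier.

```python
import re
from dataclasses import dataclass

@dataclass
class StructDecl:
    name: str
    fields: list[str]

def parse_struct(source: str) -> StructDecl:
    match = re.match(r"\s*struct\s+(\w+)\s*\{([^}]*)\}", source)
    if match is None:
        raise ValueError("not a struct declaration")
    fields = [f.strip() for f in match.group(2).split(";") if f.strip()]
    return StructDecl(name=match.group(1), fields=fields)

def test_parses_fields():
    decl = parse_struct("struct Test { int x; int y; }")
    assert decl.fields == ["int x", "int y"]

def test_identifier_is_not_hardcoded():
    # The regression test that catches a parser which just returns "Test".
    assert parse_struct("struct Point { int x; }").name == "Point"
```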
There are more complex code analysers like Roslyn for C# or Golangci-lint where you can explicitly check for tricks like that.
It's not a silver bullet, but instead of the LLM Agent getting you 50% there and claiming it's done, it can get you 80-90% there when you give it the proper tools to check and enforce its output.
Oh I am well aware there is tooling out there like static code analyzers, more detailed unit test analysis, etc, etc.
Most of those only say something about the potential quality of the code, not correctness of the implementation. Unit tests do say something about that but are a bare minimum as far as I am concerned.
So to answer your original question, I do care as it doesn't tell me nearly enough.
My rule for AI is always: it's great to use it if you have the expertise or some other method to validate the output. Outside that boundary, you're going to have a hard time knowing when it's gone off the rails. And eventually you're going to get burned.
The other problem that I see with AI coding specifically is that the agent's job is to generate code, so if there is any problem it can solve by generating code, it will do that with code generated in place. This leads to code with a lot of repeated functionality, even worse because the repetition is slightly different every time. There's no sense of architecture or maintainability or clean interfaces unless you build that in yourself by putting guardrails on the prompts or getting it to rewrite the code according to your specification.
I found that I basically have two modes for using AI. If I'm getting into something that's completely new to me, I will ask it to do a large task from scratch (like generate a new module or script). It's easier to get started this way with something that's working even if it's not perfect. But I expect to have to rewrite that code pretty much from scratch once I understand the problem.
The other mode is basically flow enhancement. Co-pilot autocomplete (using Sonnet 4) is great for this. There's a lot of repetitive code tasks, like initializing every variable that was passed in as an argument to an init function of a class. The autocomplete is scary good at seeing the pattern of what you're doing and completing it. And because this kind of change is smaller in scope and directly related to the current cognitive task, it's much more reliable.
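The canonical example of that kind of repetition, sketched in Python (names invented):

```python
class ExperimentConfig:
    def __init__(self, name, seed, learning_rate, batch_size, epochs, output_dir):
        # Autocomplete reliably finishes this block after the first line or two.
        self.name = name
        self.seed = seed
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.epochs = epochs
        self.output_dir = output_dir
```

A dataclass removes that particular chore in Python, but the same fill-in-the-pattern shape shows up constantly, and that's exactly where the autocomplete shines.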
My biggest worry about AI coding is that it works great now when we have a bunch of experienced senior engineers who are able to validate the output of the code and guide the ai. But I don't know what happens as they retire and you're left with engineers who never managed to develop those skills because they were always doing AI coding.
Yes there will be a lot of burning happening. I'm curious to see what the cumulative impact will be.
About senior engineers retiring, that apocalypse is a little further away, but I think it will also be exacerbated by seniors getting burned out at companies that are using AI as an excuse to hire fewer juniors. And if companies continue that pattern, what happens when there's no path from junior to senior because you can't get hired?
Thanks for sharing your personal experiences with AI coding, that's what I'm most interested in... but I'm not surprised the replies are trending more towards implications. Love or hate it, it's a tectonic technological shift.
You are right, it’s fine when the user knows what they are doing. Unfortunately, the more you use it, the more your skills deteriorate and you start losing the ability to correct it.
I've been integrating claude code into my workflow for the last month or so and for actually writing code it is quite useful. I would say upwards of 50% improvement in that area, going up or down depending on what the task is.
Now, of course, as a professional software engineer, the reality is that a minority of my time is spent actually writing code. So it's definitely not a 50% improvement in my overall efficiency. But just the hours it has spent, if you were to bill them, would easily recoup the cost.
And I'd also say that it's also just made me better at the rest of my job, because the parts of writing code that claude code is good at are also some of the most boring, tedious and soul-sucking parts.
Some people try to make analogies about how it's like a junior engineer or whatever, but ultimately I don't think those kinds of categorizations are useful. It is what it is - not quite like any kind of human. You just have to experiment to find good ways to fit it into your workflow. There's no other way to figure out what it'll be good at and what it'll be bad at than just trying things.
Yes indeed. It's amazing how much that changes the development experience.
That's true, analogies don't capture it. But it's hard to talk about without them.
My main complaint is that they can be more trouble than they are worth. But it’s up to me to try to build a heuristic for when that is the case. I’m not sure it’s possible to build such a heuristic that is anything close to optimal. So I lean towards using it less. That way I learn more anyway.
I think of it as more of an advanced autocomplete with a large context window. If you can plan a project, know what data structures you want to use, what methods you want to implement, and vaguely how you want them to be implemented, AI coding assistants work great. If you only know vaguely what you want an entire project to accomplish, AI will make a lot of assumptions and probably not do what you wanted.
It’s like if you were an English speaking author who needed to write a book for a Spanish language market. You think in English, but have to write your English thoughts in Spanish. You may have a great book in English fairly quickly, but you do a lot of back and forth revisions with the editor to perfect the Spanish translation. If you had a dedicated translator, you could spend your time writing the best book you can and let someone else deal with the translation.
AI coding assistants are the same. Computer scientists can focus on the math behind new algorithms or program design while the coding assistants implement their designs into a coding language.
I'm not a coder and for me, the free ChatGPT is an awesome coding assistant!
Obviously I'm not going to use the results in any commercial capacity or even publish them. But they are good enough for personal life improvements, especially creating snippets for my Obsidian that would have taken me days to figure out on my own. Would have taken, and did, two years ago when I first started my Obsidian journey. I spent days researching how to do something, starting from what language to use (for example, the Dataview plugin has its normal query language but also DataviewJS - both completely foreign to me when I started). I did get things done and I'm very proud of myself! But the time it took to make a small bit of my vault better wasn't really worth it and I had to let go of some plans and dreams I had.
I still won't pay to use an LLM, but if I can speed up making stuff with the free versions, I will.
When I started experimenting, I often got completely unusable code, especially when trying to do one of the Javascript-ish languages that some Obsidian plugins understand. Often it was still faster for me, as a complete rookie, to perform online searches, read forums and learn how to do it myself. Python worked a lot better and was easier for me to troubleshoot too, just by looking at the result and trying to decipher what was going on.
Today it's gotten a lot better. I don't have to be super vigilant about some working part of the code changing irrationally when I ask for a tweak elsewhere. I often get either a working solution, or something close to one, if I give it some sort of structure to start from. It seems to be more adept with Javascript now too, mostly the common versions and not the ones specific to Obsidian plugins that are a lot more limited. But at least the results are now solid enough that I can usually guess why something isn't working and suggest alternative ideas.
My nicest experiences have been when I can intuit that some snippet I created could be more efficiently written - for example a CSS file that has a lot of repetition and I just don't know the correct syntax to simplify it. I also don't know enough terminology to do a traditional search. I can show the file to the LLM and ask it to simplify things, and it will explain why it does what it does, using the lingo I was unfamiliar with. Seeing it used in my specific context makes it quite clear what's what, so I end up learning a little more about coding basics as I go along. This is obviously at a super elementary level, like yesterday I learned what "selector", "grouped selector" and "attribute selector" are. I've known what a selector is for but didn't know the term for it. I knew there must be a grouped selector type of thing but didn't know the syntax, and an attribute selector was completely new info.
In short, my experience as a non-programmer doing programming is awesome! For the things I'm professional at, I don't use generative AI because it doesn't do nearly as good a job as I do as a human, and making art "faster" doesn't usually yield better results anyway because the work evolves during the process, and my brain can only grasp the evolving at certain speeds after which a speed increase becomes counterproductive. Even making some sorts of basic elements that I would then put together to create the work doesn't make sense. The quality of the ingredients matters as much as the big picture, a lot more so than it does in programming.
But yep, I'm guessing that the tech bros who wanted to become/replace artists aren't exactly getting what they wanted out of this - instead they got a reality where an artist can now be a bit of a tech sis without having to rely on some geek's help at every twist and turn (no shade to the geeks in my life that I adore - those who aren't invested in controlling and oppressing artists). Freedom! I'm fully expecting this resource to not be available for free forever, so I'm trying to use it as actively as I can so that by the time they take it away, I'll know my shit and can continue doing this on my own.
Thanks for posting, it's cool to hear about a (sort of) non-coder using these tools for personal projects. Using them professionally I have some version of the thought "this would be a shitshow for someone that couldn't read the code" pretty often. But I can see how it would be fantastic for one-off scripts and relatively small projects. Some of the paid models can write code in that context with crazy high success rates on the first try (80%+). They only start losing their minds when things get more complicated.
It sounds like you're saying, as an artist, that you don't see AI as a major threat? Can you share more? I've heard the opposite from a lot of people.
I know I'm not a good enough coder to lean too heavily on code that it generates. If I can't understand what it's doing, I can ask it to explain it; if I don't get it, or am too lazy to learn, then I don't use the code. It's exceptionally stupid, but oh so very patient and so very fast to come up with more idiotic broken code. It's handy if I already know how to write this and I can easily fix the obvious mistakes.
What I do find handy is using it essentially as a search engine: what tools are other people using for this type of problem? Send me a link to where folks are discussing this problem. What are some novel ways people are using this tool? Then I read the human discussion on this, or download the human-discussed useful tool.
The other use I'm getting sloppy on: explain X to me, summarize Y, find me the relevant pdf amongst these 50 where topic Z is discussed.
Yeah AI is really good for searching and summarizing, which makes it really good for learning. And for having some(thing) else search stackexchange for you!
Sadly I could see AI slowly killing stackexchange and then where will it get its information?
It's already happening to discussions that it had mined from Reddit: fewer actual people are contributing now, and the contributions all look exactly the same. Maybe we go back to paying for expertise instead of having it free. Maybe we get used to code that doesn't work all the time, lots of page navigation dead ends, auto-generated FAQs that don't match reality, and the size of software increasing dramatically because there's no such thing as architecture or QA anymore.
The conversation around AI is sadly insanely polluted with the toupee fallacy, people using it incorrectly, for suboptimal use cases, or indeed training it incorrectly. As well as the completely unrealistic standard of comparing one short prompt and 10 seconds of time to many man-hours, not counting education/training. Plus just plain ideological hatred/shilling. The best way is to look at concrete results rather than opinions.
I haven't given it a serious try in a couple of years. Are there good tutorials now?
I haven't really looked. I have gotten some insights from various blog posts but sadly I don't have links.
My best tip is go slow and note everything it does that you don't want so that you can come up with rules to make it less of an idiot in future sessions. In the current generation of models there are consistent patterns of bad behavior that you have to account for. And be skeptical when it feels like magic, sometimes it really kind of is, but just as often it's an illusion.
Insane value multiplier. Not even up for debate how useful and powerful it is.
Of course it's up for debate. It's an ongoing debate everywhere right now. The studies aren't in, the polls have barely been sent out, the conclusion is not done and written in stone.
I made an argument. You made a counter argument. Good interaction on the Internet!
I've long appreciated when someone has the ability to make a firm statement, designed to be arguable. Sometimes one must take a stance in order to elicit conversation.
You seem to not think it is useful or powerful, as I think it is.
Is the statement "This is not arguable" an argument? No facts or statements were put forth as proof or evidence.
I think it's a tool. It has uses, and I make use of it. But like the 10x engineer myth before it, I don't see any indications that multipliers are a thing that exist outside of games. The studies that have been published, while preliminary, hint at AI reducing productivity in the short term, and having the effect of loss of skill in the not-as-long-as-you-might-think term.
That's how some people get rich. They grasp at what they feel is right way before the science is settled and everyone agrees about it.
That's of course also how some people get poor :)