The day that software engineering fully transitions from thinking about and writing my own code into trying to read and understand LLM-extruded code is the day I'll quit being a software engineer. Still keeping my fingers crossed that day never comes.
This transition has happened to me, and I’m not enjoying it. I’m trying to find the right balance and accept that it’s a tool, but the incentives pushed by Silicon Valley CEOs poison that. I’m getting projects done faster, but I understand less and I’m more exhausted by it. It’s not a fun feeling realizing the thing you’ve worked so hard to be good at (coding) is being devalued rapidly. Yes, yes, I understand software engineering is more than coding, but at my current and previous roles the whole process has been upended. Every employee uses it to shocking levels and it’s made the pace breakneck.
It’s making me want to leave software engineering but to be honest I’m not sure I realistically can given my circumstances and preferences. Fun times!
The Luddites were right. The problem is not the technology (setting aside the power consumption and ethics problems of AI for the sake of argument).
It is the way the capitalists will use it to devalue and exploit labor, converting artisan goods into mass-produced crap.
A nationalized AI program, à la the Apollo program, would actually be great. All tech produced becomes public domain, and proper oversight with clear rules (green energy and batteries only) mitigates environmental damage.
Instead we have a cold war between a few major players, creating much duplicate effort and hoarding good improvements for competitive advantage. They are building out for speed at any cost to capture the market, instead of optimizing for sustainability.
The end result of our current path is that the capitalists use this newfound power to devalue labor, instead of letting labor incorporate it and use it to make better artisan goods.
The only way this gets better is to kill wealth inequality. The only way it gets much better is if we transition to a cooperation-based economy instead of a competitive one.
There should be 0 unemployment because every sector should have every incentive to add another worker, no incentives to fire them, and no tools to trap them.
You get all wages within a rough order of magnitude of each other, minimize the friction of hopping between jobs, and eventually people will sort themselves into jobs that suit them.
I'll bet there are a handful of top lawyers that would love to be top frycooks if the pay was within the same order of magnitude.
I find myself having gone from (hand writing code and tasking the LLM with implementing sections of it) to (hand writing requirements and doing the supervisory/subtractive work the article talks about). Overall, I find this produces reasonable code at a much higher rate than I could otherwise do.
The biggest win for interacting with the LLM is ending my requirements with "gather context from the code, then ask me questions (at least X) about the task until you are sure you have a clear understanding", where X scales with the complexity of the task.
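A minimal sketch of that suffix trick, for anyone who wants to script it. The function name `build_prompt`, the 1-to-5 complexity rating, and the "two questions per complexity point" scaling are all assumptions made up for illustration; the comment above only says that X scales with complexity.

```python
def build_prompt(requirements: str, complexity: int) -> str:
    """Append an 'ask me questions first' suffix to hand-written requirements.

    `complexity` is a hypothetical 1-5 rating picked by whoever writes the prompt.
    """
    if not 1 <= complexity <= 5:
        raise ValueError("complexity must be between 1 and 5")

    # Assumed scaling: the minimum question count X grows linearly with complexity.
    min_questions = 2 * complexity

    suffix = (
        "Gather context from the code, then ask me questions "
        f"(at least {min_questions}) about the task until you are sure "
        "you have a clear understanding."
    )
    # Requirements first, then a blank line, then the suffix.
    return f"{requirements.strip()}\n\n{suffix}"
```

Calling `build_prompt("Add CSV export to the reports page.", 3)` would end the prompt with a request for at least 6 questions. Any other scaling rule would do; the point is only that the question floor rises with the size of the task.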
I think LLMs are good at code but bad at architecture. Unfortunately, many software engineers are also bad at architecture. So reviews become important, and heavy-handed reviewing is one of my major tasks. Unfortunately, the rate people can churn out code exceeds the rate I (and other experienced engineers) can review it, so we can't catch everything. I settle for making the things I do catch into teachable moments for the author, so eventually the other engineers may get better at architecture.
I actually find them kind of mediocre with a sizeable portion of their code. Even using Opus 4.8 I’ll be disappointed with the quality of the code. Not always. But if you actually read the code as a proper code review I think you’ll find yourself disappointed. I end up combing through and a lot of code gets deleted in the process while keeping the same functionality.
They’re still super useful. I use them all the time. But I don’t just blindly trust them as many seem to do.
I think it depends on the (haha) context.
The broader the task is (or perhaps the more code the task requires to be complete), the more mediocre the result. Once the complexity of the task reaches a level where code should be broken down into multiple functions, the choices the LLM makes about how to break it down are almost universally bad. This is part of what I meant by LLMs being bad at architecture.
There is an effect with language and framework, too. For example, Claude seems much stronger at generating a "correct"/reasonable component hierarchy in React+MUI than it is at the resource/service/repo breakdown in Java+Panache+Quarkus.
For a while I was using Cline with Claude primarily because it shows each change as a diff, and I can review it as it goes and say, "do this not that". However, it often requires a couple of passes to make even simple modifications to a file, so it's a bit tedious.
Lately I have been trying Codex (in the app) with gpt 5.4. I find it to be much faster at making the changes because it's iterating and correcting things internally and just showing me the final product. It is more work to review multiple changes at once, but at least I'm reviewing the final product. Codex's review interface is superior to Claude Code's review interface. And both Claude Code and Codex seem to be more parsimonious with tokens than Cline.
To me the whole natural language conversation format is a huge barrier to setting up context properly. It reminds me of the pre-2000 peak pseudocode days, except much more verbose and needlessly conversational compared to the terse, objective, and (generally) deterministic instructions that many programming languages offer. Especially once you have built up a list of agent skills.
Luckily Claude can ingest Mermaid diagrams, so that's a mild win for not blowing through tokens, I guess.
I think LLMs are good at code but bad at architecture. Unfortunately, many software engineers are also bad at architecture. So reviews become important, and heavy-handed reviewing is one of my major tasks. Unfortunately, the rate people can churn out code exceeds the rate I (and other experienced engineers) can review it, so we can't catch everything. I settle for making the things I do catch into teachable moments for the author, so eventually the other engineers may get better at architecture.
Not when the author just forwards the comments to the LLM. The number of comment responses I receive answered with a "Good catch, you are absolutely right! [...]" is very disheartening.
This is pretty similar to my process. For small fixes or bug investigations, I leave it pretty loose with just a paragraph or two (what I want, a bit of context or what I’ve done so far).
For bigger features, I start with a detailed hand-written outline that includes the goals/motivations, high level requirements, a general outline of how I think it should be implemented along with references to existing conventions, and any gotchas or important things worth highlighting. From that, I’ll have it generate a detailed plan, during which it’ll usually ask for clarifications and product decisions. Once the plan is done, I’ll go through several rounds of having Claude do a clean-context audit looking for various issues, edge cases, validating assumptions and claims, things worth explicitly stating, etc. Nothing gets rubber stamped, and I provide a lot of direct feedback/pushback at every step. By the third or fourth iteration, it’s generally a really solid plan that can be implemented by Claude.
After implementation, I’ll go through a few clean-context audits to validate the implementation against the plan and to catch any bugs, inconsistencies, or ambiguities that fell out of the implementation. By the time I hit direct code review, it’s usually pretty good, and any remaining issues are caught in end to end testing.
Overall, I find it to be faster, and together “we” certainly catch more issues up front with a lot less iteration during implementation.
I think a big factor in my favor is that we have a large, old codebase. There’s almost always a precedent to follow for general architecture, and certainly a lot to go on for style and convention. The existing comments, names, and git history provide pretty good context for why things are the way they are, which informs future decisions. I’ve given it a database schema and heavily limited read-only access so it can verify assumptions, investigate scale, and be aware of performance.
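The plan-then-audit loop described in this comment can be sketched roughly as below. This is only an illustration under stated assumptions: `refine_plan`, `audit`, and `revise` are hypothetical names, the auditor stands in for a clean-context model pass, and the reviser stands in for the human feedback plus model rewrite. The default of four rounds just mirrors the "third or fourth iteration" mentioned above.

```python
from typing import Callable, List, Tuple

# Hypothetical shapes: an auditor reads a plan and returns the issues it found;
# a reviser takes the plan plus those issues and returns an updated plan.
Auditor = Callable[[str], List[str]]
Reviser = Callable[[str, List[str]], str]


def refine_plan(
    plan: str, audit: Auditor, revise: Reviser, max_rounds: int = 4
) -> Tuple[str, int]:
    """Alternate audit and revise until an audit is clean or rounds run out.

    Returns the final plan and the number of audit rounds that were run.
    If the round limit is hit, the last revision is returned without a
    further audit.
    """
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        issues = audit(plan)
        if not issues:
            break  # clean audit: nothing left to push back on
        plan = revise(plan, issues)
    return plan, rounds
```

In practice the two callables would wrap whatever tool is in use; the same loop shape applies to the post-implementation audits, with the plan replaced by the diff.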
I think some people call this architecture-driven design. I've done a few of the design + implementation workflows and found it pretty good. But if it gets started down a weird path, I sometimes have trouble getting it off that path, even if I reset the context. I suspect something in the design is hitting a weird bias in the training weights.
I think a big factor in my favor is that we have a large, old codebase.
I have found that starting fresh is also pretty good, because you can get the patterns set early. The bad middle is when the codebase has several different patterns that represent evolution toward best practices, because then it might pick up on the wrong pattern.
But, admitting my own biases, I believe this approach is a productive way to engage with LLMs that retains the art of computer programming and properly acknowledges a dual reality: code has gotten much cheaper to create and complexity remains our apex predator.
Side note: Like most Disney stories, The Sorcerer's Apprentice is not a Disney creation. I assume most readers already knew this, but it had to be said.
https://lobste.rs/s/onbcu5/code_is_cheap_er
Great essay from Carson Gross as always. I learned a lot from his articles and his book "Hypermedia Systems".