It’s an interesting overview, and I appreciate seeing the perspective of someone who relies on LLMs more extensively than I do and who isn't your typical "AI tech bro". While I use them regularly, my approach is different. I treat them more as interactive rubber duckies than as a foundation for my work. For me, LLMs are useful for breaking through roadblocks, clarifying ideas, or providing alternative perspectives, but I wouldn’t want them driving my entire workflow.
One thing I’ve found particularly useful is comparing different LLMs side by side. No single model is consistently good at every task, and I suspect this inconsistency is partly due to model updates happening behind the scenes. That's a clear downside of using closed models through APIs: the providers aren't transparent about those updates. Anecdotally, GPT models seem to handle Java better for me, while Claude is slightly stronger with vanilla JavaScript and HTML. But even that’s not reliable, so I often ask both models the same question and then have them critique each other’s answers. It helps catch inconsistencies and weaknesses I might have missed.
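In case anyone wonders what that cross-checking looks like in practice: nothing sophisticated. A minimal sketch, assuming the official openai and anthropic Python clients, with the model names and prompt wording as pure placeholders:

```python
# Rough sketch of the "ask both, then let them critique each other" step.
# Model names and prompt wording are placeholders, not recommendations.
from openai import OpenAI
import anthropic

gpt = OpenAI()                  # expects OPENAI_API_KEY in the environment
claude = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY

def ask_gpt(prompt: str) -> str:
    resp = gpt.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str) -> str:
    msg = claude.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

question = "What's a sane way to add retries around this flaky HTTP call? <code here>"
gpt_answer = ask_gpt(question)
claude_answer = ask_claude(question)

# Swap the answers: each model reviews the other's, and I read both critiques myself.
critique = (
    "Here is a question and an answer written by another assistant.\n"
    "Point out mistakes, risky assumptions, or missing caveats.\n\n"
    "Question:\n{q}\n\nAnswer:\n{a}"
)
print(ask_claude(critique.format(q=question, a=gpt_answer)))
print(ask_gpt(critique.format(q=question, a=claude_answer)))
```

In reality I mostly do this by hand in two browser tabs; the point is the swap-and-critique step, not the tooling.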
I also don’t really get tools like Cursor, Copilot, or other codegen assistants that try to write code directly for me. Every time I try them, I find their autocomplete suggestions more frustrating than helpful. They seem to be optimized for speed rather than accuracy, and I spend more time fixing their output than if I had just written it myself. I can see how they might work well for some people, especially those who structure their workflows around them, but they don’t feel like productivity boosters in my experience.
That said, LLMs are often neat helper tools for explaining concepts, deciphering messy legacy code, reviewing my own work before a PR (within reason, because they also tend to suggest a lot of things that are outright silly or wrong), or giving quick refreshers on things I haven’t touched in a while. I regularly use them to break down complex code into digestible explanations, especially when dealing with dense, spaghetti-like logic. They’re also handy for sanity-checking my own changes; sometimes they point out small issues I overlooked, even if I don’t trust them enough to take their suggestions at face value.
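The pre-PR review is equally low-tech: hand the model the diff, ask a pointed question, and treat the answer as a checklist of things to verify rather than a verdict. A rough sketch, again with a placeholder model name and prompt:

```python
# Sketch of a pre-PR sanity check: feed the branch diff to a model and skim
# its comments. Placeholder model name and prompt; adjust to taste.
import subprocess
from openai import OpenAI

diff = subprocess.run(
    ["git", "diff", "main...HEAD"],  # whatever base branch you compare against
    capture_output=True, text=True, check=True,
).stdout

client = OpenAI()
review = client.chat.completions.create(
    model="gpt-4o",  # placeholder
    messages=[{
        "role": "user",
        "content": (
            "Review this diff. Flag likely bugs, missing edge cases, and "
            "anything inconsistent with the rest of the change. "
            "Skip style nitpicks.\n\n" + diff
        ),
    }],
)
print(review.choices[0].message.content)
# The output is a list of things to double-check, not something to apply blindly.
```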
But, to me, there’s a clear distinction between using LLMs as a tool and relying on them to do the work. Reading the post, I am not sure the author sees it quite the same way, specifically in this section:
The workflow is like this:

- set up the repo (boilerplate, uv init, cargo init, etc)
- paste in prompt into claude
- copy and paste code from claude.ai into IDE
- run code, run tests, etc
- …
- if it works, move on to next prompt
- if it doesn’t work, use repomix to pass the codebase to claude to debug
- rinse repeat ✩₊˚.⋆☾⋆⁺₊✧
The process wouldn't be the same for me. But, more importantly, at the bullet points about running the code and moving on when it works, I am missing an explicit critical look at the generated code. It isn't just important that it runs and the tests pass. Tests can be wrong, and even if it "works", that doesn't mean it works well or isn't setting you up for issues later. It is a theme I see with a lot of people who rely heavily on AI to generate code. I am not saying code should be perfect, but I shudder to think how many easily avoidable bugs are created by AI-generated code used this way.
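To make that concrete, here is a deliberately simplified, hypothetical example of the failure mode: generated code plus a generated test that happily passes while the code is still wrong.

```python
# Hypothetical example: both the function and its test could plausibly come
# out of a code generator. The test passes, so the workflow above moves on.

def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount to a price."""
    # Bug: no validation, so percent=150 silently yields a negative price,
    # and plain floats are a questionable choice for money in the first place.
    return price - price * percent / 100

def test_apply_discount():
    # Only mirrors the happy path the generator thought of, so a green test
    # says very little about whether the function is actually right.
    assert apply_discount(100.0, 10) == 90.0

test_apply_discount()  # passes, yet apply_discount(100.0, 150) == -50.0
```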
I suspect the author simply didn't mention that step but does apply it anyway. However, I see a lot of people online treating LLM output as authoritative, blindly integrating it into projects, and skipping the critical thinking step. It reminds me of junior developers who copy-paste Stack Overflow answers without really understanding them, but now on a whole new scale.
I think this is where the danger lies. LLMs lose the plot over time, especially in longer interactions with complex codebases. They’ll initially generate something decent but then gradually introduce inconsistencies, regressions, or outright hallucinations. This is fine if you know what you’re doing and can spot these issues, but if you don’t have the expertise to validate the output, you’re just trusting the machine and hoping for the best. That’s why I wouldn’t recommend these workflows to junior devs or non-programmers. It’s too easy to create something that looks functional but is full of hidden problems.
Overall, I think LLMs are useful tools when used with caution, but the way people approach them makes all the difference. If you treat them as a way to enhance your own expertise, they can genuinely help. If you treat them as a replacement for expertise, you’re asking for trouble.
It is neat to see some of the tools mentioned. Repomix, for example, might be useful for some tasks. Aider I have seen mentioned before, but to me it seems like just another chat interface for people who prefer the terminal. I might be missing something there, though. As I said, the codegen tools mentioned in the blog post aren't my favorite.
A nice practical view of using LLMs for coding. I have only used those tools occasionally, for debugging or quickly finding a function, but this shows how more extensive use can work.