Static analysis, dynamic analysis, and stochastic analysis
For a long time programmers have had two types of program verification tools: static analysis (like a compiler's checks) and dynamic analysis (running a test suite). I find myself using LLMs to analyze newly written code more and more. Even when they spit out a lot of false positives, I still find them a massive help. My workflow is something like this:
- Commit my changes
- Ask Claude Opus "Find problems with my latest commit"
- Look through its list and skip over false positives.
- Fix the true positives.
- git add -A && git commit --amend --no-edit
- Clear Claude's context
- Back to step 2.
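The steps above can be sketched as a small shell function. This assumes the `claude` CLI (Claude Code); its `-p` flag runs a single prompt non-interactively with a fresh context each time, which stands in for the clear-context step. The fixing itself stays manual, so this only drives the loop:

```shell
review_loop() {
  # $1: number of review passes to run over the latest commit.
  # Between passes you fix the true positives by hand; the loop
  # then amends those edits into the same commit.
  passes=$1
  i=0
  while [ "$i" -lt "$passes" ]; do
    claude -p "Find problems with my latest commit"   # fresh context per run
    # ...skip false positives, fix true positives, then fold them in:
    git add -A && git commit --amend --no-edit
    i=$((i + 1))
  done
}
```

Because every pass amends rather than committing anew, you end the loop with a single clean commit.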
I repeat this loop until all of the issues Claude raises are dismissible. I know there are a lot of startups building a SaaS for things like this (CodeRabbit is one I've seen before, I didn't like it too much) but I feel just doing the above procedure is plenty good enough, and it catches a lot of issues that would take longer to surface through manual testing.
It's also been productive to ask for any problems in an entire repo. It will of course never be able to perform a completely thorough review of even a modestly sized application, but highlighting any problem at all is still useful.
Someone recently mentioned to me that they use vision-capable LLMs to perform "aesthetic tests" in their CI. The model takes screenshots of each page before and after a code change and throws an error if it thinks something is wrong.
Haha... it really is useful. Tip: have the model write to a file with notes about skipped/dismissed items so it doesn't re-surface them on the next run.
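As an illustration, that notes file can be as simple as a markdown list the model is told to read first and append to. The filename and entries here are invented, not anything Claude looks for on its own:

```markdown
<!-- review-dismissed.md (hypothetical): add to the review prompt,
     e.g. "Read review-dismissed.md first and don't re-raise these." -->
# Dismissed review findings

- `parse_config` ignoring unknown keys: intentional, forward-compat
- missing null check in `render_footer`: can't happen, caller validates
- "unused" `DEBUG_FLAGS` constant: kept for the ops runbook
```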
I've been doing something similar. More of a quick context reminder. Helps keep the AI aware of what I'm trying to do and tailors its output to match.
I want it to be completely fresh on each run. I would rather re-read the same complaint 3 times than have different runs poison each other's context.
I have some common patterns for instructing Claude to output results to a file. I both agree with you that it gets a little hinky without the clear context step, and I would push back a tiny bit.
There are many useful ways of retaining the context you want while effectively ensuring the LLM dismisses what you want it to, without draining all the context.
I generally file them as GitHub issues, so once it's done I ask it to propose duplicate candidates, which gives me the code review along with accelerated one-click sweeping out of dupes.
Your process of clearing context is useful and good for the loop you have.
My two cents would be to consider adding another loop/pattern.
Work with Claude, or start with your own effort, to describe and summarize your codebase into a ./repo/.claude/CLAUDE.md file. When you start up Claude in ./repo, Claude will automatically include that summary file in its context. You can then potentially save a few steps each time you clear the context and ask Claude to loop for problems in your commit.
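For reference, that summary file is plain markdown that Claude Code picks up automatically. A skeleton might look like this, with all the contents invented for illustration:

```markdown
# CLAUDE.md

## What this repo is
Invoice-processing service (hypothetical example): Flask API in `api/`,
batch jobs in `jobs/`, shared models in `core/`.

## Conventions
- Tests live next to the module they cover, run with `make test`
- No new dependencies without updating `deps.md`

## Known gotchas
- `legacy/` predates the ORM; raw SQL in there is load-bearing
```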
If you end up liking that new loop, ask claude to output a simple new skill, referenced as /verify-commit or similar. Let it basically output the skill and show you how to use it. Ask it what might be good semantic options to provide for the skill at the jump, versus prompting you within the run.
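If memory serves, a custom slash command in Claude Code is just a markdown prompt file under .claude/commands/, with $ARGUMENTS as the placeholder for options you pass at invocation. A sketch, worth checking against the Claude Code docs for the exact conventions:

```markdown
<!-- .claude/commands/verify-commit.md (sketch; file layout from memory,
     verify against the Claude Code docs) -->
Review the latest commit (`git show HEAD`) for bugs, regressions, and
style problems. List each finding with file, line, and severity, and
separate likely false positives into their own section.

Focus areas, if any: $ARGUMENTS
```

You would then run it as /verify-commit, optionally with focus areas appended.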
Both of these are similar base concepts we use hundreds of times a day across a tiny team of engineers. Really good stuff.
This is an interesting idea. I have a codebase at work that I haven't really used Claude with much, partly because it's legacy code and I'm not confident Claude would handle it terribly well (there are a lot of "gotchas" and I don't have buy-in from the team, understandably, to do a wholesale refactor). But just asking it to find problems with specific areas of the code or a single commit could be an interesting way to move the ball forward incrementally.
It's smart not to start with a significant refactor. As you nudge the ball forward, have Claude map out and summarize architecture, features, practices and patterns you encounter, modularized and loosely organized in some way. Use those files as context for future experiments (or demonstrations). Opus can actually be really good with legacy systems if it has enough context to pattern match effectively.
I don't use Claude for work projects for a variety of reasons (including legal) but have been experimenting heavily with Claude Code personally, specifically to have an accurate, grounded impression of the current limitations and possibilities. What I've found is that well-written skills help a lot in structuring the way the model/harness approaches projects, making them much more useful. In a different comment I mentioned the superpowers skills actually enforcing things like TDD and structured development. But I also faced the issue of having Claude "onboard" on projects with a lot of legacy.
For that I have created two skills, codebase-discovery and codebase-audit, which can be used as a starting point. They aren't perfect, but they have been useful to me in various ways. Specifically related to your situation, they do surface areas Claude likely will not handle well. But they also offer some broader insight into the codebase that I would otherwise not have considered. All the usual caveats about LLM usage still apply. The output of these skills is quite long and you still need to go over it in detail. Which is exhausting, as our brains like to be lazy.
This is pretty much what /simplify does in Claude Code btw. It launches three subagents to check for basic errors and usually finds some even in Claude's own code :D