10 votes

Static analysis, dynamic analysis, and stochastic analysis

For a long time, programmers have had two types of program verification tools: static analysis (like a compiler's checks) and dynamic analysis (running a test suite). I find myself using LLMs to analyze newly written code more and more. Even when they spit out a lot of false positives, I still find them to be a massive help. My workflow is something like this:

  1. Commit my changes
  2. Ask Claude Opus "Find problems with my latest commit"
  3. Look through its list and skip over the false positives.
  4. Fix the true positives.
  5. git add -A && git commit --amend --no-edit
  6. Clear Claude's context
  7. Back to step 2.
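
The loop itself can be sketched abstractly. Here `ask_model`, `is_false_positive`, and `apply_fix` are hypothetical stand-ins for the Claude query (fresh context each round), the manual triage, and the fix-plus-amend step; none of them is a real API.

```python
def review_loop(ask_model, is_false_positive, apply_fix, max_rounds=10):
    """Repeat "ask for problems, fix the real ones" until all issues are dismissible."""
    for _ in range(max_rounds):
        issues = ask_model()  # fresh context each round: no memory of earlier runs
        real = [i for i in issues if not is_false_positive(i)]
        if not real:
            return True       # everything left is dismissible; the loop converged
        for issue in real:
            apply_fix(issue)  # then: git add -A && git commit --amend --no-edit
    return False              # safety valve: gave up before converging
```

The stub-based shape also makes the loop easy to dry-run without touching a repo.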

I repeat this loop until all of the issues Claude raises are dismissible. I know there are a lot of startups building SaaS products for things like this (CodeRabbit is one I've seen before; I didn't like it much), but I find the procedure above is plenty good enough, and it catches a lot of issues that would take longer to uncover through manual testing.

It's also been productive to ask for any problems in an entire repo. It will of course never be able to perform a completely thorough review of even a modestly sized application, but highlighting any problem at all is still useful.

Someone recently mentioned to me that they use vision-capable LLMs to perform "aesthetic tests" in their CI. The model takes screenshots of each page before and after a code change and throws an error if it thinks something is wrong.
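
Wired into CI, that could look roughly like the GitHub Actions fragment below; the screenshot and review scripts are entirely hypothetical placeholders for whatever tooling actually drives the browser and the vision model:

```yaml
# Hypothetical job: screenshot each page before and after the change,
# then let a vision-capable model veto the diff.
aesthetic-test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0                                         # need the base branch too
    - name: Screenshot pages on the base branch
      run: ./scripts/screenshot-pages.sh origin/main before/   # hypothetical script
    - name: Screenshot pages with this change
      run: ./scripts/screenshot-pages.sh HEAD after/           # hypothetical script
    - name: Vision review
      # hypothetical script: exits non-zero if the model thinks something looks wrong
      run: ./scripts/vision-review.sh before/ after/
```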

10 comments

  1. [5]
    post_below
    Link

    stochastic analysis

    Haha... it really is useful. Tip: have the model write to a file with notes about skipped/dismissed items so it doesn't re-surface them on the next run.
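
    One possible shape for that notes file; the name, paths, and entries here are purely illustrative:

    ```markdown
    <!-- dismissed-issues.md: read at the start of each review run -->
    - src/auth.py:42 "possible race on session token" — false positive, guarded upstream
    - util/io.py:17 "file handle not closed" — intentional, the caller owns the handle
    ```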

    6 votes
    1. TheD00d
      Link Parent

      I've been doing something similar, though mine is more of a quick context reminder. It helps keep the AI aware of what I'm trying to do and tailors the output to match.

      3 votes
    2. [3]
      teaearlgraycold
      Link Parent

      Tip: have the model write to a file with notes about skipped/dismissed items so it doesn't re-surface them on the next run.

      I want it to be completely fresh on each run. I would rather re-read the same complaint three times than have different runs poison each other's context.

      2 votes
      1. davek804
        Link Parent

        I have some common patterns for instructing Claude to output results to a file. I agree with you that it gets a little hinky without the clear-context step, but I'd also push back a tiny bit.

        There are many useful ways of retaining the context you want while effectively ensuring the LLM dismisses what you want it to, without draining all of the context.

        3 votes
      2. unkz
        Link Parent

        I generally file them as GitHub issues, so once it's done I ask it to propose duplicate candidates, which gives me the code review along with accelerated, one-click sweeping out of dupes.

        2 votes
  2. davek804
    Link

    Your process of clearing context is useful and good for the loop you have.

    My two cents would be to consider adding another loop/pattern.

    Work with Claude, or start with your own effort, to describe and summarize your codebase into a ./repo/.claude/CLAUDE.md file. When you start up Claude in ./repo, Claude will automatically include that summary file in its context. You can then potentially save a few steps each time you clear the context and ask Claude to look for problems in your commit.
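
    A CLAUDE.md for that purpose might look roughly like this; the layout and commands are illustrative, not from any real project:

    ```markdown
    # CLAUDE.md — picked up automatically when Claude Code starts in this repo
    ## Architecture
    - api/: Flask JSON API; web/: React frontend (example layout)
    ## Conventions
    - Run `make test` before any commit; all tests must pass
    ## Review guidance
    - When asked to find problems with a commit, prioritize correctness
      and security over style nits
    ```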

    If you end up liking that new loop, ask Claude to output a simple new skill, referenced as /verify-commit or similar. Let it output the skill and show you how to use it. Ask it what options might be good to provide to the skill up front, versus having it prompt you during the run.
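
    One possible shape for that, assuming Claude Code's convention of markdown prompt files under .claude/commands/ (the body here is illustrative):

    ```markdown
    <!-- .claude/commands/verify-commit.md — invoked as /verify-commit -->
    Review the latest commit (`git show HEAD`) for problems.
    Report only likely true positives. For each, give the file, line, and a
    one-sentence explanation. Extra focus areas: $ARGUMENTS
    ```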

    Both of these are similar base concepts we use hundreds of times a day across a tiny team of engineers. Really good stuff.

    4 votes
  3. [3]
    glesica
    Link

    This is an interesting idea. I have a codebase at work that I haven't really used Claude with much, partly because it's legacy code and I'm not confident Claude would handle it terribly well (there are a lot of "gotchas" and I don't have buy-in from the team, understandably, to do a wholesale refactor). But just asking it to find problems with specific areas of the code or a single commit could be an interesting way to move the ball forward incrementally.

    3 votes
    1. post_below
      Link Parent

      It's smart not to start with a significant refactor. As you nudge the ball forward, have Claude map out and summarize architecture, features, practices and patterns you encounter, modularized and loosely organized in some way. Use those files as context for future experiments (or demonstrations). Opus can actually be really good with legacy systems if it has enough context to pattern match effectively.

      2 votes
    2. creesch
      (edited)
      Link Parent
      I don't use claude for work projects for a variety of reasons (including legal) but have been experimenting heavily with claude code personally. Specifically to have an actualized accurate...

      I don't use Claude for work projects for a variety of reasons (including legal ones), but I have been experimenting heavily with Claude Code personally, specifically to maintain an accurate, up-to-date impression of its current limitations and possibilities. What I've found is that well-written skills help a lot in structuring the way the model/harness approaches projects, making them much more useful. In a different comment I mentioned the superpowers skills actually enforcing things like TDD and structured development. But I also faced the issue of having Claude "onboard" onto projects with a lot of legacy code.

      For that I created two skills, codebase-discovery and codebase-audit, which can be used as a starting point. They aren't perfect, but they have been useful to me in various ways. Specifically relevant to your situation, they do surface areas Claude likely will not handle well. But they also offer some broader insight into the codebase that I would otherwise not have considered.

      All the usual caveats about LLM usage still apply. The output of these skills is quite long, and you still need to go over it in detail, which is exhausting, as our brains like to be lazy.

      2 votes
  4. shrike
    Link

    This is pretty much what /simplify does in Claude Code btw. It launches three subagents to check for basic errors and usually finds some even in Claude's own code :D

    1 vote