20 votes

Building a C compiler with a team of parallel Claudes

12 comments

  1. [5]
    Barney
    (edited )
    Link
    • Exemplary

    So I was curious and gave it a go.

    Tested it with a Munchausen number generator.

    Code
    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>
    
    
    #define MAX 440000000
    
    int cache[10];
    
    bool is_munchausen(const int number)
    {
        int n = number;
        int total = 0;
    
        while (n > 0)
        {
            int digit = n % 10;
            total += cache[digit];
            if (total > number) {
                return false;
            }
            n = n / 10;
        }
    
        return total == number;
    }
    
    void set_cache()
    {
        cache[0] = 0;
        for (int i = 1; i <= 9; ++i) {
            cache[i] = pow(i, i);
        }
    }
    
    int main()
    {
        set_cache();
    
        for (int i = 0; i < MAX; ++i)
        {
            if (is_munchausen(i)) {
                printf("%d\n", i);
            }
        }
    
        return 0;
    }
    

    First of all, it didn't compile my test program out of the box. It made me specify absolute paths for every single header, and it didn't find stddef.h and stdarg.h, which stdio.h includes, so I had to change those manually as well. Here is an open issue about this.

    Second of all, the performance is horrendous. The generated code is pretty much equivalent to gcc with no optimisation.

    gcc -O3 takes 2.89 seconds to run on my machine; this took 9.52.

    To answer the inevitable AI shilling of "but it wrote a C compiler!!!!": yes, but there are numerous C compilers out there that the AI was trained on. It didn't have to do anything novel, and it didn't do a great job regardless.

    Writing a no-optimisation C compiler isn't as glorious a feat as they make it sound, and the 100k lines of code are ridiculously high.

    30 votes
    1. post_below
      Link Parent

      To be fair to them, they didn't claim it was glorious in the blog post. It mentions that, with all optimizations enabled, it performs worse than GCC with all optimizations turned off. It also talks about the code quality being subpar.

      The frustrating thing is that that's not likely how the media and bloggers will talk about it. It will be another round of AI doom: "it's coming for your job". It will fuel the hot takes that AI can now truly just write software. It will help suck in a new round of vibe coders. Except this year they want to be called "vibe engineers".

      What it really is, like Cursor's far worse and more expensive example before it, is a somewhat interesting proof of concept. A few short years ago, the possibility of agents creating any non-trivial application autonomously was absurd.

      I hate the hype too. But if the well wasn't poisoned by hype, and the airwaves saturated with AI discussion, we'd all be at least a little bemused by this.

      13 votes
    2. saturnV
      Link Parent

      and the 100k lines of code are ridiculously high.

      I partially agree with this, but most of the interest the article claims is the fact that previous models couldn't do this and the newest one can (and anyway it's not an insane amount of overhead given how many edge cases it handles). For comparison, tcc is ~40k lines; maybe you'd expect this one to be smaller because Rust is more expressive, but that's still only an order of magnitude of overhead (generously), which isn't that bad.

      Also, there really aren't that many non-toy C compilers, so I would be careful about making this out to be trivial. I think much of your comment reads as disingenuous once you read the article and see how plain and uninterested in hype it is, e.g.:

      The compiler is an interesting artifact on its own, but I focus here on what I learned about designing harnesses for long-running autonomous agent teams: how to write tests that keep agents on track without human oversight, how to structure work so multiple agents can make progress in parallel, and where this approach hits its ceiling.

      The compiler, however, is not without limitations. These include:
      It lacks the 16-bit x86 compiler that is necessary to boot Linux out of real mode. For this, it calls out to GCC (the x86_32 and x86_64 compilers are its own).
      It does not have its own assembler and linker; these are the very last bits that Claude started automating and are still somewhat buggy. The demo video was produced with a GCC assembler and linker.
      The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler.
      The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
      The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce.

      The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

      Also, I think it's relevant that the author, Nicholas Carlini, is a very skilled programmer in his own right who has engaged deeply with the strengths and weaknesses of AI for years now (see https://nicholas.carlini.com/writing for both).

      4 votes
    3. [2]
      skybrian
      Link Parent

      I wonder how they compiled the Linux kernel? Does it take any compiler flags? I doubt they modified the source code to put absolute paths everywhere.

      2 votes
      1. Barney
        Link Parent

        I'm not entirely sure, though I suppose for 16 Claudes hardcoding some paths isn't really a problem.

        Here is an open issue about it. Most comments are similarly cynical, quite a fun read.

        3 votes
  2. [3]
    TurtleCracker
    Link

    This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library.

    Isn't this disingenuous? Presumably the model was trained on existing compilers and codebases. How can it be considered clean-room?

    20 votes
    1. [2]
      skybrian
      Link Parent

      The point of experiments like this is to see what the model can do. Yes, the model is trained on the Internet, but so were all the previous models they tried, and the previous models couldn't do it.

      Maybe the wording isn't quite right, but I don't think anyone is misled, since they clarified what they meant in parentheses.

      8 votes
      1. TurtleCracker
        Link Parent

        Maybe? But that’s not what clean-room means. The equivalent for humans is if you let engineers read compiler source code for a few months before locking them in a room with no internet and had them make a compiler. It’s as close as you can get to an open book test without being one.

        6 votes
  3. [2]
    teaearlgraycold
    Link

    Copying my comment from HN:

    With just a few thousand dollars of API credits you too can inefficiently download a lossy copy of a C compiler!

    16 votes
    1. Barney
      Link Parent

      [...] $20,000 in API costs [...]

      It's actually quite a bit more than a few thousand dollars :P

      I'm surprised how much it amounted to, though. 100k lines is way too high for a bad C compiler, but even disregarding that, 5 lines per dollar ($20,000 for 100k lines, i.e. 20 cents a line) is very expensive.

      6 votes
  4. skybrian
    Link

    From the article:

    To stress test it, I tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V.

    ...

    Previous Opus 4 models were barely capable of producing a functional compiler. Opus 4.5 was the first to cross a threshold that allowed it to produce a functional compiler which could pass large test suites, but it was still incapable of compiling any real large projects. My goal with Opus 4.6 was to again test the limits.

    ...

    This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library. The 100,000-line compiler can build Linux 6.9 on x86, ARM, and RISC-V. It can also compile QEMU, FFmpeg, SQLite, postgres, redis, and has a 99% pass rate on most compiler test suites including the GCC torture test suite. It also passes the developer's ultimate litmus test: it can compile and run Doom.

    ...

    As one particularly challenging example, Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase (This is only the case for x86. For ARM or RISC-V, Claude’s compiler can compile completely by itself.)

    10 votes
  5. Rocket_Man
    Link

    This is great. As these models continue to get better, we might see more web rendering engines start to compete with Chrome, plus more game engines, etc. Some will say it's 'slop', but honestly most things are slop, AI or not.

    4 votes