8 votes

Executing programs inside transformers with exponentially faster inference

3 comments

  1. indirection

    TL;DR

    Language models can solve tough math problems at research grade but struggle on simple computational tasks that involve reasoning over many steps and long context. Even multiplying two numbers or solving small Sudokus is nearly impossible unless they rely on external tools.

    But what does it take for an LLM itself to be as reliable and efficient as a computer?

    We answer this by literally building a computer inside a transformer. We turn arbitrary C code into tokens that the model itself can execute reliably for millions of steps in seconds.


    The model does not call an external tool. Instead, it executes the program directly via its transformer weights, producing an execution trace token by token and streaming results at more than 30k tokens/sec on a CPU.

    The key technical idea is a new decoding path for execution traces that turns the model's attention lookups from linear scans into queries that take logarithmic time, enabling millions of correct execution steps inside a single transformer run.
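To make the linear-vs-logarithmic distinction concrete, here is a toy sketch (not the paper's implementation; all names are invented): looking up a value in a long, sorted execution trace by scanning every entry versus binary search. The structured attention query described above plays a role analogous to the binary search.

```python
import bisect

# A long, sorted list standing in for keys accumulated in an execution trace.
trace_keys = list(range(0, 1_000_000, 2))

def linear_lookup(key):
    # O(n): inspect every past entry, like vanilla attention over the full trace.
    for i, k in enumerate(trace_keys):
        if k == key:
            return i
    return -1

def log_lookup(key):
    # O(log n): binary search, analogous to the logarithmic-time attention query.
    i = bisect.bisect_left(trace_keys, key)
    return i if i < len(trace_keys) and trace_keys[i] == key else -1

assert linear_lookup(123456) == log_lookup(123456)
```

With a million-entry trace, the linear scan does up to a million comparisons per lookup while the binary search does about twenty, which is the gap that lets the trace run for millions of steps.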

    For a while I've dreamed of something like this, but with my basic knowledge of LLMs I didn't know it was feasible. Despite this, I found the article very readable (casual tone, explains things, and lots of interactive visuals).

    Can the model learn to do pruned symbolic execution? Can it learn to explain its execution trace in English? I imagine this could make LLMs great debuggers that use it to run code while "understanding" what is happening well enough to know exactly where execution deviates. Maybe this could also help improve the LLM's world model, if during inference the LLM runs a world simulation; or if the LLM can run programs with ambiguous "natural" parts, it can itself become a world simulation (or any other program with seamless ML integration).

    2 votes
  2. [2]
    skybrian

    This seems like a neat trick, but they don’t discuss the larger implications at all. Can this model read and write English as well, or is it just a weird interpreter? How would the language model and the computing model interact?

    Maybe they haven’t figured that out yet?

    1 vote
    1. archevel

      It will probably be more efficient to generate the code and then compile and execute it (or interpret it) rather than running it in the LLM, but I've no hard evidence for that. I mentioned in the weekly tech project thread that I've been experimenting with Recursive Language Models. Essentially, the model is instructed to generate python code that gets executed in a stateful REPL that mediates the communication with the user, and the model can decide on next steps based on the REPL output. The approach in the article seems to skip all observability, but I might be wrong.
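      A minimal sketch of the kind of loop I mean, under loud assumptions: `call_model` is a hypothetical stand-in for a real LLM API call, and a real setup would sandbox the `exec` rather than run model-generated code directly.

      ```python
      def call_model(history):
          # Hypothetical stand-in for an LLM call; a real RLM would query a model API.
          if len(history) == 1:
              return {"type": "code", "code": "x = 21 * 2"}
          return {"type": "final", "text": f"x = {history[-1]}"}

      def repl_loop(user_request, max_steps=5):
          state = {}                      # stateful REPL namespace shared across steps
          history = [user_request]
          for _ in range(max_steps):
              step = call_model(history)
              if step["type"] == "final":
                  return step["text"]
              exec(step["code"], state)   # run model-generated code (unsandboxed toy)
              history.append(str(state.get("x")))
          return None

      print(repl_loop("compute something"))  # -> x = 42
      ```

      The key property is that the REPL output is fed back into the model's context, so every intermediate state is observable, which is exactly what the article's in-weights execution seems to give up.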