8 votes

Executing programs inside transformers with exponentially faster inference

3 comments

  1. indirection

    TL;DR

    Language models can solve tough math problems at research grade but struggle on simple computational tasks that involve reasoning over many steps and long context. Even multiplying two numbers or solving small Sudokus is nearly impossible unless they rely on external tools.

    But what does it take for an LLM itself to be as reliable and efficient as a computer?

    We answer this by literally building a computer inside a transformer. We turn arbitrary C code into tokens that the model itself can execute reliably for millions of steps in seconds.


    The model does not call an external tool. Instead, it executes the program directly via its transformer weights, producing an execution trace token by token and streaming results at more than 30k tokens/sec on a CPU.

    The key technical idea is a new decoding path for execution traces that turns the model's attention lookups from linear scans into queries that take logarithmic time, enabling millions of correct execution steps inside a single transformer run.
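To make the linear-vs-logarithmic distinction concrete, here is a toy sketch (not the paper's implementation; all names are invented): looking up a value in a long, sorted execution trace by scanning every entry versus binary search. The structured attention query described above plays a role analogous to the binary search.

```python
import bisect

# A long, sorted list standing in for keys accumulated in an execution trace.
trace_keys = list(range(0, 1_000_000, 2))

def linear_lookup(key):
    # O(n): inspect every past entry, like vanilla attention over the full trace.
    for i, k in enumerate(trace_keys):
        if k == key:
            return i
    return -1

def log_lookup(key):
    # O(log n): binary search, analogous to the logarithmic-time attention query.
    i = bisect.bisect_left(trace_keys, key)
    return i if i < len(trace_keys) and trace_keys[i] == key else -1

assert linear_lookup(123456) == log_lookup(123456)
```

With a million-entry trace, the linear scan does up to a million comparisons per lookup while the binary search does about twenty, which is the gap that lets the trace run for millions of steps.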

    For a while I've dreamed of something like this, but with my basic knowledge of LLMs I didn't know it was feasible. Despite this, I found the article very readable (casual tone, explains things, and lots of interactive visuals).

    Can the model learn to do pruned symbolic execution? Can it learn to explain its execution trace in English? I imagine this could make LLMs great debuggers that use it to run code while "understanding" what is happening well enough to know exactly where execution deviates. Maybe this could also help improve the LLM's world model, if during inference the LLM runs a world simulation; or if the LLM can run programs with ambiguous "natural" parts, it can itself become a world simulation (or any other program with seamless ML integration).

    2 votes
  2. [2]
    skybrian

    This seems like a neat trick, but they don’t discuss the larger implications at all. Can this model read and write English as well, or is it just a weird interpreter? How would the language model and the computing model interact?

    Maybe they haven’t figured that out yet?

    1 vote
    1. archevel

      It will probably be more efficient to generate the code and then compile and execute it (or interpret it) rather than running it in the LLM, but I've no hard evidence for that. I mentioned in the weekly tech project thread that I've been experimenting with Recursive Language Models. Essentially, the model is instructed to generate python code that gets executed in a stateful REPL that mediates the communication with the user, and the model can decide on next steps based on the REPL output. The approach in the article seems to skip all observability, but I might be wrong.
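      A minimal sketch of the kind of loop I mean, under loud assumptions: `call_model` is a hypothetical stand-in for a real LLM API call, and a real setup would sandbox the `exec` rather than run model-generated code directly.

      ```python
      def call_model(history):
          # Hypothetical stand-in for an LLM call; a real RLM would query a model API.
          if len(history) == 1:
              return {"type": "code", "code": "x = 21 * 2"}
          return {"type": "final", "text": f"x = {history[-1]}"}

      def repl_loop(user_request, max_steps=5):
          state = {}                      # stateful REPL namespace shared across steps
          history = [user_request]
          for _ in range(max_steps):
              step = call_model(history)
              if step["type"] == "final":
                  return step["text"]
              exec(step["code"], state)   # run model-generated code (unsandboxed toy)
              history.append(str(state.get("x")))
          return None

      print(repl_loop("compute something"))  # -> x = 42
      ```

      The key property is that the REPL output is fed back into the model's context, so every intermediate state is observable, which is exactly what the article's in-weights execution seems to give up.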