8 votes

Flattening ASTs and other compiler data structures

Posted July 2, 2023 by skybrian

Tags: compilers, data structures, memory allocation, abstract syntax trees, arena allocation, interpreters

https://www.cs.cornell.edu/~asampson/blog/flattening.html

Link information

This data is scraped automatically and may be incorrect.

Word count: 2836 words

3 comments

[2]
isopod
July 2, 2023
Link
As I was reading, I was thinking, "Why not just use bytecode?" Bytecode would be contiguous in memory, compact, and optimizable during generation. So, basically most of the benefits that...

As I was reading, I was thinking, "Why not just use bytecode?" Bytecode would be contiguous in memory, compact, and optimizable during generation. So, basically most of the benefits that flattening an AST into an array of nodes would give.

Then the author arrives at the same conclusion at the end. Alas!

Nonetheless, I think that this article makes an interesting point that in simple projects or constrained environments, the approach of storing AST nodes in an array might be a pretty simple way to increase efficiency and locality. Bytecode is a lot of work, and that approach is dead simple. It's an interesting take.

2 votes
1. skybrian (OP)
  July 2, 2023
  Link Parent
  Yes, I think it's a neat way of deriving a "node code" from the AST. It's sort of like generating Forth code. The parser needs to write the nodes in postfix order. An if statement would need to be...
  
  Yes, I think it's a neat way of deriving a "node code" from the AST. It's sort of like generating Forth code. The parser needs to write the nodes in postfix order. An if statement would need to be written out in a similar order as Forth.
  
  It seems like preserving the ability to do AST operations would be useful for debugging. An optimizer could do another pass to "lower" it to a more optimized form.
  
  I didn't see anything in the blog post about handling nodes of varying sizes. If allocation is from an arena with no fixed cell size then you need to refer to things using pointers or byte offsets and it's more like bytecode. If you put different types of objects in different arrays then it's more like an entity component system or a column-oriented database.
  
  What if an array index is wrong? If you use union types then every access does a runtime check. That's fine for an expression since you're going to pattern-match on the type anyway, but the interpreter could go off the rails where it's executing the wrong code.
  
  If it's a byte offset then the interpreter is unsafe code, since it does pointer arithmetic and trusts the result.
  
  1 vote
Moonchild
July 2, 2023 (edited July 2, 2023)
Link
It's far from the first time that somebody has made a laborious, poor, and error-prone reinvention of the optimisations performed automatically by a good garbage collector ... :) (Edit: the more...

It's far from the first time that somebody has made a laborious, poor, and error-prone reinvention of the optimisations performed automatically by a good garbage collector ... :)

(Edit: the more structural defence is the second table here.)

1 vote