Anybody have experience writing a scripting language?
Hello, I've had two ideas for games that I find pretty compelling, and both would require different custom scripting languages. Unfortunately, I don't have any formal CS education or experience with parsing or scripting languages, so I'm feeling a bit lost and thinking it'd be a herculean effort. Is that correct?
Has anyone here written their own language or DSL and have any insights, resources, or starting points to share?
Yes, I've written toy parsers before. I think it's a lot of fun.
So, there are two main paths you can take: use a parser generator, or handwrite the parser. They have their pros and cons; in fact, I think there's a bit of a "hump" in terms of difficulty between the two methods.
A parser generator will get you going very quickly. With a parser generator, you describe the grammar to the tool, and it generates the parser for you. It's also how parsing is generally taught in class, and GCC, for instance, used Bison-generated parsers for years before switching to handwritten ones. However, it's hard to understand what the generated parser is actually doing, and it's also hard to get good error messages out of it. Additionally, the more you deviate from the happy path, the more you end up fighting the tool.
(This may be too deep a rabbit hole, but the grammar model you canonically learn in class is the context-free grammar, or CFG. In practice, many real programming languages aren't strictly context-free, because context matters: in C, for example, "T * x;" is either a multiplication or a pointer declaration depending on whether T names a type. So you have to do some hacking to make them work in parser generators.)
However, you can also handwrite it. This approach has become much more popular recently, especially since RAM is plentiful now. Many modern languages, like Rust and Swift, use handwritten parsers. You'd be writing what's called a recursive descent parser, and it's honestly shocking how intuitive it can be. You get way more power, you know exactly how the parser works, and you can handwrite the error messages (which is a big part of why Rust's compiler errors are so well regarded).
(By "handwriting", I mean writing the parser yourself with if statements, loops, recursion, and the other usual programming constructs.)
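For a sense of what handwriting a parser looks like in practice, here is a minimal recursive descent sketch in Python (chosen because it's what the OP uses day to day). The grammar (integers, names, + - * /, parentheses, unary minus), the `tokenize`/`Parser` names, and the nested-tuple tree shape are all made up for illustration, not taken from any particular language. Each grammar rule becomes one method, and operator precedence falls out of which method calls which.

```python
import re

# Token kinds: integer literals, identifiers, and single-character operators.
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/()])")


def tokenize(src):
    """Split source text into (kind, text) pairs, ending with an 'eof' token."""
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at column {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("eof", ""))
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule.

    expr   -> term (("+" | "-") term)*
    term   -> factor (("*" | "/") factor)*
    factor -> NUMBER | NAME | "(" expr ")" | "-" factor
    """

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek()[0] != "eof":
            raise SyntaxError(f"unexpected {self.peek()[1]!r} after expression")
        return node

    def expr(self):
        node = self.term()
        # Looping (instead of recursing on the right) makes + and - left-associative.
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, text = self.advance()
        if kind == "num":
            return ("num", int(text))
        if kind == "name":
            return ("var", text)
        if text == "(":
            node = self.expr()
            if self.advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        if text == "-":
            return ("neg", self.factor())
        # Hand-written error messages: say exactly what went wrong.
        raise SyntaxError(f"unexpected {text or 'end of input'!r}")
```

For example, `Parser("8 - 3 - 2").parse()` returns `("-", ("-", ("num", 8), ("num", 3)), ("num", 2))`, and every error message is just a `raise` you wrote yourself, so you can make it as friendly as you like.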
If I were starting from scratch, I'd personally do a handwritten parser. The complete control is just very nice to have.
For resources, this is an excellent book in which the author walks step by step through writing a handwritten recursive descent parser for their toy language: https://craftinginterpreters.com/parsing-expressions.html
For parser generators, the new hotness is PEG (packrat) parsers. Some examples are https://pegjs.org/ and https://github.com/yhirose/cpp-peglib. The classic table-driven (LALR) parser generator is Bison.
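To make "packrat" concrete: a packrat parser is a PEG (parsing expression grammar) parser that memoizes each rule's result at each input position, so backtracking through an ordered choice never re-parses the same span. The sketch below hand-rolls that idea in plain Python rather than using pegjs or cpp-peglib, whose APIs aren't shown here; the tiny grammar, the `memo` decorator, and the `PegParser` class are hypothetical, made up for illustration.

```python
import re
from functools import wraps


def memo(rule):
    """Packrat memoization: remember each rule's result at each input position."""
    @wraps(rule)
    def wrapper(self, pos):
        key = (rule.__name__, pos)
        if key not in self.cache:
            self.cache[key] = rule(self, pos)
        return self.cache[key]
    return wrapper


class PegParser:
    """Hypothetical PEG, where "/" is ordered choice (first match wins):

    Expr     <- Assign / Additive
    Assign   <- Name "=" Expr
    Additive <- Atom ("+" Atom)*
    Atom     <- Number / Name
    """

    def __init__(self, text):
        self.text = text
        self.cache = {}

    def token(self, pos, pattern):
        # Skip whitespace, then match pattern; return (text, new_pos) or None.
        m = re.compile(r"\s*(" + pattern + ")").match(self.text, pos)
        return (m.group(1), m.end()) if m else None

    @memo
    def name(self, pos):
        return self.token(pos, r"[A-Za-z_]\w*")

    @memo
    def expr(self, pos):
        # If Assign fails, backtrack to the same pos and try Additive. Any rule
        # Assign already ran at pos (such as name) is answered from the cache.
        return self.assign(pos) or self.additive(pos)

    @memo
    def assign(self, pos):
        name = self.name(pos)
        eq = name and self.token(name[1], "=")
        value = eq and self.expr(eq[1])
        if not value:
            return None
        return (("assign", name[0], value[0]), value[1])

    @memo
    def additive(self, pos):
        first = self.atom(pos)
        if first is None:
            return None
        node, pos = first
        # ("+" Atom)*: stop, without consuming "+", if no Atom follows it.
        while (plus := self.token(pos, r"\+")) and (rhs := self.atom(plus[1])):
            node, pos = ("+", node, rhs[0]), rhs[1]
        return (node, pos)

    @memo
    def atom(self, pos):
        num = self.token(pos, r"\d+")
        if num:
            return (("num", int(num[0])), num[1])
        name = self.name(pos)
        if name:
            return (("var", name[0]), name[1])
        return None

    def parse(self):
        result = self.expr(0)
        # The whole input (apart from trailing whitespace) must be consumed.
        if result is None or self.text[result[1]:].strip():
            raise SyntaxError("could not parse input")
        return result[0]
```

Plain PEG/packrat parsing can't handle left-recursive rules such as `Additive <- Additive "+" Atom` without extensions, which is why this grammar uses repetition instead.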
Of course, that only gets you through parsing. Execution is another step (although it tends to be pretty specific to what your language needs to do, so it's hard to give exact advice).
If you're writing this for a game engine, this is where you'd start hooking things up to the engine's primitives.
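For the execution step, one common approach (the tree-walk interpreter style used in the first half of Crafting Interpreters) is a recursive function over the parsed tree, where "calling the engine" is just a lookup in a table of host functions. Everything engine-related here is hypothetical: `FakeEngine`, `move_player`, `play_sound`, and the script-level names `move` and `sound` are placeholders for whatever your engine actually exposes, and the nested-tuple tree shape is assumed.

```python
import operator

# Arithmetic operators the script language supports.
BIN_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


class FakeEngine:
    """Hypothetical stand-in for real engine primitives; it only records calls."""

    def __init__(self):
        self.log = []

    def move_player(self, dx, dy):
        self.log.append(("move_player", dx, dy))

    def play_sound(self, name):
        self.log.append(("play_sound", name))


def evaluate(node, env, host):
    """Tree-walk evaluation of one nested-tuple AST node."""
    kind = node[0]
    if kind in ("num", "str"):
        return node[1]
    if kind == "var":
        if node[1] not in env:
            raise NameError(f"undefined variable {node[1]!r}")
        return env[node[1]]
    if kind in BIN_OPS:
        left = evaluate(node[1], env, host)
        right = evaluate(node[2], env, host)
        return BIN_OPS[kind](left, right)
    if kind == "let":
        env[node[1]] = evaluate(node[2], env, host)
        return None
    if kind == "call":
        # This is the hook: script-level calls dispatch to engine functions.
        fn = host.get(node[1])
        if fn is None:
            raise NameError(f"script called unknown function {node[1]!r}")
        return fn(*(evaluate(arg, env, host) for arg in node[2]))
    raise ValueError(f"unknown node type {kind!r}")


def run(program, engine):
    # Expose a chosen set of engine methods under script-friendly names.
    host = {"move": engine.move_player, "sound": engine.play_sound}
    env = {}
    for statement in program:
        evaluate(statement, env, host)
    return env
```

Only the `host` table knows about the engine, so the evaluator itself stays the same no matter which primitives you decide to expose.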
It's not a trivial task, but honestly, you'd be surprised at how manageable it can be, especially if you limit your scope and define your grammar in a way that makes it easy to parse (this gets into another rabbit hole, that of LL(k) and LR grammars).
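As one illustration of defining a grammar that's easy to parse: if every statement starts with its own keyword, the first token alone tells the parser which rule applies, with no backtracking (the property LL(1) grammars formalize). The command names `say`, `wait`, and `give` are hypothetical, and leaning on Python's `shlex.split` for tokenizing is a shortcut for this sketch, not a requirement.

```python
import shlex


def convert(lineno, text, type_):
    """Convert an argument, reporting failures as script syntax errors."""
    try:
        return type_(text)
    except ValueError:
        raise SyntaxError(f"line {lineno}: expected {type_.__name__}, got {text!r}") from None


def check_arity(lineno, keyword, args, count):
    if len(args) != count:
        raise SyntaxError(f"line {lineno}: {keyword!r} takes {count} argument(s), got {len(args)}")


def parse_say(lineno, args):
    check_arity(lineno, "say", args, 1)
    return ("say", args[0])


def parse_wait(lineno, args):
    check_arity(lineno, "wait", args, 1)
    return ("wait", convert(lineno, args[0], float))


def parse_give(lineno, args):
    check_arity(lineno, "give", args, 2)
    return ("give", args[0], convert(lineno, args[1], int))


# One token of lookahead (the leading keyword) selects the rule: no backtracking.
STATEMENTS = {"say": parse_say, "wait": parse_wait, "give": parse_give}


def parse_script(src):
    statements = []
    for lineno, line in enumerate(src.splitlines(), start=1):
        try:
            # shlex handles quoted strings and strips "#" comments.
            tokens = shlex.split(line, comments=True)
        except ValueError as err:  # e.g. an unclosed quote
            raise SyntaxError(f"line {lineno}: {err}") from None
        if not tokens:
            continue  # blank or comment-only line
        keyword, args = tokens[0], tokens[1:]
        rule = STATEMENTS.get(keyword)
        if rule is None:
            raise SyntaxError(f"line {lineno}: unknown command {keyword!r}")
        statements.append(rule(lineno, args))
    return statements
```

Because each command is recognized by its first word, adding a new command is one parse function plus one dictionary entry.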
Thank you, stu2b50, this is great and the task seems really interesting. I'll definitely try the handwritten approach, since it's a bit of a learning opportunity.
I would strongly recommend Crafting Interpreters for learning how to write a programming language; though there are physical and ebook versions, the web version is available for free.
While the language it builds may not match what you need, the general structure and strategies used should apply to any scripting language. The first part of the book implements a tree-walk interpreter for a reasonably complex language in Java, and the second part implements a more efficient bytecode compiler and virtual machine for the same language in C.
Unless the second part sounds interesting, I'd recommend just following the first part of the book, or even just reading it and using the relevant sections as a reference for implementing your own language. In my opinion, the performance benefits of the bytecode virtual machine are not worth the added design complexity for this use case.
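To see what that extra complexity looks like, here is a toy comparison under simplified assumptions: the same arithmetic tree evaluated directly by a tree-walker, and compiled to a flat instruction list for a small stack machine. The instructions are Python tuples rather than packed bytes and the opcode names are made up; the book's C virtual machine (byte arrays, a constants table, and so on) is considerably more involved.

```python
import operator

BIN_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tree_walk(node, env):
    """Evaluate the tree directly by recursing over it."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    return BIN_OPS[kind](tree_walk(node[1], env), tree_walk(node[2], env))


def compile_expr(node, code):
    """Flatten the tree into stack-machine instructions (post-order)."""
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    elif kind in BIN_OPS:
        compile_expr(node[1], code)
        compile_expr(node[2], code)
        code.append(("BINOP", kind))
    else:
        raise ValueError(f"cannot compile node {kind!r}")
    return code


def run_bytecode(code, env):
    """Execute the flat instruction list with an explicit value stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        elif op == "BINOP":
            right = stack.pop()
            left = stack.pop()
            stack.append(BIN_OPS[arg](left, right))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()
```

For `1 + x * 4` with `x = 2`, both approaches give 9, and `compile_expr` produces `[("PUSH", 1), ("LOAD", "x"), ("PUSH", 4), ("BINOP", "*"), ("BINOP", "+")]`. The tree-walker maps one-to-one onto the grammar; the VM adds a compile step and an instruction set to maintain, which is the extra complexity weighed against speed here.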
I've developed some small games and work as an SDET during the day, but a lot of my work is high-level Python cloud service stuff. There are probably alternatives to a custom scripting language, but it's also something I'm interested in. The two use cases are...
I have not written a scripting language with it, but I have played with writing DSLs using Lark. It lets you write grammars in an EBNF-like format relatively painlessly, and it has tools to build parsers from those grammars. It also has tools to help visualize your parse trees with Graphviz (useful for debugging and developing your grammars).
There is a tutorial here for writing a LOGO-like language with Lark.
I have often tried writing scripting languages but never finished them. I did finish a reimplementation of a scripting language by porting Wren from C to Go. Wren is a little-used scripting language written by Bob Nystrom, who also wrote Crafting Interpreters, which I highly recommend. (Though a weakness is that it has nothing to say about implementing statically typed languages; no book can cover everything!)
It's pretty fun, but the problem is that the world doesn't need another scripting language, and it's kind of a lonely project. Also, designing a new language is harder than reimplementing an existing one. In particular, it would be hard to invent a language that's so good that I'm willing to live without nice tools.
Here are some language concepts that I think might be interesting to explore:
But it's probably better to look for smaller projects that augment existing languages rather than create new ones.
Consider embedding or otherwise using an existing scripting language. I hear Lua is often used.
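Embedding Lua from Python needs a third-party binding (lupa is one), so as a standard-library-only stand-in, here is the "otherwise using an existing language" route: let Python's own `ast` module do all the parsing, and evaluate only a whitelisted subset of what it produces. The allowed node types and the `give_gold` host function used in the usage example are hypothetical choices for illustration.

```python
import ast
import operator

# Only these Python operators are allowed in scripts.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul, ast.Div: operator.truediv}


def eval_script_expr(src, variables, functions):
    """Parse src with Python's own parser, then evaluate a whitelisted subset."""
    tree = ast.parse(src, mode="eval")  # raises SyntaxError on bad input
    return _eval(tree.body, variables, functions)


def _eval(node, variables, functions):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise NameError(f"unknown variable {node.id!r}")
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left = _eval(node.left, variables, functions)
        right = _eval(node.right, variables, functions)
        return BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.keywords:
        fn = functions.get(node.func.id)
        if fn is None:
            raise NameError(f"unknown function {node.func.id!r}")
        return fn(*[_eval(arg, variables, functions) for arg in node.args])
    # Attribute access, subscripts, lambdas, etc. are all rejected.
    raise ValueError(f"{type(node).__name__} is not allowed in scripts")
```

For example, `eval_script_expr("give_gold(level * 2)", {"level": 4}, {"give_gold": some_callback})` calls the callback with 8. Treat this as a sketch rather than a vetted sandbox.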