7 votes

Zig, parser combinators - and why they're awesome

4 comments

  1. [4]
    vektor
    Link
    Just briefly scrolling through this, I can't seem to find where it answers my perpetual question when it comes to parsers: How do I actually get the result? I usually end up writing my own,...

    Just briefly scrolling through this, I can't seem to find where it answers my perpetual question when it comes to parsers: How do I actually get the result? I usually end up writing my own, because I can't figure out how [hot parser library] gives me the resulting values. I don't care much whether this is malformed JSON, I want to find out what's in it.

    1 vote
    1. [3]
      Thra11
      Link Parent
      So as I understand it, this bit shows it actually being used: var p = &one_of.parser; var result = try p.parse(allocator, &reader); The idea is that they've combined some basic building blocks (A...

      So as I understand it, this bit shows it actually being used:

      var p = &one_of.parser;
      var result = try p.parse(allocator, &reader);
      

      The idea is that they've combined some basic building blocks (A Parser, Literal, which parses string literals, A parser combinator, OneOf, which takes a list of Parsers and tries them each in turn until one of them succeeds) to get a more complex Parser which tries to parse one of either "dog", "cat" or "sheep". This is var p. To get the result, they call parse. It takes:

      • an Allocator, which is largely irrelevant as far as parser combinators are concerned. It's just there because assuming success, parse needs to be able to allocate somewhere to put the result.
      • a Reader. This is the real input. It's any type that can provide a sequence of bytes through the Reader interface.
        and returns a result, which in this case is either null or a string, depending whether the input given was one of the strings it was looking for.

      I didn't find this article particularly easy to follow myself. I assumed that was because while I've met parser combinators before (mostly in Haskell's Parsec and friends), I'm really new to zig. Also, parser combinators get more interesting when you have more different sorts to chain together. Then they can be a quite elegant at expressing things like, "Skip a {, followed by zeroOrMore whitespace, then parse a literal which is oneOf these keys, skip a :, skip zeroOrMore whitespace, parse a something, then skip more whitespace, followed by }". (Such a parser should be able to take input like { knownKey: someValueOrOther } and return a list or tuple containing (knownKey, someValueOrOther), and because it is itself a parser, you can combine it with parser combinators to e.g. parse "at least one such expression").

      I feel like this particular article isn't particularly good as an introduction to parser combinators. It reads more as someone who's fairly familiar with parser combinators and is excited to implement them in zig. A better introduction to parser combinators in zig would require them to implement some more building blocks first so as to provide some more engaging examples.

      2 votes
      1. [2]
        vektor
        Link Parent
        Thanks for the detailed explanation! Oh, I have met parser combinators before, and they are a neat tool, absolutely. I wasn't so much confused about the modalities of building one, but more about...

        Thanks for the detailed explanation!

        Oh, I have met parser combinators before, and they are a neat tool, absolutely. I wasn't so much confused about the modalities of building one, but more about how to use this parser - or any parser, really - to extract a nugget of information out of a string of some kind of formal language. Maybe this is because all of my interactions with parsers and regexes are always so touch and go: Use it for this one case you need them, build it, test it, forget it. See you in half a year. The know-how just won't stick. Talking about regexes, maybe that contributes to my confusion, because what you usually do with them is just match them. Regexes are in a way as modular as parser combinators (just they did it decades earlier), but I'm not even sure you can use them to yield results that you're interested in.

        2 votes
        1. skybrian
          Link Parent
          There are a couple of different ways to go, depending on what you want the output to look like. Sometimes you're building something like an abstract syntax tree, in which case each parse function...

          There are a couple of different ways to go, depending on what you want the output to look like.

          Sometimes you're building something like an abstract syntax tree, in which case each parse function can construct and return the AST node for the part of the tree that it parsed.

          Alternatively, you are building some kind of table. In this case the table gets passed in as an argument, or in an object-oriented language, it's typically a field in the class that all the parse methods belong to. Each parse method adds entries to the table as it parses. This is how a list of errors can be collected, for example.

          At least, that's how it's done for hand-written parsers in non-functional languages. I assume functional languages have their own tricks.