23 votes

Programming Challenge: Build an Interpreter

Hello everyone! It has been a while since last programming challenge, it's time for another one!

This week's goal would be to build your own interpreter.

Interpreter is program that receives input and executes it. For example Python is interpreted language, meaning you are actually writing instructions for the interpreter, which does the magic.

Probably the easiest interpereter to write is Brainfuck interpreter. If someone here doesn't know, Brainfuck is programming language, which contains following instructions: ,.<>[]-+. Other characters are ignored. It has memory in form of array of integers. At the start, pointer that points to one specific memory cell points to cell 0. We can use < to move pointer to left (decrement) and > to move pointer to right (increment). . can be used to print value of cell the pointer is currently pointing to (ascii). , can be used to read one character from stdin and write it to memory. [ is beggining of loop and ] is end of loop. Loops can be nested. Loop is terminated when we reach ] character and current value in memory is equal to 0. - can be used to decrement value in memory by 1 and + can be used to increment value in memory by 1. Here's Hello World:

++++++++++[>+++++++>++++++++++>+++>+<<<<
-]>++.>+.+++++++..+++.>++.<<++++++++++++
+++.>.+++.------.--------.>+.>.

People with nothing to do today can attemp to make an interpreter for the Taxi programming language.

You can even make your own language! There are no limits for this challenge.

6 comments

  1. s4b3r6
    Link
    Okay, this is a not-quite Brainfuck language. It's unoptimised, because I span it up in ~15mins. To prove you can. It takes input from the first commandline argument. I'm pretty sure it's portable...

    Okay, this is a not-quite Brainfuck language.

    It's unoptimised, because I span it up in ~15mins. To prove you can.

    It takes input from the first commandline argument.

    I'm pretty sure it's portable C, but probably ISO, not ANSI.

    Plenty of style problems, like using int over size_t.

    I also expect it has a couple bugs.

    #include <stdio.h>
    #include <string.h>
    
    // Set up our stack to be operated on.
    #define STACK_LENGTH 3000
    int stack[STACK_LENGTH] = {0};
    int pointer = 0;
    
    // Increment, but stay in ASCII range.
    void inc() {
      if(stack[pointer] < 255)
        stack[pointer]++;
    }
    
    // Decrement, but stay positive.
    void dec() {
      if(stack[pointer] > 0)
        stack[pointer]--;
    }
    
    // Move unless we're at the end of the stack.
    void next() {
      if(pointer < STACK_LENGTH - 1)
        pointer++;
    }
    
    // Move unless we're at the top of the stack.
    void prev() {
      if(pointer > 0)
        pointer--;
    }
    
    void display() {
      // Woot, easy.
      printf("%c", stack[pointer]);
      // Make sure it prints.
      fflush(stdout);
    }
    
    // Read a single character.
    void readch() {
      int c = getchar();
      if(c != EOF)
        stack[pointer] = c;
        // Clear input buffer on some systems.
        while((c=getchar()!=EOF)){}
    }
    
    // The guts of our interpreter.
    void parse(char* program) {
      // No for loop, we need to be able to control
      // our position for our language to handle
      // looping.
      int pos = 0;
      int length = strlen(program);
    
      // Handle the current position in code.
      // Simple case switch because single char values.
      // Invalid characters are simply ignored.
      while(pos < length) {
        switch(program[pos]) {
          case('+'):
            inc();
            pos++;
            break;
          case('-'):
            dec();
            pos++;
            break;
          case('<'):
            prev();
            pos++;
            break;
          case('>'):
            next();
            pos++;
            break;
          case('.'):
            display();
            pos++;
            break;
          case(','):
            readch();
            pos++;
            break;
          case(']'):
            if(stack[pointer] != 0)
              while(program[pos] != '[') {
                pos--;
              }
            break;
        }
      }
    }
    
    int main(int argc, char* argv[]) {
      // Read the program if it works.
      if(argc > 1)
        parse(argv[1]);
      else
        // Complain the user didn't give a program.
        printf("%s\n", "Please provide a program.");
    }
    

    Every character is either a command, or ignored.

    • + Increments the current value of the stack cell. It won't increase beyond 255.
    • - Decrements the current value of the stack cell. It won't decrease beyond 0.
    • . Prints the current cell value as an ASCII character.
    • , Reads a single character from stdin, and replaces the cell with it's value. Characters with values greater than 255 are undefined behaviour.
    • [ Marks the beginning of a loop.
    • ] If the current cell is not 0, jump backwards to the previous [. Otherwise, continue processing.
    6 votes
  2. [3]
    Comment deleted by author
    Link
    1. Soptik
      Link Parent
      If you're right about the cause of this issue, wikipedia code probably isn't actually correct. What happens if you move pointer below zero is undefined, so you can do whatever you want in this...

      If you're right about the cause of this issue, wikipedia code probably isn't actually correct. What happens if you move pointer below zero is undefined, so you can do whatever you want in this situation.

      When the pointer moves outside the bounds of the array, some implementations will give an error message, some will try to extend the array dynamically, some will not notice and will produce undefined behavior, and a few will move the pointer to the opposite end of the array.

      (Wikipedia, Brainfuck)

      1 vote
    2. Crestwave
      (edited )
      Link Parent
      The Wikipedia example is not supposed to decrement the pointer below 0. It looks like you're just going to the first opposite bracket character you match without taking nesting into account, which...

      The Wikipedia example is not supposed to decrement the pointer below 0. It looks like you're just going to the first opposite bracket character you match without taking nesting into account, which the example does use.

      1 vote
  3. teaearlgraycold
    Link
    It might be fun to make a JIT compiler. Should be pretty easy considering the simplicity of the language.

    It might be fun to make a JIT compiler. Should be pretty easy considering the simplicity of the language.

    1 vote
  4. unknown user
    Link
    I have this if it counts (see below). I wrote it a couple years ago just for fun. /* optfuck.c --- Brainfuck with input as command line arguments. Copyright (C) 2016, 2017 Göktuğ Kayaalp <self...

    I have this if it counts (see below). I wrote it a couple years ago just for fun.

    /* optfuck.c --- Brainfuck with input as command line arguments.
    
       Copyright (C) 2016, 2017 Göktuğ Kayaalp <self |at| gkayaalp {dot} com>.
    
       Permission to use, copy, modify, and/or distribute this software
       for any purpose with or without fee is hereby granted, provided
       that the above copyright notice and this permission notice appear
       in all copies.
    
       THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
       WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
       WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
       AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
       CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
       OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
       NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
       CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
    
       Hello world example:
       $ cc -std=c99 -o of optfuck.c
       $ ./of -B -B -B -B -B -B -B -B -j -D -B -B -B -B -j\
              -D -B -B -D -B -B -B -D -B -B -B -D -B -d -d\
              -d -d -b -J -D -B -D -B -D -b -D -D -B -j -d\
              -J -d -b -J -D -D -o -D -b -b -b -o -B -B -B\
              -B -B -B -B -o -o -B -B -B -o -D -D -o -d -b\
              -o -d -o -B -B -B -o -b -b -b -b -b -b -o -b\
              -b -b -b -b -b -b -b -o -D -D -B -o -D -B -B -o 
    
       The interpreter is created with help from
       <https://github.com/kgabis/brainfuck-c/>.
    
       Though I used a 4-byte integer as the representation for the
       instruction.  The higher 3 bits represent the instruction, (see op*
       macros), and the rest is the argument to it.
    
       Options help:
       	-D increment data pointer       -d decrement data pointer
    	-B incr byte at data pointer    -b decr byte at data pointer
    	-o write byte at data pointer   -g read byte from stdin
    	-j if the byte at data pointer is 0, jump after matching
    	   `-J'
    	-J if the byte at data pointer is nonzero, jump after
               matching `-j'
    
       Lots of options with a single `-' does not work.  Because I didn't
       bother.
     */
    
    #include<stdio.h>
    #include<stdint.h>
    #include<strings.h>
    
    #define MEMSIZ (4096)
    #define PRGSIZ (4096)
    #define STKSIZ (4096)
    unsigned short S[STKSIZ]; /* The stack. */
    unsigned int P = 0;	  /* The stack pointer. */
    #define PUS(x) (S[P++]=x) /* Push to the stack. */
    #define POP()  (S[--P])	  /* Pop the stack. */
    #define EMPT() (P==0)	  /* Stack empty? */
    #define FULL() (P==STKSIZ)	/* Stack full? */
    
    #define die(s) {puts(s);return 1;}
    
    #define opincdp  (0)
    #define opdecdp  (1)
    #define opincval (2)
    #define opdecval (3)
    #define opputval (4)
    #define opgetval (5)
    #define opjmpfwd (6)
    #define opjmpbck (7)
    
    #define opshift (29)
    #define opmask(x) (x<<opshift)
    
    int main(int argc, char **argv)
    {
    	if(1==argc){ printf(
    	"Options help:\n"
    	"\t-D increment data pointer       -d decrement data pointer\n"
    	"\t-B incr byte at data pointer    -b decr byte at data pointer\n"
    	"\t-o write byte at data pointer   -g read byte from stdin\n"
    	"\t-j if the byte at data pointer is 0, jump after matching\n"
    	"\t   `-J'\n"
    	"\t-J if the byte at data pointer is nonzero, jump after\n"
    	"\t   matching `-j'\n"
    	"\nData pointer starts out as 0 (zero), and the byte array is\n"
    	"bzeroed.  The program is limited to %d instructions, though\n"
    	"probably ARG_MAX would be more restrictive than that.\n"
    	"Negative values and negative data pointer is not allowed.\n\n"
    	"Copyright (C) Göktuğ Kayaalp, ISC Licence (see the source).\n",
    	PRGSIZ);
    	}
    
    	uint8_t mem[MEMSIZ];	/* The memory. */
    	uint32_t prg[PRGSIZ];	/* The program. */
    	uint32_t dp, jp, i; 	/* Data pointer, jump pointer. */
    	dp = 0; bzero(mem,MEMSIZ); bzero(prg,PRGSIZ); bzero(S,STKSIZ);
    
    	/* Compile */
    	if(argc>PRGSIZ) die("input bigger than max prog size!")
    	for(i=0;i<(argc-1);i++){
    		switch(argv[i+1][1]){
    		case 'D': prg[i]=(opincdp<<opshift);  break;
    		case 'd': prg[i]=(opdecdp<<opshift);  break;
    		case 'B': prg[i]=(opincval<<opshift); break;
    		case 'b': prg[i]=(opdecval<<opshift); break;
    		case 'o': prg[i]=(opputval<<opshift); break;
    		case 'g': prg[i]=(opgetval<<opshift); break;
    		case 'j':
    			if(FULL()) die("-j: stack full");
    			PUS(i);
    			prg[i]=(opjmpfwd<<opshift);
    			break;
    		case 'J':
    			if(EMPT()) die("unmatched -J");
    			jp = POP();
    			prg[i]=(opjmpbck<<opshift) | jp;
    			prg[jp]|=i;
    			break;
    		default: die("syntax error");
    		}
    	}
    	//return 0;
    	if(!EMPT()) die("syntax error: unbalanced -j/-J");
    
    	/* Eval */
    	for(i=0;i<(argc-1);i++){
    		switch(prg[i]>>opshift){
    		case opincdp:
    			dp++;
    			if(dp>PRGSIZ) die("segfault");
    			break;
    		case opdecdp:
    			if(dp==0) die("segfault");
    			dp--;
    			break;
    		case opincval:
    			if(mem[dp]==UINT8_MAX) die("data out of range");
    			mem[dp]++;
    			break;
    		case opdecval:
    			if(mem[dp]==0) die("data out of range");
    			mem[dp]--;
    			break;
    		case opputval: putchar(mem[dp]); break;
    		case opgetval: mem[dp]=(uint32_t)getchar(); break;
    		case opjmpfwd:
    			if(!mem[dp])i=(prg[i])^(opmask(opjmpfwd));
    			break;
    		case opjmpbck:
    			if(mem[dp]) i=(prg[i])^(opmask(opjmpbck));
    			break;
    		default: die("unrecognised bytecode!");
    		}
    	}
    
    	return 0;
    }
    

    Edit: remove ^L chars.

    1 vote
  5. Crestwave
    Link
    Yet another brainfuck interpreter... but in pure sh! It actually performs pretty well on fast implementations like dash, outperforming my Bash implementation (you can find it and more here) in...

    Yet another brainfuck interpreter... but in pure sh! It actually performs pretty well on fast implementations like dash, outperforming my Bash implementation (you can find it and more here) in certain areas. Info:

    • It is 100% POSIX compliant
      • While [, printf and read are not mandatory POSIX built-ins, they are required external utilities
    • It reads the program line by line from a file named in the first argument
      • If no arguments are passed, it reads from standard input
    • Cells wrap on 8-bit overflow and underflow
    • The cell is left unchanged on EOF
    • Negative pointers are allowed
    • Input is line buffered
    • Output is ASCII-encoded
    #!/bin/sh
    
    IFS=' '
    n='
    '
    val=0
    
    if [ "$#" -gt 0 ]; then
    	while read -r line; do
    		prog=$prog$line
    	done < "$1"
    else
    	while read -r line; do
    		prog=$prog$line
    	done
    fi
    
    while :; do
    	case $prog in
    		'>'*)
    			set -- ${tape_right:-0}
    			tape_left="$val $tape_left"
    			val=$1
    			shift
    			tape_right=$*
    			;;
    		'<'*)
    			set -- ${tape_left:-0}
    			tape_right="$val $tape_right"
    			val=$1
    			shift
    			tape_left=$*
    			;;
    		'+'*)
    			val=$(( val + 1 ))
    			[ "$val" -gt 255 ] && val=0
    			;;
    		'-'*)
    			val=$(( val - 1 ))
    			[ "$val" -lt 0 ] && val=255
    			;;
    		'.'*)
    			printf "\\$(printf %o "$val")"
    			;;
    		','*)
    			[ -z "$input" ] && read -r input && input=$input$n
    
    			if [ -n "$input" ]; then
    				val=$(
    					LC_CTYPE=C
    					printf %d "'${input%"${input#?}"}"
    				)
    				input=${input#?}
    			fi
    			;;
    		'['*)
    			if [ "$val" -eq 0 ]; then
    				depth=1
    				while :; do
    					bak=${prog%"${prog#[]><+-.,[]}"}$bak
    					prog=${prog#?}
    					case $prog in
    						'['*) depth=$(( depth + 1 )) ;;
    						']'*) depth=$(( depth - 1 )) ;;
    					esac
    					[ "$depth" -le 0 ] && break
    				done
    			fi
    			;;
    		']'*)
    			if [ "$val" -ne 0 ]; then
    				depth=1
    				while :; do
    					case $bak in
    						'['*) depth=$(( depth - 1 )) ;;
    						']'*) depth=$(( depth + 1 )) ;;
    					esac
    					prog=${bak%"${bak#?}"}$prog
    					bak=${bak#?}
    					[ "$depth" -le 0 ] && break
    				done
    			fi
    			;;
    		'')
    			break
    			;;
    	esac
    	bak=${prog%"${prog#[]><+-.,[]}"}$bak
    	prog=${prog#?}
    done
    
    1 vote