Writing a Language

6 minute read

In this tutorial I’m gonna show you how to write a very simple programming language called Ygor. The language itself is just a placeholder for what I really want to show, which is how to get started with language development in Racket.

The Ygor language definition

Ygor has only one type: integers. There is only one function sum. The language won’t have any syntax. You will input the abstract syntax tree directly.

Here is what Ygor will look like:

#lang ygor

(sum (const 5) (const 6))

Racket introduction

To develop Ygor, we’re gonna use racket. Racket has a very interesting framework for developing custom languages that extends on the power of lisp macros.

Download racket and add the bin folder to your PATH. That should give you access racket, drracket, raco and other useful stuff.

Even though you can use any editor, I seriously recommend DrRacket, which comes with the racket distribution.

In DrRacket you’ll see an editor on top, and the REPL on the bottom. Let’s try a simple program:

#lang racket

(define (square x) (* x x))

(define value 4)

(square value)

Racket is a lisp dialect based on scheme. There, end of introduction. The racket website has loads of very good documentation. Have fun.

Creating the initial project

We’ll start by creating a directory for the language:

$ cd ~/development
$ mkdir ygor
$ raco link ygor

The last command will link the ygor directory to the racket collections, making it perfect for development. To test it, let’s try requiring a sample file from that module. Create a file ygor/hello.rkt with the following content:

#lang racket
"Polka will never die."

And test it like this (in DrRacket):

(require ygor/hello)

We’re ready to start.

The hard way

This tutorial will be mostly backwards, comparing to other racket language tutorials you’ll find. Instead of beginning by defining the lexer and parser (or even grammar), we’ll start with how to tell racket to treat ygor as a language. I find this approach much more pratical. Later you can choose to focus on any of the steps with the appropiate depth. There is a great tutorial on how to implement brainf*ck with racket.

In DrRacket, try the following code:

#lang ygor

And hit run. You should get something like this:

Module Language: invalid module text
standard-module-name-resolver: collection not found
  collection: "ygor/lang"
  in collection directories:
   /Users/juanibiapina/Library/Racket/5.3.6/collects
   /Applications/Racket v5.3.6/collects
   sub-collection: "lang"
  in parent directories:
   /Volumes/development/ygor

That means racket is looking for a lang collection inside ygor. Let’s make one:

$ mkdir lang

And running again:

Module Language: invalid module text
. . ../../Applications/Racket v5.3.6/collects/mred/private/snipfile.rkt:324:2: open-input-file: cannot open input file
  path: /Volumes/development/ygor/lang/reader.rkt
  system error: No such file or directory; errno=2

So racket is looking for a reader.rkt file. Go ahead and create one. We’ll use an module reader, so add the following lines to reader.rkt:

#lang s-exp syntax/module-reader
ygor

The second line there tells racket to look for a file main.rkt inside the ygor collection. This file contains a module that provides all the top level bindings that will build the language. Let’s provide some initial content in main.rkt:

#lang racket

Let’s go back to our example and try to run a fake ygor program:

#lang ygor

(hello?)

You should get something like this:

module: no #%module-begin binding in the module's language

Let’s just provide #%module-begin for now, we’ll get back to it later (in main.rkt):

(provide #%module-begin)

Trying again:

hello?: unbound identifier;
 also, no #%app syntax transformer is bound in: hello?

Interactions disabled: ygor does not support a REPL (no #%top-interaction)

It tells you hello? is not defined. Let’s ignore the other errors for now.

In order to define what hello? is, we need to provide this definition. Add these two lines (in main.rkt):

(provide hello?)

(define-syntax-rule (hello?)
  (print "hello to you too!"))

And try running again. It should print “hello to you too!” to standard output. This is your first working version of a language that says hello. No kidding.

The REPL

The previous error message said something about ygor not supporting a REPL. A simple way to get it going is to just provide #%top-interaction straight from racket. Add this line to main.rkt:

(provide #%top-interaction)

Now if you run an ygor program from DrRacket, you get a REPL. From now on you can test all the examples directly there.

const

The next step is to allow the user to write ygor programs in the form of an abstract syntax tree. That means there won’t be any program “text” to parse. The user diretly inputs the syntax tree that will be evaluated. So let’s write a simple program:

#lang ygor

(const 5)

If you run this, you’ll get “const: unbound identifier;”. Let’s define const. We’ll create syntactic forms as structs in racket: (in main.rkt)

(provide const)
(struct const (v) #:transparent)

Constants will be represented as structs that hold a value v. We also export this struct with the provide clause. Running again:

const1: function application is not allowed;
 no #%app syntax transformer is bound in: (const1 5)

For racket to understand function applications (in this case const is a function that takes one argument and returns a struct), the #%app function must be defined. Let’s bring it from racket (in main.rkt):

(provide #%app)

Running again:

?: literal data is not allowed;
 no #%datum syntax transformer is bound in: 5

Same deal for literal data. Racket needs the #%datum function in order to understand literal data. Let’s provide it (in main.rkt):

(provide #%datum)

And running again, you can see it returns itself.

sum

Let’s add a sum function. First let’s sketch the syntax tree for a sum:

#lang ygor

(sum (const 42) (const 1))

We’ll need to define what sum is (in main.rkt):

(provide sum)
(struct sum (e1 e2) #:transparent)

sum is a struct that hold two other expressions. Run again (or try in the REPL):

(sum (const 1) (const 2))

Which returns itself, of course.

So at this point, we can type the AST of a Ygor program, and it will evaluate to itself. How can we make Ygor programs runnable?

eval

Let’s define a function to evaluate an Ygor program. We’ll call it ygor-eval (in main.rkt):

(provide ygor-eval)
(define (ygor-eval e)
  (match e
    [(const x) (const x)]
    [(sum e1 e2) (const (+ (const-v (ygor-eval e1)) (const-v (ygor-eval e2))))]))

In order to evaluate an expression e, we match this expression against the two possible cases in Ygor:

  1. e is a const with value x: returns itself
  2. e is a sum of two other expressions: return the sum (racket +) of the result of recursively evaluating both expressions (assuming they evaluate to const).

Try running this code now:

#lang ygor

(ygor-eval (sum (const 42) (const 1)))

And you should get (const 43).

Hooking up eval

We wouldn’t like to write every line in Ygor prefixed with ygor-eval. Let’s add a hook to automatically wrap every expression with ygor-eval. To do that, we’ll overwrite #%module-begin, which is a function that is automatically added by racket wrapping the body of a module, which is very convenient (in main.rkt):

(define-syntax (ygor-module-begin stx)
  (datum->syntax
     stx
     (cons (quote-syntax #%module-begin)
           (map (lambda (e)
                  (list (quote-syntax ygor-eval)
                        e))
                (cdr (syntax-e stx))))
     stx
     stx))

Remember how before we just provided #%module-begin from racket? Let’s replace the provided #%module-begin with our own overwritten version, defined above (in main.rkt):

(provide (rename-out [ygor-module-begin #%module-begin]))

The workings of ygor-module-begin are not very interesting to our purposes right now, but the idea is basically this: wrap every statement in the module body with ygor-eval. You can test now that any Ygor programs you run will automatically eval (unless you type it in the REPL, in which case it will still just print the AST, because we haven’t changed how the REPL works).

Becoming useful

There are a few things I’ve done in this tutorial you wouldn’t have actually done when writing your own language. On the other hand, this setup is the simplest possible one I could find that easily integrates into MUPL, the language you write for the Programming Languages course on coursera, which I seriously recommend every programmer to complete.

From this setup, you can replace the struct definitions I have given with the ones from the course and replace ygor-eval with eval-exp, from one of the course exercises.

The full code can be found on github. Have fun.

Comments