Build a Wasm Compiler in Roc

My latest harebrained project is a WAT-to-Wasm compiler written in the Roc programming language. I explained my (ir)rationale for the project in Part 1 of this series, which also included an introduction to the technologies we’ll be using.

Note: Other articles in this series are collected here.

In this article, we’ll get started writing some Roc code. We won’t get to the point where we are doing anything with WAT or Wasm, yet, but we will be able to load an input file and parse some command line arguments.

Reminder: You are reading content that took a great deal of effort to craft, compose, and debug. If you appreciate this work, consider supporting me on Patreon or GitHub.

Let’s Write Some Roc

I have no idea if Roc is suitable for this project. I mostly wanted an excuse to write some Roc code, and the WAT-to-Wasm compiler was simply the idea that popped into my head for doing that.

In their own words, Roc is a fast, friendly, functional language. They have delightful definitions of those words on their homepage, and their tutorial is one of the most pleasant reads I’ve come across in the many programming language tours I have travelled. I am not necessarily going to explain every piece of Roc syntax that we encounter, so, as I suggested in part 1, you may want to read through that tutorial before continuing here.

A basic “Hello world” in Roc looks like this:

app [main] { pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.12.0/Lb8EgiejTUzbggO2HVVuPJFkwvvsfW6LojkLR20kTVE.tar.br" }

import pf.Stdout

main =
    Stdout.line! "Hello World!"

The first line tells Roc that we want to run our functional code within a specific version of the “basic-cli” platform. This platform is a runtime that knows how to write to standard out and access the file system, among other things. Roc compiles your Roc code to machine code and merges it with the platform you chose to create a single binary. There is an interesting symmetry with Wasm itself, where you need some kind of host to compile and run the Wasm code and to manage system side effects.

In addition to supplying a runtime, the platform includes some side-effecting code that we will need to access. The first line gives our platform an alias of pf and the import statement imports the Stdout module from pf. This module contains a function named line, which returns a Task. A Roc Task is kind of like a promise, future, coroutine, or task in many other languages. The ! in the Stdout.line call is syntactic sugar for “await this task”.

If this has you concerned about the colored function problem, don’t worry. There’s only one way to access a Roc task (at least, within any one platform), and the blocking vs non-blocking question doesn’t come up.

The main = task is known in Roc as a definition. It assigns the results of an expression to a name (main in this case). The basic-cli platform expects a main def to be a task, and this becomes the main entry point for the Roc program.

To run the above code, save the file as main.roc and use the command roc dev. This will build the file in dev mode and run it.

Before we get into the details of actually compiling WAT code, we need to load that code into our program. To do that, we’ll need to accept a single filename and read that file from the file system. Once we have the file contents as a string, we can pass that to our yet-to-be-written compiler logic and do everything in a purely functional style. Eventually we’ll get the compiled Wasm bytes back and need to output those to a file.

CLI Arguments

Let’s start with the CLI arguments. The basic-cli platform gives us a function that returns a Task, as described here.

To start, let’s ask the platform to output the list of arguments the user passed. We can use the list function from the Arg module like this:


import pf.Task
import pf.Arg

main =
    args = Arg.list! {}

    dbg args

    Task.ok {}

Note: I’ll leave the “app [main]” line that imports the platform off this and future examples.

This code makes an args definition by calling the Arg.list! task exposed by the platform. We have to pass an empty record to this function, which is what the {} is on that line.

Then we use the dbg builtin to print a string representation of the arguments to the terminal. dbg is the only platform-independent way in Roc to inspect a value in the terminal, and it is stripped from production builds.

Finally, we set the “result” of this main definition to the output of the Task.ok {} function. The platform expects the main definition to always return a Task. In this case, there’s no failure case, so we just return a Task with an empty ok value. Introspecting using the LSP, I can see that the type of main is:

Task.Task {} []

The platform documentation defines a Task as Task ok err and it “represents an effect; an interaction with state outside your Roc program, such as the terminal’s standard output, or a file.”

In this case, we are returning a task with an empty record as the ok value, and an empty tag union as the error type. This basically means our program can’t fail.

If I run the above program using e.g. roc dev -- foo, the output looks like this:

[main.roc:11] args = ["/var/folders/y4/j6_s7k615dd1vywjbg4jpg3m0000gn/T/.tmpWqPjHI/roc_app_binary", "foo"]

The first argument in the list is a temporary path to the dev-mode binary Roc built for me, and the “foo” is the single argument I passed after -- (anything before that would be interpreted as an argument to the roc dev subcommand).

Before we start introducing potential errors, a good policy in Roc development is to ensure all errors are properly handled. By default, Roc will bubble up any errors to the platform and it will output some kind of traceback that is probably not what a user wants to see. To catch this behaviour at compile time, we can add a type definition to the main task like this:

main : Task.Task {} [Exit I32 Str]
main =
   # ...the rest of the definition

That new line adds a type annotation to the definition. It means main will always contain a Task with an empty ok value, and an error value that will always be an Exit tag carrying an integer exit code and a string message.

Note: main is technically not a function. It is a task. It doesn’t accept arguments, and even though it is a multi-line definition, the code in it cannot be “called” other than when the main definition is loaded by the basic-cli platform.

Tags are a fascinating and unique concept in Roc. They are something like algebraic data types, or variant types in some other languages, but they feel more… “dynamic”. They are statically typed at compile time, but the type inference is so awesome that it feels like writing dynamic code. Soundly typed dynamic code? Sign me up for more!

A tag is a literal value, similar to an Integer or a String. All tags must start with a capital letter; any time you see a capital letter in a value it is a tag (Note that capitals are also used in types).

The difference from most languages I’ve used is that they group themselves implicitly. You don’t have to define an enum with variants up front, as you would in e.g. Rust:

enum StringResult {
  Ok(String),
  Err(String),
}

Instead, you can just return an Ok or an Err and Roc automatically infers that this is the return type of your function. And it still soundly type-checks everything! You can restrict tags to certain types when it is useful, but it’s not required.
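To make the implicit grouping concrete, here is a minimal sketch. The describe function and the Anonymous and Named tags are hypothetical names invented for this illustration. Nothing declares them ahead of time and there is no type annotation. Like the other examples, it leaves off the app header.

```roc
# Hypothetical helper: the tags below are not declared anywhere.
# Roc infers that describe returns either Anonymous or Named with a Str payload.
describe = \name ->
    if Str.isEmpty name then
        Anonymous
    else
        Named name

# Top-level expects double as tiny tests.
expect describe "" == Anonymous
expect describe "roc" == Named "roc"
```

Appending this to main.roc and running roc test should exercise the two expect lines, assuming a Roc build from the same era as the basic-cli release used above.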

Consider a well-formed main definition that expects exactly one command-line argument:

main : Task.Task {} [Exit I32 Str]
main =
    args = Arg.list! {}

    when args is
        [] | [_] -> Task.err (Exit 1 "No input filename provided")
        [_, _, _, ..] -> Task.err (Exit 2 "Too many arguments")
        [_, filename] ->
            dbg filename

            Task.ok {}

The when expression is an elegantly sparse syntax for pattern matching. The first pattern matches the case when there are zero or one arguments (since the command to run the program should always be the first argument, the former is technically impossible, but we cover it to satisfy the compiler).

The second match arm matches the case where there are three or more arguments. It returns a failed Task constructed by the Task.err function. Notice the parentheses after Task.err. Unlike many languages (but similar to Haskell), these are not defining the parameter list for the Task.err function. They are just used for grouping. If we didn’t include them, Roc would report an error because it would think we are passing three arguments (Exit, 2, and “Too many arguments”) to the Task.err function, as opposed to a single Exit tag with a payload that contains two values.
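The same list patterns can be tried without any tasks involved. The sketch below is an assumption-laden variation, not part of the compiler: parseArgs is a hypothetical pure function that returns a plain Result instead of a Task, which makes the arms easy to check with expect. Note that the parentheses around each Exit tag play the same grouping role as in the Task.err calls.

```roc
# Hypothetical pure variant of the argument check in main.
# It returns a Result rather than a Task so it can be tested with expect.
parseArgs : List Str -> Result Str [Exit I32 Str]
parseArgs = \args ->
    when args is
        # Zero or one element: no filename was supplied.
        [] | [_] -> Err (Exit 1 "No input filename provided")
        # Three or more elements: too many arguments.
        [_, _, _, ..] -> Err (Exit 2 "Too many arguments")
        # Exactly two elements: the binary path and the filename.
        [_, filename] -> Ok filename

expect parseArgs ["roc_app_binary", "hello.wat"] == Ok "hello.wat"
expect parseArgs ["roc_app_binary"] == Err (Exit 1 "No input filename provided")
expect parseArgs ["roc_app_binary", "a.wat", "b.wat"] == Err (Exit 2 "Too many arguments")
```

Keeping the decision logic in a pure function like this is one possible design; the article’s main does the same matching inline.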

Now, replace the dbg call with a call to Stdout.line to output the argument:

        [_, filename] ->
            Stdout.line! filename

I’ve also removed the Task.ok. Stdout.line! returns a task, so we can almost just return that task from this arm, as shown. However, Roc complains with an interesting error message:

── TYPE MISMATCH in main.roc ───────────────────────────────────────────────────

Something is off with the body of the main definition:

 9│  main : Task.Task {} [Exit I32 Str]
10│  main =
11│      args = Arg.list! {}
                ^^^^^^^^^^^^

This Task.await call produces:

    Task {} [
        Exit (Num *) …,
        StdoutErr [
            BrokenPipe,
            Interrupted,
            Other Str,
            OutOfMemory,
            Unsupported,
            WouldBlock,
            WriteZero,
        ],
    ]

But the type annotation on main says it should be:

    Task {} [Exit I32 …]

────────────────────────────────────────────────────────────────────────────────

This message reflects the fact that the Stdout.line call can potentially fail (though it’s pretty unlikely). The documentation for the function indicates that it returns a Task {} [StdoutErr Err]. Since this is the last expression in the match arm, we are “returning” that potential error from main, and it is complaining because [StdoutErr Err] is not an accepted type for the main definition.

The easiest way to silence this error is to comment out the main : Task.Task {} [Exit I32 Str] type definition and let the platform deal with the output if StdoutErr is returned. But that would allow our main task to fail with any error type, making it too easy to forget to handle other errors ourselves.

An alternative would be to add StdoutErr _ to our main type definition like this:

main : Task.Task {} [Exit I32 Str, StdoutErr _]
main =
  # Rest of main

The square brackets in the second type argument to Task.Task indicate that main’s error type is a tag union. Previously, that tag union only allowed one tag (Exit I32 Str), but now it also accepts StdoutErr _. The _ acts as a wildcard; there are over half a dozen different possible StdoutErr varieties, which show up in their own tag union. In this case, I’m saying “I don’t care what type of StdoutErr you receive, but you should expect any that could arise.”

Given that StdoutErr is a fairly unlikely event in normal operation, we might be ok with the default handling if it ever happens in real life. But another alternative is to stick with the single Exit I32 Str error definition, and use the Task.mapErr function when we call Stdout.line, like this:

main : Task.Task {} [Exit I32 Str]
main =
    args = Arg.list! {}

    when args is
        [] | [_] -> Task.err (Exit 1 "No input filename provided")
        [_, _, _, ..] -> Task.err (Exit 2 "Too many arguments")
        [_, filename] ->
            Stdout.line filename |> Task.mapErr \_ -> (Exit 99 "System is failing")!

That Stdout.line line has changed quite a bit. Instead of awaiting the task, we are using the |> operator to “pipe” it to become the first argument in a call to the Task.mapErr function. The second argument is an anonymous function that takes the existing error (which we ignore with _) and returns a new error that obeys our Exit I32 Str type. Finally, we do await this new task (the one that has had its error mapped) by placing the ! at the end of the line.
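The builtin Result module has a mapErr function with the same shape, which makes the idea easy to try in isolation. In this minimal sketch, secondArg is a hypothetical helper (it is not used by the compiler) that looks up the filename with List.get and maps the lookup error into our Exit tag.

```roc
# Hypothetical helper: fetch the argument at index 1, if there is one.
# List.get returns a Result; mapErr swaps its error for an Exit tag
# and leaves an Ok value untouched.
secondArg : List Str -> Result Str [Exit I32 Str]
secondArg = \args ->
    List.get args 1
    |> Result.mapErr \_ -> Exit 1 "No input filename provided"

expect secondArg ["roc_app_binary", "hello.wat"] == Ok "hello.wat"
expect secondArg ["roc_app_binary"] == Err (Exit 1 "No input filename provided")
```

As with Task.mapErr, the piped value becomes the first argument and the anonymous function is the second.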

Reading the file

Now that we have the filename, we can attempt to read its contents into a string. As with Stdout.line, this is a task that the basic-cli platform performs for us. Since it’s messing with the filesystem, it can result in a variety of errors (some of which are more likely than Stdout.line errors).

We’ll need to add an import to the top of our file, first:

import pf.File

Now we can replace the success arm in our main def with one that reads the file. This pipeline handles errors by mapping them to exit codes I made up on the fly:

        [_, filename] ->
            (
                filename
                |> File.readUtf8
                |> Task.mapErr \error ->
                    when error is
                        FileReadErr _ NotFound ->
                            Exit 3 "$(filename) does not exist"

                        FileReadErr _ _ ->
                            Exit 99 "Error reading $(filename)"

                        FileReadUtf8Err _ _ ->
                            Exit 4 "Unable to read UTF8 in $(filename)"
            )!
                |> Stdout.line
                |> Task.mapErr \_ ->
                    (Exit 99 "System is failing")!

Roc’s error messages are sometimes extremely friendly and helpful, and sometimes leave a lot to be desired, so this code took me a lot of time to get right. I knew what layout I wanted, but getting the syntax and compiler to cooperate was about what I would expect from such a young language. The errors that have been mapped to friendly responses are very friendly, though, so I have a high opinion of the dev team’s respect for its users.

There isn’t really anything new here; I set up a pipeline that reads the file contents and outputs an appropriate error if the file doesn’t exist. Other errors just get bucketed in a couple of “I give up” error codes. The parentheses wrapping the “read” task were the ones that caused so much trouble; I knew I wanted to pipe the result out to Stdout.line, but getting Roc to pick up the ! to await the result at the right time was tricky.

At any rate, I can run roc dev -- hello.wat and see the contents of that file on my terminal now. Let’s also write the code to output our result to a file with a similar name.

Writing the result

Let’s start by throwing together a “dummy” compiler function that just returns an arbitrary list of bytes:

compile : Str -> List U8
compile = \input ->
    dbg input

    Str.toUtf8 "TODO: Compile Input"

The first line is optional and just specifies the intended types for the function. Roc is able to infer all these types on its own, but by specifying them ourselves, we are able to both document what the function does and have the Roc compiler validate that the implementation matches our expectations.

We can test this function by piping the output of our readUtf8 task to the compile function and outputting it. The tail end of the pipeline in main now looks like this:

                |> compile
                |> Str.fromUtf8
                |> Result.withDefault "Unable to convert UTF-8"
                |> Stdout.line
                |> Task.mapErr \_ ->
                    (Exit 99 "System is failing")!

The first three lines are new; I’m piping the output of the File.readUtf8 task to the compile function, then piping the resulting bytes to Str.fromUtf8. This operation returns a Result because it can fail (although in our case we know it won’t because we are certain we just wrote valid UTF-8). The Roc standard lib doesn’t have any sort of unwrap functionality, so I passed it to Result.withDefault instead.
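The fallback behaviour is easy to see outside of the pipeline. The following is a small sketch under the assumption that the builtins behave as described above; decodeOrDefault is a hypothetical name that just bundles the Str.fromUtf8 and Result.withDefault steps.

```roc
# Hypothetical helper: decode bytes as UTF-8, falling back to a fixed
# message when the bytes are not valid UTF-8.
decodeOrDefault : List U8 -> Str
decodeOrDefault = \bytes ->
    bytes
    |> Str.fromUtf8
    |> Result.withDefault "Unable to convert UTF-8"

# Bytes produced by Str.toUtf8 always decode back to the original string.
expect decodeOrDefault (Str.toUtf8 "(module)") == "(module)"
# 255 can never appear in valid UTF-8, so the default is used.
expect decodeOrDefault [255] == "Unable to convert UTF-8"
```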

I confirmed that it is outputting my dummy string on the console, so the compile function seems to be working. Not correctly, but it’s returning a list of bytes, which is enough for us to write the file.

The next step is to replace the bit that writes the bytes to stdout with something that writes them to a file. I’m liking this pipeline syntax, but the functions for writing bytes to a file take the path to be written to as the first argument. This means we can’t pipe it directly (I wish Roc had a feature to pipe to a specific argument in the function call like Gleam does).

We could just assign the bytes to a variable and write them in a separate step, but instead I’m going to write a new top-level function to handle this:

writeWithWasmExtension : List U8, Str -> Task.Task {} [Exit I32 Str]
writeWithWasmExtension = \bytes, inputFilename ->

    outputPath =
        inputFilename
        |> Path.fromStr
        |> Path.withExtension "wasm"

    outputPath
    |> Path.writeBytes bytes
    |> Task.mapErr \_ -> Exit 5 "Unable to write $(outputPath |> Path.display)"

Note: This code needs a new import pf.Path line at the top of the file.

This code accepts the compiled bytes as the first argument to fit in with our pipeline. The second argument is a string containing the input filename. The return type is a Task.

In the body of the function, I set up two pipelines. The first constructs a new Path from the inputFilename and replaces the extension to get the output path. I capture the output path in a variable so I can reuse it in the Task.mapErr call in the second pipeline. We pipe the path into Path.writeBytes, passing the bytes as the second argument. This returns a task, which I map to a system exit suitable to be returned from main. It is debatable whether this is good design; it means that this function is only useful in the context of main, but it also means the main task doesn’t need to be cluttered with more mapErr calls. Since I don’t intend to use this function anywhere else, I chose to include the mapErr in the function.

The result is that all the code I added to main earlier to call compile and output it to Stdout can now be replaced with just two lines:

                |> compile
                |> writeWithWasmExtension filename

Now here’s the cool part: We just wrote the last side effect for this program! The compile function itself will be 100% functional; given the same input it will always have the same output. We have a lot of Roc code to write, but we’ve isolated the bits that need to interact directly with the platform to just two defs (main and writeWithWasmExtension).

In the next article, we’ll (start to) build a compiler!

Build a Wasm Compiler in Roc - Part 2

Dusty Phillips