Build a Wasm Compiler in Roc

In earlier articles, we implemented a tokenizer and parser to convert Wasm’s WAT syntax into an S-expression abstract syntax tree and started to implement a transformer to convert that AST into one more suitable for generating Wasm bytecode.

Now that the boilerplate is in place, this article will continue to implement recursive transformer functions, getting into some of the more complicated expressions in the “hello world” WAT example.

I should mention that the code in this article gets long and repetitive. I’m including all the code we’ll need for the hello world example, but if you feel like it’s getting a bit boring, feel free to skim bits of it! I now understand why most “introduction to compiler” articles focus on such a tiny amount of syntax! I feel committed now, though, so if you’re willing to bear with me, we’ll see this through to the end.

So far, we can parse (basic) modules, so long as they contain only memory and export expressions. Let’s next add a data expression. There is only one data expression in our “hello world” example, but it is undoubtedly the most important data in the world (considering how many times it has been typed over the years…):

    (data (offset (i32.const 8)) "hello world")

The spec for data segments is fairly straightforward, though I’m obviously going to simplify it because, “Oh my god, this blog series has gotten out of hand.”

The offset syntax is a bit odd. The children of that expression can be any sequence of constant Wasm instructions, though a single i32.const is most common. The (i32.const 8) instruction pushes the number 8 onto the runtime’s stack. However, my understanding is that these instructions are evaluated at compile time. In practice, the limitation to constant instructions means you can do a bit of arithmetic in there if you need to, but not much else. I’m going to try to make the data accept more than one instruction, but I’m not going to test it.

Another thing to note is that if there is only one instruction, the runtime is supposed to support leaving the offset keyword off. I’m not going to bother with that. It’s just a bunch of extra match arms without very much education!

Transforming Instructions

Let’s start by adding a new type for an i32.const instruction. In a real app, we’d probably try to separate things on the . because the instruction for e.g. i32.const and i64.const probably have plenty of overlap. In the example we are compiling, all instructions happen on i32, so I’m just going to treat it as a one name.

I32ConstInstruction : U32

Note that the value is a U32 instead of an I32, even though the wasm value is an I32. (The difference is that I32 allows negative numbers, but the maximum value is only half of the maximum of a U32). Wasm uses the I32 type for both signed and unsigned integers, and the interpretation depends on context. We don’t have that context, and I’m hoping only the runtime actually needs that context. In any event, our “Hello World” script doesn’t do any negative integer calculation, so we can get away with ignoring it.

Now we can add a transformI32ConstInstruction:

transformI32Const : Transformer I32ConstInstruction
transformI32Const = \children ->
    when children is
        [Term { term: Number value }] -> Ok value
        _ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected i32.const format. One number  expected." }]

Once again, I’m cheating on the error case. I should really match on any other term or S-expression so I can pull the position out of the first element like we did for transformMem. However, I found that caused a lot of Roc compiler discomfort, so let’s just pretend we’ve done that, ok?

Unlike our previous transform* functions, it is not legal to put a i32.const instruction at the top level in a module. So it doesn’t make sense to add a match arm for i32.const to the transformModule function. Instead, let’s flesh out a new function for transforming a list of instructions.

Instructions are the things that make an assembly-style language actually do things. In Wasm, they are usually doing things to the stack. The point is, there are a fair number of Wasm instructions, and each one has its own structure and prerequisites.

Can anyone smell a tagged union?

Let’s add a new Instruction type that is, yes indeed, a tagged union:

Instruction : [I32Const (Common.Positionable I32ConstInstruction)]

While we’re at it, let’s update the existing (currently empty) Data record type to include an offset as a list of instructions, as well as the bytes that are supposed to be placed at that offset in the default memory:

Data : {
    offset : List Instruction,
    bytes : Str,
}

I made it so that the offset could theoretically contain multiple instructions. I only added one type of instruction, but if someone wanted to support other const expressions, they would be able to add the types to this tag.

Now we can start roughing out a transformInstructions function that accepts a list of children and tries to build a list of Instruction records out of it:

transformInstructions : Transformer (List Instruction)
transformInstructions = \children ->
    List.walk
        children
        (Ok [])
        \currentResult, child ->
            when child is
                Term { position } ->
                    mergeError
                        currentResult
                        { position, message: "Expected instruction, got term" }

                SExpression { position, name: "i32.const", children: expressionChildren } ->
                    mergeResult
                        currentResult
                        (transformI32Const expressionChildren)
                        \currentList, element ->
                            List.append currentList (I32Const { position, element })

                SExpression { position, name } ->
                    mergeError
                        currentResult
                        { position, message: "Unexpected instruction $(name)" }

I went to the trouble of showing how to non-lazily handle errors this time. If we get a Term or an SExpression that is not an i32.const, we throw an error using the handy mergeError function. If we get an i32.const, we extract it using the transformI32Const function we defined earlier, and use mergeResult with a mapper function that appends the tag to a list.

When we add more instructions later, we’ll be able to add more match arms to this function.

Transforming the data section

Realistically, I could/should have added some expectations for the functions we just wrote, but the more Roc I write, the more confident I am that “if it compiles, it works” tends to hold true. Usually when I encounter languages that support this promise, they make me angry quickly because the “if it compiles” part is so slow to achieve. But Roc has been (so far) fast enough that I’m starting to rely on the compiler as rudimentary unit test coverage.

Probably not the best habit to get into, though, so let’s write an expectation now before we write the transformData function:

expect
    # (module
    #   (data (offset (i32.const 8)) "hello world")
    # )
    result = transform
        (
            SExpression {
                position: { column: 1, row: 1 },
                name: "module",
                state: Closed,
                children: [
                    SExpression {
                        position: { column: 3, row: 2 },
                        name: "data",
                        state: Closed,
                        children: [
                            SExpression {
                                position: { column: 9, row: 2 },
                                name: "offset",
                                state: Closed,
                                children: [
                                    SExpression {
                                        position: { column: 17, row: 2 },
                                        name: "i32.const",
                                        state: Closed,
                                        children: [
                                            Term {
                                                position: { column: 28, row: 2 },
                                                term: Number 8,
                                            },
                                        ],
                                    },
                                ],
                            },
                            Term {
                                position: { row: 1, column: 30 },
                                term: String "hello world",
                            },
                        ],
                    },
                ],
            }
        )

    result
    ==
    Ok {
        position: { row: 1, column: 1 },
        element: {
            funcs: [],
            mems: [],
            datas: [
                {
                    position: { row: 2, column: 3 },
                    element: {
                        offset: [I32Const { position: { row: 2, column: 17 }, element: 8 }],
                        bytes: "hello world",
                    },
                },
            ],
            imports: [],
            exports: [],
        },
    }

Ok, when that test passes (possibly with a couple tweaks, but I think I got it close), I will assert that data sections are “good enough”.

First, let’s define the transformData function, which starts out fairly familiar:

transformData : Transformer Data
transformData = \children ->
    when children is
        [SExpression { position, name: "offset", children: offsetChildren }, Term { term: String string }] ->
            transformInstructions offsetChildren
            |> Result.map \instructions -> {
                offset: instructions,
                bytes: string,
            }

        _ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected data segment" }]

I was able to use the Result.map function to translate a success condition into something that has been wrapped in a position. I went back to cheating on the error conditions, so our error positions are going to be annoying.

Then I needed to do some light copy-pasting to add data arms to the transformModule function:

            SExpression { position, name: "data", children: [] } ->
                mergeError currentResult { position, message: "Invalid data: offset and bytes required" }

            SExpression { position, name: "data", children: expressions } ->
                mergeResult currentResult (transformData expressions) \module, data -> { module &
                        datas: List.append module.datas { position, element: data },
                    }

The unit test passes, so let’s move on before we find something wrong with it!

Import expressions

Import expressions are rather complicated, and we’ll probably find it easiest to build them from the inside out. Here’s the import expression from our “hello world” sample:

    (import "wasi_snapshot_preview1" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

As usual, the WAT grammar has a few different structures for import expressions, and you can import things other than func from them, but we’re just going to model one of them.

The overall structure is an expression named import with three children. The first two children are strings representing the namespace and name of the import; these are used to communicate with the runtime that needs to hook up the import machinery.

The third param is an expression whose structure depends on what you are importing. Since we are importing a func, it is followed by the identifier of that func and two more expressions representing the list of params and list of results.

Other than the complexity of the expressions, the key thing to be aware of with imports is that (when adding functions) we need to update the function types section of the module as well as the imports section.

Let’s start at the inside by parsing types. Conveniently, “hello world” only uses one type (i32), so we can keep it simple. For imaginary future extensibility, we can make it a tagged union with (for now) just one tag:

Type : [I32]

For our target sample, the only place we need types is in parameter and results lists, so we may as well parse them as a list:

transformType : Transformer (List (Common.Positionable Type))
transformType = \children ->
    List.walk children (Ok []) \currentResult, child ->
        when child is
            Term { position, term: Name "i32" } ->
                mergeResult
                    currentResult
                    (Ok { position, element: I32 })
                    List.append

            Term { position, term: Name name } ->
                mergeError
                    currentResult
                    { position, message: "Unexpected type: $(name)" }

            Term { position } ->
                mergeError
                    currentResult
                    { position, message: "Not a type" }

            SExpression { position } ->
                mergeError
                    currentResult
                    { position, message: "Not a type" }

            _ -> crash "Hit unexpected match arm in transformType" # Pretty sure this shouldn't be necessary

Pretty straightforward; I tried to give slightly interesting error messages if it encounters anything other than an i32. The crash arm feels like it shouldn’t be necessary, but either I haven’t specified the other types exhaustively, or the complier wasn’t able to confirm that they were exhaustive.

Just because we can, let’s write some quick tests for this:

expect
    # (...i32, i32)
    result = transformType [
        Term { position: { row: 1, column: 5 }, term: Name "i32" },
        Term { position: { row: 1, column: 9 }, term: Name "i32" },
    ]

    result
    ==
    Ok [
        { element: I32, position: { column: 5, row: 1 } },
        { element: I32, position: { column: 9, row: 1 } },
    ]

expect
    # (...f32) isn't supported yet
    result = transformType [
        Term { position: { row: 1, column: 5 }, term: Name "f32" },
    ]

    result == Err [{ position: { column: 5, row: 1 }, message: "Unexpected type: f32" }]

Nothing too unusual in there, so let’s next try to parse a function signature. When we get around to generating code, the function types will go in a separate section of the binary, and must include both import types and the types of functions defined in the module.

We have two functions in our hello world syntax: one in the import statement, and one defined in the module body. We’ll focus on the import statement for now, as the signature of the one in the body is calculated differently. The WAT syntax looks like this:

(func $identifier (param ...) (result ...))

As usual, the formal grammar includes variations on this syntax, but for my sanity, I’ll assume they all look like this.

Let’s add a new FuncType top level definition:

FuncType : {
    identifier : Str,
    param : List (Common.Positionable Type),
    result : List (Common.Positionable Type),
}

Now we can write a rudimentary trnasformFuncType function:

transformFuncType : Transformer FuncType
transformFuncType = \children ->
    when children is
        [Term { position, term: Variable identifier }, SExpression { name: "param", children: paramChildren }, SExpression { name: "result", children: resultChildren }] ->
            withParams = mergeResult
                (Ok { identifier, param: [], result: [] })
                (transformTypes paramChildren)
                \current, param -> { current &
                        param,
                    }
            mergeResult
                withParams
                (transformTypes resultChildren)
                \current, result -> { current &
                        result,
                    }

        _ -> Err [{ position: { row: 0, column: 0 }, message: "Inappropriate error handling in transformFuncType" }]

I say “rudimentary” because I’m not even pretending to handle errors this time! Let’s add a similarly uninspired test:

expect
    # (... $identifier (param i32 i32) (result i32))
    result = transformFuncType [
        Term { position: { row: 1, column: 5 }, term: Variable "$identifier" },
        SExpression {
            position: { row: 1, column: 17 },
            name: "param",
            state: Closed,
            children: [
                Term { position: { row: 1, column: 24 }, term: Name "i32" },
                Term { position: { row: 1, column: 28 }, term: Name "i32" },
            ],
        },
        SExpression {
            position: { row: 1, column: 33 },
            name: "result",
            state: Closed,
            children: [
                Term { position: { row: 1, column: 41 }, term: Name "i32" },
            ],
        },
    ]

    result
    == Ok {
        identifier: "$identifier",
        param: [
            { position: { row: 1, column: 24 }, element: I32 },
            { position: { row: 1, column: 28 }, element: I32 },
        ],
        result: [
            { position: { row: 1, column: 41 }, element: I32 },
        ],
    }

Now rinse and repeat (and I truly apologize for the monotony) with the Import type:

Import : {
    namespace : Str,
    name : Str,
    definition : [Func Str],
}

The definition is a tagged union in case we want to add other types of imports later. But notice that it does not include the function definition we just went to so much trouble to parse! This is the type that we will be attaching to the Module type. We record the namespace and module in the imports section, but the funcType needs to go in a new types section:

Module : {
    types : List (Common.Positionable FuncType),
    funcs : List (Common.Positionable Func),
    mems : List (Common.Positionable Mem),
    datas : List (Common.Positionable Data),
    imports : List (Common.Positionable Import),
    exports : List (Common.Positionable Export),
}

We’ll also need to add types to emptyModule:

emptyModule = {
    types: [],
    funcs: [],
    mems: [],
    datas: [],
    imports: [],
    exports: [],
}

I had to update a couple module unit tests with types: [] as well.

So when we update the Module after parsing an import, we need to update two different fields on it.

Let’s add a TransformedImport type to be the return value of our upcoming transformImport function; it will look more like you expected:

# The result of `transformImport` is split into `types` and `imports` section
# on the module
TransformedImport : {
    namespace : Str,
    name : Str,
    definition : [Func FuncType],
}

Now we can write the half-assed transformImport implementation:

transformImport : Transformer TransformedImport
transformImport = \children ->
    when children is
        [Term { term: String namespace }, Term { term: String name }, SExpression { name: "func", children: funcChildren }] ->
            funcChildren
            |> transformFuncType
            |> Result.map \functionDefinition -> { namespace, name, definition: Func functionDefinition }

        _ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected import encountered" }]

Now we have to update the transformModule function to handle top-level import expressions. This isn’t going to be a straight-up copy-pasting like the previous when arms because we need to extract TransformedImport into separate types and imports sections in the module:

            SExpression { position, name: "import", children: [] } ->
                mergeError currentResult { position, message: "Invalid import: namespace, name, and definition required" }

            SExpression { position, name: "import", children: expressions } ->
                mergeResult
                    currentResult
                    (transformImport expressions)
                    (\module, { name, namespace, definition: Func func } ->
                        { module &
                            types: List.append module.types { position, element: func },
                            imports: List.append module.imports { position, element: { name, namespace, definition: Func func.identifier } },
                        }
                    )

I have to admit, this code was pretty tricky to craft. When you return the wrong type from a when arm, Roc just throws up its arms and says “something’s wrong in this when arm: here are the types I saw and what I expected.” But manually analyzing those types to see where it got confused (especially if recursive types are involved) can be pretty difficult.

My solution is often to hand the code to Claude.ai and say “what’s wrong?” But frankly, if the AI can figure it out, the complier should have been able to give me better feedback in the first place. I’m sure it’s coming!

The test for this is bloody huge to set up, but it’s not really any different from anything else we’ve defined:

expect
    # (module
    #   (import
    #     "wasi_snapshot_preview1"
    #     "fd_write"
    #     (func $fd_write (param i32 i32 i32 i32) (result i32))
    #   )
    # )
    result = transform
        (
            SExpression {
                position: { column: 1, row: 1 },
                name: "module",
                state: Closed,
                children: [
                    SExpression {
                        position: { column: 3, row: 2 },
                        name: "import",
                        state: Closed,
                        children: [
                            Term {
                                position: { row: 3, column: 4 },
                                term: String "wasi_snapshot_preview1",
                            },
                            Term {
                                position: { row: 3, column: 4 },
                                term: String "fd_write",
                            },
                            SExpression {
                                position: { column: 4, row: 5 },
                                name: "func",
                                state: Closed,
                                children: [
                                    Term {
                                        position: { row: 5, column: 10 },
                                        term: Variable "fd_write",
                                    },
                                    SExpression {
                                        position: { column: 20, row: 5 },
                                        name: "param",
                                        state: Closed,
                                        children: [
                                            Term {
                                                position: { column: 27, row: 5 },
                                                term: Name "i32",
                                            },
                                            Term {
                                                position: { column: 31, row: 5 },
                                                term: Name "i32",
                                            },
                                            Term {
                                                position: { column: 35, row: 5 },
                                                term: Name "i32",
                                            },
                                            Term {
                                                position: { column: 39, row: 5 },
                                                term: Name "i32",
                                            },
                                        ],
                                    },
                                    SExpression {
                                        position: { column: 44, row: 5 },
                                        name: "result",
                                        state: Closed,
                                        children: [
                                            Term {
                                                position: { column: 51, row: 5 },
                                                term: Name "i32",
                                            },
                                        ],
                                    },
                                ],
                            },
                        ],
                    },
                ],
            }
        )

    result
    ==
    Ok {
        position: { column: 1, row: 1 },
        element: {
            datas: [],
            exports: [],
            funcs: [],
            imports: [
                {
                    position: { column: 3, row: 2 },
                    element: {
                        definition: Func "fd_write",
                        name: "fd_write",
                        namespace: "wasi_snapshot_preview1",
                    },
                },
            ],
            mems: [],
            types: [
                {
                    position: { column: 3, row: 2 },
                    element: {
                        identifier: "fd_write",
                        param: [
                            {
                                element: I32,
                                position: { column: 27, row: 5 },
                            },
                            {
                                element: I32,
                                position: { column: 31, row: 5 },
                            },
                            {
                                element: I32,
                                position: { column: 35, row: 5 },
                            },
                            {
                                element: I32,
                                position: { column: 39, row: 5 },
                            },
                        ],
                        result: [
                            {
                                element: I32,
                                position: { column: 51, row: 5 },
                            },
                        ],
                    },
                },
            ],
        },
    }

This article has been rather code-heavy and, if you’ve been following the series, rather new-information-sparse. I apologize if it was boring to read… To be honest, it was pretty boring to write! We have one more type to transform in the next article, and then we’ll be ready to move on to code generation!

Build a Wasm Compiler in Roc - Part 10

Transforming Instructions

Transforming the data section

Import expressions

Dusty Phillips