In earlier articles, we implemented a tokenizer and parser to convert Wasm’s WAT syntax into an S-expression abstract syntax tree and started to implement a transformer to convert that AST into one more suitable for generating Wasm bytecode.

This article continues where we left off, building the last piece of the transformer. This piece needs to represent arbitrary instructions inside a function body, so it’s going to take a bit of massaging. But we’ll get there!

Where were we?

The code we want to be able to parse is the WAT syntax for “hello world”:

(module
    (import "wasi_snapshot_preview1" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

    (memory 1)
    (export "memory" (memory 0))
    (export "_start" (func $main))

    (data (offset (i32.const 8)) "hello world")

    (func $main
        (i32.const 0)
        (i32.const 8)
        (i32.store)

        (i32.const 4)
        (i32.const 11)
        (i32.store)

        (i32.const 1)
        (i32.const 0)
        (i32.const 1)
        (i32.const 20)
        (call $fd_write)
        (drop)
    )
)

Compared to higher level languages, this is a lot of code for hello world! We have so far parsed the import, memory, exports, and data expressions along with all their children. Now comes the hard part.

Analyzing WAT function syntax

The func keyword is followed by a $ identifier for the function. It can include some shortcuts to define types and exports inline, but we can skip that. I’m kind of surprised we don’t need a function type; I’m not sure if that’s because it’s a void function (no parameters or return value) or because it’s the _start function so the runtime can infer the type.

At any rate, it seems we can ignore types to compile this function in this particular code, so I will! Ignore them, that is.

The bulk of the function is taken up with instructions. Wasm has a pretty big collection of instructions, but we thankfully only need to implement four of them:

  • i32.const pushes a numeric value onto the stack. There are other const types, but we are only dealing with 32-bit integers. Further, we have the luxury of assuming they are all unsigned integers. We already parsed the i32.const instruction when we were setting up the data section in part 9!
  • i32.store pops two i32 values off the stack and uses them as the offset and value to insert a value into the default memory.
  • The call instruction invokes another function. The identifier of the function is an integer, but the WAT syntax is using a identifier that will map to that index depending which index our import (since $fd_write is an imported function) put that function at. Any arguments to the function must be on the stack at the time of the call.
  • The drop instruction just drops the topmost value from the stack. In this case, it effectively ignores the result of the call to the $fd_write function.

Note: It is possible to nest expressions in WAT to reduce verbosity. For example, (i32.store (i32.const 0) (i32.const 8)) can replace having three instructions in a row, and call can also have embedded arguments. I chose to keep the more verbose syntax strictly because it’s easier to transform.

Let’s take a quick break to make sure we understand what all the numbers in the hello world example are actually doing:

The memory 1 instruction tells the runtime to create a linear memory with at least one page (64 Kb, far more than we need) of data.

The code in our example uses the data expression to insert the words “hello world” in the default memory at position 8.

The two i32.store calls store the values 8 and 11 as 32-bit (4 byte) integers at positions 0 and 4. These effectively fill up the empty space we left before “hello world” in memory at position 8. So the memory now looks like this:

|8 <4 bytes> | 11 <4 bytes> | hello world |

The 8 represents the index of the first byte of “hello world”, which is at that position because that’s where we told data to write it. The 11 is the length of “hello world”.

The call instruction receives four constant numbers 1, 0, 1, and 20. These constants are pushed onto the runtime’s stack, and do not go “in” the linear memory. The first 1 is the file descriptor we want to write to (1 means stdout). The 0 is the index in memory of an “iovec”. In WASI, an iovec is an eight byte sequence that contains two integers. So the 8 and 11 we wrote independently before are now being passed to WASI as a single iovec. The second 1 says the length of the iovec is 1. Because of the interpretation of iovec, 1 iovec is 8 bytes.

Finally, the 20 that we pass to $fd_write is the location that we want fdwrite to write the result. In this case, we don’t care what the result is, so we just pick a position that is above the end of “hello world” in the memory.

Transforming Instructions

We already have a basic Instruction type and a transformer function to transform instructions, but both currently only support i32.const instructions. Let’s extend this to add i32.store:

Instruction : [I32Const (Common.Positionable U32), I32Store Common.Position]

The i32.store instruction uses the stack to load its values, so the instruction doesn’t have any parameters. However, we do want to store the position. I haven’t been very consistent about storing positions in the transformer because I don’t think they’ll be needed during code generation. However, in a real compiler, you would want to pass this information down to code generation so it can insert debug symbols.

Now we need to add i32.store to the transformInstructions transformer. This function is already walking the children and applying a when statement, so we just need to add one new match arm. Here is the function in its entirety:

transformInstructions : Transformer (List Instruction)
transformInstructions = \children ->
    List.walk
        children
        (Ok [])
        \currentResult, child ->
            when child is
                Term { position } ->
                    mergeError
                        currentResult
                        { position, message: "Expected instruction, got term" }

                SExpression { position, name: "i32.const", children: expressionChildren } ->
                    mergeResult
                        currentResult
                        (transformI32Const expressionChildren)
                        \currentList, element ->
                            List.append currentList (I32Const { position, element })

                SExpression { position, name: "i32.store", children: [] } ->
                    currentResult
                    |> Result.map \currentList -> List.append currentList (I32Store position)

                SExpression { position, name } ->
                    mergeError
                        currentResult
                        { position, message: "Unexpected instruction $(name)" }

The store arm is simpler than the i32.const because (in our version), the instruction doesn’t have any children. So we just map the result and append the instruction if it exists. (If it does have children, it would fall through to the wildcard arm).

For the call instruction, we need to store the identifier of the function being called:

Instruction : [
    I32Const (Common.Positionable U32),
    I32Store Common.Position,
    Call (Common.Positionable { identifier : Str }),
]

I have a feeling future editions of Call will need to store more than just an identifier, so I put it in a record.

The match arm is also fairly simple:

SExpression { position, name: "call", children: [Term { term: Variable identifier }] } ->
    currentResult
    |> Result.map \currentList -> List.append currentList (Call { position, element: { identifier } })

The compiler is satisfied, and I’m again feeling confident enough that I don’t need to test this function with expectations. We’ll have an expectation for the full “hello world” example at the end.

The last instruction is drop, which is just as simple:

Instruction : [
    I32Const (Common.Positionable U32),
    I32Store Common.Position,
    Call (Common.Positionable { identifier : Str }),
    Drop Common.Position,
]

and

                SExpression { position, name: "drop", children: [] } ->
                    currentResult
                    |> Result.map \currentList -> List.append currentList (Drop position)

Transforming Functions

We have an empty Func type in our module already, so let’s fill it up. For our purposes, a Func will always be just an identifier followed by a sequence of the instructions we just figured out how to parse. Here’s the type:

Func : {
    identifier : Str,
    instructions : List Instruction,
}

And the transform function:

transformFunc : Transformer Func
transformFunc = \children ->
    when children is
        [Term { term: Variable identifier }, .. as funcInstructions] ->
            funcInstructions
            |> transformInstructions
            |> Result.map \instructions -> {
                identifier,
                instructions,
            }

        _ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected Instruction" }]

And here’s a very large test for it. I sincerely regret trying to carry positions forward through everything, as it really made the code a lot more complicated. Realistic, yes, but perhaps not educational:

expect
    # (module
    #   (func $myfunc
    #     (i32.const 8)
    #     (i32.const 2)
    #     (i32.store)
    #   )
    # )
    result = transform
        (
            SExpression {
                position: { column: 1, row: 1 },
                name: "module",
                state: Closed,
                children: [
                    SExpression {
                        position: { column: 3, row: 2 },
                        name: "func",
                        state: Closed,
                        children: [
                            Term {
                                position: { row: 2, column: 9 },
                                term: Variable "myfunc",
                            },
                            SExpression {
                                position: { column: 6, row: 3 },
                                name: "i32.const",
                                state: Closed,
                                children: [
                                    Term {
                                        position: { row: 3, column: 13 },
                                        term: Number 8,
                                    },
                                ],
                            },
                            SExpression {
                                position: { column: 6, row: 4 },
                                name: "i32.const",
                                state: Closed,
                                children: [
                                    Term {
                                        position: { row: 4, column: 13 },
                                        term: Number 2,
                                    },
                                ],
                            },
                            SExpression {
                                position: { column: 6, row: 5 },
                                name: "i32.store",
                                state: Closed,
                                children: [],
                            },
                        ],
                    },
                ],
            }
        )

    result
    ==
    Ok {
        element: {
            datas: [],
            exports: [],
            funcs: [
                {
                    element: {
                        identifier: "myfunc",
                        instructions: [
                            I32Const { element: 8, position: { column: 6, row: 3 } },
                            I32Const { element: 2, position: { column: 6, row: 4 } },
                            I32Store { column: 6, row: 5 },
                        ],
                    },
                    position: { column: 3, row: 2 },
                },
            ],
            imports: [],
            mems: [],
            types: [],
        },
        position: { column: 1, row: 1 },
    }

And…. OMG, I think that’s it! Yes!

I am able to run roc dev -- hello.wat and it doesn’t crash. That tells me that we’re tokenizing, parsing, and transforming every piece of syntax that is encountered in that little file. It’s been a tough slog, but we made it this far!

In the next part, we’ll start doing code generation. I’m hoping that will be fairly straightforward, although I expect there’s going to be some confusing bit twiddling before we are able to match the output from wasm-tools. See you in part… 12, I think. I’ve lost count. Yes. Part 12, it is.