Build a Wasm Compiler in Roc - Part 11
In earlier articles, we implemented a tokenizer and parser to convert Wasm’s WAT syntax into an S-expression abstract syntax tree and started to implement a transformer to convert that AST into one more suitable for generating Wasm bytecode.
This article continues where we left off, building the last piece of the transformer. This piece needs to represent arbitrary instructions inside a function body, so it’s going to take a bit of massaging. But we’ll get there!
Where were we?
The code we want to be able to parse is the WAT syntax for “hello world”:
(module
(import "wasi_snapshot_preview1" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))
(memory 1)
(export "memory" (memory 0))
(export "_start" (func $main))
(data (offset (i32.const 8)) "hello world")
(func $main
(i32.const 0)
(i32.const 8)
(i32.store)
(i32.const 4)
(i32.const 11)
(i32.store)
(i32.const 1)
(i32.const 0)
(i32.const 1)
(i32.const 20)
(call $fd_write)
(drop)
)
)
Compared to higher level languages, this is a lot of code for hello world! We have so far parsed the import, memory, exports, and data expressions along with all their children. Now comes the hard part.
Analyzing WAT function syntax
The func keyword is followed by a $ identifier for the function. It can
include some shortcuts to define types and exports inline, but we can skip
that. I’m kind of surprised we don’t need a function type; I’m not sure if
that’s because it’s a void function (no parameters or return value) or because
it’s the _start function so the runtime can infer the type.
At any rate, it seems we can ignore types to compile this function in this particular code, so I will! Ignore them, that is.
The bulk of the function is taken up with instructions. Wasm has a pretty big collection of instructions, but we thankfully only need to implement four of them:
i32.constpushes a numeric value onto the stack. There are otherconsttypes, but we are only dealing with 32-bit integers. Further, we have the luxury of assuming they are all unsigned integers. We already parsed thei32.constinstruction when we were setting up thedatasection in part 9!i32.storepops twoi32values off the stack and uses them as theoffsetandvalueto insert a value into the defaultmemory.- The
callinstruction invokes another function. The identifier of the function is an integer, but the WAT syntax is using a identifier that will map to that index depending which index our import (since$fd_writeis an imported function) put that function at. Any arguments to the function must be on the stack at the time of thecall. - The
dropinstruction just drops the topmost value from the stack. In this case, it effectively ignores the result of the call to the$fd_writefunction.
Note: It is possible to nest expressions in WAT to reduce verbosity. For example,
(i32.store (i32.const 0) (i32.const 8))can replace having three instructions in a row, andcallcan also have embedded arguments. I chose to keep the more verbose syntax strictly because it’s easier to transform.
Let’s take a quick break to make sure we understand what all the numbers in the hello world example are actually doing:
The memory 1 instruction tells the runtime to create a linear memory with at least
one page (64 Kb, far more than we need) of data.
The code in our example uses the data expression to insert the words “hello world”
in the default memory at position 8.
The two i32.store calls store the values 8 and 11 as 32-bit (4 byte)
integers at positions 0 and 4. These effectively fill up the empty space we
left before “hello world” in memory at position 8. So the memory now looks like
this:
|8 <4 bytes> | 11 <4 bytes> | hello world |
The 8 represents the index of the first byte of “hello world”, which is at
that position because that’s where we told data to write it. The 11 is the
length of “hello world”.
The call instruction receives four constant numbers 1, 0, 1, and 20. These
constants are pushed onto the runtime’s stack, and do not go “in” the linear
memory. The first 1 is the file descriptor we want to write to (1 means
stdout). The 0 is the index in memory of an “iovec”. In WASI, an iovec
is an eight byte sequence that contains two integers. So the 8 and 11
we wrote independently before are now being passed to WASI as a single
iovec. The second 1 says the length of the iovec is 1. Because of the
interpretation of iovec, 1 iovec is 8 bytes.
Finally, the 20 that we pass to $fd_write is the location that we want
fdwrite to write the result. In this case, we don’t care what the result is, so
we just pick a position that is above the end of “hello world” in the memory.
Transforming Instructions
We already have a basic Instruction type and a transformer function to
transform instructions, but both currently only support i32.const
instructions. Let’s extend this to add i32.store:
Instruction : [I32Const (Common.Positionable U32), I32Store Common.Position]
The i32.store instruction uses the stack to load its values, so the
instruction doesn’t have any parameters. However, we do want to store the
position. I haven’t been very consistent about storing positions in the
transformer because I don’t think they’ll be needed during code generation.
However, in a real compiler, you would want to pass this information down to
code generation so it can insert debug symbols.
Now we need to add i32.store to the transformInstructions transformer. This
function is already walking the children and applying a when statement, so we just
need to add one new match arm. Here is the function in its entirety:
transformInstructions : Transformer (List Instruction)
transformInstructions = \children ->
List.walk
children
(Ok [])
\currentResult, child ->
when child is
Term { position } ->
mergeError
currentResult
{ position, message: "Expected instruction, got term" }
SExpression { position, name: "i32.const", children: expressionChildren } ->
mergeResult
currentResult
(transformI32Const expressionChildren)
\currentList, element ->
List.append currentList (I32Const { position, element })
SExpression { position, name: "i32.store", children: [] } ->
currentResult
|> Result.map \currentList -> List.append currentList (I32Store position)
SExpression { position, name } ->
mergeError
currentResult
{ position, message: "Unexpected instruction $(name)" }
The store arm is simpler than the i32.const because (in our version), the
instruction doesn’t have any children. So we just map the result and append
the instruction if it exists. (If it does have children, it would fall through
to the wildcard arm).
For the call instruction, we need to store the identifier of the function
being called:
Instruction : [
I32Const (Common.Positionable U32),
I32Store Common.Position,
Call (Common.Positionable { identifier : Str }),
]
I have a feeling future editions of Call will need to store more than just an
identifier, so I put it in a record.
The match arm is also fairly simple:
SExpression { position, name: "call", children: [Term { term: Variable identifier }] } ->
currentResult
|> Result.map \currentList -> List.append currentList (Call { position, element: { identifier } })
The compiler is satisfied, and I’m again feeling confident enough that I don’t need to test this function with expectations. We’ll have an expectation for the full “hello world” example at the end.
The last instruction is drop, which is just as simple:
Instruction : [
I32Const (Common.Positionable U32),
I32Store Common.Position,
Call (Common.Positionable { identifier : Str }),
Drop Common.Position,
]
and
SExpression { position, name: "drop", children: [] } ->
currentResult
|> Result.map \currentList -> List.append currentList (Drop position)
Transforming Functions
We have an empty Func type in our module already, so let’s fill it up. For
our purposes, a Func will always be just an identifier followed by a sequence
of the instructions we just figured out how to parse. Here’s the type:
Func : {
identifier : Str,
instructions : List Instruction,
}
And the transform function:
transformFunc : Transformer Func
transformFunc = \children ->
when children is
[Term { term: Variable identifier }, .. as funcInstructions] ->
funcInstructions
|> transformInstructions
|> Result.map \instructions -> {
identifier,
instructions,
}
_ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected Instruction" }]
And here’s a very large test for it. I sincerely regret trying to carry positions forward through everything, as it really made the code a lot more complicated. Realistic, yes, but perhaps not educational:
expect
# (module
# (func $myfunc
# (i32.const 8)
# (i32.const 2)
# (i32.store)
# )
# )
result = transform
(
SExpression {
position: { column: 1, row: 1 },
name: "module",
state: Closed,
children: [
SExpression {
position: { column: 3, row: 2 },
name: "func",
state: Closed,
children: [
Term {
position: { row: 2, column: 9 },
term: Variable "myfunc",
},
SExpression {
position: { column: 6, row: 3 },
name: "i32.const",
state: Closed,
children: [
Term {
position: { row: 3, column: 13 },
term: Number 8,
},
],
},
SExpression {
position: { column: 6, row: 4 },
name: "i32.const",
state: Closed,
children: [
Term {
position: { row: 4, column: 13 },
term: Number 2,
},
],
},
SExpression {
position: { column: 6, row: 5 },
name: "i32.store",
state: Closed,
children: [],
},
],
},
],
}
)
result
==
Ok {
element: {
datas: [],
exports: [],
funcs: [
{
element: {
identifier: "myfunc",
instructions: [
I32Const { element: 8, position: { column: 6, row: 3 } },
I32Const { element: 2, position: { column: 6, row: 4 } },
I32Store { column: 6, row: 5 },
],
},
position: { column: 3, row: 2 },
},
],
imports: [],
mems: [],
types: [],
},
position: { column: 1, row: 1 },
}
And…. OMG, I think that’s it! Yes!
I am able to run roc dev -- hello.wat and it doesn’t crash. That tells me
that we’re tokenizing, parsing, and transforming every piece of syntax that is
encountered in that little file. It’s been a tough slog, but we made it this far!
In the next part, we’ll start doing code generation. I’m hoping that will be fairly straightforward, although I expect there’s going to be some confusing bit twiddling before we are able to match the output from wasm-tools. See you in part… 12, I think. I’ve lost count. Yes. Part 12, it is.