Build a Wasm Compiler in Roc - Part 11
In earlier articles, we implemented a tokenizer and parser to convert Wasm’s WAT syntax into an S-expression abstract syntax tree and started to implement a transformer to convert that AST into one more suitable for generating Wasm bytecode.
This article continues where we left off, building the last piece of the transformer. This piece needs to represent arbitrary instructions inside a function body, so it’s going to take a bit of massaging. But we’ll get there!
Where were we?
The code we want to be able to parse is the WAT syntax for “hello world”:
(module
(import "wasi_snapshot_preview1" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))
(memory 1)
(export "memory" (memory 0))
(export "_start" (func $main))
(data (offset (i32.const 8)) "hello world")
(func $main
(i32.const 0)
(i32.const 8)
(i32.store)
(i32.const 4)
(i32.const 11)
(i32.store)
(i32.const 1)
(i32.const 0)
(i32.const 1)
(i32.const 20)
(call $fd_write)
(drop)
)
)
Compared to higher level languages, this is a lot of code for hello world! We have so far parsed the import, memory, exports, and data expressions along with all their children. Now comes the hard part.
Analyzing WAT function syntax
The func
keyword is followed by a $
identifier for the function. It can
include some shortcuts to define types and exports inline, but we can skip
that. I’m kind of surprised we don’t need a function type; I’m not sure if
that’s because it’s a void function (no parameters or return value) or because
it’s the _start
function so the runtime can infer the type.
At any rate, it seems we can ignore types to compile this function in this particular code, so I will! Ignore them, that is.
The bulk of the function is taken up with instructions. Wasm has a pretty big collection of instructions, but we thankfully only need to implement four of them:
i32.const
pushes a numeric value onto the stack. There are otherconst
types, but we are only dealing with 32-bit integers. Further, we have the luxury of assuming they are all unsigned integers. We already parsed thei32.const
instruction when we were setting up thedata
section in part 9!i32.store
pops twoi32
values off the stack and uses them as theoffset
andvalue
to insert a value into the defaultmemory
.- The
call
instruction invokes another function. The identifier of the function is an integer, but the WAT syntax is using a identifier that will map to that index depending which index our import (since$fd_write
is an imported function) put that function at. Any arguments to the function must be on the stack at the time of thecall
. - The
drop
instruction just drops the topmost value from the stack. In this case, it effectively ignores the result of the call to the$fd_write
function.
Note: It is possible to nest expressions in WAT to reduce verbosity. For example,
(i32.store (i32.const 0) (i32.const 8))
can replace having three instructions in a row, andcall
can also have embedded arguments. I chose to keep the more verbose syntax strictly because it’s easier to transform.
Let’s take a quick break to make sure we understand what all the numbers in the hello world example are actually doing:
The memory 1
instruction tells the runtime to create a linear memory with at least
one page (64 Kb, far more than we need) of data.
The code in our example uses the data
expression to insert the words “hello world”
in the default memory at position 8.
The two i32.store
calls store the values 8
and 11
as 32-bit (4 byte)
integers at positions 0 and 4. These effectively fill up the empty space we
left before “hello world” in memory at position 8. So the memory now looks like
this:
|8 <4 bytes> | 11 <4 bytes> | hello world |
The 8
represents the index of the first byte of “hello world”, which is at
that position because that’s where we told data
to write it. The 11
is the
length of “hello world”.
The call
instruction receives four constant numbers 1, 0, 1, and 20. These
constants are pushed onto the runtime’s stack, and do not go “in” the linear
memory. The first 1
is the file descriptor we want to write to (1 means
stdout). The 0
is the index in memory of an “iovec”. In WASI, an iovec
is an eight byte sequence that contains two integers. So the 8
and 11
we wrote independently before are now being passed to WASI
as a single
iovec. The second 1
says the length of the iovec is 1
. Because of the
interpretation of iovec
, 1 iovec is 8 bytes.
Finally, the 20 that we pass to $fd_write
is the location that we want
fdwrite
to write the result. In this case, we don’t care what the result is, so
we just pick a position that is above the end of “hello world” in the memory.
Transforming Instructions
We already have a basic Instruction
type and a transformer function to
transform instructions, but both currently only support i32.const
instructions. Let’s extend this to add i32.store
:
Instruction : [I32Const (Common.Positionable U32), I32Store Common.Position]
The i32.store
instruction uses the stack to load its values, so the
instruction doesn’t have any parameters. However, we do want to store the
position. I haven’t been very consistent about storing positions in the
transformer because I don’t think they’ll be needed during code generation.
However, in a real compiler, you would want to pass this information down to
code generation so it can insert debug symbols.
Now we need to add i32.store
to the transformInstructions
transformer. This
function is already walking the children and applying a when
statement, so we just
need to add one new match arm. Here is the function in its entirety:
transformInstructions : Transformer (List Instruction)
transformInstructions = \children ->
List.walk
children
(Ok [])
\currentResult, child ->
when child is
Term { position } ->
mergeError
currentResult
{ position, message: "Expected instruction, got term" }
SExpression { position, name: "i32.const", children: expressionChildren } ->
mergeResult
currentResult
(transformI32Const expressionChildren)
\currentList, element ->
List.append currentList (I32Const { position, element })
SExpression { position, name: "i32.store", children: [] } ->
currentResult
|> Result.map \currentList -> List.append currentList (I32Store position)
SExpression { position, name } ->
mergeError
currentResult
{ position, message: "Unexpected instruction $(name)" }
The store
arm is simpler than the i32.const
because (in our version), the
instruction doesn’t have any children. So we just map
the result and append
the instruction if it exists. (If it does have children, it would fall through
to the wildcard arm).
For the call
instruction, we need to store the identifier of the function
being called:
Instruction : [
I32Const (Common.Positionable U32),
I32Store Common.Position,
Call (Common.Positionable { identifier : Str }),
]
I have a feeling future editions of Call
will need to store more than just an
identifier, so I put it in a record.
The match arm is also fairly simple:
SExpression { position, name: "call", children: [Term { term: Variable identifier }] } ->
currentResult
|> Result.map \currentList -> List.append currentList (Call { position, element: { identifier } })
The compiler is satisfied, and I’m again feeling confident enough that I don’t need to test this function with expectations. We’ll have an expectation for the full “hello world” example at the end.
The last instruction is drop
, which is just as simple:
Instruction : [
I32Const (Common.Positionable U32),
I32Store Common.Position,
Call (Common.Positionable { identifier : Str }),
Drop Common.Position,
]
and
SExpression { position, name: "drop", children: [] } ->
currentResult
|> Result.map \currentList -> List.append currentList (Drop position)
Transforming Functions
We have an empty Func
type in our module already, so let’s fill it up. For
our purposes, a Func
will always be just an identifier followed by a sequence
of the instructions we just figured out how to parse. Here’s the type:
Func : {
identifier : Str,
instructions : List Instruction,
}
And the transform function:
transformFunc : Transformer Func
transformFunc = \children ->
when children is
[Term { term: Variable identifier }, .. as funcInstructions] ->
funcInstructions
|> transformInstructions
|> Result.map \instructions -> {
identifier,
instructions,
}
_ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected Instruction" }]
And here’s a very large test for it. I sincerely regret trying to carry positions forward through everything, as it really made the code a lot more complicated. Realistic, yes, but perhaps not educational:
expect
# (module
# (func $myfunc
# (i32.const 8)
# (i32.const 2)
# (i32.store)
# )
# )
result = transform
(
SExpression {
position: { column: 1, row: 1 },
name: "module",
state: Closed,
children: [
SExpression {
position: { column: 3, row: 2 },
name: "func",
state: Closed,
children: [
Term {
position: { row: 2, column: 9 },
term: Variable "myfunc",
},
SExpression {
position: { column: 6, row: 3 },
name: "i32.const",
state: Closed,
children: [
Term {
position: { row: 3, column: 13 },
term: Number 8,
},
],
},
SExpression {
position: { column: 6, row: 4 },
name: "i32.const",
state: Closed,
children: [
Term {
position: { row: 4, column: 13 },
term: Number 2,
},
],
},
SExpression {
position: { column: 6, row: 5 },
name: "i32.store",
state: Closed,
children: [],
},
],
},
],
}
)
result
==
Ok {
element: {
datas: [],
exports: [],
funcs: [
{
element: {
identifier: "myfunc",
instructions: [
I32Const { element: 8, position: { column: 6, row: 3 } },
I32Const { element: 2, position: { column: 6, row: 4 } },
I32Store { column: 6, row: 5 },
],
},
position: { column: 3, row: 2 },
},
],
imports: [],
mems: [],
types: [],
},
position: { column: 1, row: 1 },
}
And…. OMG, I think that’s it! Yes!
I am able to run roc dev -- hello.wat
and it doesn’t crash. That tells me
that we’re tokenizing, parsing, and transforming every piece of syntax that is
encountered in that little file. It’s been a tough slog, but we made it this far!
In the next part, we’ll start doing code generation. I’m hoping that will be fairly straightforward, although I expect there’s going to be some confusing bit twiddling before we are able to match the output from wasm-tools. See you in part… 12, I think. I’ve lost count. Yes. Part 12, it is.