Build a Wasm Compiler in Roc - Part 14
In earlier articles, we have implemented a tokenizer, parser, and transformer to convert WAT syntax to a Wasm AST and got started on the code generation.
This part will continue with code generation. We’ll start with the import section because it has a couple interesting complications that we need to take into account.
The Import Section
Consider this wat
import statement:
(module
(import "wasi_snapshot_preview1" "fd_write" (
func $fd_write (param i32 i32 i32 i32) (result i32))
)
)
If we look at the bytes in the binary representation of this section, it’s surprisingly long:
❯ wasm-tools dump main.wat
0x0 | 00 61 73 6d | version 1 (Module)
| 01 00 00 00
0x8 | 01 09 | type section
0xa | 01 | 1 count
--- rec group 0 (implicit) ---
0xb | 60 04 7f 7f | [type 0] SubType { is_final: true, supertype_idx: None, composite_type: Func(FuncType { params: [I32, I32, I32, I32], results: [I32] }) }
| 7f 7f 01 7f
0x13 | 02 23 | import section
0x15 | 01 | 1 count
0x16 | 16 77 61 73 | import [func 0] Import { module: "wasi_snapshot_preview1", name: "fd_write", ty: Func(0) }
| 69 5f 73 6e
| 61 70 73 68
| 6f 74 5f 70
| 72 65 76 69
| 65 77 31 08
| 66 64 5f 77
| 72 69 74 65
| 00 00
0x38 | 00 12 | custom section
0x3a | 04 6e 61 6d | name: "name"
| 65
0x3f | 01 0b | function name section
0x41 | 01 | 1 count
0x42 | 00 08 66 64 | Naming { index: 0, name: "fd_write" }
| 5f 77 72 69
| 74 65
The first eight bytes describe the wasm module as usual, and then we have 11
bytes describing the type
section, which we already covered in Part 12.
Then we see the import section itself, which we’ll discuss shortly.
But after the import section, we see a custom section
. Custom sections have
the section id 00
, but unlike other sections, they can go between any other
section and there can be more than one of them. Custom sections have a name
followed by any arbitrary bytes. Custom sections are completely ignored by the
WAT runtime, but they can be used to provide additional information to tooling such
as debuggers and LSPs.
The Wasm standard largely ignores custom sections, with the exception of a name section. If the custom section has the name “name”, then it is expected that the bytes in that section conform to the description in the Custom Section Appendix.
The name section is intended to attach names to identifiers such as functions. The functions are stored in a flat list and Wasm looks them up using integer indices into that list. However, if you want to e.g. run a debugger or convert Wasm code back to WAT code, it helps to know what the original human readable identifiers were for those names.
Since this is a for-fun project that nobody in their right mind will use for debugging, we’ll skip the custom name section, even though standard Wasm tools generate it.
That means we only need to generate the import
section, which is quite long
because it contains some strings:
0x13 | 02 23 | import section
0x15 | 01 | 1 count
0x16 | 16 77 61 73 | import [func 0] Import { module: "wasi_snapshot_preview1", name: "fd_write", ty: Func(0) }
| 69 5f 73 6e
| 61 70 73 68
| 6f 74 5f 70
| 72 65 76 69
| 65 77 31 08
| 66 64 5f 77
| 72 69 74 65
| 00 00
Because those bytes are kind of opaque, it may help to see which of them translate directly to characters by using xxd:
00000000: 00 61 73 6d 01 00 00 00 .asm....
00000008: 01 09 01 60 04 7f 7f 7f ...`....
00000010: 7f 01 7f 02 23 01 16 77 ....#..w
00000018: 61 73 69 5f 73 6e 61 70 asi_snap
00000020: 73 68 6f 74 5f 70 72 65 shot_pre
00000028: 76 69 65 77 31 08 66 64 view1.fd
00000030: 5f 77 72 69 74 65 00 00 _write..
The first byte is 0x02
, which is the id for the import section. The second byte
is 0x23
, which is the hex interpretation of the decimal 35
: there are 35
bytes in the import section.
Then we have a vector containing all the imports; the first byte in the vector
is the integer 01
reflecting the length of the vectior. There is only one
import in this example.
The 0x16
is the start of another vector and represents the fact that there are
22 bytes (hex 16
= decimal 22
) in the “namespace” string, which is
“wasi_snapshot_preview1”. The bytes for that string follow, ending with 0x31
.
Next we have the byte 08
, which is the length of the string, “fd_write”,
followed by the bytes of that string, which end with 0x65
.
Finally, we have two 00
bytes. The first of these is a flag that can be one
of 00
, 01
, 02
, or 03
, depending on what kind of item we are importing.
Since our sample app only supports importing functions, we will only need 00
,
for functions.
The final 00
is the LEB128 index in the types table of the function type
that is being imported. This may cause us a bit of trouble because it is not
self contained. Given just the module.imports
, we don’t know what index the
function type is in the table. We need module.types
as well.
Let’s try to solve that first. Our concatSection
function accepts a function
that accepts an arbitrary object and returns a List U8
. One option we have is
to make the “arbitrary object” be something other than module.imports
. For
example, it could be a tuple of (module.imports, module.types)
. But that feels
clunky. Let’s see if we can be more clever.
To solve this, we can create a function that accepts the module.types
and
converts it to a dictionary for easy lookup. The interesting part will be that
the return value of module types will be a function (so it’s a function that
returns a function). Specifically, it will be a function that is able to
satisfy the generateOne
contract when generating imports.
generateImportFactory : List (Common.Positionable Transformer.FuncType) -> (Transformer.Import -> List U8)
generateImportFactory = \funcTypes ->
typeDict =
funcTypes
|> List.map Common.extractElement
|> List.mapWithIndex \funcType, index -> (funcType.identifier, index)
|> Dict.fromList
\importDef ->
when importDef.definition is
Func identifier if Dict.contains typeDict identifier ->
# TODO: Format import here
[]
Func identifier ->
crash "Unexpected identifier shouldn't get past AST generation"
_ ->
crash "Haven't implemented non func imports"
I called my function generateImportFactory
, which I’m pretty sure is not an
idiomatic functional language name, and it very much gives me Java flashbacks.
The point is, this function accepts whatever the module’s types are and
converts those types to a “name to index” dictionary lookup. Then it returns a
function that needs to do the actual generateImport
logic. For now it’s an
empty list.
Now that it exists, hooking this factory up in generate
is pretty easy:
generate : Common.Positionable Transformer.Module -> List U8
generate = \modulePositioned ->
module = modulePositioned |> Common.extractElement
[0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]
|> concatSection 0x01 module.types generateFuncType
|> concatSection 0x02 module.imports (generateImportFactory module.types)
|> concatSection 0x05 module.mems generateMem
|> concatSection 0x0b module.datas generateData
We need to call the new function with module.types
and pass the resulting
function to concatSection
.
Before we implement the actual import generation, let’s create a fragile and annoying to debug unit test:
expect
# (module
# (import "wasi_snapshot_preview1" "fd_write" (
# func $fd_write (param i32 i32 i32 i32) (result i32))
# )
# )
result = generate {
position: { row: 1, column: 1 },
element: {
types: [
{
position: { row: 3, column: 48 },
element: {
identifier: "fd_write",
param: [
{ element: I32, position: { row: 3, column: 30 } },
{ element: I32, position: { row: 3, column: 34 } },
{ element: I32, position: { row: 3, column: 38 } },
{ element: I32, position: { row: 3, column: 42 } },
],
result: [{ element: I32, position: { row: 3, column: 55 } }],
},
},
],
funcs: [],
mems: [],
datas: [],
imports: [
{
position: { row: 2, column: 4 },
element: {
namespace: "wasi_snapshot_preview1",
name: "fd_write",
definition: Func "fd_write",
},
},
],
exports: [],
},
}
result
== [
0x00, 0x61, 0x73, 0x6d,
0x01, 0x00, 0x00, 0x00,
0x01, 0x09, # type section 1
0x01, 0x60, 0x04, 0x7f, 0x7f,
0x7f, 0x7f, 0x01, 0x7f,
0x02, 0x23, # import section 2
0x01, # One import
0x16, # namespace length
0x77, 0x61, 0x73, 0x69, 0x5f, 0x73,
0x6e, 0x61, 0x70, 0x73, 0x68, 0x6f,
0x74, 0x5f, 0x70, 0x72, 0x65, 0x76,
0x69, 0x65, 0x77, 0x31,
0x08, # name length
0x66, 0x64, 0x5f, 0x77, 0x72, 0x69,
0x74, 0x65,
0x00, # Func type
0x00, # function id looked up in the types table.
]
Note: This is even more annoying in a real Roc file where the auto formatter messes up the comments.
Now we just need to replace the TODO
in generateImportFactory
with a pretty
little pipeline that concatenates vector representations of the namespace and
name to the func identifier 0x00
. Finally, we append the index of the type by
looking it up in the typedDict
we generated at the top of the outer function.
\importDef ->
when importDef.definition is
Func identifier if Dict.contains typeDict identifier ->
[]
|> List.concat (generateVector (Str.toUtf8 importDef.namespace) \b -> [b])
|> List.concat (generateVector (Str.toUtf8 importDef.name) \b -> [b])
|> List.append 0x00
|> List.concat (generateU32 ((Dict.get typeDict identifier) |> Result.withDefault 999999 |> Num.toU32))
The test passes now, so I’m going to claim imports are working! Only two more sections to go! Unfortunately, I have just determined that I messed up on the Transformer a bit, so we’ll need to go back and solve that.
Fixing function types in transformer
When I wrote the transformer for functions, I neglected to add the function to
the types
table in the AST. This is partially because the WAT syntax I was using
doesn’t mention the types, and the sample we are coding to is a function with
zero parameters or return values.
I don’t want to take the time to handle creating functions with parameters, so I’m going to resort to a horrible hack.
Our transformFunc
type currently only works on functions that don’t have parameters
or responses. Here is the current implementation:
transformFunc : Transformer Func
transformFunc = \children ->
when children is
[Term { term: Variable identifier }, .. as funcInstructions] ->
funcInstructions
|> transformInstructions
|> Result.map \instructions -> {
identifier,
instructions,
}
_ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected Instruction" }]
The single possible success arm assumes that everything that comes after the identifier in a function definition is an instruction. In reality, you can also pass param and result elements in a few different formats. For example, here’s a simple add function in WAT syntax:
(module
(func $add (export "add") (param $a i32) (param $b i32) (result i32)
(i32.add (local.get $a) (local.get $b))
)
)
If we tried to plug this into our current transformFunc
, it would call
transformInstruction
on a param
element and transformInstruction
would
choke on it as an invalid instruction (which it is).
If we want to process param and result SExpression
s, we’d need to use various
List
functions to extract the param and result ones. What I’m going to do
instead is make this arm legitimately only handle functions that don’t have any
param or result parameters.
To do this I’ll need a little helper function:
containsParamsOrResult : List Parser.Expression -> Bool
containsParamsOrResult = \children ->
List.any children \child ->
when child is
SExpression { name: "param" } | SExpression { name: "result" } -> Bool.true
_ -> Bool.false
This returns true if any of the elements in the list of expressions is an
SExpression
with a param or result name.
Then I can update the match arm in the transformFunc
function to use this new
function as a guard:
[Term { term: Variable identifier }, .. as funcInstructions] if !(containsParamsOrResult funcInstructions) ->
Next, I need to make transformFunc
return a tuple of Func
and FuncType
.
Since we’re only handling arms where there are no params or results, we can set
the FuncType
to be one with the given identifier, but empty param and result
lists:
transformFunc : Transformer (Func, FuncType)
transformFunc = \children ->
when children is
[Term { term: Variable identifier }, .. as funcInstructions] if !(containsParamsOrResult funcInstructions) ->
funcInstructions
|> transformInstructions
|> Result.map \instructions -> (
{
identifier,
instructions,
},
{
identifier,
param: [],
result: [],
},
)
_ -> Err [{ position: { row: 0, column: 0 }, message: "Unexpected Instruction" }]
Then we need to modify the when
arm in the transformModule
function so that it knows how to handle the
tuple we just returned:
SExpression { position, name: "func", children: expressions } ->
mergeResult currentResult (transformFunc expressions) \module, (func, funcType) -> { module &
types: List.append module.types { position, element: funcType },
funcs: List.append module.funcs { position, element: func },
}
Notice the callback we pass to mergeResult
; I use destructuring to convert
the tuple back to func
and funcType
variables. Previously, this function only
updated the funcs
, List but now it updates types
as well.
I have one expectation failing now because it doesn’t have anything in the
types
array when parsing functions. I’ll let you solve that one on your own.
Hopefully if/when we get around to implementing other functions, this change will make it easy to add functions with parameters or results.
This refactor should put our Transformer in a good position to actually implement functions, but I’m going to leave that for the next article.