Build a Wasm Compiler in Roc - Part 2
My latest hare-brained project is a WAT-to-Wasm compiler written in the Roc programming language. I explained my (ir)rationale for the project in Part 1 of this series, which also included an introduction to the technologies we’ll be using.
Note: Other articles in this series are collected here.
In this article, we’ll get started writing some Roc code. We won’t be doing anything with WAT or Wasm yet, but by the end we will be able to parse command-line arguments, load an input file, and write an output file.
Reminder: You are reading content that took a great deal of effort to craft, compose, and debug. If you appreciate this work, consider supporting me on Patreon or GitHub.
Let’s Write Some Roc
I have no idea whether Roc is suitable for this project. I started it because I wanted to try out Roc, and a WAT-to-Wasm compiler was simply the idea that popped into my head as an excuse to write some Roc code.
In their own words, Roc is a fast, friendly, functional language. They have delightful definitions of those words on their homepage, and their tutorial is one of the most pleasant reads I’ve come across in the many programming language tours I have travelled. I am not necessarily going to explain every piece of Roc syntax that we encounter, so, as I suggested in part 1, you may want to read through that tutorial before continuing here.
A basic “Hello world” in Roc looks like this:
```roc
app [main] { pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.12.0/Lb8EgiejTUzbggO2HVVuPJFkwvvsfW6LojkLR20kTVE.tar.br" }

import pf.Stdout

main =
    Stdout.line! "Hello World!"
```
The first line tells Roc that we want to run our functional code within a specific version of the “basic-cli” platform. This platform is a runtime that knows how to write to standard output and read and write the file system, among other things. Roc compiles your Roc code to machine code and combines it with whichever platform you chose to create a single binary. There is an interesting symmetry with Wasm itself, where you need some kind of host to compile and run the Wasm code and to manage system side effects.
In addition to supplying a runtime, the platform includes some side-effect-causing code that we will need to access. The first line gives our platform an alias of pf, and the import statement imports the Stdout module from pf.
This module contains a function named line, which returns a Task. A Roc Task is kind of like a promise, future, coroutine, or task in many other languages. The ! in the Stdout.line! call is syntactic sugar for “await this task”.
If this has you concerned about the colored function problem, don’t worry. There’s only one way to access a Roc task (at least, within any one platform), and the blocking vs non-blocking question doesn’t come up.
The main = ... construct is known in Roc as a definition (often shortened to “def”). It assigns the result of an expression to a name (main in this case). The basic-cli platform expects a main def to be a task, and this becomes the entry point for the Roc program.
To run the above code, save the file as main.roc and use the command roc dev. This will build the file in dev mode and run it.
Before we get into the details of actually compiling WAT code, we need to load that code into our program. To do that, we’ll need to accept a single filename and read that file from the file system. Once we have the file contents as a string, we can pass that to our yet-to-be-written compiler logic and do everything in a purely functional style. Eventually we’ll get the compiled Wasm bytes back and need to output those to a file.
CLI Arguments
Let’s start with the CLI arguments. The basic-cli platform gives us a function that returns a Task, as described here.
To start, let’s ask the platform to output the list of arguments the user passed. We can use the list function from the Arg module like this:
```roc
import pf.Task
import pf.Arg

main =
    args = Arg.list! {}
    dbg args
    Task.ok {}
```
Note: I’ll leave the “app [main]” line that imports the platform off this and future examples.
This code makes an args definition by calling the Arg.list! task exposed by the platform. We have to pass an empty record to this function, which is what the {} is on that line.
Then we use the dbg builtin to output a string representation of the arguments on the command line. dbg is the only way to print a value that doesn’t depend on the Roc platform in use, and it is stripped from production builds.
Finally, we set the “result” of this main definition to the output of the Task.ok {} function. The platform expects the main definition to always return a Task. In this case, there’s no failure case, so we just return a Task with an empty ok value. Introspecting using the LSP, I can see that the type of main is:
```roc
Task.Task {} []
```
The platform documentation defines a Task as Task ok err and says it “represents an effect; an interaction with state outside your Roc program, such as the terminal’s standard output, or a file.”
In this case, we are returning a task with an empty record as the ok value and an empty tag union ([]) as the error type. Since an empty tag union has no possible values, our program can’t fail.
If I run the above program using e.g. roc dev -- foo, the output looks like this:
```
[main.roc:11] args = ["/var/folders/y4/j6_s7k615dd1vywjbg4jpg3m0000gn/T/.tmpWqPjHI/roc_app_binary", "foo"]
```
The first argument in the list is a temporary path to the dev-mode binary Roc built for me, and “foo” is the single argument I passed after -- (anything before that would be interpreted as an argument to the roc dev subcommand).
Before we start introducing potential errors, a good policy in Roc development is to ensure all errors are properly handled. By default, Roc will bubble up any errors to the platform and it will output some kind of traceback that is probably not what a user wants to see. To catch this behaviour at compile time, we can add a type definition to the main task like this:
```roc
main : Task.Task {} [Exit I32 Str]
main =
    # ...the rest of the definition
```
That new line adds a type annotation to the definition. It means main will always be a Task with an empty ok value, and an error value that is always an Exit tag carrying an integer exit code and a string message.
Note: main is technically not a function. It is a task. It doesn’t accept arguments, and even though it is a multi-line definition, the code in it cannot be “called” other than when the main definition is loaded by the basic-cli platform.
Tags are a fascinating and distinctive concept in Roc. They are something like the algebraic data types, sum types, or tagged unions of some other languages, but they feel more… “dynamic”. They are statically typed at compile time, but the type inference is so awesome that it feels like writing dynamic code. Soundly typed dynamic code? Sign me up for more!
A tag is a literal value, similar to an integer or a string. All tags must start with a capital letter; any time you see a bare capitalized name in an expression, it is a tag (note that capitals are also used for types and module names).
The difference from most languages I’ve used is that tags group themselves implicitly. You don’t have to declare a sum type up front, as you would with an enum in e.g. Rust:
```rust
enum StringResult {
    Ok(String),
    Err(String),
}
```
Instead, you can just return an Ok or an Err, and Roc automatically infers that this is the return type of your function. And it still soundly type-checks everything! You can restrict tags to certain types when it is useful, but it’s not required.
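As a minimal sketch of that inference (the checkFilename function and the EmptyFilename tag are hypothetical, invented for illustration), here is a Roc module that returns Ok and Err with no enum declaration anywhere. It is written as a standalone module so that the expect lines, Roc’s built-in inline tests, can be run with roc test; I’m assuming the same Roc version as the basic-cli 0.12 examples in this article.

```roc
module [checkFilename]

# Hypothetical helper: no enum is declared anywhere. Roc infers the
# tag union [Ok Str, Err [EmptyFilename]] from the tags returned below.
checkFilename = \filename ->
    if Str.isEmpty filename then
        Err EmptyFilename
    else
        Ok filename

expect checkFilename "" == Err EmptyFilename
expect checkFilename "hello.wat" == Ok "hello.wat"
```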
Consider a well-formed main definition that expects exactly one command-line argument:
```roc
main : Task.Task {} [Exit I32 Str]
main =
    args = Arg.list! {}
    when args is
        [] | [_] -> Task.err (Exit 1 "No input filename provided")
        [_, _, _, ..] -> Task.err (Exit 2 "Too many arguments")
        [_, filename] ->
            dbg filename
            Task.ok {}
```
The when clause is an elegantly sparse syntax for pattern matching. The first pattern matches the case when there are zero or one arguments (since the path to the program is always the first argument, the zero-argument case is technically impossible, but we cover it to satisfy the compiler).
The second match arm matches the case where there are three or more arguments. It returns a failed task constructed by the Task.err function. Notice the parentheses after Task.err. Unlike in many languages (but similar to Haskell), these do not delimit the parameter list for the Task.err function. They are just used for grouping. If we didn’t include them, Roc would report an error because it would think we were passing three separate arguments (Exit, 2, and "Too many arguments") to the Task.err function, as opposed to a single Exit tag with a payload that contains two values.
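One way to make this argument check testable is to pull it into a pure function that returns a Result instead of a Task. This is a sketch, not the code the rest of this article uses: filenameFromArgs is a hypothetical name, and the "roc_app_binary" string in the tests stands in for the binary path that arrives as the first argument.

```roc
module [filenameFromArgs]

# Hypothetical pure version of the argument check in main. It returns a
# Result rather than a Task, so it can be tested without the platform.
filenameFromArgs : List Str -> Result Str [Exit I32 Str]
filenameFromArgs = \args ->
    when args is
        [] | [_] -> Err (Exit 1 "No input filename provided")
        [_, filename] -> Ok filename
        [_, _, _, ..] -> Err (Exit 2 "Too many arguments")

expect filenameFromArgs ["roc_app_binary", "hello.wat"] == Ok "hello.wat"
expect filenameFromArgs ["roc_app_binary"] == Err (Exit 1 "No input filename provided")
```

In main, you could then match on the returned Result, continuing with Ok filename and turning the Err case into a Task.err.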
Now, replace the dbg call with a call to Stdout.line! to output the argument:
```roc
[_, filename] ->
    Stdout.line! filename
```
I’ve also removed the Task.ok {}. Stdout.line! returns a task, so we can almost just return that task from this arm, as shown. However, Roc complains with an interesting error message:
```
── TYPE MISMATCH in main.roc ───────────────────────────────────────────────────

Something is off with the body of the main definition:

 9│  main : Task.Task {} [Exit I32 Str]
10│  main =
11│      args = Arg.list! {}
                ^^^^^^^^^^^^

This Task.await call produces:

    Task {} [
        Exit (Num *) …,
        StdoutErr [
            BrokenPipe,
            Interrupted,
            Other Str,
            OutOfMemory,
            Unsupported,
            WouldBlock,
            WriteZero,
        ],
    ]

But the type annotation on main says it should be:

    Task {} [Exit I32 …]

────────────────────────────────────────────────────────────────────────────────
```
This message reflects the fact that the Stdout.line call can potentially fail (though it’s pretty unlikely). The documentation for the function indicates that it returns a Task {} [StdoutErr Err].
Since this is the last expression in the match arm, we are “returning” that potential error from main, and the compiler complains because [StdoutErr Err] is not an accepted error type for the main definition.
The easiest way to silence this error is to comment out the main : Task.Task {} [Exit I32 Str] type annotation and let the platform deal with the output if StdoutErr is returned. But that would allow our main definition to return a Task with any error type, making it too easy to forget to handle other errors ourselves.
An alternative would be to add StdoutErr _ to our main type annotation like this:
```roc
main : Task.Task {} [Exit I32 Str, StdoutErr _]
main =
    # Rest of main
```
The square brackets in the second type argument to Task.Task indicate that main’s error type is a tag union. Previously, that tag union only allowed one tag (Exit I32 Str), but now it also accepts StdoutErr _. The _ acts as a wildcard; there are over half a dozen different possible StdoutErr varieties, which show up in their own tag union. In this case, I’m saying “I don’t care what type of StdoutErr you receive, but you should expect any that could arise.”
Given that StdoutErr is a fairly unlikely event in normal operation, we might be OK with the default handling if it ever happens in real life. But another alternative is to stick with the single Exit I32 Str error definition and use the Task.mapErr function when we call Stdout.line, like this:
```roc
main : Task.Task {} [Exit I32 Str]
main =
    args = Arg.list! {}
    when args is
        [] | [_] -> Task.err (Exit 1 "No input filename provided")
        [_, _, _, ..] -> Task.err (Exit 2 "Too many arguments")
        [_, filename] ->
            Stdout.line filename |> Task.mapErr \_ -> (Exit 99 "System is failing")!
```
That Stdout.line line has changed quite a bit. Instead of awaiting the task directly, we use the |> operator to “pipe” it in as the first argument of a call to the Task.mapErr function. The second argument is an anonymous function that takes the existing error (which we ignore with _) and returns a new error that obeys our Exit I32 Str type. Finally, we await this new task (the one that has had its error mapped) by placing the ! at the end of the line.
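Results have the same mapping helper: Result.mapErr takes a Result and a function that converts the old error into a new one, just as Task.mapErr does for tasks. Here is a small pure sketch, using a hypothetical parseExitCode helper that maps the failure from Str.toI32 into the same Exit I32 Str shape that main uses:

```roc
module [parseExitCode]

# Hypothetical helper: Str.toI32 fails with InvalidNumStr, and
# Result.mapErr replaces that error with an Exit tag, mirroring
# what Task.mapErr does to the Stdout.line task above.
parseExitCode : Str -> Result I32 [Exit I32 Str]
parseExitCode = \text ->
    Str.toI32 text
    |> Result.mapErr \_ -> Exit 99 "Not a number: $(text)"

expect parseExitCode "42" == Ok 42
expect parseExitCode "forty-two" == Err (Exit 99 "Not a number: forty-two")
```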
Reading the file
Now that we have the filename, we can attempt to read its contents into a string. As with Stdout.line, this is a task that the basic-cli platform performs for us. Since it touches the file system, it can result in a variety of errors (some of which are more likely than Stdout.line errors).
We’ll need to add an import to the top of our file, first:
```roc
import pf.File
```
Now we can replace the success arm in our main def with one that reads the file. This pipeline handles errors by mapping them to exit codes I made up on the fly:
```roc
[_, filename] ->
    (
        filename
        |> File.readUtf8
        |> Task.mapErr \error ->
            when error is
                FileReadErr _ NotFound ->
                    Exit 3 "$(filename) does not exist"

                FileReadErr _ _ ->
                    Exit 99 "Error reading $(filename)"

                FileReadUtf8Err _ _ ->
                    Exit 4 "Unable to read UTF8 in $(filename)"
    )!
    |> Stdout.line
    |> Task.mapErr \_ ->
        (Exit 99 "System is failing")!
```
Roc’s error messages are sometimes extremely friendly and helpful, and sometimes leave a lot to be desired, so this code took me a lot of time to get right. I knew what layout I wanted, but getting the syntax and compiler to cooperate was about what I would expect from such a young language. The errors that have been mapped to friendly responses are very friendly, though, so I have a high opinion of the dev team’s respect for its users.
There isn’t really anything new here; I set up a pipeline that reads the file contents and outputs an appropriate error if the file doesn’t exist. Other errors just get bucketed into a couple of “I give up” error codes. The parentheses wrapping the “read” task were the ones that caused so much trouble; I knew I wanted to pipe the result out to Stdout.line, but getting Roc to pick up the ! to await the result at the right time was tricky.
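Because the error-mapping lambda doesn’t touch the platform, it can be lifted out into a pure function and tested on its own. This is a sketch under that assumption: readErrToExit is a hypothetical name, and the {} and PermissionDenied values in the tests are stand-ins for the real path and error payloads the platform would supply (the _ patterns ignore them anyway).

```roc
module [readErrToExit]

# Hypothetical pure version of the Task.mapErr lambda above. The first
# payload of each tag (the path) is ignored, and only NotFound gets a
# dedicated exit code; every other read error falls into code 99.
readErrToExit = \error, filename ->
    when error is
        FileReadErr _ NotFound -> Exit 3 "$(filename) does not exist"
        FileReadErr _ _ -> Exit 99 "Error reading $(filename)"
        FileReadUtf8Err _ _ -> Exit 4 "Unable to read UTF8 in $(filename)"

expect readErrToExit (FileReadErr {} NotFound) "hello.wat" == Exit 3 "hello.wat does not exist"
expect readErrToExit (FileReadErr {} PermissionDenied) "hello.wat" == Exit 99 "Error reading hello.wat"
```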
At any rate, I can run roc dev -- hello.wat and see the contents of that file on my terminal now. Let’s also write the code to output our result to a file with a similar name.
Writing the result
Let’s start by throwing together a “dummy” compile function that just returns an arbitrary list of bytes:
```roc
compile : Str -> List U8
compile = \input ->
    dbg input
    Str.toUtf8 "TODO: Compile Input"
```
The first line is an optional type annotation that specifies the intended types for the function. Roc is able to infer all these types on its own, but by specifying them ourselves, we are able to both:
- document what the function accepts and returns
- have the Roc compiler validate that the implementation matches our expectations
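Since compile is pure, checking it doesn’t require the file system at all. A minimal sketch using Roc’s expect with the placeholder implementation above; the module header is an assumption so the file can be run on its own with roc test, and the byte count is simply the length of "TODO: Compile Input".

```roc
module [compile]

# The placeholder compiler from above. Whatever the input, it returns
# the UTF-8 bytes of a fixed string (dbg prints the input while testing).
compile : Str -> List U8
compile = \input ->
    dbg input
    Str.toUtf8 "TODO: Compile Input"

expect compile "(module)" == Str.toUtf8 "TODO: Compile Input"
expect List.len (compile "") == 19
```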
We can test this function by piping the output of our File.readUtf8 task to the compile function and printing the result. The tail end of the pipeline in main now looks like this:
```roc
|> compile
|> Str.fromUtf8
|> Result.withDefault "Unable to convert UTF-8"
|> Stdout.line
|> Task.mapErr \_ ->
    (Exit 99 "System is failing")!
```
The first three lines are new; I’m piping the output of the File.readUtf8 task to the compile function, then piping the resulting list of bytes to Str.fromUtf8. This operation returns a Result because it can fail (although in our case we know it won’t, because we are certain the dummy compile function produces valid UTF-8). The Roc standard library doesn’t have any sort of unwrap functionality, so I passed the Result to Result.withDefault instead.
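The Str.fromUtf8 and Result.withDefault steps can be checked in isolation. This sketch wraps them in a hypothetical decode function; the 0xFF byte is used because it can never appear in valid UTF-8, so it exercises the fallback branch.

```roc
module [decode]

# Hypothetical helper: Str.fromUtf8 returns a Result because arbitrary
# bytes may not be valid UTF-8; Result.withDefault unwraps Ok or
# substitutes the fallback string on Err.
decode : List U8 -> Str
decode = \bytes ->
    bytes
    |> Str.fromUtf8
    |> Result.withDefault "Unable to convert UTF-8"

expect decode [72, 105] == "Hi"
expect decode [0xFF] == "Unable to convert UTF-8"
```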
I confirmed that it outputs my dummy string on the console, so the compile function seems to be working. Not correctly, but it’s returning a list of bytes, which is enough for us to write the file.
The next step is to replace the bit that writes the bytes to stdout with something that writes them to a file. I’m liking this pipeline syntax, but the functions for writing bytes to a file take the path to be written to as the first argument. This means we can’t pipe the bytes in directly (I wish Roc had a feature to pipe to a specific argument in the function call, like Gleam does).
We could just assign the bytes to a variable and write them in a separate step, but instead I’m going to write a new top-level function to handle this:
```roc
writeWithWasmExtension : List U8, Str -> Task.Task {} [Exit I32 Str]
writeWithWasmExtension = \bytes, inputFilename ->
    outputPath =
        inputFilename
        |> Path.fromStr
        |> Path.withExtension "wasm"

    outputPath
    |> Path.writeBytes bytes
    |> Task.mapErr \_ -> Exit 5 "Unable to write $(outputPath |> Path.display)"
```
Note: This code needs a new import pf.Path statement at the top of the file.
This function accepts the compiled bytes as the first argument so that it fits into our pipeline. The second argument is a string containing the input filename. The return type is a Task.
In the body of the function, I set up two pipelines. The first constructs a new Path from the inputFilename and replaces the extension to get the output path. I capture the output path in a variable so I can reuse it in the Task.mapErr call in the second pipeline. We pipe the path into Path.writeBytes, passing the bytes as the second argument. This returns a task, which I map to a system exit suitable to be returned from main. It is debatable whether this is good design; it means that this function is only useful in the context of main, but it also means the main task doesn’t need to be cluttered with more mapErr calls. Since I don’t intend to use this function anywhere else, I chose to include the mapErr in the function.
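For intuition about what replacing the extension does to a plain filename, here is a rough string-only approximation. It is not how the platform implements Path.withExtension: wasmFilename is a hypothetical helper that only looks at the last dot, so a name like dir.v2/hello would be mangled into dir.wasm, which is a good reason to let Path handle real paths.

```roc
module [wasmFilename]

# Hypothetical, string-only approximation of swapping the extension.
# It splits on the last "." and ignores directory separators entirely.
wasmFilename : Str -> Str
wasmFilename = \filename ->
    when Str.splitLast filename "." is
        Ok { before } -> Str.concat before ".wasm"
        Err NotFound -> Str.concat filename ".wasm"

expect wasmFilename "hello.wat" == "hello.wasm"
expect wasmFilename "hello" == "hello.wasm"
```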
The result is that all the code I added to main earlier to call compile and output it to Stdout can now be replaced with just two lines:
```roc
|> compile
|> writeWithWasmExtension filename
```
Now here’s the cool part: we just wrote the last side effect for this program! The compile function itself will be 100% functional; given the same input, it will always produce the same output. We have a lot of Roc code to write, but we’ve isolated the bits that need to interact directly with the platform to just two defs (main and writeWithWasmExtension).
In the next article, we’ll (start to) build a compiler!