I’ve been curious about the Roc programming language for a while now, but I haven’t taken the time to really dig into it. I’ve read through the tutorial several times, but only yesterday did I actually sit down, install Roc, and work through the tutorial. Today, I woke up with the ridiculous idea to build a compiler in the language.

A simple compiler, to be clear.

Note: Other articles in this series are collected here.

Another thing I’ve been wanting to learn more about is “What exactly is WebAssembly anyway?” I mean, I have a high-level understanding, but I’ve never really dug into the various programs and binaries that actually interact with it.

So today I want to become familiar with two completely different ecosystems: Roc and Wasm, while doing something I’ve never done before: build a compiler from the ground up.

I want to make it clear that I have no idea what I’m doing. About 70% of the time, me not knowing what I’m doing turns into a decent blog article. The other 30% of the time, the draft gets deleted. In this case, it has turned into a (hopefully) decent multi-part blog series. I’m at 20,000 words and counting, so I decided to split it up into several articles. These will be released over the course of a few weeks, and you can preview the latest content on my Patreon account.

I should mention that because I don’t know what I’m doing, parts of this series include some red herrings and false starts. I usually try to edit those out while also acknowledging that real development is never as pristine as authors make it out to be. In this case, I’ve chosen to leave some in. I’ve seen a bizarre number of junior engineers ask if it’s “ok to search google” and let me tell you, I’ve been searching the web as I code for 25+ years and still do it every day. If you already know how to do something, it’s already been done so it’s not worth doing it again!

This article is an introduction to the Roc and Wasm ecosystems. Subsequent posts go into detail on getting Roc up and running, and have way more code than I expected when I started out.

On monetary feedback

So far, I’ve been working between 10 and 16 hours a day on this project for one week. It’s only half done! If you value my content or the time I’ve put into it, please consider supporting me on Patreon or GitHub.

I’m currently (intentionally) unemployed / on sabbatical / pretending to start a company. This means I have a) free time for projects like this and b) no other source of income.

So if you think this content was worth something to you, a donation will be actively appreciated. Because my projects tend to cover a wide variety of topics that don’t strictly attract the same audience from one to the next, I suspect one-time payments on GitHub are more palatable than ongoing Patreon support, though I really appreciate the reliable ongoing Patreon income.

Enough begging, let’s get on with our preparations!

Some background reading

The Roc tutorial is one of the best-written introductions to a language I’ve ever read. Given that I have a hobby of reading and writing such tutorials, I consider that high praise! I recommend reading through the tutorial before starting this series. Roc has some unique concepts that I haven’t seen combined in this way before, and this series will assume the passing familiarity with those concepts that you would get by reading the tutorial (even if you don’t implement the examples).

On the topic of writing a compiler, I found a ridiculously approachable tutorial on building a compiler from scratch here. It compiles a simple Lisp-like syntax to Javascript. It is a concise but comprehensive overview of the process that you can easily digest in one sitting. This series doesn’t assume you have the level of knowledge that tutorial provides, so you can skip it if you want.

I have spent a lot of time with my head in the Wasm spec. It’s pretty heavy reading and I don’t expect you to have read it to understand this series. Wasm itself is pretty simple to understand, but the ecosystem has a lot of moving parts that are not as well-documented as they could be. Most of the documentation out there is about compiling existing languages (usually Rust) to Wasm, rather than understanding what is going on under the hood.

Necessary Tools

The wasm-tools package (brew install wasm-tools; probably similar for other operating systems and package managers) provides a wasm-tools binary whose parse subcommand converts wat files to wasm files and whose print subcommand does the inverse.

We’ll also need wasmtime (brew install wasmtime etc), a runtime for wasm files.

I imagine I’ll also be using xxd, and possibly nvim -b with :set display=uhex, to introspect the binary formats.

I have also installed Roc following the instructions.

Things I like about Roc

If you read through my archives, you’ll note I have a fairly long history of playing with random programming languages that nobody uses. I tend to read code like poetry and I appreciate languages that have a strong emphasis on aesthetics.

I’ve learned that I tend to like “functional but pragmatic” languages that rely heavily on pattern matching and have some unusual trait that makes them “different from all the rest.” Some of my favourite examples include Inko, Gleam, and ReScript, all of which I have written about in the past.

Roc has a fascinating approach to the “but pragmatic” part. The language itself is 100% side-effect free, also known as “purely functional.” On its own, that would make it useless: purely functional code (sometimes) sounds good on paper, but only programs that have side effects do anything useful.

So most functional languages have complicated and confusing escape hatches to document and manipulate side effects such as IO. Roc’s new and interesting idea is to keep the language purely functional and force the programmer to choose a platform (essentially a runtime) that provides the specific side effects they are interested in. Roc by itself can’t even write to stdout, but if you use the basic-cli platform, you can call into the platform to perform side effects that are useful when writing CLI tools, including accessing the filesystem, command-line arguments, and stdout.
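If the platform idea is unfamiliar, a loose analogy may help. Roc’s real mechanism is different. The Python sketch below models a pure program that only describes the effects it wants, and a host that performs only the effects it knows how to handle. All of the names here (WriteStdout, ReadFile, pure_main, tiny_platform) are hypothetical and made up for illustration. This is not how Roc or basic-cli are implemented.

```python
# A loose analogy only, NOT Roc's actual mechanism: the pure program returns
# descriptions of effects, and a separate host "platform" decides which of
# those effects it is willing and able to perform.
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class WriteStdout:
    text: str


@dataclass
class ReadFile:
    path: str


Effect = Union[WriteStdout, ReadFile]


def pure_main(args: list[str]) -> list[Effect]:
    # Pure: the same input always yields the same list of requested effects,
    # and nothing is printed or read here.
    return [WriteStdout(f"Hello, {name}!") for name in args]


def tiny_platform(program: Callable[[list[str]], list[Effect]], args: list[str]) -> list[str]:
    # Hypothetical host that only supports writing to stdout.
    # Output is collected in a list instead of printed, to keep it testable.
    stdout: list[str] = []
    for effect in program(args):
        if isinstance(effect, WriteStdout):
            stdout.append(effect.text)
        else:
            raise NotImplementedError(f"this platform does not support {effect!r}")
    return stdout
```

Swapping tiny_platform for a different host changes which effects are possible without touching pure_main, which is roughly the flexibility the platform design is after.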

These platforms can be written in any language so long as they can conform to Roc’s C ABI. The most common platforms are written in Rust (as is the Roc compiler itself), but there are examples in Zig, C, C++, Go, Python and others.

The other thing I really like about Roc is that it is very simple and elegant. It is strongly and soundly typed, but its functional design permits such thorough type inference that writing Roc feels as dynamic as writing Python.

The thing I don’t like about Roc

It’s young. Sometimes the compiler outright crashes or even segfaults. Frequently, the error messages are obtuse or non-existent (though the errors that the dev team have put effort into making friendly are a real joy to read). I’ve encountered random issues like “debugging a value causes a Rust-level panic,” “unit test works unless there are other files in the project,” and “Hey, that match pattern isn’t valid even though actually yes it is!”

But these are teething issues, not core problems with the language itself. If development continues, I think this language has a lot of potential. It’s much easier to reason about than OCaml or Haskell, has the developer velocity of Python or Javascript, and claims performance on par with Go or C# (but not C++ or Rust).

A brief introduction to Wasm and WAT

Possibly because of the poor choice of name, people tend to think of WebAssembly as “native code but for browsers.” It’s neither of those things. WebAssembly is more like a bytecode format and runtime, kinda similar to the bytecode formats used by Java and Python. The primary difference is that Wasm bytecode is easy to compile to native code on demand. This has a few benefits:

  • The Wasm runtime can compile it to native instructions efficiently, with excellent performance, after it is delivered to the target computer.
  • Compilers that traditionally target native instructions can instead compile to Wasm with relatively little effort and get guaranteed portability.
  • WebAssembly can provide a “sandbox” that prevents malicious code from accessing operating system functionality (e.g. creating files) it hasn’t been permitted to use. This is kind of like Roc’s platform architecture!

As for the “for browsers” part, Wasm doesn’t need to run in a browser at all. Indeed, the WASI (WebAssembly System Interface) project lets you build applications that interact with the command line and the operating system.

One of the reasons Wasm and WASI excite me is that code can be compiled once and run anywhere without the use of Docker. I truly despise Docker, and the idea that future projects can be built in a truly cross-platform ecosystem that runs anywhere without it is super motivating.

The main problem with Wasm, though, is that it is bytecode that is designed to be delivered to and interpreted by machines. That’s fine, but it means that one is extremely unlikely to write Wasm directly!

That’s where WAT comes in, the WebAssembly Text format. Admittedly, WAT is not a language you are likely to write directly either, but you can transpile back and forth between WAT and Wasm bytecode to visualize it.

Playing around with WAT syntax

It’s not normal to code directly in WAT, which is really no more than a textual format of an abstract syntax tree. Instead you’d write your code in AssemblyScript, Rust, C, or one of many other languages. Even Roc can compile to Wasm!

But since I’m building a WAT-to-Wasm compiler, I need to write my input and test code in WAT. Finding examples took a bit of digging, so I wanted to collect them in one place.

The simplest possible Wasm code is an empty module.

It looks like this in WAT:

(module)

This is an S-expression. An S-expression is basically a bunch of parentheses describing an abstract syntax tree. Each expression starts with the name of the node inside parentheses, followed by any child nodes and attributes. Think of Lisp, if thinking of Lisp is a thing you do.
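To show how little machinery the basic S-expression shape needs, here is a minimal Python reader sketch. It is not part of the series’ Roc compiler. It assumes only parentheses, bare atoms, and double-quoted strings without escape sequences. Real WAT also has comments, string escapes, and more, so treat this strictly as an illustration. The functions tokenize and parse are hypothetical names.

```python
# Minimal S-expression reader. Assumes input contains only parentheses,
# bare atoms, and double-quoted strings without escapes (real WAT has more).
import re

# Optional leading whitespace, then exactly one of: "(", ")", a quoted string, or an atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|("[^"]*")|([^\s()"]+))')


def tokenize(src: str) -> list[str]:
    tokens = []
    pos = 0
    src = src.rstrip()  # drop trailing whitespace so the loop ends cleanly
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        # Exactly one group matched; keep its text (strings keep their quotes).
        tokens.append(next(g for g in m.groups() if g is not None))
        pos = m.end()
    return tokens


def parse(tokens: list[str]) -> list:
    # Returns one node: a list whose first item is the node name,
    # followed by its children (atoms or nested lists).
    if not tokens:
        raise SyntaxError("empty input")

    def read(i: int):
        tok = tokens[i]
        if tok == "(":
            node = []
            i += 1
            while i < len(tokens) and tokens[i] != ")":
                child, i = read(i)
                node.append(child)
            if i >= len(tokens):
                raise SyntaxError("missing )")
            return node, i + 1  # skip the closing ")"
        if tok == ")":
            raise SyntaxError("unexpected )")
        return tok, i + 1  # an atom

    tree, end = read(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens after the top-level expression")
    return tree
```

With this sketch, parse(tokenize("(module)")) gives a one-element list holding just the node name, and nested forms such as (offset (i32.const 8)) become nested lists.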

The above code compiles to an eight-byte sequence, where the first four bytes are a “magic string” identifying this as a wasm binary and the next four are the Wasm version (1):

❯ wasm-tools parse main.wat > main.wasm

❯ wasm-tools dump main.wasm
 0x0 | 00 61 73 6d | version 1 (Module)
     | 01 00 00 00
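These eight bytes are easy to reproduce by hand, which makes them a handy first test for any WAT-to-Wasm compiler. The magic bytes spell "\0asm" in ASCII (0x00 0x61 0x73 0x6d). The version is a 32-bit little-endian integer, which is why 1 shows up as 01 00 00 00. Here is a small Python sketch of that preamble. The names empty_module and is_wasm_v1 are hypothetical helpers for illustration only.

```python
import struct

# Magic bytes: "\0asm" (0x00 0x61 0x73 0x6d).
WASM_MAGIC = b"\x00asm"
# Version 1 as a 32-bit little-endian integer: 01 00 00 00.
WASM_VERSION = struct.pack("<I", 1)


def empty_module() -> bytes:
    # An empty module is nothing but the preamble: magic followed by version.
    return WASM_MAGIC + WASM_VERSION


def is_wasm_v1(data: bytes) -> bool:
    # Check the length first so short inputs don't make struct.unpack fail.
    return (
        len(data) >= 8
        and data[:4] == WASM_MAGIC
        and struct.unpack("<I", data[4:8])[0] == 1
    )
```

Writing empty_module() to a file (for example with open("empty.wasm", "wb")) and running wasm-tools dump on it should show the same two lines as the output above.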

Hello world is a little more complicated. Wasm doesn’t ship with a standard library. By itself it can’t even do basic IO (just like Roc!). So we need to depend on external functions… but while a Wasm module can declare imports, Wasm itself has no way to provide them!

In Wasm, imports need to be satisfied by whatever host is loading the Wasm file. I’m using Wasmtime, which has built-in support for the WASI specification. So we can add an import from the wasi_snapshot_preview1 module and it just works, without having to hook it up in the runtime.

“Hello World” in WAT syntax looks like this:

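(module
    ;; WASI's fd_write(fd, iovs_ptr, iovs_len, nwritten_ptr) -> errno
    (import "wasi_snapshot_preview1" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

    ;; One 64 KiB page of linear memory, exported so the host can read it
    (memory 1)
    (export "memory" (memory 0))
    (export "_start" (func $main))

    ;; Place the 11-byte string at address 8
    (data (offset (i32.const 8)) "hello world")

    (func $main
        ;; Build an iovec at address 0: buffer pointer = 8
        (i32.const 0)
        (i32.const 8)
        (i32.store)

        ;; ...and buffer length = 11, stored at address 4
        (i32.const 4)
        (i32.const 11)
        (i32.store)

        ;; fd_write(fd=1 (stdout), iovs=0, iovs_len=1, nwritten=20)
        (i32.const 1)
        (i32.const 0)
        (i32.const 1)
        (i32.const 20)
        (call $fd_write)
        drop  ;; discard the returned errno
    )
)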
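The memory juggling in $main is the least obvious part, so here is a Python sketch that replays the same steps against a bytearray standing in for Wasm linear memory. It relies on how WASI’s fd_write works. It takes a file descriptor (1 is stdout), a pointer to an array of iovecs, the number of iovecs, and a pointer where it stores the number of bytes written. It returns an errno, where 0 means success. Each iovec is two little-endian 32-bit values: a pointer to the bytes and their length. The fd_write below is a hypothetical stand-in that collects bytes instead of writing them, roughly the role Wasmtime plays when it satisfies the import. The function names are illustrative only.

```python
import struct

PAGE_SIZE = 65536  # (memory 1) requests one 64 KiB page


def fd_write(memory: bytearray, fd: int, iovs: int, iovs_len: int, nwritten_ptr: int) -> tuple[bytes, int]:
    # Hypothetical host stand-in: gathers the bytes described by each 8-byte
    # iovec. fd is accepted but ignored; a real host would write to it.
    out = bytearray()
    for i in range(iovs_len):
        buf, buf_len = struct.unpack_from("<II", memory, iovs + 8 * i)
        out += memory[buf:buf + buf_len]
    # Report how many bytes were "written" at nwritten_ptr, like WASI does.
    struct.pack_into("<I", memory, nwritten_ptr, len(out))
    return bytes(out), 0  # errno 0 means success


def run_hello() -> tuple[bytes, int, int]:
    memory = bytearray(PAGE_SIZE)

    # (data (offset (i32.const 8)) "hello world")
    text = b"hello world"
    memory[8:8 + len(text)] = text

    # First i32.store: address 0 <- 8 (iovec buffer pointer)
    struct.pack_into("<I", memory, 0, 8)
    # Second i32.store: address 4 <- 11 (iovec buffer length)
    struct.pack_into("<I", memory, 4, 11)

    # call $fd_write with (1, 0, 1, 20)
    output, errno = fd_write(memory, 1, 0, 1, 20)
    (nwritten,) = struct.unpack_from("<I", memory, 20)
    return output, errno, nwritten
```

Address 20 is used for nwritten because it sits past the end of the string, which occupies addresses 8 through 18.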

This is if we want to build a WASI binary to run as a command-line program. If we were targeting a browser (or Node), we would be able to import and call console.log from the Javascript runtime instead.

I won’t show the binary output of wasm-tools dump on this file as it’s virtually unintelligible, but I verified everything is working with these two commands:

❯ wasm-tools parse hello.wat > hello.wasm

❯ wasmtime hello.wasm
hello world

In future articles, our task will be complete when we can replace wasm-tools parse hello.wat > hello.wasm with roc run hello.wat and get the same output.

But I think that’s enough introduction for today. I know we haven’t gotten to any meaty code yet, but I promise it is coming in future articles!