A Tutorial Introduction to Gleam -- Part 2

Introduction

This is the second in a series of articles exploring the Gleam programming language. The first article explored some of the most basic features of Gleam; just enough to say hello.

Hello is basically the first thing we learn in any language (whether human or programming). This article explores looping in Gleam. More specifically, it explores the fact that Gleam doesn’t have any looping constructs.

That’s right: none.

Patreon

If you find my content valuable, please do consider sponsoring me on Patreon account. I write about what interests me, and I usually target tutorial style articles for intermediate to advanced developers in a variety of languages.

I’m not sure how far I’m going to go with this series on Gleam, so if you’re particularly interested in the topic, a few dollars a month definitely gives me a bit of motivation to keep it up.

Immutable and Functional

To me, Gleam feels much more elegant than many of its predecessors in the functional programming wold, but at its core, it’s still a functional language. The defining feature of a pure functional language is that calling any function, given the same inputs, will have exactly the same effect. Another very common feature is immutable values.

That’s really just a bunch of long buzzwords to say “Nothing ever changes.” You can’t make a variable point to a new value, append to a list, or set a field on a record.

Considering that core purpose of a computer is to change electronic states several billion times a second, this may sound unintuitive. But actually, those states are not changing several billion times a second. Rather, it is creating a brand new state several billion times a second. And each state can be completely defined in terms of its previous state.

CPUs don’t know anything about functions or loops or if statements or any of the other niceties you have grown used to. All of that is just stuff humans put on top to make it easier to think about. Some people (strange people, in my opinion) think that loops do not make code easier to think about. Gleam programmers appear to be in that set of people.

Consider functions. In Gleam, the return value is always exactly the same if the arguments are the same. In Python, if you call len() on the same list before and after you add an element to it, len() will return different values (len is short for length if you don’t know Python. Apparently six letters is just too long for such a commonly called function).

In Gleam, if you call length on a list, it will always return the same number, because that list isn’t gonna change. It’s more stubborn than your crazy uncle (If you don’t have a crazy uncle to be stubborn at you, borrow someone else’s uncle, or substitute a crazy aunt). If you append something to the list you will get a new list. The old one is still there, and calling length on it will return the same value it always did. The new one will have a different length, of course, but its length ain’t gonna change either.

Append enough elements to a list and you’ll have met my extended family. We’re all pretty stubborn, to be honest. And crazy.

Iteration Matters

Excluding Go, which is satisfied with exactly one loop keyword, imperative languages typically have at least two broad classes of loops:

those that perform some action on every item in a collection. for ... in or for each are common syntaxes.
those that perform some action repeatedly until some condition is true. while or until are common keywords.

In functional languages such as Gleam, the first category is typically solved by calling a function on each item in the list, so no language statements are required to make shit happen. Basically, the body of the loop is a function; otherwise it’s the same thing. The second is usually solved using some form of recursion; calling the same function repeatedly from itself until the condition is met.

Data Structures Matter

The immutable nature of Gleam means that certain common operations use completely different underlying data structures. Consider lists. Most (imperative) languages implement their basic list or array data structure as a sequence of elements in memory one after another. It usually reserves some extra space “just in case” an element needs to be added to the array. If it didn’t do that, it would have to copy the whole damn thing to a different location in memory every time you added something. better to reserve, say, 2x the space the next time you copy it so you can add stuff for longer before copying it again.

It’s not like it’s that much memory, right?

But Gleam data structures are immutable. If you have a list of 10 elements, then you have a list of 10 elements. You can’t add an element to the list. You can create a new list with 11 elements in it, 10 of which are also in the old list. So there’s no point in reserving extra space.

But man, copying a bunch of data to a new piece of memory every time you want to add something to it doesn’t seem terribly performant, does it?

This is why the Erlang (which Gleam transpiles to) virtual machine uses linked lists. I tend to think of linked lists as being freaking useless unless you are coding an operating system or an interview at a company that thinks linked lists are all the rage. You’d think they were especially useless in an immutable language; if you aren’t allowed to manipulate the links, why bother?

Erlang is sneaky. I’m not saying it’s lying or anything. It just tends to cut corners. You see, if the “old” list can’t be changed, the new list can just be all like, “ok, I’m gonna point at this new element I’m adding, and it can point at the old list”. So it’s not even a list. It’s kind of a tree with lots of pointers to different heads that converge on a single tail.

Ugh, I’m having flashbacks to when I was writing blockchains. Yuck. If I wasn’t so grossed out, I’d draw a picture for you.

Note: I actually don’t know how Erlang stores this crap in memory. It’s probably sort of like I’ve described but more optimized. I’m just making it up to give you a mental model that works for me.

Anyway, this means there only needs to be one empty list, or one empty map. In the whole world. Since you can’t add anytihng to that empty thing, you know it’s not going to change. It’s just a place to start from when you create a new list from by adding something to the empty one.

Something Cool About Functions

This article is supposedly about looping, but Gleam doesn’t have any loops. (it’s also supposed to be about coding, but we haven’t written any code yet. I’m not a pathalogical liar, I swear).

So let’s talk about functions instead. I did that a bit in the previous article, but it was mostly just about how you can use the pipe to put the first argument of the function on the left side of the function name instead of inside parens on the right (This is a lot more useful than it sounds. Try it sometime).

Like all programming languages, functions can take aguments and return values. Like all programming languages, functions can take aguments and return values. Like some languages, those arguments are strongly typed (you can’t pass a string where an integer is expected), but you don’t have to specify the type if Gleam can infer it (but you should anyway, because a human in a hurry to fix a huge bug probably can’t infer it, at least not quickly).

Like no language I’ve ever seen, Gleam allows you to use different labels for the calling function from the value names used inside the called function.

For example (holy shit, he’s about to write some code!), consider this not-at-all-contrived example:

import gleam/io
import gleam/list
import gleam/int

pub fn multiply_nth_element(the_list: List(Int), index: Int, multiplier: Int) {
  assert Ok(element) =
    the_list
    |> list.at(index)
  element * multiplier
}

pub fn main() {
  let some_list = [1, 2, 8, 4, 3, 2]

  some_list
  |> multiply_nth_element(2, 4)
  |> int.to_string
  |> io.println
}

There’s nothing new here yet. There’s a function called multiply_nth_element that extracts an element from a list and multiplies it by a passed in value.

The names of the valiables all kinda make sense inside the function, but the caller might want to use different labels. So Gleam allows us to write this instead:

import gleam/io
import gleam/list
import gleam/int

pub fn multiply_nth_element(
  in the_list: List(Int),
  at index: Int,
  by multiplier: Int,
) {
  assert Ok(element) =
    list.at(in: the_list, get: index)
  element * multiplier
}

pub fn main() {
  let some_list = [1, 2, 8, 4, 3, 2]

  multiply_nth_element(in: some_list, at: 2, by: 4)
  |> int.to_string
  |> io.println
}

Note three things:

The multiply_nth_element function signature now has two “names” for each argument. in is used outside the function, and the_list is used inside. And at is used at the callsite, but index is used in the function body. I’ll leave by/multiplier as an exercise to the reader (it won’t be a tough workout).
Th list.at call has switched from pipe syntax to labelled syntax. This works because list.at is defined with labelled arguments, too.
The multiply_nth_element call has changed from piping some_list into the function to using labeled arguments, too.

The Gleam language reference indicates that this can be used to write Gleam code that looks more like English (or any preferred human language, come to think of it). It’s kind of neat, but I’m not sure if I like it yet. As a person who has to call a function, I now have way too many options. I’m a Python programmer. I like “one– and preferably only one –obvious way” mentality. Now, given the multiply_nth_element function, it’s extremely not obvious which of these I should use (they all work):

multiply_nth_element(some_list, 2, 4) (positional arguments)
some_list |> multiply_nth_element(2, 4) (pipe first arguments)
multiply_nth_element(in: some_list, at: 2, by: 4) (labeled arguments)
multiply_nth_element(at: 2, in: some_list, by: 4) (labels can be in any order)
some_list |> multiply_nth_element(2, by: 4) (How about all of the above?)

Anyway, it’s a thing that Gleam has. Moving on…

Looping For a Count

One of the most common looping mechanisms is to repeat a task a certain number of times. In C-like languages, it looks like for (int i = 0; i < 5; i++), which is frankly one of the most obnoxious syntaxes I’ve ever seen anywhere (that’s one reason I don’t like Go as much as I should; it’s the only looping construct the language has). Python makes it look a little better by moving the counting part into a function named range. This looks like for i in range(5).

In both cases, the loop “body” comes immediately after the syntax, and it’s implied that the body executes once for each iteration. Gleam is very similar to the Python version, except that the loop “body” is now a function call instead:

import gleam/io
import gleam/int
import gleam/iterator

pub fn main() {
  iterator.range(0, 5)
  |> iterator.map(fn(x) { io.println(int.to_string(x)) })
  |> iterator.run()
}

First, note the iterator.range(0, 5). That produces an iterator of five values, just like Python’s range (it could be iterator.range(from: 0, to: 5) if you wanted to be very explicit in the labeling). The resulting iterator gets passed into iterator.map using a pipe. This map call converts the existing iterator into a new iterator by calling an anonymous function (labelled with) on each of the elements returned by range.

The new iterator will contain five Nil elements. Can you see why Nil? Every function in Gleam returns the last evaluated expression. In this case, that is the result of io.println, which returns Nil.

io.println is one of those functions that functional purists hate because it has side effects (printing to the screen). The run function exists on iterators to trigger such side effects. Until run is called, all the other iterators only exist in potentia. The first one, iterator.range has the potential to generate five values, but it won’t generate any until run is called. That iterator gets consumed by a interator.map function which has the potential to generate five Nils, but not until run is called.

But when run gets called, look out! io.println will put the iterated numbers on the screen, just as expected.

The Gleam code is admittedly a bit harder to read than equivalent Python:

for i in range(5):
    print(i)

There are a few reasons for the difference in legibility, but it’s not really because of functional programming. One is Python’s dynamically typed nature. The print function in Python kindly converts anything we pass to it into a string so we don’t have to do it manually. Python’s lack of braces also contributes to readability in this case. Another is that I format all of my examples with the Gleam formatter before I post them here, and in this case, the formatter is making the code harder to read (in my opinion) than it needs to. If I change the imports to bring the necessary functions into scope and then format the code the way I think it should be formatted, it looks like this:

import gleam/io.{println as print}
import gleam/int.{to_string as str}
import gleam/iterator

pub fn main() {
  iterator.range(from: 0, to: 5)
  |> iterator.map(fn(x) {
      print(str(x))
  })
  |> iterator.run()
}

Still not as elegant as Python, but it’s not any worse than Java:

public class Main {
  public static void main(String args[]) {
    for (int i = 0; i < 5; i++) {
      System.out.println(
        Integer.toString(i)
      )
    }
  }
}

(I haven’t written Java in well over a decade, and dry-coded the above from scratch based on ancient memories, so forgive me if I got it wrong).

Looping over elements in a list

The previous example showed that the loop construct in Gleam is a function call rather than a statement or an expression. Kind of a weird concept, but it works well, especially when use pipes effectively. In fact, any list of values can be converted to an iterator and then iterated over in a similar way. Consider a simple function that prints each of the command line arguments on a separate line:

import gleam/io
import gleam/erlang
import gleam/iterator

pub fn main() {
  let arguments = erlang.start_arguments()

  arguments
  |> iterator.from_list
  |> iterator.map(io.println)
  |> iterator.run()
}

Recall start_arguments from Part 1 of this series, which presents a list of strings. We pipe this into from_list, which creates an iterator. Then we use the same map and run technique we used in the last one. Note how I simplified the map to just use io.println instead of creating an anonymous function. You can use any function as a map callback so long as it accepts the appropriate type.

A problem with this technique is that you can’t keep track of state between iterations. This is where things can get a bit interesting. Consider the following imperative Python code:

arguments = ["Hi", "I'm", "a", "string"]

count = 0
for argument in arguments:
    print(f"{count}. {argument}")
    count += 1

Yes, I know Python has enumerate() to do this more elegantly, but that would give away the secret too early. Python is primarily an imperative language, but many of its functional constructs are quite nice.

There are a few ways we could solve this using Gleam. The key is to look at the iterator documentation for the gleam standard ibrary. There are a whole bunch of different functions you can use to perform something on “each” element, possibly passing extra state for them. For example, we could use the fold function, which includes some “accumulated” state with each iteration. I don’t know why it’s called fold; I suspect it’s got something with mixing an egg into waffle batter.

pub fn main() {
  let arguments = erlang.start_arguments()

  arguments
  |> iterator.from_list
  |> iterator.fold(
    0,
    fn(count, x) {
      io.println(string.concat([int.to_string(count), ". ", x]))
      count + 1
    },
  )
}

Unlike map, which takes a list and a function that accepts one argument, the fold function takes a list, an initial state, and a function that accepts two arguments. That function then returns (the return value is the last expression in the function) the new state. In this case, our state is an integer. Initially it is zero, and we return it incremented by 1 for every function call processing an element in the list.

The state can be anything. Here’s a similar example that stores the list of strings to be printed in the state and only calls io.println at the end:

pub fn main() {
  let arguments = erlang.start_arguments()

  arguments
  |> iterator.from_list
  |> iterator.fold(
    [],
    fn(output, x) {
      let count = list.length(output) / 4
      output
      |> list.append([int.to_string(count), ". ", x, "\n"])
    },
  )
  |> string.concat
  |> io.print
}

Wow, that’s pretty long (and I even snipped out a bunch of gleam imports that you’ll have to find for yourself. Hint: import gleam/<something>). In this example, the “state” is a list of strings. Each call to the function adds four strings to the returned state:

the count
a . separator
the argument
and a newline

We still need the count of the number of inputs, so we divide the length of the big list by 4. Then we call list.append, which merges two distinct lists into one and returns the new bigger list. That result gets used as state in the next function call.

Once all the elements are processed, the resulting list is passed to string.concat and that result is piped to io.print. Note that it is io.print instead of io.println since we appended a trailing newline already while constructing the string.

Note that the list.length function needs to iterate over all the elements in the list every time, so it’s not very efficient. You could combine the two above methods by making the state passed in use a tuple of #(count, output):

pub fn main() {
  let arguments = erlang.start_arguments()

  arguments
  |> iterator.from_list
  |> iterator.fold(
    #(0, []),
    fn(state, x) {
      let #(count, output) = state
      let state =
        output
        |> list.append([int.to_string(count), ". ", x, "\n"])
      #(count + 1, state)
    },
  )
  |> pair.second
  |> string.concat
  |> io.print
}

Gleam Tuples

By the way, tuples in gleam are specified as #(val1, val2, ...). They can be heterogenous (the diferent values inside don’t have to be the same type). You can access individual values on them using my_tuple.0 or my_tuple.1, or you can extract all the values using a let pattern match such as let #(val1, val2, val3) = mytuple. These both kind of mess up the pretty pipeline syntax, so instead, I used the https://hexdocs.pm/gleam_stdlib/gleam/pair.html standard library module to extract the second value. The count isn’t relevant anymore.

Use the index function to enumerate

I was thinking of exploring some of the other methods to solve this problem, such as using reduce kind of like fold, or using zip to mix an iterator with a range of the same length. But I got bored, so let’s just skip to the correct solution: a function that knows how to add indices to the list already (like Python’s enumerate; the secret I didn’t want to give away).

In Gleam, it’s called index instead of enumerate, but it works the same. It consumes an iterator and spits out a new iterator over tuples of #(index, value). Here’s an example:

pub fn main() {
  let arguments = erlang.start_arguments()

  arguments
  |> iterator.from_list
  |> iterator.index
  |> iterator.map(fn(x) {
    io.println(string.concat([int.to_string(x.0), ". ", x.1]))
  })
  |> iterator.run
}

This time I’ve gone back to doing the println side effect inside the map function.

Continue

Most imperative languages have a continue statement that allows you to end a loop early. In fact, continue is my favourite keyword because it means the same thing and takes the same form in virtually every language that has it.

The continue keyword is typically used in one of two situations:

filtering: You are looping over elements, processing them, and adding a subset of the elements to another list.
early “returns”: You have determined that the operation in question does not apply to the current state of the loop, so there’s no further processing required. This usually happens with corner cases, failures, or null responses, for example.

Both of these are modelled easily using Gleam’s functional tools. Filtering is especially easy, as there is an aptly-named iterator.filter function that you can use. I’m having trouble coming up with a reasonable example, so let’s imagine something that filters out odd numbers (They’re called “odd” for a reason. I’m odd, too.):

import gleam/io
import gleam/iterator
import gleam/string

pub fn main() {
  iterator.range(0, 50)
  |> iterator.filter(fn(x) { x % 2 == 0 })
  |> iterator.map(int.to_string)
  |> iterator.map(io.println)
  |> iterator.run
}

This example has a few interesting points to discuss. The filter call is the most pertinent. It’s kind of like map except instead of returning a new type of value, the callback function returns either true or false, reflecting whether the value is good enough to be included in the iterator’s output. In this case, it discards anything where x % 2 is non-zero, which is to say, odd numbers.

Also note how the double use of map makes this pipeline trivially easy to read. The first map takes in an iterator of integers and returns an iterator of strings. The second one takes an iterator of integers, prints each one to the console, and returns an iterator of Nil values. This is a easier to read than a single map call that tries to do the conversion and printing in one step:

pub fn main() {
  iterator.range(0, 50)
  |> iterator.filter(fn(x) { x % 2 == 0 })
  |> iterator.map(fn(x) {
    x
    |> int.to_string
    |> io.println
  })
  |> iterator.run
}

The hard part is determining whether the problem you are trying to solve can be formulated as a filter operation! Not all filters are of the “just remove it from the list” variety.

Break

The break statement is not as delightful as continue because different languages use it for different things. I am very specifically talking about the way C-like languages use break in switch statements. If you have no idea what I’m talking about, just ignore this paragraph.

I mentioned that the example in the previous section wasn’t very good. That’s because, in my opinion, it would have made more sense to never produce the odd numbers in the first place.Then we’d never have to filter them out. Honestly, for this specific situation, I think the range function should accept a step argument. But it doesn’t, so I wrote my own using unfold:

pub fn step_range(
  from start: Int,
  to stop: Int,
  by step: Int,
) -> iterator.Iterator(Int) {
  iterator.unfold(
    start,
    fn(x) {
      case x {
        y if y < stop -> iterator.Next(element: y, accumulator: y + step)
        _ -> iterator.Done
      }
    },
  )
}

pub fn main() {
  step_range(0, 50, 2)
  |> iterator.map(int.to_string)
  |> iterator.map(io.println)
  |> iterator.run
}

I love this example because, in spite of being very ugly, it allows us to explore so many new concepts! As you might guess from the name, the unfold function is kind of the opposite of fold. Where fold is given an iterator and a function to create a single value, unfold is given a single value and a function to create an iterator.

The key to understanding unfold is its return value, which is an instance of the Step type. Step can take one of two shapes, Next if a value should be yielded from the iterator, or Done if there are no more values to yield. Union types like this are really common in functional languages. I think the first time I encountered them was while coding Rust. It took a bit of getting used to, but once I understood the concept I found them to be extremely valuable.

I use a case statement (I didn’t have much choice, since case is the only conditional in Gleam) to decide which of these two types (Next and Done) should be returned from the anonymous function. I discussed case in part 1 of thise series, but I think this is the first time we’ve encountered guards.

Really, guards are just if statements. They say, “In addition to matching the pattern, the value that is matched must meet the boolean condition to the right of the if.” In this case, if the value x (which is captured as y inside the arm) is less than stop, we return the Next value. Otherwise we’re done iterating.

Anyway, Done is the equivalent of break in an imperative looping language. You can return it any time you want to break out of the “body” of a loopy function. In this example, I’m “breaking” when the counter gets above a certain value.

The Next type can be tricky to understand because it is trying to model the opposite of fold under the hood. The Next constructor accepts two values. element is the value that is being yielded from the iterator. accumulator is the value that is passed into the next call to the anonymous function.

So on each step, we are yielding the “current” element, but also accumulating enough state to calculate the “next” element.

This code is admittedly kind of ugly and hard to read. I think that’s just the nature of unfold. It is probably more efficient than using range plus filter, but I would actually be inclined to use the latter just for code legibility.

Recursion

Most of the time, one of the iterator functions will be the right tool for the job. It may take a bit of getting used to, but if you think of the “body of the loop” as a function, it isn’t that much different from what you’re probably used to. I haven’t used Gleam enough to know how well this paradigm works in practice, but my experiences with other semi-functional language suggests that it works well. I generally counsel piping data through functions in the iterator module when you can, but when all else fails, you can use recursion.

Recursion is notoriously hard to teach because:

a. it is a bit mind-bending if you haven’t encountered it before. b. it isn’t nearly so useful as teachers and bloggers like to think.

Proof of b: most tutorials on recursion start with the fibonacci sequence, yet fibonacci is actually most efficiently solved using loops (at least, in languages that have loops). In Gleam, unfold would be a great solution to fibonacci.

Recursion always has two features:

A base case, which can trivially return a value.
A recursive case, where the function calls itself with new inputs that are somehow “closer” to the base case.

I had a quirky university prof describe recursion as the “Ask a friend” algorithm: Let’s say you have a stack of dishes you need to wash. You are quite lazy, so you wash one dish and ask a friend to wash the rest of them. Every friend does this, and eventually the last friend sees there are no more dishes to wash, so the recursion stops.

Recursion is all about understanding exactly one iteration of the loop. You don’t think about what your friend is going to do (your mind will get very bent if you try to trace it out). You just look at the inputs you have, perform the work in front of you, and decide whether to pass the rest of it off to a friend or return the “base” value.

To illustrate, let’s make a function that returns a list of all prime numbers between 1 and n. This is a fun recursive example because there is a well-known looping algorithm that supports it (The Sieve of Eratosthenes). But Gleam doesn’t have loops so we’ll have to port the algorithm to recursion. Fun problem? No, but at least it’s illustrative.

I’m just gonna hijack the pseudocode from the linked wikipedia article:

algorithm Sieve of Eratosthenes is
    input: an integer n > 1.
    output: all prime numbers from 2 through n.

    let A be an array of Boolean values, indexed by integers 2 to n,
    initially all set to true.
    
    for i = 2, 3, 4, ..., not exceeding √n do
        if A[i] is true
            for j = i2, i2+i, i2+2i, i2+3i, ..., not exceeding n do
                set A[j] := false

    return all i such that A[i] is true.

Cool, let’s port that to Gleam, but with a caveat: This is not going to be an efficient implementation in a functional language. The problem is that in an immutable language, we cannot set a value in the list. Instead, we need to create a new list from the existing one.

A diversion to set values in lists

In fact, it’s so inefficient that Gleam’s standard library doesn’t appear to have a function to “set” a value at an index in a list, so let’s explore that, first.

Note: We won’t actually be using this code, but it is useful to understand it when we get around implementing the actual sieve.

pub fn set(on: List(a), at index: Int, to value: a) -> List(a) {
  let #(before, [_old_element, ..after]) = list.split(on, at: index)
  list.append(before, [value, ..after])
}

Well there’s three lines of code full of new concepts. Note to self: If I ever turn this into a book, cover functions and generics before looping. This function is generic in that it can operate on List objects that contain any value. That’s what the two List(a) mean in the function signature. It’s saying, “I don’t care what type of value the list contains, but whatever it is, I will return a new List that contains the same type of element.

a is used again for the type of the value that is being set. We don’t know what type the value is, but Lists are homogenous, so it obviously has to be the same type as is stored in the incoming and outgoing Lists.

The second line is probably also going to require two paragraphs of explanation! The list.split function returns a tuple containing two lists: one containing all the elements before index, and one containing all the elements after index. We use pattern matching in a let to extract the two lists into separate variables.

But Gleam pattern matching is pretty magical, so there’s more! My plan is to discard the element currently at index so I can concatenate everything back together with a new element in its place. So while I pattern match on the two lists, I am also using Gleam’s list syntax for extracting the first element of the list: [name_for_first_element, ..the_rest_of_the_list].

If that line is still confusing, you can split it up:

  let #(before, old_with_after) = list.split(on, at: index)
  let [_old_element, ..after] = old_with_after

The third line (fourth if you did the splitty-uppy thingy) uses the inverse operator to prefix an element onto an existing list. We are effectively creating two new lists here. The first one we create contains the new value prefixed to the after list. The second one appends it to the existing before list.

The result is a list with all the elements before the index we wanted to change, the new element, and all the elements after that index. Which is what we wanted.

I’m not sure how expensive splitting and rejoining is in erlang, but I’m guessing we just iterated over the before part of the list at least twice to cause all this to happen. Erlang is probably able to reuse existing elements, so there probbaly isn’t a bunch of copying going on, but the simple truth is linked lists really shouldn’t be treated like arrays.

Back to recursion

Oh, right. We were talking about that sieve thing. Let’s talk about it some more. We clearly need a function that, given only n, returns an array of numbers:

pub fn sieve(n: Int) -> List(Int) {
 // ??
}

This makes sense in terms of the function that a caller of our new API would want to use, but it’s not enough to be our recursive function. The recursive function needs to know about the array of booleans as well as n. We’ll do the rest of the sieve function before we code up the recursive function so we know what the called function needs to accept and return.

The main thing we need is a list of booleans of the appropriate size. For fun, I’m going to show two ways we can create this using the iterator module (The magic 2 is the first prime number):

pub fn sieve(n: Int) -> List(Int) {
  True
  |> iterator.repeat
  |> iterator.take(n)
  |> iterator.to_list
  |> recursive_sieve(n, 2)
}

This introduces the iterator.repeat and the iterator.take functions. The first creates an infinite iterator that spits out the same value on every iteration. The second truncates that iterator to a certain number of elements. So we create an infinite iterator of True values, then truncate to a fixed number, then convert to a list.

The second option will look a little more familiar, as it uses the range and map functions we’ve seen before:

pub fn sieve(n: Int) -> List(Int) {
  iterator.range(0, n)
  |> iterator.map(fn(x) { True })
  |> iterator.to_list
  |> recursive_sieve(n, 2)
}

I think the first version looks tidier, but either way, we now know how our recursive_sieve function should look. Just to make sure I got the signature right, I created a dummy implementation of recursive_sieve that just returns an array of identical values (5, because 5 is… well, it’s the number I happened to hit on my keyboard):

pub fn recursive_sieve(known_primes: List(Bool), n: Int, current: Int) -> List(Int) {
  known_primes
  |> list.map(fn(x) { 5 })
}

I’ve also created a main function as follows:

pub fn main() {
  sieve(10)
  |> iterator.from_list
  |> iterator.map(int.to_string)
  |> iterator.map(io.println)
  |> iterator.run
}

With the sieve and recursive_sieve functions implemented as above, this now compiles and spews out a bunch (10, to be exact) of 5s. Five is prime, right, so we’re done?

All your sieve’s base are belong to us

Recall that recursion requires a base case and a recursive case that is somehow “smaller” than the original call. Looking at the original “loop-based pseudocode”, the base case is when our current index reaches the square root of n.

So the recursive case somehow needs to do some work and then get our current number closer to the square root of n.

At least that’s easy. We just have to incrementing it.

But I need to get me a square root first. Took a bit of searching in the standard library; we need a float rather than an int, then we can convert the result back to an int with ceiling and truncate:

  assert Ok(n_sqrt) =
    n
    |> int.to_float
    |> float.square_root
    |> result.map(float.ceiling)
    |> result.map(float.truncate)

As you can see, pipelines are very common in functional programming. I didn’t like them much when I first tried it, but the syntax has grown on me. It’s very similar to piping in bash.

We have to do the last two float operations inside result.map because float.square_root may have returned an error (if you tried to do square root of a negative value, for example; Gleam doesn’t appear to support complex numbers). result.map says, “if the result is ok, apply this function to the value inside the result otherwise don’t do anything”.

Then we unwrap the result value using assert. This makes the program crash if a square root can’t be calculated. That isn’t isn’t very nice, but if somebody passed a negative value to calculate a prime, then they deserve what they get, right?

Now let’s set up our base case. Once n gets to be higher than our square root, we somehow need to convert our list of booleans to a list of integers. The list of integers need to be the indexes of any integers that have True values in the known_primes list. Sounds like a pretty nasty pipeline, eh?

      known_primes
      |> iterator.from_list
      |> iterator.index
      |> iterator.filter(fn(entry) { entry.1 })
      |> iterator.map(fn(entry) { entry.0 })
      |> iterator.to_list

So… we convert the list of primes to an iterator, then we turn it into an interator over #(index, value) tuples, then we filter to only include entries that have True in the second element of the tuple, then we map it over the first element in that tuple (which is the index), and convert the result to a list. The code is definitely easier to read than this paragraph, but at least we’ve seen all these functions before.

Let’s recurse that sieve!

The trick is knowing when we need to call this base case instead of the recursive one. We can start with a simple guard on a case statement:

  case current {
    prime if prime > n_sqrt ->
      known_primes
      |> iterator.from_list
      |> iterator.index
      |> iterator.filter(fn(entry) { entry.1 })
      |> iterator.map(fn(entry) { entry.0 })
      |> iterator.to_list

    _ -> recursive_sieve(known_primes, n, current + 1)
  }

We aren’t actually doing any sieving yet. In the base case (current > n_sqrt), we convert the known_primes array as described above. Right now, that just returns all the values because known_primes is chock full of True values. In the (current) recursive case, we just increment current and call ourselves with the new values.

Of course, this isn’t the correct behaviour. If I call sieve(15), the number 12 should definitely not show up in it. I love the number 12 because it’s divisible by most of the numbers less than it. I have a theory that eggs and buns come in multiples of 12 because they can be evenly divided between members of a family with 1, 2, 3, 4, or 6 people in it. So long as you and your spouse don’t have exactly three kids, you’re golden. Then again, some of them probably don’t like eggs…

Our recursive case actually needs to perform slightly differently depending on whether the value at the current index in known_primes is True or not. If its False, we are already looking at a non-prime (aka composite) value, so we can do the simple recursive case the way it is. But if it is True we need to strip out all the multiples of that value.

I guess the first thing we need to know is whether or not the current value is actually prime:

  assert Ok(current_is_prime) =
    known_primes
    |> list.at(current)

The assert in this case is safe because we know that current will never be higher than n_sqrt, and we know that known_primes contains n boolean values. Therefore, at will never be greater than n and will not fail.

Now we need to set up our recursive cases. Turns out there are two of them. We could put a nested case inside the recursive arm of the current case, but let’s explore another option: matching on two values at the same time.

  case current, current_is_prime {
    prime, _ if prime > n_sqrt ->
      known_primes
      |> iterator.from_list
      |> iterator.index
      |> iterator.filter(fn(entry) { entry.1 })
      |> iterator.map(fn(entry) { entry.0 })
      |> iterator.to_list

    prime, True -> recursive_sieve(known_primes, n, current - 1)

    _, _ -> recursive_sieve(known_primes, n, current + 1)
  }

The case is matching on current (an integer) and current_is_prime (a boolean). In the first arm, our base case, we don’t care whether current_is_prime is True or False. We just care that prime is higher than the square root of n, so we need to do the ol’ convert the list of booleans to a list of integers dance.

Otherwise, we return the same recursive call (for now) in two different arms. In the first arm, we know that prime is a prime number and we will eventually need to remove all the multiples of it from known_primes by setting those values to False. The second arm is basically justt an else clause. It covers the case where we are looking at a known composite number less than n_sqrt. In that case, we just need to increment current and do the recursive call.

Some complex filtering of common multiples

So let’s do that messy bit (you were thinking it was already messy, weren’t you?). We need to create a new list where all the booleans at indices that are multiples of current have been set to false. It would be much more pleasant to just say, “we need to set the booleans at those indices to false,” but we don’t have the luxury of mutable state.

Instead, I’m going to do a variation on that rather intimidating set function we looked at earlier: We split the list into three pieces, namely the elements at indices lower than current (there can’t be any multiples in this part, so we know it ain’t gonna change), the current element, which will remain True because we know it is prime, and the tail of the list, which contans everything else and needs to have multiples stripped out.

    prime, True -> {
      let #(head, [_, ..tail]) =
        known_primes
        |> list.split(prime)
      
      // TODO
    }

Right, so list.split returns a tuple of head and tail, and we pattern match on the latter to effectively drop the first element of the tail.

Ok, I think we’re getting close! Can you feel it? I can feel it.

The next step is to loop through everything in tail and create a new list where the multiples are explicitly set to False. I’m storing the result in a list called filtered (replaces // TODO above):

      let filtered =
        tail
        |> list.index_map(fn(index, element) {
          let am_i_multiple = { prime + index + 1 } % current
          case am_i_multiple {
            0 -> False
            _ -> element
          }
        })

The index_map function is similar to |> iterate.index |> iterate.map. It’s like a standard map function that accepts two arguments for the index and the element value. It exists so you can do index and mapping operations on a list without converting it to an iterator, but also without creating an unnecessary intermediate list.

The math to determine whether or not we are dealing with a multiple of the current value took a bit of head scratching. We have to add the prime value to the index because that is the size of the head that we stripped off. And we need to add another 1 because that is the True value we discarded for the current index. Then we use a baby case statement to either set the value to False or leave it whatever it was before (True or False depending if it was a multiple of an earlier prime).

Now we have the original head list and a new filtered tail list and we need to smush them back together. Don’t forget to put that True value back in there:

      list.append(head, [True, ..filtered])
      |> recursive_sieve(n, current + 1)

As you can see, now that we have a new known_primes list, we can pass it into the next call to recursive_sieve so it can do it all over again.

Tail Recursion

Recursion can be useful and the Erlang virtual machine is heavily optimised to do it efficiently (or so I’ve heard. I’m as new to this as you are). However, if you structure your recursive calls in the right way, you may be able to help the virtual machine optimise it even further.

This is where so-called “tail recursion” comes into play. If you can structure your code such that the return value of the current iteration is equivalent to the return value of the next iteration.

I’ll be honest. I thought I was implementing a non-tail-recursive function for the Sieve of Eratosthenes problem. I was doing it on purpose because I wanted to demonstrate converting it to tail recursion. But when I looked at the finished product, I realized that it is already tail recursive! Gleam’s functional paradigm kind of forced my hand. Indeed, I’m not sure how to make this into a non-tail-recursive call.

I guess I can’t really complain if a language forces me to write optimized code …except when it interferes with my ability to explain concepts!

So let’s look at why this function is tail recursive. For reference, here’s the whole recursive_sieve function (in case you got lost as we built it up from scratch above):

pub fn recursive_sieve(
  known_primes: List(Bool),
  n: Int,
  current: Int,
) -> List(Int) {
  assert Ok(n_sqrt) =
    n
    |> int.to_float
    |> float.square_root
    |> result.map(float.ceiling)
    |> result.map(float.truncate)

  assert Ok(current_is_prime) =
    known_primes
    |> list.at(current)

  case current, current_is_prime {
    prime, _ if prime > n_sqrt ->
      known_primes
      |> iterator.from_list
      |> iterator.index
      |> iterator.filter(fn(entry) { entry.1 })
      |> iterator.map(fn(entry) { entry.0 })
      |> iterator.to_list

    prime, True -> {
      let #(head, [_, ..tail]) =
        known_primes
        |> list.split(prime)
      let filtered =
        tail
        |> list.index_map(fn(index, element) {
          let am_i_multiple = { prime + index + 1 } % current
          case am_i_multiple {
            0 -> False
            _ -> element
          }
        })
      list.append(head, [True, ..filtered])
      |> recursive_sieve(n, current + 1)
    }

    _, _ -> recursive_sieve(known_primes, n, current + 1)
  }
}

The key is that index_map is called to update the known_primes before we pass it into the next recursive_sieve call. The function doesn’t do any extra processing on the returned value. Therefore, the virtual machine does not need to keep the function state and local variables (called a stack frame) hanging around while it waits for the recursive calls to complete.

Writing tail recursive functions is normally harder than a standard recursive function. Generally you have to pass some sort of intermediate value or accumulator (in this case, it’s known_primes) into the called function so it has everything it needs to return everything at the end. This is starting to sound a lot like the fold function, right? Indeed, I’m pretty sure the sieve of Eratosthenes could be solved quite elegantly using fold. But I’ll leave that as an exercise for the reader. Personally, I’ve had enough of primes!

Summary

This post explored the question “Why does Gleam not have loops (or even if statements)”. The answer is that looping tends to be treated differently in functional languages.

The iterator and list modules in the gleam standard library contain everything you need to process sequences of values in interesting ways. When there are cases that aren’t solved by these modules, you can use recursion to perform the looping operation instead.

digressions because I haven’t been explaining things in an appropriate order. I’m not sure what I’ll write next; I’m meandering towards some stuff around OTP (open telecom platform), which is Erlang’s killer feature, but I might find some other ideas that need to be explored before we get there.