A Tutorial Introduction to Gleam -- Part 2
Introduction
This is the second in a series of articles exploring the Gleam programming language. The first article explored some of the most basic features of Gleam; just enough to say hello.
Hello is basically the first thing we learn in any language (whether human or programming). This article explores looping in Gleam. More specifically, it explores the fact that Gleam doesn’t have any looping constructs.
That’s right: none.
Patreon
If you find my content valuable, please do consider sponsoring me on Patreon account. I write about what interests me, and I usually target tutorial style articles for intermediate to advanced developers in a variety of languages.
I’m not sure how far I’m going to go with this series on Gleam, so if you’re particularly interested in the topic, a few dollars a month definitely gives me a bit of motivation to keep it up.
Immutable and Functional
To me, Gleam feels much more elegant than many of its predecessors in the functional programming wold, but at its core, it’s still a functional language. The defining feature of a pure functional language is that calling any function, given the same inputs, will have exactly the same effect. Another very common feature is immutable values.
That’s really just a bunch of long buzzwords to say “Nothing ever changes.” You can’t make a variable point to a new value, append to a list, or set a field on a record.
Considering that core purpose of a computer is to change electronic states several billion times a second, this may sound unintuitive. But actually, those states are not changing several billion times a second. Rather, it is creating a brand new state several billion times a second. And each state can be completely defined in terms of its previous state.
CPUs don’t know anything about functions or loops or if statements or any of the other niceties you have grown used to. All of that is just stuff humans put on top to make it easier to think about. Some people (strange people, in my opinion) think that loops do not make code easier to think about. Gleam programmers appear to be in that set of people.
Consider functions. In Gleam, the return value is always exactly the same if
the arguments are the same. In Python, if you call len() on the same list
before and after you add an element to it, len() will return different values
(len is short for length if you don’t know Python. Apparently six letters
is just too long for such a commonly called function).
In Gleam, if you call length on a list, it will always return the same
number, because that list isn’t gonna change. It’s more stubborn than your
crazy uncle (If you don’t have a crazy uncle to be stubborn at you, borrow
someone else’s uncle, or substitute a crazy aunt). If you append something to
the list you will get a new list. The old one is still there, and calling
length on it will return the same value it always did. The new one will have
a different length, of course, but its length ain’t gonna change either.
Append enough elements to a list and you’ll have met my extended family. We’re all pretty stubborn, to be honest. And crazy.
Iteration Matters
Excluding Go, which is satisfied with exactly one loop keyword, imperative languages typically have at least two broad classes of loops:
- those that perform some action on every item in a collection.
for ... inorfor eachare common syntaxes. - those that perform some action repeatedly until some condition is true.
whileoruntilare common keywords.
In functional languages such as Gleam, the first category is typically solved by calling a function on each item in the list, so no language statements are required to make shit happen. Basically, the body of the loop is a function; otherwise it’s the same thing. The second is usually solved using some form of recursion; calling the same function repeatedly from itself until the condition is met.
Data Structures Matter
The immutable nature of Gleam means that certain common operations use completely different underlying data structures. Consider lists. Most (imperative) languages implement their basic list or array data structure as a sequence of elements in memory one after another. It usually reserves some extra space “just in case” an element needs to be added to the array. If it didn’t do that, it would have to copy the whole damn thing to a different location in memory every time you added something. better to reserve, say, 2x the space the next time you copy it so you can add stuff for longer before copying it again.
It’s not like it’s that much memory, right?
But Gleam data structures are immutable. If you have a list of 10 elements, then you have a list of 10 elements. You can’t add an element to the list. You can create a new list with 11 elements in it, 10 of which are also in the old list. So there’s no point in reserving extra space.
But man, copying a bunch of data to a new piece of memory every time you want to add something to it doesn’t seem terribly performant, does it?
This is why the Erlang (which Gleam transpiles to) virtual machine uses linked lists. I tend to think of linked lists as being freaking useless unless you are coding an operating system or an interview at a company that thinks linked lists are all the rage. You’d think they were especially useless in an immutable language; if you aren’t allowed to manipulate the links, why bother?
Erlang is sneaky. I’m not saying it’s lying or anything. It just tends to cut corners. You see, if the “old” list can’t be changed, the new list can just be all like, “ok, I’m gonna point at this new element I’m adding, and it can point at the old list”. So it’s not even a list. It’s kind of a tree with lots of pointers to different heads that converge on a single tail.
Ugh, I’m having flashbacks to when I was writing blockchains. Yuck. If I wasn’t so grossed out, I’d draw a picture for you.
Note: I actually don’t know how Erlang stores this crap in memory. It’s probably sort of like I’ve described but more optimized. I’m just making it up to give you a mental model that works for me.
Anyway, this means there only needs to be one empty list, or one empty map. In the whole world. Since you can’t add anytihng to that empty thing, you know it’s not going to change. It’s just a place to start from when you create a new list from by adding something to the empty one.
Something Cool About Functions
This article is supposedly about looping, but Gleam doesn’t have any loops. (it’s also supposed to be about coding, but we haven’t written any code yet. I’m not a pathalogical liar, I swear).
So let’s talk about functions instead. I did that a bit in the previous article, but it was mostly just about how you can use the pipe to put the first argument of the function on the left side of the function name instead of inside parens on the right (This is a lot more useful than it sounds. Try it sometime).
Like all programming languages, functions can take aguments and return values. Like all programming languages, functions can take aguments and return values. Like some languages, those arguments are strongly typed (you can’t pass a string where an integer is expected), but you don’t have to specify the type if Gleam can infer it (but you should anyway, because a human in a hurry to fix a huge bug probably can’t infer it, at least not quickly).
Like no language I’ve ever seen, Gleam allows you to use different labels for the calling function from the value names used inside the called function.
For example (holy shit, he’s about to write some code!), consider this not-at-all-contrived example:
import gleam/io
import gleam/list
import gleam/int
pub fn multiply_nth_element(the_list: List(Int), index: Int, multiplier: Int) {
assert Ok(element) =
the_list
|> list.at(index)
element * multiplier
}
pub fn main() {
let some_list = [1, 2, 8, 4, 3, 2]
some_list
|> multiply_nth_element(2, 4)
|> int.to_string
|> io.println
}
There’s nothing new here yet. There’s a function called multiply_nth_element
that extracts an element from a list and multiplies it by a passed in value.
The names of the valiables all kinda make sense inside the function, but the caller might want to use different labels. So Gleam allows us to write this instead:
import gleam/io
import gleam/list
import gleam/int
pub fn multiply_nth_element(
in the_list: List(Int),
at index: Int,
by multiplier: Int,
) {
assert Ok(element) =
list.at(in: the_list, get: index)
element * multiplier
}
pub fn main() {
let some_list = [1, 2, 8, 4, 3, 2]
multiply_nth_element(in: some_list, at: 2, by: 4)
|> int.to_string
|> io.println
}
Note three things:
- The
multiply_nth_elementfunction signature now has two “names” for each argument.inis used outside the function, andthe_listis used inside. Andatis used at the callsite, butindexis used in the function body. I’ll leaveby/multiplieras an exercise to the reader (it won’t be a tough workout). - Th
list.atcall has switched from pipe syntax to labelled syntax. This works becauselist.atis defined with labelled arguments, too. - The
multiply_nth_elementcall has changed from pipingsome_listinto the function to using labeled arguments, too.
The Gleam language reference indicates that this can be used to write Gleam
code that looks more like English (or any preferred human language, come to
think of it). It’s kind of neat, but I’m not sure if I like it yet. As a person
who has to call a function, I now have way too many options. I’m a Python
programmer. I like “one– and preferably only one –obvious way” mentality.
Now, given the multiply_nth_element function, it’s extremely not obvious
which of these I should use (they all work):
multiply_nth_element(some_list, 2, 4)(positional arguments)some_list |> multiply_nth_element(2, 4)(pipe first arguments)multiply_nth_element(in: some_list, at: 2, by: 4)(labeled arguments)multiply_nth_element(at: 2, in: some_list, by: 4)(labels can be in any order)some_list |> multiply_nth_element(2, by: 4)(How about all of the above?)
Anyway, it’s a thing that Gleam has. Moving on…
Looping For a Count
One of the most common looping mechanisms is to repeat a task a certain number
of times. In C-like languages, it looks like for (int i = 0; i < 5; i++),
which is frankly one of the most obnoxious syntaxes I’ve ever seen anywhere
(that’s one reason I don’t like Go as much as I should; it’s the only looping
construct the language has). Python makes it look a little better by moving the
counting part into a function named range. This looks like for i in range(5).
In both cases, the loop “body” comes immediately after the syntax, and it’s implied that the body executes once for each iteration. Gleam is very similar to the Python version, except that the loop “body” is now a function call instead:
import gleam/io
import gleam/int
import gleam/iterator
pub fn main() {
iterator.range(0, 5)
|> iterator.map(fn(x) { io.println(int.to_string(x)) })
|> iterator.run()
}
First, note the iterator.range(0, 5). That produces an iterator of five
values, just like Python’s range (it could be iterator.range(from: 0, to: 5) if you wanted to be very explicit in the labeling). The resulting iterator
gets passed into iterator.map using a pipe. This map call converts the
existing iterator into a new iterator by calling an anonymous function
(labelled with) on each of the elements returned by range.
The new iterator will contain five Nil elements. Can you see why Nil? Every
function in Gleam returns the last evaluated expression. In this case, that is
the result of io.println, which returns Nil.
io.println is one of those functions that functional purists hate because it
has side effects (printing to the screen). The run function exists on iterators
to trigger such side effects. Until run is called, all the other iterators only
exist in potentia. The first one, iterator.range has the potential to
generate five values, but it won’t generate any until run is called. That
iterator gets consumed by a interator.map function which has the potential to
generate five Nils, but not until run is called.
But when run gets called, look out! io.println will put the iterated
numbers on the screen, just as expected.
The Gleam code is admittedly a bit harder to read than equivalent Python:
for i in range(5):
print(i)
There are a few reasons for the difference in legibility, but it’s not really
because of functional programming. One is Python’s dynamically typed nature.
The print function in Python kindly converts anything we pass to it into a
string so we don’t have to do it manually. Python’s lack of braces also
contributes to readability in this case. Another is that I format all of my
examples with the Gleam formatter before I post them here, and in this case,
the formatter is making the code harder to read (in my opinion) than it needs
to. If I change the imports to bring the necessary functions into scope and then
format the code the way I think it should be formatted, it looks like this:
import gleam/io.{println as print}
import gleam/int.{to_string as str}
import gleam/iterator
pub fn main() {
iterator.range(from: 0, to: 5)
|> iterator.map(fn(x) {
print(str(x))
})
|> iterator.run()
}
Still not as elegant as Python, but it’s not any worse than Java:
public class Main {
public static void main(String args[]) {
for (int i = 0; i < 5; i++) {
System.out.println(
Integer.toString(i)
)
}
}
}
(I haven’t written Java in well over a decade, and dry-coded the above from scratch based on ancient memories, so forgive me if I got it wrong).
Looping over elements in a list
The previous example showed that the loop construct in Gleam is a function call rather than a statement or an expression. Kind of a weird concept, but it works well, especially when use pipes effectively. In fact, any list of values can be converted to an iterator and then iterated over in a similar way. Consider a simple function that prints each of the command line arguments on a separate line:
import gleam/io
import gleam/erlang
import gleam/iterator
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.map(io.println)
|> iterator.run()
}
Recall start_arguments from Part 1 of this series, which presents a list of
strings. We pipe this into from_list, which creates an iterator. Then we use
the same map and run technique we used in the last one. Note how I
simplified the map to just use io.println instead of creating an anonymous
function. You can use any function as a map callback so long as it accepts the
appropriate type.
A problem with this technique is that you can’t keep track of state between iterations. This is where things can get a bit interesting. Consider the following imperative Python code:
arguments = ["Hi", "I'm", "a", "string"]
count = 0
for argument in arguments:
print(f"{count}. {argument}")
count += 1
Yes, I know Python has enumerate() to do this more elegantly, but that would
give away the secret too early. Python is primarily an imperative language, but
many of its functional constructs are quite nice.
There are a few ways we could solve this using Gleam. The key is to look at the
iterator documentation
for the gleam standard ibrary. There are a whole bunch of different functions
you can use to perform something on “each” element, possibly passing extra
state for them. For example, we could use the fold function, which includes
some “accumulated” state with each iteration. I don’t know why it’s called
fold; I suspect it’s got something with mixing an egg into waffle batter.
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.fold(
0,
fn(count, x) {
io.println(string.concat([int.to_string(count), ". ", x]))
count + 1
},
)
}
Unlike map, which takes a list and a function that accepts one argument, the
fold function takes a list, an initial state, and a function that accepts two
arguments. That function then returns (the return value is the last expression
in the function) the new state. In this case, our state is an integer.
Initially it is zero, and we return it incremented by 1 for every function call
processing an element in the list.
The state can be anything. Here’s a similar example that stores the list of
strings to be printed in the state and only calls io.println at the end:
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.fold(
[],
fn(output, x) {
let count = list.length(output) / 4
output
|> list.append([int.to_string(count), ". ", x, "\n"])
},
)
|> string.concat
|> io.print
}
Wow, that’s pretty long (and I even snipped out a bunch of gleam imports that
you’ll have to find for yourself. Hint: import gleam/<something>). In this
example, the “state” is a list of strings. Each call to the function adds
four strings to the returned state:
- the count
- a
.separator - the argument
- and a newline
We still need the count of the number of inputs, so we divide the length of the
big list by 4. Then we call list.append, which merges two distinct lists into
one and returns the new bigger list. That result gets used as state in the next
function call.
Once all the elements are processed, the resulting list is passed to
string.concat and that result is piped to io.print. Note that it is
io.print instead of io.println since we appended a trailing newline already
while constructing the string.
Note that the list.length function needs to iterate over all the elements in
the list every time, so it’s not very efficient. You could combine the two
above methods by making the state passed in use a tuple of #(count, output):
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.fold(
#(0, []),
fn(state, x) {
let #(count, output) = state
let state =
output
|> list.append([int.to_string(count), ". ", x, "\n"])
#(count + 1, state)
},
)
|> pair.second
|> string.concat
|> io.print
}
Gleam Tuples
By the way, tuples in gleam are specified as #(val1, val2, ...). They can be
heterogenous (the diferent values inside don’t have to be the same type). You
can access individual values on them using my_tuple.0 or my_tuple.1, or you
can extract all the values using a let pattern match such as let #(val1, val2, val3) = mytuple. These both kind of mess up the pretty pipeline syntax,
so instead, I used the
https://hexdocs.pm/gleam_stdlib/gleam/pair.html standard library
module to extract the second value. The count isn’t relevant anymore.
Use the index function to enumerate
I was thinking of exploring some of the other methods to solve this problem,
such as using reduce kind of like fold, or using zip to mix an iterator
with a range of the same length. But I got bored, so let’s just skip to the
correct solution: a function that knows how to add indices to the list already
(like Python’s enumerate; the secret I didn’t want to give away).
In Gleam, it’s called index instead of enumerate, but it works the same. It
consumes an iterator and spits out a new iterator over tuples of #(index, value). Here’s an example:
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.index
|> iterator.map(fn(x) {
io.println(string.concat([int.to_string(x.0), ". ", x.1]))
})
|> iterator.run
}
This time I’ve gone back to doing the println side effect inside the map
function.
Continue
Most imperative languages have a continue statement that allows you to end a
loop early. In fact, continue is my favourite keyword because it means the
same thing and takes the same form in virtually every language that has it.
The continue keyword is typically used in one of two situations:
- filtering: You are looping over elements, processing them, and adding a subset of the elements to another list.
- early “returns”: You have determined that the operation in question does not apply to the current state of the loop, so there’s no further processing required. This usually happens with corner cases, failures, or null responses, for example.
Both of these are modelled easily using Gleam’s functional tools. Filtering is
especially easy, as there is an aptly-named iterator.filter function that you
can use. I’m having trouble coming up with a reasonable example, so let’s
imagine something that filters out odd numbers (They’re called “odd” for a
reason. I’m odd, too.):
import gleam/io
import gleam/iterator
import gleam/string
pub fn main() {
iterator.range(0, 50)
|> iterator.filter(fn(x) { x % 2 == 0 })
|> iterator.map(int.to_string)
|> iterator.map(io.println)
|> iterator.run
}
This example has a few interesting points to discuss. The filter call is the
most pertinent. It’s kind of like map except instead of returning a new type
of value, the callback function returns either true or false, reflecting
whether the value is good enough to be included in the iterator’s output. In
this case, it discards anything where x % 2 is non-zero, which is to say, odd
numbers.
Also note how the double use of map makes this pipeline trivially easy to
read. The first map takes in an iterator of integers and returns an iterator of
strings. The second one takes an iterator of integers, prints each one to the
console, and returns an iterator of Nil values. This is a easier to read
than a single map call that tries to do the conversion and printing in one step:
pub fn main() {
iterator.range(0, 50)
|> iterator.filter(fn(x) { x % 2 == 0 })
|> iterator.map(fn(x) {
x
|> int.to_string
|> io.println
})
|> iterator.run
}
The hard part is determining whether the problem you are trying to solve can be formulated as a filter operation! Not all filters are of the “just remove it from the list” variety.
Break
The break statement is not as delightful as continue because different
languages use it for different things. I am very specifically talking about the
way C-like languages use break in switch statements. If you have no idea
what I’m talking about, just ignore this paragraph.
I mentioned that the example in the previous section wasn’t very good. That’s
because, in my opinion, it would have made more sense to never produce the
odd numbers in the first place.Then we’d never have to filter them out.
Honestly, for this specific situation, I think the range function should
accept a step argument. But it doesn’t, so I wrote my own using unfold:
pub fn step_range(
from start: Int,
to stop: Int,
by step: Int,
) -> iterator.Iterator(Int) {
iterator.unfold(
start,
fn(x) {
case x {
y if y < stop -> iterator.Next(element: y, accumulator: y + step)
_ -> iterator.Done
}
},
)
}
pub fn main() {
step_range(0, 50, 2)
|> iterator.map(int.to_string)
|> iterator.map(io.println)
|> iterator.run
}
I love this example because, in spite of being very ugly, it allows us to
explore so many new concepts! As you might guess from the name, the unfold
function is kind of the opposite of fold. Where fold is given an iterator and
a function to create a single value, unfold is given a single value and a
function to create an iterator.
The key to understanding unfold is its return value, which is an instance of
the Step type.
Step can take one of two shapes, Next if a value should be yielded from the
iterator, or Done if there are no more values to yield. Union types like this
are really common in functional languages. I think the first time I encountered
them was while coding Rust. It took a bit of getting used to, but once I
understood the concept I found them to be extremely valuable.
I use a case statement (I didn’t have much choice, since case is the only
conditional in Gleam) to decide which of these two types (Next and Done)
should be returned from the anonymous function. I discussed case in part 1 of
thise series, but I think this is the first time we’ve encountered guards.
Really, guards are just if statements. They say, “In addition to matching the
pattern, the value that is matched must meet the boolean condition to the right
of the if.” In this case, if the value x (which is captured as y inside
the arm) is less than stop, we return the Next value. Otherwise we’re done
iterating.
Anyway, Done is the equivalent of break in an imperative looping language.
You can return it any time you want to break out of the “body” of a loopy
function. In this example, I’m “breaking” when the counter gets above a certain
value.
The Next type can be tricky to understand because it is trying to model the
opposite of fold under the hood. The Next constructor accepts two values.
element is the value that is being yielded from the iterator. accumulator
is the value that is passed into the next call to the anonymous function.
So on each step, we are yielding the “current” element, but also accumulating enough state to calculate the “next” element.
This code is admittedly kind of ugly and hard to read. I think that’s just the
nature of unfold. It is probably more efficient than using range plusfilter, but I would actually be inclined to use the latter just for code
legibility.
Recursion
Most of the time, one of the iterator functions will be the right tool for the
job. It may take a bit of getting used to, but if you think of the “body of the
loop” as a function, it isn’t that much different from what you’re probably
used to. I haven’t used Gleam enough to know how well this paradigm works in
practice, but my experiences with other semi-functional language suggests that
it works well. I generally counsel piping data through functions in the
iterator module when you can, but when all else fails, you can use recursion.
Recursion is notoriously hard to teach because:
a. it is a bit mind-bending if you haven’t encountered it before. b. it isn’t nearly so useful as teachers and bloggers like to think.
Proof of b: most tutorials on recursion start with the fibonacci sequence, yet fibonacci is actually most efficiently solved using loops (at least, in languages that have loops). In Gleam, unfold would be a great solution to fibonacci.
Recursion always has two features:
- A base case, which can trivially return a value.
- A recursive case, where the function calls itself with new inputs that are somehow “closer” to the base case.
I had a quirky university prof describe recursion as the “Ask a friend” algorithm: Let’s say you have a stack of dishes you need to wash. You are quite lazy, so you wash one dish and ask a friend to wash the rest of them. Every friend does this, and eventually the last friend sees there are no more dishes to wash, so the recursion stops.
Recursion is all about understanding exactly one iteration of the loop. You don’t think about what your friend is going to do (your mind will get very bent if you try to trace it out). You just look at the inputs you have, perform the work in front of you, and decide whether to pass the rest of it off to a friend or return the “base” value.
To illustrate, let’s make a function that returns a list of all prime numbers
between 1 and n. This is a fun recursive example because there is a
well-known looping algorithm that supports it (The Sieve of
Eratosthenes). But Gleam
doesn’t have loops so we’ll have to port the algorithm to recursion. Fun
problem? No, but at least it’s illustrative.
I’m just gonna hijack the pseudocode from the linked wikipedia article:
algorithm Sieve of Eratosthenes is
input: an integer n > 1.
output: all prime numbers from 2 through n.
let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.
for i = 2, 3, 4, ..., not exceeding √n do
if A[i] is true
for j = i2, i2+i, i2+2i, i2+3i, ..., not exceeding n do
set A[j] := false
return all i such that A[i] is true.
Cool, let’s port that to Gleam, but with a caveat: This is not going to be an efficient implementation in a functional language. The problem is that in an immutable language, we cannot set a value in the list. Instead, we need to create a new list from the existing one.
A diversion to set values in lists
In fact, it’s so inefficient that Gleam’s standard library doesn’t appear to have a function to “set” a value at an index in a list, so let’s explore that, first.
Note: We won’t actually be using this code, but it is useful to understand it when we get around implementing the actual sieve.
pub fn set(on: List(a), at index: Int, to value: a) -> List(a) {
let #(before, [_old_element, ..after]) = list.split(on, at: index)
list.append(before, [value, ..after])
}
Well there’s three lines of code full of new concepts. Note to self: If I ever
turn this into a book, cover functions and generics before looping.
This function is generic in that it can operate on List objects that
contain any value. That’s what the two List(a) mean in the function
signature. It’s saying, “I don’t care what type of value the list contains, but
whatever it is, I will return a new List that contains the same type of
element.
a is used again for the type of the value that is being set. We don’t know
what type the value is, but Lists are homogenous, so it obviously has to be
the same type as is stored in the incoming and outgoing Lists.
The second line is probably also going to require two paragraphs of
explanation! The list.split function returns a tuple containing two lists:
one containing all the elements before index, and one containing all the
elements after index. We use pattern matching in a let to extract the two
lists into separate variables.
But Gleam pattern matching is pretty magical, so there’s more! My plan is to
discard the element currently at index so I can concatenate everything back
together with a new element in its place. So while I pattern match on the two
lists, I am also using Gleam’s list syntax for extracting the first element of
the list: [name_for_first_element, ..the_rest_of_the_list].
If that line is still confusing, you can split it up:
let #(before, old_with_after) = list.split(on, at: index)
let [_old_element, ..after] = old_with_after
The third line (fourth if you did the splitty-uppy thingy) uses the inverse
operator to prefix an element onto an existing list. We are effectively
creating two new lists here. The first one we create contains the new value
prefixed to the after list. The second one appends it to the existing
before list.
The result is a list with all the elements before the index we wanted to change, the new element, and all the elements after that index. Which is what we wanted.
I’m not sure how expensive splitting and rejoining is in erlang, but I’m
guessing we just iterated over the before part of the list at least twice to
cause all this to happen. Erlang is probably able to reuse existing elements,
so there probbaly isn’t a bunch of copying going on, but the simple truth is
linked lists really shouldn’t be treated like arrays.
Back to recursion
Oh, right. We were talking about that sieve thing. Let’s talk about it some more.
We clearly need a function that, given only n, returns an array of numbers:
pub fn sieve(n: Int) -> List(Int) {
// ??
}
This makes sense in terms of the function that a caller of our new API would
want to use, but it’s not enough to be our recursive function. The recursive
function needs to know about the array of booleans as well as n. We’ll do the
rest of the sieve function before we code up the recursive function so we
know what the called function needs to accept and return.
The main thing we need is a list of booleans of the appropriate size. For fun,
I’m going to show two ways we can create this using the iterator module (The
magic 2 is the first prime number):
pub fn sieve(n: Int) -> List(Int) {
True
|> iterator.repeat
|> iterator.take(n)
|> iterator.to_list
|> recursive_sieve(n, 2)
}
This introduces the iterator.repeat and the iterator.take functions. The
first creates an infinite iterator that spits out the same value on every
iteration. The second truncates that iterator to a certain number of elements.
So we create an infinite iterator of True values, then truncate to a fixed
number, then convert to a list.
The second option will look a little more familiar, as it uses the range and
map functions we’ve seen before:
pub fn sieve(n: Int) -> List(Int) {
iterator.range(0, n)
|> iterator.map(fn(x) { True })
|> iterator.to_list
|> recursive_sieve(n, 2)
}
I think the first version looks tidier, but either way, we now know how our
recursive_sieve function should look. Just to make sure I got the signature
right, I created a dummy implementation of recursive_sieve that just returns
an array of identical values (5, because 5 is… well, it’s the number I
happened to hit on my keyboard):
pub fn recursive_sieve(known_primes: List(Bool), n: Int, current: Int) -> List(Int) {
known_primes
|> list.map(fn(x) { 5 })
}
I’ve also created a main function as follows:
pub fn main() {
sieve(10)
|> iterator.from_list
|> iterator.map(int.to_string)
|> iterator.map(io.println)
|> iterator.run
}
With the sieve and recursive_sieve functions implemented as above, this now
compiles and spews out a bunch (10, to be exact) of 5s. Five is prime, right,
so we’re done?
All your sieve’s base are belong to us
Recall that recursion requires a base case and a recursive case that is somehow
“smaller” than the original call. Looking at the original “loop-based
pseudocode”, the base case is when our current index reaches the square root
of n.
So the recursive case somehow needs to do some work and then get our
current number closer to the square root of n.
At least that’s easy. We just have to incrementing it.
But I need to get me a square root first. Took a bit of searching in the
standard library; we need a float rather than an int, then we can convert the
result back to an int with ceiling and truncate:
assert Ok(n_sqrt) =
n
|> int.to_float
|> float.square_root
|> result.map(float.ceiling)
|> result.map(float.truncate)
As you can see, pipelines are very common in functional programming. I didn’t like them much when I first tried it, but the syntax has grown on me. It’s very similar to piping in bash.
We have to do the last two float operations inside result.map because
float.square_root may have returned an error (if you tried to do square root
of a negative value, for example; Gleam doesn’t appear to support complex
numbers). result.map says, “if the result is ok, apply this function to the
value inside the result otherwise don’t do anything”.
Then we unwrap the result value using assert. This makes the program crash if
a square root can’t be calculated. That isn’t isn’t very nice, but if somebody
passed a negative value to calculate a prime, then they deserve what they get,
right?
Now let’s set up our base case. Once n gets to be higher than our square
root, we somehow need to convert our list of booleans to a list of integers.
The list of integers need to be the indexes of any integers that have True
values in the known_primes list. Sounds like a pretty nasty pipeline, eh?
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
So… we convert the list of primes to an iterator, then we turn it into an
interator over #(index, value) tuples, then we filter to only include entries
that have True in the second element of the tuple, then we map it over the
first element in that tuple (which is the index), and convert the result to a
list. The code is definitely easier to read than this paragraph, but at least
we’ve seen all these functions before.
Let’s recurse that sieve!
The trick is knowing when we need to call this base case instead of the recursive one. We can start with a simple guard on a case statement:
case current {
prime if prime > n_sqrt ->
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
_ -> recursive_sieve(known_primes, n, current + 1)
}
We aren’t actually doing any sieving yet. In the base case (current > n_sqrt), we convert the known_primes array as described above. Right now,
that just returns all the values because known_primes is chock full of True
values. In the (current) recursive case, we just increment current and call
ourselves with the new values.
Of course, this isn’t the correct behaviour. If I call sieve(15), the number
12 should definitely not show up in it. I love the number 12 because it’s
divisible by most of the numbers less than it. I have a theory that eggs and
buns come in multiples of 12 because they can be evenly divided between members
of a family with 1, 2, 3, 4, or 6 people in it. So long as you and your spouse
don’t have exactly three kids, you’re golden. Then again, some of them probably
don’t like eggs…
Our recursive case actually needs to perform slightly differently depending on
whether the value at the current index in known_primes is True or
not. If its False, we are already looking at a non-prime (aka composite)
value, so we can do the simple recursive case the way it is. But if it is
True we need to strip out all the multiples of that value.
I guess the first thing we need to know is whether or not the current value is actually prime:
assert Ok(current_is_prime) =
known_primes
|> list.at(current)
The assert in this case is safe because we know that current will never be
higher than n_sqrt, and we know that known_primes contains n boolean
values. Therefore, at will never be greater than n and will not fail.
Now we need to set up our recursive cases. Turns out there are two of them. We
could put a nested case inside the recursive arm of the current case, but
let’s explore another option: matching on two values at the same time.
case current, current_is_prime {
prime, _ if prime > n_sqrt ->
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
prime, True -> recursive_sieve(known_primes, n, current - 1)
_, _ -> recursive_sieve(known_primes, n, current + 1)
}
The case is matching on current (an integer) and current_is_prime (a
boolean). In the first arm, our base case, we don’t care whether
current_is_prime is True or False. We just care that prime is higher
than the square root of n, so we need to do the ol' convert the list of
booleans to a list of integers dance.
Otherwise, we return the same recursive call (for now) in two different arms.
In the first arm, we know that prime is a prime number and we will eventually
need to remove all the multiples of it from known_primes by setting those
values to False. The second arm is basically justt an else clause. It
covers the case where we are looking at a known composite number less than
n_sqrt. In that case, we just need to increment current and do the
recursive call.
Some complex filtering of common multiples
So let’s do that messy bit (you were thinking it was already messy, weren’t
you?). We need to create a new list where all the booleans at indices that are
multiples of current have been set to false. It would be much more pleasant
to just say, “we need to set the booleans at those indices to false,” but we
don’t have the luxury of mutable state.
Instead, I’m going to do a variation on that rather intimidating set function
we looked at earlier: We split the list into three pieces, namely the elements
at indices lower than current (there can’t be any multiples in this part, so
we know it ain’t gonna change), the current element, which will remain True
because we know it is prime, and the tail of the list, which contans everything
else and needs to have multiples stripped out.
prime, True -> {
let #(head, [_, ..tail]) =
known_primes
|> list.split(prime)
// TODO
}
Right, so list.split returns a tuple of head and tail, and we pattern match
on the latter to effectively drop the first element of the tail.
Ok, I think we’re getting close! Can you feel it? I can feel it.
The next step is to loop through everything in tail and create a new list
where the multiples are explicitly set to False. I’m storing the result in a
list called filtered (replaces // TODO above):
let filtered =
tail
|> list.index_map(fn(index, element) {
let am_i_multiple = { prime + index + 1 } % current
case am_i_multiple {
0 -> False
_ -> element
}
})
The index_map function is similar to |> iterate.index |> iterate.map. It’s
like a standard map function that accepts two arguments for the index and
the element value. It exists so you can do index and mapping operations on a
list without converting it to an iterator, but also without creating an
unnecessary intermediate list.
The math to determine whether or not we are dealing with a multiple of the
current value took a bit of head scratching. We have to add the prime value
to the index because that is the size of the head that we stripped off. And
we need to add another 1 because that is the True value we discarded for
the current index. Then we use a baby case statement to either set the value
to False or leave it whatever it was before (True or False depending if
it was a multiple of an earlier prime).
Now we have the original head list and a new filtered tail list and we need
to smush them back together. Don’t forget to put that True value back in
there:
list.append(head, [True, ..filtered])
|> recursive_sieve(n, current + 1)
As you can see, now that we have a new known_primes list, we can pass it into
the next call to recursive_sieve so it can do it all over again.
Tail Recursion
Recursion can be useful and the Erlang virtual machine is heavily optimised to do it efficiently (or so I’ve heard. I’m as new to this as you are). However, if you structure your recursive calls in the right way, you may be able to help the virtual machine optimise it even further.
This is where so-called “tail recursion” comes into play. If you can structure your code such that the return value of the current iteration is equivalent to the return value of the next iteration.
I’ll be honest. I thought I was implementing a non-tail-recursive function for the Sieve of Eratosthenes problem. I was doing it on purpose because I wanted to demonstrate converting it to tail recursion. But when I looked at the finished product, I realized that it is already tail recursive! Gleam’s functional paradigm kind of forced my hand. Indeed, I’m not sure how to make this into a non-tail-recursive call.
I guess I can’t really complain if a language forces me to write optimized code …except when it interferes with my ability to explain concepts!
So let’s look at why this function is tail recursive. For reference, here’s the
whole recursive_sieve function (in case you got lost as we built it up from
scratch above):
pub fn recursive_sieve(
known_primes: List(Bool),
n: Int,
current: Int,
) -> List(Int) {
assert Ok(n_sqrt) =
n
|> int.to_float
|> float.square_root
|> result.map(float.ceiling)
|> result.map(float.truncate)
assert Ok(current_is_prime) =
known_primes
|> list.at(current)
case current, current_is_prime {
prime, _ if prime > n_sqrt ->
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
prime, True -> {
let #(head, [_, ..tail]) =
known_primes
|> list.split(prime)
let filtered =
tail
|> list.index_map(fn(index, element) {
let am_i_multiple = { prime + index + 1 } % current
case am_i_multiple {
0 -> False
_ -> element
}
})
list.append(head, [True, ..filtered])
|> recursive_sieve(n, current + 1)
}
_, _ -> recursive_sieve(known_primes, n, current + 1)
}
}
The key is that index_map is called to update the known_primes before we
pass it into the next recursive_sieve call. The function doesn’t do any extra
processing on the returned value. Therefore, the virtual machine does not need
to keep the function state and local variables (called a stack frame) hanging
around while it waits for the recursive calls to complete.
Writing tail recursive functions is normally harder than a standard recursive
function. Generally you have to pass some sort of intermediate value or
accumulator (in this case, it’s known_primes) into the called function so it
has everything it needs to return everything at the end. This is starting to
sound a lot like the fold function, right? Indeed, I’m pretty sure the sieve
of Eratosthenes could be solved quite elegantly using fold. But I’ll leave
that as an exercise for the reader. Personally, I’ve had enough of primes!
Summary
This post explored the question “Why does Gleam not have loops (or even if statements)”. The answer is that looping tends to be treated differently in functional languages.
The iterator and list modules in the gleam standard library contain
everything you need to process sequences of values in interesting ways. When
there are cases that aren’t solved by these modules, you can use recursion to
perform the looping operation instead.
digressions because I haven’t been explaining things in an appropriate order. I’m not sure what I’ll write next; I’m meandering towards some stuff around OTP (open telecom platform), which is Erlang’s killer feature, but I might find some other ideas that need to be explored before we get there.