A Tutorial Introduction to Gleam -- Part 2
Introduction
This is the second in a series of articles exploring the Gleam programming language. The first article explored some of the most basic features of Gleam; just enough to say hello.
Hello is basically the first thing we learn in any language (whether human or programming). This article explores looping in Gleam. More specifically, it explores the fact that Gleam doesn’t have any looping constructs.
That’s right: none.
Patreon
If you find my content valuable, please do consider sponsoring me on Patreon account. I write about what interests me, and I usually target tutorial style articles for intermediate to advanced developers in a variety of languages.
I’m not sure how far I’m going to go with this series on Gleam, so if you’re particularly interested in the topic, a few dollars a month definitely gives me a bit of motivation to keep it up.
Immutable and Functional
To me, Gleam feels much more elegant than many of its predecessors in the functional programming wold, but at its core, it’s still a functional language. The defining feature of a pure functional language is that calling any function, given the same inputs, will have exactly the same effect. Another very common feature is immutable values.
That’s really just a bunch of long buzzwords to say “Nothing ever changes.” You can’t make a variable point to a new value, append to a list, or set a field on a record.
Considering that core purpose of a computer is to change electronic states several billion times a second, this may sound unintuitive. But actually, those states are not changing several billion times a second. Rather, it is creating a brand new state several billion times a second. And each state can be completely defined in terms of its previous state.
CPUs don’t know anything about functions or loops or if statements or any of the other niceties you have grown used to. All of that is just stuff humans put on top to make it easier to think about. Some people (strange people, in my opinion) think that loops do not make code easier to think about. Gleam programmers appear to be in that set of people.
Consider functions. In Gleam, the return value is always exactly the same if
the arguments are the same. In Python, if you call len()
on the same list
before and after you add an element to it, len()
will return different values
(len
is short for length
if you don’t know Python. Apparently six letters
is just too long for such a commonly called function).
In Gleam, if you call length
on a list, it will always return the same
number, because that list isn’t gonna change. It’s more stubborn than your
crazy uncle (If you don’t have a crazy uncle to be stubborn at you, borrow
someone else’s uncle, or substitute a crazy aunt). If you append
something to
the list you will get a new list. The old one is still there, and calling
length
on it will return the same value it always did. The new one will have
a different length
, of course, but its length ain’t gonna change either.
Append enough elements to a list and you’ll have met my extended family. We’re all pretty stubborn, to be honest. And crazy.
Iteration Matters
Excluding Go, which is satisfied with exactly one loop keyword, imperative languages typically have at least two broad classes of loops:
- those that perform some action on every item in a collection.
for ... in
orfor each
are common syntaxes. - those that perform some action repeatedly until some condition is true.
while
oruntil
are common keywords.
In functional languages such as Gleam, the first category is typically solved by calling a function on each item in the list, so no language statements are required to make shit happen. Basically, the body of the loop is a function; otherwise it’s the same thing. The second is usually solved using some form of recursion; calling the same function repeatedly from itself until the condition is met.
Data Structures Matter
The immutable nature of Gleam means that certain common operations use completely different underlying data structures. Consider lists. Most (imperative) languages implement their basic list or array data structure as a sequence of elements in memory one after another. It usually reserves some extra space “just in case” an element needs to be added to the array. If it didn’t do that, it would have to copy the whole damn thing to a different location in memory every time you added something. better to reserve, say, 2x the space the next time you copy it so you can add stuff for longer before copying it again.
It’s not like it’s that much memory, right?
But Gleam data structures are immutable. If you have a list of 10 elements, then you have a list of 10 elements. You can’t add an element to the list. You can create a new list with 11 elements in it, 10 of which are also in the old list. So there’s no point in reserving extra space.
But man, copying a bunch of data to a new piece of memory every time you want to add something to it doesn’t seem terribly performant, does it?
This is why the Erlang (which Gleam transpiles to) virtual machine uses linked lists. I tend to think of linked lists as being freaking useless unless you are coding an operating system or an interview at a company that thinks linked lists are all the rage. You’d think they were especially useless in an immutable language; if you aren’t allowed to manipulate the links, why bother?
Erlang is sneaky. I’m not saying it’s lying or anything. It just tends to cut corners. You see, if the “old” list can’t be changed, the new list can just be all like, “ok, I’m gonna point at this new element I’m adding, and it can point at the old list”. So it’s not even a list. It’s kind of a tree with lots of pointers to different heads that converge on a single tail.
Ugh, I’m having flashbacks to when I was writing blockchains. Yuck. If I wasn’t so grossed out, I’d draw a picture for you.
Note: I actually don’t know how Erlang stores this crap in memory. It’s probably sort of like I’ve described but more optimized. I’m just making it up to give you a mental model that works for me.
Anyway, this means there only needs to be one empty list, or one empty map. In the whole world. Since you can’t add anytihng to that empty thing, you know it’s not going to change. It’s just a place to start from when you create a new list from by adding something to the empty one.
Something Cool About Functions
This article is supposedly about looping, but Gleam doesn’t have any loops. (it’s also supposed to be about coding, but we haven’t written any code yet. I’m not a pathalogical liar, I swear).
So let’s talk about functions instead. I did that a bit in the previous article, but it was mostly just about how you can use the pipe to put the first argument of the function on the left side of the function name instead of inside parens on the right (This is a lot more useful than it sounds. Try it sometime).
Like all programming languages, functions can take aguments and return values. Like all programming languages, functions can take aguments and return values. Like some languages, those arguments are strongly typed (you can’t pass a string where an integer is expected), but you don’t have to specify the type if Gleam can infer it (but you should anyway, because a human in a hurry to fix a huge bug probably can’t infer it, at least not quickly).
Like no language I’ve ever seen, Gleam allows you to use different labels for the calling function from the value names used inside the called function.
For example (holy shit, he’s about to write some code!), consider this not-at-all-contrived example:
import gleam/io
import gleam/list
import gleam/int
pub fn multiply_nth_element(the_list: List(Int), index: Int, multiplier: Int) {
assert Ok(element) =
the_list
|> list.at(index)
element * multiplier
}
pub fn main() {
let some_list = [1, 2, 8, 4, 3, 2]
some_list
|> multiply_nth_element(2, 4)
|> int.to_string
|> io.println
}
There’s nothing new here yet. There’s a function called multiply_nth_element
that extracts an element from a list and multiplies it by a passed in value.
The names of the valiables all kinda make sense inside the function, but the caller might want to use different labels. So Gleam allows us to write this instead:
import gleam/io
import gleam/list
import gleam/int
pub fn multiply_nth_element(
in the_list: List(Int),
at index: Int,
by multiplier: Int,
) {
assert Ok(element) =
list.at(in: the_list, get: index)
element * multiplier
}
pub fn main() {
let some_list = [1, 2, 8, 4, 3, 2]
multiply_nth_element(in: some_list, at: 2, by: 4)
|> int.to_string
|> io.println
}
Note three things:
- The
multiply_nth_element
function signature now has two “names” for each argument.in
is used outside the function, andthe_list
is used inside. Andat
is used at the callsite, butindex
is used in the function body. I’ll leaveby/multiplier
as an exercise to the reader (it won’t be a tough workout). - Th
list.at
call has switched from pipe syntax to labelled syntax. This works becauselist.at
is defined with labelled arguments, too. - The
multiply_nth_element
call has changed from pipingsome_list
into the function to using labeled arguments, too.
The Gleam language reference indicates that this can be used to write Gleam
code that looks more like English (or any preferred human language, come to
think of it). It’s kind of neat, but I’m not sure if I like it yet. As a person
who has to call a function, I now have way too many options. I’m a Python
programmer. I like “one– and preferably only one –obvious way” mentality.
Now, given the multiply_nth_element
function, it’s extremely not obvious
which of these I should use (they all work):
multiply_nth_element(some_list, 2, 4)
(positional arguments)some_list |> multiply_nth_element(2, 4)
(pipe first arguments)multiply_nth_element(in: some_list, at: 2, by: 4)
(labeled arguments)multiply_nth_element(at: 2, in: some_list, by: 4)
(labels can be in any order)some_list |> multiply_nth_element(2, by: 4)
(How about all of the above?)
Anyway, it’s a thing that Gleam has. Moving on…
Looping For a Count
One of the most common looping mechanisms is to repeat a task a certain number
of times. In C-like languages, it looks like for (int i = 0; i < 5; i++)
,
which is frankly one of the most obnoxious syntaxes I’ve ever seen anywhere
(that’s one reason I don’t like Go as much as I should; it’s the only looping
construct the language has). Python makes it look a little better by moving the
counting part into a function named range
. This looks like for i in range(5)
.
In both cases, the loop “body” comes immediately after the syntax, and it’s implied that the body executes once for each iteration. Gleam is very similar to the Python version, except that the loop “body” is now a function call instead:
import gleam/io
import gleam/int
import gleam/iterator
pub fn main() {
iterator.range(0, 5)
|> iterator.map(fn(x) { io.println(int.to_string(x)) })
|> iterator.run()
}
First, note the iterator.range(0, 5)
. That produces an iterator of five
values, just like Python’s range
(it could be iterator.range(from: 0, to: 5)
if you wanted to be very explicit in the labeling). The resulting iterator
gets passed into iterator.map
using a pipe. This map
call converts the
existing iterator into a new iterator by calling an anonymous function
(labelled with
) on each of the elements returned by range
.
The new iterator will contain five Nil
elements. Can you see why Nil
? Every
function in Gleam returns the last evaluated expression. In this case, that is
the result of io.println
, which returns Nil
.
io.println
is one of those functions that functional purists hate because it
has side effects (printing to the screen). The run
function exists on iterators
to trigger such side effects. Until run
is called, all the other iterators only
exist in potentia. The first one, iterator.range
has the potential to
generate five values, but it won’t generate any until run
is called. That
iterator gets consumed by a interator.map
function which has the potential to
generate five Nil
s, but not until run
is called.
But when run
gets called, look out! io.println
will put the iterated
numbers on the screen, just as expected.
The Gleam code is admittedly a bit harder to read than equivalent Python:
for i in range(5):
print(i)
There are a few reasons for the difference in legibility, but it’s not really
because of functional programming. One is Python’s dynamically typed nature.
The print
function in Python kindly converts anything we pass to it into a
string so we don’t have to do it manually. Python’s lack of braces also
contributes to readability in this case. Another is that I format all of my
examples with the Gleam formatter before I post them here, and in this case,
the formatter is making the code harder to read (in my opinion) than it needs
to. If I change the imports to bring the necessary functions into scope and then
format the code the way I think it should be formatted, it looks like this:
import gleam/io.{println as print}
import gleam/int.{to_string as str}
import gleam/iterator
pub fn main() {
iterator.range(from: 0, to: 5)
|> iterator.map(fn(x) {
print(str(x))
})
|> iterator.run()
}
Still not as elegant as Python, but it’s not any worse than Java:
public class Main {
public static void main(String args[]) {
for (int i = 0; i < 5; i++) {
System.out.println(
Integer.toString(i)
)
}
}
}
(I haven’t written Java in well over a decade, and dry-coded the above from scratch based on ancient memories, so forgive me if I got it wrong).
Looping over elements in a list
The previous example showed that the loop construct in Gleam is a function call rather than a statement or an expression. Kind of a weird concept, but it works well, especially when use pipes effectively. In fact, any list of values can be converted to an iterator and then iterated over in a similar way. Consider a simple function that prints each of the command line arguments on a separate line:
import gleam/io
import gleam/erlang
import gleam/iterator
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.map(io.println)
|> iterator.run()
}
Recall start_arguments
from Part 1 of this series, which presents a list of
strings. We pipe this into from_list
, which creates an iterator. Then we use
the same map
and run
technique we used in the last one. Note how I
simplified the map to just use io.println
instead of creating an anonymous
function. You can use any function as a map callback so long as it accepts the
appropriate type.
A problem with this technique is that you can’t keep track of state between iterations. This is where things can get a bit interesting. Consider the following imperative Python code:
arguments = ["Hi", "I'm", "a", "string"]
count = 0
for argument in arguments:
print(f"{count}. {argument}")
count += 1
Yes, I know Python has enumerate()
to do this more elegantly, but that would
give away the secret too early. Python is primarily an imperative language, but
many of its functional constructs are quite nice.
There are a few ways we could solve this using Gleam. The key is to look at the
iterator documentation
for the gleam standard ibrary. There are a whole bunch of different functions
you can use to perform something on “each” element, possibly passing extra
state for them. For example, we could use the fold
function, which includes
some “accumulated” state with each iteration. I don’t know why it’s called
fold; I suspect it’s got something with mixing an egg into waffle batter.
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.fold(
0,
fn(count, x) {
io.println(string.concat([int.to_string(count), ". ", x]))
count + 1
},
)
}
Unlike map
, which takes a list and a function that accepts one argument, the
fold
function takes a list, an initial state, and a function that accepts two
arguments. That function then returns (the return value is the last expression
in the function) the new state. In this case, our state is an integer.
Initially it is zero, and we return it incremented by 1 for every function call
processing an element in the list.
The state can be anything. Here’s a similar example that stores the list of
strings to be printed in the state and only calls io.println
at the end:
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.fold(
[],
fn(output, x) {
let count = list.length(output) / 4
output
|> list.append([int.to_string(count), ". ", x, "\n"])
},
)
|> string.concat
|> io.print
}
Wow, that’s pretty long (and I even snipped out a bunch of gleam imports that
you’ll have to find for yourself. Hint: import gleam/<something>
). In this
example, the “state” is a list of strings. Each call to the function adds
four strings to the returned state:
- the count
- a
.
separator - the argument
- and a newline
We still need the count of the number of inputs, so we divide the length of the
big list by 4. Then we call list.append
, which merges two distinct lists into
one and returns the new bigger list. That result gets used as state in the next
function call.
Once all the elements are processed, the resulting list is passed to
string.concat
and that result is piped to io.print
. Note that it is
io.print
instead of io.println
since we appended a trailing newline already
while constructing the string.
Note that the list.length
function needs to iterate over all the elements in
the list every time, so it’s not very efficient. You could combine the two
above methods by making the state passed in use a tuple of #(count, output)
:
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.fold(
#(0, []),
fn(state, x) {
let #(count, output) = state
let state =
output
|> list.append([int.to_string(count), ". ", x, "\n"])
#(count + 1, state)
},
)
|> pair.second
|> string.concat
|> io.print
}
Gleam Tuples
By the way, tuples in gleam are specified as #(val1, val2, ...)
. They can be
heterogenous (the diferent values inside don’t have to be the same type). You
can access individual values on them using my_tuple.0
or my_tuple.1
, or you
can extract all the values using a let
pattern match such as let #(val1, val2, val3) = mytuple
. These both kind of mess up the pretty pipeline syntax,
so instead, I used the
https://hexdocs.pm/gleam_stdlib/gleam/pair.html standard library
module to extract the second value. The count isn’t relevant anymore.
Use the index function to enumerate
I was thinking of exploring some of the other methods to solve this problem,
such as using reduce
kind of like fold
, or using zip
to mix an iterator
with a range of the same length. But I got bored, so let’s just skip to the
correct solution: a function that knows how to add indices to the list already
(like Python’s enumerate
; the secret I didn’t want to give away).
In Gleam, it’s called index
instead of enumerate
, but it works the same. It
consumes an iterator and spits out a new iterator over tuples of #(index, value)
. Here’s an example:
pub fn main() {
let arguments = erlang.start_arguments()
arguments
|> iterator.from_list
|> iterator.index
|> iterator.map(fn(x) {
io.println(string.concat([int.to_string(x.0), ". ", x.1]))
})
|> iterator.run
}
This time I’ve gone back to doing the println
side effect inside the map
function.
Continue
Most imperative languages have a continue
statement that allows you to end a
loop early. In fact, continue
is my favourite keyword because it means the
same thing and takes the same form in virtually every language that has it.
The continue
keyword is typically used in one of two situations:
- filtering: You are looping over elements, processing them, and adding a subset of the elements to another list.
- early “returns”: You have determined that the operation in question does not apply to the current state of the loop, so there’s no further processing required. This usually happens with corner cases, failures, or null responses, for example.
Both of these are modelled easily using Gleam’s functional tools. Filtering is
especially easy, as there is an aptly-named iterator.filter
function that you
can use. I’m having trouble coming up with a reasonable example, so let’s
imagine something that filters out odd numbers (They’re called “odd” for a
reason. I’m odd, too.):
import gleam/io
import gleam/iterator
import gleam/string
pub fn main() {
iterator.range(0, 50)
|> iterator.filter(fn(x) { x % 2 == 0 })
|> iterator.map(int.to_string)
|> iterator.map(io.println)
|> iterator.run
}
This example has a few interesting points to discuss. The filter
call is the
most pertinent. It’s kind of like map
except instead of returning a new type
of value, the callback function returns either true
or false
, reflecting
whether the value is good enough to be included in the iterator’s output. In
this case, it discards anything where x % 2
is non-zero, which is to say, odd
numbers.
Also note how the double use of map
makes this pipeline trivially easy to
read. The first map takes in an iterator of integers and returns an iterator of
strings. The second one takes an iterator of integers, prints each one to the
console, and returns an iterator of Nil
values. This is a easier to read
than a single map call that tries to do the conversion and printing in one step:
pub fn main() {
iterator.range(0, 50)
|> iterator.filter(fn(x) { x % 2 == 0 })
|> iterator.map(fn(x) {
x
|> int.to_string
|> io.println
})
|> iterator.run
}
The hard part is determining whether the problem you are trying to solve can be formulated as a filter operation! Not all filters are of the “just remove it from the list” variety.
Break
The break
statement is not as delightful as continue
because different
languages use it for different things. I am very specifically talking about the
way C-like languages use break
in switch
statements. If you have no idea
what I’m talking about, just ignore this paragraph.
I mentioned that the example in the previous section wasn’t very good. That’s
because, in my opinion, it would have made more sense to never produce the
odd numbers in the first place.Then we’d never have to filter them out.
Honestly, for this specific situation, I think the range
function should
accept a step
argument. But it doesn’t, so I wrote my own using unfold
:
pub fn step_range(
from start: Int,
to stop: Int,
by step: Int,
) -> iterator.Iterator(Int) {
iterator.unfold(
start,
fn(x) {
case x {
y if y < stop -> iterator.Next(element: y, accumulator: y + step)
_ -> iterator.Done
}
},
)
}
pub fn main() {
step_range(0, 50, 2)
|> iterator.map(int.to_string)
|> iterator.map(io.println)
|> iterator.run
}
I love this example because, in spite of being very ugly, it allows us to
explore so many new concepts! As you might guess from the name, the unfold
function is kind of the opposite of fold
. Where fold is given an iterator and
a function to create a single value, unfold
is given a single value and a
function to create an iterator.
The key to understanding unfold
is its return value, which is an instance of
the Step type.
Step
can take one of two shapes, Next
if a value should be yielded from the
iterator, or Done
if there are no more values to yield. Union types like this
are really common in functional languages. I think the first time I encountered
them was while coding Rust. It took a bit of getting used to, but once I
understood the concept I found them to be extremely valuable.
I use a case
statement (I didn’t have much choice, since case
is the only
conditional in Gleam) to decide which of these two types (Next
and Done
)
should be returned from the anonymous function. I discussed case
in part 1 of
thise series, but I think this is the first time we’ve encountered guards.
Really, guards are just if
statements. They say, “In addition to matching the
pattern, the value that is matched must meet the boolean condition to the right
of the if
.” In this case, if the value x
(which is captured as y
inside
the arm) is less than stop
, we return the Next
value. Otherwise we’re done
iterating.
Anyway, Done
is the equivalent of break
in an imperative looping language.
You can return it any time you want to break out of the “body” of a loopy
function. In this example, I’m “breaking” when the counter gets above a certain
value.
The Next
type can be tricky to understand because it is trying to model the
opposite of fold
under the hood. The Next
constructor accepts two values.
element
is the value that is being yielded from the iterator. accumulator
is the value that is passed into the next call to the anonymous function.
So on each step, we are yielding the “current” element, but also accumulating enough state to calculate the “next” element.
This code is admittedly kind of ugly and hard to read. I think that’s just the
nature of unfold
. It is probably more efficient than using range
plusfilter
, but I would actually be inclined to use the latter just for code
legibility.
Recursion
Most of the time, one of the iterator functions will be the right tool for the
job. It may take a bit of getting used to, but if you think of the “body of the
loop” as a function, it isn’t that much different from what you’re probably
used to. I haven’t used Gleam enough to know how well this paradigm works in
practice, but my experiences with other semi-functional language suggests that
it works well. I generally counsel piping data through functions in the
iterator
module when you can, but when all else fails, you can use recursion.
Recursion is notoriously hard to teach because:
a. it is a bit mind-bending if you haven’t encountered it before. b. it isn’t nearly so useful as teachers and bloggers like to think.
Proof of b: most tutorials on recursion start with the fibonacci sequence, yet fibonacci is actually most efficiently solved using loops (at least, in languages that have loops). In Gleam, unfold would be a great solution to fibonacci.
Recursion always has two features:
- A base case, which can trivially return a value.
- A recursive case, where the function calls itself with new inputs that are somehow “closer” to the base case.
I had a quirky university prof describe recursion as the “Ask a friend” algorithm: Let’s say you have a stack of dishes you need to wash. You are quite lazy, so you wash one dish and ask a friend to wash the rest of them. Every friend does this, and eventually the last friend sees there are no more dishes to wash, so the recursion stops.
Recursion is all about understanding exactly one iteration of the loop. You don’t think about what your friend is going to do (your mind will get very bent if you try to trace it out). You just look at the inputs you have, perform the work in front of you, and decide whether to pass the rest of it off to a friend or return the “base” value.
To illustrate, let’s make a function that returns a list of all prime numbers
between 1
and n
. This is a fun recursive example because there is a
well-known looping algorithm that supports it (The Sieve of
Eratosthenes). But Gleam
doesn’t have loops so we’ll have to port the algorithm to recursion. Fun
problem? No, but at least it’s illustrative.
I’m just gonna hijack the pseudocode from the linked wikipedia article:
algorithm Sieve of Eratosthenes is
input: an integer n > 1.
output: all prime numbers from 2 through n.
let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.
for i = 2, 3, 4, ..., not exceeding √n do
if A[i] is true
for j = i2, i2+i, i2+2i, i2+3i, ..., not exceeding n do
set A[j] := false
return all i such that A[i] is true.
Cool, let’s port that to Gleam, but with a caveat: This is not going to be an efficient implementation in a functional language. The problem is that in an immutable language, we cannot set a value in the list. Instead, we need to create a new list from the existing one.
A diversion to set values in lists
In fact, it’s so inefficient that Gleam’s standard library doesn’t appear to have a function to “set” a value at an index in a list, so let’s explore that, first.
Note: We won’t actually be using this code, but it is useful to understand it when we get around implementing the actual sieve.
pub fn set(on: List(a), at index: Int, to value: a) -> List(a) {
let #(before, [_old_element, ..after]) = list.split(on, at: index)
list.append(before, [value, ..after])
}
Well there’s three lines of code full of new concepts. Note to self: If I ever
turn this into a book, cover functions and generics before looping.
This function is generic in that it can operate on List
objects that
contain any value. That’s what the two List(a)
mean in the function
signature. It’s saying, “I don’t care what type of value the list contains, but
whatever it is, I will return a new List
that contains the same type of
element.
a
is used again for the type of the value
that is being set. We don’t know
what type the value is, but List
s are homogenous, so it obviously has to be
the same type as is stored in the incoming and outgoing List
s.
The second line is probably also going to require two paragraphs of
explanation! The list.split
function returns a tuple containing two lists:
one containing all the elements before index
, and one containing all the
elements after index
. We use pattern matching in a let
to extract the two
lists into separate variables.
But Gleam pattern matching is pretty magical, so there’s more! My plan is to
discard the element currently at index
so I can concatenate everything back
together with a new element in its place. So while I pattern match on the two
lists, I am also using Gleam’s list syntax for extracting the first element of
the list: [name_for_first_element, ..the_rest_of_the_list]
.
If that line is still confusing, you can split it up:
let #(before, old_with_after) = list.split(on, at: index)
let [_old_element, ..after] = old_with_after
The third line (fourth if you did the splitty-uppy thingy) uses the inverse
operator to prefix an element onto an existing list. We are effectively
creating two new lists here. The first one we create contains the new value
prefixed to the after
list. The second one appends it to the existing
before
list.
The result is a list with all the elements before the index we wanted to change, the new element, and all the elements after that index. Which is what we wanted.
I’m not sure how expensive splitting and rejoining is in erlang, but I’m
guessing we just iterated over the before
part of the list at least twice to
cause all this to happen. Erlang is probably able to reuse existing elements,
so there probbaly isn’t a bunch of copying going on, but the simple truth is
linked lists really shouldn’t be treated like arrays.
Back to recursion
Oh, right. We were talking about that sieve thing. Let’s talk about it some more.
We clearly need a function that, given only n
, returns an array of numbers:
pub fn sieve(n: Int) -> List(Int) {
// ??
}
This makes sense in terms of the function that a caller of our new API would
want to use, but it’s not enough to be our recursive function. The recursive
function needs to know about the array of booleans as well as n
. We’ll do the
rest of the sieve
function before we code up the recursive function so we
know what the called function needs to accept and return.
The main thing we need is a list of booleans of the appropriate size. For fun,
I’m going to show two ways we can create this using the iterator
module (The
magic 2 is the first prime number):
pub fn sieve(n: Int) -> List(Int) {
True
|> iterator.repeat
|> iterator.take(n)
|> iterator.to_list
|> recursive_sieve(n, 2)
}
This introduces the iterator.repeat
and the iterator.take
functions. The
first creates an infinite iterator that spits out the same value on every
iteration. The second truncates that iterator to a certain number of elements.
So we create an infinite iterator of True
values, then truncate to a fixed
number, then convert to a list.
The second option will look a little more familiar, as it uses the range
and
map
functions we’ve seen before:
pub fn sieve(n: Int) -> List(Int) {
iterator.range(0, n)
|> iterator.map(fn(x) { True })
|> iterator.to_list
|> recursive_sieve(n, 2)
}
I think the first version looks tidier, but either way, we now know how our
recursive_sieve
function should look. Just to make sure I got the signature
right, I created a dummy implementation of recursive_sieve
that just returns
an array of identical values (5, because 5 is… well, it’s the number I
happened to hit on my keyboard):
pub fn recursive_sieve(known_primes: List(Bool), n: Int, current: Int) -> List(Int) {
known_primes
|> list.map(fn(x) { 5 })
}
I’ve also created a main
function as follows:
pub fn main() {
sieve(10)
|> iterator.from_list
|> iterator.map(int.to_string)
|> iterator.map(io.println)
|> iterator.run
}
With the sieve
and recursive_sieve
functions implemented as above, this now
compiles and spews out a bunch (10, to be exact) of 5
s. Five is prime, right,
so we’re done?
All your sieve’s base are belong to us
Recall that recursion requires a base case and a recursive case that is somehow
“smaller” than the original call. Looking at the original “loop-based
pseudocode”, the base case is when our current
index reaches the square root
of n
.
So the recursive
case somehow needs to do some work and then get our
current
number closer to the square root of n
.
At least that’s easy. We just have to incrementing it.
But I need to get me a square root first. Took a bit of searching in the
standard library; we need a float rather than an int, then we can convert the
result back to an int with ceiling
and truncate
:
assert Ok(n_sqrt) =
n
|> int.to_float
|> float.square_root
|> result.map(float.ceiling)
|> result.map(float.truncate)
As you can see, pipelines are very common in functional programming. I didn’t like them much when I first tried it, but the syntax has grown on me. It’s very similar to piping in bash.
We have to do the last two float
operations inside result.map
because
float.square_root
may have returned an error (if you tried to do square root
of a negative value, for example; Gleam doesn’t appear to support complex
numbers). result.map
says, “if the result is ok, apply this function to the
value inside the result otherwise don’t do anything”.
Then we unwrap the result value using assert
. This makes the program crash if
a square root can’t be calculated. That isn’t isn’t very nice, but if somebody
passed a negative value to calculate a prime, then they deserve what they get,
right?
Now let’s set up our base case. Once n
gets to be higher than our square
root, we somehow need to convert our list of booleans to a list of integers.
The list of integers need to be the indexes of any integers that have True
values in the known_primes
list. Sounds like a pretty nasty pipeline, eh?
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
So… we convert the list of primes to an iterator, then we turn it into an
interator over #(index, value)
tuples, then we filter to only include entries
that have True
in the second element of the tuple, then we map it over the
first element in that tuple (which is the index), and convert the result to a
list. The code is definitely easier to read than this paragraph, but at least
we’ve seen all these functions before.
Let’s recurse that sieve!
The trick is knowing when we need to call this base case instead of the recursive one. We can start with a simple guard on a case statement:
case current {
prime if prime > n_sqrt ->
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
_ -> recursive_sieve(known_primes, n, current + 1)
}
We aren’t actually doing any sieving yet. In the base case (current > n_sqrt
), we convert the known_primes
array as described above. Right now,
that just returns all the values because known_primes
is chock full of True
values. In the (current) recursive case, we just increment current
and call
ourselves with the new values.
Of course, this isn’t the correct behaviour. If I call sieve(15)
, the number
12
should definitely not show up in it. I love the number 12 because it’s
divisible by most of the numbers less than it. I have a theory that eggs and
buns come in multiples of 12 because they can be evenly divided between members
of a family with 1, 2, 3, 4, or 6 people in it. So long as you and your spouse
don’t have exactly three kids, you’re golden. Then again, some of them probably
don’t like eggs…
Our recursive case actually needs to perform slightly differently depending on
whether the value at the current
index in known_primes
is True
or
not. If its False
, we are already looking at a non-prime (aka composite)
value, so we can do the simple recursive case the way it is. But if it is
True
we need to strip out all the multiples of that value.
I guess the first thing we need to know is whether or not the current value is actually prime:
assert Ok(current_is_prime) =
known_primes
|> list.at(current)
The assert
in this case is safe because we know that current
will never be
higher than n_sqrt
, and we know that known_primes
contains n
boolean
values. Therefore, at
will never be greater than n
and will not fail.
Now we need to set up our recursive cases. Turns out there are two of them. We
could put a nested case
inside the recursive arm of the current case
, but
let’s explore another option: matching on two values at the same time.
case current, current_is_prime {
prime, _ if prime > n_sqrt ->
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
prime, True -> recursive_sieve(known_primes, n, current - 1)
_, _ -> recursive_sieve(known_primes, n, current + 1)
}
The case
is matching on current
(an integer) and current_is_prime
(a
boolean). In the first arm, our base case, we don’t care whether
current_is_prime
is True
or False
. We just care that prime
is higher
than the square root of n
, so we need to do the ol' convert the list of
booleans to a list of integers dance.
Otherwise, we return the same recursive call (for now) in two different arms.
In the first arm, we know that prime
is a prime number and we will eventually
need to remove all the multiples of it from known_primes
by setting those
values to False
. The second arm is basically justt an else
clause. It
covers the case where we are looking at a known composite number less than
n_sqrt
. In that case, we just need to increment current
and do the
recursive call.
Some complex filtering of common multiples
So let’s do that messy bit (you were thinking it was already messy, weren’t
you?). We need to create a new list where all the booleans at indices that are
multiples of current
have been set to false
. It would be much more pleasant
to just say, “we need to set the booleans at those indices to false,” but we
don’t have the luxury of mutable state.
Instead, I’m going to do a variation on that rather intimidating set
function
we looked at earlier: We split the list into three pieces, namely the elements
at indices lower than current
(there can’t be any multiples in this part, so
we know it ain’t gonna change), the current element, which will remain True
because we know it is prime, and the tail of the list, which contans everything
else and needs to have multiples stripped out.
prime, True -> {
let #(head, [_, ..tail]) =
known_primes
|> list.split(prime)
// TODO
}
Right, so list.split
returns a tuple of head and tail, and we pattern match
on the latter to effectively drop the first element of the tail.
Ok, I think we’re getting close! Can you feel it? I can feel it.
The next step is to loop through everything in tail
and create a new list
where the multiples are explicitly set to False
. I’m storing the result in a
list called filtered
(replaces // TODO
above):
let filtered =
tail
|> list.index_map(fn(index, element) {
let am_i_multiple = { prime + index + 1 } % current
case am_i_multiple {
0 -> False
_ -> element
}
})
The index_map
function is similar to |> iterate.index |> iterate.map
. It’s
like a standard map
function that accepts two arguments for the index
and
the element value. It exists so you can do index and mapping operations on a
list without converting it to an iterator, but also without creating an
unnecessary intermediate list.
The math to determine whether or not we are dealing with a multiple of the
current value took a bit of head scratching. We have to add the prime
value
to the index because that is the size of the head
that we stripped off. And
we need to add another 1
because that is the True
value we discarded for
the current index. Then we use a baby case
statement to either set the value
to False
or leave it whatever it was before (True
or False
depending if
it was a multiple of an earlier prime).
Now we have the original head
list and a new filtered
tail list and we need
to smush them back together. Don’t forget to put that True
value back in
there:
list.append(head, [True, ..filtered])
|> recursive_sieve(n, current + 1)
As you can see, now that we have a new known_primes
list, we can pass it into
the next call to recursive_sieve
so it can do it all over again.
Tail Recursion
Recursion can be useful and the Erlang virtual machine is heavily optimised to do it efficiently (or so I’ve heard. I’m as new to this as you are). However, if you structure your recursive calls in the right way, you may be able to help the virtual machine optimise it even further.
This is where so-called “tail recursion” comes into play. If you can structure your code such that the return value of the current iteration is equivalent to the return value of the next iteration.
I’ll be honest. I thought I was implementing a non-tail-recursive function for the Sieve of Eratosthenes problem. I was doing it on purpose because I wanted to demonstrate converting it to tail recursion. But when I looked at the finished product, I realized that it is already tail recursive! Gleam’s functional paradigm kind of forced my hand. Indeed, I’m not sure how to make this into a non-tail-recursive call.
I guess I can’t really complain if a language forces me to write optimized code …except when it interferes with my ability to explain concepts!
So let’s look at why this function is tail recursive. For reference, here’s the
whole recursive_sieve
function (in case you got lost as we built it up from
scratch above):
pub fn recursive_sieve(
known_primes: List(Bool),
n: Int,
current: Int,
) -> List(Int) {
assert Ok(n_sqrt) =
n
|> int.to_float
|> float.square_root
|> result.map(float.ceiling)
|> result.map(float.truncate)
assert Ok(current_is_prime) =
known_primes
|> list.at(current)
case current, current_is_prime {
prime, _ if prime > n_sqrt ->
known_primes
|> iterator.from_list
|> iterator.index
|> iterator.filter(fn(entry) { entry.1 })
|> iterator.map(fn(entry) { entry.0 })
|> iterator.to_list
prime, True -> {
let #(head, [_, ..tail]) =
known_primes
|> list.split(prime)
let filtered =
tail
|> list.index_map(fn(index, element) {
let am_i_multiple = { prime + index + 1 } % current
case am_i_multiple {
0 -> False
_ -> element
}
})
list.append(head, [True, ..filtered])
|> recursive_sieve(n, current + 1)
}
_, _ -> recursive_sieve(known_primes, n, current + 1)
}
}
The key is that index_map
is called to update the known_primes
before we
pass it into the next recursive_sieve
call. The function doesn’t do any extra
processing on the returned value. Therefore, the virtual machine does not need
to keep the function state and local variables (called a stack frame) hanging
around while it waits for the recursive calls to complete.
Writing tail recursive functions is normally harder than a standard recursive
function. Generally you have to pass some sort of intermediate value or
accumulator (in this case, it’s known_primes
) into the called function so it
has everything it needs to return everything at the end. This is starting to
sound a lot like the fold
function, right? Indeed, I’m pretty sure the sieve
of Eratosthenes could be solved quite elegantly using fold
. But I’ll leave
that as an exercise for the reader. Personally, I’ve had enough of primes!
Summary
This post explored the question “Why does Gleam not have loops (or even if statements)”. The answer is that looping tends to be treated differently in functional languages.
The iterator
and list
modules in the gleam standard library contain
everything you need to process sequences of values in interesting ways. When
there are cases that aren’t solved by these modules, you can use recursion to
perform the looping operation instead.
digressions because I haven’t been explaining things in an appropriate order. I’m not sure what I’ll write next; I’m meandering towards some stuff around OTP (open telecom platform), which is Erlang’s killer feature, but I might find some other ideas that need to be explored before we get there.