A Tutorial Introduction to Gleam -- Part 5
Introduction
This is the fifth in a series of articles exploring the Gleam programming language. In the most recent article, we started exploring how Gleam interfaces with Erlang’s powerful OTP concurrency framework to brute force some passwords. However, the result was suboptimal, partially because I didn’t know what I was doing, and partially because I didn’t have time to go into some of the deeper details. I also received a super valuable tip from the Gleam discussion group that I wanted to cover.
Patreon
If you find my content valuable, please do consider sponsoring me on Patreon. I write about what interests me, and I usually target tutorial style articles for intermediate to advanced developers in a variety of languages.
I’m not sure how far I’m going to go with this series on Gleam, so if you’re particularly interested in the topic, a few dollars a month definitely gives me a bit of motivation to keep it up.
Other Articles in Series
I’ve collected links to the articles in this series here.
First, A Correction: Function Capturing is Awesome
In the last article, I mentioned that I didn’t like the API design around the
crypto.hash
function because I couldn’t pipe a value into it. Turns out I was
wrong about this. There is a special argument name, _
, that we can use to
create partially applied functions. So, for example, I can clean up the
following code:
assert Ok(password) =
erlang.start_arguments()
|> list.at(0)
|> result.map(bit_string.from_string)
let target_hash = crypto.hash(crypto.Sha256, password)
to look like this:
assert Ok(target_hash) =
erlang.start_arguments()
|> list.at(0)
|> result.map(bit_string.from_string)
|> result.map(crypto.hash(crypto.Sha256, _))
This function capturing syntax says, “Create a new function that accepts
exactly one argument, and passes that argument into the original crypto.hash
function as the second parameter.” You may have seen this in other languages,
using words like “currying”, or “partials”. I prefer the latter because it
reminds me of the
book
(sneaky sneaky warning: That’s an affiliate link!) by the excellent author Dan
Wells.
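To make this concrete, the capture expression is just shorthand for writing the anonymous function yourself. Here’s a tiny sketch (demo_capture is a made-up name, and I’m assuming the same gleam/crypto import the rest of the code uses) showing that the two forms behave the same:
import gleam/crypto

fn demo_capture(data: BitString) -> Bool {
  // The capture builds a one-argument function...
  let captured = crypto.hash(crypto.Sha256, _)
  // ...exactly as if we had written the anonymous function by hand.
  let explicit = fn(d) { crypto.hash(crypto.Sha256, d) }
  captured(data) == explicit(data)
}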
We can use this syntax to tidy up a couple other places in our permute_strings.gleam
code. For example, consider the process.start
code in count_permutations
:
process.start(fn() {
let count =
permutations_recursive(letters(), n - 1)
|> list.length
process.send(sender, count)
})
We don’t need that count
variable anymore! Such beauty:
process.start(fn() {
permutations_recursive(letters(), n - 1)
|> list.length
|> process.send(sender, _)
})
It’s a small change, but much more readable.
Indeed, the entire count_permutations
function can be simplified from the original:
pub fn count_permutations(count: Int) -> Result(Int, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Ok(0)
n -> {
let receivers =
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
let count =
permutations_recursive(letters(), n - 1)
|> list.length
process.send(sender, count)
})
receiver
})
Ok(
receivers
|> list.map(process.receive_forever)
|> list.fold(0, fn(acc, x) { acc + x }),
)
}
}
}
to a somewhat less indented version (indentation is a very good approximation of complexity, I find):
pub fn count_permutations(count: Int) -> Result(Int, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Ok(0)
n ->
letters()
|> list.map(fn(_letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
permutations_recursive(letters(), n - 1)
|> list.length
|> process.send(sender, _)
})
receiver
})
|> list.map(process.receive_forever)
|> list.fold(0, fn(acc, x) { acc + x })
|> Ok
}
}
We can similarly remove receivers
and found
from hash_permutations
, which currently looks like this:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
let receivers =
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
let found =
permutations_recursive(letters(), n - 1)
|> list.map(fn(s) {
let candidate =
string_builder.append_builder(letter, s)
|> string_builder.to_string
let hashed_candidate =
crypto.hash(
crypto.Sha256,
candidate
|> bit_string.from_string,
)
case hashed_candidate {
t if t == target -> Some(candidate)
_ -> option.None
}
})
process.send(sender, found)
})
receiver
})
receivers
|> list.flat_map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
}
I had hopes of removing candidate
and hashed_candidate
, but both values are needed in the conditional (one in the comparison, one in the return value), so I had to settle for reordering their pipeline:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n ->
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
permutations_recursive(letters(), n - 1)
|> list.map(fn(s) {
let candidate =
string_builder.append_builder(letter, s)
|> string_builder.to_string
let hashed_candidate =
candidate
|> bit_string.from_string
|> crypto.hash(crypto.Sha256, _)
case hashed_candidate {
t if t == target -> Some(candidate)
_ -> option.None
}
})
|> process.send(sender, _)
})
receiver
})
|> list.flat_map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
This code is still a lot to digest because it’s trying to do too much in one place. Let’s fix that next.
Splitting Out Concurrency
The function passed into process.start
is big enough to stand on its own. Let’s split it out into a few separate functions so we can “see” the separate threads of control more easily. First, let’s create a check_candidate
function, which checks if a candidate matches the target hash and returns an Option
one way or another:
fn check_candidate(candidate: String, target: BitString) {
let hashed_candidate =
candidate
|> bit_string.from_string
|> crypto.hash(crypto.Sha256, _)
case hashed_candidate {
t if t == target -> Some(candidate)
_ -> option.None
}
}
Let’s also create a function that checks all the strings of a certain length to see if any of them meet a certain target value:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(List(option.Option(String))),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
})
|> process.send(sender, _)
}
This covers the work that is happening in each separate process (remember, there is one process per letter right now). So the hash_permutations
function gets simplified to:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n ->
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() { check_count(n, letter, target, sender) })
receiver
})
|> list.flat_map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
Now this function is doing only two things: Kicking off the processes and searching the results. And now that I look at it, I see that the searching part is really very expensive. Let’s fix that.
Scan the List in Each Subprocess
The processes are currently returning a huge list of Options
, almost all of
which are None
. Searching this list for a single Some
value seems like a
big waste of time, especially since it is all being done in the main process
with no parallelism. Let’s do the flattening and searching inside check_count
instead, so we have a much smaller set of values to search in parallel.
Start by changing the sender
argument to check_count
from a
process.Sender(List(option.Option(String)))
to remove the List
part. Then
it’s mostly a matter of following compiler errors until you’re sending a single
Option
. I found a couple different ways to construct a pipeline that flattens
the result. Here’s one:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(option.Option(String)),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
})
|> option.values
|> list.first
|> option.from_result
|> process.send(sender, _)
}
This variation uses option.values
to extract all the Some
options from the
list, and then uses list.first
to get the first one if there is one. This
function returns a Result
; if there were no Some
values, this will return
an Error
. Either way, we convert the Result(String)
to an Option(String)
to be returned via the sender.
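If the shape of that pipeline is hard to picture, here’s a throwaway sketch (hypothetical values, not part of the real program) showing what each step produces on a small hand-built list:
import gleam/io
import gleam/list
import gleam/option.{None, Some}

pub fn main() {
  [None, Some("abc"), None]
  |> option.values       // ["abc"]: keep only the payloads of the Some values
  |> list.first          // Ok("abc"), or Error(Nil) if the list were empty
  |> option.from_result  // Some("abc"), or None on an Error
  |> io.debug
}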
This is not the most efficient solution to this problem, however. The values
call has to loop through all the options, even though we immediately throw away
any options after the first one. Further, it is extremely unlikely that there
will be more than one matching value in the list (hash collisions are possible,
but not realistically probable). So let’s use a variation that stops searching
as soon as we find something that matches:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(option.Option(String)),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
})
|> list.find(option.is_some)
|> option.from_result
|> option.flatten
|> process.send(sender, _)
}
This code uses list.find
to search the List(Option(String))
for an is_some
option. That will return a Result(Option(String))
depending on whether a
match was found. The option.from_result
function converts this to an
Option(Option(String))
type, which can be converted to a single level
option
using option.flatten
. In the (normal) case where there are no
matches, it still has to loop over everything, but at least when we find a
match, we can return it immediately.
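Here’s a similar throwaway sketch (again with hypothetical values) tracing the type at each step of this version:
import gleam/io
import gleam/list
import gleam/option.{None, Some}

pub fn main() {
  [None, Some("abc"), None]
  |> list.find(option.is_some)  // Ok(Some("abc")), or Error(Nil) if all are None
  |> option.from_result         // Some(Some("abc")), or None
  |> option.flatten             // Some("abc"), or None
  |> io.debug
}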
You will also need to make a small change to hash_permutations
. The
list.flat_map(process.receive_forever)
needs to be changed to a simple
list.map
. There is no flattening necessary anymore, because receive_forever
is now receiving options instead of lists of options.
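In case it helps to see it in place, here’s roughly what hash_permutations looks like after that change; the only difference from the version above is list.map in place of list.flat_map:
pub fn hash_permutations(
  target: BitString,
  count: Int,
) -> Result(String, String) {
  case count {
    x if x < 0 -> Error("count must be non-negative")
    0 -> Error("No Match")
    n ->
      letters()
      |> list.map(fn(letter) {
        let #(sender, receiver) = process.new_channel()
        process.start(fn() { check_count(n, letter, target, sender) })
        receiver
      })
      |> list.map(process.receive_forever)
      |> list.find_map(fn(opt) { option.to_result(opt, Nil) })
      |> result.replace_error("No Match")
  }
}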
This small change made searching almost twice as fast as the version that had to do all the searching in the outer function!
One Channel
There is no law saying that a channel can only be sent to or received from
exactly one process. We can change our hash_permutations
function to make
a single channel at the beginning of the function instead of creating 26
channels, one for each letter:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n ->
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
receiver
})
|> list.map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
Here, I’ve moved the let #(sender, receiver)
out to the top level so we only
have one channel instead of one per process. The list.map
still returns a
receiver
, but now it’s the same receiver
every time. I did it this way to
ensure that the list.map(process.receive_forever)
call is still called
exactly 26 times (one for each alphabet-inspired process).
These 26 calls are needed to be sure that each subprocess is finished; each one
returns None
if it can’t find what it’s looking for.
We can simplify… well honestly, we’re making it more complicated, so maybe we
should say we can complify the system so that we send at most one value into
the channel: the successfully found candidate. If we do this, though, we need
some other method of determining whether the 26 processes have completed. Let’s
start by removing the Option
from the channel. Each subprocess will either
send a successful candidate into the channel or not send anything at all.
Start by modifying check_count
so it calls process.send
inside an
option.map
call:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(String),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
|> option.map(process.send(sender, _))
})
}
Once again, we can use that fancy _
partial argument to craft the mapped
function. Notice that I’ve removed the list searching stuff at the end of the
function (we’re still in simplify mode). Because it’s called inside
option.map
, the process.send
call is now only called if the value is Some
(in other words, if a matching candidate was found). This is all pretty cool,
but we’ve got some compiler errors because hash_permutations
thinks it is
still working with a channel that sends 26 Option
s, not at most one
String
. Let’s fix that next:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
})
process.receive_forever(receiver)
|> Ok
}
}
}
The main difference here is that we’ve removed the list.map
over
process.receive_forever
. Instead, we just call receive_forever
directly on
the receiver
. We can (sort of) safely pipe the result directly into Ok
,
since the receiver now only receives candidates that matched the target hash.
This works great so long as the argument you pass in is actually one that our
brute force algorithm can find. But there is no guarantee that the targeted
value is actually going to be found. At the moment, my call to
hash_permutations
looks like this:
case permute_strings.hash_permutations(target_hash, 5) {
Ok(plaintext) -> io.println(string.append("Found ", plaintext))
Error(message) -> io.println(message)
}
Note the magic number 5
. I chose 5 characters because it allows me to test
the algorithm quickly without waiting for it to search a larger character
space. If I run gleam run passw
, it successfully finds the password in record
time. But if I run gleam run password
it never exits because the hash of
the eight character password
is not found in the five character search space.
That’s not great. In fact, it sucks. Let’s solve this the heavy-handed way
first. Instead of using the somewhat dangerous (because it may never return)
receive_forever
, let’s use receive
. The difference is that receive
accepts a timeout argument and returns a Result
. We can map the error to
ensure that it always returns:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
})
process.receive(receiver, 10000)
|> result.map_error(fn(_) { "No Match" })
}
}
}
Now gleam run password
runs for 10 seconds and then prints No Match
. That’s
better than never exiting, but it’s still not wonderful. For one thing, it
takes 10 whole seconds to print No Match
even if the subprocesses completed
earlier. Much worse, if we change our hash_permutations
call to accept eight
character strings instead of five, gleam run password
will still fail after
10 seconds because it would take more than 10 seconds to find a match (even
searching in parallel).
So ideally, we have some way of verifying that the process has completed. One
reasonable course of action is to go back to the version that mapped
process.receive_forever over the receivers, and honestly, that seems like the most sensible thing to do. But I’m
more interested in experimenting than being sensible today.
Monitoring Processes
Gleam OTP gives us the ability to monitor a process to see if it has exited.
The monitoring call returns a Receiver
that delivers a message when the process goes down. We can hook that up to each of our
processes and wait for them all to exit before checking the “real” receiver.
This will look very familiar:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
|> process.monitor_process
})
|> list.map(process.receive_forever)
|> string.inspect
|> io.println
io.println("all processes exited")
process.receive(receiver, 100)
|> result.map_error(fn(_) { "No Match" })
}
}
}
We’ve been ignoring it up until now, but the process.start
function returns
an integer process ID (PID). We pipe this into the process.monitor_process
function, which returns a Receiver
over process.ProcessDown
types. This
returned receiver is implicitly returned from the anonymous function we passed
into list.map
, which means we can use the same
list.map(process.receive_forever)
format we were using before we started this
exercise. For fun, I passed the result of the list.map
call to the new (in
gleam 0.22) string.inspect
function, which outputs the value in a Gleam-like
syntax. I don’t actually care about this output, but it helped me understand
what was going on and I wanted to share it.
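If you haven’t used string.inspect before, it accepts a value of any type and hands back a String representation. A trivial sketch:
import gleam/io
import gleam/string

pub fn main() {
  [1, 2, 3]
  |> string.inspect  // "[1, 2, 3]"
  |> io.println
}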
Once we’ve mapped over everything, we use io.println
to output a mostly
uninformative message. Then we use process.receive
on our channel just like
we did before. However, I know that the processes have exited now, so I can use
a much shorter timeout; at this point, we know that either the value has been
sent by the processes or it can’t be found at all, since the processes are
finished running.
Summary
That’s about all I feel like writing for today. I tried to optimize further by experimenting with the process.merge_receivers
function. My hope was to be able to return early when a match is found. That way we don’t have to wait for all the other processes to fail to find anything when we already know the answer. But I gave up after I ended up with a seven line pipeline that used half the iterator
module and it still didn’t work. Maybe you can solve it and share the result with me?
You might be wondering how this is better than the version where we explicitly
returned Option
s to indicate that each process has completed, and in my
opinion, it is no better. I think monitor_process
is more useful for higher
level parallel primitives such as actors and supervisors. (I thought this
article was going to be about actors and supervisors, but I discovered I wasn’t
ready for them yet…). But hopefully you understand Gleam processes a bit better now (I know I do).
I’m planning for my next article to be on compiling Gleam to JavaScript and discovering what it’s like as a frontend language. Does it work with vitejs? Is it as fast as ReScript? How does React look?