A Tutorial Introduction to Gleam -- Part 5
Introduction
This is the fifth in a series of articles exploring the Gleam programming language. In the most recent article, we started exploring how Gleam interfaces with Erlang’s powerful OTP concurrency framework to brute force some passwords. However, the result was suboptimal, partially because I didn’t know what I was doing, and partially because I didn’t have time to go into some of the deeper details. I also received a super valuable tip from the Gleam discussion group that I wanted to cover.
Patreon
If you find my content valuable, please do consider sponsoring me on Patreon. I write about what interests me, and I usually target tutorial style articles for intermediate to advanced developers in a variety of languages.
I’m not sure how far I’m going to go with this series on Gleam, so if you’re particularly interested in the topic, a few dollars a month definitely gives me a bit of motivation to keep it up.
Other Articles in Series
I’ve collected links to the articles in this series here.
First, A Correction: Function Capturing is Awesome
In the last article, I mentioned that I didn’t like the API design around the
crypto.hash
function because I couldn’t pipe a value into it. Turns out I was
wrong about this. There is a special argument name, _
, that we can use to
create partially applied functions. So, for example, I can clean up the
following code:
assert Ok(password) =
erlang.start_arguments()
|> list.at(0)
|> result.map(bit_string.from_string)
let target_hash = crypto.hash(crypto.Sha256, password)
to look like this:
assert Ok(target_hash) =
erlang.start_arguments()
|> list.at(0)
|> result.map(bit_string.from_string)
|> result.map(crypto.hash(crypto.Sha256, _))
This function capturing syntax says, “Create a new function that accepts
exactly one argument, and passes that argument into the original crypto.hash
function as the second parameter.” You may have seen this in other languages,
using words like “currying”, or “partials”. I prefer the latter because it
reminds me of the
book
(sneaky sneaky warning: That’s an affiliate link!) by the excellent author Dan
Wells.
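To make this concrete, the capture expression is just shorthand for writing the anonymous function yourself. Here’s a tiny sketch (demo_capture is a made-up name, and I’m assuming the same gleam/crypto import the rest of the code uses) showing that the two forms behave the same:
import gleam/crypto

fn demo_capture(data: BitString) -> Bool {
  // The capture builds a one-argument function...
  let captured = crypto.hash(crypto.Sha256, _)
  // ...exactly as if we had written the anonymous function by hand.
  let explicit = fn(d) { crypto.hash(crypto.Sha256, d) }
  captured(data) == explicit(data)
}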
We can use this syntax to tidy up a couple other places in our permute_strings.gleam
code. For example, consider the process.start
code in count_permutations
:
process.start(fn() {
let count =
permutations_recursive(letters(), n - 1)
|> list.length
process.send(sender, count)
})
We don’t need that count
variable anymore! Such beauty:
process.start(fn() {
permutations_recursive(letters(), n - 1)
|> list.length
|> process.send(sender, _)
})
It’s a small change, but much more readable.
Indeed, the entire count_permutations
function can be simplified from the original:
pub fn count_permutations(count: Int) -> Result(Int, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Ok(0)
n -> {
let receivers =
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
let count =
permutations_recursive(letters(), n - 1)
|> list.length
process.send(sender, count)
})
receiver
})
Ok(
receivers
|> list.map(process.receive_forever)
|> list.fold(0, fn(acc, x) { acc + x }),
)
}
}
}
to a somewhat less indented version (indentation is a very good approximation of complexity, I find):
pub fn count_permutations(count: Int) -> Result(Int, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Ok(0)
n ->
letters()
|> list.map(fn(_letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
permutations_recursive(letters(), n - 1)
|> list.length
|> process.send(sender, _)
})
receiver
})
|> list.map(process.receive_forever)
|> list.fold(0, fn(acc, x) { acc + x })
|> Ok
}
}
We can similarly remove receivers
and found
from hash_permutations
, which currently looks like this:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
let receivers =
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
let found =
permutations_recursive(letters(), n - 1)
|> list.map(fn(s) {
let candidate =
string_builder.append_builder(letter, s)
|> string_builder.to_string
let hashed_candidate =
crypto.hash(
crypto.Sha256,
candidate
|> bit_string.from_string,
)
case hashed_candidate {
t if t == target -> Some(candidate)
_ -> option.None
}
})
process.send(sender, found)
})
receiver
})
receivers
|> list.flat_map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
}
I had hopes of removing candidate
and hashed_candidate
, but both values are needed in the conditional (one in the comparison, one in the return value), so I had to settle for reordering their pipeline:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n ->
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() {
permutations_recursive(letters(), n - 1)
|> list.map(fn(s) {
let candidate =
string_builder.append_builder(letter, s)
|> string_builder.to_string
let hashed_candidate =
candidate
|> bit_string.from_string
|> crypto.hash(crypto.Sha256, _)
case hashed_candidate {
t if t == target -> Some(candidate)
_ -> option.None
}
})
|> process.send(sender, _)
})
receiver
})
|> list.flat_map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
This code is still a lot to digest because it’s trying to do too much in one place. Let’s fix that next.
Splitting Out Concurrency
The function passed into process.start
is big enough to stand on its own. Let’s split it out into a few separate functions so we can “see” the separate threads of control more easily. First, let’s create a check_candidate
function, which checks if a candidate matches the target hash and returns an Option
one way or another:
fn check_candidate(candidate: String, target: BitString) {
let hashed_candidate =
candidate
|> bit_string.from_string
|> crypto.hash(crypto.Sha256, _)
case hashed_candidate {
t if t == target -> Some(candidate)
_ -> option.None
}
}
Let’s also create a function that checks all the strings of a certain length to see if any of them meet a certain target value:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(List(option.Option(String))),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
})
|> process.send(sender, _)
}
This covers the work that is happening in each separate process (remember, there is one process per letter right now). So the hash_permutations
function gets simplified to:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n ->
letters()
|> list.map(fn(letter) {
let #(sender, receiver) = process.new_channel()
process.start(fn() { check_count(n, letter, target, sender) })
receiver
})
|> list.flat_map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
Now this function is doing only two things: Kicking off the processes and searching the results. And now that I look at it, I see that the searching part is really very expensive. Let’s fix that.
Scan the List in Each Subprocess
The processes are currently returning a huge list of Options
, almost all of
which are None
. Searching this list for a single Some
value seems like a
big waste of time, especially since it is all being done in the main process
with no parallelism. Let’s do the flattening and searching inside check_count
instead, so we have a much smaller set of values to search in parallel.
Start by changing the sender
argument to check_count
from a
process.Sender(List(option.Option(String)))
to remove the List
part. Then
it’s mostly a matter of following compiler errors until you’re sending a single
Option
. I found a couple different ways to construct a pipeline that flattens
the result. Here’s one:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(option.Option(String)),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
})
|> option.values
|> list.first
|> option.from_result
|> process.send(sender, _)
}
This variation uses option.values
to extract all the Some
options from the
list, and then uses list.first
to get the first one if there is one. This
function returns a Result
; if there were no Some
values, this will return
an Error
. Either way, we convert the Result(String)
to an Option(String)
to be returned via the sender.
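If the shape of that pipeline is hard to picture, here’s a throwaway sketch (hypothetical values, not part of the real program) showing what each step produces on a small hand-built list:
import gleam/io
import gleam/list
import gleam/option.{None, Some}

pub fn main() {
  [None, Some("abc"), None]
  |> option.values       // ["abc"]: keep only the payloads of the Some values
  |> list.first          // Ok("abc"), or Error(Nil) if the list were empty
  |> option.from_result  // Some("abc"), or None on an Error
  |> io.debug
}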
This is not the most efficient solution to this problem, however. The values
call has to loop through all the options, even though we immediately throw away
any options after the first one. Further, it is extremely unlikely that there
will be more than one matching value in the list (hash collisions are possible,
but not realistically probable). So let’s use a variation that stops searching
as soon as we find something that matches:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(option.Option(String)),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
})
|> list.find(option.is_some)
|> option.from_result
|> option.flatten
|> process.send(sender, _)
}
This code uses list.find
to search the List(Option(String))
for an is_some
option. That will return a Result(Option(String))
depending on whether a
match was found. The option.from_result
function converts this to an
Option(Option(String))
type, which can be converted to a single level
option
using option.flatten
. In the (normal) case where there are no
matches, it still has to loop over everything, but at least when we find a
match, we can return it immediately.
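Here’s a similar throwaway sketch (again with hypothetical values) tracing the type at each step of this version:
import gleam/io
import gleam/list
import gleam/option.{None, Some}

pub fn main() {
  [None, Some("abc"), None]
  |> list.find(option.is_some)  // Ok(Some("abc")), or Error(Nil) if all are None
  |> option.from_result         // Some(Some("abc")), or None
  |> option.flatten             // Some("abc"), or None
  |> io.debug
}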
You will also need to make a small change to hash_permutations
. The
list.flat_map(process.receive_forever)
needs to be changed to a simple
list.map
. There is no flattening necessary anymore, because receive_forever
is now receiving options instead of lists of options.
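In case it helps to see it in place, here’s roughly what hash_permutations looks like after that change; the only difference from the version above is list.map in place of list.flat_map:
pub fn hash_permutations(
  target: BitString,
  count: Int,
) -> Result(String, String) {
  case count {
    x if x < 0 -> Error("count must be non-negative")
    0 -> Error("No Match")
    n ->
      letters()
      |> list.map(fn(letter) {
        let #(sender, receiver) = process.new_channel()
        process.start(fn() { check_count(n, letter, target, sender) })
        receiver
      })
      |> list.map(process.receive_forever)
      |> list.find_map(fn(opt) { option.to_result(opt, Nil) })
      |> result.replace_error("No Match")
  }
}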
This small change made searching almost twice as fast as the version that had to do all the searching in the outer function!
One Channel
There is no law saying that a channel can only be sent to or received from
exactly one process. We can change our hash_permutations
function to make
a single channel at the beginning of the function instead of creating 26
channels, one for each letter:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n ->
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
receiver
})
|> list.map(process.receive_forever)
|> list.find_map(fn(opt) { option.to_result(opt, Nil) })
|> result.replace_error("No Match")
}
}
Here, I’ve moved the let #(sender, receiver)
out to the top level so we only
have one channel instead of one per process. The list.map
still returns a
receiver
, but now it’s the same receiver
every time. I did it this way to
ensure that the list.map(process.receive_forever)
call is still called
exactly 26 times (one for each alphabet-inspired process).
These 26 calls are needed to be sure that each subprocess is finished; each one
returns None
if it can’t find what it’s looking for.
We can simplify… well honestly, we’re making it more complicated, so maybe we
should say we can complify the system so that we send at most one value into
the channel: the successfully found candidate. If we do this, though, we need
some other method of determining whether the 26 processes have completed. Let’s
start by removing the Option
from the channel. Each subprocess will either
send a successful candidate into the channel or not send anything at all.
Start by modifying check_count
so it calls process.send
inside an
option.map
call:
fn check_count(
count: Int,
letter: string_builder.StringBuilder,
target: BitString,
sender: process.Sender(String),
) {
permutations_recursive(letters(), count - 1)
|> list.map(fn(s) {
string_builder.append_builder(letter, s)
|> string_builder.to_string
|> check_candidate(target)
|> option.map(process.send(sender, _))
})
}
Once again, we can use that fancy _
partial argument to craft the mapped
function. Notice that I’ve removed the list searching stuff at the end of the
function (we’re still in simplify mode). Because it’s called inside
option.map
, the process.send
call is now only called if the value is Some
(in other words, if a matching candidate was found). This is all pretty cool,
but we’ve got some compiler errors because hash_permutations
thinks it is
still working with a channel that sends 26 Option
s, not at most one
String
. Let’s fix that next:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
})
process.receive_forever(receiver)
|> Ok
}
}
}
The main difference here is that we’ve removed the list.map
over
process.receive_forever
. Instead, we just call receive_forever
directly on
the receiver
. We can (sort of) safely pipe the result directly into Ok
,
since the receiver now only receives candidates that matched the target hash.
This works great so long as the argument you pass in is actually one that our
brute force algorithm can find. But there is no guarantee that the targeted
value is actually going to be found. At the moment, my call to
hash_permutations
looks like this:
case permute_strings.hash_permutations(target_hash, 5) {
Ok(plaintext) -> io.println(string.append("Found ", plaintext))
Error(message) -> io.println(message)
}
Note the magic number 5
. I chose 5 characters because it allows me to test
the algorithm quickly without waiting for it to search a larger character
space. If I run gleam run passw
, it successfully finds the password in record
time. But if I run gleam run password
it never exits because the hash of
the eight character password
is not found in the five character search space.
That’s not great. In fact, it sucks. Let’s solve this the heavy-handed way
first. Instead of using the somewhat dangerous (because it may never return)
receive_forever
, let’s use receive
. The difference is that receive
accepts a timeout argument and returns a Result
. We can map the error to
ensure that it always returns:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
})
process.receive(receiver, 10000)
|> result.map_error(fn(_) { "No Match" })
}
}
}
Now gleam run password
runs for 10 seconds and then prints No Match
. That’s
better than never exiting, but it’s still not wonderful. For one thing, it
takes 10 whole seconds to print No Match
even if the subprocesses completed
earlier. Much worse, if we change our hash_permutations
call to accept eight
character strings instead of five, gleam run password
will still fail after
10 seconds because it would take more than 10 seconds to find a match (even
searching in parallel).
So ideally, we have some way of verifying that the process has completed. One
reasonable course of action is to go back to the version that mapped
process.receive_forever over the receivers, and honestly, that seems like the most sensible thing to do. But I’m
more interested in experimenting than being sensible today.
Monitoring Processes
Gleam OTP gives us the ability to monitor a process to see if it has exited.
The monitoring call returns a Receiver
that delivers a message when the process goes down. We can hook that up to each of our
processes and wait for them all to exit before checking the “real” receiver.
This will look very familiar:
pub fn hash_permutations(
target: BitString,
count: Int,
) -> Result(String, String) {
let #(sender, receiver) = process.new_channel()
case count {
x if x < 0 -> Error("count must be non-negative")
0 -> Error("No Match")
n -> {
letters()
|> list.map(fn(letter) {
process.start(fn() { check_count(n, letter, target, sender) })
|> process.monitor_process
})
|> list.map(process.receive_forever)
|> string.inspect
|> io.println
io.println("all processes exited")
process.receive(receiver, 100)
|> result.map_error(fn(_) { "No Match" })
}
}
}
We’ve been ignoring it up until now, but the process.start
function returns
an integer process ID (PID). We pipe this into the process.monitor_process
function, which returns a Receiver
over process.ProcessDown
types. This
returned receiver is implicitly returned from the anonymous function we passed
into list.map
, which means we can use the same
list.map(process.receive_forever)
format we were using before we started this
exercise. For fun, I passed the result of the list.map
call to the new (in
gleam 0.22) string.inspect
function, which outputs the value in a Gleam-like
syntax. I don’t actually care about this output, but it helped me understand
what was going on and I wanted to share it.
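If you haven’t used string.inspect before, it accepts a value of any type and hands back a String representation. A trivial sketch:
import gleam/io
import gleam/string

pub fn main() {
  [1, 2, 3]
  |> string.inspect  // "[1, 2, 3]"
  |> io.println
}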
Once we’ve mapped over everything, we use io.println
to output a mostly
uninformative message. Then we use process.receive
on our channel just like
we did before. However, I know that the processes have exited now, so I can use
a much shorter timeout; at this point, we know that either the value has been
sent by the processes or it can’t be found at all, since the processes are
finished running.
Summary
That’s about all I feel like writing for today. I tried to optimize further by experimenting with the process.merge_receivers
function. My hope was to be able to return early when a match is found. That way we don’t have to wait for all the other processes to fail to find anything when we already know the answer. But I gave up after I ended up with a seven line pipeline that used half the iterator
module and it still didn’t work. Maybe you can solve it and share the result with me?
You might be wondering how this is better than the version where we explicitly
returned Option
s to indicate that each process has completed, and in my
opinion, it is no better. I think monitor_process
is more useful for higher
level parallel primitives such as actors and supervisors. (I thought this
article was going to be about actors and supervisors, but I discovered I wasn’t
ready for them yet…). But hopefully you understand Gleam processes a bit better now (I know I do).
I’m planning for my next article to be on compiling Gleam to JavaScript and discovering what it’s like as a frontend language. Does it work with vitejs? Is it as fast as ReScript? How does React look?