Advent of Code: Expressing yourself

Justin Wernick
Curly Tail, Curly Braces
2020-01-24

Abstract

Every year, I participate in the Advent of Code programming advent calendar. This year, I set myself the challenge to complete the puzzles using only pure expressions in Rust. In this article, I share some of the techniques I used and how it worked out.

What is the Advent of Code?

Every year, I participate in the Advent of Code. It's an advent calendar, counting down the days to Christmas, but the twist is that it isn't a chocolate behind each door: it's a programming puzzle! Yes, each day has a small challenge for you to solve in your favourite programming language.

Usually, I eagerly wait each morning for the next puzzle to unlock and race for a solution. I hack together code, prioritizing getting the answer as fast as possible. This year, with my 8-month-old baby girl, my mornings were much less calm and focused than they used to be. Racing wasn't an option, so I decided to set a different personal challenge for myself:

I would do all of the challenges in the Rust programming language without any statements or mutable state.

I recently completed my final day and saved Santa (not quite in time to save Christmas, but close enough). In this article, I'm going to go through why I set this challenge, the techniques I used to pull it off, and what I learned along the way.

Why This Challenge?

The heart of this challenge is exploring how far I can take Rust when it comes to pure expressions.

Mostly, though, this is an exercise for my personal growth as a software engineer, and a bit of fun.

Why Purity?

A function is said to be "pure" in functional programming circles if:

  1. Its output is determined by its input, and nothing else. If a function is given the same input, it should always return the same output.

  2. The act of evaluating the function doesn't have any side effects; it only returns a value.

These two properties are really two sides of the same coin: the function doesn't pull in any outside state, and it doesn't write out any outside state.

A reason to aim for pure functions is that it's easier to reason about code that uses pure functions. For example, take a look at these two functions. The only difference is the order that the inner functions are called. Do these two functions do the same thing?

fn one_way(a: i32, b: i32) -> i32 {
    let c = do_something(a, b);
    let d = do_something_else(a, b);
    c + d
}

fn another_way(a: i32, b: i32) -> i32 {
    let d = do_something_else(a, b);
    let c = do_something(a, b);
    c + d
}

If the two functions are pure, then yes they do! If they are not pure, then they could actually behave differently. What if do_something_else changes a global variable that do_something depends on? What if they both print something to the console, and the messages now appear in a different order? At a high level, this is the reason to strive for purity:

If you know the functions are pure you have more power to reason about how they will behave without looking at what they do.

Why Expressions?

I don't have a good theoretical reason to only use expressions. In fact, I'm pretty sure that this is a bad idea in the general case. However, it took me out of my comfort zone and forced me to take a new approach to writing code.

Caveats and Exceptions

I made two exceptions to the no statements rule.

The first relates to IO. Functional programming has a complicated relationship with the fundamentally impure actions of reading in data from the outside world, and then acting on the outside world to report the answer you computed. There are solutions (look up the IO monad if you're curious), but they mostly come down to deferring the impure actions to the boundaries of your program. My exception for this challenge is that my main function can do all of my IO in whatever way is easiest. Reading input, killing the program with a non-zero exit code if there are errors, printing the output... this is all fine for the main function to do, and nowhere else.

The other exception is that I can pull in libraries to help where necessary, even if I know the library must be mutating internal state to work. Most functional programming languages have data structures built into their standard library that support these patterns; Rust doesn't, so I needed to bring in external dependencies. From the perspective of this challenge, the important thing is that my code doesn't mutate state it owns, not how a library manages its internal state.

Techniques

So, I can't use statements, and I can't use mutable state. What techniques did I use to overcome these limitations?

Persistent Data Structures

I could not have done this challenge without persistent data structures. I maybe could have faked my way around most of the other techniques, but this one is essential.

Persistent data structures (also known as immutable data structures) are data structures that aren't modified in place when you call the traditional "mutation" functions. Instead, functions like insert leave the current data structure unmodified, and return a NEW data structure that has the change!

// rpds="0.7.0"
use rpds::List;

let empty_list = List::new();
let one_element_list = empty_list.push_front("something");
println!("Still empty: {}", empty_list);
println!("Has stuff in it: {}", one_element_list);

My initial reaction when I first heard about persistent data structures was "There's no way that's efficient! Sure, it seems fine for small lists, but if you use it on anything bigger you'd run out of RAM with all of those copies!"

Luckily, I was wrong. There are efficient persistent data structures that, while providing this immutable API, do clever things in the background to reuse as much of the existing data as possible. I used the RPDS (Rust Persistent Data Structures) crate, which provides immutable lists, vectors, sets, and maps: everything I needed for the fundamental building blocks of my solutions.
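
As a rough illustration (a sketch of the general idea, not RPDS's actual internals): pushing onto the front of a persistent list can reuse the entire existing list as the new list's tail.

// A sketch of structural sharing with rpds::List. push_front returns a
// new list whose tail is the existing list, so no nodes are copied.
use rpds::List;

let base: List<i32> = List::new().push_front(2).push_front(1); // [1, 2]
let extended = base.push_front(0);                             // [0, 1, 2]

// Both lists remain valid, and `extended` shares `base`'s nodes rather
// than duplicating them, so the push is O(1) in time and memory.
assert_eq!(base.len(), 2);
assert_eq!(extended.len(), 3);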

Iterators

I've written before about Rust iterators (and why I think they're awesome). Without mutable state or statements, iterators were my primary means of dealing with loopy scenarios. The classic functional programming pattern is to use recursion for this, but since Rust doesn't have guaranteed tail call optimization, recursion would have given me stack overflow problems.

A common iterator pattern that emerged in this challenge, one I hadn't really used much before, is std::iter::successors: it creates an iterator where each item is computed from the previous item. This can take the place of many traditional loops.
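
Here's a minimal sketch (not from my actual solutions) of successors standing in for a loop that doubles a value until it passes a limit:

use std::iter;

// Each item is derived from the previous one. checked_mul ends the
// iterator (by returning None) if the multiplication would overflow.
fn first_power_of_two_above(limit: u32) -> u32 {
    iter::successors(Some(1u32), |x| x.checked_mul(2))
        .find(|&x| x > limit)
        .unwrap()
}

// first_power_of_two_above(100) == 128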

Here's an example from day 24, where I had to implement something very similar to Conway's Game of Life and find the first repeated state. I'm going to show the code, followed by a textual description of what it's doing.

impl GameOfLifeState {
    fn first_repeated_state(&self) -> GameOfLifeState {
        iter::successors(
            Some((RedBlackTreeSet::new(), self.clone())),
            |(seen, state)| Some((seen.insert(state.clone()), state.next())),
        )
        .find(|(seen, state)| seen.contains(state))
        .unwrap()
        .1
    }
}

In prose: the iterator's starting point is a pair of an empty set of "seen" states and the initial game state. Each step inserts the current state into the seen set and advances the game by one generation. find then stops at the first state that's already in the seen set, unwrap asserts that such a state exists, and .1 pulls the repeated state out of the tuple.

This is just the high-level iteration part; if you want to see the full code for the day, it's here.

As you can see from my attempt to describe the code in prose, this code is incredibly dense. I know that it can be a bit difficult to read if you aren't used to iterators, but I've found that I like this approach more and more, and reading it gets easier with practice.

Memoization

The next trick from functional programming is a performance optimization trick.

One of the neat properties of pure functions is "referential transparency". In plainer language, if a function is pure, you can replace a "real" call to the function with the value that the function would have returned if you called it, and the rest of the program doesn't know the difference.
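
A small sketch of what that means in practice:

fn square(x: i32) -> i32 {
    x * x
}

// Because square is pure, these two bindings are interchangeable;
// nothing else in the program can tell the difference.
let a = square(7);
let b = 49;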

Memoization is a technique where a function maintains an internal memo of previous results. When someone calls the function, it first checks whether the result for those arguments is already in the memo. If it is, it returns the answer from the memo. If it isn't, the real computation runs, and the result is put into the memo. From the caller's point of view nothing changes, but the second time the memoized function is called with the same arguments it's much faster.

If a function call takes a relatively long time, and you need to call it in a tight loop, memoization means that you're only paying the heavy performance cost the first time you call the function for each unique input.
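
To make the memo concrete, here's a minimal hand-rolled sketch (the function names are hypothetical, and the memo is passed in explicitly for simplicity; this isn't the code I actually used):

use std::collections::HashMap;

// Stand-in for a slow pure function.
fn expensive(n: u64) -> u64 {
    (0..n).sum()
}

// The memo is a mutable HashMap: exactly the kind of state this
// challenge forbids. Only compute `expensive(n)` on a memo miss.
fn memoized_expensive(memo: &mut HashMap<u64, u64>, n: u64) -> u64 {
    *memo.entry(n).or_insert_with(|| expensive(n))
}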

The question I expect you're asking now is "Hang on, that sounds a lot like your memo is mutable state!" and you're right. At least, you would be right if I wrote the memoization code myself. Instead, I used the Cached crate, which lets me wrap my pure function in its macro.

This example comes from day 19, where I had to search for the closest area in my tractor beam that Santa could fit his spaceship into (because Advent of Code is awesome)! Evaluating whether a point is in the tractor beam is an expensive operation, and thanks to the way I wrote the rest of my algorithm, I was hitting it a lot. Memoization helped.

// Slow and unmemoized
fn tractor_beam_is_active(program: IntcodeProgram, x: usize, y: usize) -> bool {
    program
        .with_input(list![Intcode::from(x), Intcode::from(y)])
        .execute() == Ok(vector![Intcode::from(1)])
}

// Still slow, but only the first time because it's memoized
cached_key! {
    TRACTOR_BEAM_IS_ACTIVE: UnboundCache<(usize, usize), bool> = UnboundCache::new();
    Key = { (x, y) };
    fn tractor_beam_is_active(program: IntcodeProgram, x: usize, y: usize) -> bool = {
        program
            .with_input(list![Intcode::from(x), Intcode::from(y)])
            .execute() == Ok(vector![Intcode::from(1)])
    }
}

Some might consider this cheating, since I know there's probably some sort of mutable hashmap being used by the Cached crate. However, I also know that, whatever I write, at the end of the day the computer is going to be doing things that change bytes in memory. The point is for MY code to be collections of pure functions, and to expose pure interfaces for other code to call. Abstractions like memoization are compatible with pure functions.

More Function Calls!

This is the simplest technique: where you would have stored something in a variable with a statement, instead pass it to a new function as an argument.

Another tip I want to add on top is that it's possible to declare the new function INSIDE your other function. The benefit is that the inner function can only be seen by the outer function, which is perfect for 'helper functions' that only make sense in that context. Use this with restraint, because it can get a bit difficult to read if the functions get long.

// With statements, I would do something like this:
fn calculate_something_clever(a: i32, b: i32) -> i32 {
    let c = calculate_c(a);
    a + b * c
}

// While avoiding statements, this pattern became something like this:
fn calculate_something_clever(a: i32, b: i32) -> i32 {
    fn inner_func(a: i32, b: i32, c: i32) -> i32 {
        a + b * c
    }

    inner_func(a, b, calculate_c(a))
}

Findings

Having gone through all this effort, what did I learn? How was the code that I ended up with different from the code I'd usually write?

Composability

One of the bold claims made by functional programming enthusiasts is that pure functions are easier to compose together. In other words, if you have many small functions, it's easier to stick them together to make big functions if those small functions are pure.

I found that this is indeed the case. This was especially true when it came to using my small pure functions in iterators.

An area where I was surprised to see this working well was the recurring Intcode computer puzzles. On the second day of the challenge, the puzzle introduced Intcode: a computer that modifies its own program memory as it runs, and that you need to write an emulator for. I thought my quest for purity was going to end right there, but I created a struct that held the Intcode computer's data and started adding small functions. As with my persistent data structures, each "mutating" operation actually returns a whole new Intcode computer.

fn with_memory_set(&self, address: Intcode, value: Intcode) -> IntcodeProgram {
    IntcodeProgram {
        memory: self.memory.insert(address, value),
        ..self.clone()
    }
}

fn with_instruction_pointer_offset(&self, offset: usize) -> IntcodeProgram {
    IntcodeProgram {
        instruction_pointer: self.instruction_pointer.clone() + offset,
        ..self.clone()
    }
}

This approach looks like it has a lot of cloning, and it does, but the majority of my Intcode computer is persistent data structures. They're already doing clever things internally to share data, so the performance impact of cloning was minimal.

Then I started adding functions that compose these small functions together.

fn add(&self, mode: &Intcode) -> IntcodeProgram {
    self.with_instruction_pointer_offset(4).with_memory_set(
        self.get_literal(3, mode),
        self.get(1, mode) + self.get(2, mode),
    )
}

Before I knew it, I had composed together the functions to run the whole Intcode program to termination. As a happy bonus, when it came time to solve problems that used Intcode programs, I could evaluate the whole program in a pure way without any mutable state.

Testability

Another thing that got easier was writing unit tests. Most of my functions were small and focused because I wasn't writing any statements. Most of them took some data structure and returned some property of the data. This is the ideal case for writing unit tests: it's easy to set up the test case, easy to write the assertion, and when the functions are pure you don't have to arrange any global state to make them behave correctly.
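
For example, a test for a pure function needs nothing but inputs and expected outputs. Here's a sketch using day 1's fuel formula (not code from my actual solutions):

// Day 1's puzzle: fuel required is mass divided by three, rounded
// down, minus two.
fn fuel_required(mass: i32) -> i32 {
    mass / 3 - 2
}

#[cfg(test)]
mod tests {
    use super::*;

    // No setup, no teardown, no globals: just call and assert.
    #[test]
    fn computes_fuel_for_the_worked_examples() {
        assert_eq!(fuel_required(12), 2);
        assert_eq!(fuel_required(1969), 654);
    }
}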

If your pure functions are relatively fast, they may also be a good candidate for property based testing.

I will admit that I didn't really write tests until things were already going wrong. This is Advent of Code, not a serious production system; if I have the answer, it's good enough! But it was really convenient that when things weren't going well, and I didn't have the right answer, the code was already set up for writing tests.

Observability

Observability is an area where my code suffered this year. It turns out that it's really difficult to write debug logging statements if you can't write statements. Further, when everything's wrapped up in expressions, it can be difficult to extract the information you need to see.

Now you may be asking why I'm talking about log statements instead of just pulling out a debugger. A debugger would be fine for Advent of Code, but recently I've been thinking about logging as a means of actually knowing what a production system is doing. How can you verify that a system running on a cloud server somewhere is doing the right thing if you don't know what it's doing? To me, finding the places in your code that give you the right balance of useful information, without information overload, is the first step to having useful logs in a production system.

The pattern that you would generally apply is to pull the information you're interested in into a variable, then log the variable. That doesn't work if you can't write statements, which is why I had trouble.

If you're working with iterators and need to see what's happening inside them, I can recommend the inspect function. It lets you add logging to the data flowing through an iterator.
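
For example, a quick sketch:

// inspect passes each item through unchanged, but lets you peek at it
// at any point in the chain.
let total: i32 = (1..=5)
    .inspect(|x| eprintln!("before filter: {}", x))
    .filter(|x| x % 2 == 0)
    .inspect(|x| eprintln!("made it through the filter: {}", x))
    .sum();
// total == 6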

Iterators are Awesome, but Infinite Iterators Could Use a Bit More Love

This challenge pushed me to use iterators even when I usually would have written a normal loop. Usually the pattern was:

  1. Define an infinite iterator that wraps up all the state you would have been tracking in your loop (iter::successors is usually involved here).

  2. Use find to get the case that would have had you exiting your loop.

  3. unwrap, because the Rust standard library does not have any special cases for infinite iterators and can't tell that find will always return a value. The None case is, unfortunately, infinite iteration.

That third point, where you need to unwrap an Option because you know that it will never be a None, is a pain. It's also risky, since you could invalidate that assumption by doing something else further up in the chain of iterator functions.

I personally would like to see more functionality around infinite iterators: iterators that become regular finite ones when you call certain operations on them, like take. This could also stop you from accidentally calling functions on your infinite iterator that are guaranteed to cause an infinite loop, like count or fold. Maybe I should try writing a library that provides these infinite iterators and see how well it works out.

Persistent Data Structure Crates are Awesome, but Could Use Some Quality of Life Improvements

I was really happy with RPDS, the persistent data structure crate that I used. It did exactly what I needed it to do: provide a set of data structures with pure, immutable interfaces and good performance. I think it's a great option for anyone who needs persistent data structures.

The one thing that bugged me while using it is that it made debug logging rather complicated. I wanted to #[derive(Debug)] on my structs, but that showed me all of the internal details of the persistent data structures. To its credit, RPDS does implement Display for its data structures, and Display does what I want. It's just that, to clean up my logging output, I need to write a custom implementation of Debug that uses Display for these data structures.
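
The workaround looks something like this (a sketch, with a hypothetical struct):

use std::fmt;
use rpds::List;

struct Scores {
    values: List<i32>,
}

// Delegate Debug to rpds' Display implementation instead of deriving
// it, so logs show the list's contents rather than its internals.
impl fmt::Debug for Scores {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Scores {{ values: {} }}", self.values)
    }
}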

Really, this is a minor gripe, and it can probably be solved by pulling in another crate. I see that a crate called Derivative provides more options for customizing a struct's debug formatting, but I haven't used it enough to have a well-formed opinion on how well it works.

Conclusions

This experience has definitely had an impact on how I'll be writing code going forward. I won't necessarily be doing everything in expressions, but I will be writing more of my code as small pure functions, and reaching for iterators and persistent data structures where they fit.

I will also keep RPDS in mind if I need persistent data structures, but I'll probably go back to using the Rust standard library's data structures as my default. This is nothing against RPDS, but rather that I think the Rust standard library's data structures are fantastic general-purpose building blocks, and using them means one less dependency to worry about.

I can only encourage others to try challenges like this, where you find a way to disrupt how you usually program and push yourself a bit out of your comfort zone.


If you'd like to share this article on social media, please use this link: https://www.worthe-it.co.za/blog/2020-01-24-advent-of-code-expressing-yourself.html

Tag: blog


Related Articles

Rust Iterators

In my opinion, iterators are one of the most powerful tools you can add to your toolbelt as a programmer. I've met many programmers who struggled to understand the various iterator functions and how to use them. In this article, I explore a number of iterator functions available in Rust's standard library, and explain them by giving an example of how you would get the same effect using for loops.

Why functional programmers should care about Rust

In this article, I present an argument for why people who are passionate about functional programming should consider learning the Rust programming language.