## To Mock a Mockingbird or: How I learned to stop worrying and learned combinatory logic

Yesterday I finished reading one of the most remarkable books I read in my life: “To Mock a Mockingbird and Other Logic Puzzles: Including an Amazing Adventure in Combinatory Logic” by Raymond Smullyan. When I finished reading The Little Schemer that book was listed as one of the suggested further readings. The title was quite intriguing so I got the book and started reading it. That was a year ago and I finished the book yesterday. Why? Because I got stuck and couldn’t understand some of the material. Luckily I now had some time to approach the book once again and grok it.

As the title suggests the book is a collection of logic puzzles. Out of six parts of the book – 25 chapters total – two are devoted to general logic puzzles, many of them about different aspects of truth telling. These can be regarded as a warm-up because in the third part the book makes a sudden turn towards combinatory logic. And this is the moment I found difficult in the book. Of course Smullyan doesn’t expect that readers work with combinators so he camouflages them as singing birds. Having some mathematical background I rejected this cover and tried to approach problems formally. Now, after reading the book, I think this was a major mistake that lead to my failure. I wasn’t able to deal with first 10 puzzles but I was more or less able to follow the solutions. Still I felt that reading solutions without being able to solve puzzles by myself was cheating so I gave up. A few months later I made another approach to the book but the results were exactly the same. Three weeks ago I made a third attempt, but I decided not to give up even if I won’t be able to come up with my own solutions. I figured that being able to only understand given solutions is completely fine. That decision turned out to be a good one. Although at first I wasn’t able to solve puzzles on my own at some point things just clicked. I solved one puzzle, then another and another and I realized that I know how to solve most of the puzzles. From now on the book went quite smoothly. Part four about logical paradoxes and inconsistencies in logical systems gave me some problems and I was afraid that each subsequent part will be equally challenging but it turned out that it was not the case. Part five gives a nice overview of computations using SKI combinators, while part six presents Church encoding of natural numbers and culminates with a proof of Gödel’s theorem.

The book is challenging on the one hand but it is also very rewarding. One of the most satisfying moments was realizing that I am able to write expressions like

$g\tilde{a}\tilde{b}=Z\tilde{a}f(Z\tilde{b}t(g(P\tilde{a})(P\tilde{b})))$

and then seeing that these are in fact nothing more but recursive programs. However it took me some work to get to this point and I think I can give future readers some advice about approaching the book. I’ve seen people on the internet saying they don’t understand parts of the book about combinators. In fact this is the same problem I faced on my initial contact. So my advice number one is: don’t give up. It looks like working with combinators just isn’t an inborn talent for most of us. It’s something that we just need to learn and it’s completely fine if you can’t figure out solutions on your own. It is important however to follow solutions given in the book. At some point you should understand how all of this works. Also don’t make my mistake of trying too formalize things to much. Don’t try to apply presented facts about combinators to anything else you know like maths or functional programming. This seems to only bring confusion. At some point things will become clear, but until then be patient. I wasn’t and in my attempts I even tried to introduce universal and existential quantification, which was a complete overkill. Also, it is very important to make notes. Keep conclusions of the puzzles in a notebook so you have easy access to a list of known facts. This is crucial since some solutions are based on facts proved 100 pages earlier. I also suggest having a separate scratch-pad.

# Summary

To “Mock a Mockingbrd” was a very insightful book and one of the most unusual ones I have read. It presents difficult material in a fun and entertaining way, but don’t be fooled – you will spend hours with pen and paper to complete this book. I highly recommend it to anyone interested in logic and functional programming. What functional programming has to do with it? Unrelated as it may seem I feel that this book has given me knowledge necessary to tackle type-level programming in Haskell. Just three weeks ago this seemed like a complete black magic and now it looks comprehensible.

## Why foldr works for infinite lists and foldl doesn’t

About two months ago I wrote a post about expressing foldl in terms of foldr. Back then I left one question open – why does foldr work for infinite lists, while foldl doesn’t? I finally found some time sit down and find the answer.

First of all the fact that foldl doesn’t work for infinite lists, while foldr does, was counter-intuitive for me. I thought that since foldl consumes the list from the beginning it should be able to stop at some point and return the result. On the other hand foldr was explained to me as consuming a list from the right, that is from the end. Since infinite lists have no end it seemed to me that foldr shouldn’t be able to handle infinite lists.

Here are the definitions of folds from Haskell 2010 Language Report:

foldl :: (a -> b -> a) -> a -> [b] -> a foldl _ z []     = z foldl f z (x:xs) = foldl f (f z x) xs
foldr :: (a -> b -> b) -> b -> [a] -> b foldr _ z [] = z foldr f z (x:xs) = f x (foldr f z xs)

These two definitions supported my incorrect intuition. After all they show clearly that foldl processes the first argument of a list immediately, while foldr needs to process whole list with foldr before it can pass the result to f. Or at least I thought that they show this.

As I already said all the above intuitions are wrong. My mistake became clear to me when I explicitly wrote how recursion looks for each fold. For foldl the recursion is:

f (... (f ( f (f z x1) x2) x3) ...) xn

For foldr recursion looks like this:

f x1 (f x2 (f x3 (...(f xn z) ...)))

Now you can see that for foldl you need to get to the end of the list to make the most outer call. In case of foldr you need the first element of the list and the result of processing the rest of the list with foldr. Unless you can determine the value of f without the need for its second parameter! This is in fact the case for some operators, logical conjunction for example – if first parameter is False then we can conclude that the whole expression is False, without the need to evaluate the second argument. Therefore foldr will work for infinite lists if the accumulating function is lazy in its second argument. One might ask if foldl will work for infinite lists if the accumulating function is lazy in its first argument. The answer is no – you still need the last element of a list to calculate the value of first call and there is no last element for infinite lists.

Looking at the fold definitions given earlier I made one embarrassing omission. Recursion in foldl is unconditional! The recursive call is being made no matter what. The only way to stop the recursion is getting to the end of a list, but for infinite lists this ain’t gonna happen. In case of foldr recursion is conditional – it depends on the first argument of f, assuming of course that f is lazy for its second argument. Moreover, looking at the implementation of foldr given above you can see that it in fact works from the left! My intuition about foldr was so fixed on it consuming the list from the right that I even missed this obvious fact when writing this post. Big thanks to nh2, who pointed that out in his comment. So in the end consuming a list from the right is about grouping of terms with parentheses.

As a final remark let me note that definitions of foldl and foldr in Haskell libraries are slightly different from those given in the Haskell report. GHC.Base defines foldr as:

foldr :: (a -> b -> b) -> b -> [a] -> b -- foldr _ z [] = z -- foldr f z (x:xs) = f x (foldr f z xs) {-# INLINE [0] foldr #-} -- Inline only in the final stage, after the foldr/cons rule has had a chance -- Also note that we inline it when it has *two* parameters, which are the -- ones we are keen about specialising! foldr k z = go where go [] = z go (y:ys) = y k go ys

While Data.List contains following definition of foldl:

foldl :: (a -> b -> a) -> a -> [b] -> a foldl f z0 xs0 = lgo z0 xs0 where lgo z [] = z lgo z (x:xs) = lgo (f z x) xs

Semantics are of course identical with folds defined in the report.

## Expressing foldl in terms of foldr

In chapter 4 of Real World Haskell there is a challenge for the reader: express foldl using foldr. Authors warn that this is not trivial and I must admit that I have not attempted this exercise leaving it for later. Yesterday I read Graham Hutton’s “Tutorial on the universality and expressiveness of fold” and it happens that, among other things, it presents an approach that can be applied to solve this problem. Hutton first presents a specific case in which sum function is defined using foldl expressed with foldr. Then he gives a general formula for expressing foldl with foldr. Despite having a derivation for one specific case and a ready result (without its derivation) it wasn’t straightforward to provide my own derivation of the general solution. This was a mind bending exercise and I’d like to go through the details here.

The solution is based on using the so called universal property of foldr1. This property states that if we have some function g defined as:

g [] = v g (x:xs) = f x (g xs)

then

g = foldr f v

Indeed, if we substitute foldr f v into definition of g we get a definition of foldr:

foldr f v [] = v foldr f v (x:xs) = f x (foldr f v xs)

Moreover, foldr f v is a unique solution to the defining equations of g.

Recall that foldl is defined as:

foldl f v [] = v foldl f v (x:xs) = foldl f (f v x) xs

The base case of foldr and foldl is identical, but the recursive one is not. Moreover, the recursive case of foldl cannot be rewritten in the form f x (g xs). This means that we need to apply some transformation to definition of foldl so it can be rewritten in that form. Let’s create a function foldl2 defined as:

foldl2 f [] v = v foldl2 f (x:xs) v = foldl2 f xs (f v x)

Nothing special so far. We just made a function that is the same as foldl, but has the last two parameters swapped. We can rewrite the base case as:

foldl2 f [] v = id v

id is the identity function that accepts one parameter and returns that parameter unchanged. Now we remove the v parameter:

foldl2 f [] = id

Such transformation is known as η-reduction2. Let us now concentrate on the recursive case. It can be rewritten as

foldl2 f (x:xs) v = (\w -> foldl2 f xs (f w x)) v

We created an anonymous function with one of the parameters of f factored out. This expression can also be η-reduced:

foldl2 f (x:xs) = \w -> foldl2 f xs (f w x)

Let’s factor out second parameter of function f in the same manner:

foldl2 f (x:xs) = (\y w -> foldl2 f xs (f w y)) x

And finally let’s factor out foldl2 f xs and just pass it as another parameter to the lambda expression:

foldl2 f (x:xs) = (\y h w -> h (f w y)) x (foldl2 f xs)

We’re almost there. Recall that the universal property requires function of the form3:

g (x:xs) = k x (g xs)

And it so happens that we just converted foldl2 to that form, where k = \y h w -> h (f w y). Comparing last two equation we see that g = foldl2 f, but from the universal property we also know that g = foldr k v, which means that foldl2 f = foldr k v. The notation here might be a bit confusing. From the base case we determined the value of v in the last equality to be equal to id function, which yields foldl2 f = foldr k id. Substituting the value of k we get:

foldl2 f = foldr (\y h w -> h (f w y)) id

Original definition of foldl2 had two more parameters, but they were removed by η-reductions. Let’s restore these two parameters by adding them to both lhs and rhs:

foldl2 f xs v = foldr (\y h w -> h (f w y)) id xs v

Recall that foldl2 was only a convenience function we used for the derivation. Going back to the original foldl function yields the final result:

foldl f v xs = foldr (\y h w -> h (f w y)) id xs v

OK, formulas don’t lie, but this result is definitely not an intuitive one and deserves some good explanation. You may be surprised that four parameters are passed into foldr, but this should become clear in a moment. We will play with it to get some intuition on how this works.

Let us begin with verifying that this expression is type-correct. Type of foldr is:

ghci> :t foldr foldr :: (a -> b -> b) -> b -> [a] -> b

So the first parameter to foldr should be of type (a -> b -> b). Lambda that we pass to foldr as the first parameter uses f. This is the function that is passed as first parameter to foldl. Since foldl has type:

ghci> :t foldl foldl :: (a -> b -> a) -> a -> [b] -> a

We require that f has type a -> b -> a. Let’s define simplest possible function that has that type and then check the type of lambda passed to foldr:

ghci> let f = \a b -> a ghci> :t (\y h w -> h (f w y)) (\y h w -> h (f w y)) :: t2 -> (t1 -> t) -> t1 -> t

Recall that -> is right-associative, which means that above type is equivalent to t2 -> (t1 -> t) -> (t1 -> t). Parentheses at the end can be added and the meaning is the same. This corresponds to our expected type of (a -> b -> b). Here, the value of b is assumed to be t1 -> t. If we substitute (t1 -> t) for b in the type signature of foldr we get

(a -> (t1 -> t) -> (t1 -> t)) -> (t1 -> t) -> [a] -> (t1 -> t)

Note that last parentheses can be dropped, which result in function that has four parameters:

(a -> (t1 -> t) -> (t1 -> t)) -> (t1 -> t) -> [a] -> t1 -> t

We already verified that lambda passed to foldr is of type (a -> (t1 -> t) -> (t1 -> t)). The second parameter, id function, is of type (a -> a), which corresponds to (t1 -> t) in the type signature. Therefore usage of id imposes additional restriction that t1 ~ t4, which means that type signature can be rewritten as:

(a -> (t -> t) -> (t -> t)) -> (t -> t) -> [a] -> t -> t

[a] corresponds to parameter xs, the list that we are folding. t corresponds to initial value of accumulator and the last t is the return type.

Now that we have verified type-correctness of the solution, let’s see how it works in practice. Let’s say we want to fold a list of three elements using (+) as folding function and 0 as initial value of the accumulator. In other words, we want to calculate the sum of elements in a list. If we use foldr, the evaluation process will look like this:

foldr (+) 0 [1,2,3] = (+) 1 (foldr (+) 0 [2,3]) = (+) 1 ((+) 2 (foldr (+) 0 [3])) = (+) 1 ((+) 2 ((+) 3 (foldr (+) 0 []))) = (+) 1 ((+) 2 ((+) 3 0 )) = (+) 1 ((+) 2 3) = (+) 1 5 = 6

If we instead use foldl, evaluation will look like this:

foldl (+) 0 [1,2,3] = foldl ((+) 0 1) [2,3] = foldl ((+) ((+) 0 1) 2) [3] = foldl ((+) ((+) ((+) 0 1) 2) 3) [] = ((+) ((+) ((+) 0 1) 2) 3) = ((+) ((+) 1 2) 3) = ((+) 3 3) = 6

Both folds produce the same result, which is a direct consequence of first duality theorem for folds5. Now let’s see how evaluation will proceed if we use foldl expressed using foldr:

foldl (+) 0 [1,2,3] = foldr (\y h w -> h ((+) w y)) id [1,2,3] 0 = (\h w -> h ((+) w 1)) (foldr (\y h w -> h ((+) w y)) id [2,3]) 0 = (\h w -> h ((+) w 1)) (\h w -> h ((+) w 2)) (foldr (\y h w -> h ((+) w y)) id [3]) 0 = (\h w -> h ((+) w 1)) (\h w -> h ((+) w 2)) (\h w -> h ( (+) w 3) ) (foldr (\y h w -> h ( (+) w y) ) id []) 0 = (\h w -> h ((+) w 1)) (\h w -> h ((+) w 2)) (\h w -> h ((+) w 3)) id 0 = (\h w -> h ((+) w 1)) (\h w -> h ((+) w 2)) (w -> id ((+) w 3)) 0 = (\h w -> h ((+) w 1)) (w -> id ((+) ((+) w 2) 3)) 0 = (w -> id ((+) ((+) ((+) w 1) 2) 3)) 0 = id ((+) ((+) ((+) 0 1) 2) 3))

We’ve reached expression that is the same as the one we reached when evaluating foldl. Well, in fact that is what we expected. After all this is also foldl! So the whole trick is based on using foldr to generate a function that accepts initial value of the accumulator and produces the same expression we would get when using foldl (plus the identity function).

I hope that this post made it clear how to express foldl using foldr. This is of course by no means an exhaustive treatment of the subject of folds. There’s a lot more. I think that Hutton’s paper is a good starting point. Bird’s and Wadler’s “Introduction to Functional Programming” also seems to be a very valuable resource, though I’ve read only chapter about folds6. There’s still some more stuff to figure out about folds, like the difference in behaviour of foldr and foldl for infinite lists or expressing foldr using foldl (for finite lists only).

1. Graham Hutton in his paper uses the name fold to denote foldr. I’m sticking with foldr so my derivation remains consistent with Haskell []
2. η is pronounced eta. []
3. Notice that I renamed the name of the first function from f to k to prevent the name clash. []
4. ~ notation means that two types are the same. []
5. See: Bird, R., Wadler, P. “Introduction to functional programming”, 1st ed., p. 68 []
6. There’s also a second edition of this book authored only by Bird, but I wasn’t able to find it. []

Staypressed theme by Themocracy