## Expressing foldl in terms of foldr

In chapter 4 of Real World Haskell there is a challenge for the reader: express `foldl` using `foldr`. The authors warn that this is not trivial, and I must admit that I had not attempted this exercise, leaving it for later. Yesterday I read Graham Hutton’s “A tutorial on the universality and expressiveness of fold” and it turns out that, among other things, it presents an approach that can be applied to solve this problem. Hutton first presents a specific case in which the `sum` function is defined using `foldl` expressed with `foldr`. Then he gives a general formula for expressing `foldl` with `foldr`. Even with the derivation of one specific case and the ready result (without its derivation) in hand, it wasn’t straightforward to produce my own derivation of the general solution. This was a mind-bending exercise and I’d like to go through the details here.

The solution is based on using the so-called universal property of `foldr` [1]. This property states that if we have some function `g` defined as:

```
g [] = v
g (x:xs) = f x (g xs)
```

then

`g = foldr f v`

Indeed, if we substitute `foldr f v` into the definition of `g` we get the definition of `foldr`:

```
foldr f v [] = v
foldr f v (x:xs) = f x (foldr f v xs)
```

Moreover, `foldr f v` is the unique solution to the defining equations of `g`.
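To make the property concrete, here is a small check of my own (not from Hutton’s paper): a function defined by exactly those two equations, next to its `foldr` form.

```haskell
-- g defined by the two equations of the universal property,
-- with v = 0 and f x r = x + 2 * r:
g :: [Int] -> Int
g []     = 0
g (x:xs) = x + 2 * g xs

-- By the universal property, g must be equal to foldr f v:
g' :: [Int] -> Int
g' = foldr (\x r -> x + 2 * r) 0
```

Both functions agree on every list; for example `g [1,2,3]` and `g' [1,2,3]` both evaluate to 17.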

Recall that `foldl` is defined as:

```
foldl f v [] = v
foldl f v (x:xs) = foldl f (f v x) xs
```

The base case of `foldr` and `foldl` is identical, but the recursive one is not. Moreover, the recursive case of `foldl` cannot be rewritten in the form `f x (g xs)`. This means that we need to apply some transformation to the definition of `foldl` so that it can be rewritten in that form. Let’s create a function `foldl2` defined as:

```
foldl2 f [] v = v
foldl2 f (x:xs) v = foldl2 f xs (f v x)
```

Nothing special so far. We just made a function that is the same as `foldl`, but with the last two parameters swapped. We can rewrite the base case as:

`foldl2 f [] v = id v`

`id` is the identity function, which accepts one parameter and returns it unchanged. Now we remove the `v` parameter:

`foldl2 f [] = id`

Such a transformation is known as η-reduction [2]. Let us now concentrate on the recursive case. It can be rewritten as

`foldl2 f (x:xs) v = (\w -> foldl2 f xs (f w x)) v`

We created an anonymous function with one of the parameters of `f` factored out. This expression can also be η-reduced:

`foldl2 f (x:xs) = \w -> foldl2 f xs (f w x)`

Let’s factor out the second parameter of `f` in the same manner:

`foldl2 f (x:xs) = (\y w -> foldl2 f xs (f w y)) x`

And finally let’s factor out `foldl2 f xs` and just pass it as another parameter to the lambda expression:

`foldl2 f (x:xs) = (\y h w -> h (f w y)) x (foldl2 f xs)`

We’re almost there. Recall that the universal property requires a function of the form [3]:

`g (x:xs) = k x (g xs)`

And it so happens that we just converted `foldl2` to that form, with `k = \y h w -> h (f w y)`. Comparing the last two equations we see that `g = foldl2 f`, but from the universal property we also know that `g = foldr k v`, which means that `foldl2 f = foldr k v`. The notation here might be a bit confusing. From the base case we determined the value of `v` in the last equality to be the `id` function, which yields `foldl2 f = foldr k id`. Substituting the value of `k` we get:

`foldl2 f = foldr (\y h w -> h (f w y)) id`

The original definition of `foldl2` had two more parameters, but they were removed by η-reductions. Let’s restore these two parameters by adding them to both the lhs and rhs:

`foldl2 f xs v = foldr (\y h w -> h (f w y)) id xs v`

Recall that `foldl2` was only a convenience function we used for the derivation. Going back to the original `foldl` function yields the final result:

`foldl f v xs = foldr (\y h w -> h (f w y)) id xs v`

OK, formulas don’t lie, but this result is definitely not an intuitive one and deserves a good explanation. You may be surprised that four parameters are passed to `foldr`, but this should become clear in a moment. We will play with it to get some intuition on how this works.
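Before doing that, the formula can be typed in verbatim and compared against the Prelude’s `foldl` (a quick sketch; the name `myFoldl` is mine):

```haskell
-- foldl expressed in terms of foldr, exactly as derived above
myFoldl :: (a -> b -> a) -> a -> [b] -> a
myFoldl f v xs = foldr (\y h w -> h (f w y)) id xs v
```

A non-commutative function makes a good test: `myFoldl (-) 100 [1,2,3]` gives `((100 - 1) - 2) - 3 = 94`, exactly like `foldl (-) 100 [1,2,3]`.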

Let us begin by verifying that this expression is type-correct. The type of `foldr` is:

```
ghci> :t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
```

So the first parameter to `foldr` should be of type `(a -> b -> b)`. The lambda that we pass to `foldr` as the first parameter uses `f`, which is the function passed as the first parameter to `foldl`. Since `foldl` has type:

```
ghci> :t foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
```

We require that `f` has type `a -> b -> a`. Let’s define the simplest possible function of that type and then check the type of the lambda passed to `foldr`:

```
ghci> let f = \a b -> a
ghci> :t (\y h w -> h (f w y))
(\y h w -> h (f w y)) :: t2 -> (t1 -> t) -> t1 -> t
```

Recall that `->` is right-associative, which means that the above type is equivalent to `t2 -> (t1 -> t) -> (t1 -> t)`. The parentheses at the end can be added without changing the meaning. This corresponds to our expected type of `(a -> b -> b)`, with `b` instantiated to `t1 -> t`. If we substitute `(t1 -> t)` for `b` in the type signature of `foldr` we get:

`(a -> (t1 -> t) -> (t1 -> t)) -> (t1 -> t) -> [a] -> (t1 -> t)`

Note that the last parentheses can be dropped, which results in a function that takes four parameters:

`(a -> (t1 -> t) -> (t1 -> t)) -> (t1 -> t) -> [a] -> t1 -> t`

We already verified that the lambda passed to `foldr` is of type `(a -> (t1 -> t) -> (t1 -> t))`. The second parameter, the `id` function, is of type `(a -> a)`, which corresponds to `(t1 -> t)` in the type signature. Therefore the usage of `id` imposes the additional restriction that `t1` ~ `t` [4], which means that the type signature can be rewritten as:

`(a -> (t -> t) -> (t -> t)) -> (t -> t) -> [a] -> t -> t`

`[a]` corresponds to the parameter `xs`, the list that we are folding. The next `t` corresponds to the initial value of the accumulator, and the last `t` is the return type.
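We can also let GHC confirm this reasoning by annotating the lambda at the specialized type (here with `f = (+)` and `t = Int`; the names `step` and `sumVia` are mine):

```haskell
-- The lambda at type a -> (t -> t) -> (t -> t), with a = t = Int:
step :: Int -> (Int -> Int) -> (Int -> Int)
step y h w = h (w + y)

-- foldr step id :: [Int] -> Int -> Int, matching [a] -> t -> t:
sumVia :: [Int] -> Int -> Int
sumVia = foldr step id
```

Applied to a list and an initial accumulator, `sumVia [1,2,3] 0` behaves like `foldl (+) 0 [1,2,3]`.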

Now that we have verified type-correctness of the solution, let’s see how it works in practice. Let’s say we want to fold a list of three elements using `(+)` as folding function and `0` as initial value of the accumulator. In other words, we want to calculate the sum of elements in a list. If we use `foldr`, the evaluation process will look like this:

```
foldr (+) 0 [1,2,3]
= (+) 1 (foldr (+) 0 [2,3])
= (+) 1 ((+) 2 (foldr (+) 0 [3]))
= (+) 1 ((+) 2 ((+) 3 (foldr (+) 0 [])))
= (+) 1 ((+) 2 ((+) 3 0))
= (+) 1 ((+) 2 3)
= (+) 1 5
= 6
```

If we instead use `foldl`, evaluation will look like this:

```
foldl (+) 0 [1,2,3]
= foldl (+) ((+) 0 1) [2,3]
= foldl (+) ((+) ((+) 0 1) 2) [3]
= foldl (+) ((+) ((+) ((+) 0 1) 2) 3) []
= ((+) ((+) ((+) 0 1) 2) 3)
= ((+) ((+) 1 2) 3)
= ((+) 3 3)
= 6
```

Both folds produce the same result, which is a direct consequence of the first duality theorem for folds [5]. Now let’s see how evaluation proceeds if we use `foldl` expressed using `foldr`:

```
foldl (+) 0 [1,2,3]
= foldr (\y h w -> h ((+) w y)) id [1,2,3] 0
= (\h w -> h ((+) w 1)) (foldr (\y h w -> h ((+) w y)) id [2,3]) 0
= (\h w -> h ((+) w 1)) ((\h w -> h ((+) w 2)) (foldr (\y h w -> h ((+) w y)) id [3])) 0
= (\h w -> h ((+) w 1)) ((\h w -> h ((+) w 2)) ((\h w -> h ((+) w 3)) (foldr (\y h w -> h ((+) w y)) id []))) 0
= (\h w -> h ((+) w 1)) ((\h w -> h ((+) w 2)) ((\h w -> h ((+) w 3)) id)) 0
= (\h w -> h ((+) w 1)) ((\h w -> h ((+) w 2)) (\w -> id ((+) w 3))) 0
= (\h w -> h ((+) w 1)) (\w -> id ((+) ((+) w 2) 3)) 0
= (\w -> id ((+) ((+) ((+) w 1) 2) 3)) 0
= id ((+) ((+) ((+) 0 1) 2) 3)
```

We’ve reached the same expression that we reached when evaluating `foldl` directly. Well, in fact that is what we expected; after all, this is also `foldl`! So the whole trick is based on using `foldr` to generate a function that accepts the initial value of the accumulator and produces the same expression we would get when using `foldl` (plus the identity function).

I hope that this post made it clear how to express `foldl` using `foldr`. This is of course by no means an exhaustive treatment of the subject of folds; there’s a lot more. I think that Hutton’s paper is a good starting point. Bird and Wadler’s “Introduction to Functional Programming” also seems to be a very valuable resource, though I’ve only read the chapter about folds [6]. There’s still more to figure out about folds, like the difference in behaviour of `foldr` and `foldl` for infinite lists, or expressing `foldr` using `foldl` (for finite lists only).

1. Graham Hutton in his paper uses the name fold to denote `foldr`. I’m sticking with `foldr` so my derivation remains consistent with Haskell.
2. η is pronounced eta.
3. Notice that I renamed the first function from `f` to `k` to prevent a name clash.
4. The ~ notation means that two types are the same.
5. See: Bird, R., Wadler, P., “Introduction to Functional Programming”, 1st ed., p. 68.
6. There’s also a second edition of this book, authored only by Bird, but I wasn’t able to find it.

## Is recursion good?

In today’s post I’m going to write a little bit about recursion. This text is aimed at programmers familiar with structural and object-oriented programming in languages like C/C++ or Java. I’m going to briefly summarize what recursion is and what happens when a recursive function is called, then I’ll explain what proper tail recursion and tail call optimization are, and how to convert ordinary recursion into tail recursion using an accumulator. Ready? Then let’s begin.

# A brief overview

“Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem” – that’s what Wikipedia says. The easiest example is the definition of a factorial:

```
factorial 0 = 1
factorial n = n * factorial (n - 1)
```

This means that the factorial of 0 is 1 (this is called the base case, which must exist in order to prevent infinite recursion), and the factorial of 3 is 3 * factorial 2, which is 3 * 2 * factorial 1, which is 3 * 2 * 1 * factorial 0, which is 3 * 2 * 1 * 1, which is 6. The above code is also a valid Haskell program. Many problems in computer science are recursive by nature. Although every recursive algorithm can be converted into an iterative one – otherwise it wouldn’t be possible to perform recursion on sequential computers like the ones we have today – it often turns out that the recursive algorithm is much easier to write and understand [1]. Nevertheless, recursive definitions are often considered bad. Why? Let’s take a look at the factorial definition in C:

```
int factorial( int n ) {
    if ( n == 0 )
        return 1;
    return n * factorial( n - 1 );
}
```

When a recursive call is made – in fact, when any function call is made – a new stack frame is created. A stack frame contains copies of the arguments passed to the function, the return address, and the function’s local variables. Creating a stack frame takes a bit of time, and if there are many recursive calls there is a risk of overflowing the stack. There are other risks as well, e.g. unnecessary recalculation of some values, as in the case of the Fibonacci sequence. Many programmers therefore consider it better to avoid recursive algorithms and replace them with iterative ones, even if these are harder to write and understand. From a Java or C/C++ point of view this might be the right thinking. But what if we could avoid creating a stack frame during the recursive call? Well, sometimes we can.
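The recalculation problem is easiest to see with the textbook Fibonacci definition (my example, in Haskell to match the earlier factorial): `fib (n - 1)` and `fib (n - 2)` keep recomputing the same subproblems, while a version that shares intermediate results does not.

```haskell
-- Naive body recursion: fib (n - 1) and fib (n - 2) recompute the
-- same subproblems over and over, so the number of calls grows
-- exponentially with n.
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- Sharing the intermediate results in a lazy list avoids the
-- recalculation; each number is computed exactly once.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib' :: Int -> Integer
fib' n = fibs !! n
```

Both give the same answers (e.g. `fib 10` and `fib' 10` are both 55), but only the second stays fast as `n` grows.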

# Proper tail recursion

Let’s rewrite our factorial function in C:

```
int factorial( int n ) {
    if ( n == 0 )
        return 1;
    int temp = factorial( n - 1 );
    return n * temp;
}
```

The recursive call is made somewhere in the middle of the function. The result of this call is then used to perform more computations and produce a new result. This is called “body recursion”. Now consider something like this:

```
int print_loop( int n ) {
    if ( n == 0 )
        return 0;
    int m = 2 * n;
    printf( "%i\n", m );
    return print_loop( n - 1 );
}
```

This function prints a decreasing sequence of even numbers from 2n to 2 (inclusive) and then returns 0. In this example, when we reach the stopping condition (i.e. n == 0) we return a result, and this result is then returned by the recursive calls without further processing. The recursive call is thus the last thing done within the function, and when that call returns, the calling function itself also returns immediately. We call this “tail recursion”. When a function returns, its stack frame is removed; the return address from that frame is used to jump back to the original calling point, and since the return is the last thing we do in a tail recursive function, the remaining elements of the frame (arguments, locals) are discarded. Therefore the call stack that we build is mostly useless – the only thing we need is the return address, which is used only to get back to another return address. This can be optimized. In a properly tail recursive function, instead of creating a new stack frame when making a call, we can overwrite the existing stack frame: replace the old copies of the arguments with the new ones, reuse the old local variables, and keep the original return address. This is called “tail call optimization” or “tail call elimination”.
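A Haskell transliteration of `print_loop` (my sketch; the original is C) makes the shape of a tail call easy to see: the recursive call is literally the whole result of the second equation, with nothing left to do afterwards.

```haskell
-- Tail-recursive loop: after printing, the recursive call is the
-- entire result, so nothing from this invocation needs to be kept.
printLoop :: Int -> IO Int
printLoop 0 = return 0
printLoop n = do
  print (2 * n)
  printLoop (n - 1)
```

For example, `printLoop 3` prints 6, 4 and 2 on separate lines and then returns 0.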

# Tail recursion using accumulator

As we’ve seen, our factorial definition isn’t tail recursive. Luckily, there is a technique that allows us to convert body recursion into tail recursion. It is based on an additional parameter – called the accumulator – that accumulates the result of the calculations. Let’s rewrite our C function once more, this time introducing an additional parameter which holds the result of the computations up to a given call:

```
int factorial( int n ) {
    return factorial_helper( n, 1 );
}

int factorial_helper( int n, int acc ) {
    if ( n == 0 )
        return acc;
    return factorial_helper( n - 1, n * acc );
}
```

Let’s see how it works:

```factorial( 3 ) ->
factorial_helper( 3, 1 ) ->
factorial_helper( 2, 3 * 1 ) ->
factorial_helper( 1, 2 * 3 ) ->
factorial_helper( 0, 1 * 6 ) ->
6```
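Since the original factorial was also a valid Haskell program, the same technique can be sketched in Haskell (my transliteration of the C code above):

```haskell
factorial :: Integer -> Integer
factorial n = helper n 1
  where
    -- acc holds the product of everything multiplied so far;
    -- the recursive call is in tail position.
    helper 0 acc = acc
    helper k acc = helper (k - 1) (k * acc)
```

`factorial 3` unfolds exactly like the trace above and evaluates to 6.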

That was simple, wasn’t it? This technique can be used practically always, but it’s not always effective (e.g. when building lists). Does this mean you can use this trick in your C or Java programs? Well, you could write your recursion using an accumulator, but neither Java nor C is required to perform tail call optimization (they might do it in some cases, but they don’t have to in general), so the performance of your programs could decrease (more parameters to copy). This is especially a problem for Java, since there are functional languages that run on the Java Virtual Machine (Scala and Clojure) and that cannot use tail call optimization [2]. Let me quote the paper “The role of study of programming languages in the education of a programmer” by Daniel P. Friedman:

“We don’t want to be bothered with design flaws that have been dropped into languages by well-meaning designers and implementators. Some example of this are (…) Java’s lack of support for tail calls. There are likely very well-intentioned reasons for these mistakes, but mistakes they are, nonetheless. Guy Steele, (…) co-author of ‘Java Language Specification’ now works for SUN and he has communicated with me that he was promised back in 1997 that this flaw would be fixed. Here it is 2001 and there is still no resolution of this problem on the horizon.”

It is 2012 and still no solution. I won’t get deeper into this matter (I don’t feel sufficiently competent for that), but if you’re a Java programmer then reading this might be interesting.

# “What should I do now?”

So which languages are properly tail recursive? That’s a good question and I’m still trying to figure out the answer. Certainly Scheme (see: R5RS, section 3.5) and Erlang (but see the Erlang myths). Haskell is… ummm… different, and I don’t yet fully understand its memory model (check here and here). As I said, some compilers of C and Java can sometimes optimize tail calls.

The most important question is: should we really care to use recursion? As I said in the beginning, many computer science problems are recursive by nature, but whether you approach them in an iterative or a recursive manner mostly depends on your language. In Java it would be unnatural to process a list using recursion. In Haskell or Erlang it’s idiomatic, since these languages have operators to get the head and tail of a list (the tail is then processed recursively) but no looping constructs. Most of us programmers expect a simple answer, and when we have it we follow an assume-don’t-verify policy. I think there is no general rule for tail recursion: there are cases when it’s faster and cases when it’s slower, so you really have to measure the performance and choose the faster algorithm. You should also be aware of your compiler’s internals: how it optimizes your code, how it calls functions, how it manages parameters, and so on.

I’ve been learning FP for about 3 months now and I must say that approaching problems in a recursive manner has changed my point of view greatly. I find recursive solutions much easier to understand because they are constructed by decomposing the problem into simple cases. This is easier to code and therefore less error-prone, so the more I program in Scheme or Haskell the more I wish Java had the features of functional languages. If you’re still not convinced about the power and usefulness of recursion then perhaps you should read this.

UPDATE (05/04/2012): Tail recursion and TCO is also discussed in fourth chapter of Real World Haskell (read it here).

1. Try googling an iterative version of the Quicksort algorithm to see a good example of this.
2. Clojure has a workaround for this.
