Why foldr works for infinite lists and foldl doesn’t

About two months ago I wrote a post about expressing foldl in terms of foldr. Back then I left one question open – why does foldr work for infinite lists, while foldl doesn’t? I finally found some time sit down and find the answer.

First of all the fact that foldl doesn’t work for infinite lists, while foldr does, was counter-intuitive for me. I thought that since foldl consumes the list from the beginning it should be able to stop at some point and return the result. On the other hand foldr was explained to me as consuming a list from the right, that is from the end. Since infinite lists have no end it seemed to me that foldr shouldn’t be able to handle infinite lists.

Here are the definitions of folds from Haskell 2010 Language Report:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl _ z []     = z
foldl f z (x:xs) = foldl f (f z x) xs
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

These two definitions supported my incorrect intuition. After all they show clearly that foldl processes the first argument of a list immediately, while foldr needs to process whole list with foldr before it can pass the result to f. Or at least I thought that they show this.

As I already said all the above intuitions are wrong. My mistake became clear to me when I explicitly wrote how recursion looks for each fold. For foldl the recursion is:

f (... (f ( f (f z x1) x2) x3) ...) xn

For foldr recursion looks like this:

f x1 (f x2 (f x3 (...(f xn z) ...)))

Now you can see that for foldl you need to get to the end of the list to make the most outer call. In case of foldr you need the first element of the list and the result of processing the rest of the list with foldr. Unless you can determine the value of f without the need for its second parameter! This is in fact the case for some operators, logical conjunction for example – if first parameter is False then we can conclude that the whole expression is False, without the need to evaluate the second argument. Therefore foldr will work for infinite lists if the accumulating function is lazy in its second argument. One might ask if foldl will work for infinite lists if the accumulating function is lazy in its first argument. The answer is no – you still need the last element of a list to calculate the value of first call and there is no last element for infinite lists.

Looking at the fold definitions given earlier I made one embarrassing omission. Recursion in foldl is unconditional! The recursive call is being made no matter what. The only way to stop the recursion is getting to the end of a list, but for infinite lists this ain’t gonna happen. In case of foldr recursion is conditional – it depends on the first argument of f, assuming of course that f is lazy for its second argument. Moreover, looking at the implementation of foldr given above you can see that it in fact works from the left! My intuition about foldr was so fixed on it consuming the list from the right that I even missed this obvious fact when writing this post. Big thanks to nh2, who pointed that out in his comment. So in the end consuming a list from the right is about grouping of terms with parentheses.

As a final remark let me note that definitions of foldl and foldr in Haskell libraries are slightly different from those given in the Haskell report. GHC.Base defines foldr as:

foldr :: (a -> b -> b) -> b -> [a] -> b
-- foldr _ z []     =  z
-- foldr f z (x:xs) =  f x (foldr f z xs)
{-# INLINE [0] foldr #-}
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the 
-- ones we are keen about specialising!
foldr k z = go
            go []     = z
            go (y:ys) = y `k` go ys

While Data.List contains following definition of foldl:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = lgo z0 xs0
                lgo z []     =  z
                lgo z (x:xs) = lgo (f z x) xs

Semantics are of course identical with folds defined in the report.

Y-combinator in Matlab

For over 3 years my main programming language was Matlab. Matlab was designed for scientific computations – it has a lot of build in functions for numerical computation as well as some syntactic sugar which allows to manipulate arrays easily. Matlab is imperative, supports object oriented programming (though the implementation is very bad) and uses dynamic typing, so all type checking is done at runtime. One of Matlab’s features is the ability to store function handles in variables. Does this ring a bell?

Yes! Functions as first-class citizens. This should allow to do some functional programming, right? I decided to give it a try and write Y-combinator in Matlab.

A few words about Y-combinator

Let me first write a few words about Y-combinator in case you’re not familiar with it. Look at this recursive definition of Fibonacci function:

function val = fib( n )
  if ( n == 0 || n == 1 )
    val = 1;
    val = fib( n - 1 ) + fib( n - 2 );

This recursive function – and probably all other recursive functions that you’ve seen – works because we are able to give name to a function, which allows the definition to refer to itself. What would happen however if we were unable to give name to a function? This might seem a bit abstract, but think about anonymous lambdas. As the name suggests they are anonymous. They have no name and therefore cannot refer to themselves. But there is a way to make anonymous recursive functions by using the Y-combinator. I will not go into details of how and why the Y-combinator works the way it does, but I strongly encourage readers to explore this subject. The best way to learn about Y-combinator is to walk through its derivation. This is a truly mind-bending exercise. I needed about 5 days to understand how Y-combinator works but when I finally did it was one of these “ooooohh” moments.

You will find a derivation of Y-combinator in the 9th chapter of “The Little Schemer”. The book might be a bit hard to find and I consider this derivation to be a bit criptic (though the book itself is great). Luckily Peteris Krumins extended derivation from “The Little Schemer”. I will base my post on his results. So, the final version of the Y-combinator written in Scheme is:

(define Y
  (lambda (le)
    ((lambda (f) (f f))
     (lambda (f)
       (le (lambda (x) ((f f) x)))))))

and the example of usage (also in Scheme) is:

((Y (lambda (length)
     (lambda (list)
         ((null? list) 0)
          (add1 (length (cdr list))))))))
 '(a b c d e f g h i j))

The above listing shows an anonymous recursive function that calculates the length of a list.

I will present my results to match those above as closely as possible.

Anonymous functions in Matlab

In order to work with Y-combinator we will have to define anonymous functions. In the Scheme code above an anonymous function for calculating the length of a list is passed as a parameter to Y-combinator. It turns out however that anonymous functions in Matlab have some limitations. Let’s take a look at the documentation:

The syntax for creating an anonymous function from an expression is

fhandle = @(arglist) expr

Starting from the right of this syntax statement, the term expr represents the body of the function: the code that performs the main task your function is to accomplish. This consists of any single, valid MATLAB expression.

The fact that Matlab allows anonymous functions to consist of only one expressions has serious consequences. Imperative languages divide all their language constructs into two categories: expressions, which return some value and statements, which don’t return any value1. Sadly, the control-flow constructs like if and for are statements, which means that we can’t include them in an anonymous function. This is problem, because length function shown above needs a conditional instruction to check if the list passed to it is empty or not.

Therefore, our first step is to create a new version of if construct which will be an expression and not a statement. There are a few different ways to achieve this. I decided to use cell arrays. Cell arrays are Matlab’s data structure similar to arrays, except for the fact that every cell can hold different type of value. My custom if instruction will take two parameters: a predicate that evaluates either to 1 (Matlab’s true) or 0 (Matlab’s false) and a cell array with two function handles. The code looks like this:

if_ = @( pred_, cond_ ) cond_{ 2 - pred_ }();

The pred_ variable is the predicate – either 0 or 1 – and cond_ is a cell array. If the predicate is 0 then second function in cond_ cell array will be used. If the pred_ is 1 then if_ will use first function in cond_ cell array2. Notice that there’s () after cell array index. This is a function application. This means that after selecting one of two function handles, the function pointed by it is executed immediately and if_ returns value returned by that function. Here’s an example3:

 >> if_ ( 1 == 2, { @() disp( 'The predicate is true'), @() disp( 'The predicate is false' ) } )
The predicate is false

Had we removed the parens, if_ would return a function handle allowing it to be evaluated later:

 >> if_ ( 1 == 2, { @() disp( 'The predicate is true'), @() disp( 'The predicate is false' ) } )
ans =
  @()disp('The predicate is false')

This is somewhat similar to lazy evaluation.

These are not the only limitations of Matlab. Consider the example below.

f = @(x) x == 1
g = @(x) x

We created two functions: f tests its parameter for equality with 1, while g is the identity function – it returns its parameter. If we apply g to f, we should get f, that is a handle to a function that tests its parameter for equality with one:

>> g(f)
ans =

That is what we expected. We got a function handle to anonymous function that accepts one parameter. It is reasonable to expect that we can now pass parameter to that handle. Unfortunately, Matlab will not allow us to do so:

 >> (g(f))(1)
Error: Unbalanced or unexpected parenthesis or bracket.
>> g(f)(1)
Error: ()-indexing must appear last in an index expression.

So we cannot chain anonymous function calls that easily. We have to use Matlab’s feval function, that evaluates a function handle with given parameters:

>> feval(g(f), 1)
ans =

The Y-Combinator

With this knowledge we are now able to rewrite Scheme code presented earlier. Here’s how Y-combinator looks like:

Y   = @( le ) feval(            ...
          @( f ) f( f ),        ...
          @( h )                ...
            le( @( x ) feval( h( h ), x ) ) );

This is almost the same as Scheme code presented earlier. We just replaced lambda with @ (both denote anonymous function declaration) and changed function application syntax to match Matlab. Before we rewrite the length function in Matlab let us define some helper functions:

if_  = @( pred_, cond_ ) cond_{ 2 - pred_ }();
add1 = @( x ) x + 1;
cdr  = @( list ) list( 2 : end );

We’ve already seen if_, but it’s here just to remind you that we need that declaration. The add1 function increments its parameter by one, while cdr emulates Scheme’s cdr function that returns tail of a list. Finally we can see Y-combinator in action:

feval( Y ( @( length_ )                           ...
  @( list )                                       ...
    if_(                                          ...
      isempty( list ), {                          ...
        @() 0,                                    ...
        @() add1( length_( cdr( list ) ) ) } ) ), ...
  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j' ] )

This code can be placed into a script (say Y_combinator.m) and evaluated:

 >> Y_combinator
ans =

Conclusions and further reading

As you can see, Matlab’s support for function handles allows to write Y-combinator. The result looks fairly simple, but I must admit that it wasn’t straightforward and I had some hard time struggling with syntax. Matlab’s functional programing capabilities are very limited, but there are a few more things that can be done. A more in-depth treatment can be found on A Page of Insanity blog here. The solution presented there is very similar to mine. Check out also this gist to see a different approach. See also Mike Vanier’s blog for more details on Y-combinator. I find Mike’s derivation a bit harder to follow, but Mike discusses both strict ans lazy versions of Y-combinator (I used only strict version).

  1. Please tell me if there exists an imperative language that does not have this distinction. []
  2. Matlab uses 1-based indexing []
  3. Text preceded by >> is typed into Matlab prompt []

Staypressed theme by Themocracy