Posts tagged: recursion

The basics of coinduction

I don’t remember when I first heard the terms “coinduction” and “corecursion” but it must have been quite long ago. I had this impression that they are yet another of these difficult theoretical concepts and that I should learn about them one day. That “one day” happened recently while reading chapter 5 of “Certified Programming with Dependent Types”. It turns out that basics of coinduction are actually quite simple. In this post I’ll share with you what I already know on the subject.

Recursion in Haskell

Let’s begin with looking at Haskell because it is a good example of language not formalizing coinduction in any way. Two features of Haskell are of interest to us. First one is laziness. Thanks to Haskell being lazy we can write definitions like these (in GHCi):

ghci> let ones = 1 : ones
ghci> let fib = zipWith (+) (1:fib) (1:1:fib)

ones is – as the name implies – an infinite sequence (list) of ones. fib is a sequence of Fibonacci numbers. Both these definitions produce infinite lists but we can use these definitions safely because laziness allows us to force a finite number of elements in the sequence:

ghci> take 5 ones
[1,1,1,1,1]
ghci> take 10 fib
[2,3,5,8,13,21,34,55,89,144]

Now consider this definition:

ghci> let inf = 1 + inf

No matter how hard we try there is no way to use the definition of inf in a safe way. It always causes an infinite loop:

ghci> (0 /= inf)
*** Exception: <<loop>>

The difference between definitions of ones or fib an the definition of inf is that the former use something what is called a guarded recursion. The term guarded comes from the fact that recursive reference to self is hidden under datatype constructor (or: guarded by a constructor). The way lazy evaluation is implemented gives a guarantee that we can stop the recursion by not evaluating the recursive constructor argument. This kind of infinite recursion can also be called productive recursion, which means that although recursion is infinite each recursive call is guaranteed to produce something (in my examples either a 1 or next Fibonacci number). By contrast recursion in the definition of inf is not guarded or productive in any way.

Haskell happily accepts the definition of inf even though it is completely useless. When we write Haskell programs we of course don’t want them to fall into silly infinite loops but the only tool we have to prevent us from writing such code is our intelligence. Situation changes when it comes to….

Dependently-typed programming languages

These languages deeply care about termination. By “termination” I mean ensuring that a program written by the user is guaranteed to terminate for any input. I am aware of two reasons why these languages care about termination. First reason is theoretical: without termination the resulting language is inconsistent as logic. This happens because non-terminating term can prove any proposition. Consider this non-terminating Coq definition:

Fixpoint evil (A : Prop) : A := evil A.

If that definition was accepted we could use it to prove any proposition. Recall that when it comes to viewing types as proofs and programs as evidence “proving a proposition” means constructing a term of a given type. evil would allow to construct a term inhabiting any type A. (Prop is a kind of logical propositions so A is a type.) Since dependently-typed languages aim to be consistent logics they must reject non-terminating programs. Second reason for checking termination is practical: dependently typed languages admit functions in type signatures. If we allowed non-terminating functions then typechecking would also become non-terminating and again this is something we don’t want. (Note that Haskell gives you UndecidableInstances that can cause typechecking to fall into an infinite loop).

Now, if you paid attention on your Theoretical Computer Science classes all of this should ring a bell: the halting problem! The halting problem says that the problem of determining whether a given Turing machine (read: a given computer program) will ever terminate is undecidable. So how is that possible that languages like Agda, Coq or Idris can answer that question? That’s simple: they are not Turing-complete (or at least their terminating subsets are not Turing complete). (UPDATE: but see Conor McBride’s comment below.) They prohibit user from using some constructs, probably the most important one being general recursion. Think of general recursion as any kind of recursion imaginable. Dependently typed languages require structural recursion on subterms of the arguments. That means that if a function receives an argument of an inductive data type (think: algebraic data type/generalized algebraic data type) then you can only make recursive calls on terms that are syntactic subcomponents of the argument. Consider this definition of map in Idris:

map : (a -> b) -> List a -> List b
map f []      = []
map f (x::xs) = f x :: map f xs

In the second equation we use pattern matching to deconstruct the list argument. The recursive call is made on xs, which is structurally smaller then the original argument. This guarantees that any call to map will terminate. There is a silent assumption here that the List A argument passed to map is finite, but with the rules given so far it is not possible to construct infinite list.

So we just eliminated non-termination by limiting what can be done with recursion. This means that our Haskell definitions of ones and fib would not be accepted in a dependently-typed language because they don’t recurse on an argument that gets smaller and as a result they construct an infinite data structure. Does that mean we are stuck with having only finite data structures? Luckily, no.

Coinduction to the rescue

Coinduction provides a way of defining and operating on infinite data structures as long as we can prove that our operations are safe, that is they are guarded and productive. In what follows I will use Coq because it seems that it has better support for coinduction than Agda or Idris (and if I’m wrong here please correct me).

Coq, Agda and Idris all require that a datatype that can contain infinite values has a special declaration. Coq uses CoInductive keyword instead of Inductive keyword used for standard inductive data types. In a similar fashion Idris uses codata instead of data, while Agda requires ∞ annotation on a coinductive constructor argument.

Let’s define a type of infinite nat streams in Coq:

CoInductive stream : Set :=
| Cons : nat -> stream -> stream.

I could have defined a polymorphic stream but for the purpose of this post stream of nats will do. I could have also defined a Nil constructor to allow finite coinductive streams – declaring data as coinductive means it can have infinite values, not that it must have infinite values.

Now that we have infinite streams let’s revisit our examples from Haskell: ones and fib. ones is simple:

CoFixpoint ones : stream := Cons 1 ones.

We just had to use CoFixpoint keyword to tell Coq that our definition will be corecursive and it is happily accepted even though a similar recursive definition (ie. using Fixpoint keyword) would be rejected. Allow me to quote directly from CPDT:

whereas recursive definitions were necessary to use values of recursive inductive types effectively, here we find that we need co-recursive definitions to build values of co-inductive types effectively.

That one sentence pins down an important difference between induction and coinduction.

Now let’s define zipWith and try our second example fib:

CoFixpoint zipWith (f : nat -> nat -> nat) (a : stream)
                   (b : stream) : stream :=
  match a, b with
    | Cons x xs, Cons y ys => Cons (f x y) (zipWith f xs ys)
  end.
 
CoFixpoint fib : stream :=
   zipWith plus (Cons 1 fib) (Cons 1 (Cons 1 fib)).

Unfortunately this definition is rejected by Coq due to “unguarded recursive call”. What exactly goes wrong? Coq requires that all recursive calls in a corecursive definition are:

  1. direct arguments to a data constructor
  2. not inside function arguments

Our definition of fib violates the second condition – both recursive calls to fib are hidden inside arguments to zipWith function. Why does Coq enforce such a restriction? Consider this simple example:

Definition tl (s : stream) : stream :=
  match s with
    | Cons _ tl' => tl'
  end.
 
CoFixpoint bad : stream := tl (Cons 1 bad).

tl is a standard tail function that discards the first element of a stream and returns its tail. Just like our definition of fib the definition of bad places the corecursive call inside a function argument. I hope it is easy to see that accepting the definition of bad would lead to non-termination – inlining definition of tl and simplifying it leads us to:

CoFixpoint bad : stream := bad.

and that is bad. You might be thinking that the definition of bad really has no chance of working whereas our definition of fib could in fact be run safely without the risk of non-termination. So how do we persuade Coq that our corecursive definition of fib is in fact valid? Unfortunately there seems to be no simple answer. What was meant to be a simple exercise in coinduction turned out to be a real research problem. This past Monday I spent well over an hour with my friend staring at the code and trying to come up with a solution. We didn’t find one but instead we found a really nice paper “Using Structural Recursion for Corecursion” by Yves Bertot and Ekaterina Komendantskaya. The paper presents a way of converting definitions like fib to a guarded and productive form accepted by Coq. Unfortunately the converted definition looses the linear computational complexity of the original definition so the conversion method is far from perfect. I encourage to read the paper. It is not long and is written in a very accessible way. Another set of possible solutions is given in chapter 7 of CPDT but I am very far from labelling them as “accessible”.

I hope this post demonstrates that basics ideas behind coinduction are actually quite simple. For me this whole subject of coinduction looks really fascinating and I plan to dive deeper into it. I already have my eyes set on several research papers about coinduction so there’s a good chance that I’ll write more about it in future posts.

Why foldr works for infinite lists and foldl doesn’t

About two months ago I wrote a post about expressing foldl in terms of foldr. Back then I left one question open – why does foldr work for infinite lists, while foldl doesn’t? I finally found some time sit down and find the answer.

First of all the fact that foldl doesn’t work for infinite lists, while foldr does, was counter-intuitive for me. I thought that since foldl consumes the list from the beginning it should be able to stop at some point and return the result. On the other hand foldr was explained to me as consuming a list from the right, that is from the end. Since infinite lists have no end it seemed to me that foldr shouldn’t be able to handle infinite lists.

Here are the definitions of folds from Haskell 2010 Language Report:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl _ z []     = z
foldl f z (x:xs) = foldl f (f z x) xs
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

These two definitions supported my incorrect intuition. After all they show clearly that foldl processes the first argument of a list immediately, while foldr needs to process whole list with foldr before it can pass the result to f. Or at least I thought that they show this.

As I already said all the above intuitions are wrong. My mistake became clear to me when I explicitly wrote how recursion looks for each fold. For foldl the recursion is:

f (... (f ( f (f z x1) x2) x3) ...) xn

For foldr recursion looks like this:

f x1 (f x2 (f x3 (...(f xn z) ...)))

Now you can see that for foldl you need to get to the end of the list to make the most outer call. In case of foldr you need the first element of the list and the result of processing the rest of the list with foldr. Unless you can determine the value of f without the need for its second parameter! This is in fact the case for some operators, logical conjunction for example – if first parameter is False then we can conclude that the whole expression is False, without the need to evaluate the second argument. Therefore foldr will work for infinite lists if the accumulating function is lazy in its second argument. One might ask if foldl will work for infinite lists if the accumulating function is lazy in its first argument. The answer is no – you still need the last element of a list to calculate the value of first call and there is no last element for infinite lists.

Looking at the fold definitions given earlier I made one embarrassing omission. Recursion in foldl is unconditional! The recursive call is being made no matter what. The only way to stop the recursion is getting to the end of a list, but for infinite lists this ain’t gonna happen. In case of foldr recursion is conditional – it depends on the first argument of f, assuming of course that f is lazy for its second argument. Moreover, looking at the implementation of foldr given above you can see that it in fact works from the left! My intuition about foldr was so fixed on it consuming the list from the right that I even missed this obvious fact when writing this post. Big thanks to nh2, who pointed that out in his comment. So in the end consuming a list from the right is about grouping of terms with parentheses.

As a final remark let me note that definitions of foldl and foldr in Haskell libraries are slightly different from those given in the Haskell report. GHC.Base defines foldr as:

foldr :: (a -> b -> b) -> b -> [a] -> b
-- foldr _ z []     =  z
-- foldr f z (x:xs) =  f x (foldr f z xs)
{-# INLINE [0] foldr #-}
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the 
-- ones we are keen about specialising!
foldr k z = go
          where
            go []     = z
            go (y:ys) = y `k` go ys

While Data.List contains following definition of foldl:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = lgo z0 xs0
             where
                lgo z []     =  z
                lgo z (x:xs) = lgo (f z x) xs

Semantics are of course identical with folds defined in the report.

Y-combinator in Matlab

For over 3 years my main programming language was Matlab. Matlab was designed for scientific computations – it has a lot of build in functions for numerical computation as well as some syntactic sugar which allows to manipulate arrays easily. Matlab is imperative, supports object oriented programming (though the implementation is very bad) and uses dynamic typing, so all type checking is done at runtime. One of Matlab’s features is the ability to store function handles in variables. Does this ring a bell?

Yes! Functions as first-class citizens. This should allow to do some functional programming, right? I decided to give it a try and write Y-combinator in Matlab.

A few words about Y-combinator

Let me first write a few words about Y-combinator in case you’re not familiar with it. Look at this recursive definition of Fibonacci function:

function val = fib( n )
  if ( n == 0 || n == 1 )
    val = 1;
  else
    val = fib( n - 1 ) + fib( n - 2 );
  end
end

This recursive function – and probably all other recursive functions that you’ve seen – works because we are able to give name to a function, which allows the definition to refer to itself. What would happen however if we were unable to give name to a function? This might seem a bit abstract, but think about anonymous lambdas. As the name suggests they are anonymous. They have no name and therefore cannot refer to themselves. But there is a way to make anonymous recursive functions by using the Y-combinator. I will not go into details of how and why the Y-combinator works the way it does, but I strongly encourage readers to explore this subject. The best way to learn about Y-combinator is to walk through its derivation. This is a truly mind-bending exercise. I needed about 5 days to understand how Y-combinator works but when I finally did it was one of these “ooooohh” moments.

You will find a derivation of Y-combinator in the 9th chapter of “The Little Schemer”. The book might be a bit hard to find and I consider this derivation to be a bit criptic (though the book itself is great). Luckily Peteris Krumins extended derivation from “The Little Schemer”. I will base my post on his results. So, the final version of the Y-combinator written in Scheme is:

(define Y
  (lambda (le)
    ((lambda (f) (f f))
     (lambda (f)
       (le (lambda (x) ((f f) x)))))))

and the example of usage (also in Scheme) is:

((Y (lambda (length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else
          (add1 (length (cdr list))))))))
 '(a b c d e f g h i j))

The above listing shows an anonymous recursive function that calculates the length of a list.

I will present my results to match those above as closely as possible.

Anonymous functions in Matlab

In order to work with Y-combinator we will have to define anonymous functions. In the Scheme code above an anonymous function for calculating the length of a list is passed as a parameter to Y-combinator. It turns out however that anonymous functions in Matlab have some limitations. Let’s take a look at the documentation:

The syntax for creating an anonymous function from an expression is

fhandle = @(arglist) expr

Starting from the right of this syntax statement, the term expr represents the body of the function: the code that performs the main task your function is to accomplish. This consists of any single, valid MATLAB expression.

The fact that Matlab allows anonymous functions to consist of only one expressions has serious consequences. Imperative languages divide all their language constructs into two categories: expressions, which return some value and statements, which don’t return any value1. Sadly, the control-flow constructs like if and for are statements, which means that we can’t include them in an anonymous function. This is problem, because length function shown above needs a conditional instruction to check if the list passed to it is empty or not.

Therefore, our first step is to create a new version of if construct which will be an expression and not a statement. There are a few different ways to achieve this. I decided to use cell arrays. Cell arrays are Matlab’s data structure similar to arrays, except for the fact that every cell can hold different type of value. My custom if instruction will take two parameters: a predicate that evaluates either to 1 (Matlab’s true) or 0 (Matlab’s false) and a cell array with two function handles. The code looks like this:

if_ = @( pred_, cond_ ) cond_{ 2 - pred_ }();

The pred_ variable is the predicate – either 0 or 1 – and cond_ is a cell array. If the predicate is 0 then second function in cond_ cell array will be used. If the pred_ is 1 then if_ will use first function in cond_ cell array2. Notice that there’s () after cell array index. This is a function application. This means that after selecting one of two function handles, the function pointed by it is executed immediately and if_ returns value returned by that function. Here’s an example3:

 >> if_ ( 1 == 2, { @() disp( 'The predicate is true'), @() disp( 'The predicate is false' ) } )
The predicate is false

Had we removed the parens, if_ would return a function handle allowing it to be evaluated later:

 >> if_ ( 1 == 2, { @() disp( 'The predicate is true'), @() disp( 'The predicate is false' ) } )
ans =
  @()disp('The predicate is false')

This is somewhat similar to lazy evaluation.

These are not the only limitations of Matlab. Consider the example below.

f = @(x) x == 1
g = @(x) x

We created two functions: f tests its parameter for equality with 1, while g is the identity function – it returns its parameter. If we apply g to f, we should get f, that is a handle to a function that tests its parameter for equality with one:

>> g(f)
ans =
  @(x)x==1

That is what we expected. We got a function handle to anonymous function that accepts one parameter. It is reasonable to expect that we can now pass parameter to that handle. Unfortunately, Matlab will not allow us to do so:

 >> (g(f))(1)
(g(f))(1)
|
Error: Unbalanced or unexpected parenthesis or bracket.
 
>> g(f)(1)
Error: ()-indexing must appear last in an index expression.

So we cannot chain anonymous function calls that easily. We have to use Matlab’s feval function, that evaluates a function handle with given parameters:

>> feval(g(f), 1)
ans =
  1

The Y-Combinator

With this knowledge we are now able to rewrite Scheme code presented earlier. Here’s how Y-combinator looks like:

Y   = @( le ) feval(            ...
          @( f ) f( f ),        ...
          @( h )                ...
            le( @( x ) feval( h( h ), x ) ) );

This is almost the same as Scheme code presented earlier. We just replaced lambda with @ (both denote anonymous function declaration) and changed function application syntax to match Matlab. Before we rewrite the length function in Matlab let us define some helper functions:

if_  = @( pred_, cond_ ) cond_{ 2 - pred_ }();
add1 = @( x ) x + 1;
cdr  = @( list ) list( 2 : end );

We’ve already seen if_, but it’s here just to remind you that we need that declaration. The add1 function increments its parameter by one, while cdr emulates Scheme’s cdr function that returns tail of a list. Finally we can see Y-combinator in action:

feval( Y ( @( length_ )                           ...
  @( list )                                       ...
    if_(                                          ...
      isempty( list ), {                          ...
        @() 0,                                    ...
        @() add1( length_( cdr( list ) ) ) } ) ), ...
  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j' ] )

This code can be placed into a script (say Y_combinator.m) and evaluated:

 >> Y_combinator
ans =
  10

Conclusions and further reading

As you can see, Matlab’s support for function handles allows to write Y-combinator. The result looks fairly simple, but I must admit that it wasn’t straightforward and I had some hard time struggling with syntax. Matlab’s functional programing capabilities are very limited, but there are a few more things that can be done. A more in-depth treatment can be found on A Page of Insanity blog here. The solution presented there is very similar to mine. Check out also this gist to see a different approach. See also Mike Vanier’s blog for more details on Y-combinator. I find Mike’s derivation a bit harder to follow, but Mike discusses both strict ans lazy versions of Y-combinator (I used only strict version).

  1. Please tell me if there exists an imperative language that does not have this distinction. []
  2. Matlab uses 1-based indexing []
  3. Text preceded by >> is typed into Matlab prompt []

Staypressed theme by Themocracy