Let no escape!

I observed that the number of people familiar with Core and using it for debugging purposes is relatively big to the number of people familiar with other two intermediate representations used by GHC further in the pipeline: STG and Cmm. Some seem to treat these two a bit like black magic :-) I have to admit that I used to treat them like this before I came to MSR to work on the GHC backend. Now I am a bit more familiar with code generator and I’d like to share some of my knowledge. Today I will present an optimization called “let-no-escape” (LNE for short). I believe it is almost unknown amongst Haskellers and GHC hackers. Main reasons for that are: a) it is performed in the code generator (STG-to-Cmm pass) and, as I said, people tend to stay away from that part of the compiler; b) it was not described in any paper (“We didn’t consider it to be worth a paper” – Simon Marlow), nor it is described on GHC wiki – the only chance to learn about was to find it in the source code. My plan is to explain what and why LNE does.

Motivation

Consider this imperative code in C:

if (y > 0) {
  x = 3;
} else {
  x = 4;
}
...x...// use x

Earlier I mentioned the Cmm intermediate language used by GHC. It is a low-level imperative language, something between C and assembly. If we were to rewrite above C program into Cmm we would get something like this:

     if (y > 0) goto L1; else goto L2;
L1 : x = 3
     goto L3;
L2 : x = 4;
     goto L3;
L3 : ....x... // use x

In both C and Cmm code we would expect the generated assembly to check value of y, perform a conditional jump to one of the branches, set value of x to 3 or 4 and finally perform unconditional jump from the branch to code that follows the if construct (i.e. L3 label in Cmm version).

All of this seems obvious and looks very simple, but it gets more complicated when we start thinking in a functional setting. How do we implement this logic in Haskell? Well, we could do something like this:

let j x = ...x... -- use x
in case (y > 0) of
      True  -> j 3
      False -> j 4

We introduced a binding j, which corresponds to code after the conditional (again, L3 in the Cmm example). case expression corresponds to if statement. In the branches we just make a call to j and pass desired value of x as a parameter. There’s a problem here though. If we simply compile this code then j will be compiled as a function. It will be a closure with two entry points (slow and fast) and possibly stack check and heap check at the very beginning. So with C we had an unconditional jump, whereas here we get a function call with all its overhead. This is very bad and we want something better. One easy way to avoid call to j would be inlining it in every case alternative. But this duplicates code so this is no good either. We really want to produce code that is as fast as produced by C and without unnecessary duplication. Let-no-escape optimization allows us to achieve that.

Let-no-escape

When we compile core to STG we distinguish between two types of let bindings: normal and let-no-escape ones. A binding is a let-no-escape binding if it is non-updatable (so it must have some parameters), it is called in a tail position with exactly the right number of arguments and it is “guaranteed to be entered before the stack retreats – i.e. it is not enclosed in a heap-allocated closure or passed as an argument to something”1. Normal bindings – i.e. bindings for which at least one of these conditions does not hold – are compiled as a closure and follow the calling convention i.e. they expect their parameters to be passed in some particular registers or on the stack. Let-no-escape bindings are compiled in a special way: they are not closures and don’t follow the calling convention. Instead they are compiled as a normal block of Cmm code that expects its entry parameters to be located in some particular local variables (which will later be mapped to hardware registers or spilled to the stack). To “call” a LNE binding we place parameters in the correct local variables and jump to the block of Cmm code.

Let’s look once again at our example program:

let j x = ...x... -- use x
in case (y > 0) of
      True  -> j 3
      False -> j 4

The binding for j is a let-no-escape one: it is non-updatable, in a tail position, all call sites pass exactly the required number of arguments and it is not passed as a parameter. This means that we can compile j as a let-no-escape binding – it will not be a closure and will expect its parameter in a local variable. Call sites for j – i.e. True and False alternatives of a case expression – need to place the value of argument x in a place where j expects it. This brings us to Cmm code that I presented at the top of this post:

     if (y > 0) goto L1; else goto L2;
L1 : x = 3
     goto L3;
L2 : x = 4;
     goto L3;
L3 : ....x... // use x

The conditional if implements case expression that selects one of alternatives. L1 is the True branch, while L2 is the False alternative. We compiled j in such a way that it expects its parameter to be passed in a local variable x, so every call site needs to initialize x with the value it wants to pass to j as an argument. After initializing the argument we perform unconditional jump to code that implements the binding (instead of making a call). In this example L3 implements the j let-no-escape binding. It is not a callable closure, but a normal block of Cmm code which expects its argument to be passed in a local variable x. In this way we compiled a functional program to efficient low-level code that is comparable with C.

Summary

This was a conceptual explanation of what are let-no-escape bindings and why we use them. A separate question is “how is this optimisation implemented by GHC?”. Relevant code is located in a couple of places, mostly in codeGen/ directory (STG -> Cmm pass). If you want to explore implementation details a good starting point might be getCallMethod function and CallMethod data type in StgCmmClosure module and cgIdApp in StgCmmExpr module.

  1. Quote from source file compiler/stgSyn/CoreToStg.lhs []

5 Responses to “Let no escape!”

  1. One thing I’m curious about is that the proposed Haskell code performs an inversion of the structure of the Cmm code. That is, I would have expected the Haskell code to look more like: (let x = if y > 0 then 3 else 4 in …x…). This formulation avoids the code duplication issue of inlining j, and strictness analysis should be able to determine that x can be forced and stored in a register before entering j.

    So what are some examples where this approach would fail? Why prefer LNE over this approach?

  2. Jan Stolarek says:

    Both your code and mine are perfectly valid – programmer may choose any of them depending on her coding style. We don’t want to force programmer to use “the only correct style of programming”. Instead, we want to ensure that no matter which version we choose it will be optimized. Your version is optimized with strictness analysis, mine is with LNE.

    Another question is whether the code of the form I have written is generated automatically during the compilation as a result of Core-to-Core transformations. I admit I don’t know if that happens, but I would suspect that some kind of let floating might lead to code that I presented. In such case we certainly need LNE to compile it efficiently.

    This brings us to the concept of join points (this is described in Santos’ thesis “Compilation by Transformation in Non-Strict Functional Languages”). GHC uses them to avoid code duplication (e.g. when introduced by case-of-case transformation). The question here is whether join points are compiled with LNE. If this was the case it would mean that my information that “LNE binding must have at least one argument” is incorrect. I admit I based it on the comment in the source code, without checking whether the code actually does that. Perhaps the comment is outdated? Perhaps it is not and join points are compiled using some different mechanism? I’ll try to find some time to look into the code and find the answer.

  3. Jan Stolarek says:

    I looked at GHC sources and concluded that join points are implemented using let-no-escape bindings. In fact I even wrote comments about this! I must’ve repressed that memory :-)

    There is one thing I don’t understand though. Comment in the source code says that a binding must be non-updatable to be a LNE binding. This means that a binding must have at least one parameter or be a lambda. Finally, I can imagine a join-point that does not have any parameters and is not a lambda, which implies that such a join point could be updated. This facts contradict each other and I don’t know where did I go wrong here. I mailed this to ghc-devs – see here.

  4. Nick Frisby says:

    “Non-updatable” just requires that it’s a value, right? Not necessary a lambda? I think I recall seeing LNEs for a string value in some parser in nofib.

  5. Jan Stolarek says:

    Um… but values are updatable. If you say:

    let x = [1..3]
    in x

    Then x is a value and it is updatable (i.e. it is created as a thunk and only evaluated on demand).

Leave a Reply

(required)

Staypressed theme by Themocracy