Posts tagged: stg

Let no escape!

I have observed that the number of people familiar with Core and using it for debugging purposes is relatively large compared to the number of people familiar with the other two intermediate representations used by GHC further down the pipeline: STG and Cmm. Some seem to treat these two a bit like black magic :-) I have to admit that I used to treat them like this before I came to MSR to work on the GHC backend. Now I am a bit more familiar with the code generator and I’d like to share some of my knowledge. Today I will present an optimization called “let-no-escape” (LNE for short). I believe it is almost unknown amongst Haskellers and GHC hackers. The main reasons for that are: a) it is performed in the code generator (STG-to-Cmm pass) and, as I said, people tend to stay away from that part of the compiler; b) it was not described in any paper (“We didn’t consider it to be worth a paper” – Simon Marlow), nor is it described on the GHC wiki – the only way to learn about it is to find it in the source code. My plan is to explain what LNE does and why.

Motivation

Consider this imperative code in C:

if (y > 0) {
  x = 3;
} else {
  x = 4;
}
...x... // use x

Earlier I mentioned the Cmm intermediate language used by GHC. It is a low-level imperative language, something between C and assembly. If we were to rewrite the above C program in Cmm we would get something like this:

     if (y > 0) goto L1; else goto L2;
L1 : x = 3;
     goto L3;
L2 : x = 4;
     goto L3;
L3 : ...x... // use x

In both the C and the Cmm code we would expect the generated assembly to check the value of y, perform a conditional jump to one of the branches, set the value of x to 3 or 4, and finally perform an unconditional jump from the branch to the code that follows the if construct (i.e. the L3 label in the Cmm version).

All of this seems obvious and looks very simple, but it gets more complicated when we start thinking in a functional setting. How do we implement this logic in Haskell? Well, we could do something like this:

let j x = ...x... -- use x
in case (y > 0) of
      True  -> j 3
      False -> j 4

We introduced a binding j, which corresponds to the code after the conditional (again, L3 in the Cmm example). The case expression corresponds to the if statement. In the branches we just make a call to j and pass the desired value of x as a parameter. There’s a problem here, though. If we simply compile this code then j will be compiled as a function. It will be a closure with two entry points (slow and fast) and possibly a stack check and a heap check at the very beginning. So while in C we had an unconditional jump, here we get a function call with all its overhead. This is very bad and we want something better. One easy way to avoid the call to j would be to inline it in every case alternative, but that duplicates code, so it is no good either. We really want to produce code that is as fast as the code produced from C, and without unnecessary duplication. The let-no-escape optimization allows us to achieve exactly that.
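
To see the duplication problem concretely, here is a small sketch of my own (the body of j is invented for illustration, since the real code elides it): after inlining, every case alternative carries its own copy of the code that used x.

-- Shared version: one copy of the body (invented here for illustration).
withJoin :: Int -> Int
withJoin y =
  let j x = x * x + 1
  in case y > 0 of
       True  -> j 3
       False -> j 4

-- Inlined version: the body of j is duplicated in every alternative.
inlined :: Int -> Int
inlined y =
  case y > 0 of
    True  -> 3 * 3 + 1
    False -> 4 * 4 + 1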

Let-no-escape

When we compile Core to STG we distinguish between two kinds of let bindings: normal ones and let-no-escape ones. A binding is a let-no-escape binding if it is non-updatable (so it must have some parameters), it is called in a tail position with exactly the right number of arguments, and it is “guaranteed to be entered before the stack retreats – i.e. it is not enclosed in a heap-allocated closure or passed as an argument to something”1. Normal bindings – i.e. bindings for which at least one of these conditions does not hold – are compiled as closures and follow the calling convention, i.e. they expect their parameters to be passed in particular registers or on the stack. Let-no-escape bindings are compiled in a special way: they are not closures and do not follow the calling convention. Instead they are compiled as ordinary blocks of Cmm code that expect their entry parameters to be located in particular local variables (which will later be mapped to hardware registers or spilled to the stack). To “call” an LNE binding we place the parameters in the correct local variables and jump to the block of Cmm code.
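
To illustrate these conditions, here is a small sketch of my own (the names and bodies are not from the post): in noEscape the binding j is only ever tail-called with exactly one argument, so it can be compiled as a let-no-escape binding; in escapes the binding is stored inside a heap-allocated value, so it must be compiled as an ordinary closure.

-- j has a parameter (so it is non-updatable), every use is a saturated
-- tail call, and it never escapes: it can become a plain jump target.
noEscape :: Int -> Int
noEscape y =
  let j x = x * 2 + 1
  in case y > 0 of
       True  -> j 3
       False -> j 4

-- Here j escapes: it is captured in a heap-allocated value (Just j),
-- so the code generator has to build a real closure for it.
escapes :: Int -> Maybe (Int -> Int)
escapes y =
  let j x = x + y
  in Just j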

Let’s look once again at our example program:

let j x = ...x... -- use x
in case (y > 0) of
      True  -> j 3
      False -> j 4

The binding for j is a let-no-escape one: it is non-updatable, its calls are in tail positions, all call sites pass exactly the required number of arguments, and it is not passed as a parameter to anything. This means that we can compile j as a let-no-escape binding – it will not be a closure and it will expect its parameter in a local variable. The call sites for j – i.e. the True and False alternatives of the case expression – need to place the value of the argument x where j expects it. This brings us back to the Cmm code that I presented at the top of this post:

     if (y > 0) goto L1; else goto L2;
L1 : x = 3;
     goto L3;
L2 : x = 4;
     goto L3;
L3 : ...x... // use x

The conditional if implements the case expression that selects one of the alternatives: L1 is the True alternative, while L2 is the False alternative. We compiled j in such a way that it expects its parameter to be passed in the local variable x, so every call site needs to initialize x with the value it wants to pass to j as an argument. After initializing the argument we perform an unconditional jump to the code that implements the binding (instead of making a call). In this example L3 implements the j let-no-escape binding. It is not a callable closure, but an ordinary block of Cmm code that expects its argument in the local variable x. In this way we compiled a functional program to efficient low-level code that is comparable with C.

Summary

This was a conceptual explanation of what let-no-escape bindings are and why we use them. A separate question is how this optimisation is implemented in GHC. The relevant code is located in a couple of places, mostly in the codeGen/ directory (the STG -> Cmm pass). If you want to explore the implementation details, a good starting point might be the getCallMethod function and the CallMethod data type in the StgCmmClosure module, and cgIdApp in the StgCmmExpr module.

  1. Quote from the source file compiler/stgSyn/CoreToStg.lhs

Getting friendly with STG

I’ve been spending the last few months on developing GHC. No rocket science so far, just a bit of hacking here and there. The biggest thing I am working on is ticket #6135, which is about changing some of the existing PrimOps to return an unboxed Int# instead of a Bool. This means that the result of comparing two unboxed values will be either an unboxed 0# or an unboxed 1#, instead of a tagged pointer to a statically allocated object representing True or False. This modification will make it possible to write branchless algorithms in Haskell. I promise to write about this one day, but today I want to blog about a different topic.
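
To give a taste of what this enables, here is a sketch of my own (assuming the post-#6135 primops, where ># returns an unboxed 1# or 0# instead of a Bool): a maximum of two Ints computed by arithmetic on the comparison result rather than by branching.

{-# LANGUAGE MagicHash #-}
module BranchlessMax where

import GHC.Exts

-- g is 1# when a > b and 0# otherwise, so the result is selected by
-- multiplication instead of a conditional jump.
maxInt :: Int -> Int -> Int
maxInt (I# a) (I# b) =
  let g = a ># b
  in I# ((g *# a) +# ((1# -# g) *# b))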

It so happens that the things I’ve been doing in GHC require me to make changes in the code generator. This is a bit challenging for me, because the code generator is something that didn’t interest me much when I started to learn about compilers. Probably the main reason for this is that code generation means dealing with assembly. I’ve been programming for about 16 years and only two languages caused me problems when I tried to learn them. Assembly is one of them1. I studied it for a year during my degree and, although I had no problems understanding the idea behind assembly and writing short snippets of code, writing a larger piece of code always ended in a headache.

It looks like the time has come to overcome my fear. During the last few months I’ve been reading a lot of assembly generated by GHC and I have even made some attempts at writing assembly code myself (well, using intrinsics, but I guess that counts). But between Haskell source code and the generated executable there are many intermediate steps. From my observations it seems that many Haskellers have a basic knowledge of Core – GHC’s intermediate language. Most have also heard about the other two intermediate representations used by GHC – STG and Cmm – but it seems that few people know them, unless they hack the compiler. And since I’m hacking the compiler I should probably know more about these two representations, right?

There’s a classic paper by Simon Peyton Jones, “Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine”. It is quite long – 87 pages in total – and, having been published in 1992, it is mostly out of date. These two things kept me from reading it, although I think that being out of date was only a pretext for me to avoid reading almost 90 pages of text. But, since I need to learn about STG, I finally decided to give it a shot. Reading the paper took me four days. The paper is very well written and in general is an easy read. I was afraid that I might not understand the formal description of the operational semantics of STG, but it turned out to be well explained, so I had no problem with that. The major problem turned out to be the amount of knowledge I had to absorb while reading. This resulted in problems with fully understanding the last sections of the paper – not because they are more difficult than the initial ones, but because I didn’t fully remember all the details that were discussed earlier.

An important question is which information is out of date. I’m not yet familiar with the existing implementation, but it seems that many things have changed: the Spineless Tagless G-machine is not tagless any more since the introduction of pointer tagging; curried functions are now evaluated using the eval/apply convention, while the paper describes push/enter; the paper discusses only compilation to C, while currently the C back-end is becoming deprecated in favour of the native code generator and LLVM; and finally the layout of closures is now slightly different from the one presented in the paper. I am almost certain that garbage collection is also performed differently. These are just the differences that I noticed, which means that really a lot has changed since the publication over 20 years ago. Surprisingly, this doesn’t seem like a big problem, because the most important thing is that the paper presents the idea of how STG works, while the mentioned changes are only less important details.

So, now that I have a basic idea of how STG works, what comes next? There are a few follow-up papers:

  • “The STG runtime system (revised)” – an updated description of STG written in 1999 by Simon Peyton Jones and Simon Marlow. I guess it’s also outdated, but still probably worth reading. It has only 65 pages :)
  • “Making a Fast Curry. Push-Enter vs. Eval-Apply for Higher-order Languages” – this describes the eval/apply and push/enter strategies mentioned above. Already read this one.
  • “Faster Laziness Using Dynamic Pointer Tagging” – this one explains why STG is not tagless any more. Read this one as well.

And once I’m done with STG, I’ll have to learn about Cmm.

  1. In case you’re interested, the other one is Io
