Posts tagged: ghc

Configuring GHC development environment – tips & tricks

I started working on GHC about a year ago. Since then I learned how important it is to organize my working environment to relieve me of doing common and repeatable tasks. In this post I describe some of my configuration and practices. If you prefer reading code than lengthy blog posts you can go directly to this github repo that contains my scripts and configuration.

*nix for the win

First and foremost I am using Linux on all of my machines. Debian is my distro of choice, but any *nix based system will do. That said I believe things I describe below can’t be done on Windows. Unless you’re using Cygwin. But then again if you work under Cygwin then maybe it’s time to switch to Linux instead of faking it?

Sandboxes

One thing I quickly learned is that it is useful to have access to different versions of GHC and – if you’re working on the backend – LLVM. It is also useful to be able to install latest GHC HEAD as your system-wide GHC installation. I know there are tools designed to automate sandboxing, like hsenv, but I decided to use sandboxing method described by Edsko. This method is essentially based on setting your path to point to certain symlinks and then switching these symlinks to point to different GHC installations. Since I’ve been using this heavily I wrote a script that manages sandboxes in a neat way. When run without parameters it displays list of sandboxes in a fashion identical to git branch command. When given a sandbox name it makes that sandbox active. It can also add new and remove existing sandboxes. It is even smart enough to prevent removal of a default sandbox. Finally, I’ve set up my .bashrc file to provide auto-completion of sandbox names. Here’s how it looks in practice (click to enlarge):

ghc-sandbox

Scripting for the win

This is probably obvious to anyone working under Linux: script as much as you can. If you find yourself doing something for the second or third time then this particular activity should be scripted. I know how hard it is to convince yourself to dedicate 10 or 15 minutes to write a script when you can do the task in 1 minute, but this effort will quickly pay off. I have scripts for pulling the GHC source repositories (even though I do it really seldom), resetting the GHC build tree, starting tmux sessions and a couple of other things.

Environment variables

In the beginning I wrote my scripts in an ad-hoc way with all the paths hardcoded. This turned out to be a pain when I decided to reorganize my directory structure. The moral is: define paths to commonly used directories as environment variables in your shell’s configuration file (~/.bashrc in case of bash). Once you’ve done that make your scripts dependent on that variables. This will save you a lot of work when you decide to move your directories around. I’ve also defined some assertion functions in my .bashrc file. I use them to check whether the required variables are set and if not the script fails gracefully.

Auto-completion

Bash has a built-in auto-completion support. It allows you to get auto-completion of parameters for the commonly used commands. I have auto-completion for cabal and my sandbox management scripts. When GHC 7.8 comes out it will have support for auto-completion as well.

Emacs

I use Emacs for development despite my initial scepticism. Since configuring Emacs is a nightmare I started a page on GHC wiki to gather useful tips, tricks and configurations in one place so that others can benefit from them. Whatever editor you are using make sure that you take as much advantage of its features as possible.

Firefox

GHC wiki describes how to set up Firefox to quickly find tickets by number. Use that to your benefit.

Make

Geoffrey Mainland managed to convince me to use make and I thank him for that. Makefiles are a great help if you’re debugging GHC and need to repeatedly recompile a test case and possibly analyse some Core or Cmm dumps. Writing the first Makefile is probably the biggest pain but later you can reuse it as a template. See here for some example Makefiles I used for debugging.

Summary

The goal of this post was to convince you that spending time on configuring and scripting your GHC development environment is an investment. It will return and it will allow you to focus on important things that really require your attention. Remember that most of my configuration and scripts described in this post is available on github.

Cmm pipeline woes

For the last 4 months I have been working on improving compilation of intermediate Cmm language used by GHC. Recently I am noticing some recurring problems in design of the Cmm pipeline.

Cmm

I already mentioned in my earlier post that Cmm is a low-level language, something between C and assembly. Cmm is produced from another intermediate language, STG. A single Cmm procedure is represented as a directed graph. Each node in a graph is a Cmm Block of low level instructions. Exactly one Cmm Block in a graph is an entry point – a block from which the execution starts when procedure represented by a graph is called. Most Cmm Blocks in a graph have at least one successor, that is node(s) to which control flows from a given Cmm Block. A Cmm Block may not have a successor if it is a call to another procedure, i.e. it passes flow of control to another Cmm graph. Each Cmm Block consists of a linear list of Cmm Nodes. A Cmm Node represents a single Cmm instruction like store to a register, conditional jump or call to a function.

Pipeline

Cmm representation produced by the STG -> Cmm pass is incomplete. For example operations on the stack are represented in an abstract way. It is also far from being optimal as it may contain lots of empty blocks or control flow paths that will never be taken. That’s why we have Cmm pipeline. It takes Cmm representation produced by the code generator1 and applies a series of transformations to it. Some of them are mandatory (like stack layout), while others perform optimisations and are completely optional. Here’s a rough overview of the pipeline:

  1. Control Flow Optimisations. Optimises structure of a graph by concatenating blocks and omitting empty blocks.
  2. Common Block Elimination (optional). Eliminates duplicate blocks.
  3. Minimal Proc-point set. Determines a minimal set of proc-points2.
  4. Stack Layout. Turns abstract stack representation into explicit stack pointer references. Requires proc-point information computed in step 3. Creates stack maps.
  5. Sinking pass (optional). Moves variable declarations closer to their usage sites. Inlines some literals and registers in a way similar to constant propagation.
  6. CAF analysis. Does analysis of constant-applicative forms (top-level declarations that don’t have any parameters). CAF information is returned by the Cmm pipeline together with optimized Cmm graph.
  7. Proc-point analysis and proc-point splitting (optional). Here the pipeline splits into two alternative flows. They are identical except for the fact that one branch begins by performing proc-point splitting. This means that blocks that were determined to be proc-points are now turned into separate procedures. Requires proc-point information computed in step 3.
  8. Info Tables. Populates info tables of each Cmm function with stack usage information. Uses stack maps created by the stack layout (step 4).
  9. Control Flow Optimisations (optional). Repeat control flow optimisations (step 1), but this time this is optional.
  10. Unreachable Blocks Removal. Eliminates blocks that don’t have a predecessor in the Cmm graph.

As an example consider this Cmm produced by the code generator:

c4wk:
    goto c4wi;
c4wi:
    goto c4wl;
c4wl:
    goto c4wm;
c4wm:
    if ((old + 0) - <highSp> < SpLim) goto c4x9; else goto c4xa;
c4x9:
    R2 = _s2Rv::P64;
    R1 = _s2Ru::P64;
    call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;

After going through the Cmm pipeline the empty blocks are eliminated and abstract stack representation ((old + 0) and <highSp>) is turned into explicit stack usage:

c4wi:
    if (Sp - 88 < SpLim) goto c4x9; else goto c4xa;
c4x9:
    R2 = _s2Rv::P64;
    R1 = _s2Ru::P64;
    call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;

Woes

 During my work on the Cmm pipeline I encountered following bugs and design issues:

  • In some cases the Stack Layout phase (step 4) can invalidate the minimal set of proc-points by removing a block that was earlier determined to be a proc-point. However, minimal proc-point information is later used by proc-point analysis. GHC would panic if a proc-point block was removed. This was bug #8205. I solved it by adding additional check in the proc-point analysis that ensures that all proc-points actually exist in a graph. This was a simple, one line solution, but it doesn’t feel right. It accepts the fact that our data is in an inconsistent state and places the responsibility of dealing with that on algorithms relying on that state. In other words, any algorithm that relies on minimal proc-point set after the stack layout must check whether given proc-points exist in a graph.
  • I observed that in some circumstances control flow optimisation pass may lead to unexpected duplication of blocks (see #8456). After some investigation it turned out that this pass began by computing set of predecessors for each block and then modified the graph based on this information. The problem was that the list of predecessors was not being updated as the graph was modified. This problem is the same as the previous one: we compute a fact about our data structure and based on that fact we start modifying the data but we don’t update the fact as we go.
  • Control flow optimisations pass may produce unreachable blocks. They remain in the data structure representing the graph, but they are not reachable from any block in the graph. The common block elimination pass will remove the unreachable blocks but before it does the data is in an inconsistent state.
  • Same thing happens later: stack layout pass may create unreachable blocks and relies on later passes to remove them.

I listed only the problems I am aware of but I believe there are more and they will manifest themselves one day. I spent last couple of days thinking how to solve these issues. Of course fixing each of them separately is a relatively simple task, but I’m trying to come up with some general design that would prevent us from introducing inconsistencies in our data structures.

  1. That’s how we often refer to STG -> Cmm pass []
  2. I want to avoid going into details of what are proc-points and why do we need them. For the purpose of this post it is sufficient that you consider proc-point as a property that might be assigned to a block based on its predecessors []

Let no escape!

I observed that the number of people familiar with Core and using it for debugging purposes is relatively big to the number of people familiar with other two intermediate representations used by GHC further in the pipeline: STG and Cmm. Some seem to treat these two a bit like black magic :-) I have to admit that I used to treat them like this before I came to MSR to work on the GHC backend. Now I am a bit more familiar with code generator and I’d like to share some of my knowledge. Today I will present an optimization called “let-no-escape” (LNE for short). I believe it is almost unknown amongst Haskellers and GHC hackers. Main reasons for that are: a) it is performed in the code generator (STG-to-Cmm pass) and, as I said, people tend to stay away from that part of the compiler; b) it was not described in any paper (“We didn’t consider it to be worth a paper” – Simon Marlow), nor it is described on GHC wiki – the only chance to learn about was to find it in the source code. My plan is to explain what and why LNE does.

Motivation

Consider this imperative code in C:

if (y > 0) {
  x = 3;
} else {
  x = 4;
}
...x...// use x

Earlier I mentioned the Cmm intermediate language used by GHC. It is a low-level imperative language, something between C and assembly. If we were to rewrite above C program into Cmm we would get something like this:

     if (y > 0) goto L1; else goto L2;
L1 : x = 3
     goto L3;
L2 : x = 4;
     goto L3;
L3 : ....x... // use x

In both C and Cmm code we would expect the generated assembly to check value of y, perform a conditional jump to one of the branches, set value of x to 3 or 4 and finally perform unconditional jump from the branch to code that follows the if construct (i.e. L3 label in Cmm version).

All of this seems obvious and looks very simple, but it gets more complicated when we start thinking in a functional setting. How do we implement this logic in Haskell? Well, we could do something like this:

let j x = ...x... -- use x
in case (y > 0) of
      True  -> j 3
      False -> j 4

We introduced a binding j, which corresponds to code after the conditional (again, L3 in the Cmm example). case expression corresponds to if statement. In the branches we just make a call to j and pass desired value of x as a parameter. There’s a problem here though. If we simply compile this code then j will be compiled as a function. It will be a closure with two entry points (slow and fast) and possibly stack check and heap check at the very beginning. So with C we had an unconditional jump, whereas here we get a function call with all its overhead. This is very bad and we want something better. One easy way to avoid call to j would be inlining it in every case alternative. But this duplicates code so this is no good either. We really want to produce code that is as fast as produced by C and without unnecessary duplication. Let-no-escape optimization allows us to achieve that.

Let-no-escape

When we compile core to STG we distinguish between two types of let bindings: normal and let-no-escape ones. A binding is a let-no-escape binding if it is non-updatable (so it must have some parameters), it is called in a tail position with exactly the right number of arguments and it is “guaranteed to be entered before the stack retreats – i.e. it is not enclosed in a heap-allocated closure or passed as an argument to something”1. Normal bindings – i.e. bindings for which at least one of these conditions does not hold – are compiled as a closure and follow the calling convention i.e. they expect their parameters to be passed in some particular registers or on the stack. Let-no-escape bindings are compiled in a special way: they are not closures and don’t follow the calling convention. Instead they are compiled as a normal block of Cmm code that expects its entry parameters to be located in some particular local variables (which will later be mapped to hardware registers or spilled to the stack). To “call” a LNE binding we place parameters in the correct local variables and jump to the block of Cmm code.

Let’s look once again at our example program:

let j x = ...x... -- use x
in case (y > 0) of
      True  -> j 3
      False -> j 4

The binding for j is a let-no-escape one: it is non-updatable, in a tail position, all call sites pass exactly the required number of arguments and it is not passed as a parameter. This means that we can compile j as a let-no-escape binding – it will not be a closure and will expect its parameter in a local variable. Call sites for j – i.e. True and False alternatives of a case expression – need to place the value of argument x in a place where j expects it. This brings us to Cmm code that I presented at the top of this post:

     if (y > 0) goto L1; else goto L2;
L1 : x = 3
     goto L3;
L2 : x = 4;
     goto L3;
L3 : ....x... // use x

The conditional if implements case expression that selects one of alternatives. L1 is the True branch, while L2 is the False alternative. We compiled j in such a way that it expects its parameter to be passed in a local variable x, so every call site needs to initialize x with the value it wants to pass to j as an argument. After initializing the argument we perform unconditional jump to code that implements the binding (instead of making a call). In this example L3 implements the j let-no-escape binding. It is not a callable closure, but a normal block of Cmm code which expects its argument to be passed in a local variable x. In this way we compiled a functional program to efficient low-level code that is comparable with C.

Summary

This was a conceptual explanation of what are let-no-escape bindings and why we use them. A separate question is “how is this optimisation implemented by GHC?”. Relevant code is located in a couple of places, mostly in codeGen/ directory (STG -> Cmm pass). If you want to explore implementation details a good starting point might be getCallMethod function and CallMethod data type in StgCmmClosure module and cgIdApp in StgCmmExpr module.

  1. Quote from source file compiler/stgSyn/CoreToStg.lhs []

Staypressed theme by Themocracy