
Typed holes support in Template Haskell

Back in April I found myself in need of typed holes in Template Haskell. To my disappointment it turned out that typed holes are not implemented in TH. Sadly, this happens too often: a feature is added to GHC but no Template Haskell support is implemented for it. At the time I was working on injective type families and I already had some experience extending the TH implementation. I figured that adding support for typed holes should be a trivial task, no more than 30 minutes of coding. I created a feature request on Trac and started coding. I quickly realized that it wouldn’t be that simple. Not that the amount of required work was extensive; I simply tripped over the way GHC handles names internally. As a result the work stalled for several months, and I only finished it two weeks ago thanks to help from Richard Eisenberg.

My patch allows you to do several interesting things. Firstly, it allows you to quote typed holes, i.e. expressions whose name starts with an underscore:

[d| i :: a -> a
    i x = _ |]

This declaration quote will represent _ using an UnboundVarE constructor. Secondly, you can now splice unbound variables:

i :: a -> a
i x = $( return $ VarE (mkName "_") )
 
j :: a -> a
j x = $( return $ UnboundVarE (mkName "_") )

Notice that in a splice you can use either VarE or UnboundVarE to represent an unbound variable – they are treated the same.

A very important side-effect of my implementation is that you can actually quote unbound variables. This means that you can now use nested pattern splices, as demonstrated by one of the tests in GHC testsuite:

baz = [| \ $( return $ VarP $ mkName "x" ) -> x |]

Previously this code was rejected. The reasons were that:

  1. a nested pattern splice is not compiled immediately, because it might refer to local variables defined outside the bracket;
  2. the bracket is renamed immediately at its declaration site, and at that point all variables were required to be in scope.

The combination of the above means that the pattern splice does not bring anything into scope (because it is not compiled until the outer bracket is spliced in), which led to x being out of scope. But now it is perfectly fine to have unbound variables in a bracket, so the above definition of baz is accepted. When the bracket is first renamed, x is treated as an unbound variable, which is now fine; when the bracket is spliced in, the inner splice is compiled and correctly brings a binding for x into scope. Getting nested pattern splices to work was not my intention when I started implementing this patch, but it turned out we essentially got this feature for free.
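For illustration, here is how baz can now be used. This snippet is my own; Baz is a hypothetical module exporting the baz quote above, needed only because the stage restriction forbids splicing a quote defined in the same module:

{-# LANGUAGE TemplateHaskell #-}
module UseBaz where

import Baz (baz)  -- hypothetical module defining baz as above

-- Splicing baz compiles the bracket; the inner pattern splice produces
-- the pattern x, binding the x in the body, so we get the identity function.
identity :: a -> a
identity = $baz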

One stumbling block during my work was typed Template Haskell. With normal, untyped TH I can place a splice at the top level of a file:

$(return [
   SigD (mkName "m")
        (ForallT [PlainTV (mkName "a")]
                 []
                 (AppT (AppT ArrowT (VarT (mkName "a"))) (VarT (mkName "a"))))
 , FunD (mkName "m")
        [Clause [VarP (mkName "x")] (NormalB (VarE (mkName "x"))) [] ]
   ])

and this will build a definition that is spliced into the source code. But converting this into a typed splice, by saying $$(return ...., resulted in a compiler panic. I reported this as #10945. The reason turned out to be quite tricky. When Template Haskell is enabled, top-level expressions are allowed, and each such expression is treated as an implicit splice. The problem is that a typed TH splice doesn’t really make sense at the top level, so it should be treated as an implicit splice; yet it was treated as an explicit splice, which resulted in a panic later in the compiler pipeline.
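As a small illustration of implicit splices (my own example, not part of the patch): with TemplateHaskell enabled, a bare expression at the top level of a module is wrapped in an implicit splice, so the line below means the same as writing $(return []) explicitly and contributes no declarations.

{-# LANGUAGE TemplateHaskell #-}
module Demo where

-- With TH enabled, this bare top-level expression is treated as the
-- implicit declaration splice $(return []).
return []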

Another issue that came up with typed TH was that typed holes cannot be quoted, again leading to a panic. I reported this as #10946. This issue has not yet been solved.

The above work is now merged into HEAD and will be available in GHC 8.0.

Cmm pipeline woes

For the last four months I have been working on improving compilation of the intermediate Cmm language used by GHC. Recently I have been noticing some recurring problems in the design of the Cmm pipeline.

Cmm

I already mentioned in my earlier post that Cmm is a low-level language, something between C and assembly. Cmm is produced from another intermediate language, STG. A single Cmm procedure is represented as a directed graph. Each node in the graph is a Cmm Block of low-level instructions. Exactly one Cmm Block in a graph is the entry point – the block from which execution starts when the procedure represented by the graph is called. Most Cmm Blocks in a graph have at least one successor, that is, a node (or nodes) to which control flows from the given Cmm Block. A Cmm Block may have no successor if it ends in a call to another procedure, i.e. it passes the flow of control to another Cmm graph. Each Cmm Block consists of a linear list of Cmm Nodes. A Cmm Node represents a single Cmm instruction, like a store to a register, a conditional jump, or a call to a function.
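To make this structure concrete, here is a simplified sketch in Haskell. GHC’s real types (CmmGraph, CmmBlock, CmmNode) are Hoopl-based and considerably richer, so the names and fields below are illustrative only:

import qualified Data.Map as Map

type BlockId = Int
type Reg     = String

data CmmExpr = Lit Int | ReadReg Reg       -- toy expressions

data CmmNode                               -- a single instruction
  = Assign Reg CmmExpr                     -- store to a register
  | CondBranch CmmExpr BlockId BlockId     -- conditional jump
  | Branch BlockId                         -- unconditional jump
  | Call String                            -- call another procedure: control
                                           --   leaves this graph, no successor

data CmmBlock = CmmBlock
  { blockId :: BlockId
  , nodes   :: [CmmNode]                   -- linear list of Cmm Nodes
  }

data CmmGraph = CmmGraph
  { entry  :: BlockId                      -- exactly one entry block
  , blocks :: Map.Map BlockId CmmBlock
  }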

Pipeline

The Cmm representation produced by the STG -> Cmm pass is incomplete. For example, operations on the stack are represented in an abstract way. It is also far from optimal: it may contain lots of empty blocks or control flow paths that will never be taken. That’s why we have the Cmm pipeline. It takes the Cmm representation produced by the code generator1 and applies a series of transformations to it. Some of them are mandatory (like stack layout), while others perform optimisations and are completely optional. Here’s a rough overview of the pipeline:

  1. Control Flow Optimisations. Optimises the structure of the graph by concatenating blocks and removing empty blocks.
  2. Common Block Elimination (optional). Eliminates duplicate blocks.
  3. Minimal Proc-point set. Determines a minimal set of proc-points2.
  4. Stack Layout. Turns abstract stack representation into explicit stack pointer references. Requires proc-point information computed in step 3. Creates stack maps.
  5. Sinking pass (optional). Moves variable declarations closer to their usage sites. Inlines some literals and registers in a way similar to constant propagation.
  6. CAF analysis. Does analysis of constant-applicative forms (top-level declarations that don’t have any parameters). CAF information is returned by the Cmm pipeline together with optimized Cmm graph.
  7. Proc-point analysis and proc-point splitting (optional). Here the pipeline splits into two alternative flows. They are identical except for the fact that one branch begins by performing proc-point splitting. This means that blocks that were determined to be proc-points are now turned into separate procedures. Requires proc-point information computed in step 3.
  8. Info Tables. Populates info tables of each Cmm function with stack usage information. Uses stack maps created by the stack layout (step 4).
  9. Control Flow Optimisations (optional). Repeats the control flow optimisations of step 1, but this time the pass is optional.
  10. Unreachable Blocks Removal. Eliminates blocks that cannot be reached from the entry point of the Cmm graph.

As an example, consider this Cmm produced by the code generator:

c4wk:
    goto c4wi;
c4wi:
    goto c4wl;
c4wl:
    goto c4wm;
c4wm:
    if ((old + 0) - <highSp> < SpLim) goto c4x9; else goto c4xa;
c4x9:
    R2 = _s2Rv::P64;
    R1 = _s2Ru::P64;
    call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;

After going through the Cmm pipeline the empty blocks are eliminated and the abstract stack representation ((old + 0) and <highSp>) is turned into explicit stack usage:

c4wi:
    if (Sp - 88 < SpLim) goto c4x9; else goto c4xa;
c4x9:
    R2 = _s2Rv::P64;
    R1 = _s2Ru::P64;
    call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;

Woes

During my work on the Cmm pipeline I encountered the following bugs and design issues:

  • In some cases the Stack Layout phase (step 4) can invalidate the minimal set of proc-points by removing a block that was earlier determined to be a proc-point. However, the minimal proc-point information is later used by the proc-point analysis, and GHC would panic if a proc-point block had been removed. This was bug #8205. I solved it by adding an additional check in the proc-point analysis that ensures that all proc-points actually exist in the graph. This was a simple, one-line solution, but it doesn’t feel right: it accepts the fact that our data is in an inconsistent state and places the responsibility for dealing with that on the algorithms relying on that state. In other words, any algorithm that relies on the minimal proc-point set after stack layout must check whether the given proc-points still exist in the graph.
  • I observed that in some circumstances the control flow optimisation pass may lead to unexpected duplication of blocks (see #8456). After some investigation it turned out that this pass began by computing the set of predecessors for each block and then modified the graph based on this information. The problem was that the list of predecessors was not being updated as the graph was modified. This problem is the same as the previous one: we compute a fact about our data structure and start modifying the data based on that fact, but we don’t update the fact as we go (see the sketch after this list).
  • The control flow optimisation pass may also produce unreachable blocks. They remain in the data structure representing the graph, but they are not reachable from any block in the graph. The common block elimination pass will remove the unreachable blocks, but until it does the data is in an inconsistent state.
  • The same thing happens later: the stack layout pass may create unreachable blocks and relies on later passes to remove them.
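To make the recurring pattern concrete, here is a distilled sketch of the buggy shape shared by several of these issues, together with a safer alternative. This is toy Haskell of my own, not GHC’s actual code: analyse computes a fact about a structure, and rewriteOne uses that fact to rewrite one piece of the structure, possibly invalidating the fact.

-- Buggy shape: analyse once, then rewrite repeatedly while consulting
-- the stale fact.
passBuggy :: (g -> fact) -> (fact -> g -> Maybe g) -> g -> g
passBuggy analyse rewriteOne g0 = loop (analyse g0) g0
  where
    loop fact g = case rewriteOne fact g of
                    Nothing -> g
                    Just g' -> loop fact g'   -- BUG: fact describes g0, not g'

-- Safer shape: re-analyse (or incrementally update the fact) after
-- every rewrite, at the cost of some extra work.
passFixed :: (g -> fact) -> (fact -> g -> Maybe g) -> g -> g
passFixed analyse rewriteOne = loop
  where
    loop g = case rewriteOne (analyse g) g of
               Nothing -> g
               Just g' -> loop g'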

I have listed only the problems I am aware of, but I believe there are more and they will manifest themselves one day. I spent the last couple of days thinking about how to solve these issues. Of course, fixing each of them separately is a relatively simple task, but I’m trying to come up with some general design that would prevent us from introducing inconsistencies into our data structures.

  1. That’s how we often refer to the STG -> Cmm pass.
  2. I want to avoid going into details of what proc-points are and why we need them. For the purpose of this post it is sufficient to consider a proc-point as a property that might be assigned to a block based on its predecessors.

GHC gets new primops

I finished the 7th week of my MSR internship in Cambridge, with 5 more weeks to go. I’m working on Cmm optimizations, but so far my project has been moving rather slowly and I’m not yet sure whether the final results will be beneficial enough to be merged into HEAD. But I finally finished something I’ve been working on for the last couple of months – patches for new comparison PrimOps in GHC (Trac ticket #6135) were merged on Wednesday. I created a wiki page which gives detailed information about the motivation behind this change, discusses some implementation details and presents preliminary benchmarking results. This post is aimed at people who want to learn more about how the previous and current implementations of comparisons work.

So what is a Primitive Operation – PrimOp for short – in GHC? The developer wiki defines primops as "functions that cannot be implemented in Haskell, and are provided natively by GHC". In other words, these are special functions which are not defined in the standard libraries – because those libraries are written in Haskell and, as we just said, PrimOps cannot be expressed in Haskell – but are instead built into the compiler. Whenever GHC encounters a PrimOp invocation in the source code it magically generates machine code for that PrimOp. There are a lot of PrimOps in GHC (their definitions are here) and my task was to modify the ones responsible for comparing unboxed primitive values like Int# or Double#.

Up till now all these comparisons returned a Bool indicating whether a given equality or ordering relation holds between the two parameters. This was determined by comparing them with a machine instruction like cmp, which sets flags in the flag register of the processor. Since the result of a primop was supposed to be a Bool (a boxed, lifted value) and not some flags, we had to use another instruction1 to examine the flags set by cmp and, based on them, set the value of a general purpose register to 0# (representing False) or 1# (representing True). This gives us an Int#, so still not a Bool, but fear not! GHC has a tagToEnum# primop, which takes an unboxed integer and produces a value of an enumeration2. Calling tagToEnum# 0# means "create the value represented by the first value constructor" (in the case of Bool this is False), while calling tagToEnum# 1# means "create the value represented by the second value constructor" (in the case of Bool this is True)3. So we have our Bool value.

Now we must examine whether it is True or False and act accordingly. Since Bool is both boxed and lifted, its values are represented by closures. Bool is a bit of a special case, because the closures for its value constructors are allocated statically in the data area of a compiled program and not dynamically on the heap. Normally we would have to inspect the returned closure, but GHC has two optimisations which kick in here. One of them is pointer tagging, which encodes information about a closure’s value constructor in the spare low bits of the pointer itself, allowing us to determine the constructor without dereferencing the pointer. But we don’t even use pointer tagging in this case, because GHC is even smarter – it optimizes away the closures generated by comparison primops and just uses the unboxed integer value produced by the machine instructions doing the comparison. This requires a special case in the code generator, which is Not A Good Thing.

That’s how things were up till now. The key idea behind my change is that the code generator no longer generates an implicit call to tagToEnum#. Instead, the new primops return an Int# (either 0# or 1#) and whenever we really want a Bool we have to call tagToEnum# explicitly in the Haskell source code.
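For example, a comparison that needs a Bool now looks like this (gt is my own example name; the comparison primops live in GHC.Prim and are re-exported by GHC.Exts):

{-# LANGUAGE MagicHash #-}
import GHC.Exts ( Int#, tagToEnum#, (>#) )

-- (>#) now has type Int# -> Int# -> Int# and returns 1# or 0#;
-- to get a Bool we call tagToEnum# explicitly.
gt :: Int# -> Int# -> Bool
gt x y = tagToEnum# (x ># y)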

There is one thing that could still be improved here. We need that special case in the code generator to avoid generating Bool closures, and to avoid inspecting results by checking pointer tags, in situations where we make an explicit call to tagToEnum#. But Simon already has an idea how to resolve that problem: we just need to add some extra simplifier transformations that will optimize away tagToEnum# at the Core level, although I’m not yet sure if I’ll have enough time to implement that during my internship.
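My understanding of that transformation, sketched below in pseudo-Core as Haskell comments (an illustration of the idea only, not an implemented GHC rule):

-- A case expression scrutinising the result of tagToEnum# can instead
-- scrutinise the Int# directly, so the Bool closure never materialises:
--
--   case tagToEnum# (x ># y) of           case x ># y of
--     False -> e1                 ===>      0# -> e1
--     True  -> e2                           1# -> e2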

  1. From the SETcc family.
  2. An algebraic data type with nullary constructors.
  3. Note that tagToEnum# has type Int# -> a, which means it is a polymorphic primop – quite an unusual thing.
