## Waiting for garbage collection can kill parallelism?

I am reposting my mail from Haskell-cafe, since I got no replies in over a week and I think it is an interesting case. I was reading “Parallel Performance Tuning for Haskell” by Jones, Marlow and Singh and wanted to replicate the results for their first case study. The code goes like this:

```module Main where   import Control.Parallel   main :: IO () main = print . parSumFibEuler 38 \$ 5300   fib :: Int -> Int fib 0 = 0 fib 1 = 1 fib n = fib (n - 1) + fib (n - 2)   mkList :: Int -> [Int] mkList n = [1..n-1]   relprime :: Int -> Int -> Bool relprime x y = gcd x y == 1   euler :: Int -> Int euler n = length (filter (relprime n) (mkList n))   sumEuler :: Int -> Int sumEuler = sum . (map euler) . mkList   sumFibEuler :: Int -> Int -> Int sumFibEuler a b = fib a + sumEuler b   parSumFibEuler :: Int -> Int -> Int parSumFibEuler a b = f `par` (e `pseq` (e + f)) where f = fib a e = sumEuler b```

In the paper authors show that this code performs computation of `fib` ans `sumEuler` in parallel and that good speed-up is achieved:

To make the parallelism more robust, we need to be explicit about the evaluation order we intend. The way to do this is to use `pseq` in combination with `par`, the idea being to ensure that the main thread works on `sumEuler` while the sparked thread works on `fib`. (…) This version does not make any assumptions about the evaluation order of `+`, but relies only on the evaluation order of `pseq`, which is guaranteed to be stable.

These results were obtained on older GHC version1. However, compiling program with:

`\$ ghc -O2 -Wall -threaded -rtsopts -fforce-recomp -eventlog parallel.hs`

and then running on GHC 7.4.1 using:

`\$ ./parallel +RTS -qa -g1 -s -ls -N2`

yields a completely different result. These are statistics for a parallel run on two cores:

```SPARKS: 1 (1 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)   INIT    time    0.00s  (  0.00s elapsed) MUT     time    2.52s  (  2.51s elapsed) GC      time    0.03s  (  0.05s elapsed) EXIT    time    0.00s  (  0.00s elapsed) Total   time    2.55s  (  2.56s elapsed)```

Running the same code on one core results in:

```SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 0 fizzled)   INIT    time    0.00s  (  0.00s elapsed) MUT     time    2.51s  (  2.53s elapsed) GC      time    0.03s  (  0.05s elapsed) EXIT    time    0.00s  (  0.00s elapsed) Total   time    2.55s  (  2.58s elapsed)```

Looking and `MUT` (mutator time) it looks that there is no speed-up at all. Investigating eventlog using ThreadScope sheds some light on execution of a program:

Both threads start computation, but HEC 1 soon blocks and only resumes when HEC 0 finishes computation. Zooming in it looks that HEC 1 stops because it requests garbage collection, but HEC 0 does not respond to that request so GC begins only when HEC 0 is done with its computation:

Why does this happen? I am no expert on GHC’s garbage collection, my only knowledge of that comes from section 6 of “Runtime Support for Multicore Haskell“. If I understood correctly this should not happen – it certainly didn’t happen when the paper was published. Do we have a regression or am I misunderstanding something?

1. Paper does not mention which version exactly. I believe it was 6.10, since “Runtime Support for Multicore Haskell” by the same authors released at the same time uses GHC 6.10 []

## How to shoot yourself in the foot with Haskell

Haskell is “advertised” as a safe language that does all type checking upfront, making sure that you don’t experience runtime type errors, null pointers and all that kind of stuff. It also gives you ways to bypass some of the safety mechanisms, so a conscious programmer can use unsafe functions to get a boost in performance (e.g. by not performing bounds checking when indexing a vector).

I’ve written some very ugly Haskell code that creates a vector using destructive updates. It is in fact an imperative algorithm, not a functional one. When the initialization is over the vector is frozen using `unsafeFreeze`. I wrote my code using `read` and `write` functions, tested it using QuickCheck and when the tests passed I switched to `unsafeRead` and `unsafeWrite` to make my program faster. Some time later I started getting random segfaults when running my tests. This never happened before in any of my Haskell programs so I almost panicked. At first I didn’t had a slightest idea how to even approach this problem. I suspected that this might even be a bug in GHC. Then I started disabling groups of tests trying to track down the bug and finally managed to locate a single test that was causing the problem. Guess what – it was the test of my vector initialization with unsafe functions. What happened is that after switching to `unsafeRead`/`unsafeWrite` I refactored the function and its test. I made a mistake in the testing code and passed incorrect data that resulted with an attempt to write an element at address -1. Finding this bug took me a little over an hour. A factor that made debugging harder was that disabling tests that seemed to cause the segfault resulted in problems appearing in a completely different part of the program – or so I thought by looking at the output of my program. Looks like I completely forgot about lazy evaluation and unspecified evaluation order!

Second bug I encountered was even trickier. I wrote functions that perform cyclic shifts of a signal by any value. For example shifting `[1,2,3,4]` left by 1 yields `[2,3,4,1]`. Note that shifting by 5, 9, 13 and so on gives exactly the same result – the shift function is periodic. You might recall that I used shift functions to demonstrate code testing. This time however I written shifts using Repa library. Then I created QuickCheck property stating that shifting any signal left and then right by same value yields the original signal. This is pretty obvious property and the tests passed without any problems. Later, when writing some other tests I ended up with one of the tests failing. Using ghci I checked that this error should not be happening at all, but after careful debugging it turned out that during actual tests some of the values in the results become zeros. After two hours of debugging I realized that the actual bug is in the shifting functions – they worked only for the basic period, that is shift values from 0 to signal length. Why QuickCheck didn’t manage to falsify my property of left/right shift compositions? Repa is a smart (and tricky) library that attempts to fuse many operations on an array into one. And it fused application of left shift followed by right shift into identity transform! Well, this is great. After all this is the kind of optimization we would like to have in our programs. But it turns out that it can also impact tests! After realizing what is going on it was actually a matter of 5 minutes to fix the bug, but finding it was not a trivial task.

## Configuring Emacs is a nightmare

When I started using Emacs a few months ago I went through some effort to configure it to my liking. There were a couple of issues I wasn’t able to resolve back then. I decided that it’s time to get them fixed. That’s how I wasted two days doing something that should take no longer than an hour.

About 90% of my time went to configuring Emacs running under Tmux. While Emacs runs fine under terminal emulator (I’m using Konsole for that) it turned out that running it under Tmux causes numerous problems: some key combinations don’t work and even worse the colours get screwed up. I am using Tmux always when working in console and at first I wanted to blame Tmux for all the problems. Soon I realized that all other applications work correct and only Emacs is causing problems. Problems with key combinations were solved using xterm-extras add-on. I am not exactly sure what is causing issues with colours but this is related to `TERM` environment variable. Within Tmux this is set to `screen`, while my terminal emulator sets it to `xterm`. If I understand correctly Tmux takes care of translating between these two different modes, but it looks that Emacs is still affected by this difference. Of course I tried changing values of `TERM` in all possible ways, but things only got worse. I tried improving Emacs’ colours by using color-theme but under my terminal emulator this breaks completely. I enabled 256 colours (by default there’s only 8!), but still I mostly got garbage when applying different themes. Oh well, I can live with 8 colours just fine. This was day one.

On the next day, when things were more or less working fine, I decided to upgrade Emacs to version 24.2. I was tempted by Emacs’ built in package management system and a promise of better compatibility with terminal emulators. This was a mistake. I spent another day trying to figure out why configuration that worked fine under 23.2 breaks under 24.2.

I acknowledge the fact that if I had arcane knowledge of Elisp all my issues would be much easier to solve. Without that knowledge I have to resort to googling and voodoo programming. Well, I am just an ordinary user who just wants to use a text editor. I need a text-based editor so I can do almost all of my work using console. Emacs meets that requirement, which makes me more productive but I seriously doubt that this increase in productivity justifies time needed to configure editor to my needs. I guess I would be proud of myself had I managed to solve all the problems. I didn’t and there are still unresolved issues:

• How can I manage keyboard shortcuts in a sane way? I install some expansion only to realize it is not working because keyboard shortcuts collide with the ones from different expansion. I can remap the keys, but then again I run into risk of colliding with other shortcuts.
• Can I have a block mode in Emacs similar to one offered by Kate (under KDE)? I found that rect-mark allows me to mark a rectangle area and cut it, but pasting doesn’t work the way it should.
• Emacs’ undo is clumsy. I managed to improve it by using undo-tree, but still it has rough edges: every delete is seen as a single operation. If I hold backspace and delete 30 characters Emacs sees that as 30 separate operations! Undoing that is a real pain.

Solving these issues would make Emacs a very decent editor to work with Haskell (though not as good as Eclipse is for Java). Perhaps one day I’ll find patience and energy to resolve the remaining problems. Until then I steel myself for using Emacs the way it is.

Staypressed theme by Themocracy