## Posts tagged: cabal

Three weeks ago I wrote about code testing in Haskell. Today I will discuss how to benchmark code written in Haskell and how to integrate benchmarks into a project. For demonstration purposes I extended my sample project on GitHub so that it now shows how to create both tests and benchmarks.

# Overview of Criterion benchmarking library

While there were many testing libraries to choose from, benchmarking seems to be dominated by a single library – Bryan O’Sullivan’s criterion. To get started with it you should read this post on Bryan’s blog. I will present some of the basics in today’s post and will also mention a couple of things that are undocumented.

Writing benchmarks for a functional language like Haskell is a bit tricky. Pure functional code has no side effects, which means that once the value of a function has been computed it can be memoized and reused later without being recomputed. This is of course not what we want during benchmarking. Criterion takes care of that, but requires that benchmarks be written in a special way. Let’s look at an example benchmark for our shift function1:

`bench "Left shift" $ nf (cyclicShiftLeft 2) [1..8192]`

The `nf` function is the key here. It takes two parameters: first is the benchmarked function saturated with all but its last argument; second is the last parameter to the benchmarked function. The type of `nf` is:

```
ghci> :t nf
nf :: Control.DeepSeq.NFData b => (a -> b) -> a -> Pure
```

When the benchmark is run, `nf` applies the argument to the function and evaluates the result to normal form, which means that it gets fully evaluated. This is needed to ensure that laziness doesn’t distort the outcomes of a benchmark.
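The difference between weak head normal form and the normal form that `nf` forces can be seen with plain GHC. This small illustration is my own, not from the original post:

```haskell
-- Forcing a value with `seq` only evaluates the outermost constructor
-- (weak head normal form), so a list whose first element is bottom can
-- still be forced safely. `nf` goes further and forces every element.
main :: IO ()
main = do
    let xs = undefined : [2, 3] :: [Int]
    xs `seq` putStrLn "WHNF reached: only the outer (:) was evaluated"
    print (length xs)  -- length never inspects the elements either
```

Had the program demanded any element (e.g. with `print (sum xs)`), it would have crashed on the `undefined`, which is exactly the work `nf` makes sure is included in the measurement.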

The benchmark shown earlier works perfectly, but I find this way of creating benchmarks inconvenient for four reasons:

• presence of magic numbers
• problems with creating more complex input data
• verbosity of benchmarks when there are many parameters taken by the benchmarked function
• problems of keeping consistency between benchmarks that should take the same inputs (it wouldn’t make sense to benchmark shift left and right functions with signals of different length)

To deal with these problems I decided to write my benchmarks using wrappers:

`bench "Left shift"  $ nf benchCyclicShiftLeft paramCyclicShift`

The `benchCyclicShiftLeft` function takes a tuple containing all the data needed for a benchmarked function:

```
{-# INLINE benchCyclicShiftLeft #-}
benchCyclicShiftLeft :: (Int, [Double]) -> [Double]
benchCyclicShiftLeft (n, sig) = cyclicShiftLeft n sig
```

The `INLINE` pragma is used to make sure that the function doesn’t add unnecessary call overhead. As you have probably guessed, the `paramCyclicShift` function takes care of creating the tuple. In my code `paramCyclicShift` is actually a wrapper around a function like this:

```
dataShift :: RandomGen g => g -> Int -> Int -> (Int, [Double])
dataShift gen n sigSize = (n, take sigSize $ randoms gen)
```

To keep benchmarking code easily manageable I organize it similarly to tests. The project root contains a `bench` directory with a structure identical to the `src` and `tests` directories. A file containing benchmarks for a module is named like that module but with “Bench” appended before the file extension. For example, the `benchCyclicShiftLeft` and `dataShift` functions needed to benchmark code in `src/Signal/Utils.hs` are placed in `bench/Signal/UtilsBench.hs`. Just like tests, benchmarks are assembled into one suite in the `bench/MainBenchmarkSuite.hs` file:

```
import qualified BenchParam        as P
import qualified Signal.UtilsBench as U

main :: IO ()
main = newStdGen >>= defaultMainWith benchConfig (return ()) . benchmarks

benchmarks :: RandomGen g => g -> [Benchmark]
benchmarks gen =
    let paramCyclicShift = U.dataShift gen P.shiftSize P.sigSize
    in [
      bgroup "Signal shifts"
      [
        bench "Left shift"  $ nf U.benchCyclicShiftLeft  paramCyclicShift
      , bench "Right shift" $ nf U.benchCyclicShiftRight paramCyclicShift
      ]
    ]

benchConfig :: Config
benchConfig = defaultConfig {
             cfgPerformGC = ljust True
           }
```

The most important part is the `benchmarks` function, which takes a random number generator and assembles all benchmarks into one suite (UPDATE (24/10/2012): read a follow-up post on random data generation). Just as with tests, we can create logical groups and assign names. A cool extra is the `bcompare` function: it takes a list of benchmarks, treats the first one as the reference and reports the relative speed of the others. In my code I use let to introduce the `paramCyclicShift` wrapper around the `dataShift` function. This allows the same input data to be used for both benchmarks. Of course the let is not necessary, but it avoids code repetition. I also use the `shiftSize` and `sigSize` functions from the `BenchParam` module. These are defined as constant values and ensure that there is a single source of configuration. Using a separate module for this is a disputable choice – you may as well define `shiftSize` and `sigSize` in the same let binding as `paramCyclicShift`. The main function creates a random generator, uses bind to pass it to the `benchmarks` function and finally runs the created suite. I use a custom configuration created with the `benchConfig` function to enable garbage collection between benchmarks2. I noticed that enabling GC is generally a good idea, because otherwise it kicks in during benchmarking and distorts the results.

The good thing about this approach is that benchmarks are concise, follow the structure of the project, magic numbers can be eliminated and it is easy to ensure that benchmarked functions get the same data when needed.

# Automating benchmarks using cabal

Guess what – cabal has built-in support for benchmarks! All we need to do is add one more entry to the project’s .cabal file:

```
benchmark signal-bench
  type:             exitcode-stdio-1.0
  hs-source-dirs:   src, bench
  main-is:          MainBenchmarkSuite.hs
  build-depends:    base,
                    criterion,
                    random
  ghc-options:      -Wall
                    -O2
```

The structure of this entry is identical to the one for tests, so I will skip the discussion. To run the benchmarks, issue these three commands:

```
cabal configure --enable-benchmarks
cabal build
cabal bench
```

This couldn’t be easier. Criterion will produce quite detailed output on the console. As I already said, it’s all explained on Bryan’s blog, but some of the features mentioned there are no longer present in criterion. Most importantly, you cannot display results in a window or save them to a PNG file. Luckily, there is an even fancier feature instead. Run the benchmarks like this:

`cabal bench --benchmark-options="-o report.html"`

And criterion will produce a nice HTML report with interactive graphs.

# A few more comments on benchmarking

All of this looks very easy and straightforward, but I actually spent about three days trying to figure out whether my code was benchmarked correctly. The problem is laziness. When I call my `dataShift` function the random data isn’t created until it is demanded by the cyclic shift function. This means that the time needed to actually create the random data could be incorporated into the benchmark. It turns out that criterion is smart enough not to do so. The first run of each benchmark forces the evaluation of the lazily created data, but its run time is discarded and not included in the final results. The evaluated data is then used in the subsequent runs of the benchmark. You can verify this easily by doing something like this:

```
dataShift gen n sigSize = unsafePerformIO $ do
    delayThread 1000000
    return (n, take sigSize $ randoms gen)
```

This causes a delay of 1 second each time the `dataShift` function is evaluated. When you run the benchmark you will notice that criterion estimates the time needed to run the benchmarks at over a hundred seconds (this information is not displayed when the estimated time is short), but it finishes much faster and there should be no difference in the performance of the benchmarked functions. This works even if you create your benchmark like this:

```
bench "Left shift" $ nf U.benchCyclicShiftLeft (U.dataShift gen P.shiftSize P.sigSize)
```

Another problem I stumbled upon quite quickly was the type constraint on the `nf` function: it requires that the return value be an instance of `NFData`. This type class represents data that can be evaluated to normal form (i.e. fully evaluated). Most standard Haskell data types are instances of it, but this is not the case for data containers like Vector or Repa arrays. For such cases there is a `whnf` function that doesn’t have that constraint, but it only evaluates the result to weak head normal form (i.e. to the first data constructor or lambda). Luckily, for unboxed arrays containing primitive types weak head normal form is the same as normal form, which solves the problem for Vectors.

I also quickly realised that benchmarking results are not as repeatable as I would like them to be. This is of course something to expect in a multitasking operating system. I guess the ultimate solution would be booting into single-user mode and closing all background services like cron and networking.

I am also experimenting with tuning the options of the runtime system, as they can also influence performance considerably. In a project I am currently working on I benchmark parallel code and got better performance results by setting the thread affinity option (the `-qa` command line switch) and disabling parallel garbage collection (the `-g1` switch). I also found ThreadScope to be extremely useful for inspecting events occurring during program runtime. It works great even for a single-threaded application, as it shows when garbage collection happens, and that alone is very useful information.

# Summary

Up till now I never wrote repeatable benchmarks for my code and relied on simple methods like measuring the wall time of a single run of my application. Criterion seems like a big step forward and, thanks to the instant feedback it provides, I have already managed to speed up my parallel code by a factor of 2. I guess that most of all I like the way cabal allows tests and benchmarks to be seamlessly integrated into my project – this speeds up development considerably.

Remember that all the source code used in this post is available as a project on GitHub. There are also some additional comments in the code.

1. See my post on testing if you don’t know what shift I’m talking about.
2. This can also be enabled with the command line switch `-g`, but doing this in code ensures that it is always turned on.

# Why?

Before I present my approach to testing in Haskell it is important to say why I even bothered to spend so much time and effort trying to figure out test organization. The answer is very simple: none of the approaches I was able to find in the Haskell community suited my needs. I wanted three things:

• separate tests from the actual source code so that the release version of my software doesn’t depend on testing libraries,
• organize tests in such a way that they are easy to manage (e.g. I wanted to be able to quickly locate tests for a particular module),
• automate my tests.

I read some blog posts discussing how to test code in Haskell but none of the demonstrated approaches met all the above criteria. For example, I noticed that many people advocate putting tests in the same source file as the tested code, arguing that tests act as a specification and should be bundled with the code. Tests are part of the specification, that is true. Nevertheless, this is no reason to make our final executable (or library) depend on testing libraries! That is why I had to find my own way of doing things.

# Project overview

I will create a very simple library with tests to demonstrate all the concepts. The library provides two basic signal processing operations – cyclic shifts. A left cyclic shift by one moves all elements in a signal (a list in our case) to the left and the formerly first element becomes the last one. For example, a cyclic shift to the left of [1,2,3,4] produces [2,3,4,1]. A right shift works in a similar way, only it shifts elements in the opposite direction. Shifts by one can be extended to shift a signal by any natural number, e.g. a right shift by 3 of the signal [1,2,3,4,5] yields [3,4,5,1,2]. Here’s the complete implementation of these functions:

```
module Signal.Utils where

cyclicOneShiftLeft :: (Num a) => [a] -> [a]
cyclicOneShiftLeft (x:xs) = xs ++ [x]

cyclicOneShiftRight :: (Num a) => [a] -> [a]
cyclicOneShiftRight xs = last xs : init xs

cyclicShiftLeft :: (Num a) => Int -> [a] -> [a]
cyclicShiftLeft _ [] = []
cyclicShiftLeft n xs
    | n > 0     = cyclicShiftLeft (n - 1) . cyclicOneShiftLeft $ xs
    | otherwise = xs

cyclicShiftRight :: (Num a) => Int -> [a] -> [a]
cyclicShiftRight _ [] = []
cyclicShiftRight n xs
    | n > 0     = cyclicShiftRight (n - 1) . cyclicOneShiftRight $ xs
    | otherwise = xs
```

Note that `cyclicOneShiftLeft` and `cyclicOneShiftRight` are partial functions. They do not work for empty lists (the former will cause a warning about a non-exhaustive pattern match). On the other hand, `cyclicShiftLeft` and `cyclicShiftRight` are total functions. They work for any list and any shift value. These two functions will thus constitute the external API of our library.

This code is placed in the module `Signal.Utils`. The module – like all modules in the library – exports all its internal functions, thus breaking the encapsulation principle. The library contains one main module (`Signal`) that imports all modules of the library and exports only those functions that are meant to be part of the library’s public API. The Signal.hs file thus looks like this:

```
module Signal
    ( cyclicShiftLeft
    , cyclicShiftRight
    ) where

import Signal.Utils
```

Finally, the .cabal file for the library contains such entries:

```
library
  hs-source-dirs:   src
  exposed-modules:  Signal
  other-modules:    Signal.Utils
  build-depends:    base
  ghc-options:      -Wall
```

This ensures that users will have access only to the functions that we exposed via the `Signal` module. The internal functions of our library will remain hidden. Why did we give up on module encapsulation within the library? This will become clear in a moment, when we talk about automating tests.
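As an aside, the library’s `cyclicShiftLeft` performs n one-element shifts, but the same total behaviour can also be written directly with take and drop. This is my own sketch (the primed name is hypothetical, not part of the post’s library):

```haskell
-- Direct formulation: split the list at the (wrapped) shift amount
-- and swap the halves. Total for every list and every shift value,
-- and it matches the library's behaviour: non-positive shifts are
-- the identity, shifts larger than the list length wrap around.
cyclicShiftLeft' :: Int -> [a] -> [a]
cyclicShiftLeft' _ [] = []
cyclicShiftLeft' n xs
    | n > 0     = let k = n `mod` length xs in drop k xs ++ take k xs
    | otherwise = xs

main :: IO ()
main = do
    print (cyclicShiftLeft' 2 [1,2,3,4 :: Int])  -- [3,4,1,2]
    print (cyclicShiftLeft' 3 ([] :: [Int]))     -- []
```

This version does the shift in one pass instead of n passes, though for the purposes of this post the recursive version is what we will test and benchmark.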

# Overview of Haskell testing libraries

Haskell offers quite a few testing libraries. Among them, two seem to be in wide use and are in fact a de facto standard – HUnit and QuickCheck. HUnit, as the name suggests, is a library providing xUnit capabilities in Haskell. The idea of using HUnit is to feed some data to the functions we are testing and compare the actual result they return to the result we expect. If the expected and actual results differ, the test fails. Here’s a simple example:

```
testCyclicOneShiftRightHU :: Test
testCyclicOneShiftRightHU = "Cyclic one shift right" ~:
    [4,1,2,3] @=? cyclicOneShiftRight [1,2,3,4]
```

This code creates an assertion that checks whether applying the function `cyclicOneShiftRight` to the list [1,2,3,4] returns [4,1,2,3]. The assertion is given a name and assigned to a test. The test is run and if the assertion holds the test succeeds; otherwise it fails. That’s all there is to it. If you have used any testing framework based on the xUnit approach then you already know what HUnit is all about. Note also that we will NOT create tests in the form given above. Instead we will create tests that produce an `Assertion`:

```
testCyclicOneShiftRightAssertion :: Assertion
testCyclicOneShiftRightAssertion = [4,1,2,3] @=? cyclicOneShiftRight [1,2,3,4]
```

This is required for integration with test-framework library, which I will discuss in a moment.

One thing to note is that HUnit lacks assertions for comparing floating point numbers effectively. The problem with floating point numbers in any language, not only Haskell, is that comparing them for exact equality may give unexpected results due to round-off errors. Every xUnit testing framework I’ve seen so far provided an “almost equal” assertion that compares floats up to some given precision. Since there is no such assertion in HUnit I created it myself and placed it in the `Test.Utils` module:

```
class AEq a where
    (=~) :: a -> a -> Bool

instance AEq Double where
    x =~ y = abs ( x - y ) < (1.0e-8 :: Double)

(@=~?) :: (Show a, AEq a) => a -> a -> HU.Assertion
(@=~?) expected actual = expected =~ actual HU.@? assertionMsg
    where assertionMsg = "Expected : " ++ show expected ++
                         "\nActual : " ++ show actual
```

I created the `AEq` (Almost Equal) type class defining an “almost equal” operator, created instances for `Double`, lists and `Maybe` (see the source code) and then created a HUnit assertion that works just like the other assertions. In our code this assertion is not strictly necessary, but I included it because I think it is very helpful when testing functions that perform numerical computations.
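For reference, the list and `Maybe` instances could look like the following sketch. This is my own reconstruction of the idea; the repository contains the post’s actual versions:

```haskell
class AEq a where
    (=~) :: a -> a -> Bool

instance AEq Double where
    x =~ y = abs (x - y) < (1.0e-8 :: Double)

-- Lists are almost equal when they have the same length
-- and are pairwise almost equal.
instance AEq a => AEq [a] where
    xs =~ ys = length xs == length ys && and (zipWith (=~) xs ys)

-- Maybe values are almost equal when both are Nothing, or both
-- are Just values that are almost equal.
instance AEq a => AEq (Maybe a) where
    Nothing =~ Nothing = True
    Just x  =~ Just y  = x =~ y
    _       =~ _       = False

main :: IO ()
main = do
    print ([0.1 + 0.2] =~ [0.3 :: Double])         -- True despite round-off
    print (Just 1.0 =~ (Nothing :: Maybe Double))  -- False
```

Note that exact equality would reject the first comparison, since `0.1 + 0.2` is not bit-for-bit equal to `0.3` in IEEE 754 doubles.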

Another approach to testing is offered by QuickCheck. Instead of creating test data, a programmer defines properties that the tested functions should always obey, and the QuickCheck library takes care of generating test data automatically. An example property: if we take a signal of length n and shift it by n (either left or right) we should get the original signal back. Here’s how this property looks in QuickCheck:

```
propLeftShiftIdentity :: [Double] -> Bool
propLeftShiftIdentity xs = cyclicShiftLeft (length xs) xs == xs
```

Another property we can define is that the composition of a left shift by one and a right shift by one is the identity function. For our `cyclicOneShiftLeft` and `cyclicOneShiftRight` functions this is not exactly true, because these functions don’t work for empty lists. This means that empty lists must be excluded from the test:

```
propCyclicOneShiftIdentity1 :: [Double] -> Property
propCyclicOneShiftIdentity1 xs =
    not (null xs) ==> cyclicOneShiftLeft (cyclicOneShiftRight xs) == xs
```

As you can see, QuickCheck properties return either `Bool` or `Property`1. When these tests are run, QuickCheck generates 100 random lists and checks whether the property holds for them. If the property fails for some input, QuickCheck reports a failed test together with the data that led to the failure.

We know how to write tests. Now it is time to run all of them in one coherent test suite. For this we will use test-framework, which was designed to allow HUnit and QuickCheck tests to be used together in a uniform fashion. It is probably not the only such framework, but it does its job very well, so I did not feel the need to look for anything else. Here is the main testing module responsible for running all the tests:

```
module Main ( main ) where

import Test.Framework
import Test.Framework.Providers.QuickCheck2
import Test.Framework.Providers.HUnit

import Signal.UtilsTest

main :: IO ()
main = defaultMain tests

tests :: [Test]
tests =
    [ testGroup "Signal shifts"
      [ testGroup "Migrated from HUnit" $
            hUnitTestToTests testCyclicOneShiftRightHU
      , testProperty "L/R one shift composition" propCyclicOneShiftIdentity1
      , testProperty "Left shift identity" propLeftShiftIdentity
      ]
    ]
```

The `tests` function is the most important one. It organizes tests into groups and assigns names to both groups and individual tests. These names are useful for locating failed tests. Notice the test group named “Migrated from HUnit”. As the name suggests, these are HUnit tests adjusted to work with test-framework, which means that if you already have HUnit tests you can migrate to test-framework easily. Nevertheless, test-framework expects an `Assertion` by default, which is why we created such a test earlier. Note also that the project on GitHub contains more tests than shown above; they are however very similar to the functions already shown.

# Automating tests using cabal

It is time to automate our tests so that they can be easily rerun. For that we will use `cabal`, but before we start we need to discuss how to organize our tests and place them within the project’s directories.

In Java it is standard practice to put sources and tests into two separate directories located in the project root. These directories have identical internal structure, for two reasons. First, Java names packages according to the directory in which they are located2, so the files `src/SomePackage/SomeSubpackage/SomeClass.java` and `tests/SomePackage/SomeSubpackage/SomeClassTest.java` are considered to be in the same package. Second, classes located in the same package have access to each other’s protected fields, which allows tests to access the internals of a class. This approach breaks object encapsulation within a single package, but that is generally acceptable and not a real problem.

I decided to follow a similar approach in Haskell. In the project directory I have `src` and `tests` directories that separate application/library code from tests. Both directories have the same internal structure. Files containing tests for a module are named like that module but with “Test” appended before the file extension. In my simple project this is demonstrated by the files `src/Signal/Utils.hs` and `tests/Signal/UtilsTest.hs`. This way it is easy to locate the tests for a particular module. It mimics the approach used in Java, but with one important difference: in Java tests organized this way have access to the unexposed internals of a class, whereas in Haskell they do not. If a module does not expose its internal functions there is no way for tests to reach them. I know two solutions to this problem. The first is the one I used – export everything from the modules. It was suggested to me by Matthew West on Haskell-Cafe. The second is the CPP language extension, which causes source files to be processed by the C preprocessor. With this method our `Signal.Utils` would have to be modified like this:

```
{-# LANGUAGE CPP #-}
module Signal.Utils
    ( cyclicShiftLeft
    , cyclicShiftRight
#ifdef TEST
    , cyclicOneShiftLeft
    , cyclicOneShiftRight
#endif
    ) where
...
```

We also have to add a `cpp-options: -DTEST` entry in the test section of the project’s .cabal file (this will be explained in the next paragraph). It might also be convenient to create a `.ghci` file in the project directory containing `:set -DTEST -isrc -itest`, which enables the `TEST` flag within ghci. This solution was pointed out to me by Simon Hengel, also on Haskell-Cafe. I didn’t use it because it doesn’t look very clean and feels more like a hack than a real solution. Nevertheless it is also a valid way of doing things and it may suit your needs better than the one I chose.

With all this knowledge we can finally use `cabal`’s support for testing3. For this we must add another section to the .cabal file of our project:

```
test-suite signal-tests
  type:             exitcode-stdio-1.0
  hs-source-dirs:   tests, src
  main-is:          MainTestSuite.hs
  build-depends:    base,
                    HUnit,
                    QuickCheck,
                    test-framework,
                    test-framework-hunit,
                    test-framework-quickcheck2
```

Let’s walk through this configuration and see what it does. The `type` field defines the testing interface used by the tests. Theoretically there are two accepted values: `exitcode-stdio-1.0` and `detailed-1.0`. The first means that the test executable reports results on the screen and indicates failure with a non-zero exit code. The second, `detailed-1.0`, is meant for test suites that export special symbols allowing test results to be intercepted and further processed by Cabal. Sadly, while this second option seems very interesting, it is not fully implemented yet and there is no way to make use of it. Thus, for the time being, we are left with `exitcode-stdio-1.0`. The rest of the entries should be self-explanatory. The `hs-source-dirs` option points to the source directories – note that it includes both `src` and `tests`. The next entry defines the file containing `main :: IO ()`. Finally there are dependencies on external libraries.
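To make the `exitcode-stdio-1.0` contract concrete, here is a minimal stand-alone test executable obeying it. This is my own sketch, with the shift functions inlined from the library so that it compiles on its own:

```haskell
import System.Exit (exitFailure, exitSuccess)

-- Inlined from the library so this sketch is self-contained.
cyclicOneShiftLeft :: [a] -> [a]
cyclicOneShiftLeft []     = []
cyclicOneShiftLeft (x:xs) = xs ++ [x]

cyclicShiftLeft :: Int -> [a] -> [a]
cyclicShiftLeft _ [] = []
cyclicShiftLeft n xs
    | n > 0     = cyclicShiftLeft (n - 1) (cyclicOneShiftLeft xs)
    | otherwise = xs

-- All the interface requires: report on stdout and signal failure
-- with a non-zero exit code.
main :: IO ()
main =
    if cyclicShiftLeft 2 [1,2,3,4 :: Int] == [3,4,1,2]
        then putStrLn "PASS" >> exitSuccess
        else putStrLn "FAIL" >> exitFailure
```

In the real project this role is played by test-framework’s `defaultMain`, which prints the per-test results and sets the exit code for us.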

To run tests you need to perform:

```
cabal configure --enable-tests
cabal build
cabal test
```

This builds both the library and the testing binary, and runs the tests. Here’s what the test output looks like:

```
[killy@xerxes : ~] cabal test
Running 1 test suites...
Test suite wavelet-hs-test: RUNNING...
Test suite wavelet-hs-test: PASS
Test suite logged to: dist/test/haskell-testing-stub-1.0.0-signal-tests.log
1 of 1 test suites (1 of 1 test cases) passed.
```

The detailed result is logged to a file. If any of the tests fails, the whole output from the suite is displayed on the screen (try it by supplying an incorrect expected value in a HUnit test).

Cabal also has support for measuring test coverage with HPC. To use it, run `cabal configure --enable-tests --enable-library-coverage`. This should enable HPC when running the tests, automatically exclude the testing code from the coverage summaries and generate HTML reports. Sadly, I’ve been affected by a bug which results in the HTML files not being generated and the testing code not being excluded from the report. I reported this to the author, so I hope it will get fixed some day.

# Enhancing HUnit tests with data providers

At the beginning of my post I mentioned that the TestNG library for Java offers better capabilities than JUnit. To me, one of the key features of TestNG was DataProviders. They let the user define a parametrized test function containing the test logic and assertions. For each such parametrized test the user supplies a data provider, that is, a function returning many sets of test data to be passed to that single test. This neatly separates test logic from test data. TestNG of course treats such tests as many different tests, so one data set can fail while the others pass. This was a big step forward, because earlier solutions led either to duplication of test logic (a violation of DRY) or to locking multiple test data sets inside one test, which made the whole test fail on the first data set that caused a failure.

There are no built-in data providers in HUnit but we can easily add them. In `Test.Utils` module I created a function for this:

```
import qualified Test.Framework                 as TF
import qualified Test.Framework.Providers.HUnit as TFH
import qualified Test.HUnit                     as HU

testWithProvider :: String -> (a -> HU.Assertion) -> [a] -> TF.Test
testWithProvider testGroupName testFunction =
    TF.testGroup testGroupName . map createTest . zipWith assignName [1::Int ..]
    where
      createTest (name, dataSet)   = TFH.testCase name $ testFunction dataSet
      assignName setNumber dataSet = ("Data set " ++ show setNumber, dataSet)
```

This function is very similar to other functions defined within test-framework and should thus be considered an enhancement to test-framework rather than to HUnit. The `testWithProvider` function takes a name for a group of tests, a test function and a list of test data (that’s the data provider), and returns a Test. Note that the last parameter is omitted in the definition due to currying. Tests within the created group are named “Data set n”, where n is the set’s number, which makes it easy to locate a failing test data set. Now we can write HUnit tests like this:

```
testCyclicShiftLeft :: (Int, [Double], [Double]) -> Assertion
testCyclicShiftLeft (n, xs, expected) = expected @=~? cyclicShiftLeft n xs

dataCyclicShiftLeft :: [(Int, [Double], [Double])]
dataCyclicShiftLeft =
    [ ( 0, [],        []        )
    , ( 2, [1,2,3,4], [3,4,1,2] )
    , ( 4, [1,2],     [1,2]     )
    ]
```

Notice that the test data are passed as tuples. Finally, we can add these tests to the suite like this:

```
tests :: [Test]
tests =
    [ testGroup "Signal shifts"
      [ ....
      , testWithProvider "Cyclic left shift" testCyclicShiftLeft dataCyclicShiftLeft
      ]
    ]
```

One might argue that we don’t really need data providers, since QuickCheck generates test data automatically and there is no need for the programmer to do it. That is a good point, but I think the data provider capability comes in handy when we want to be sure that the border cases of an algorithm are properly tested.
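Border cases such as a non-positive shift or a full-length shift are exactly what a provider pins down. A self-contained sketch of that idea (implementation inlined from the library, data sets my own):

```haskell
-- Inlined from the library so the sketch runs on its own.
cyclicOneShiftLeft :: [a] -> [a]
cyclicOneShiftLeft []     = []
cyclicOneShiftLeft (x:xs) = xs ++ [x]

cyclicShiftLeft :: Int -> [a] -> [a]
cyclicShiftLeft _ [] = []
cyclicShiftLeft n xs
    | n > 0     = cyclicShiftLeft (n - 1) (cyclicOneShiftLeft xs)
    | otherwise = xs

-- Border cases: a negative shift falls through to 'otherwise',
-- and a full-length shift is the identity.
dataEdge :: [(Int, [Double], [Double])]
dataEdge =
    [ ( -1, [1,2,3], [1,2,3] )
    , (  3, [1,2,3], [1,2,3] )
    ]

-- Run every data set through the single test function, naming each
-- set by its index, just as testWithProvider does.
main :: IO ()
main = mapM_ check (zip [1 :: Int ..] dataEdge)
  where
    check (i, (n, xs, expected)) =
        putStrLn ("Data set " ++ show i ++ ": " ++
                  show (cyclicShiftLeft n xs == expected))
```

QuickCheck might stumble on these cases eventually, but the provider guarantees they are exercised on every run.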

# Summary

When I started with code testing in Haskell I had three goals in mind: separating tests from the code, organizing them in a manageable and flexible way, and automating them. The approach I demonstrated meets all these goals and is based on my experience in other programming languages. So far it works very well for me, but I dare not claim that this is the only way of doing things, let alone the best one. As always, I’m open to discussion and suggestions for improvements.

LAST MINUTE NEWS: As I was finishing this post, Roman Cheplyaka announced on Haskell-Cafe the release of his test-framework-golden library. It is meant for “golden testing”, which works by writing test output to a file and comparing it with an expected (“golden”) file. I have never used this approach, but this library also integrates with test-framework, so it could be used in my sample project without problems. That’s the kind of test extensibility I like!

1. You can think of `Property` as something that can be evaluated to true or false.
2. Haskell uses the same approach and, if I remember correctly, it was adapted from Java.
3. When I say cabal I really mean cabal-install, the command-line tool used for installing Haskell packages, not the Cabal library. The confusion arises because the cabal-install executable is named cabal.

A new version of the Haskell Platform was released just a few days ago. It ships with the latest stable version of GHC (7.4.1). Here you can find the release notes describing changes made to the compiler. The list is long and I haven’t read all of it, but among the most important changes are:

• the possibility of entering any top-level declarations in GHCi;
• the `Num` type class no longer has `Eq` and `Show` as superclasses;
• Data Parallel Haskell has been improved.

Three months ago I wrote about installing the Haskell Platform on openSUSE. I recommended that GHC be installed from precompiled binaries and the Platform from sources, instead of using packages from the repository. Now that the new version is out, that post needs an addendum about updating the Platform. If the Platform had been installed from the repo using a package manager, everything would be simple1. Updating the packages would be enough, provided they had been updated by the maintainers of the repository (at the moment the packages for openSUSE still contain the older version of the Platform). With a manual installation this process is a bit more difficult.

The first step is to remove the old installation. I figured it would be good to first remove all the packages installed with cabal and then remove GHC. There’s a problem though: cabal doesn’t have an uninstall feature. This means that each package has to be manually unregistered using ghc-pkg and then all the files belonging to that package have to be removed. After spending about 30 minutes trying to figure out why I can remove one package using

`ghc-pkg list | grep -v "^/" | sed -e "s/[ {}]//g" | head -n 1 | xargs ghc-pkg --force unregister`

but can’t remove all the packages using

`ghc-pkg list | grep -v "^/" | sed -e "s/[ {}]//g" | xargs ghc-pkg --force unregister`

I gave up and decided to simply remove all of GHC’s files. This wasn’t easy since they were scattered all over `/usr/local/{bin,lib,share,doc}`, but in the end I managed to remove everything.

I noticed that there is a lot of discussion in the community about whether packages installed with cabal should go to `/usr/local` or to the user’s home directory. Surprisingly to me, most people seem to follow the home directory approach. This approach doesn’t suit me. I have a separate home partition used only to store settings and email – which I’ve been told is a “complex partition setup” :-o  – and all software is kept on the `/` partition, with programs not installed from packages placed in `/usr/local` (BTW, it would be nice to have a separate partition for that one directory). This approach certainly wouldn’t work in a multi-user environment and I guess it could be problematic if I developed many projects, each with different dependencies (cabal-dev aims to solve that problem). As a side note, it seems to me that with hundreds of packages available from Hackage and a management tool with rather limited capabilities (cabal can’t even automatically update installed packages!) the Haskell community is where the Linux community was over ten years ago. The dependency hell, now gone from Linux, looms over the Haskell world, and if cabal isn’t enhanced I see this as a huge problem hindering large Haskell projects. The Yesod team seems particularly concerned about this – see here and here.

Anyway, I decided to place my new installation of the Platform in `/usr/local`, but this time I was smarter and placed everything in a dedicated directory. Both GHC and the Platform can be installed under a specific path by passing `--prefix=/some/path` to the configure script. The only trick is that after installing the Platform, the `~/.cabal/config` file in the /root directory has to be edited to point to the directory in which installed packages are to be placed. You also have to add /your/haskell/platform/directory/bin to the path, so that the GHC executables are visible. Now, when a new Platform comes out I can simply remove the directory containing the old one and install the new version. I can also easily control the disk space used by the installation, which tends to be rather large: GHC, the Platform and the packages required by EclipseFP use 1.8 GB of disk space. I also noticed that binaries for programs written in Haskell are rather large. The biggest one I have, buildwrapper, is over 50 MB. This is caused by the inclusion of the RTS (Run Time System) in the binary, but I wonder what else gets included (or is the RTS really that large?).

1. Read this post if you’re wondering why I decided not to use the package repository.
