Haskell file reading pitfall

I decided that I should start a small project in Haskell in order to practice the language. I hope that I’ll be able to make the first public release somewhere in a reasonable future and boast about it on the blog, but meanwhile here’s a report from the trenches. Two days ago, while I was developing the code, I fell into a Haskell newbie pitfall. I spent nearly three hours trying to find a solution to a relatively simple problem. Now that I’ve found it it’s time to share it.

The problem

My application reads the contents of a text file, processes it and writes the processed contents to the same file it was initially read from. I created two simple functions to act as a back-end for my simple text file database:

loadTextFileDataBase :: String -> IO [Task]
loadTextFileDataBase databaseFile =
    withFile databaseFile ReadMode (\handle -> do
        contents <- hGetContents handle
        return (read contents))
 
storeTextFileDataBase :: String -> [Task] -> IO ()
storeTextFileDataBase databaseFile tasks =
    writeFile databaseFile (show tasks)

That’s pretty simple I/O in Haskell based on LYAH. The withFile function opens the file in ReadMode, passes the handle to that file to a lambda and closes the file after the lambda ends. Inside my lambda the file is read lazily as a String, which means that the file is read only when the data from it is requested. The String is parsed to become a [Task] type (read function) and the result of parsing is wrapped into an IO action using the return function. Writing to a file is a simple one-liner that hopefully requires no detailed explanation. Reading and writing of my own data type Task is possible because it derives Read and Show typeclasses. It’s a very nice way of data serialization in Haskell. I’m not sure if it’s very efficient, though it’s easy to use. I compiled the code and tried to run it only to get:

Prelude.read: no parse

Houston, we have a problem! I had a feeling that this has something to do with the lazy reading of the file…

A solution

I started debugging right away. GHCi has a debugger, but I didn’t bother to learn it yet, so my task was a bit more complicated than it could’ve been. Debugging in Haskell using trial-and-error method isn’t that easy. For example, you cannot display text on the console just like that – you have to bother with I/O actions, type signatures and so on, but it can be done.

Anyway, I started to break my program into the simplest possible components to make sure that they work. I commented out the part of the program responsible for writing to a file, so now my program was supposed only to read data from a file, parse and display it. The error persisted. I replaced the return function with putStrLn to display my list of Tasks on the console so I could be sure that the data is read from the file. Then I used GHCi to make sure that this string can be parsed correctly. In GHCi everything worked.

I was getting more and more desperate. I’ve already spent over two hours on that. I was looking at the documentation of System.IO module where I accidentally saw readIO function. Description of that functions says: “The readIO function is similar to read except that it signals parse failure to the IO monad instead of terminating the program.” I wasn’t exactly sure what it means, but the type signature of this function was Read a => String -> IO a and, based on that signature, I saw that I can replace return (read contents) with readIO contents:

loadTextFileDataBase :: String -> IO [Task]
loadTextFileDataBase databaseFile =
    withFile databaseFile ReadMode (\handle -> do
        contents <- hGetContents handle
        readIO contents)

The types were OK, so this expression should have some reasonable output. And it worked! The no parse error was gone.

An explanation

Now it’s time for the most important part: understanding what actually happened and why the solution works. As always, the great community at #haskell IRC channel provided invaluable support. It turns out that mu suspicion about lazy reading of the file was entirely correct. The file was opened, no reading was done since hGetContents is lazy, and it was closed. At the time when contents was requested, the file was already closed. This caused the error. The definition of readIO looks like this:

readIO :: Read a => String -> IO a
readIO s =  case (do { (x,t) <- reads s ;
                  ("","") <- lex t ;
                  return x } ) of
                 [x]  -> return x
                 []  -> ioError (userError "Prelude.readIO: no parse")
                  _  -> ioError (userError "Prelude.readIO: ambiguous parse")

This basically forces the evaluation of condition in case-of expression, which effectively leads to reading the file contents1. During the discussion I was also suggested to read and parse the file using this function:

loadTextFileDataBase = fmap read . readFile

This uses the Functor typeclass and I don’t fully understand yet why this works the way it works. Besides, I wouldn’t be able to use this function in my program because I read from and write to the same file. This means that I need more explicit control of fileOpen and hClose.

Conclusions

There are a few lessons learned. First of all, be very careful with hGetContents. I think that it would be wise to avoid it completely if possible. Second, Haskell type’s system is very powerful. Recall that I found the solution thanks to the type signature that fit in my code. Third, learn the debugger. Fourth, learn the more advanced features of Haskell, like functors, monads and so on. They allow to make the program a whole lot shorter. Five, learn the API. I was lucky to stumble upon the readIO function, but I could as well spend two more hours on debugging.

That was tiring. I think we all deserved a good song.

  1. Added 19/04/2012: I think this sentence needs more explanation. Case-of expression is lazy. The reason why it gets evaluated is the fact that we are matching the condition against three different patterns: [x],[] and wildcard _. The program needs to know which pattern should be matched and therefore evaluates the value of the condition, thus forcing the reading of the file.

    []

Yet Another Quine In Haskell

A few years ago I was asked by a student if I heard about a task to write a program that outputs its own source code. Back then I didn’t know about it and sort of ignored that question, but later a friend of mine told me that such a program is called a quine and that he had already solved that in couple of languages. Today I used my beginner’s knowledge of Haskell to write my first quine. If you haven’t written any quine by yourself yet then it may be better that you don’t read this post – knowing the solution will spoil the fun.

I started of with the general idea of defining a string that will contain the text of the program and displaying that string. This is a strange kind of recursion that I haven’t met before. The idea itself seems rather simple, but it took me about 40 minutes of hacking to get it right. I spent much of that time struggling to get the quote signs display correctly. Anyway, here it is:

module Main where --"
main = do putStr (quine ++ "\nquine = \"" ++ take 20 quine ++ "\\")
          print (drop 21 quine)
quine = "module Main where --\"\nmain = do putStr (quine ++ \"\\nquine = \\\"\" ++ take 20 quine ++ \"\\\\\")\n          print (drop 21 quine)"

This requires that there is a new line at the and of source file. It’s rather crappy but I’m still happy with it. I googled around for some other quines in Haskell and, as expected, found more elegant solutions:

(\a -> a ++ show a) "(\\a -> a ++ show a) "

As Shin-Cheng Mu notices this is based on the resemblance to lambda expression (\x -> x x) (\x -> x x), which reduces to itself. It also resembles the Y-combinator (on which I hope to write soon). There is also a shorter version of the above.

ap (++) show "ap (++) show "

It uses the ap function operating on a monad and frankly speaking I don’t understand it. The two above solutions are not stand-alone programs and work only in ghci, as opposed to my solution. It is not difficult however to adapt the idea to a stand-alone application:

main = putStr (quine q)
quine s = s ++ show s
q = "main = putStr (quine q)\nquine s = s ++ show s\nq = "

The solution is by Jon Fairbairn and I found it here. The same using a monad (found here):

main = (putStr . ap (++) show) "main = (putStr . ap (++) show) "

Only this will not compile since ap is not in scope and would have to be imported. I don’t know if this should count as a valid solution.

To me the conclusion is that I’m still thinking in an imperative style. Quine solutions seem to be a lot easier in languages supporting higher order functions, yet I solved it in a way typical to languages like Java or C.

Saturday Web Overview

During last week I’ve encountered some interesting links on the web. Here’s an overview.

  • Although I’ve read 10 chapters of LYAH and four chapters of RWH, I find learning Haskell difficult. Not that I don’t understand it. The problem is that I have a really hard time to create a fully working application. I was somewhat relieved to find out that I’m not the only one and that it’s typical for Haskell newbies.
  • In the 12th issue of The Monad Reader Neil Mitchell published an overview of Hoogle. I find this paper very helpful, since Neil gives also some useful tips about designing a Haskell project. I’ve been playing a bit with some Haskell code recently, hoping that perhaps it could be turned into something actually usable.
  • Speaking of which, Haskell wiki also provides a great guide how to start a new Haskell project. Also very useful, although it assumes usage of darcs, while I use git. That’s not a big problem, though.
  • There’s a book called The Architecture of Open Source Applications. It looks like there’s a lot of interesting reading. Volume 1 contains a chapter about LLVM and about Eclipse. Volume 2 has not yet been released, but there is a draft of chapter about the architecture of GHC.
  • Just a few months ago I didn’t know lots of programming concepts: lambdas, folds, currying, partial function applications, the Y-combinator to name a few. There are some more to learn: monads, which I’ll tackle soon, and continuations. I have a plan to read The Seasoned Schemer one day (that’s a sequel to The Little Schemer) to learn about continuations but in the meantime I’ve found a continuations tutorial on Scheme Wiki.
  • Constructing minimal PNG encoder in Haskell seems extremely easy.
  • I dug out an interesting thread on Stack Overflow. This time it’s about memoization in Haskell. I wasn’t yet able to wrap my head around edwardk’s solution but it looks impressive.

Staypressed theme by Themocracy