Some impressions of Stanford’s Automata and Compilers online courses

Over two months ago Stanford opened the first edition of its online courses on Automata and Compilers (see this post). I participated in both. These courses cover the basics of each discipline and assume no previous knowledge of the subject. Now the courses are over and it’s time to share some impressions.

The Automata course was six weeks long. It consisted of lectures and so-called homeworks: quizzes to be taken every week. There were also some optional programming assignments, but they didn’t affect the final score. The topics covered were finite automata (deterministic and non-deterministic), regular expressions, context-free grammars and languages, Turing machines, decidability, and finally the P and NP complexity classes. The course closely follows the book “Introduction to Automata Theory, Languages, and Computation” by John Hopcroft, Rajeev Motwani and Jeffrey Ullman. This shouldn’t come as a big surprise, since the course is taught by Jeffrey Ullman, one of the authors. For the first three weeks it overlapped heavily with Compilers, as both discussed automata, regular languages and context-free grammars. I initially enjoyed the Automata course a lot, despite the lectures being a bit boring; I just read the book and then skimmed through the lecture slides. Sadly, the last two weeks were extremely hard to follow. The lectures were rushed and, for me, impossible to comprehend, and reading the book didn’t help much, so I didn’t understand a lot of the material on decidability and the P and NP classes. That said, I’m still very happy with this course, since automata and context-free languages were my top priority. Keep in mind that this is a very subjective opinion: reading the course forums I saw really mixed reactions, with some people complaining that the lectures were hard to follow and others praising them as interesting.

The Compilers course took ten weeks and was much more demanding. It covered the theory of lexical analysis using automata, top-down parsing using recursive descent and LL(1), bottom-up parsing using SLR and LR algorithms, semantic analysis, operational semantics, code generation, optimizations and garbage collection. Believe me, that’s a lot of theory! In contrast to the Automata course, it was all presented in a very accessible and interesting way. Each week brought up to three hours of lectures and a quiz covering the presented material. There were also four programming assignments, each requiring the implementation of a different part of a compiler: the lexical analyser (a.k.a. lexer), the parser, the semantic analyser and the code generator. The first two assignments relied on existing tools for generating lexers and parsers. The language implemented was COOL, the Classroom Object-Oriented Language. I found the programming assignments demanding, especially the last two: the third took me about 25 hours to complete, and I didn’t manage to finish the fourth. I’m still satisfied, though, since I’m interested mostly in the front-end parts of a compiler (type checking especially). The best thing about the Compilers course is that it was motivating: it encouraged me to finally read some chapters of the Dragon Book and “Engineering a Compiler”. Which reminds me that I promised to write more about compiler books.

The Dragon Book turned out to be surprisingly interesting. It is verbose and very dry, so it’s not suitable for reading cover to cover. Nevertheless, if you have a general idea of what is going on in a compiler and need the details of some particular algorithm, it is an invaluable resource. In short: a bad introductory text, but a perfect reference.

“Engineering a Compiler, 2nd edition” is different. It is very accessible compared to the Dragon Book. Every chapter gives the reader a clear outline of what will be done and a rationale for why the particular theory is needed, which greatly helps in understanding what’s going on in each phase of compilation. Each chapter ends with an overview of the literature on its topic, so the book is also a great starting point for deeper research into each subject. I therefore think it makes a very good introductory text, and I wholeheartedly recommend it.

I haven’t read Appel’s “Modern Compiler Implementation in C”, so I really can’t say much about it. Hopefully I’ll find some time during the summer holiday to read the chapter on the Hindley-Milner type inference algorithm.

The Coursera online courses were an awesome experience: challenging and time-consuming, but also very motivating and rewarding. I’m looking forward to more courses in the future, especially the new edition of the Game Theory course.

A History of Haskell

Recently I read the paper “A History of Haskell: Being Lazy With Class” by Paul Hudak, John Hughes, Simon Peyton Jones and Philip Wadler. Now it’s time to share some impressions.

First of all, this is a very long read. It took me a couple of days to get through 55 pages of dense, two-column text. If you exclude the references section the length drops to 45 pages, but that’s still quite a lot. The title can be slightly misleading: this paper is about history, but it is also about practically everything related to Haskell. It describes the creation of the Haskell Committee, the initial meetings, and the discussions about the language’s goals, features and name. But that’s only a small part of the paper. The rest covers principles, features, contributions, tools and community. Of course all of this is placed in historical context, and I must say it is very insightful to learn the motivation behind particular language features. I consider the hours spent reading this paper a very good investment. “A History of Haskell” is a great summary of Haskell’s development process as a whole. As a beginner I learned a lot about the language that I wasn’t aware of. The wide overview of the literature gives a general idea of what is being researched in the Haskell world, and with it some ideas about where to take my own research. This paper is definitely a must-read for anyone diving into Haskell!

Upgrading Haskell Platform on openSUSE

A new version of the Haskell Platform was released just a few days ago. It ships with the latest stable version of GHC (7.4.1). Here you can find the release notes describing the changes made to the compiler. The list is long and I haven’t read all of it, but among the most important changes are:

  • the possibility of entering any top-level declaration in GHCi (see the session sketch below);
  • the Num type class no longer having Eq and Show as superclasses;
  • improvements to Data Parallel Haskell.
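
For instance, with 7.4.1 a GHCi session should accept declarations that previously required a source file. A quick sketch (the version banner and exact output are from memory, so treat them as illustrative):

    $ ghci
    GHCi, version 7.4.1: http://www.haskell.org/ghc/  :? for help
    Prelude> data Colour = Red | Green | Blue deriving Show  -- top-level data declaration at the prompt
    Prelude> Red
    Red
    Prelude> class Named a where name :: a -> String         -- class (and instance) declarations work too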

Three months ago I wrote about installing the Haskell Platform on openSUSE. I recommended installing GHC from precompiled binaries and the Platform from sources instead of using packages from the repository. Now that the new version is out, that post needs an addendum about updating the Platform. With a manual installation the process is a bit more involved than it would be otherwise: had the Platform been installed from the repo using a package manager, everything would be simple1. Updating the packages would be enough, provided the repository maintainers had updated them (at the moment the openSUSE packages still contain the older version of the Platform).
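
In that case the whole upgrade would presumably boil down to something like this (a sketch; the actual package names depend on how the repository splits the Platform):

    sudo zypper refresh   # re-read repository metadata
    sudo zypper update    # pull whatever updates the maintainers have published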

The first step is to remove the old installation. I figured it would be best to first remove all the packages installed with cabal and then remove GHC itself. There’s a problem, though: cabal has no uninstall feature. This means that each package has to be manually unregistered using ghc-pkg and then all the files belonging to it have to be removed. After spending about 30 minutes trying to figure out why I could remove a single package using

ghc-pkg list | grep -v "^/" | sed -e "s/[ {}]//g" | head -n 1 | xargs ghc-pkg --force unregister

but couldn’t remove all the packages using

ghc-pkg list | grep -v "^/" | sed -e "s/[ {}]//g" | xargs ghc-pkg --force unregister

I gave up and decided to simply remove all of GHC’s files. This wasn’t easy, since they were scattered all over /usr/local/{bin,lib,share,doc}, but in the end I managed to remove everything.
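
In hindsight I suspect the difference is mundane: ghc-pkg unregister appears to accept only a single package per invocation, and head -n 1 feeds it exactly one name, while plain xargs crams all the names onto one command line. If that guess is right, telling xargs to spawn one ghc-pkg process per package should do the trick (an untested sketch):

    # one ghc-pkg invocation per package name instead of one huge command line
    ghc-pkg list | grep -v "^/" | sed -e "s/[ {}]//g" | xargs -n 1 ghc-pkg --force unregister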

I noticed that there is a lot of discussion in the community about whether packages installed with cabal should go to /usr/local or to the user’s home directory. Surprisingly to me, most people seem to follow the home directory approach. That approach doesn’t quite suit me. I have a separate home partition used only to store settings and email – which I’ve been told is a “complex partition setup” :-o  – and all the software is kept on the / partition, with programs not installed from packages placed in /usr/local (by the way, it would be nice to have a separate partition for that one directory). This approach certainly wouldn’t work in a multi-user environment, and I guess it could be problematic if I developed many projects, each with different dependencies (cabal-dev aims to solve that problem). As a side note, it seems to me that with hundreds of packages available on Hackage and a management tool with rather limited capabilities (cabal can’t even update installed packages automatically!), the Haskell community is where the Linux community was over ten years ago. The dependency hell, now gone from Linux, looms over the Haskell world, and if cabal isn’t enhanced I see this as a huge problem hindering large Haskell projects. The Yesod team seems particularly concerned about this – see here and here.

Anyway, I decided to place the new installation of the Platform in /usr/local, but this time I was smarter and put everything in a single dedicated directory. Both GHC and the Platform can be installed under a specific path by passing --prefix=/some/path to their configure scripts. The only trick is that after installing the Platform, the ~/.cabal/config file in the /root directory has to be edited to point to the directory in which installed packages are to be placed. You also have to add /your/haskell/platform/directory/bin to the path, so that the GHC executables are visible (the sketch below pulls these steps together). Now, when a new Platform comes out, I can simply remove its directory and install the new version.

This layout also makes it easy to control the disk space used by the installation, which tends to be rather large: GHC, the Platform and the packages required by EclipseFP use 1.8 GB. I also noticed that binaries of programs written in Haskell are rather big. The largest one I have, buildwrapper, is over 50 MB. This is caused by the inclusion of the RTS (runtime system) in the binary, but I wonder what else gets included (or is the RTS really that large?).
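
Putting the installation steps together, the whole dance looks roughly like this (a sketch only; the version numbers, directory names and the exact cabal config field are illustrative and may differ on your system):

    # hypothetical layout: keep everything under one dedicated prefix
    PREFIX=/usr/local/haskell

    # GHC binary distribution: configure with the prefix, then install
    cd ghc-7.4.1
    ./configure --prefix=$PREFIX
    sudo make install

    # the Haskell Platform, built from source against that GHC
    cd ../haskell-platform-<version>
    ./configure --prefix=$PREFIX
    make
    sudo make install

    # make the new executables visible
    export PATH=$PREFIX/bin:$PATH

    # finally edit /root/.cabal/config so that installed packages land
    # under the same prefix, e.g. (field layout varies between cabal
    # versions):
    #   prefix: /usr/local/haskell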

  1. Read this post if you’re wondering why I decided not to use the package repository.
