Posts tagged: automata

MSR internship and some retrospection

I feel I can finally write about it: I got accepted for a three-month internship at Microsoft Research Cambridge! This means I will be developing GHC and, hopefully, doing some serious research on the subject of functional programming and compiler implementation. My internship starts tomorrow, on 1st July. I’m not yet 100% certain about the exact topic of my research, so I’ll refrain from going into any kind of technical details for now and will focus instead on my personal experience with functional programming. I feel this is really a good moment to summarize the past year and a half. I learned about functional programming at the very beginning of 2012 and since then I have progressed from knowing completely nothing to being in Cambridge – something I would not have imagined 18 months ago.

Somewhere around July 2011 I finished writing my PhD. I had yet to deal with many formalities – which in the end took 8 months – but the most important part of my work was done and I only continued research on a few minor subjects that I had run into while writing the PhD. Somewhere in October I decided I needed a break from all my current research topics – I finally wanted some time to pursue topics that had interested me all along and for which I never had time. Compiler construction and the theory of automata were the two main topics I had in mind. That was the plan, but it wasn’t meant to work out, at least not yet. Somewhere around December 2011 I stumbled upon the book “Seven Languages in Seven Weeks”, which was my first contact with functional programming. I didn’t follow the book exactly. I read the chapters about Ruby, Io, Prolog (so much fun!), Scala and Erlang, but instead of reading the chapter about Clojure I went for Scheme. I read the R5RS language specification and The Little Schemer, and when I reached the chapter about Haskell I decided to read Learn You A Haskell instead. At that point I already knew that Haskell is the functional programming language, and I think this was the moment I started having some serious plans about functional programming. At the same time I was figuring out how to learn about compilers. It was April when Stanford University announced their two online courses on Compilers and Automata – these were a real godsend. The Compilers course ended in late June. That concluded my first six months of contact with FP, and I think these months were extremely intense. I learned the theoretical and practical foundations of compilers, a new programming paradigm and some new languages designed in that paradigm. I also started reading research papers on functional programming, with a focus on the implementation of GHC. At that point I didn’t even try to work on the source code, but I was trying to understand how the compiler is designed.

The next six months, from July to December, were not as fruitful. I picked up an interest in doing data-parallel computations in Haskell, as this seemed to be an active topic of research and was also related to my PhD work. I made a failed attempt at an efficient parallel implementation of a wavelet transform. Although I wasn’t successful, my time was not wasted: I learned how to write, test and benchmark libraries in Haskell and also read a lot of papers on FP. I also got in touch with Ben Lippmeier, who pointed me to one problem with GHC he needed fixed. This was somewhere in January 2013. I had already started reading the source code of GHC in December, but now I finally had a particular problem to solve. It was time to start working on GHC. That is mostly what I did during the last six months, although I also managed to spend some time on theory (more papers and a book on combinatory logic).

As for the internship, I decided to apply for it in February. I polished my CV and cover letter (many thanks go to my friend Marek for his help) and sent my application at the beginning of March. After an interview with Geoffrey Mainland and Simon Peyton Jones I got the acceptance notification at the beginning of April. And here I am in Cambridge, over 1300 km from home, waiting for my first day at Microsoft Research.

Some impressions on Stanford’s Automata and Compilers online courses

Over two months ago Stanford opened the first edition of its online courses on Automata and Compilers (see this post). I participated in both. These courses cover the basics of each discipline and assume no previous knowledge of the subject. Now the courses are over and it’s time to share some impressions.

The Automata course was six weeks long. It consisted of lectures and so-called homeworks – quizzes that had to be taken every week. There were some optional programming assignments, but they didn’t affect the final score. The topics covered were: finite automata (deterministic and non-deterministic), regular expressions, context-free grammars and languages, Turing machines, decidability and finally P and NP problems. The course closely follows the book “Introduction to Automata Theory, Languages, and Computation” by John Hopcroft, Rajeev Motwani and Jeffrey Ullman. This shouldn’t come as a big surprise, since the course was taught by Jeffrey Ullman, one of the authors. For the first three weeks this course very closely followed Compilers, as both discussed automata, regular languages and context-free grammars. I initially enjoyed the Automata course a lot, despite the lectures being a bit boring. I just read the book and then skimmed through the lecture slides. Sadly, the last two weeks of the course were extremely hard to follow. The lectures were rushed and impossible for me to comprehend. Reading the book didn’t help much, so I didn’t understand much of decidability or the P and NP problems. That said, I’m still very happy with this course, since automata and context-free languages were the top priority for me. Also keep in mind that this is a very subjective opinion. Reading posts in the course forums I saw that opinions were really mixed. Some people complained about the lectures being hard to follow, but others praised them for being interesting.
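To give a flavour of what the first weeks of the course were about, here is a tiny sketch of my own (in Haskell, not the course’s notation) of a deterministic finite automaton that accepts binary strings containing an even number of ‘1’s – just an illustration of the idea, not anything taken from the course material:

-- States of the automaton: have we seen an even or an odd number of '1's so far?
data State = Even | Odd deriving (Eq, Show)

-- Transition function of the DFA.
step :: State -> Char -> State
step s    '0' = s       -- a '0' does not change the parity
step Even '1' = Odd
step Odd  '1' = Even
step s    _   = s       -- ignore any other character

-- Run the automaton from the start state and check whether we end in an accepting state.
accepts :: String -> Bool
accepts input = foldl step Even input == Even

-- e.g. accepts "1010" == True, accepts "10" == False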

The Compilers course took 10 weeks and was much more demanding. It covered the theory of lexical analysis using automata, top-down parsing using the recursive descent and LL(1) algorithms, bottom-up parsing using the SLR and LR algorithms, semantic analysis, operational semantics, code generation, optimizations and garbage collection. Believe me, that’s a lot of theory! In contrast to the Automata course, it was all presented in a very accessible and interesting way. Each week contained up to 3 hours of lectures and a theoretical quiz covering the presented material. There were also four programming assignments, which required implementing four different parts of a compiler: the lexical analyser (a.k.a. lexer), the parser, the semantic analyser and the code generator. The first two exercises relied on existing tools for generating lexers and parsers. The language implemented was COOL – Classroom Object-Oriented Language. I consider the programming assignments demanding, especially the last two. The third one took me about 25 hours to complete. I didn’t manage to finish the fourth one, but I’m still satisfied, since I’m interested mostly in the front-end parts of the compiler (type checking especially). The best thing about the Compilers course is that it was motivating. It encouraged me to finally read some chapters from the Dragon Book and “Engineering a Compiler”. Which reminds me that I promised to write more about compiler books.
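Since I mention recursive descent: here is a minimal sketch of the technique in Haskell, my own toy example for the grammar expr ::= digit (‘+’ digit)* and not code from the COOL assignments:

import Data.Char (isDigit, digitToInt)

-- A parser consumes a prefix of the input and returns a result plus the remaining input,
-- or fails.
type Parser a = String -> Maybe (a, String)

-- Parse a single decimal digit.
digit :: Parser Int
digit (c:rest) | isDigit c = Just (digitToInt c, rest)
digit _                    = Nothing

-- expr ::= digit ('+' digit)*
-- One function per nonterminal, calling the parsers for its sub-phrases:
-- that is the essence of recursive descent.
expr :: Parser Int
expr input = do
  (x, rest) <- digit input
  go x rest
  where
    go acc ('+':rest) = do
      (y, rest') <- digit rest
      go (acc + y) rest'
    go acc rest = Just (acc, rest)

-- e.g. expr "1+2+3" == Just (6, "")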

The Dragon Book turned out to be surprisingly interesting. It is verbose and very dry, so it’s not suitable for reading from cover to cover. Nevertheless, if you have a general idea of what is going on in a compiler and need details of some particular algorithm, it is an invaluable resource. In short: bad as an introductory text, perfect as a reference.

“Engineering a Compiler, 2nd edition” is different. It is very accessible compared to the Dragon Book. In every chapter the reader is given a clear outline of what will be done and a rationale for why a particular piece of theory is needed. This greatly helps in understanding what’s going on in each phase of compilation. At the end of each chapter there’s an overview of the literature on the given topic, so it’s also a great starting point for deeper research into each subject. I therefore think this book makes a very good introductory text. I wholeheartedly recommend it.

I haven’t read Appel’s “Modern Compiler Implementation in C” so I really can’t tell much about it. Hopefully I will find some time during the summer holiday to read the chapter about the Hindley-Milner type inference algorithm.

The Coursera online courses were an awesome experience: challenging and time-consuming, but also very motivating and rewarding. I’m looking forward to more courses in the future, especially the new edition of the Game Theory course.

Some books about compilers

The online Compilers course at Stanford University began last Monday. Although it should be self-contained, I still prefer to have some books at hand, especially since I’ll be leaving for 10 days and want to self-study the material for the coming week. On Monday I paid a visit to my university’s library. Here’s a quick glimpse at some of the literature that I found.

First of all, there’s the Dragon Book, officially known as “Compilers: Principles, Techniques, and Tools”, by Aho, Lam, Sethi, and Ullman. This is a classic, originally published in 1986. There’s nothing to worry about, though – there has been no revolution in the field of compiler construction, so what was true in 1986 is mostly still valid now. The book was re-released in 2006 with some updates about garbage collection and code optimization (OK, I lied, there was some progress in a few aspects of compiler engineering). I have a Polish translation of the first edition. In fact I borrowed it long before the course started, hoping to read it on my own. The book is very detailed and extremely verbose (750 pages total), which is a bit discouraging to me. I’ve read only a couple of sections so I don’t have a clear opinion about it yet.

The second book recommended as a textbook for the Compilers course is “Engineering a Compiler” by Cooper and Torczon. This one was more difficult to find. I just got it today and haven’t read it yet. With over 800 pages it’s longer than the Dragon Book. Skimming through it I noticed that it seems detailed and precise and puts emphasis on practical aspects – the chapter about lexical analysis has a section about implementing a lexer. I’m looking forward to this one, since I still don’t have a clear vision of how to implement a lexer efficiently. These are only first impressions, of course. I’ll verify them when I start reading (expect a follow-up on this one).

The third of the recommended books – “Modern Compiler Implementation in C/Java/ML” (there are three different editions of this book) by Appel – was also available in my library. I’ve already read the chapter about lexical analysis and I think that, in contrast to the Dragon Book, it was a bit too concise. It covers the subject of Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA), which luckily I am familiar with (more on that in a moment), so that wasn’t much of a problem. However, if this had been my first contact with DFAs and NFAs, I would have had some trouble understanding what it’s all about. I hope that the next chapters will be more detailed. I was very happy to see that this book has chapters about functional languages and polymorphic types. The latter covers the Hindley-Milner type inference algorithm in a form that seems accessible.

Now, here’s a surprise! Just a few days ago I found out about the book “The Implementation of Functional Programming Languages” by Simon Peyton Jones (see my previous post). To my greatest surprise it was in the library! I stumbled upon it by pure accident while I was searching the shelves for Appel’s book – I would never have thought it was available in the library of my university. Do I have to mention that I borrowed it right away? I’ve read the Introduction and some random parts of the first chapter about the lambda calculus. I must say that the book looks very accessible, so I hope to find some time to read at least a couple of chapters. What’s interesting about the book (or rather about functional languages) is that Haskell didn’t exist when it was written, yet much of the code in the book is almost Haskell (in fact it’s Miranda).

I’ve got one more book, entitled “Compiler Construction” by Waite and Goos. This one is old (1984) and, aside from universal knowledge about parsing, semantic analysis, error handling and so on, it seems to cover many aspects that are a bit outdated. I think I’ll just rely on the previous books and skip this one.

I also enrolled in the Automata course. Initially I didn’t plan to, but I figured I could just give it a try and, if I don’t have enough time, simply download the videos and watch them during the summer holiday. Luckily the first week covers DFAs and NFAs, so I got 100% on the first quiz without even watching the lectures. I’m afraid the next weeks won’t be that easy.
