
GHC status October 2009

We are just about to make our annual major release, GHC 6.12.1 (in what follows, "GHC 6.12" refers to GHC 6.12.1 and any later patch-level releases on the 6.12 branch).

GHC continues to be very active, with many opportunities for others to get involved. We are particularly eager to find partners who are willing to take responsibility for a particular platform (e.g. Sparc/Solaris, currently maintained by Ben Lippmeier); see [Platforms].

The GHC 6.12 release

We usually try to make a major release of GHC immediately after ICFP. It has been somewhat delayed this year, but we expect to release GHC 6.12 during November or December 2009. Apart from myriad bug fixes and minor enhancements, the big new things in 6.12 are:

  • Considerably improved support for parallel execution. GHC 6.10 would execute parallel Haskell programs, but performance was often not very good. Simon Marlow has done lots of performance tuning in 6.12, removing many of the accidental (and largely invisible) gotchas that made parallel programs run slowly.
  • As part of this parallel-performance tuning, Satnam Singh and Simon Marlow have developed ThreadScope, a GUI that lets you see what is going on inside your parallel program. It's a huge step forward from "It takes 4 seconds with 1 processor, and 3 seconds with 8 processors; now what?". ThreadScope will be released separately from GHC, but at more or less the same time as GHC 6.12.
  • Dynamic linking is now supported on Linux, and support for other platforms will follow. Dynamic linking is the culmination of work by several people over recent years; most recently, the Industrial Haskell Group [IHG] pushed it into a fully working state (thank you!).

    One effect of dynamic linking is that binaries shrink dramatically, because the run-time system and libraries are shared. Perhaps more importantly, it is possible to make dynamic plugins from Haskell code that can be used from other applications.
  • The I/O libraries are now Unicode-aware, so your Haskell programs should now correctly handle text files containing non-ASCII characters.
  • The package system has been made more robust, by associating each installed package with a unique identifier based on its exposed ABI. Now, cases where the user re-installs a package without recompiling packages that depend on it will be detected, and the packages with broken dependencies will be disabled. Previously, this would lead to obscure compilation errors, or worse, segfaulting programs.

    This change involved a large amount of internal restructuring, but it paves the way for future improvements to the way packages are handled. For instance, in the future we expect to track profiled packages independently of non-profiled ones, and we hope to make it possible to upgrade a package in an ABI-compatible way, without recompiling the packages that depend on it. This latter facility will be especially important as we move towards using more shared libraries.
  • A variety of small improvements to data types: record punning, declaring constructors with class constraints, GADT syntax for type families, etc.
  • You can omit the "$" in a top-level Template Haskell splice, which makes the TH call look more like an ordinary top-level declaration with a new keyword.
  • We are deprecating mdo for recursive do-notation, in favour of the more expressive rec statement.
  • We've concluded that the implementation of impredicative polymorphism is unsustainably complicated, so we are retrenching. It will be deprecated in 6.12 (but will still work), and will be either removed or replaced with something simpler in 6.14.
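As a rough illustration of the rec statement mentioned above, here is a minimal sketch that builds a two-node cycle of IORefs. The Node type and the mkCycle, follow and label functions are made-up names for this example. Each newIORef needs the other node, which has not been bound yet, so the two bindings must be recursive; rec relies on the MonadFix instance for IO. With the older syntax the same body would be written as an mdo block.

```haskell
{-# LANGUAGE RecursiveDo #-}
module Main where

import Data.IORef

-- A node with a label and a mutable pointer to its successor.
data Node = Node Int (IORef Node)

label :: Node -> Int
label (Node l _) = l

-- Follow a node's pointer to its successor.
follow :: Node -> IO Node
follow (Node _ ref) = readIORef ref

-- Two nodes that point at each other. 'a' refers to 'b' before 'b'
-- is bound, so the two bindings are grouped in a 'rec' block.
mkCycle :: IO Node
mkCycle = do
  rec a <- fmap (Node 1) (newIORef b)
      b <- fmap (Node 2) (newIORef a)
  return a

main :: IO ()
main = do
  a <- mkCycle
  b <- follow a
  a' <- follow b
  -- Going round the cycle visits labels 1, 2, then 1 again.
  print (map label [a, b, a'])
```

Running it should print [1,2,1].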

For more detail, see the release notes in the 6.12 User manual [UserManual], which mention many things skipped over here.
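The Unicode-aware I/O from the list above can be tried with a minimal sketch: it writes a string containing non-ASCII characters to a file and reads it back, setting the handle encoding explicitly with hSetEncoding and utf8 from System.IO instead of relying on the locale default. The file name unicode-demo.txt and the roundTrip helper are hypothetical, chosen only for this example.

```haskell
module Main where

import System.IO

-- Write a string to a file and read it back, using UTF-8 on both
-- handles rather than the locale's default encoding.
roundTrip :: FilePath -> String -> IO String
roundTrip path s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h s
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    contents <- hGetContents h
    -- Force the whole (lazy) contents before withFile closes the handle.
    length contents `seq` return contents

main :: IO ()
main = do
  let original = "λx → x ∀"
  back <- roundTrip "unicode-demo.txt" original
  print (back == original, length back)
```

It should print (True,8): the string round-trips intact and counts as eight characters, not as the larger number of bytes its UTF-8 encoding occupies.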

Another big change with GHC 6.12 is that Hackage and the Haskell Platform are allowing GHC HQ to get out of the libraries business. So the plan is:

  • We release GHC 6.12 with very few libraries
  • Bill Library Author downloads GHC 6.12 and tests his libraries
  • The next Haskell Platform release packages GHC 6.12 with these tested libraries
  • Joe User downloads the Haskell Platform.
  • Four months later there is a new Haskell Platform (HP) release, still with GHC 6.12 but with more or better libraries. The HP release cycle is decoupled from GHC's.

So if you are Joe User, you should wait for the HP release rather than grabbing the bare GHC 6.12 release. The bare release will be perfectly usable, but only if you use an up-to-date cabal-install to download libraries, and accept that those libraries may not have been tested with GHC 6.12.

Lastly, GHC 6.12 has a totally re-engineered build system, with much-improved dependency tracking [Building]. While there have been lots of teething problems, things are settling down and the new system is a huge improvement over the old one. The main improvement is that you can usually just say make, and everything will be brought up to date (previously it was often necessary to run make clean first). Another improvement is that the new system exposes much more parallelism in the build, so GHC builds faster on multicores.

What's hot for the next year

GHC continues to be a great substrate for research. Here are the main things we are working on at the moment.

Type systems

Type families have proved a great success. From the outside it might seem that they are done (after all, they are in GHC 6.10), but the internals are quite fragile and it's amazing that it all works as well as it does. (Thanks to Manuel's work.) Tom Schrijvers, Dimitrios Vytiniotis, Martin Sulzmann, and Manuel Chakravarty have been working with Simon PJ to understand the fundamentals and, in the light of that insight, to re-engineer the implementation into something more robust. We have developed the "OutsideIn" algorithm, which gives a much nicer account of type inference than our previous approach. The new approach is described in Complete and Decidable Type Inference for GADTs [ICFP09a]. More controversially, we now believe that local let/where bindings should not be generalised; see Let should not be generalised [LetGen]. Dimitrios is building a prototype that embodies these ideas, which we'll then transfer into GHC.
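A small sketch of the kind of program the GADT paper is concerned with may help. The Term type and eval function are a standard textbook-style example, not code from GHC: pattern matching on a constructor refines the type variable a within that branch (to Int for Lit, to Bool for IsZero). The type signature on eval is essential; without it GHC rejects the definition rather than guessing a type, which is exactly the inference problem OutsideIn addresses.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- A tiny typed expression language (hypothetical example).
data Term a where
  Lit    :: Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

-- Matching on Lit tells the type checker that a is Int in that branch;
-- matching on IsZero tells it that a is Bool. The signature is required.
eval :: Term a -> a
eval (Lit n)    = n
eval (IsZero t) = eval t == 0
eval (If c t e) = if eval c then eval t else eval e

main :: IO ()
main = print (eval (If (IsZero (Lit 0)) (Lit 1) (Lit 2)))
```

Here IsZero (Lit 0) evaluates to True, so the program prints 1.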

Meanwhile, Dimitrios, Simon, and Stephanie Weirich are also working on fixing one of GHC's more embarrassing bugs (Trac #1496), whereby an interaction of type families and newtype deriving can persuade GHC to generate type-unsound code. It has remained unfixed because the obvious approaches seem to be hacks, so the cure was as bad as the disease. We think we are on to something; stay tuned.

Intermediate language and optimisation

Although it is, by design, invisible to users, GHC's intermediate language and optimisation passes have been receiving quite a bit of attention. Some highlights:

  • Read Max Bolingbroke's paper on Strict Core [MaxB], a possible new intermediate language for GHC. Adopting Strict Core would be a Big Change, however, and we have not decided to do so (yet).
  • Peter Jonsson did an internship in which he made a start on turning GHC into a supercompiler. Neil Mitchell's terrific PhD thesis suggested that supercompilation works well for Haskell [NeilM], and Peter has been working on supercompilation for Timber as part of his own PhD [PeterJ]. The GHC version isn't ready for prime time yet, but Simon PJ (now educated by Peter and Neil) is keen to pursue it.
  • An internal change in GHC 6.12 is the addition of "annotations", a general-purpose way for a programmer to add annotations to top-level definitions that can be consulted by a Core-to-Core pass, and for a Core-to-Core pass to pass information to its successors [Annotations]. We expect to use these annotations increasingly in GHC itself.
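For illustration, a minimal sketch of attaching an annotation to a top-level definition with the ANN pragma. The payload is just a String (the annotation's type needs a Data instance); ordinary compilation ignores it, and only a Core-to-Core pass or plugin that looks the annotation up by name would act on it. The function double and the payload text are made up for this example.

```haskell
module Main where

-- Annotate the top-level definition 'double'. The payload is a plain
-- String; it does not change what the program computes, but a
-- Core-to-Core pass could retrieve it when processing 'double'.
{-# ANN double "example annotation attached to double" #-}
double :: Int -> Int
double x = x * 2

main :: IO ()
main = print (double 21)
```

The program behaves exactly as it would without the pragma and prints 42.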

Parallelism

Most of the changes in this area in GHC 6.12.1 were described in our ICFP'09 paper Runtime Support for Multicore Haskell [ICFP09b]. The highlights:

  • Load-balancing of sparks is now based on lock-free work-stealing queues.
  • The overhead for running a spark is significantly less, so GHC can take advantage of finer-grained parallelism.
  • The parallel GC is now much more locality-aware. We now do parallel GC in young-generation collections by default, mainly to avoid destroying locality by moving data out of the CPU cache on which it is needed. Young-generation collections are parallel but not load-balanced. There are new RTS flags to control parallel GC behaviour.
  • Various other minor performance tweaks.
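A minimal sketch of the spark-based parallelism these changes target, using par and pseq as exported by GHC.Conc (the parallel package re-exports them from Control.Parallel). The pfib function and its threshold of 15 are illustrative choices, not tuned values: at or below the threshold the code stays sequential so that each spark carries enough work to be worth stealing. Built with -threaded and run with, say, +RTS -N4, idle capabilities can steal sparks from the work-stealing queues; on a single capability the program simply runs sequentially.

```haskell
module Main where

import GHC.Conc (par, pseq)

-- Naive Fibonacci: a deliberately expensive, CPU-bound workload.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Spark the left branch with 'par' (an idle capability may steal and
-- evaluate it), evaluate the right branch here with 'pseq', then add.
-- At or below the threshold we stay sequential, so every spark carries
-- a reasonable amount of work.
pfib :: Int -> Int -> Integer
pfib threshold n
  | n <= threshold = fib n
  | otherwise      = l `par` (r `pseq` (l + r))
  where
    l = pfib threshold (n - 1)
    r = pfib threshold (n - 2)

main :: IO ()
main = print (pfib 15 30)
```

Whatever the number of capabilities, the result is the same as the sequential fib 30, namely 832040; only the running time should change.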

In the future we plan to focus on the GC, with the main goal being to implement independent per-CPU collection. The other area we plan to look at is changing the GC policy for sparks, as described in our ICFP'09 paper; this will need a corresponding change to the Strategies library to avoid relying on the current "sparks are roots" GC policy, which causes difficulties for writing parallel code that exploits speculation.

Data Parallelism

MANUEL CHAKRAVARTY to write

Code generation

For the last two years we have been advertising a major upheaval in GHC's back end. Currently a monolithic "code generator" converts lambda code (the STG language) into flat C--; "flat" in the sense that the stack is manifested, and there are no function calls. The upheaval splits this into a pipeline of passes, with a relatively simple conversion of lambda code into C-- (with function calls), followed by a succession of passes that optimise this code, and flatten it (by manifesting the stack and removing calls).

John Dias is the principal architect of this new path, and it is in GHC already; you can switch it on by saying -fnew-codegen. What remains is (a) to make it work 100% (currently 99%, which is not good enough); (b) commit to it, which will allow us to remove gargantuan quantities of cruft; (c) exploit it, by implementing cool new optimisations at the C-- level; (d) take it further by integrating the native code generators into the same pipeline. You can read more on the wiki page Commentary/Compiler/NewCodeGenPipeline [CodeGen].

Several passes of the new code generation pipeline are supported by Hoopl, a Haskell library that makes it easy to write dataflow analyses and optimisations over C-- code (see Hoopl at http://research.microsoft.com/~simonpj/papers/c--). We think Hoopl is pretty cool, and have well-advanced ideas for how to improve it a lot more.
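To give a flavour of what "a dataflow analysis plus an optimisation" means, here is a toy sketch. It does not use Hoopl's API, and the IR is a hypothetical straight-line language rather than real C--: a backward liveness analysis walks a block from the end, tracking which variables are live, and drops assignments to variables that are never read afterwards. Roughly speaking, Hoopl factors out the fixpoint iteration and graph traversal needed to do this over general control-flow graphs, which this sketch sidesteps by having no control flow at all.

```haskell
module Main where

import Data.Set (Set)
import qualified Data.Set as Set

-- A hypothetical toy IR: straight-line code only (not real C--).
data Expr = Var String | Lit Int | Add Expr Expr
  deriving (Eq, Show)

data Stmt = Assign String Expr | Return Expr
  deriving (Eq, Show)

-- Variables read by an expression.
uses :: Expr -> Set String
uses (Var v)   = Set.singleton v
uses (Lit _)   = Set.empty
uses (Add a b) = uses a `Set.union` uses b

-- Backward liveness analysis combined with dead-assignment elimination.
-- foldr threads, from the last statement back to the first, the set of
-- variables that are live after the current statement.
deadCodeElim :: [Stmt] -> [Stmt]
deadCodeElim = fst . foldr step ([], Set.empty)
  where
    step s@(Return e) (kept, live) = (s : kept, live `Set.union` uses e)
    step s@(Assign x e) (kept, live)
      | x `Set.member` live = (s : kept, Set.delete x live `Set.union` uses e)
      | otherwise           = (kept, live)  -- x is never read: drop it

example :: [Stmt]
example =
  [ Assign "x" (Lit 1)
  , Assign "y" (Lit 2)                  -- dead: y is never used
  , Assign "z" (Add (Var "x") (Lit 3))
  , Return (Var "z")
  ]

main :: IO ()
main = mapM_ print (deadCodeElim example)
```

On the example block, the assignment to y is removed and the other three statements are kept in order.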

All of this has taken longer than we hoped. Once the new pipeline is in place we hope that others will join in. For example, David Terei did an interesting undergraduate project on using LLVM as a back end for GHC [Terei], and Krzysztof Wos is just beginning an undergraduate project on optimisation in the new pipeline. We are particularly grateful to Ben Lippmeier for his work on the SPARC native code generator.

Bibliography: papers

  • [Terei] Manuel: what URL?

Bibliography: wiki

All these URLs should be preceded with http://hackage.haskell.org/trac/ghc/wiki

  • [Platforms] Platforms that GHC supports: /Platforms
  • [Annotations] Annotations in GHC: /Annotations
  • [CodeGen] The new codegen pipeline: /Commentary/Compiler/NewCodeGenPipeline