GHC Status Report May 2012
GHC 7.4.1 was released at the beginning of February, and has been by and large a successful release. Nevertheless the tickets keep pouring in, and a large collection of bug fixes has been made since the 7.4.1 release. We plan to put out a 7.4.2 release candidate very soon (it may be out by the time you read this), followed shortly by the release.
We have a new member of the team! Please welcome Paolo Capriotti who is assuming some of the GHC maintenance duties for Well-Typed.
7.4.1 included a few major improvements. For more details on these, see the previous status report.
- Support for all declarations at the GHCi prompt
- Data type promotion and kind polymorphism 
- Improvements to Safe Haskell (safety is now inferred)
- Constraint Kinds
- Profiling improvements: a major internal overhaul, and support for stack traces with +RTS -xc.
- Preliminary support for registerised ARM compilation
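As a rough illustration of two of the type-system features listed above, the following sketch combines data type promotion with constraint kinds. The names `Nat`, `Vec`, `vhead`, `vtoList`, `Printable` and `showMax` are made up for this example and do not come from GHC or its libraries; treat it as a minimal sketch rather than a recommended design.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, ConstraintKinds #-}
module Main where

-- With DataKinds, this ordinary data type is also promoted to a kind,
-- and its constructors become the type-level constructors 'Z and 'S.
data Nat = Z | S Nat

-- A list whose length is tracked in its type (hypothetical example type).
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The index 'S n rules out the empty vector, so no VNil case is needed.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x

-- Forget the length index and return an ordinary list.
vtoList :: Vec n a -> [a]
vtoList VNil         = []
vtoList (VCons x xs) = x : vtoList xs

-- With ConstraintKinds, a tuple of constraints can be named by a type synonym.
type Printable a = (Show a, Ord a)

-- Render the largest element of a non-empty vector.
showMax :: Printable a => Vec ('S n) a -> String
showMax v = show (maximum (vtoList v))

main :: IO ()
main = do
  let v = VCons 3 (VCons 1 (VCons 2 VNil)) :: Vec ('S ('S ('S 'Z))) Int
  print (vhead v)
  putStrLn (showMax v)
```

Applying `vhead` to `VNil` is rejected by the type checker, because `'Z` does not match `'S n`.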
Here are the projects we're currently working on:
- Completing the support for kind polymorphism (Simon PJ)
- Typechecker performance improvements (Dimitrios?)
- Type-level natural numbers (Iavor D)
- Windows x64 Support (Ian L). The Industrial Haskell Group has funded work to implement 64-bit Windows support in GHC. The port is now self-hosting and mostly complete, with just a number of bugs in the periphery to fix, and some logistics to work out. We expect a 64-bit Windows installer to be included in the GHC 7.6 releases.
- The new code generator (Simon M). The glorious new code generator has been an ongoing project for some time now. The basic idea is to replace the pass of the compiler that converts from STG to Cmm (our internal C-- representation) with a more flexible framework consisting of two main passes: one that generates C-- without explicit stack manipulation, and a second pass that makes the stack explicit. This will enable a host of improvements and optimisations in due course. The new code generator uses the Hoopl framework for code analysis and rewriting. Earlier this year I (Simon M) took over this project, and spent a lot of time optimising the existing framework and Hoopl itself. I also rewrote the stack allocator, and made a number of simplifications. The current state is that the new code generator produces code that is almost as good as the old one's (and occasionally better), but it is somewhat slower to run (roughly 15% slower compilation with -O). The goal is to further improve on this, and I'm confident that we can generate better code in most cases than the old code generator. I hope this can make it into 7.6.1, but no guarantees.
- Changing the +RTS -N setting at runtime. Up until recently, the number of cores ("Capabilities" in GHC terminology) that GHC uses was fixed by the +RTS -N flag when you start the program. For instance, to use 2 cores, we pass the flag +RTS -N2 to the Haskell program. GHC now has support for modifying this setting programmatically at runtime, both up and down, via the API Control.Concurrent.setNumCapabilities. So a parallel Haskell program can now set the number of cores to run on itself, without the user needing to pass +RTS -N. Another use for this feature is to drop back to using a single core during sequential sections of the program, which is likely to give better performance, especially on a loaded system. A threadscope diagram showing this in action is here: . In the future we hope to use heuristics to dynamically adjust the number of cores in use according to system load or application demand, for example.
- Profiling and stack traces (Simon M). 7.4.1 has an overhauled profiling system, and in many cases gives better results than earlier versions. However, some details remain to be resolved around the precise semantics of cost-centre stacks. Also, I hope that it might be possible to provide stack traces of a kind without having to compile for profiling, perhaps in GHCi only.
- Support for SSE primitives when using the LLVM back end (Geoffrey M). The simd git branch of GHC adds support for primitive 128-bit SIMD vector types and associated primops when using the LLVM back end, meaning this branch can now generate SSE instructions on x86 platforms. We hope this support will make it into 7.6.1. Experimental versions of the vector library and DPH provide higher-level interfaces to the new primitives. Initial benchmarks indicate that numerical code can benefit substantially.
- Data Parallel Haskell. The vectorisation transformation underlying our implementation of nested data parallelism in GHC had a fundamental and long-standing asymptotic complexity problem that we were finally able to resolve. Details are in a recent draft paper entitled Work Efficient Higher-Order Vectorisation. The implementation described in the paper is available in the DPH packages from Hackage (which need to be used with GHC 7.4.1). The new implementation of the DPH libraries still needs to be optimised; hence, our next step will be to optimise constant factors.
In addition, we released Repa 3, which uses type indices to control array representations. This leads to more predictable performance. You can install Repa 3, which requires GHC 7.4.1, from Hackage. We are currently writing a paper describing the new design in detail.
Finally, we are about to release on Hackage (it may be out by the time you read this) a stable, end-user-ready version of Accelerate, the Repa-like array library for GPU computing. It integrates with Repa, so you can mix GPU and CPU multicore computing, and via the new meta-par package you can share workload between CPUs and GPUs. This new version, 0.12, is already available on GitHub. You need a CUDA-capable NVIDIA GPU to use it.
- Lightweight concurrency substrate (Sivaramakrishnan Krishnamoorthy Chandrasekaran, aka "KC"). During his internship at MSR Cambridge, KC has been working on replacing the RTS scheduler with some APIs that enable the scheduler to be implemented in Haskell. The aim is to not just move the scheduler into Haskell, but also enable user-defined schedulers to coexist, which will ultimately enable much greater control over scheduling behaviour. This follows on from previous work with Peng Li and Andrew Tolmach, but this time we are taking a slightly different approach that has a couple of important benefits.
Firstly, KC found a way to enable concurrency abstractions to be defined without depending on a particular scheduler. This means, for example, that we can provide MVars that work with any user-defined scheduler, rather than needing one MVar implementation per scheduler. Secondly, we found ways to coexist with some of the existing RTS machinery, for handling blackholes and asynchronous exceptions in particular, which means that these facilities will continue to work as before (with the same performance), and writers of user-defined schedulers do not need to worry about them. Furthermore, this significantly lowers the barrier for writing a new scheduler.
This is all still very much experimental, and it is not clear whether it will ever be in GHC proper. It depends on whether we can achieve good enough performance, amongst other things. All we can say for now is that the approach is promising. You can find KC's work on the ghc-lwc branch of the git repo.
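Returning to the item on changing the +RTS -N setting at runtime: the following is a minimal sketch of how a program might use Control.Concurrent.setNumCapabilities to drop to a single core for a sequential section and then restore the previous setting. The helper `withCapabilities` and the capability count of 4 are assumptions made for illustration; they are not part of GHC's libraries. The program is assumed to be linked with the threaded runtime (`ghc -threaded`), since the non-threaded runtime only supports one capability.

```haskell
module Main where

import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import Control.Exception (bracket)

-- Hypothetical helper: run an action with a temporary capability count,
-- restoring the previous count afterwards, even if the action throws.
withCapabilities :: Int -> IO a -> IO a
withCapabilities n action =
  bracket
    (do old <- getNumCapabilities
        setNumCapabilities n
        return old)
    setNumCapabilities
    (\_ -> action)

main :: IO ()
main = do
  -- Choose the core count from inside the program instead of via +RTS -N.
  -- The value 4 is an arbitrary illustrative choice.
  setNumCapabilities 4
  parallelCaps <- getNumCapabilities
  putStrLn ("parallel section, capabilities: " ++ show parallelCaps)

  -- Drop back to a single core for a sequential section.
  total <- withCapabilities 1 $ do
    sequentialCaps <- getNumCapabilities
    putStrLn ("sequential section, capabilities: " ++ show sequentialCaps)
    return (sum [1 .. 1000000 :: Int])
  print total

  restoredCaps <- getNumCapabilities
  putStrLn ("restored capabilities: " ++ show restoredCaps)
```

Using `bracket` here is a design choice rather than a requirement: it simply ensures that the earlier setting is restored if the sequential section raises an exception.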