# Let generalisation in GHC 7.0

[Note: updated to GHC 7.2, Sept 2011]

GHC 7.2 has a completely new type inference engine. The old one had grown to resemble the closing stages of a game of Jenga, in which making any change was a delicate operation that risked bringing the whole edifice crashing down. In particular, there were a dozen open bugs concerning type families that I had no idea how to fix.

This summer Dimitrios Vytiniotis, Brent Yorgey and I built a completely new inference engine. It rests on a solid technical foundation developed over the last couple of years, in a collaboration with Dimitrios, Tom Schrijvers, and Martin Sulzmann. There is a series of papers that lead up to it, but our JFP submission "Modular type inference with local assumptions" summarises the whole thing in one consistent notation.

The new engine feels much more solid to me, and indeed I've been able to close almost all those open bugs. Doubtless there will be some new ones, but I feel vastly more confident that we can fix them than I did with the old engine.

One other thing is that the new typechecker has completely swept away
the old rather *ad-hoc* stuff about "rigid" types in GADT pattern matches. So
you may find that some GADT programs will typecheck with fewer
type annotations than before.

However, there is one way in which GHC 7.2 will type *fewer* programs
than before, and that is the focus of this post. I want explain what the
change is, and how to adapt your code to accommodate it.

## Local polymorphism

Consider this (contrived) program:

{-# LANGUAGE ScopedTypeVariables #-} f :: forall a. a -> ((a, Char), (a, Bool)) f x = (g 'v', g True) where g :: forall b. b -> (a,b) g y = (x,y)

This polymorphism is expressed in `g`'s type signature:

`g`is a*polymorphic*function, because it is applied both to the character 'v' and to the boolean`True`.- However,
`g`is*monomorphic*in`a`, because`g`'s body mentions`x`

The type signature `g :: forall b. b -> (a,b)`
reflects these two points:

- The "
`forall b`" says that`g`is polymorphic in`b`.

- The
`a`in the signature must is the`a`in the type of`x`. This type variable`a`is brought into scope, in the body of`f`, by the`forall a`in the type signature for`f`.

The `LANGUAGE ScopedTypeVariables` pragma (a) allows you to use an
explicit "`forall`" in the type signature for a function `f`, and (b)
brings the `forall`'d type variables (`a` in this case) into scope in the body of the
function `f`.

## What has changed in GHC 7.2?

GHC 6.12 will compile the above program without any type signatures at all.
It infers the polymorphic type for `g`. But, if you are using
type families or GADTs, GHC 7.0 will not. More precisely:

- The flags
`-XGADTs`or`-XTypeFamilies`, or the corresponding`LANGUAGE`pragmas, imply the (new) flag`-XMonoLocalBinds`.

- With
`-XMonoLocalBinds`, non-top-level (or*local*) let/where bindings are not generalised.

- Remember that
`-fglasgow-exts`implies`-XGADTs`and hence`-XTypeFamilies`.

So given the type-signature-free code

{-# LANGUAGE MonoLocalBinds #-} f x = (g 'v', g True) where g y = (x,y)

GHC 7.2 will reject the program with

Foo.hs:6:17: Couldn't match expected type `Char' with actual type `Bool' In the first argument of `g', namely `True' In the expression: g True In the expression: (g 'v', g True)

Why? Because the type of `g` is not generalised, so it
can only be applied to either `Char` or `Bool` but not both.

## Which bindings are affected?

In the JFP paper we propose the rule that *local bindings are not generalised*,
where "local" means "syntactically not top level". This was the rule implemented by GHC 7.0.

However GHC 7.2 relaxes the restriction slightly. Consider

{-# LANGUAGE MonoLocalBinds #-} f x = (k 'v', k True) where h y = (y,y) -- Note: x is not mentioned k z = (h z, h z)

Now the binding for `h` is "local" in the sense that it is
not syntactically top-level, but it has no free variables, so it
*could have been* top-level. Similarly, `k` is "local" but
only mentions `h` which could have been top-level, and hence `k`
could be top-level too. GHC 7.2 spots this, and generalises both
just as if they had been top level.

Here's the rule. With `-XMonoLocalBinds` (the default), a binding without a
type signature is **generalised only if all its free variables are closed**.

A binding is **closed** if and only if

- It has a type signature, and the type signature has no free variables; or
- It has no type signature, and all its free variables are closed, and it is unaffected by the monomorphism restriction. And hence it is fully generalised.

In our example, `h` has no free variables and hence is generalised;
moreover, `k` has free variable `h`, but it is closed, so `k` too can be
generalised.

Here's another example:

x = 4 + 5 y = x + 7

Both bindings are top-level, but `x` falls under the Monomorphism Restriction and is not generalised for that reason. So `x` is not closed. Hence `y` is not generalised either, since it has a non-closed free variable `x`. (This really fixes a bug in GHC 7.0 which erroneously generalised `y` because it was syntactically top level.)

## How can I fix my program?

Almost all code doesn't need local let definitions to have
a polymorphic type, because the function is only used at one type.
But not all. There are two ways to fix your program if it falls
over because of the `MonoLocalBinds` change.

- Check whether you need
`-XGADTs`or`-XTypeFamilies`. If you are using the blunderbuss`-fglasgow-exts`, replace it with proper`LANGUAGE`pragmas.

- The good way is simply give a type signature to the local definition, as I did for
`g`above. Writing such a type signature may force you to use`{-# LANGUAGE ScopedTypeVariables #-}`as the above example showed. My experience is that the code is easier to understand with the type signature.

- The
`-XGADTs`or`-XTypeFamilies`pragmas switch on`MonoLocalBinds`but, if you want, you can override it with`-XNoMonoLocalBinds`(or the equivalent`LANGUAGE`pragma). The type checker will then do its best to generalise local bindings, and your program will almost certainly work. However, I don't know how to guarantee any good properties of the type inference algorithm. So I think this is ok as as short term fix, but I'd like to encourage you to add that handful of type signatures instead.

You may not be very familiar with the code,
so it can be tricky to work out what type the local definition should have.
(That's one reason I think that the code is improved by having a type signature!)
With this in mind I've added a new warning flag `-fwarn-missing-local-sigs`, which
prints a warning, and the type, of any *polymorphic* local binding that
lacks a type signature. So you can follow this procedure:

- Compile with
`-XNoMonoLocalBinds -fwarn-missing-local-sigs`. The first flag makes it compile, and the second shows you the types. - Compile with neither flag, getting some errors.
- Add type signatures to cure the errors, using the types printed out in the first step.

Adding the type signature is completely compatible with GHC 6.12 etc, so you shouldn't need any conditional compilation if you want to compile your code with multiple versions of GHC.

## A second, bigger example

One relatively common situation in which you need a local type signature
is when you use `runST` with some auxiliary definitions.
Here is a typical example:

{-# LANGUAGE ScopedTypeVariables #-} module Test where import Control.Monad.ST import Data.STRef foo :: forall a. Bool -> a -> a -> a foo b x y = runST (do { r <- newSTRef x; fill r; readSTRef r }) where -- fill :: forall s. STRef s a -> ST s () fill r = case b of True -> return () False -> writeSTRef r y -- runST :: forall a. (forall s. ST s a) -> a -- newSTRef :: a -> ST s (STRef s a) -- readSTRef :: STRef s a -> ST s a -- writeSTRef :: STRef s a -> a -> ST s ()

The function `foo` is a bit contrived, but it's typical of the way
in which `runST` is often used. We allocate a reference cell, containing `x`,
call `fill`, and then read out what is now in the reference cell.
All `fill` does is overwrite the cell if `b` is `False`.
So `foo` is really an elaborate `if` function.

But in GHC 7.0, with `-XMonoLocalBinds` (or `-XGADTs`, `-XTypeFamilies`),
the program will be rejected with this alarming-seeming error message

Foo.hs:7:11: Couldn't match type `s' with `s1' because this skolem type variable would escape: `s1' This skolem is bound by the polymorphic type `forall s. ST s a' The following variables have types that mention s fill :: STRef s a -> ST s () (bound at Foo.hs:10:11) In the first argument of `runST', namely `(do { r <- newSTRef x; fill r; readSTRef r })'

Remember, `fill` has not been generalised, so it gets a monomorphic type.
That in turn means that the argument to `runST` is not polymorphic enough.

Despite its scariness, in some ways the error message identifies
the problem more clearly than the previous example. In particular, the error
message identifies `fill` as the culprit.
You can fix it the same way as before, by adding a signature for `fill`;
I've put it in a comment above.

## Why make this change?

At first this change seems like retrograde step, so why did we make it? It's a long story, but the short summary is this: I don't know how to build a reliable, predictable type inference engine for a type system that has both (a) generalisation of local let/where and (b) local type equality assumptions, such as those introduced by GADTs. The story is told in our paper "Let should not be generalised" and, at greater length, in the journal version "Modular type inference with local assumptions".

# First results from GHC's new garbage collector

For a number of months now, we have been designing and developing a new garbage collector for GHC, with the primary goal of improving scaling on multicores. The current garbage collector isn't exactly a slouch: it is a parallel generational design, and after going to some effort to improve locality we managed to achieve some respectable scaling results on up to 8 cores, and the Intel CnC project has been able to achieve even greater speedups (as much as 22x on a 32-core machine) with carefully-tuned benchmarks.

Still, we recognise that the garbage collector is often the bottleneck where scaling is concerned. Tuning a program for scaling usually involves reducing its demands on the memory subsystem and hence reducing the amount of GC that happens. Here's how the current GC impedes scaling:

This is a screenshot from ThreadScope, showing part of the timeline of the execution of a parallel Haskell program - in this case, the minimax program for searching the game tree of 4x4 noughts and crosses. Each bar represents the execution of a single CPU, and the thin orange/green bars are the GCs. Every GC stops all the threads and performs the GC in parallel, before restarting the computation again. This is known as "stop the world" parallal GC - compared to other techniques such as concurrent GC it's easier to implement and generally gives good performance. Many industrial-strength virtual machines use stop-the-world parallel GC in their server products, because it gives the best throughput. However, the diagram clearly shows that we're wasting time here: the GCs aren't well-balanced (the green parts show the actual work, the blank parts are idle time). We deliberately don't load-balance the work in young-generation collections, because doing so impacts locality and is worse for overall performance (we discussed this in the ICFP'09 paper and gave measurements). So we end up wasting some of our processor bandwidth, and hence losing some scaling.

Not only that, but the cost of the all-core synchronisation becomes a bottleneck as the number of cores increases, and the need for rapid synchronisations causes problems if the OS decides to borrow one of the cores to do something else for a while: the symptom of this problem has been dubbed the "last core parallel slowdown" in Haskell.

Here's what our new GC looks like on the same program:

The heap organisation in fact hasn't changed much: there are still per-core young generations and a single old generation, but the improvement is that the per-core young generations can be collected independently without stopping the other threads. Collecting the old generation - the "global heap" - still requires stopping all the threads, but since this is the old generation, global collections are much less frequent than local collections.

Since there are fewer synchronisations and less wasted processor bandwidth, the overall throughput is higher - only about 10% on 8 cores with this example, but there are other (less picturesque) examples that improve more, and it's still early days and we have a lot of tuning to do. We expect the benefits to be greater on larger numbers of cores, and furthermore the "last core slowdown" should be finally banished.

Designs like this have appeared in the literature (starting with Doligez/Leroy POPL'93), and ours borrows ideas from several of these earlier designs while adding a few novel twists of our own. We plan to write up the design properly once all the tuning is done and we have some solid results. I expect it to land in GHC HEAD in a few months time, and it should be in the autumn 2011 major release of GHC.