# Let generalisation in GHC 7.0

[Note: updated to GHC 7.2, Sept 2011]

GHC 7.2 has a completely new type inference engine. The old one had grown to resemble the closing stages of a game of Jenga, in which making any change was a delicate operation that risked bringing the whole edifice crashing down. In particular, there were a dozen open bugs concerning type families that I had no idea how to fix.

This summer Dimitrios Vytiniotis, Brent Yorgey and I built a completely new inference engine. It rests on a solid technical foundation developed over the last couple of years, in a collaboration with Dimitrios, Tom Schrijvers, and Martin Sulzmann. There is a series of papers that lead up to it, but our JFP submission "Modular type inference with local assumptions" summarises the whole thing in one consistent notation.

The new engine feels much more solid to me, and indeed I've been able to close almost all those open bugs. Doubtless there will be some new ones, but I feel vastly more confident that we can fix them than I did with the old engine.

One other thing is that the new typechecker has completely swept away
the old rather *ad-hoc* stuff about "rigid" types in GADT pattern matches. So
you may find that some GADT programs will typecheck with fewer
type annotations than before.

However, there is one way in which GHC 7.2 will type *fewer* programs
than before, and that is the focus of this post. I want explain what the
change is, and how to adapt your code to accommodate it.

## Local polymorphism

Consider this (contrived) program:

{-# LANGUAGE ScopedTypeVariables #-} f :: forall a. a -> ((a, Char), (a, Bool)) f x = (g 'v', g True) where g :: forall b. b -> (a,b) g y = (x,y)

This polymorphism is expressed in `g`

's type signature:

`g`

is a*polymorphic*function, because it is applied both to the character 'v' and to the boolean`True`

.- However,
`g`

is*monomorphic*in`a`

, because`g`

's body mentions`x`

The type signature `g :: forall b. b -> (a,b)`

reflects these two points:

- The "
`forall b`

" says that`g`

is polymorphic in`b`

.

- The
`a`

in the signature must is the`a`

in the type of`x`

. This type variable`a`

is brought into scope, in the body of`f`

, by the`forall a`

in the type signature for`f`

.

The `LANGUAGE ScopedTypeVariables`

pragma (a) allows you to use an
explicit "`forall`

" in the type signature for a function `f`

, and (b)
brings the `forall`

'd type variables (`a`

in this case) into scope in the body of the
function `f`

.

## What has changed in GHC 7.2?

GHC 6.12 will compile the above program without any type signatures at all.
It infers the polymorphic type for `g`

. But, if you are using
type families or GADTs, GHC 7.0 will not. More precisely:

- The flags
`-XGADTs`

or`-XTypeFamilies`

, or the corresponding`LANGUAGE`

pragmas, imply the (new) flag`-XMonoLocalBinds`

.

- With
`-XMonoLocalBinds`

, non-top-level (or*local*) let/where bindings are not generalised.

- Remember that
`-fglasgow-exts`

implies`-XGADTs`

and hence`-XTypeFamilies`

.

So given the type-signature-free code

{-# LANGUAGE MonoLocalBinds #-} f x = (g 'v', g True) where g y = (x,y)

GHC 7.2 will reject the program with

Foo.hs:6:17: Couldn't match expected type `Char' with actual type `Bool' In the first argument of `g', namely `True' In the expression: g True In the expression: (g 'v', g True)

Why? Because the type of `g`

is not generalised, so it
can only be applied to either `Char`

or `Bool`

but not both.

## Which bindings are affected?

In the JFP paper we propose the rule that *local bindings are not generalised*,
where "local" means "syntactically not top level". This was the rule implemented by GHC 7.0.

However GHC 7.2 relaxes the restriction slightly. Consider

{-# LANGUAGE MonoLocalBinds #-} f x = (k 'v', k True) where h y = (y,y) -- Note: x is not mentioned k z = (h z, h z)

Now the binding for `h`

is "local" in the sense that it is
not syntactically top-level, but it has no free variables, so it
*could have been* top-level. Similarly, `k`

is "local" but
only mentions `h`

which could have been top-level, and hence `k`

could be top-level too. GHC 7.2 spots this, and generalises both
just as if they had been top level.

Here's the rule. With `-XMonoLocalBinds`

(the default), a binding without a
type signature is **generalised only if all its free variables are closed**.

A binding is **closed** if and only if

- It has a type signature, and the type signature has no free variables; or
- It has no type signature, and all its free variables are closed, and it is unaffected by the monomorphism restriction. And hence it is fully generalised.

In our example, `h`

has no free variables and hence is generalised;
moreover, `k`

has free variable `h`

, but it is closed, so `k`

too can be
generalised.

Here's another example:

x = 4 + 5 y = x + 7

Both bindings are top-level, but `x`

falls under the Monomorphism Restriction and is not generalised for that reason. So `x`

is not closed. Hence `y`

is not generalised either, since it has a non-closed free variable `x`

. (This really fixes a bug in GHC 7.0 which erroneously generalised `y`

because it was syntactically top level.)

## How can I fix my program?

Almost all code doesn't need local let definitions to have
a polymorphic type, because the function is only used at one type.
But not all. There are two ways to fix your program if it falls
over because of the `MonoLocalBinds`

change.

- Check whether you need
`-XGADTs`

or`-XTypeFamilies`

. If you are using the blunderbuss`-fglasgow-exts`

, replace it with proper`LANGUAGE`

pragmas.

- The good way is simply give a type signature to the local definition, as I did for
`g`

above. Writing such a type signature may force you to use`{-# LANGUAGE ScopedTypeVariables #-}`

as the above example showed. My experience is that the code is easier to understand with the type signature.

- The
`-XGADTs`

or`-XTypeFamilies`

pragmas switch on`MonoLocalBinds`

but, if you want, you can override it with`-XNoMonoLocalBinds`

(or the equivalent`LANGUAGE`

pragma). The type checker will then do its best to generalise local bindings, and your program will almost certainly work. However, I don't know how to guarantee any good properties of the type inference algorithm. So I think this is ok as as short term fix, but I'd like to encourage you to add that handful of type signatures instead.

You may not be very familiar with the code,
so it can be tricky to work out what type the local definition should have.
(That's one reason I think that the code is improved by having a type signature!)
With this in mind I've added a new warning flag `-fwarn-missing-local-sigs`

, which
prints a warning, and the type, of any *polymorphic* local binding that
lacks a type signature. So you can follow this procedure:

- Compile with
`-XNoMonoLocalBinds -fwarn-missing-local-sigs`

. The first flag makes it compile, and the second shows you the types. - Compile with neither flag, getting some errors.
- Add type signatures to cure the errors, using the types printed out in the first step.

Adding the type signature is completely compatible with GHC 6.12 etc, so you shouldn't need any conditional compilation if you want to compile your code with multiple versions of GHC.

## A second, bigger example

One relatively common situation in which you need a local type signature
is when you use `runST`

with some auxiliary definitions.
Here is a typical example:

{-# LANGUAGE ScopedTypeVariables #-} module Test where import Control.Monad.ST import Data.STRef foo :: forall a. Bool -> a -> a -> a foo b x y = runST (do { r <- newSTRef x; fill r; readSTRef r }) where -- fill :: forall s. STRef s a -> ST s () fill r = case b of True -> return () False -> writeSTRef r y -- runST :: forall a. (forall s. ST s a) -> a -- newSTRef :: a -> ST s (STRef s a) -- readSTRef :: STRef s a -> ST s a -- writeSTRef :: STRef s a -> a -> ST s ()

The function `foo`

is a bit contrived, but it's typical of the way
in which `runST`

is often used. We allocate a reference cell, containing `x`

,
call `fill`

, and then read out what is now in the reference cell.
All `fill`

does is overwrite the cell if `b`

is `False`

.
So `foo`

is really an elaborate `if`

function.

But in GHC 7.0, with `-XMonoLocalBinds`

(or `-XGADTs`

, `-XTypeFamilies`

),
the program will be rejected with this alarming-seeming error message

Foo.hs:7:11: Couldn't match type `s' with `s1' because this skolem type variable would escape: `s1' This skolem is bound by the polymorphic type `forall s. ST s a' The following variables have types that mention s fill :: STRef s a -> ST s () (bound at Foo.hs:10:11) In the first argument of `runST', namely `(do { r <- newSTRef x; fill r; readSTRef r })'

Remember, `fill`

has not been generalised, so it gets a monomorphic type.
That in turn means that the argument to `runST`

is not polymorphic enough.

Despite its scariness, in some ways the error message identifies
the problem more clearly than the previous example. In particular, the error
message identifies `fill`

as the culprit.
You can fix it the same way as before, by adding a signature for `fill`

;
I've put it in a comment above.

## Why make this change?

At first this change seems like retrograde step, so why did we make it? It's a long story, but the short summary is this: I don't know how to build a reliable, predictable type inference engine for a type system that has both (a) generalisation of local let/where and (b) local type equality assumptions, such as those introduced by GADTs. The story is told in our paper "Let should not be generalised" and, at greater length, in the journal version "Modular type inference with local assumptions".

## Comments

One gotcha about combining -XGADTs and -XNoMonoLocalBinds: because -XGADTs implies -XMonoLocalBinds, you need to place -XNoMonoLocalBinds *after* -XGADTs; otherwise MonoLocalBinds will just get switched back on. Order of the flags matters.