wiki:IntermediateTypes

Version 20 (modified by guest, 8 years ago) (diff)

--

Notes on the FC-based intermediate language

These notes describe the new intermediate language for GHC. The intermediate language is based on System F with algebraic datatypes and explicit type coercions (hereafter FC) (see the paper). This note mostly focuses on the type system and also discuss how some source-level features are represented in the intermediate language.

Most of the system is fairly standard, with the exception of coercions. A coercion c, is a type-level term, with a kind of the form T1 :=: T2. (c :: T1 :=: T2) is a proof that a term of type T1 can be coerced to type T2. It is used as argument of a cast expression; if t :: T1 then (t `cast` c) :: T2.

What's New?

If you are already familiar with the intermediate language and its type system as it existed in 6.4.2 (and probably before) then the key aspects that have changed are (and are further described in several sections below):

  • Coercions have been added - The syntax of types now includes coercions which are evidence for type equalities. There are distinguished coercion variables and a new variant of TyCon, with constructor CoercionTyCon. There is also a new Expr variant, with constructor Cast, which performs a cast given an expression and evidence of the safety of the cast.
  • Kinds are now Types - The type Kind is just a synonym for Type. There are special PrimitiveTyCons that represent kinds.
  • Newtypes are implemented with coercions - The previous ad-hoc mechanism has been replaced with one that uses coercions.
  • GADTs are implemented with coercions - The complex typing rules for case expressions on GADTs have been removed and instead the data constructors of GADTs carry coercions inside them. Consequently, typechecking Core is simpler (though type inference is just as hard as ever).

For anyone not familiar with the system as it existed previously, the rest of this note attempts to describe most of the type system of the intermediate language.

Types

The representation of types is defined (as the datatype Type) in TypeRep, and most of the useful functions on types are defined in Type. TypeRep exports the representation concretely, and should probably not be used outside the few places it is already used. Type re-exports everything useful from TypeRep, but exports the representation abstractly. The datatype Type really represents a single syntactic category that includes types, coercions, kinds, and super-kinds.

Type Variables

Type variables, of type Var, and associated construction and manipulation functions are defined in the Var module. There are two data constructors that make type variables, TyVar and TcTyVar. TcTyVars can be mutable tyvars that are instantiated during type checking. After typechecking, all TcTyVars are turned to TyVars. TyVars carry a Bool field, isCoercionVar, which is True if the type variable is a coercion variable and False otherwise. The function isCoVar should be used to test if a type variable is a coercion variable.

Type Constructors

Type constructors, of datatype TyCon, are defined in the TyCon module and exported abstractly. There are several different sorts of type constructors; the most important for understanding the overall intermediate language type system are:

  • AlgTyCon, which are for tycons for datatypes and newtypes and have a field of type AlgTyConRhs which specified whether it is a datatype or newtype and contains further information for each;
  • PrimTyCon, which are for built-in primitive tycons, and are also used to represent base kinds;
  • CoercionTyCon, which are for special tycons which are meant to represent syntactic forms (and not really type constructors), so they must be saturated to have a kind;
  • SuperKindTyCon, which are tycons that are used to represent super-kinds, also called sorts (which classify kinds as either coercion kinds, CO, or type kinds, TY), SuperKindTyCons are unique in never having a kind.

All TyCons but SuperKindTyCon and CoercionKindTyCon carry their kind in a field called tyConKind, and CoercionKindTyCons carry their kinding rule (a function with type [Type] -> Kind) in a field called coKindFun.

Kinds are Types

We have (as of August 2006) unified types and kinds as members of the datatype Type. Kind is just a synonym for Type. Basic kinds are now represented using type constructors, e.g. the kind * is represented as

star = TyConApp liftedTypeKindTyCon []

where liftedTypeKindTyCon is a built-in PrimTyCon. The arrow type constructor is used as the arrow kind constructor, e.g. the kind `* -> *` is represented internally as

FunTy star star

Kinds and types may be distinguished by looking at their "Kind" using the typeKind function. The "Kind" of a kind is always one of the sorts TY (for kinds that classify normal types) or CO (for kinds that classify coercion evidence). The coercion kind, T1 :=: T2, is represented by PredTy (EqPred T1 T2).

GHC has a relatively complicated kind structure...

There's a little subtyping at the kind level. Here is the picture for type-kinds (kinds of sort TY).

		 ?
		/ \
	       /   \
	      ??   (#)
	     /  \
            *   #

where	*    [LiftedTypeKind]   means boxed type
	#    [UnliftedTypeKind] means unboxed type
	(#)  [UbxTupleKind]     means unboxed tuple
	??   [ArgTypeKind]      is the lub of *,#
	?    [OpenTypeKind]	means any type at all

In particular:

	error :: forall a:?. String -> a
	(->)  :: ?? -> ? -> *
	(\(x::t) -> ...)	Here t::?? (i.e. not unboxed tuple)

Coercions and Coercion Kinds

Coercions are type-level terms which act as evidence for type equalities and are classified by a new sort of kind (with the form T1 :=: T2). Most of the coercion construction and manipulation functions are found in the Coercion module.

The syntax of coercions extends the syntax of types (and the type Coercion is just a synonym for Type). By representing coercion evidence on the type level, we can take advantage of the existing erasure mechanism and keep non-termination out of coercion proofs (which is necessary to keep the system sound). The syntax of coercions and types also overlaps a lot. A normal type is evidence for the reflexive coercion, i.e.,

Int :: Int :=: Int

Coercion variables are used to abstract over evidence of type equality, as in

(/\c::(a :=: Bool). \x::a. if (x `cast` c) then 0 else 1) :: (a :=: Bool) => a -> Int

There are also coercion constants that are introduced by the compiler to implement some source language features (newtypes for now, associated types soon and probably more in the future). Coercion constants are represented as TyCons made with the constructor CoercionTyCon.

Coercions are type level terms and can have normal type constructors applied to them. The action of type constructors on coercions is much like in a logical relation. So if c1 :: T1 :=: T2 then

[c1] :: [T1] :=: [T2]

and if c2 :: S1 :=: S2 then

c1 -> c2 :: (T1 -> S1 :=: T2 -> S2)

The sharing of syntax means that a normal type can be looked at as either a type or as coercion evidence, so we use two different kinding relations, one to find type-kinds (implemented in Type as `typeKind :: Type -> Kind`) and one to find coercion-kinds (implemented in Coercion as coercionKind :: Coercion -> Kind).

Coercion variables are distinguished from type variables, and non-coercion type variables (just like any normal type) can be used as the reflexive coercion, while coercion variables have a particular coercion kind which need not be reflexive.

GADTs

The internal representation of GADTs is as regular algebraic datatypes that carry coercion evidence as arguments. A declaration like

data T a b where
  T1 :: a -> b -> T [a] (a,b)

would result in a data constructor with type

  T1 :: forall a b. forall a1 b1. (a :=: [a1], b :=: (a1, b1)) => a1 -> b1 -> T a b

This means that (unlike in the previous intermediate language) all data constructor return types have the form T a1 ... an where a1 through an are the parameters of the datatype.

However, we also generate wrappers for GADT data constructors which have the expected user-defined type, in this case

$wT1 = /\a b. \x y. T1 [a] (a,b) a b [a] (a,b) x y

Where the 4th and 5th arguments given to T1 are the reflexive coercions

[a]   :: [a] :=: [a]
(a,b) :: (a,b) :=: (a,b)

Representation of coercion assumptions

In most of the compiler, as in the FC paper, coercions are abstracted using ForAllTy cv ty where cv is a coercion variable, with a kind of the form PredTy (EqPred T1 T2). However, during type inference it is convenient to treat such coercion qualifiers in the same way other class membership or implicit parameter qualifiers are treated. So functions like tcSplitForAllTy and tcSplitPhiTy and tcSplitSigmaTy, treat ForAllTy cv ty as if it were FunTy (PredTy (EqPred T1 T2)) ty (where PredTy (EqPred T1 T2) is the kind of cv). Also, several of the dataConXXX functions treat coercion members of the data constructor as if they were dictionary predicates (i.e. they return the PredTy (EqPred T1 T2) with the theta).

Newtypes are coerced types

The implementation of newtypes has changed to include explicit type coercions in the place of the previously used ad-hoc mechanism. When a newtype

newtype T a = MkT (T a -> T a)

is declared, a new TyCon called CoT is created, and is stored in the type constructor. The new TyCon? has the ame arity as the newtype type constructor, in this case one. The TyCon? does not have a kind on its own, only when fully applied to its arguments. In this case we have

CoT Int :: T Int :=: (T Int -> T Int)

This coercion is used to wrap and unwrap newtypes whenever the constructor or case is used in the Haskell source code.

Such coercions are always used when the newtype is recursive and are optional for non-recursive newtypes. This can be easily changed by altering the function mkNewTyConRhs in iface/BuildTyCl.lhs.

Core (the intermediate language)

  • Exprs
  • Casts
  • Typechecking
  • Environments and substitution

Simplification

  • exprIsConApp_maybe
  • simplExpr

Loose Ends

Some loose ends that came up during implementation of FC:

  • there is a strange unsafeCoerce that we could not figure out the purpose of in the FFI, a warning is currently emitted when it is used
  • removed the -DBREAKPOINT definition in the Makefile because it induced a module loop, we should probably fix this