wiki:Commentary/Compiler/CoreSynType

Version 18 (modified by malcolm.wallace@…, 8 years ago) (diff)

--

Video: GHC Core language (14'04")

The Core type

The Core language is GHC's central data types. Core is a very small, explicitly-typed, variant of System F. The exact variant is called System FC, which embodies equality constraints and coercions.

The CoreSyn type, and the functions that operate over it, gets an entire directory compiler/coreSyn:

Here is the entire Core type compiler/coreSyn/CoreSyn.lhs:

type CoreExpr = Expr Var

data Expr b	-- "b" for the type of binders, 
  = Var	  Id
  | Lit   Literal
  | App   (Expr b) (Arg b)
  | Lam   b (Expr b)
  | Let   (Bind b) (Expr b)
  | Case  (Expr b) b Type [Alt b]
  | Cast  (Expr b) Coercion
  | Note  Note (Expr b)
  | Type  Type

type Arg b = Expr b
type Alt b = (AltCon, [b], Expr b)

data AltCon = DataAlt DataCon | LitAlt  Literal | DEFAULT

data Bind b = NonRec b (Expr b) | Rec [(b, (Expr b))]

That's it. All of Haskell gets compiled through this tiny core.

Expr is parameterised over the type of its binders, b. This facility is used only rarely, and always temporarily; for example, the let-floater SetLevels pass attaches a binding level to every binder. By far the most important type is CoreExpr, which is Expr with Var binders.

Here are some notes about the individual constructors of Expr.

  • Lam is used for both term and type abstraction (small and big lambdas).
  • Type appears only in type-argument positions (e.g. App (Var f) (Type ty)). To emphasise this, the type synonym Arg is used as documentation when we expect that a Type constructor may show up. Anything not called Arg should not use a Type constructor.
  • Let handles both recursive and non-recursive let-bindings; see the the two constructors for Bind.
  • Note is used for profiling and debugging information.

Case expressions

Case expressions are the most complicated bit of Core. In the term Case scrut case_bndr res_ty alts:

  • scrut is the scrutinee
  • case_bndr is the case binder (see notes below)
  • res_ty is the type of the entire case expression (redundant once FC is in HEAD -- was for GADTs)
  • alts is a list of the case alternatives

A case expression can scrutinise

  • a data type (the alternatives are DataAlts), or
  • a primitive literal type (the alternatives are LitAlts), or
  • a value of any type at all (if there is one DEFAULT alternative).

A case expression is always strict, even if there is only one alternative, and it is DEFAULT. (This differs from Haskell!) So

case error "urk" of { DEFAULT -> True }

will call error, rather then returning True.

The case_bndr field, called the case binder, is an unusual feature of GHC's case expressions. The idea is that in any right-hand side, the case binder is bound to the value of the scrutinee. If the scrutinee was always atomic nothing would be gained, but real expressiveness is added when the scrutinee is not atomic. Here is a slightly contrived example:

case (reverse xs) of y 
  Nil       -> Nil
  Cons x xs -> append y y

(Here, "y" is the case binder; at least that is the syntax used by the Core pretty printer.) This expression evaluates reverse xs; if the result is Nil, it returns Nil, otherwise it returns the reversed list appended to itself. Since the returned value of reverse xs is present in the implementation, it makes sense to have a name for it!

The most common application is to model call-by-value, by using case instead of let. For example, here is how we might compile the call f (reverse xs) if we knew that f was strict:

case (reverse xs) of y { DEFAULT -> f y }

Case expressions have several invariants

  • The res_ty type is the same as the type of any of the right-hand sides (up to refining unification -- coreRefineTys in compiler/types/Unify.lhs -- in pre-FC).

  • If there is a DEFAULT alternative, it must appear first. This makes finding a DEFAULT alternative easy, when it exists.
  • The remaining non-DEFAULT alternatives must appear in order of
    • tag, for DataAlts
    • lit, for LitAlts

This makes finding the relevant constructor easy, and makes comparison easier too.

  • The list of alternatives is always exhaustive, meaning that it covers all reachable cases. Note, however, that an "exhausive" case does not necessarily mention all constructors:
    data Foo = Red | Green | Blue
    
    ...case x of 
    	Red   -> True
    	other -> f (case x of 
    			Green -> ...
    			Blue  -> ... )
    
    The inner case does not need a Red alternative, because x can't be Red at that program point. Furthermore, GADT type-refinement might mean that some alternatives are not reachable, and hence can be discarded.