Version 1 (modified by simonpj, 8 years ago)



Data types for Haskell entities: Id, TyVar, TyCon, DataCon, and Class

For each kind of Haskell entity (identifier, type variable, type constructor, data constructor, class) GHC has a data type to represent it: Id, TyVar, TyCon, DataCon, and Class respectively.

All of these entities have a Name, but that's about all they have in common. However they are sometimes treated uniformly:

  • A TyThing (compiler/types/TypeRep.lhs) is simply the sum of four of them (there is no alternative for type variables):
    data TyThing = AnId     Id
                 | ADataCon DataCon
                 | ATyCon   TyCon
                 | AClass   Class
    For example, a type environment is a map from Name to TyThing.
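
As a rough illustration of the type environment idea, here is a minimal sketch. The newtype wrappers, the use of Data.Map, and the helper names tyThingName, mkTypeEnv and lookupTyCon are assumptions made for this example; GHC's real types carry far more information and its real environment type may differ.

```haskell
import qualified Data.Map as Map

-- Simplified stand-ins for GHC's real types (assumptions for illustration only).
newtype Name    = Name String  deriving (Eq, Ord, Show)
newtype Id      = Id Name      deriving Show
newtype DataCon = DataCon Name deriving Show
newtype TyCon   = TyCon Name   deriving Show
newtype Class   = Class Name   deriving Show

data TyThing
  = AnId     Id
  | ADataCon DataCon
  | ATyCon   TyCon
  | AClass   Class
  deriving Show

-- A type environment: a finite map from Name to TyThing.
type TypeEnv = Map.Map Name TyThing

-- Every entity has a Name, so any TyThing can be keyed uniformly.
tyThingName :: TyThing -> Name
tyThingName (AnId (Id n))          = n
tyThingName (ADataCon (DataCon n)) = n
tyThingName (ATyCon (TyCon n))     = n
tyThingName (AClass (Class n))     = n

-- Build an environment from a list of entities.
mkTypeEnv :: [TyThing] -> TypeEnv
mkTypeEnv things = Map.fromList [ (tyThingName t, t) | t <- things ]

-- Look up a Name and succeed only if it denotes a type constructor.
lookupTyCon :: TypeEnv -> Name -> Maybe TyCon
lookupTyCon env n = case Map.lookup n env of
  Just (ATyCon tc) -> Just tc
  _                -> Nothing
```

A lookup returns a TyThing, so the caller pattern-matches to find out which kind of entity the Name refers to, as lookupTyCon does.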

Each of these data types is implemented as a big record of information that tells you everything about the entity. For example, a TyCon contains a list of its data constructors; a DataCon contains its type (which mentions its TyCon); a Class contains the Ids of all its method selectors; and an Id contains its type (which mentions type constructors and classes).

So you can see that the GHC data structures for entities form a graph, not a tree: everything points to everything else. This makes it very convenient for the consumer, because there are accessor functions with simple types, such as idType :: Id -> Type. But it means that there has to be some tricky almost-circular programming ("knot-tying") in the type checker, which constructs the entities.
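
The knot-tying idea can be sketched in miniature. The two record types below are heavily simplified stand-ins (assumptions for this example, not GHC's definitions), and boolTyCon and siblingNames are hypothetical names; the point is only that laziness lets a TyCon and its DataCons refer to each other.

```haskell
-- Simplified stand-ins; GHC's real TyCon and DataCon carry far more fields.
data TyCon = TyCon
  { tyConName     :: String
  , tyConDataCons :: [DataCon]   -- points "down" to the constructors
  }

data DataCon = DataCon
  { dataConName  :: String
  , dataConTyCon :: TyCon        -- points back "up" to the parent TyCon
  }

-- Build the cyclic structure for:  data Bool = False | True
-- The three local bindings are mutually recursive; lazy evaluation lets
-- each one mention the others before any of them is fully evaluated.
boolTyCon :: TyCon
boolTyCon = tc
  where
    tc      = TyCon   { tyConName = "Bool", tyConDataCons = [falseDC, trueDC] }
    falseDC = DataCon { dataConName = "False", dataConTyCon = tc }
    trueDC  = DataCon { dataConName = "True",  dataConTyCon = tc }

-- Consumers get simple accessors and can walk the graph in any direction:
-- from a constructor to its parent and back down to all the siblings.
siblingNames :: DataCon -> [String]
siblingNames dc = map dataConName (tyConDataCons (dataConTyCon dc))
```

Because the structure is cyclic, it must not be printed or compared structurally without care; this sketch deliberately derives no Show or Eq instances.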

Type variables and term variables

Type variables and term variables are represented by a single data type, Var, thus (compiler/basicTypes/Var.lhs):

type Id    = Var
type TyVar = Var

It's incredibly convenient to use a single data type for both, rather than using one data type for term variables and one for type variables. For example:

  • Finding the free variables of a term gives a set of variables (both type and term variables): exprFreeVars :: CoreExpr -> VarSet.
  • We only need one lambda constructor in Core: Lam :: Var -> CoreExpr -> CoreExpr.
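
To make the two points above concrete, here is a minimal sketch with a toy Core-like language. All the types here (Var, Type, Expr) and the functions tyFreeVars and freeVars are simplified assumptions for illustration; in particular, the sketch ignores type variables that occur free in the types of Ids.

```haskell
import qualified Data.Set as Set

-- Simplified stand-ins for illustration; not GHC's real definitions.
data Var
  = TyVar { varName :: String }
  | Id    { varName :: String, idType :: Type }
  deriving (Eq, Ord, Show)

data Type
  = TyVarTy Var
  | FunTy Type Type
  deriving (Eq, Ord, Show)

data Expr
  = V Var
  | Lam Var Expr      -- one lambda serves for both type and value abstraction
  | App Expr Expr
  | TyArg Type        -- a type used as an argument
  deriving Show

type VarSet = Set.Set Var

-- Free type variables of a type.
tyFreeVars :: Type -> VarSet
tyFreeVars (TyVarTy v) = Set.singleton v
tyFreeVars (FunTy a b) = tyFreeVars a `Set.union` tyFreeVars b

-- Free variables of an expression: type and term variables end up in one set.
freeVars :: Expr -> VarSet
freeVars (V v)     = Set.singleton v
freeVars (Lam b e) = Set.delete b (freeVars e)
freeVars (App f a) = freeVars f `Set.union` freeVars a
freeVars (TyArg t) = tyFreeVars t
```

With two separate variable types, freeVars would need to return a pair of sets (or a sum type), and Expr would need two lambda constructors.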

The Var type distinguishes the two sorts of variable; indeed, it makes somewhat finer distinctions (compiler/basicTypes/Var.lhs):

data Var
  = TyVar {
        varName    :: !Name,
        realUnique :: FastInt,         -- Key for fast comparison
                                       -- Identical to the Unique in the name,
                                       -- cached here for speed
        tyVarKind  :: Kind }

  | TcTyVar {                          -- Used only during type inference
        varName        :: !Name,
        realUnique     :: FastInt,
        tyVarKind      :: Kind,
        tcTyVarDetails :: TcTyVarDetails }

  | GlobalId {                         -- Used for imported Ids, dict selectors etc
        varName    :: !Name,           -- Always an External or WiredIn Name
        realUnique :: FastInt,
        idType     :: Type,
        idInfo     :: IdInfo,
        gblDetails :: GlobalIdDetails }

  | LocalId {                          -- Used for locally-defined Ids (see NOTE below)
        varName    :: !Name,
        realUnique :: FastInt,
        idType     :: Type,
        idInfo     :: IdInfo,
        lclDetails :: LocalIdDetails }

TyVar is self-explanatory.
TcTyVar is used during type checking only. Once type checking is finished, there are no more TcTyVars.
LocalId is used for term variables bound in the module being compiled. More specifically, a LocalId is bound either within an expression (lambda, case, local let), or at the top level of the module being compiled.
  • The IdInfo of a LocalId may change as the simplifier repeatedly bashes on it.
  • A LocalId carries a flag saying whether it's exported. This is useful for knowing whether we can discard it if it is not used.
    data LocalIdDetails
      = NotExported   -- Not exported; may be discarded as dead code.
      | Exported      -- Exported; keep alive
GlobalId is used for fixed, immutable, top-level term variables, notably ones that are imported from other modules.
  • A GlobalId always has an External or WiredIn Name, and hence has a Unique that is globally unique across the whole of a GHC invocation.
  • The IdInfo of a GlobalId is completely fixed.
  • All implicit Ids (data constructors, class method selectors, record selectors and the like) are GlobalIds from birth, even the ones defined in the module being compiled.
  • When finding the free variables of an expression (exprFreeVars), we only collect LocalIds and ignore GlobalIds.
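
The last point can be sketched as follows. The Var and Expr types are toy stand-ins and freeLocalVars is a hypothetical name (assumptions for this example, not GHC's exprFreeVars); the sketch only shows where GlobalIds are filtered out.

```haskell
import qualified Data.Set as Set

-- Simplified stand-ins for illustration; not GHC's real definitions.
data Var
  = LocalId  { varName :: String }
  | GlobalId { varName :: String }
  deriving (Eq, Ord, Show)

data Expr
  = V Var
  | Lam Var Expr
  | App Expr Expr
  | Let Var Expr Expr   -- non-recursive let: binder, right-hand side, body
  deriving Show

isLocalId :: Var -> Bool
isLocalId (LocalId _) = True
isLocalId _           = False

-- Collect free variables, keeping LocalIds only.
freeLocalVars :: Expr -> Set.Set Var
freeLocalVars (V v)
  | isLocalId v = Set.singleton v
  | otherwise   = Set.empty          -- GlobalIds are never collected
freeLocalVars (Lam b e)     = Set.delete b (freeLocalVars e)
freeLocalVars (App f a)     = freeLocalVars f `Set.union` freeLocalVars a
freeLocalVars (Let b rhs e) =
  freeLocalVars rhs `Set.union` Set.delete b (freeLocalVars e)
```

Ignoring GlobalIds is safe because they are fixed and top-level: no transformation of the expression can capture or rebind them.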

All the value bindings in the module being compiled (whether top level or not) are LocalIds until the CoreTidy phase. In the CoreTidy phase, all top-level bindings are made into GlobalIds. At this point a top-level LocalId is "frozen": it becomes a fixed, immutable GlobalId.
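
The freezing step can be pictured with a small sketch. The types are toy stand-ins, and globalise and freezeTopBinds are hypothetical helpers invented for this example; the real CoreTidy pass does much more (for instance, occurrences inside right-hand sides must be updated too, which this sketch ignores).

```haskell
-- Simplified stand-ins for illustration; not GHC's real definitions.
data LocalIdDetails = NotExported | Exported
  deriving (Eq, Show)

data Var
  = LocalId  { varName :: String, lclDetails :: LocalIdDetails }
  | GlobalId { varName :: String }
  deriving (Eq, Show)

-- A top-level binding pairs a binder with some right-hand side.
data Bind rhs = Bind Var rhs
  deriving (Eq, Show)

-- Hypothetical helper: turn a LocalId into a GlobalId.
-- Anything that is already a GlobalId is left alone.
globalise :: Var -> Var
globalise (LocalId n _) = GlobalId n
globalise v             = v

-- Freeze every top-level binder. Binders nested inside the
-- right-hand sides are not touched and stay LocalIds.
freezeTopBinds :: [Bind rhs] -> [Bind rhs]
freezeTopBinds binds = [ Bind (globalise v) rhs | Bind v rhs <- binds ]
```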

GlobalIdDetails and implicit Ids

GlobalIds are further classified by their GlobalIdDetails. This type is defined in compiler/basicTypes/IdInfo, because it mentions other structured types such as DataCon. Unfortunately it is used in Var.lhs so there's a hi-boot knot to get it there. Anyway, here's the declaration (elided a little):

data GlobalIdDetails
  = VanillaGlobal               -- Imported from elsewhere, a default method Id.
  | RecordSelId { ... }         -- Record selector
  | DataConWorkId DataCon       -- The Id for a data constructor *worker*
  | DataConWrapId DataCon       -- The Id for a data constructor *wrapper*
  | ClassOpId Class             -- An operation of a class
  | PrimOpId PrimOp             -- The Id for a primitive operator
  | FCallId ForeignCall         -- The Id for a foreign call
  | NotGlobalId                 -- Used as a convenient extra return value from globalIdDetails

Some GlobalIds are called implicit Ids. These are Ids that are defined by a declaration of some other entity (not just an ordinary variable binding). For example:

  • The selectors of a record type
  • The method selectors of a class
  • The worker and wrapper Id for a data constructor

It's easy to distinguish these Ids, because the GlobalIdDetails field says what kind of thing it is: Id.isImplicitId :: Id -> Bool.
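
A test of this kind might look like the sketch below. It is an assumption-laden illustration, not the body of Id.isImplicitId: the stand-in GlobalIdDetails omits the PrimOpId and FCallId constructors and the fields of RecordSelId, the function isImplicitDetails is a hypothetical name, and it covers only the three kinds of implicit Id listed above. The real function may classify other constructors differently.

```haskell
-- Simplified stand-ins for illustration; not GHC's real definitions.
newtype DataCon = DataCon String
newtype Class   = Class String

data GlobalIdDetails
  = VanillaGlobal
  | RecordSelId               -- fields omitted in this sketch
  | DataConWorkId DataCon
  | DataConWrapId DataCon
  | ClassOpId Class
  | NotGlobalId

-- True for the kinds of Id that arise from declaring some other entity
-- (a record type, a class, a data constructor) rather than from an
-- ordinary variable binding.
isImplicitDetails :: GlobalIdDetails -> Bool
isImplicitDetails RecordSelId       = True
isImplicitDetails (DataConWorkId _) = True
isImplicitDetails (DataConWrapId _) = True
isImplicitDetails (ClassOpId _)     = True
isImplicitDetails _                 = False
```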