Version 5 (modified by chak, 8 years ago)

Type Functions and Associated Types in GHC - The Master Plan

This page serves as a collection of notes concerning the implementation of type functions and associated types, especially about the implications for type checking, interface files, and FC intermediate code generation.

Aims

New features:

  • Toplevel type function definitions.
  • Associated data types and type synonyms in classes. Associated type synonyms are essentially type function definitions spread across the instances of the associated class, so associated types are essentially syntactic sugar for general type functions.
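
The following sketch illustrates the three kinds of declaration listed above. It uses the syntax of present-day GHC's TypeFamilies extension, which is an assumption: the concrete syntax planned when these notes were written may have differed. All names (Elem, Collection, MapKey and so on) are made up for illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Toplevel type function: maps a container type to its element type.
type family Elem c
type instance Elem [e] = e

-- A signature that mentions the type function; Elem [e] reduces to e.
firstElem :: [e] -> Maybe (Elem [e])
firstElem []      = Nothing
firstElem (x : _) = Just x

-- Associated type synonym: a type function whose equations are
-- spread across the instances of the class.
class Collection c where
  type Item c
  cinsert :: Item c -> c -> c
  ctoList :: c -> [Item c]

instance Collection [a] where
  type Item [a] = a
  cinsert = (:)
  ctoList = id

-- Associated data type: each instance supplies its own representation.
class MapKey k where
  data Map k :: Type -> Type
  emptyMap  :: Map k v
  insertMap :: k -> v -> Map k v -> Map k v
  lookupMap :: k -> Map k v -> Maybe v

instance MapKey Bool where
  -- One slot per possible key.
  data Map Bool v = BoolMap (Maybe v) (Maybe v)
  emptyMap = BoolMap Nothing Nothing
  insertMap False v (BoolMap _ t) = BoolMap (Just v) t
  insertMap True  v (BoolMap f _) = BoolMap f (Just v)
  lookupMap False (BoolMap f _) = f
  lookupMap True  (BoolMap _ t) = t
```

Loaded into GHCi, lookupMap True (insertMap True 'x' emptyMap) picks the Bool instance from the key argument alone.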

Revised features:

  • We may want to re-implement functional dependencies using associated type synonyms.
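
To make the relationship concrete, here is a small sketch of one class written twice: once with a functional dependency and once with an associated type synonym. It is only an illustration in present-day GHC syntax, not the re-implementation itself, and the class and method names are hypothetical.

```haskell
{-# LANGUAGE FunctionalDependencies, FlexibleInstances, TypeFamilies #-}

-- Functional-dependency formulation: the container type c determines
-- the element type e.
class ContainerFD c e | c -> e where
  headFD :: c -> Maybe e

instance ContainerFD [a] a where
  headFD []      = Nothing
  headFD (x : _) = Just x

-- Associated-synonym formulation: the dependent parameter e becomes
-- a type function Elt of the remaining parameter c.
class ContainerAT c where
  type Elt c
  headAT :: c -> Maybe (Elt c)

instance ContainerAT [a] where
  type Elt [a] = a
  headAT []      = Nothing
  headAT (x : _) = Just x
```

In both versions the element type is fixed by the container type; the second version states that dependency as a type-level function rather than as an annotation on the class head.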

We keep track of the current implementation status.

Specification and Restrictions

This section refines the specification given in the Beyond Associated Types paper. (I'll actually link this paper here once it is a bit more coherent.)

  • In an associated type declaration inside a type class declaration, kind signatures can only occur on the type variables that are in excess of the class parameters. Rationale: The binding position for the class parameters is the class head. That's where the signatures should be.
  • Associated data type definitions (in instances) can have kind signatures on type variables occurring in the head. These signatures must coincide with those of the instance head (for the class parameters) and those of the associated data type declarations (for the excess parameters). Rationale: In contrast to class declarations, we don't regard the instance head as binding variables in the body.
  • The declaration of an associated data type in a class cannot have a context. Rationale: We don't want a context constraining class parameters for the same reason that we don't want that on function signatures. A context on additional arguments to the data declaration would be feasible, but doesn't seem worth the trouble. This is a pre-FC restriction; it needs to be removed from the current code base before it can be taken off the wiki.
  • The declaration of an associated data type in a class can have a deriving clause. The meaning is that all instances of that type inherit all these derivings. (Or do we merely want to force them to state, at least, these derivings?) Rationale: If we want equality on an associated type, we need to guarantee that all its variants come with an equality.
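
The kind-signature rules above can be sketched as follows, again in present-day GHC syntax and with hypothetical names. One deliberate deviation: the sketch writes the deriving clause on the instance, because that is where present-day GHC accepts it; a deriving clause on the class-level declaration, as proposed in the last bullet, is not shown.

```haskell
{-# LANGUAGE TypeFamilies, KindSignatures #-}

import Data.Kind (Type)

-- k is a class parameter, bound in the class head, so it carries no
-- kind signature in the associated declaration. The excess parameter
-- v is introduced by the associated declaration and may be annotated.
class Keyed k where
  data Assoc k (v :: Type)
  assoc :: k -> v -> Assoc k v
  value :: Assoc k v -> v

instance Keyed Bool where
  -- The signature on v coincides with the one in the class declaration.
  data Assoc Bool (v :: Type) = BoolAssoc Bool v
    deriving (Eq, Show)
  assoc = BoolAssoc
  value (BoolAssoc _ v) = v
```

Every further instance would have to repeat the deriving clause by hand to guarantee equality on all variants of Assoc, which is exactly the burden the class-level deriving clause is meant to remove.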

Restrictions:

  • We currently don't allow associated GADTs. I cannot see any fundamental problem in supporting them, but I want to keep it simple for the moment. (When allowing this, a constructor signature in an associated GADT can of course only refine the instantiation of the type arguments specific to the instance in which the constructor is defined.)
  • We currently don't have toplevel data definitions with type patterns. They would essentially be open GADTs, which we probably can type check with the existing GADT machinery and translate much as we translate associated data types in classes. Again, I want to avoid doing too much in the first sweep.
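
For orientation, a toplevel data definition with type patterns might look like the sketch below. It uses the data family syntax of present-day GHC, which is an assumption about how such a feature would eventually be spelled; nothing here was supported by the code base these notes describe. The type and constructor names are illustrative.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- An open family of data types indexed by a type argument.
data family XList a

-- Type pattern Char: a conventional cons list.
data instance XList Char = XNil | XCons Char (XList Char)

-- Type pattern (): a run of units is represented by its length.
data instance XList () = XUnits Int

-- Each function works on one instance, much like a match on a GADT
-- whose index is already known.
xlenChar :: XList Char -> Int
xlenChar XNil           = 0
xlenChar (XCons _ rest) = 1 + xlenChar rest

xlenUnit :: XList () -> Int
xlenUnit (XUnits n) = n
```

New data instance declarations can be added in other modules, which is the sense in which such definitions behave like open GADTs.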

How It Works

Type declarations in classes and indexed types

Adding type declarations to classes is fairly straightforward. The ClassDecl variant of TyClDecl gets a new field tcdATs, which contains a list of type declarations - currently, the parser will only allow data type declarations. Similarly, InstDecl gets a fourth argument, which is a list of type declarations.
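
As a rough picture of the change, the sketch below models the two declarations with toy types. These are hypothetical, heavily simplified stand-ins and not GHC's actual HsSyn definitions: only tcdATs and the position of the fourth InstDecl argument are taken from the text, and the remaining fields are assumptions chosen to keep the example small.

```haskell
type Name = String

-- A data type declaration, reduced to its name and type variables.
data DataDecl = DataDecl
  { ddName   :: Name
  , ddTyVars :: [Name]
  } deriving (Eq, Show)

data TyClDecl
  = TyData DataDecl
  | ClassDecl
      { tcdName   :: Name
      , tcdTyVars :: [Name]
      , tcdSigs   :: [Name]       -- method names only, for brevity
      , tcdATs    :: [DataDecl]   -- new: associated type declarations
      }
  deriving (Eq, Show)

data InstDecl = InstDecl
  String       -- instance head, kept as plain text here (assumed shape)
  [Name]       -- method bindings, names only (assumed shape)
  [Name]       -- signatures and pragmas, names only (assumed shape)
  [DataDecl]   -- new fourth argument: associated type declarations
  deriving (Eq, Show)

-- Names of the associated types declared in a class.
classATNames :: TyClDecl -> [Name]
classATNames (ClassDecl { tcdATs = ats }) = map ddName ats
classATNames (TyData _)                   = []

-- Names of the associated types defined in an instance.
instATNames :: InstDecl -> [Name]
instATNames (InstDecl _ _ _ ats) = map ddName ats

exampleClass :: TyClDecl
exampleClass =
  ClassDecl "MapKey" ["k"] ["empty", "lookup"] [DataDecl "Map" ["k", "v"]]

exampleInst :: InstDecl
exampleInst =
  InstDecl "MapKey Int" ["empty", "lookup"] [] [DataDecl "Map" ["v"]]
```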

More tricky is the addition of type indexes (i.e., non-type-variable arguments) to data type declarations. The grammar is already very general and allows arbitrary arguments, but the parser uses RdHsSyn.checkTyClHdr to construct the AST, and that function ensures that only type variables are supplied.

The new story is that checkTyClHdr can operate in two different modes: (1) checking mode and (2) extraction mode. Checking mode corresponds to the original behaviour. In extraction mode, all free type variables of the arguments are collected, but we don't enforce that the arguments are themselves merely type variables. When processing class headers (and for the moment also type synonyms), we use checkTyClHdr in checking mode as before. However, when processing a data type (or newtype) declaration, we use extraction mode and keep both the list of type variables (as tcdTyVars) and the original arguments (as tcdTyPats) in the representation of data type declarations (i.e., in the variant TyData of TyClDecl).

The check enforcing that all arguments to top-level data type declarations and the non-class-parameter arguments of associated data types are variables is delayed until the renamer (as we need context information). In the renamer, we check that non-variable type parameters can only occur in the first few arguments of associated types (ATs), and we remove these parameters by floating the associated data type declarations to the top-level.
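
The two modes can be sketched on a toy type representation. This is not the code of RdHsSyn.checkTyClHdr: the Ty type, the Mode type and the function checkHdr are hypothetical, and the sketch only mirrors the behaviour described above (reject non-variables in checking mode, collect free variables in extraction mode, and return both the variables and the original arguments).

```haskell
import Data.List (nub)

type Name = String

-- Tiny model of source-level types.
data Ty
  = TyVar Name
  | KindedTyVar Name String   -- a variable with a kind signature
  | TyCon Name
  | TyApp Ty Ty
  deriving (Eq, Show)

data Mode = Checking | Extraction
  deriving (Eq, Show)

-- Free type variables, in order of first occurrence.
freeTyVars :: Ty -> [Name]
freeTyVars (TyVar v)         = [v]
freeTyVars (KindedTyVar v _) = [v]
freeTyVars (TyCon _)         = []
freeTyVars (TyApp f x)       = nub (freeTyVars f ++ freeTyVars x)

isTyVar :: Ty -> Bool
isTyVar (TyVar _)         = True
isTyVar (KindedTyVar _ _) = True
isTyVar _                 = False

-- Returns the type variables (cf. tcdTyVars) together with the
-- original arguments (cf. tcdTyPats).
checkHdr :: Mode -> [Ty] -> Either String ([Name], [Ty])
checkHdr Checking args
  | all isTyVar args = Right (concatMap freeTyVars args, args)
  | otherwise        = Left "header argument is not a type variable"
checkHdr Extraction args =
  Right (nub (concatMap freeTyVars args), args)
```

For a header with the arguments [a] and v, checking mode fails, while extraction mode yields the variables a and v and keeps the original arguments for the later check in the renamer.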

In the parser, we put the original type terms specified as parameters in the field tcdTyPats. For top-level declarations, after checking that the parameters are all plain type variables (possibly with a kind signature), we reset tcdTyPats to Nothing (this already happens during AST construction). DataDecls created while parsing Core already have tcdTyPats set to Nothing, although this may change.
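
A minimal sketch of that reset, on a toy record: the field names tcdTyVars and tcdTyPats come from the text, while the Ty type, the record layout and the function normaliseTopLevel are hypothetical and only model the described behaviour.

```haskell
type Name = String

data Ty
  = TyVar Name
  | KindedTyVar Name String   -- a variable with a kind signature
  | TyCon Name
  | TyApp Ty Ty
  deriving (Eq, Show)

-- Toy version of the TyData variant of TyClDecl.
data TyData = TyData
  { tcdName   :: Name
  , tcdTyVars :: [Name]
  , tcdTyPats :: Maybe [Ty]   -- original arguments, if still needed
  } deriving (Eq, Show)

isPlainVar :: Ty -> Bool
isPlainVar (TyVar _)         = True
isPlainVar (KindedTyVar _ _) = True
isPlainVar _                 = False

-- For a top-level declaration: accept only plain (possibly kinded)
-- type variables and then forget the patterns.
normaliseTopLevel :: TyData -> Either String TyData
normaliseTopLevel d = case tcdTyPats d of
  Nothing -> Right d
  Just pats
    | all isPlainVar pats -> Right d { tcdTyPats = Nothing }
    | otherwise           ->
        Left ("type pattern in top-level declaration of " ++ tcdName d)
```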

Phasing

GHC is organised such that class and type declarations are processed (during renaming and type checking) before any instance declarations are considered. The problem now is that instance declarations may contain type declarations; hence, anything that may depend on a type declaration can now also depend on an instance declaration. We solve this by lifting associated data types out of instances before renaming (and hence also before type checking of type and class declarations).
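
The lifting step can be pictured with the toy model below. All types and the function liftATs are hypothetical and not part of GHC. Whether the instance keeps any reference to the lifted declarations is not stated above; the sketch simply empties the list, which is an assumption.

```haskell
type Name = String

-- An associated data type definition with its type patterns.
data TyDecl = TyDecl
  { tyName :: Name
  , tyPats :: [String]
  } deriving (Eq, Show)

data InstDecl = InstDecl
  { instHead :: String
  , instATs  :: [TyDecl]
  } deriving (Eq, Show)

data Decl
  = TyD TyDecl
  | ClassD Name
  | InstD InstDecl
  deriving (Eq, Show)

-- Split a module's declarations into the group processed first (type
-- and class declarations, plus the lifted associated types) and the
-- instances that are processed afterwards.
liftATs :: [Decl] -> ([Decl], [InstDecl])
liftATs decls = (tyClDecls ++ lifted, insts)
  where
    tyClDecls = [d | d <- decls, not (isInst d)]
    lifted    = [TyD t | InstD i <- decls, t <- instATs i]
    insts     = [i { instATs = [] } | InstD i <- decls]
    isInst (InstD _) = True
    isInst _         = False

exampleDecls :: [Decl]
exampleDecls =
  [ ClassD "MapKey"
  , InstD (InstDecl "MapKey Int" [TyDecl "Map" ["Int", "v"]])
  , TyD (TyDecl "T" ["a"])
  ]
```

After lifting, the definition of Map for Int sits next to the ordinary type and class declarations, so the existing order of processing (types and classes first, instances second) still sees every type declaration in time.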