Changes between Version 44 and Version 45 of TypeFunctions


Ignore:
Timestamp:
Aug 9, 2006 6:02:08 PM (8 years ago)
Author:
chak
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • TypeFunctions

    v44 v45  
    6060== How It Works == 
    6161 
    62 The details of the implementations are split over a couple of subpages, due to the amount of the material: 
     62The details of the implementation are split over a couple of subpages, due to the amount of the material: 
    6363 * [wiki:TypeFunctionsSyntax syntax and representation,] 
    6464 * [wiki:TypeFunctionsRenaming renaming,] 
    6565 * [wiki:TypeFunctionsTypeChecking type checking,] and 
    66  * [wiki:TypeFunctionsCore representation in Core and interfaces.] 
    67  
    68 === Syntax of kind signatures and definitions of indexed types === 
    69  
    70 All kind signature consists of a type declaration head followed by a `::` and a kind.  In the case of  a data declaration, we addititonally require that there is no `where` clause.   We require for every definition of an indexed type (i.e., type equations or indexed data/newtype declaration) that a matching kind signature is in scope.  Vanilla type synonym definitions and data/newtype declarations fall out as special cases of type function equations and indexed type declarations that have variable-only patterns, for which we require no kind signatures.  The vanilla forms are also closed (further definitions would be useless, as they are bound to overlap). 
    71  
    72 === Representation of indexed types === 
    73  
    74 ==== Kind signatures ==== 
    75  
    76 `HsDecls.TyClDecl` has a new variant `TyFunction` to represent signatures of type functions.  These consist of the name, type parameters, an iso flag, and optionally an explicit result kind.  The type parameters can have kind signatures as usual.   
    77  
    78 Signatures for indexed data and newtypes are represented as a special case of `TyData`, namely when `TyData` has a kind signature, but no constructors.  
    79  
    80 We recognise both forms of kind signatures by the predicate `HsDecls.isKindSigDecl`. 
    81  
    82 ==== Definitions of indexed types ==== 
    83  
    84 We represent type functions and indexed data and newtypes by generalising type synonym declarations `TySynonym` and data type declarations `TyData` to allow patterns ofr type indexes instead of just type variables as parameters.  In both variants, we do so by way of the field `tcdPats` of type `Maybe [LHsType name]`, used as follows: 
    85  * If it is `Nothing`, we have a ''vanilla'' data type declaration or type synonym declaration and `tcdVars` contains the type parameters of the type constructor. 
    86  * If it is `Just pats`, we have the definition of an a indexed type (toplevel or nested in an instance declarations).  Then, 'pats' are type patterns for the type-indexes of the type constructor and `tcdVars` are the variables in those patterns.  Hence, the arity of the type constructor is `length tcdPats` and ''not'' `length tcdVars`. 
    87 In both cases (and as before we had type functions), `tcdVars` collects all variables we need to quantify over. 
    88  
    89 ==== Parsing and AST construction ==== 
    90  
    91 The LALR parser allows arbitrary types as left-hand sides in '''data''', '''newtype''', and '''type''' declarations.  The parsed type is, then, passed to {{{RdHsSyn.checkTyClHdr}}} for closer analysis (possibly via `RdHsSyn.checkSynHdr`).  It decomposes the type and, among other things, yields the type arguments in their original form plus all type variables they contain.  Subsequently, {{{RdrHsSyn.checkTyVars}}} is used to either enforce that all type arguments are variables (second argument is `False`) or to simply check whether the type arguments are variables (second argument `True`).  If in enforcing mode, `checkTyVars` will raise an error if it encounters a non-variable (e.g., required for class declarations).  If in checking mode, it yields the value placed in the `tcdPats` field described above; i.e., returns `Nothing` instead of the type arguments if these arguments are all only variables. 
    92  
    93 === Representation of associated types === 
    94  
    95 We add type declarations to class declarations and instance declarations by a new field, of type `[LTyClDecl]`, to both `TyClDecl.ClassDecl` (known by the field name `tcdATs`) and `TyClDecl.InstDecl`.  For classes, this new field contains values constructed from `TyData`, `TyFunction`, and `TySynonym`, whereas for instances, we only have `TyData` and `TySynonym`.  This is due to (a) `TyData` representing both signatures and definitions of associated data types (whereas the two are split into `TyFunction` and `TySynonym` for associated synonyms) and (b) associated synonyms having default definitions, which associated data types do not possess. 
    96  
    97 === Phasing === 
    98  
    99 GHC is organised such that class and type declarations are processed (during renaming and type checking) before any instance declarations are considered.  In the presence of associated types, instance declarations may contain type definitions.  In particular, the ''data constructors'' introduced by associated data declarations need to be brought into scope before we can rename any expressions. 
    100  
    101 === Renaming of indexed types === 
    102  
    103 ==== Kind signatures ==== 
    104  
    105 Kind signatures are renamed by `RnSource.rnTySig`, which is parametrised by a function that handles the binders (i.e., index variables) of the declaration.  This is so that we can use the same code for toplevel signatures and those in classes.  In the former case, the variables are in a defining position, whereas in classes they are in a usage position (as all index variables must be class parameters). 
    106  
    107 ==== Definitions of indexed types ==== 
    108  
    109 There is little extra that needs to be done for indexed types.  The main difference between vanilla synonyms and data/newtype declarations and the indexed variants is that the `tcdTyPats` field is not `Nothing`.  We simply call `rnTyPats` on these fields, which traverses them in the usual way. 
    110  
    111 ==== Renaming of associated types ==== 
    112  
    113 Associated '''data''' definitions are particularly interesting, as they not only introduces, but also value level entities, namely the data constructors.  During renaming, we enter the names of all data constructors that an associated data type defines into the global `RdrName` environment by extending the function `RnNames.getLocalDeclBinders` such that it traverses instance declarations, too.  We are careful not to add the data type constructor multiple times by ignoring them in instance declarations.  The global `RdrName` environment only ever contains the type constructor introduced in the class declaration (i.e, the `RdrName` of an associated data type maps to the `Name` of the AT declaration in the class). 
    114  
    115 ---- 
    116 '''Remaining problem:''' The function `getLocalDeclBinders` must still supply the parent `Name` to the name generation for the data constructors. That parent name should be the one produced for the associated data declaration in the corresponding class declaration, which is hard to get hold of at this moment. So, we supply the Name of the data type constructor instead. That should probably be replaced by the class name in a later phase. 
    117 ---- 
    118  
    119 Otherwise, `RnSource.rnSrcInstDecl` invokes `RnSource.rnTyClDecl` on all associated types of an instance to rename them. 
    120  
    121 ==== Lifting of associated type definitions out of instances ==== 
    122  
    123 In the current implementation, `RnSource.rnSrcDecl` (which is only called by `RnSource.rnSrcDecls`) duplicates all definitions of associated types '''after''' renaming them.  It does so by adding them to the type and class declarations (i.e., `hs_tyclds`) of the currently processed binding group, but also keeps a copy in the instance declarations, were they are needed during type checking to perform some well-formedness checks (e.g., that each AT of a class receives a definition). 
     66 * [wiki:TypeFunctionsCore representation in Core and in interfaces.] 
    12467  
    125 NB: Lifted associated type declarations inherit the context of the instance head. However, the variables of the data declaration are renamed independently of those of the instance head (which implies that the inherited copy of the instance context is renamed again as part of the data declaration). 
    126  
    127 ---- 
    128 '''Open Points:'''  
    129  * Do we really want to copy associated types in `rnSrcDecl` into the toplevel of the binding group?  On one hand, general GHC design priciples discourages moving any code around before type checking has been completed.  On the other hand, by lifting data declarations out before type checking, we have to worry less about phasing.  (NB: Associated type signatures in class declarations are less of an issue as classes are very much treated like type declarations anyway - being in `TyClDecl` and all - and so are usually around when we need to get at their embedded types.) 
    130  * In case, we leave the duplication of ATs after renaming as it is, do we still want to add the context to lifted AT definitions?  Strictly speaking, this is not necessary under the new translation scheme.  However, morally it might still be the right thing, as the constructors are declared under that context. 
    131 ---- 
    132  
    133 === Kind checking indexed data types === 
    134  
    135 The workhorse of kind checking type and class declarations is `TcTyClDecls.kcTyClDecls`, which is invoked by `TcTyClDecls.tcTyClDecls` once per binding group.  It handles type synonyms different from algebraic data type declarations and classes, as synonyms have a richer kind structure (making kind checking harder), but cannot be recursive (which makes kind checking easier).  Indexed types present yet a different set of trade offs as they are guaranteed to come with kind signatures, but have type patterns and not necessarily all definitions visible at once. 
    136  
    137 Indexed types (including type functions!) are generally included in the processing of algebraic data types and classes.  However, we handle kind signatures slightly different from definitions of members of the indexed family.  More precisely, kind signatures participate in the construction of the initial kind environment (as performed by `getInitialKind`), whereas the definition of the members of an indexed type does not.  Otherwise, the two are treated the same. 
    138  
    139 === Type checking indexed data types === 
    140  
    141 Type checking in the presence of only indexed data and newtypes is much simpler than in the presence of type functions as type equality remains purely syntactic (i.e., we do not need to change the unification procedure).  However, we need to check that the alternatives of a case expression inspecting an indexed data type contains only constructors of one member of the family.  (To relax this restriction, we would need a story for compiling open data types.) 
    142  
    143 ==== Desugaring indexed data types ==== 
    144  
    145 The kind signature of an indexed data type 
    146 {{{ 
    147 data T (a1::<kind1>) .. (an::<kindn>) :: <kind> 
    148 }}} 
    149 turns into an F,,C,, type function declaration 
    150 {{{ 
    151 type T_n : <kind1> -> .. -> <kindn> -> <kind> 
    152 }}} 
    153 A member of an indexed data type 
    154 {{{ 
    155 data T t1 .. tn b1 .. bm = <constructors> 
    156 }}} 
    157 turns into an equality axiom and a vanilla data declaration 
    158 {{{ 
    159 axiom cTinst : (forall c1..cr. T_n t1 .. tn) :=: (forall c1..cr. Tinst c1 .. cr) 
    160 data Tinst c1 .. cr b1 .. bm = <constructors> 
    161 }}} 
    162 where the `ci` are the free variables of the `tj`.  Moreover, we morally replace all occurences of `T` in the rest of the program by `T_n`.  No such replacement is required in the actual implementation as the arity index at F,,C,, type functions is just a formal device used in the formal development.  In the implementation, it is perfectly fine to retain the original name and maintain the arity information separately. 
    163  
    164 Neverthless, we need to generate a new name for the vanilla data types representing family members (i.e., `Tinst` above).  We use a similar mechanism as for the generation of the dictionary type constructors of type classes.  In particular, we generalise the field `algTcClass` of the internal representation for datatypes, `TyCon.AlgTyCon`, to be three valued: none, `Class` for data types representing dictionaries, and <which structure?> for data types representing members of a family. 
    165  
    166 ==== Inserting coercions ==== 
    167  
    168 To ensure that the F,,C,, code generated by the above desugaring still type checks, we need to introduce cast expressions using `cTinst` to move between the indexed type `T_n` and the representation types, such as `Tinst`, of its members.  The simplicity of type checking and desugaring indexed data types - as opposed to general type functions - is due to the locations where these casts need to be added being well defined.  More precisely, there are two different kinds of locations corresponding to the introduction and elimination of indexed data types: 
    169  1. Wrappers for data constructors introduce indexed types. 
    170  2. Case expressions scrutinising indexed types eliminate them. 
    171  
    172 ==== Wrappers for indexed data types ==== 
    173  
    174 The wrapper of a data constructor acts as an impedance matcher between the source-level signatures of the constructor and its actual representation; in particular, it evaluates strict arguments and unboxes flattened arguments.  In the case of a constructor for an indexed data type, it additionally has to apply the coercion between the type function representing the source type and its representation as a vanilla data type.  So, for example, if we have (continuing the example from above) 
    175 {{{ 
    176 data T t1 .. tn b1 .. bm = C s1 .. sk 
    177 }}} 
    178 then we generate a wrapper 
    179 {{{ 
    180 C = /\c1..cr b1..bm -> 
    181     \x1..xk -> 
    182       Con C [c1,..,cr,b1,..,bm] [x1,..,xk] |> sym (cTinst@c1..@cr b1 .. bm) 
    183 }}} 
    184 The generation of constructor wrappers is performed by `MkId.mkDataConIds`. 
    185  
    186 ==== Case expressions for indexed data types ==== 
    187  
    188 When we scrutinise an indexed type in a case expression, we need to first cast it to the vanilla data type representing the family member from which the constructors guarding the alternatives are drawn.  (This implies that we cannot have any case expression mixing constructors from two or more family members.  In fact, if we had that capability, we would have open GADT definitions in the Löh/Hinze sense.) 
    189  
    190 So, whether we need to cast the scrutinee of a case expression depends on the constructors appearing in the alternatives, which are type checked by `TcPat.tcConPat`.  This function uses `TcUnify.boxySplitTyConApp` to match the type of the scrutinee against the result type of the data constructor.  In the case of GADTs and indexed types, this is not just a matter of extracting the arguments from the type constructor application, but we need to match against type patterns.  This matching is already conveniently performed by the code for GADTs. 
    191  
    192 If the data constructor is from an indexed type, we need to propagate a coercion (to be applied to the scrutinee) outwards.  For this, GHC also already has a mechanism, namely the variant `CoPat` of `HsPat.Pat`.  It enables us to attach a coercion function, of type `HsBinds.ExprCoFun`, to a pattern, which the desugarer will pick up in `Match.matchCoercion` and apply to the match variable of the case expression. 
    193  
    194 `ExprCoFun` represents, besides coercions due to type instantiation, also type equality coercions of type `Coercion.Coercion`.  We use them for coercions that are exactly the converse of the coercion used in the wrapper of the data constructor of the current case alternative.  (There is also an equivalent of `CoPat` for expressions, namely `HsCoerce` of `HsExpr.HsExpr`.) 
    195  
    196 === Type checking associated types === 
    197  
    198 ==== Class declarations ==== 
    199  
    200  * As part of the knot tying exercises in `TcTyClsDecls.tcTyAndClassDecls`, we extract all AT declarations from classes and add them to the list of class and data type declarations to be processed. This ensures that AT declarations participate in kind checking and that they are entered into the global type-checker environment. 
    201  * We do ''not'' update the data declarations within class declarations (field `tcdATs` within `ClassDecl`), as the `Class.Class` structure produced by the type checker only contains the `Id`s of a classes associated types. 
    202  * We check that we have -fglasgow-exts. 
    203  
    204 ==== Instance declarations ==== 
    205  
    206 We need to handle ATs in `TcInstDcls.tcInstDecls1`, which is where the type information in instances - i.e., in vanilla Haskell just the instance heads - are processed. 
    207  
    208 === Representation of type functions after type checking === 
    209  
    210 Type functions have a number of properties in common with class instances; in particular, they require a machinery for matching type patterns against types, as instance heads do during context simplification.  Hence, we probably want some structure similar to `InstEnv.Instance` for type functions - for instances this is maintained in the field `iSpec` of `TcEnv.InstInfo` (for type functions we don't need anything like `iBinds` as they are pure type-level entities).  If possible, it would be ideal if we can reuse (or generalise) some of the matching machinery for instance heads. 
    211  
    212 The essentials of a module after type checking are in `HscTypes.ModGuts`; in particular, we have two fields `mg_insts :: [Instance]` and `mg_binds :: [CoreRule]` containing all instance heads and all rewrite rules respectively.  Similarly, we now want something like `mg_tyequa :: [TyEqua]` to represent all type euqations.   
    21368 
    21469== Possible Extensions ==