Version 54 (modified by adamgundry, 8 months ago)


Overloaded record fields: implementation notes

Here be dragons. This page describes implementation details and progress on the implementation of the overloaded record fields plan. Development of the extension is taking place on forks of the ghc and packages-base repositories (on branch 'overloaded-record-fields'). A prototype implementation is also available.

The basic idea

The Has and Upd classes, and GetResult and SetResult type families, are defined in the module GHC.Records in the base package.

Typechecking a record datatype still generates record selectors, but their names have a $sel prefix and end with the name of their type. Moreover, instances for the classes and type families are generated. For example,

data T = MkT { x :: Int }


$sel_x_T :: T -> Int -- record selector (used to be called `x`)
$sel_x_T (MkT x) = x

$dfHasTx :: forall a . a ~ Int => Has T "x" a -- corresponds to the Has instance decl
$dfHasTx = Has { getField _ = $sel_x_T }

$dfUpdTx :: forall a . a ~ Int => Upd T "x" a -- corresponds to the Upd instance decl
$dfUpdTx = Upd { setField _ s e = s { x = e } }

axiom TFCo:R:GetResultTx : GetResult T "x" = Int   -- corresponds to the GetResult type family instance
axiom TFCo:R:SetResultTx : SetResult T "x" Int = T -- corresponds to the SetResult type family instance
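
The generated pieces above can be approximated by hand in source Haskell. The following is a minimal sketch, not the actual contents of GHC.Records: the class and family declarations are simplified guesses at their shape, and exampleGet and exampleSet are hypothetical names added for illustration. The file has no main and is meant to be loaded in GHCi.

```haskell
{-# LANGUAGE DataKinds, FlexibleInstances, KindSignatures,
             MultiParamTypeClasses, TypeFamilies #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (Symbol)

-- Simplified, hypothetical stand-ins for the declarations in GHC.Records.
type family GetResult r (f :: Symbol)
type family SetResult r (f :: Symbol) t

class Has r (f :: Symbol) t where
  getField :: proxy f -> r -> t

class Upd r (f :: Symbol) t where
  setField :: proxy f -> r -> t -> SetResult r f t

data T = MkT { x :: Int } deriving (Eq, Show)

-- Hand-written analogues of the two generated axioms.
type instance GetResult T "x"     = Int
type instance SetResult T "x" Int = T

-- Hand-written analogues of $dfHasTx and $dfUpdTx. The field type is a
-- variable in the instance head and is fixed by an equality constraint, so
-- the instance is selected before the field type is known.
instance (a ~ Int) => Has T "x" a where
  getField _ = x

instance (a ~ Int) => Upd T "x" a where
  setField _ s e = s { x = e }

-- Hypothetical uses: the literals are resolved to Int by the a ~ Int
-- constraint, not by an annotation on the field type.
exampleGet :: Int
exampleGet = getField (Proxy :: Proxy "x") (MkT 3) + 1

exampleSet :: T
exampleSet = setField (Proxy :: Proxy "x") (MkT 1) 5
```

The point of writing the instance head as Has T "x" a with a ~ Int in the context, rather than Has T "x" Int, is that instance selection then depends only on the record type and the label.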

The naming of cats


A field is represented by the following datatype, parameterised by the representation of names:

data FieldLbl a = FieldLabel {
      flOccName   :: OccName,   -- ^ Label of the field
      flSelector  :: a,         -- ^ Record selector function
      flInstances :: FldInsts a -- ^ Instances for overloading
    }

type FieldLabel = FieldLbl Name

data FldInsts a = FldInsts { fldInstsHas :: a
                           , fldInstsUpd :: a
                           , fldInstsGetResult :: a
                           , fldInstsSetResult :: a }

Every field has a label (OccName), selector, and names for the dfuns and axioms (currently stored together in the FldInsts record, but this could change). The dcFields field of DataCon stores a list of FieldLabel, whereas the ifConFields field of IfaceConDecl stores a list of FieldLbl OccName. The motivation for storing the names of the pieces is to avoid dragging extendInteractiveContext into the monad with gresFromAvails.
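
To illustrate why FieldLbl is parameterised, here is a small self-contained sketch. OccName and Name are hypothetical simplified stand-ins for the GHC types, and toIfaceField and exampleField are illustrative names that do not come from the implementation. Deriving Functor is an assumption made for brevity; the real code may convert between representations differently.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Hypothetical stand-ins for GHC's OccName and Name.
newtype OccName = OccName String deriving (Eq, Show)

data Name = Name
  { nameModule  :: String    -- defining module, simplified to a String
  , nameOccName :: OccName
  } deriving (Eq, Show)

data FldInsts a = FldInsts
  { fldInstsHas       :: a
  , fldInstsUpd       :: a
  , fldInstsGetResult :: a
  , fldInstsSetResult :: a
  } deriving (Eq, Show, Functor)

data FieldLbl a = FieldLabel
  { flOccName   :: OccName     -- label of the field
  , flSelector  :: a           -- record selector function
  , flInstances :: FldInsts a  -- instances for overloading
  } deriving (Eq, Show, Functor)

type FieldLabel = FieldLbl Name

-- Map every stored Name to its OccName, giving the shape that an
-- interface declaration (FieldLbl OccName) would hold.
toIfaceField :: FieldLabel -> FieldLbl OccName
toIfaceField = fmap nameOccName

-- The field x of T from the earlier example, defined in a module "M".
exampleField :: FieldLabel
exampleField = FieldLabel
  { flOccName   = OccName "x"
  , flSelector  = mk "$sel_x_T"
  , flInstances = FldInsts (mk "$dfHasTx") (mk "$dfUpdTx")
                           (mk "TFCo:R:GetResultTx") (mk "TFCo:R:SetResultTx")
  }
  where
    mk = Name "M" . OccName
```

The label itself is always an OccName; only the selector and instance names change representation.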

AvailInfo and IE

The new definition of AvailInfo is:

data AvailInfo      = Avail Name | AvailTC Name [Name] AvailFields
data AvailFlds name = NonOverloaded [name] | Overloaded [OccName]
type AvailFields    = AvailFlds Name

The AvailTC constructor represents a type and its pieces that are in scope. Record fields are now stored in a separate list (the third argument). If the fields are not overloaded, we store the selector names, whereas if they are overloaded, we store only the labels.

AMG This isn't quite enough, because we need to know the module of the selectors. (Data families mean this need not be the same as the parent's module.) I'm inclined to use Overloaded [(OccName, name)] in the interests of simplicity, and because otherwise we have to go to some trouble to avoid gresFromAvails outside the monad.

The IEThingWith name [name] [OccName] constructor of IE, which represents a thing that can be imported or exported, stores only the OccNames.

Parent and GlobalRdrElt

The Parent type has an extra constructor FldParent Name OccName that stores the parent Name and the field OccName. The GlobalRdrElt (GRE) for a field stores the selector name directly, and uses the FldParent constructor to store the field. Thus a field x of type T gives rise to this entry in the GlobalRdrEnv:

x |->  GRE $sel_x_T (FldParent T x) LocalDef

Note that the OccName used when adding a GRE to the environment (greOccName) now depends on the parent field: for FldParent it is the field label rather than the selector name. Since AvailInfo does not store selectors for overloaded fields, gresFromAvails is now defined in the TcRnIf monad so that it can call lookupOrig to find the selectors. As a consequence of this, GHC.getPackageModuleInfo cannot call gresFromAvails, so it now returns Nothing in minf_rdr_env.
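
The change to greOccName can be sketched as follows. This is a simplified model, not the real RdrName code: Name is reduced to a wrapper around OccName, provenance (LocalDef) is omitted, and fieldX and typeT are hypothetical example values.

```haskell
-- Hypothetical, simplified stand-ins for the GHC types.
newtype OccName = OccName String deriving (Eq, Show)
newtype Name    = Name OccName   deriving (Eq, Show)

nameOccName :: Name -> OccName
nameOccName (Name occ) = occ

data Parent
  = NoParent
  | ParentIs Name
  | FldParent Name OccName   -- parent type and field label
  deriving (Eq, Show)

data GlobalRdrElt = GRE
  { gre_name :: Name
  , gre_par  :: Parent
  } deriving (Eq, Show)

-- The key under which a GRE is stored: the field label for fields,
-- and the OccName of the stored name for everything else.
greOccName :: GlobalRdrElt -> OccName
greOccName (GRE _ (FldParent _ lbl)) = lbl
greOccName (GRE n _)                 = nameOccName n

-- The entry for field x of type T: keyed by x, but storing $sel_x_T.
fieldX :: GlobalRdrElt
fieldX = GRE (Name (OccName "$sel_x_T"))
             (FldParent (Name (OccName "T")) (OccName "x"))

-- An ordinary entry, keyed by its own name.
typeT :: GlobalRdrElt
typeT = GRE (Name (OccName "T")) NoParent
```

With this definition, looking up x in the environment finds the GRE whose gre_name is the mangled selector.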

Source expressions

The HsExpr type has extra constructors HsOverloadedRecFld OccName and HsSingleRecFld OccName id. When -XOverloadedRecordFields is enabled, and rnExpr encounters HsVar "x" where x refers to multiple GREs that are all record fields, it replaces it with HsOverloadedRecFld "x". When the typechecker sees HsOverloadedRecFld x it emits a wanted constraint Has alpha x beta and returns type alpha -> beta where alpha and beta are fresh unification variables.
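
As a rough source-level picture of what HsOverloadedRecFld "x" means after typechecking, the sketch below defines an ordinary function x with the type alpha -> beta and the constraint Has alpha "x" beta. The Has class here is a simplified, hypothetical version; sel_x_S and sel_x_T are hand-written stand-ins for the mangled selectors, and incremented and negated are illustrative names. The file has no main and is meant to be loaded in GHCi.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, FlexibleInstances, KindSignatures,
             MultiParamTypeClasses, TypeFamilies #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (Symbol)

-- Hypothetical, simplified version of the Has class.
class Has r (f :: Symbol) t where
  getField :: proxy f -> r -> t

data S = MkS Bool
data T = MkT Int

-- Hand-written analogues of the mangled selectors $sel_x_S and $sel_x_T.
sel_x_S :: S -> Bool
sel_x_S (MkS b) = b

sel_x_T :: T -> Int
sel_x_T (MkT n) = n

instance (a ~ Bool) => Has S "x" a where
  getField _ = sel_x_S

instance (a ~ Int) => Has T "x" a where
  getField _ = sel_x_T

-- An occurrence of the overloaded field x: a function alpha -> beta
-- with the wanted constraint Has alpha "x" beta.
x :: Has r "x" t => r -> t
x = getField (Proxy :: Proxy "x")

-- The record type at each use site selects the instance.
incremented :: Int
incremented = x (MkT 41) + 1

negated :: Bool
negated = not (x (MkS True))
```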

When the flag is not enabled, rnExpr turns an unambiguous record field foo into HsSingleRecFld foo $sel_foo_T. The point of this constructor is so we can pretty-print the field name but store the selector name for typechecking.

Where an AST representation type (e.g. HsRecField or ConDeclField) contained an argument of type Located id for a field, it now stores a Located RdrName for the label, and some representation of the selector. The parser uses an error thunk for the selector; it is filled in by the renamer (by rnHsRecFields1 in RnPat, and rnField in RnTypes). The new definition of ConDeclField (used in types) is:

data ConDeclField name
  = ConDeclField { cd_fld_lbl  :: Located RdrName,
                   cd_fld_sel  :: name,
                   cd_fld_type :: LBangType name, 
                   cd_fld_doc  :: Maybe LHsDocString }

The new definition of HsRecField is:

data HsRecField id arg = HsRecField {
        hsRecFieldLbl :: Located RdrName,
        hsRecFieldSel :: Either id [(id, id)],
        hsRecFieldArg :: arg,
        hsRecPun      :: Bool }

The renamer (rnHsRecFields1) supplies Left sel_name for the selector if it is unambiguous, or Right xs if it is ambiguous (because it is for a record update, and there are multiple fields with the correct label in scope). In the latter case, the possibilities xs are represented as a list of (parent name, selector name) pairs. The typechecker (tcExpr) tries three ways to disambiguate the update:

  1. Perhaps only one type has all the fields that are being updated.
  2. Use the type being pushed in, if it is already a TyConApp.
  3. Use the type signature of the record expression, if it exists and is a TyConApp.
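
The first strategy amounts to intersecting the candidate parents of every field in the update. Below is a minimal sketch over plain strings; it is not the tcExpr code. All names are hypothetical, and the second and third strategies are collapsed into a single optional "known parent" argument for illustration.

```haskell
import Data.List (intersect, nub)

type ParentName   = String
type SelectorName = String

-- One entry per field in the update: its candidate (parent, selector) pairs,
-- as in the Right xs case of hsRecFieldSel.
type Candidates = [(ParentName, SelectorName)]

-- Parents that could supply every field of the update.
commonParents :: [Candidates] -> [ParentName]
commonParents []     = []
commonParents fields = foldr1 intersect (map (nub . map fst) fields)

-- Strategy 1: succeed only if exactly one type has all the updated fields.
uniqueParent :: [Candidates] -> Maybe ParentName
uniqueParent fields = case commonParents fields of
  [p] -> Just p
  _   -> Nothing

-- Fall back to a parent known from elsewhere (a pushed-in type or a type
-- signature), provided it is one of the remaining candidates.
disambiguate :: Maybe ParentName -> [Candidates] -> Maybe ParentName
disambiguate known fields = case uniqueParent fields of
  Just p  -> Just p
  Nothing -> case known of
    Just p | p `elem` commonParents fields -> Just p
    _                                      -> Nothing
```

For example, an update of fields x and y, where x belongs to both S and T but y only to T, is resolved to T by the first strategy alone; an update of x only needs outside type information.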

Automatic instance generation

Typeclass and family instances are generated and typechecked by makeOverloadedRecFldInsts in TcInstDcls, regardless of whether or not the extension is enabled. This is called by tcTopSrcDecls to generate instances for fields from datatypes in the current group (just after derived instances, from deriving clauses, are generated). Overloaded record field instances are not exported to other modules (via tcg_insts), though underlying dfun ids and axioms are exported from the module as usual.

Since the instances are not in scope in the usual way, matchClassInst and tcLookupFamInst look for the relevant constraints or type families and find the instances directly, rather than consulting tcg_inst_env or tcg_fam_inst_env. They first perform a lookup to check that the field name is in scope. A new field tcg_fld_inst_env in TcGblEnv maps a selector name in the current module to its DFunIds and FamInsts; this is needed for solving constraints that arise while checking the automatically generated instances themselves.

AMG The instance lookup is currently implemented as a separate check, but needs to be integrated with the existing code to properly handle some obscure cases.

Unused imports

Unused imports and generation of the minimal import list (RnNames.warnUnusedImportDecls) use a map from selector names to labels, in order to print fields correctly. However, fields may currently be reported as unused even if the corresponding Has instance is used. Consider the following:

module A where
  data T = MkT { x,y::Int }

module B where
  data S = MkS { x,y::Bool }

module C where
  import A( T(x) )
  import B( S(x) )

  foo :: T -> Int
  foo r = r.x + 2

Now, do we expect to report the 'x' in S(x) import as unused? Actually the entire 'import B' is unused. Only the typechecker will eventually know that. But I think the type checker does actually record which instances are used, so perhaps we can make use of that info to give accurate unused-import info.

AMG I thought this would be the tcg_ev_binds field of the TcGblEnv, but this seems to be empty by the end of tcRnModule. We could also look at the inert_solved_dicts field of InertSet, but I'm not sure how to propagate the required information out of the TcS monad to the TcM monad where unused names are reported.

Deprecated field names

Consider the following:

module M where
  {-# DEPRECATED foo "Don't use foo" #-}

  data S = MkS { foo :: Int }
  data T = MkT { foo :: Int }

module N where
  import M

  data U = MkU { foo :: Int }

  goo = foo (MkT 42) 
  bar = foo (MkU 42)
  baz x = foo x

The DEPRECATED pragma applies to all fields foo exported by the module M, since it is based on the OccName. The renamer will issue a deprecation warning for every use of foo in N, regardless of whether it will later resolve to one of the fields from M (as in goo), a field definitely not in M (as in bar), or a polymorphic field (as in baz). It might be possible to delay the warnings to type-checking time and report deprecations more precisely, as for unused imports.

GADT record updates

Consider the example

data W a where
    MkW :: a ~ b => { x :: a, y :: b } -> W (a, b)

It would be nice to generate

-- setField :: proxy "x" -> W (a, b) -> a -> W (a, b)
setField _ s e = s { x = e }

but this record update is rejected by the typechecker, even though it is perfectly sensible, because of #2595. The currently implemented workaround is instead to generate the explicit update

setField _ (MkW _ y) x = MkW x y

which is fine, but rather long-winded if there are many constructors or fields. Essentially this is doing the job of the desugarer for record updates.

Note that W does not admit type-changing single update for either field, because of the a ~ b constraint. Without it, though, type-changing update should be allowed.

Type-changing update: phantom arguments

Consider the datatype

data T a = MkT { foo :: Int }

where a is a phantom type argument (it does not occur in the type of foo). The traditional update syntax can change the phantom argument, for example if r :: T Int then r { foo = 3 } :: T Bool typechecks. However, setField cannot do so, because this is illegal:

type instance SetResult (T a) "foo" Int = T b

Note that the result of the type family involves an unbound variable b.

In general, a use of setField can only change type variables that occur in the field type being updated, and do not occur in any of the other fields' types.
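
The traditional-update half of this claim can be checked with ordinary Haskell, with no extension involved. In the sketch below, retag is a hypothetical name introduced for illustration.

```haskell
data T a = MkT { foo :: Int } deriving (Eq, Show)

-- Traditional record update may change the phantom argument of T,
-- because a does not occur in the type of any field.
retag :: T Int -> T Bool
retag r = r { foo = 3 }
```

No setField counterpart with this type can be written, since the corresponding SetResult instance would need the unbound variable on its right-hand side.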

Data families

Consider the following:

data family F (a :: *) :: *
data instance F Int  = MkF1 { foo :: Int }
data instance F Bool = MkF2 { foo :: Bool }

This is perfectly sensible, and should give rise to two *different* record selectors foo, and corresponding Has instances:

instance t ~ Int => Has (F Int) "foo" t
instance t ~ Bool => Has (F Bool) "foo" t

However, what can we call the record selectors? They can't both be $sel_foo_F! Ideally we would use the name of the representation tycon, rather than the family tycon, but that isn't introduced until the typechecker (tcDataFamInstDecl in TcInstDcls), and we need to create the selector in the renamer (getLocalNonValBinders in RnNames). We can't just pick an arbitrary unique name, because we need to look up the selector to associate it with its data constructor (extendRecordFieldEnv in RnSource).

For the moment, I've simply disallowed duplicate fields for a single data family in a single module. It's fine to duplicate fields between different data families or across different modules, however.

Qualified names

Consider the following:

module M where
  data S = MkS { foo :: Int }

module N where
  data T = MkT { foo :: Int }
  data U = MkU { foo :: Int }

module O where
  import M
  import N

  f x = M.foo x
  g x = N.foo x
  h x = foo x

Should there be a difference between f, g and h? It would seem odd if f could turn out to use the foo from T or U even though it explicitly says M.foo. I can see three sensible options:

  • Treat qualified and unqualified fields identically, but issue a warning for qualified fields
  • Forbid referring to overloaded fields with qualified names (so f and g yield errors)
  • Treat a qualified name as a non-overloaded field, generating an ambiguity error if necessary (so f is okay but g is ambiguous)

Of course, it is fine to use a qualified name in a record update.

For now we've decided on the third option, allowing qualified names to refer only to a single field.

Mangling selector names

We could mangle selector names (using $sel_foo_T instead of foo) even when the extension is disabled, but we decided not to because the selectors really should be in scope with their original names, and doing otherwise leads to:

  • Trouble with import/export
  • Trouble with deriving instances in GHC.Generics (makes up un-renamed syntax using field RdrNames)
  • Boot files that export record selectors not working

Outstanding bugs

  • typechecker/should_fail/tcfail102 (changed error message)

To do

  • With fundep in class, we don't need it in the instance.
  • When there is only one thing in scope, don't make it polymorphic (but document trade-offs). But maybe it should still support lenses?
  • Forbid ambiguous qualified overloaded fields.
  • Add HsVarOut RdrName id instead of HsSingleRecFld (or perhaps rename HsVar to HsVarIn); also useful to recall how the user referred to something.
  • Sort out reporting of unused imports.
  • Haddock prints selector names in index and LaTeX exports list.
  • What's going on with deprecations and fixity decls?
  • Consider syntactic sugar for Upd constraints.
  • Improve unsolved Accessor p f error message where p is something silly?
  • Consider defaulting Accessor p to p = (->), and defaulting Has r "f" t constraints where there is only one datatype with a field f in scope.
  • Document the extension, including new warnings.