wiki:Records/OverloadedRecordFields/Plan

Version 14 (modified by adamgundry, 10 months ago) (diff)

tell the latest story

Overloaded record fields: a plan for implementation

This is a plan to implement overloaded record fields, along the lines of SPJ's Simple Overloaded Record Fields proposal, as a Google Summer of Code project. (See the GSoC project details, for reference.) The page on Records gives the motivation and many options. In particular, the proposal for Declared Overloaded Record Fields is closely related but makes some different design decisions.

Development of the extension is taking place on forks of the ghc and packages-base repositories (on branch 'overloaded-record-fields').

Motivation

A serious limitation of the Haskell record system is the inability to overload field names in record types: for example, if the data types

data Person  = Person  { personId :: Int, name :: String }
data Address = Address { personId :: Int, address :: String }

are declared in the same module, there is no way to determine which type an occurrence of the personId record selector refers to. A common workaround is to use a unique prefix for each record type, but this leads to less clear code and obfuscates relationships between fields of different records. Qualified names can be used to distinguish record selectors from different modules, but using one module per record is often impractical.

Instead, we want to be able to write polymorphic record projections, so that the ambiguous identifier personId is resolved using the type of e. In general, this requires a new form of constraint r { x :: t } stating that type r has a field x of type t. For example, the following declaration should be accepted:

getPersonId :: r { personId :: Int } => r -> Int
getPersonId e = personId e

A constraint R { x :: t } is solved if R is a datatype that has a field x of type t in scope. An error is generated if R has no field called x, it has the wrong type, or the field is not in scope.

Design

In the sequel, we will describe the -XOverloadedRecordFields extension, which permits multiple field declarations with the same label and introduces new record field constraints.

Previous versions of this proposal suggested changing the lexical syntax so that record projections could be written postfix, immediately following a dot. For example, e.personId would be roughly equivalent to personId e. This would be a breaking change (when the extension was enabled) as composition would need spaces around the dot operator. However, it would mean that the field name would not have to be in scope, allowing better library separation. For example, e.personId would be valid even if no personId fields were in scope.

In the light of feedback, we propose no changes to dot syntax for the time being. In the future, we could add a separate extension to treat dot as postfix function application. Note that the lens library encourages the use of dot with no spaces, as composition is used to chain lenses.

Projections

Record field constraints are introduced by projections. If there are two or more fields x in scope, then an occurrence of x has type a { x :: b } => a -> b instead of generating an ambiguity error. If there is a single field x in scope, then it refers to the usual monomorphic record selector (ensuring backwards compatibility). If there are any normal identifiers x in scope (as well as fields) then a use of x leads to an ambiguity error.

Record field constraints

Record field constraints r { x :: t } are syntactic sugar for typeclass constraints Has r "x" t, where

class Has (r :: *) (x :: Symbol) (t :: *) where
  getFld :: r -> t

Recall that Symbol is the kind of type-level strings. The notation extends to conjunctions: r {x :: tx, y :: ty} means (Has r "x" tx, Has r "y" ty). Note also that r and t might be arbitrary types, not just type variables or type constructors. For example, T (Maybe v) { x :: [Maybe v] } means (Has (T (Maybe b)) "x" [Maybe v]).

Instances for the Has typeclass are automatically generated (for modules with -XOverloadedRecordFields enabled) using the record fields that are in scope. For example, the data type

data T a = MkT { x :: [a] }

has the corresponding instance

instance (b ~ [a]) => Has (T a) "x" b where
  getFld (MkT { x = x }) = x

The (b ~ [a]) in the instance is important, so that we get an instance match from the first two fields only. For example, if the constraint Has (T c) "x" d is encountered during type inference, the instance will match and generate the constraints (a ~ c, b ~ d, b ~ [a]).

Representation hiding

At present, a datatype in one module can declare a field, but if the selector function is not exported, then the field is hidden from clients of the module. It is important to support this. Typeclasses in general have no controls over their scope, but for implicitly generated Has instances, the instance is available for a module if -XOverloadedRecordFields is enabled for that module and the record field selector function is in scope.

This enables representation hiding: just like at present, exporting the field selector permits access to the field. For example, consider the following module:

module M ( R(x) ) where

data R = R { x :: Int }
data S = S { x :: Bool }

Any module that imports M will have access to the x field from R but not from S, because the instance Has R "x" Int will be available but the instance Has S "x" Bool will not be. Thus R { x :: Int } will be solved but S { x :: Bool } will not.

Multiple modules and automatic instance generation

Note that Has instances are generated on a per-module basis, using the fields that are in scope for that module, and automatically generated instances are never exported. Thus it doesn't matter whether -XOverloadedRecordFields was on in the module that defined the datatype. The availability of the instances in a particular module depends only on whether the flag is enabled for that module.

Suppose module M imports module N, N imports module O, and only N has the extension enabled. Now N can project any field in scope (including those defined in O), but M cannot access any Has instances.

This means that

  • the extension is required whenever a Has constraint must be solved;
  • no new mechanism for hiding instances is required; and
  • records defined in existing modules (or other packages) without the extension can still be overloaded.

Higher-rank fields

If a field has a rank-1 type, the Has encoding works fine: for example,

data T = MkT { x :: forall a . a -> a }

gives rise to the instance

instance (b ~ a -> a) => Has T "x" b

However, if a field has a rank-2 type or higher (so the selector function has rank at least 3), things are looking dangerously impredicative:

data T b = MkT { x :: (forall a . a -> a) -> b }

would give

instance (c ~ ((forall a . a -> a) -> b)) => Has (T b) "x" c

but this is currently forbidden by GHC, even with -XImpredicativeTypes enabled. Indeed, it would not be much use if it were possible, because bidirectional type inference relies on being able to immediately infer the type of neutral terms like x e, but overloaded record fields prevent this. Non-overloaded field names are likely to be needed in this case.

Record update

Supporting polymorphic record update is rather more complex than polymorphic lookup. In particular:

  • the type of the record may change as a result of the update;
  • multiple fields must be updated simultaneously for an update to be type correct (so iterated single update is not enough); and
  • records may include higher-rank components.

These problems have already been described in some detail. In the interests of doing something, even if imperfect, we plan to support only monomorphic record update. For overloaded fields to be updated, a type signature may be required in order to specify the type being updated. For example,

e { x = t }

currently relies on the name x to determine the datatype of the record. If this is ambiguous, a type signature can be given either to e or to the whole expression. Thus either

  e :: T Int { x = t }

or

  e { x = t } :: T Int

will be accepted. (Really only the type constructor is needed, whereas this approach requires the whole type to be specified, but it seems simpler than inventing a whole new syntax.)

Design choices

Record update: avoiding redundant annotations

If e is a variable, whose type is given explicitly in the context, we could look it up rather than requiring it to be given again. Thus

f :: T Int -> T Int
f v = v { x = 5 }

would not require an extra annotation. On the other hand, we would need an annotation on the update in

  \v -> (v { x = 4 }, [v, w :: T Int])

because the type of v is only determined later, by constraint solving.

Annoyingly, nested updates will require some annotations. In the following example, the outer update need not be annotated (since v is a variable that is explicitly given a type by the context) but the inner update must be (since x v is not a variable):

  f :: T Int -> T Int
  f v = v { x = (x v){ y = 6 } }

Polymorphic record update for lenses

As noted above, supporting a polymorphic version of the existing record update syntax (in its full generality) is difficult. However, even if the existing record update syntax remains monomorphic, an additional motivation for polymorphic update comes from lens. If we automatically generate instances of an extra class like

class Set (r :: *) (x :: Symbol) (t :: *) where
  setFld :: r -> t -> r

and supply the instance (where &x is used for explicit type application of the x argument)

instance (Functor f, Has s x a, Set s x a, fs ~ f s, fa ~ f a) => Has (a -> fa) x (s -> fs) where
  getFld f r = setFld &s &x &a r <$> f (getFld &s &x &a r) 

then every record field (for which a Set instance can be generated) is automagically a lens. This reduces the need for the current name-mangling Template Haskell implemented in the lens library. (Note that this instance requires explicit type application, or a proxy-based workaround, in order to supply the x argument.)

More work is needed to identify the right way to formulate the Set class: type-changing update requires a slightly more general version, and there is a story for multiple update. Higher-rank fields remain problematic.

User-defined Has instances

Should the user be allowed to write explicit Has instances? For example:

instance ctx => Has r "x" t where
  getFld = blah :: r -> t

Even with an explicit Has instance as above, the name x will not be in scope unless a datatype has a field with name x. Thus it is not really useful. The previous proposal, where (.x) always meant "project out the x field", used explicit Has instances for virtual fields.

Hiding record selectors

Optionally, we could add a flag `-XNoMonoRecordFields` to suppress the record selectors. Just as -XOverloadedRecordFields applies to a client module, and generates Has instances for that module, so -XNoMonoRecordFields in a client module would hide all the record selectors that should otherwise be in scope. The idea is that another record system could use Template Haskell to generate functions in place of selectors, and these would not clash.

Since the selectors are hidden by clients (on import) rather than on export, fields can still be mentioned in import and export lists, to control access to them (as discussed in the representation hiding section), and used for record update.

Example of constraint solving

Consider the example

module M ( R(R, x), S(S, y), T(T, x) ) where

  data R = R { x :: Int }
  data S = S { x :: Bool, y :: Bool }
  data T = T { x :: forall a . a }

module N where
  import M

  foo e = x e
 
  bar :: Bool
  bar = foo T

  baz = foo S

  quux = y

When checking foo, e is a variable of unknown type alpha, and the projection generates the constraint alpha { x :: beta } where beta is fresh. This constraint cannot be solved immediately, so generalisation yields the type a { x :: b } => a -> b.

When checking bar, the application of foo gives rise to the constraint T { x :: Bool }, which is solved since Bool is an instance of forall a . a (the type T gives to x).

When checking baz, the constraint S { x :: gamma } is generated and rejected, since the x from S is not in scope.

When checking quux, the only y field in scope is of type S -> Bool so that is its type.