Changes between Version 20 and Version 21 of Records/OverloadedRecordFields


Ignore:
Timestamp:
Mar 2, 2014 1:54:44 PM (18 months ago)
Author:
adamgundry
Comment:

Overview of new OverloadedRecordFields extension

Legend:

Unmodified
Added
Removed
Modified
  • Records/OverloadedRecordFields

    v20 v21  
    1 = Simple Overloaded Record Fields (SORF) =
     1= OverloadedRecordFields =
    22
    3 This page summarises a possible design that would allow
    4 different records to share a single field label.  Although it's a simple enough
    5 idea there are numerous ramifications.  Records are a swamp!
     3The `OverloadedRecordFields` extension for GHC allows multiple record datatypes to share the same field names, and uses type information to disambiguate them. For more information, see:
    64
    7 '''Nevertheless, this is the simplest proposal that I know that satisfies the main
    8 request, that of having multiple records with the same field name.'''
     5 * [wiki:Records/OverloadedRecordFields/Design Design of the extension] (for power users)
     6 * [wiki:Records/OverloadedRecordFields/Implementation Notes on the implementation] (for GHC hackers)
    97
    10 See also a similar [http://research.microsoft.com/en-us/um/people/simonpj/Haskell/records.html 2003 proposal by Simon PJ and Greg Morrisset].  It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous record types.   Anonymous type could be an add-on feature to the design described here.
     8Content previously on this page has been moved to the [wiki:Records/OverloadedRecordFields/SORF SORF] page.
    119
    12 '''AMG note''': the [wiki:Records/OverloadedRecordFields/Plan version of this proposal that I am implementing] is described elsewhere. The linked page will be updated to reflect the final design as it evolves.
     10== Code ==
    1311
    14 = The base design =
     12Development of the extension is taking place on forks of the [https://github.com/adamgundry/ghc ghc], [https://github.com/adamgundry/packages-base packages-base] and [https://github.com/adamgundry/haddock haddock] repositories (on branch `overloaded-record-fields`). A [https://github.com/adamgundry/records-prototype prototype implementation] is also available.
    1513
    16 The '''base design''' has the following distinct components:
     14The plan is to merge these branches shortly after the release of GHC 7.8, so power users can try out the extension and we can refine the design. The extension should be stable and available for general use in GHC 7.10.
    1715
    18  * A library class
    19 {{{
    20 class Has (r :: *) (f :: String) (t :: *) where
    21   get :: r -> t
    22 }}}
     16== History ==
    2317
    24  * A record declaration generates an instance declaration for each field. For example
    25 {{{
    26 data T a = T1 { x :: a, y :: Bool }
    27          | T2 { x :: a }
    28 }}}
    29  would generate
    30 {{{
    31 instance (t~a) => Has (T a) "x" t where
    32   get (T1 x _) = x
    33   get (T2 x)   = x
    34 instance (t~Bool) => Has (T a) "y" t where
    35   get (T1 _ y) = y
    36 }}}
     18The extension was implemented in 2013 as a Google Summer of Code project, by Adam Gundry under the mentorship of Simon Peyton Jones.
    3719
    38  * Dot-notation would typecheck (and desugar to) as a call of the `get` method.  So `r.x` would typecheck just as if you had written `(get @ T _ @ "x" @ _ r)`, where the `@` means type application, and `_` means "some type".  (GHC doesn't support explicit type appplication yet, but you get the idea.)
    39 
    40 Each of these components embodies choices which are described, with their variants, below.
    41 
    42 Note that the proposal naturally supports record-polymorphic functions, such as
    43 {{{
    44 fullName :: (Has r "firstName" String, Has r "lastName" String)
    45          => r -> String
    46 fullName r = r.firstName ++ " " ++ r.lastName
    47 }}}
    48 admittedly with a rather verbose type (but see below).
    49 
    50 -----------------------------------
    51 = Design choices and variations =
    52 
    53 == The `String` type parameter to `Has` ==
    54 
    55 In the base design, the field is a `String` type parameter to the `Has` class.  Another possiblity would be to invent a new fake class for each field, thus
    56 {{{
    57 class Has_x (r :: *) (t :: *) where
    58   get_x :: r -> t
    59 }}}
    60 This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope).  I see no disadvantages to a `String` parameter, which will form part of GHC's new kind system.
    61 
    62 == Scope control by generalising the `String` type in `Has` ==
    63 
    64 The proposal above doesn't allow label names to be scoped: if one module internally uses `"field"` as a label name then another module can break the abstraction by using the same string `"field"`.
    65 
    66 We can fix this by instead of having
    67 {{{
    68 class Has (r :: *) (f :: String)       (t :: *) where
    69 }}}
    70 having something like
    71 {{{
    72 class Has (r :: *) (ft :: *) (f :: ft) (t :: *) where
    73 }}}
    74 (where `ft` stands for field type).
    75 
    76 The expression
    77 {{{
    78 foo.field
    79 }}}
    80 (starting with a lowercase letter) would behave as described above, with `ft ~ String`.
    81 
    82 But
    83 {{{
    84 foo.Field
    85 }}}
    86 (starting with an uppercase letter) would use the Field constructor that is in scope, e.g. if we had
    87 {{{
    88 data FieldT = Field
    89 }}}
    90 then `ft ~ FieldT`. Then we can choose whether or not to export `FieldT(Field)`.
    91 
    92 In my opinion, this is ugly, since the selector can be either a type name or a label and the semantics are nonsame.
    93 Rather, we need scoped instances.
    94 –strake888
    95 
    96 == Should `get` have a proxy argument? ==
    97 
    98 The type of `get` is very odd:
    99 {{{
    100 get :: Has r f t => r -> t
    101 }}}
    102 It's odd because "f" is not nentioned in the type after the "`=>`".  That in turn means you could not write an unambiguous call of `get`, at least not without support for explicit type application (which is another story).
    103 
    104 But I think it's OK not to be able to write a call to `get`, ''because that's what the dot-notation generates''.  The dot-notation produces a suitably-instantiated call to `get`. 
    105 
    106 It would be equally possible to have the slightly clunkier definition
    107 {{{
    108 data Proxy (a :: String)
    109 
    110 class Has (r :: *) (f :: String) (t :: *) where
    111    get :: r -> Proxy f -> t
    112 }}}
    113 Now `r.x` would typecheck and desugar as `get r (undefined :: Proxy "x")`.  This choice does not make a major difference either way.
    114 
    115 == Higher rank types and type functions ==
    116 
    117 It is very tempting to define `Has` with an associated type, like this:
    118 {{{
    119 class Has (r :: *) (f :: String) where
    120   type FieldTy r f :: *
    121   get :: r -> FieldTy r f
    122 }}}
    123 After all, the field type is fixed by the record type, isn't it?
    124 For example, we could get these instance declarations:
    125 {{{
    126 instance Has (T a) "x" where
    127   type FieldTy (T a) "x" = a
    128   get = ...as before...
    129 instance Has (T a) "y" where
    130   type FieldTy (T a) "y" = Bool
    131   get = ...as before...
    132 }}}
    133 But this approach fails for fields with higher rank types.  For example, this should work:
    134 {{{
    135 data HR = HR { rev :: forall a. [a] -> [a] }
    136 
    137 f :: HR -> ([Bool], [Char])
    138 f r = (r.rev [True], r.rev "hello")
    139 }}}
    140 Records with polymorphic fields are very important in practice, so it would be a
    141 major wart not to support them.
    142 
    143 However we are not allowed to say
    144 {{{
    145 instance Has HR "rev" where
    146   type FieldTy HR "rev" = forall a. [a] -> [a]
    147   get (HR rev) = rev
    148 }}}
    149 because we are not allowed to have a type instance whose RHS is a polytype, because then
    150 we don't konw when to instantiate it.  This restriction is '''not''' easy to lift.
    151 
    152 The base design instead gives `Has` ''three'' parameters, and uses
    153 a functional-dependency-like mechanism (but using equalities) for the result type.
    154 Using this we can deal with `HR`:
    155 {{{
    156 instance (t ~ [a] -> [a]) => Has HR "rev" t where
    157   get (HR rev) = rev
    158 }}}
    159 and all is well.
    160 
    161 == Virtual record selectors ==
    162 
    163 Suppose we have
    164 {{{
    165 data Shape = Rect Float Float | Circle Float | ...
    166 
    167 area :: Shape -> Float
    168 area (Rect x y) = x*y
    169 area (Circle r) = ...
    170 ...etc...
    171 }}}
    172 Can the user write this?
    173 {{{
    174 instance Has Shape "area" Float where
    175   get = area
    176 }}}
    177 Then he can write `shape.area` with the expected result.  I'll call
    178 `area` a '''virtual record selector''' because it can be used as if it
    179 is a record selector, but is implemented with more complex behaviour.
    180 
    181 The same idea works
    182 even for overloaded functions like `fullName` (whose definition appears above):
    183 {{{
    184 instance (Has r "firstName", Has r "lastName")
    185        => Has r "fullName" String where
    186   get = fullName
    187 }}}
    188 I see no reason to prohibit virtual record selectors; indeed
    189 we would have to go to extra trouble to do so. 
    190 On the contrary, one might want a pragma to generate the
    191 two lines of boilerplate.
    192 
    193 == Unboxed fields ==
    194 
    195 The mechanism does not work at all for records with unboxed fields:
    196 {{{
    197 data T = MkT { x :: Int# }
    198 }}}
    199 Reason: type-class constraints can only be instantiate with lifted types.  There is a good reason for this restriction, because unboxed types can have varying widths, so you can't generate code that works uniformly for boxed and unboxed types.
    200 
    201 This is a real problem, but it is one that we might be willing to live with.  In particular, it's fine to have UNPACKed fields:
    202 {{{
    203 data T = MkT { x :: {-# UNPACK #-} Int }
    204 }}}
    205 So you can have efficiently-represented records without having to expose the unboxed types.
    206 
    207 
    208 == Selectors as functions ==
    209 
    210 Currently a record declaration brings into scope its selectors, so that
    211 {{{
    212 data T = MkT { x,y :: Int }
    213 }}}
    214 brings into scope (as top-level functions) `x :: T -> Int` and `y :: T -> Int`.
    215 
    216 Does that still happen under the new design?  The simple answer is "yes,for backard compatibility",
    217 but note that if there were more than one record
    218 with the same field name, then there would be ''two'' top level functions both
    219 called `x`, so you could not call either of them directly,
    220 
    221 == Representation hiding ==
    222 
    223 Currently if you say
    224 {{{
    225 module M( T( x ) )
    226   data T = MkT { x,y :: Int }
    227 }}}
    228 then the selector `x` is exported, but not `y`.  This is quite important; we need
    229 a way to hide the representation of T, include the existence or otherwise of a 'y' field.
    230 
    231 What happens under the new system?  If we generate instances for `(Has
    232 T "x" Int)` and `(Has T "y" Int)`, then presumably they'd ''both'' be
    233 exported (since we have no way to restrict instances).  But that means
    234 we cannot ever hide the existence of a field, which seems Plain Wrong.
    235 
    236 I can think of three solutions, all unsatisfactory in one way or another:
    237 
    238  * An instance `(Has T "x" ty)` is exported only if the record selector for
    239    T is exported.  Again, this doesn't work for virtual record selectors;
    240    and it would require a new implementation mechanism to record exactly
    241    which instances were exported from a module.
    242 
    243  * An instance `(Has T "x" ty)` is ignored unless the record selector
    244    `x` of type `T` is in scope.  This would require an ''ad hoc'' hack to
    245    the instance lookup mechanism; and it doesn't work at all for
    246    virtual record selectors. 
    247 
    248  * Require that every instance of `Has` looks like
    249 {{{
    250 instance blah => Has T "f" ty where
    251    get = the_getter
    252 }}}
    253   where `the_getter` is a named function. 
    254   An instance `(Has T "x" ty)` is ignored unless the function
    255   witnessing the instance (`the_getter`) is in scope.  This has the
    256   merit of working for virtual selectors, but it means that instances
    257   of `Has` must be restricted to a special form (more to specify). 
    258   However it comes with a reasonable story: the instances are simply
    259   a way to choose among the many in-scope "x"'s.
    260 
    261  * Use the "Scope control by generalising the `String` type in `Has`" proposal above
    262 
    263 Solving this issue is important.
    264 
    265 ------------------------
    266 = Syntax =
    267 
    268 == The dot notation ==
    269 
    270 It is critical to support dot-notation. There are quite a few things to think about:
    271 
    272  * Dot notation must work in cascades (left-associatively), and with an expression to the left:
    273 {{{
    274   r.x
    275   r.x.y
    276   (foo v).y
    277 }}}
    278 
    279  * There is an overlap with the function composition operator, but that is already true with qualified names.  It comes down  this:
    280    * Function composition will only work when surrounded by spaces, thus `(f . g)`.
    281    * Dot-notation for record selection only works with no spaces.
    282 
    283  * The expression `(f r.x)` must, I think, mean `(f (r.x))`.  That is, dot notation binds more tightly than function application.
    284 
    285  * Should it be possible to partly apply dot-notation, in a section, like this: `map (.x) ts`?  (Note that the dual, `map (r.) fs` makes no sense.).  I'm inclined to take the simple approach and say "no", you have to write `map (\r -> r.x) ts`.
    286 
    287 == Syntactic sugar for `Has` ==
    288 
    289 The type of `fullName` above is rather verbose, and it might be nice to have
    290 some syntactic sugar for `Has`, perhaps something like this:
    291 {{{
    292 fullName :: r { firstName :: String, lastName :: String }
    293          => r -> String
    294 fullName r = r.firstName ++ " " ++ r.lastName
    295 
    296 instance r { firstName :: String, lastName :: String }
    297       => r { fullName :: String } where
    298   get = fullName
    299 }}}
    300 We replace "`Has r f t`" by "`r { f :: t }`", and allow multiple such constaints
    301 with the same `r` to be combined.  Similarly, multiple constraints with the
    302 same `r` and `t` could be combined further, just as they are in record declarations:
    303 {{{
    304 fullName :: r { firstName, lastName :: String }
    305          => r -> String
    306 }}}
    307 
    308 
    309 ------------------------
    310 = Record updates =
    311 
    312 At first it seems that record update could be handled readily, by adding a 'set' field to `Has`:
    313 {{{
    314 class Has (r :: *) (f :: String) (t :: *) where
    315   get :: r -> t
    316   set :: t -> r -> r
    317 }}}
    318 Now, similar to dot-notation, record update syntax `(r { x = e1, y = e2 })` would typecheck and
    319 desugar as if you had written `(set @ "x" e1 (set @ "y" e2 r))`.  (I've left out a couple of
    320 type parameters to save clutter.)
    321 
    322 There are three problems:
    323  * There is no natural `set` for "pseudo" fields like `area` or `fullName` above.  Maybe this could be solved by stratifying the classes:
    324 {{{
    325 class Has (r :: *) (f :: String) (t :: *) where
    326   get :: r -> t
    327 
    328 class Has r f t => Upd (r :: *) (f :: String) (t :: *) where
    329   set :: t -> r -> r
    330 }}}
    331  But that adds complexity, and casts into doubt the proposed syntactic sugar for `Has` predicates.
    332 
    333  * Haskell 98's record update can change the type of the record
    334  (there is a type-checked response to this bullet point [https://raw.github.com/ntc2/haskell-records/master/GHCWiki_SimpleOverloadedRecordFields.lhs here]):
    335 {{{
    336 data R a = R { x :: a }
    337 
    338 upd :: R Int -> R Bool
    339 upd r = r { x = True }
    340 }}}
    341  But `set` cannot accomodate this change of type, at least not without much more complexity.  Moreover, '''NO''' field-at-a-time encoding can deal with the case when more than one field shares the changed type. For example, this is legal Haskell:
    342 {{{
    343 data S a = S { x,y :: a }
    344 upd :: S Int -> S Bool
    345 upd s = s { x = True, y = False }
    346 }}}
    347  But any system that updates `x` and then `y` will have an ill-typed intermediate in which `x` has a `Bool` while `y` has an `Int`.  This is really the death-knell for any field-at-a-time story that seeks to be compatible with Haskell as she is now.
    348 
    349  * This approach doesn't work at all in the higher-rank case.  For example, if `r :: HR`, we are currently allowed to say
    350 {{{
    351    r { rev = reverse }
    352 }}}
    353  but if we say `set @ "rev" reverse r`, the `reverse` will be instantiated.  Another way to say it is that the putative  instance declaration
    354 {{{
    355 instance (t ~ [a] -> [a]) => Has HR "rev" t where
    356   set rv r = r { rev = rv }
    357 }}}
    358  will be rejected because `rv` is not polymorphic.  Again, this is '''not''' easy to fix.   
    359 
    360  This problem seems to be a killer: if record-update syntax is interpreted as a call to `set`, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of `r { rev = reverse }` we could say
    361 {{{
    362   case r of { HR { .. } -> HR { rev = reverse, .. } }
    363 }}}
    364  where I'm using GHC's dot-dot notation to fill in the missing fields.  But it's clumsy and would get worse if the data type had several constructors.)  And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of `Has` and `Upd`.
    365 
    366 I'm not very happy with any of this.  Faced with these difficulties,
    367 another alternative is to stick with the status quo for record
    368 updates; that is, not to support any sort of overloading.  But even
    369 ''that'' is problematic: what does `e { x = True }` mean if there are lots of "x" fields
    370 in scope (which is precisely what we want to allow). Haskell's current record-update
    371 syntax really relies on learning which type is involved, from the record selector; but if
    372 there are many x's, it can't figure it out.  Insisting that only one "x" selector must
    373 be in scope puts us back to square one.  I think the only decent solution is to
    374 invent a new syntax for monomorphic (non-overloaded) record update, something like this:
    375 {{{
    376    HR { r | rev = reverse }
    377 }}}
    378 In general
    379 {{{
    380    <type constructor> { expression | field = expression, ... }
    381 }}}
    382 Of course this isn't very satisfactory either
    383  * New syntax required
    384  * Not overloaded, so you can't abstract over it.
    385 
    386 == Alternative Proposal ==
    387 
    388 First, we define a class for types with a member at 'k':
    389 
    390 {{{
    391 class Has r k v where select :: r -> k -> v;
    392 }}}
    393 
    394 Next, we define a class for types with a mutable member of certain type:
    395 
    396 {{{
    397 class Quasifunctor r s k u v where qfmap :: k -> (u -> v) -> r -> s;
    398 }}}
    399 
    400 `Quasifunctor r s k u v` means that `r` and `s` have members of types `u` and `v`, in turn, both with selector `k`; thus, one can mutate the member at 'k' with an arbitrary function of type `u -> v`, and the overall function is of type `r -> s`; i.e. one can lift a function of type `u -> v` to a function of type `r -> s`. This is the record update.
    401 
    402 `qfmap` is the lifter function. The first argument serves to specify which member is meant.
    403 
    404 This ought to allow polymorphic mutation; to set member at "x" to value `x` is simply `qfmap (undefined :: "x") (const x)`, though this mechanism is of course more general; one could `qfmap` any arbitrary function, not just a constant function.
    405 
    406 For example:
    407 {{{
    408 data R a = R { x :: a };
    409 
    410 -- automatically-generated instances
    411 instance Has (R a) "x" a where ...
    412 instance Quasifunctor (R a) (R b) "x" a b where ...
    413 
    414 r = R { x = 256 };
    415 -- assign string value to r.x
    416 let s = qfmap (undefined :: "x") (const "Hello, world!") r in ...
    417 }}}
    418 
    419 ------------------------
    420 = Relationship to Type Directed Name Resolution =
    421 
    422 This proposal is quite closely related to the [http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution Type Directed Name Resolution] idea, becuase TDNR
    423 would internally generate `Has` constraints exactly as described above.  The difference is
    424 that TDNR wasn't intended to support ''abstraction'' over the constraint, and was explained
    425 rather differently.
     20 * [wiki:Records/OverloadedRecordFields/SORF Simple Overloaded Record Fields (SORF)], Simon PJ's original proposal
     21 * [wiki:Records/DeclaredOverloadedRecordFields Declared Overloaded Record Fields (DORF)], a counterpoint proposal by Anthony Clayden
     22 * [wiki:Records Discussion of the problem and possible solutions]
     23 * [http://www.google-melange.com/gsoc/project/google/gsoc2013/adamgundry/4766932662222848 Google Summer of Code project details]