Changes between Version 20 and Version 21 of Records/OverloadedRecordFields


Ignore:
Timestamp:
Mar 2, 2014 1:54:44 PM (12 months ago)
Author:
adamgundry
Comment:

Overview of new OverloadedRecordFields extension

Legend:

Unmodified
Added
Removed
Modified
  • Records/OverloadedRecordFields

    v20 v21  
    1 = Simple Overloaded Record Fields (SORF) = 
     1= OverloadedRecordFields = 
    22 
    3 This page summarises a possible design that would allow 
    4 different records to share a single field label.  Although it's a simple enough 
    5 idea there are numerous ramifications.  Records are a swamp!  
     3The `OverloadedRecordFields` extension for GHC allows multiple record datatypes to share the same field names, and uses type information to disambiguate them. For more information, see: 
    64 
    7 '''Nevertheless, this is the simplest proposal that I know that satisfies the main 
    8 request, that of having multiple records with the same field name.''' 
     5 * [wiki:Records/OverloadedRecordFields/Design Design of the extension] (for power users) 
     6 * [wiki:Records/OverloadedRecordFields/Implementation Notes on the implementation] (for GHC hackers) 
    97 
    10 See also a similar [http://research.microsoft.com/en-us/um/people/simonpj/Haskell/records.html 2003 proposal by Simon PJ and Greg Morrisset].  It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous record types.   Anonymous type could be an add-on feature to the design described here. 
     8Content previously on this page has been moved to the [wiki:Records/OverloadedRecordFields/SORF SORF] page. 
    119 
    12 '''AMG note''': the [wiki:Records/OverloadedRecordFields/Plan version of this proposal that I am implementing] is described elsewhere. The linked page will be updated to reflect the final design as it evolves. 
     10== Code == 
    1311 
    14 = The base design = 
     12Development of the extension is taking place on forks of the [https://github.com/adamgundry/ghc ghc], [https://github.com/adamgundry/packages-base packages-base] and [https://github.com/adamgundry/haddock haddock] repositories (on branch `overloaded-record-fields`). A [https://github.com/adamgundry/records-prototype prototype implementation] is also available. 
    1513 
    16 The '''base design''' has the following distinct components: 
     14The plan is to merge these branches shortly after the release of GHC 7.8, so power users can try out the extension and we can refine the design. The extension should be stable and available for general use in GHC 7.10. 
    1715 
    18  * A library class 
    19 {{{ 
    20 class Has (r :: *) (f :: String) (t :: *) where 
    21   get :: r -> t 
    22 }}} 
     16== History == 
    2317 
    24  * A record declaration generates an instance declaration for each field. For example 
    25 {{{ 
    26 data T a = T1 { x :: a, y :: Bool } 
    27          | T2 { x :: a } 
    28 }}} 
    29  would generate 
    30 {{{ 
    31 instance (t~a) => Has (T a) "x" t where 
    32   get (T1 x _) = x 
    33   get (T2 x)   = x 
    34 instance (t~Bool) => Has (T a) "y" t where 
    35   get (T1 _ y) = y 
    36 }}} 
     18The extension was implemented in 2013 as a Google Summer of Code project, by Adam Gundry under the mentorship of Simon Peyton Jones. 
    3719 
    38  * Dot-notation would typecheck (and desugar to) as a call of the `get` method.  So `r.x` would typecheck just as if you had written `(get @ T _ @ "x" @ _ r)`, where the `@` means type application, and `_` means "some type".  (GHC doesn't support explicit type appplication yet, but you get the idea.) 
    39  
    40 Each of these components embodies choices which are described, with their variants, below. 
    41  
    42 Note that the proposal naturally supports record-polymorphic functions, such as 
    43 {{{ 
    44 fullName :: (Has r "firstName" String, Has r "lastName" String) 
    45          => r -> String 
    46 fullName r = r.firstName ++ " " ++ r.lastName 
    47 }}} 
    48 admittedly with a rather verbose type (but see below). 
    49  
    50 ----------------------------------- 
    51 = Design choices and variations = 
    52  
    53 == The `String` type parameter to `Has` == 
    54  
    55 In the base design, the field is a `String` type parameter to the `Has` class.  Another possiblity would be to invent a new fake class for each field, thus 
    56 {{{ 
    57 class Has_x (r :: *) (t :: *) where 
    58   get_x :: r -> t 
    59 }}} 
    60 This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope).  I see no disadvantages to a `String` parameter, which will form part of GHC's new kind system. 
    61  
    62 == Scope control by generalising the `String` type in `Has` == 
    63  
    64 The proposal above doesn't allow label names to be scoped: if one module internally uses `"field"` as a label name then another module can break the abstraction by using the same string `"field"`. 
    65  
    66 We can fix this by instead of having 
    67 {{{ 
    68 class Has (r :: *) (f :: String)       (t :: *) where 
    69 }}} 
    70 having something like 
    71 {{{ 
    72 class Has (r :: *) (ft :: *) (f :: ft) (t :: *) where 
    73 }}} 
    74 (where `ft` stands for field type). 
    75  
    76 The expression 
    77 {{{ 
    78 foo.field 
    79 }}} 
    80 (starting with a lowercase letter) would behave as described above, with `ft ~ String`. 
    81  
    82 But 
    83 {{{ 
    84 foo.Field 
    85 }}} 
    86 (starting with an uppercase letter) would use the Field constructor that is in scope, e.g. if we had 
    87 {{{ 
    88 data FieldT = Field 
    89 }}} 
    90 then `ft ~ FieldT`. Then we can choose whether or not to export `FieldT(Field)`. 
    91  
    92 In my opinion, this is ugly, since the selector can be either a type name or a label and the semantics are nonsame. 
    93 Rather, we need scoped instances. 
    94 –strake888 
    95  
    96 == Should `get` have a proxy argument? == 
    97  
    98 The type of `get` is very odd: 
    99 {{{ 
    100 get :: Has r f t => r -> t 
    101 }}} 
    102 It's odd because "f" is not nentioned in the type after the "`=>`".  That in turn means you could not write an unambiguous call of `get`, at least not without support for explicit type application (which is another story). 
    103  
    104 But I think it's OK not to be able to write a call to `get`, ''because that's what the dot-notation generates''.  The dot-notation produces a suitably-instantiated call to `get`.   
    105  
    106 It would be equally possible to have the slightly clunkier definition 
    107 {{{ 
    108 data Proxy (a :: String) 
    109  
    110 class Has (r :: *) (f :: String) (t :: *) where 
    111    get :: r -> Proxy f -> t 
    112 }}} 
    113 Now `r.x` would typecheck and desugar as `get r (undefined :: Proxy "x")`.  This choice does not make a major difference either way. 
    114  
    115 == Higher rank types and type functions == 
    116  
    117 It is very tempting to define `Has` with an associated type, like this: 
    118 {{{ 
    119 class Has (r :: *) (f :: String) where 
    120   type FieldTy r f :: * 
    121   get :: r -> FieldTy r f 
    122 }}} 
    123 After all, the field type is fixed by the record type, isn't it? 
    124 For example, we could get these instance declarations: 
    125 {{{ 
    126 instance Has (T a) "x" where 
    127   type FieldTy (T a) "x" = a 
    128   get = ...as before... 
    129 instance Has (T a) "y" where 
    130   type FieldTy (T a) "y" = Bool  
    131   get = ...as before... 
    132 }}} 
    133 But this approach fails for fields with higher rank types.  For example, this should work: 
    134 {{{ 
    135 data HR = HR { rev :: forall a. [a] -> [a] } 
    136  
    137 f :: HR -> ([Bool], [Char]) 
    138 f r = (r.rev [True], r.rev "hello") 
    139 }}} 
    140 Records with polymorphic fields are very important in practice, so it would be a 
    141 major wart not to support them. 
    142  
    143 However we are not allowed to say 
    144 {{{ 
    145 instance Has HR "rev" where 
    146   type FieldTy HR "rev" = forall a. [a] -> [a] 
    147   get (HR rev) = rev 
    148 }}} 
    149 because we are not allowed to have a type instance whose RHS is a polytype, because then 
    150 we don't konw when to instantiate it.  This restriction is '''not''' easy to lift. 
    151  
    152 The base design instead gives `Has` ''three'' parameters, and uses  
    153 a functional-dependency-like mechanism (but using equalities) for the result type. 
    154 Using this we can deal with `HR`: 
    155 {{{ 
    156 instance (t ~ [a] -> [a]) => Has HR "rev" t where 
    157   get (HR rev) = rev 
    158 }}} 
    159 and all is well. 
    160  
    161 == Virtual record selectors == 
    162  
    163 Suppose we have 
    164 {{{ 
    165 data Shape = Rect Float Float | Circle Float | ... 
    166  
    167 area :: Shape -> Float 
    168 area (Rect x y) = x*y 
    169 area (Circle r) = ... 
    170 ...etc... 
    171 }}} 
    172 Can the user write this? 
    173 {{{ 
    174 instance Has Shape "area" Float where 
    175   get = area  
    176 }}} 
    177 Then he can write `shape.area` with the expected result.  I'll call 
    178 `area` a '''virtual record selector''' because it can be used as if it 
    179 is a record selector, but is implemented with more complex behaviour. 
    180  
    181 The same idea works 
    182 even for overloaded functions like `fullName` (whose definition appears above): 
    183 {{{ 
    184 instance (Has r "firstName", Has r "lastName")  
    185        => Has r "fullName" String where 
    186   get = fullName 
    187 }}} 
    188 I see no reason to prohibit virtual record selectors; indeed 
    189 we would have to go to extra trouble to do so.   
    190 On the contrary, one might want a pragma to generate the  
    191 two lines of boilerplate. 
    192  
    193 == Unboxed fields == 
    194  
    195 The mechanism does not work at all for records with unboxed fields: 
    196 {{{ 
    197 data T = MkT { x :: Int# } 
    198 }}} 
    199 Reason: type-class constraints can only be instantiate with lifted types.  There is a good reason for this restriction, because unboxed types can have varying widths, so you can't generate code that works uniformly for boxed and unboxed types. 
    200  
    201 This is a real problem, but it is one that we might be willing to live with.  In particular, it's fine to have UNPACKed fields: 
    202 {{{ 
    203 data T = MkT { x :: {-# UNPACK #-} Int } 
    204 }}} 
    205 So you can have efficiently-represented records without having to expose the unboxed types. 
    206  
    207  
    208 == Selectors as functions == 
    209  
    210 Currently a record declaration brings into scope its selectors, so that 
    211 {{{ 
    212 data T = MkT { x,y :: Int } 
    213 }}} 
    214 brings into scope (as top-level functions) `x :: T -> Int` and `y :: T -> Int`. 
    215  
    216 Does that still happen under the new design?  The simple answer is "yes,for backard compatibility", 
    217 but note that if there were more than one record 
    218 with the same field name, then there would be ''two'' top level functions both 
    219 called `x`, so you could not call either of them directly,  
    220  
    221 == Representation hiding == 
    222  
    223 Currently if you say 
    224 {{{ 
    225 module M( T( x ) ) 
    226   data T = MkT { x,y :: Int } 
    227 }}} 
    228 then the selector `x` is exported, but not `y`.  This is quite important; we need  
    229 a way to hide the representation of T, include the existence or otherwise of a 'y' field. 
    230  
    231 What happens under the new system?  If we generate instances for `(Has 
    232 T "x" Int)` and `(Has T "y" Int)`, then presumably they'd ''both'' be 
    233 exported (since we have no way to restrict instances).  But that means 
    234 we cannot ever hide the existence of a field, which seems Plain Wrong. 
    235  
    236 I can think of three solutions, all unsatisfactory in one way or another: 
    237  
    238  * An instance `(Has T "x" ty)` is exported only if the record selector for 
    239    T is exported.  Again, this doesn't work for virtual record selectors; 
    240    and it would require a new implementation mechanism to record exactly 
    241    which instances were exported from a module. 
    242  
    243  * An instance `(Has T "x" ty)` is ignored unless the record selector 
    244    `x` of type `T` is in scope.  This would require an ''ad hoc'' hack to 
    245    the instance lookup mechanism; and it doesn't work at all for  
    246    virtual record selectors.   
    247  
    248  * Require that every instance of `Has` looks like 
    249 {{{ 
    250 instance blah => Has T "f" ty where 
    251    get = the_getter 
    252 }}} 
    253   where `the_getter` is a named function.   
    254   An instance `(Has T "x" ty)` is ignored unless the function  
    255   witnessing the instance (`the_getter`) is in scope.  This has the 
    256   merit of working for virtual selectors, but it means that instances 
    257   of `Has` must be restricted to a special form (more to specify).   
    258   However it comes with a reasonable story: the instances are simply 
    259   a way to choose among the many in-scope "x"'s. 
    260  
    261  * Use the "Scope control by generalising the `String` type in `Has`" proposal above 
    262  
    263 Solving this issue is important. 
    264  
    265 ------------------------ 
    266 = Syntax =  
    267  
    268 == The dot notation == 
    269  
    270 It is critical to support dot-notation. There are quite a few things to think about: 
    271  
    272  * Dot notation must work in cascades (left-associatively), and with an expression to the left: 
    273 {{{ 
    274   r.x 
    275   r.x.y 
    276   (foo v).y 
    277 }}} 
    278  
    279  * There is an overlap with the function composition operator, but that is already true with qualified names.  It comes down  this:  
    280    * Function composition will only work when surrounded by spaces, thus `(f . g)`. 
    281    * Dot-notation for record selection only works with no spaces. 
    282  
    283  * The expression `(f r.x)` must, I think, mean `(f (r.x))`.  That is, dot notation binds more tightly than function application. 
    284  
    285  * Should it be possible to partly apply dot-notation, in a section, like this: `map (.x) ts`?  (Note that the dual, `map (r.) fs` makes no sense.).  I'm inclined to take the simple approach and say "no", you have to write `map (\r -> r.x) ts`. 
    286  
    287 == Syntactic sugar for `Has` == 
    288  
    289 The type of `fullName` above is rather verbose, and it might be nice to have 
    290 some syntactic sugar for `Has`, perhaps something like this: 
    291 {{{ 
    292 fullName :: r { firstName :: String, lastName :: String } 
    293          => r -> String 
    294 fullName r = r.firstName ++ " " ++ r.lastName 
    295  
    296 instance r { firstName :: String, lastName :: String } 
    297       => r { fullName :: String } where 
    298   get = fullName 
    299 }}} 
    300 We replace "`Has r f t`" by "`r { f :: t }`", and allow multiple such constaints 
    301 with the same `r` to be combined.  Similarly, multiple constraints with the  
    302 same `r` and `t` could be combined further, just as they are in record declarations: 
    303 {{{ 
    304 fullName :: r { firstName, lastName :: String } 
    305          => r -> String 
    306 }}} 
    307  
    308  
    309 ------------------------ 
    310 = Record updates = 
    311  
    312 At first it seems that record update could be handled readily, by adding a 'set' field to `Has`: 
    313 {{{ 
    314 class Has (r :: *) (f :: String) (t :: *) where 
    315   get :: r -> t 
    316   set :: t -> r -> r 
    317 }}} 
    318 Now, similar to dot-notation, record update syntax `(r { x = e1, y = e2 })` would typecheck and 
    319 desugar as if you had written `(set @ "x" e1 (set @ "y" e2 r))`.  (I've left out a couple of  
    320 type parameters to save clutter.)  
    321  
    322 There are three problems: 
    323  * There is no natural `set` for "pseudo" fields like `area` or `fullName` above.  Maybe this could be solved by stratifying the classes: 
    324 {{{ 
    325 class Has (r :: *) (f :: String) (t :: *) where 
    326   get :: r -> t 
    327  
    328 class Has r f t => Upd (r :: *) (f :: String) (t :: *) where 
    329   set :: t -> r -> r 
    330 }}} 
    331  But that adds complexity, and casts into doubt the proposed syntactic sugar for `Has` predicates. 
    332  
    333  * Haskell 98's record update can change the type of the record 
    334  (there is a type-checked response to this bullet point [https://raw.github.com/ntc2/haskell-records/master/GHCWiki_SimpleOverloadedRecordFields.lhs here]): 
    335 {{{ 
    336 data R a = R { x :: a } 
    337  
    338 upd :: R Int -> R Bool 
    339 upd r = r { x = True } 
    340 }}} 
    341  But `set` cannot accomodate this change of type, at least not without much more complexity.  Moreover, '''NO''' field-at-a-time encoding can deal with the case when more than one field shares the changed type. For example, this is legal Haskell: 
    342 {{{ 
    343 data S a = S { x,y :: a } 
    344 upd :: S Int -> S Bool 
    345 upd s = s { x = True, y = False } 
    346 }}} 
    347  But any system that updates `x` and then `y` will have an ill-typed intermediate in which `x` has a `Bool` while `y` has an `Int`.  This is really the death-knell for any field-at-a-time story that seeks to be compatible with Haskell as she is now. 
    348  
    349  * This approach doesn't work at all in the higher-rank case.  For example, if `r :: HR`, we are currently allowed to say 
    350 {{{ 
    351    r { rev = reverse }  
    352 }}} 
    353  but if we say `set @ "rev" reverse r`, the `reverse` will be instantiated.  Another way to say it is that the putative  instance declaration 
    354 {{{ 
    355 instance (t ~ [a] -> [a]) => Has HR "rev" t where 
    356   set rv r = r { rev = rv } 
    357 }}} 
    358  will be rejected because `rv` is not polymorphic.  Again, this is '''not''' easy to fix.    
    359  
    360  This problem seems to be a killer: if record-update syntax is interpreted as a call to `set`, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of `r { rev = reverse }` we could say 
    361 {{{ 
    362   case r of { HR { .. } -> HR { rev = reverse, .. } } 
    363 }}} 
    364  where I'm using GHC's dot-dot notation to fill in the missing fields.  But it's clumsy and would get worse if the data type had several constructors.)  And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of `Has` and `Upd`. 
    365  
    366 I'm not very happy with any of this.  Faced with these difficulties, 
    367 another alternative is to stick with the status quo for record 
    368 updates; that is, not to support any sort of overloading.  But even 
    369 ''that'' is problematic: what does `e { x = True }` mean if there are lots of "x" fields 
    370 in scope (which is precisely what we want to allow). Haskell's current record-update 
    371 syntax really relies on learning which type is involved, from the record selector; but if 
    372 there are many x's, it can't figure it out.  Insisting that only one "x" selector must 
    373 be in scope puts us back to square one.  I think the only decent solution is to  
    374 invent a new syntax for monomorphic (non-overloaded) record update, something like this: 
    375 {{{ 
    376    HR { r | rev = reverse } 
    377 }}} 
    378 In general 
    379 {{{ 
    380    <type constructor> { expression | field = expression, ... } 
    381 }}} 
    382 Of course this isn't very satisfactory either 
    383  * New syntax required 
    384  * Not overloaded, so you can't abstract over it. 
    385  
    386 == Alternative Proposal == 
    387  
    388 First, we define a class for types with a member at 'k': 
    389  
    390 {{{ 
    391 class Has r k v where select :: r -> k -> v; 
    392 }}} 
    393  
    394 Next, we define a class for types with a mutable member of certain type: 
    395  
    396 {{{ 
    397 class Quasifunctor r s k u v where qfmap :: k -> (u -> v) -> r -> s; 
    398 }}} 
    399  
    400 `Quasifunctor r s k u v` means that `r` and `s` have members of types `u` and `v`, in turn, both with selector `k`; thus, one can mutate the member at 'k' with an arbitrary function of type `u -> v`, and the overall function is of type `r -> s`; i.e. one can lift a function of type `u -> v` to a function of type `r -> s`. This is the record update. 
    401  
    402 `qfmap` is the lifter function. The first argument serves to specify which member is meant. 
    403  
    404 This ought to allow polymorphic mutation; to set member at "x" to value `x` is simply `qfmap (undefined :: "x") (const x)`, though this mechanism is of course more general; one could `qfmap` any arbitrary function, not just a constant function. 
    405  
    406 For example: 
    407 {{{ 
    408 data R a = R { x :: a }; 
    409  
    410 -- automatically-generated instances 
    411 instance Has (R a) "x" a where ... 
    412 instance Quasifunctor (R a) (R b) "x" a b where ... 
    413  
    414 r = R { x = 256 }; 
    415 -- assign string value to r.x 
    416 let s = qfmap (undefined :: "x") (const "Hello, world!") r in ... 
    417 }}} 
    418  
    419 ------------------------ 
    420 = Relationship to Type Directed Name Resolution = 
    421  
    422 This proposal is quite closely related to the [http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution Type Directed Name Resolution] idea, becuase TDNR  
    423 would internally generate `Has` constraints exactly as described above.  The difference is 
    424 that TDNR wasn't intended to support ''abstraction'' over the constraint, and was explained 
    425 rather differently. 
     20 * [wiki:Records/OverloadedRecordFields/SORF Simple Overloaded Record Fields (SORF)], Simon PJ's original proposal 
     21 * [wiki:Records/DeclaredOverloadedRecordFields Declared Overloaded Record Fields (DORF)], a counterpoint proposal by Anthony Clayden 
     22 * [wiki:Records Discussion of the problem and possible solutions] 
     23 * [http://www.google-melange.com/gsoc/project/google/gsoc2013/adamgundry/4766932662222848 Google Summer of Code project details]