Changes between Initial Version and Version 1 of Records/OverloadedRecordFields/SORF


Ignore:
Timestamp:
Mar 2, 2014 1:27:19 PM (17 months ago)
Author:
adamgundry
Comment:

Import Records/OverloadedRecordFields and update introduction

Legend:

Unmodified
Added
Removed
Modified
  • Records/OverloadedRecordFields/SORF

    v1 v1  
     1= Simple Overloaded Record Fields (SORF)  =
     2
     3'''This page describes the original design by Simon PJ for Simple Overloaded Record Fields, which eventually became the [wiki:Records/OverloadedRecordFields OverloadedRecordFields] extension. It is retained for historical purposes, and does not reflect the final design as implemented.'''
     4
     5This page summarises a possible design that would allow different records to share a single field label.  Although it's a simple enough idea there are numerous ramifications.  Records are a swamp! Nevertheless, this is the simplest proposal that I know that satisfies the main request, that of having multiple records with the same field name.
     6
     7See also a similar [http://research.microsoft.com/en-us/um/people/simonpj/Haskell/records.html 2003 proposal by Simon PJ and Greg Morrisset].  It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous record types.   Anonymous type could be an add-on feature to the design described here.
     8
     9= The base design =
     10
     11The '''base design''' has the following distinct components:
     12
     13 * A library class
     14{{{
     15class Has (r :: *) (f :: String) (t :: *) where
     16  get :: r -> t
     17}}}
     18
     19 * A record declaration generates an instance declaration for each field. For example
     20{{{
     21data T a = T1 { x :: a, y :: Bool }
     22         | T2 { x :: a }
     23}}}
     24 would generate
     25{{{
     26instance (t~a) => Has (T a) "x" t where
     27  get (T1 x _) = x
     28  get (T2 x)   = x
     29instance (t~Bool) => Has (T a) "y" t where
     30  get (T1 _ y) = y
     31}}}
     32
     33 * Dot-notation would typecheck (and desugar to) as a call of the `get` method.  So `r.x` would typecheck just as if you had written `(get @ T _ @ "x" @ _ r)`, where the `@` means type application, and `_` means "some type".  (GHC doesn't support explicit type appplication yet, but you get the idea.)
     34
     35Each of these components embodies choices which are described, with their variants, below.
     36
     37Note that the proposal naturally supports record-polymorphic functions, such as
     38{{{
     39fullName :: (Has r "firstName" String, Has r "lastName" String)
     40         => r -> String
     41fullName r = r.firstName ++ " " ++ r.lastName
     42}}}
     43admittedly with a rather verbose type (but see below).
     44
     45-----------------------------------
     46= Design choices and variations =
     47
     48== The `String` type parameter to `Has` ==
     49
     50In the base design, the field is a `String` type parameter to the `Has` class.  Another possiblity would be to invent a new fake class for each field, thus
     51{{{
     52class Has_x (r :: *) (t :: *) where
     53  get_x :: r -> t
     54}}}
     55This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope).  I see no disadvantages to a `String` parameter, which will form part of GHC's new kind system.
     56
     57== Scope control by generalising the `String` type in `Has` ==
     58
     59The proposal above doesn't allow label names to be scoped: if one module internally uses `"field"` as a label name then another module can break the abstraction by using the same string `"field"`.
     60
     61We can fix this by instead of having
     62{{{
     63class Has (r :: *) (f :: String)       (t :: *) where
     64}}}
     65having something like
     66{{{
     67class Has (r :: *) (ft :: *) (f :: ft) (t :: *) where
     68}}}
     69(where `ft` stands for field type).
     70
     71The expression
     72{{{
     73foo.field
     74}}}
     75(starting with a lowercase letter) would behave as described above, with `ft ~ String`.
     76
     77But
     78{{{
     79foo.Field
     80}}}
     81(starting with an uppercase letter) would use the Field constructor that is in scope, e.g. if we had
     82{{{
     83data FieldT = Field
     84}}}
     85then `ft ~ FieldT`. Then we can choose whether or not to export `FieldT(Field)`.
     86
     87In my opinion, this is ugly, since the selector can be either a type name or a label and the semantics are nonsame.
     88Rather, we need scoped instances.
     89–strake888
     90
     91== Should `get` have a proxy argument? ==
     92
     93The type of `get` is very odd:
     94{{{
     95get :: Has r f t => r -> t
     96}}}
     97It's odd because "f" is not nentioned in the type after the "`=>`".  That in turn means you could not write an unambiguous call of `get`, at least not without support for explicit type application (which is another story).
     98
     99But I think it's OK not to be able to write a call to `get`, ''because that's what the dot-notation generates''.  The dot-notation produces a suitably-instantiated call to `get`. 
     100
     101It would be equally possible to have the slightly clunkier definition
     102{{{
     103data Proxy (a :: String)
     104
     105class Has (r :: *) (f :: String) (t :: *) where
     106   get :: r -> Proxy f -> t
     107}}}
     108Now `r.x` would typecheck and desugar as `get r (undefined :: Proxy "x")`.  This choice does not make a major difference either way.
     109
     110== Higher rank types and type functions ==
     111
     112It is very tempting to define `Has` with an associated type, like this:
     113{{{
     114class Has (r :: *) (f :: String) where
     115  type FieldTy r f :: *
     116  get :: r -> FieldTy r f
     117}}}
     118After all, the field type is fixed by the record type, isn't it?
     119For example, we could get these instance declarations:
     120{{{
     121instance Has (T a) "x" where
     122  type FieldTy (T a) "x" = a
     123  get = ...as before...
     124instance Has (T a) "y" where
     125  type FieldTy (T a) "y" = Bool
     126  get = ...as before...
     127}}}
     128But this approach fails for fields with higher rank types.  For example, this should work:
     129{{{
     130data HR = HR { rev :: forall a. [a] -> [a] }
     131
     132f :: HR -> ([Bool], [Char])
     133f r = (r.rev [True], r.rev "hello")
     134}}}
     135Records with polymorphic fields are very important in practice, so it would be a
     136major wart not to support them.
     137
     138However we are not allowed to say
     139{{{
     140instance Has HR "rev" where
     141  type FieldTy HR "rev" = forall a. [a] -> [a]
     142  get (HR rev) = rev
     143}}}
     144because we are not allowed to have a type instance whose RHS is a polytype, because then
     145we don't konw when to instantiate it.  This restriction is '''not''' easy to lift.
     146
     147The base design instead gives `Has` ''three'' parameters, and uses
     148a functional-dependency-like mechanism (but using equalities) for the result type.
     149Using this we can deal with `HR`:
     150{{{
     151instance (t ~ [a] -> [a]) => Has HR "rev" t where
     152  get (HR rev) = rev
     153}}}
     154and all is well.
     155
     156== Virtual record selectors ==
     157
     158Suppose we have
     159{{{
     160data Shape = Rect Float Float | Circle Float | ...
     161
     162area :: Shape -> Float
     163area (Rect x y) = x*y
     164area (Circle r) = ...
     165...etc...
     166}}}
     167Can the user write this?
     168{{{
     169instance Has Shape "area" Float where
     170  get = area
     171}}}
     172Then he can write `shape.area` with the expected result.  I'll call
     173`area` a '''virtual record selector''' because it can be used as if it
     174is a record selector, but is implemented with more complex behaviour.
     175
     176The same idea works
     177even for overloaded functions like `fullName` (whose definition appears above):
     178{{{
     179instance (Has r "firstName", Has r "lastName")
     180       => Has r "fullName" String where
     181  get = fullName
     182}}}
     183I see no reason to prohibit virtual record selectors; indeed
     184we would have to go to extra trouble to do so. 
     185On the contrary, one might want a pragma to generate the
     186two lines of boilerplate.
     187
     188== Unboxed fields ==
     189
     190The mechanism does not work at all for records with unboxed fields:
     191{{{
     192data T = MkT { x :: Int# }
     193}}}
     194Reason: type-class constraints can only be instantiate with lifted types.  There is a good reason for this restriction, because unboxed types can have varying widths, so you can't generate code that works uniformly for boxed and unboxed types.
     195
     196This is a real problem, but it is one that we might be willing to live with.  In particular, it's fine to have UNPACKed fields:
     197{{{
     198data T = MkT { x :: {-# UNPACK #-} Int }
     199}}}
     200So you can have efficiently-represented records without having to expose the unboxed types.
     201
     202
     203== Selectors as functions ==
     204
     205Currently a record declaration brings into scope its selectors, so that
     206{{{
     207data T = MkT { x,y :: Int }
     208}}}
     209brings into scope (as top-level functions) `x :: T -> Int` and `y :: T -> Int`.
     210
     211Does that still happen under the new design?  The simple answer is "yes,for backard compatibility",
     212but note that if there were more than one record
     213with the same field name, then there would be ''two'' top level functions both
     214called `x`, so you could not call either of them directly,
     215
     216== Representation hiding ==
     217
     218Currently if you say
     219{{{
     220module M( T( x ) )
     221  data T = MkT { x,y :: Int }
     222}}}
     223then the selector `x` is exported, but not `y`.  This is quite important; we need
     224a way to hide the representation of T, include the existence or otherwise of a 'y' field.
     225
     226What happens under the new system?  If we generate instances for `(Has
     227T "x" Int)` and `(Has T "y" Int)`, then presumably they'd ''both'' be
     228exported (since we have no way to restrict instances).  But that means
     229we cannot ever hide the existence of a field, which seems Plain Wrong.
     230
     231I can think of three solutions, all unsatisfactory in one way or another:
     232
     233 * An instance `(Has T "x" ty)` is exported only if the record selector for
     234   T is exported.  Again, this doesn't work for virtual record selectors;
     235   and it would require a new implementation mechanism to record exactly
     236   which instances were exported from a module.
     237
     238 * An instance `(Has T "x" ty)` is ignored unless the record selector
     239   `x` of type `T` is in scope.  This would require an ''ad hoc'' hack to
     240   the instance lookup mechanism; and it doesn't work at all for
     241   virtual record selectors. 
     242
     243 * Require that every instance of `Has` looks like
     244{{{
     245instance blah => Has T "f" ty where
     246   get = the_getter
     247}}}
     248  where `the_getter` is a named function. 
     249  An instance `(Has T "x" ty)` is ignored unless the function
     250  witnessing the instance (`the_getter`) is in scope.  This has the
     251  merit of working for virtual selectors, but it means that instances
     252  of `Has` must be restricted to a special form (more to specify). 
     253  However it comes with a reasonable story: the instances are simply
     254  a way to choose among the many in-scope "x"'s.
     255
     256 * Use the "Scope control by generalising the `String` type in `Has`" proposal above
     257
     258Solving this issue is important.
     259
     260------------------------
     261= Syntax =
     262
     263== The dot notation ==
     264
     265It is critical to support dot-notation. There are quite a few things to think about:
     266
     267 * Dot notation must work in cascades (left-associatively), and with an expression to the left:
     268{{{
     269  r.x
     270  r.x.y
     271  (foo v).y
     272}}}
     273
     274 * There is an overlap with the function composition operator, but that is already true with qualified names.  It comes down  this:
     275   * Function composition will only work when surrounded by spaces, thus `(f . g)`.
     276   * Dot-notation for record selection only works with no spaces.
     277
     278 * The expression `(f r.x)` must, I think, mean `(f (r.x))`.  That is, dot notation binds more tightly than function application.
     279
     280 * Should it be possible to partly apply dot-notation, in a section, like this: `map (.x) ts`?  (Note that the dual, `map (r.) fs` makes no sense.).  I'm inclined to take the simple approach and say "no", you have to write `map (\r -> r.x) ts`.
     281
     282== Syntactic sugar for `Has` ==
     283
     284The type of `fullName` above is rather verbose, and it might be nice to have
     285some syntactic sugar for `Has`, perhaps something like this:
     286{{{
     287fullName :: r { firstName :: String, lastName :: String }
     288         => r -> String
     289fullName r = r.firstName ++ " " ++ r.lastName
     290
     291instance r { firstName :: String, lastName :: String }
     292      => r { fullName :: String } where
     293  get = fullName
     294}}}
     295We replace "`Has r f t`" by "`r { f :: t }`", and allow multiple such constaints
     296with the same `r` to be combined.  Similarly, multiple constraints with the
     297same `r` and `t` could be combined further, just as they are in record declarations:
     298{{{
     299fullName :: r { firstName, lastName :: String }
     300         => r -> String
     301}}}
     302
     303
     304------------------------
     305= Record updates =
     306
     307At first it seems that record update could be handled readily, by adding a 'set' field to `Has`:
     308{{{
     309class Has (r :: *) (f :: String) (t :: *) where
     310  get :: r -> t
     311  set :: t -> r -> r
     312}}}
     313Now, similar to dot-notation, record update syntax `(r { x = e1, y = e2 })` would typecheck and
     314desugar as if you had written `(set @ "x" e1 (set @ "y" e2 r))`.  (I've left out a couple of
     315type parameters to save clutter.)
     316
     317There are three problems:
     318 * There is no natural `set` for "pseudo" fields like `area` or `fullName` above.  Maybe this could be solved by stratifying the classes:
     319{{{
     320class Has (r :: *) (f :: String) (t :: *) where
     321  get :: r -> t
     322
     323class Has r f t => Upd (r :: *) (f :: String) (t :: *) where
     324  set :: t -> r -> r
     325}}}
     326 But that adds complexity, and casts into doubt the proposed syntactic sugar for `Has` predicates.
     327
     328 * Haskell 98's record update can change the type of the record
     329 (there is a type-checked response to this bullet point [https://raw.github.com/ntc2/haskell-records/master/GHCWiki_SimpleOverloadedRecordFields.lhs here]):
     330{{{
     331data R a = R { x :: a }
     332
     333upd :: R Int -> R Bool
     334upd r = r { x = True }
     335}}}
     336 But `set` cannot accomodate this change of type, at least not without much more complexity.  Moreover, '''NO''' field-at-a-time encoding can deal with the case when more than one field shares the changed type. For example, this is legal Haskell:
     337{{{
     338data S a = S { x,y :: a }
     339upd :: S Int -> S Bool
     340upd s = s { x = True, y = False }
     341}}}
     342 But any system that updates `x` and then `y` will have an ill-typed intermediate in which `x` has a `Bool` while `y` has an `Int`.  This is really the death-knell for any field-at-a-time story that seeks to be compatible with Haskell as she is now.
     343
     344 * This approach doesn't work at all in the higher-rank case.  For example, if `r :: HR`, we are currently allowed to say
     345{{{
     346   r { rev = reverse }
     347}}}
     348 but if we say `set @ "rev" reverse r`, the `reverse` will be instantiated.  Another way to say it is that the putative  instance declaration
     349{{{
     350instance (t ~ [a] -> [a]) => Has HR "rev" t where
     351  set rv r = r { rev = rv }
     352}}}
     353 will be rejected because `rv` is not polymorphic.  Again, this is '''not''' easy to fix.   
     354
     355 This problem seems to be a killer: if record-update syntax is interpreted as a call to `set`, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of `r { rev = reverse }` we could say
     356{{{
     357  case r of { HR { .. } -> HR { rev = reverse, .. } }
     358}}}
     359 where I'm using GHC's dot-dot notation to fill in the missing fields.  But it's clumsy and would get worse if the data type had several constructors.)  And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of `Has` and `Upd`.
     360
     361I'm not very happy with any of this.  Faced with these difficulties,
     362another alternative is to stick with the status quo for record
     363updates; that is, not to support any sort of overloading.  But even
     364''that'' is problematic: what does `e { x = True }` mean if there are lots of "x" fields
     365in scope (which is precisely what we want to allow). Haskell's current record-update
     366syntax really relies on learning which type is involved, from the record selector; but if
     367there are many x's, it can't figure it out.  Insisting that only one "x" selector must
     368be in scope puts us back to square one.  I think the only decent solution is to
     369invent a new syntax for monomorphic (non-overloaded) record update, something like this:
     370{{{
     371   HR { r | rev = reverse }
     372}}}
     373In general
     374{{{
     375   <type constructor> { expression | field = expression, ... }
     376}}}
     377Of course this isn't very satisfactory either
     378 * New syntax required
     379 * Not overloaded, so you can't abstract over it.
     380
     381== Alternative Proposal ==
     382
     383First, we define a class for types with a member at 'k':
     384
     385{{{
     386class Has r k v where select :: r -> k -> v;
     387}}}
     388
     389Next, we define a class for types with a mutable member of certain type:
     390
     391{{{
     392class Quasifunctor r s k u v where qfmap :: k -> (u -> v) -> r -> s;
     393}}}
     394
     395`Quasifunctor r s k u v` means that `r` and `s` have members of types `u` and `v`, in turn, both with selector `k`; thus, one can mutate the member at 'k' with an arbitrary function of type `u -> v`, and the overall function is of type `r -> s`; i.e. one can lift a function of type `u -> v` to a function of type `r -> s`. This is the record update.
     396
     397`qfmap` is the lifter function. The first argument serves to specify which member is meant.
     398
     399This ought to allow polymorphic mutation; to set member at "x" to value `x` is simply `qfmap (undefined :: "x") (const x)`, though this mechanism is of course more general; one could `qfmap` any arbitrary function, not just a constant function.
     400
     401For example:
     402{{{
     403data R a = R { x :: a };
     404
     405-- automatically-generated instances
     406instance Has (R a) "x" a where ...
     407instance Quasifunctor (R a) (R b) "x" a b where ...
     408
     409r = R { x = 256 };
     410-- assign string value to r.x
     411let s = qfmap (undefined :: "x") (const "Hello, world!") r in ...
     412}}}
     413
     414------------------------
     415= Relationship to Type Directed Name Resolution =
     416
     417This proposal is quite closely related to the [http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution Type Directed Name Resolution] idea, becuase TDNR
     418would internally generate `Has` constraints exactly as described above.  The difference is
     419that TDNR wasn't intended to support ''abstraction'' over the constraint, and was explained
     420rather differently.