Changes between Initial Version and Version 1 of Records/OverloadedRecordFields/SORF

Mar 2, 2014 1:27:19 PM (17 months ago)

Import Records/OverloadedRecordFields and update introduction


  • Records/OverloadedRecordFields/SORF

    v1 v1  
     1= Simple Overloaded Record Fields (SORF)  =
     3'''This page describes the original design by Simon PJ for Simple Overloaded Record Fields, which eventually became the [wiki:Records/OverloadedRecordFields OverloadedRecordFields] extension. It is retained for historical purposes, and does not reflect the final design as implemented.'''
     5This page summarises a possible design that would allow different records to share a single field label.  Although it's a simple enough idea there are numerous ramifications.  Records are a swamp! Nevertheless, this is the simplest proposal that I know that satisfies the main request, that of having multiple records with the same field name.
     7See also a similar [ 2003 proposal by Simon PJ and Greg Morrisset].  It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous record types.   Anonymous type could be an add-on feature to the design described here.
     9= The base design =
     11The '''base design''' has the following distinct components:
     13 * A library class
     15class Has (r :: *) (f :: String) (t :: *) where
     16  get :: r -> t
     19 * A record declaration generates an instance declaration for each field. For example
     21data T a = T1 { x :: a, y :: Bool }
     22         | T2 { x :: a }
     24 would generate
     26instance (t~a) => Has (T a) "x" t where
     27  get (T1 x _) = x
     28  get (T2 x)   = x
     29instance (t~Bool) => Has (T a) "y" t where
     30  get (T1 _ y) = y
     33 * Dot-notation would typecheck (and desugar to) as a call of the `get` method.  So `r.x` would typecheck just as if you had written `(get @ T _ @ "x" @ _ r)`, where the `@` means type application, and `_` means "some type".  (GHC doesn't support explicit type appplication yet, but you get the idea.)
     35Each of these components embodies choices which are described, with their variants, below.
     37Note that the proposal naturally supports record-polymorphic functions, such as
     39fullName :: (Has r "firstName" String, Has r "lastName" String)
     40         => r -> String
     41fullName r = r.firstName ++ " " ++ r.lastName
     43admittedly with a rather verbose type (but see below).
     46= Design choices and variations =
     48== The `String` type parameter to `Has` ==
     50In the base design, the field is a `String` type parameter to the `Has` class.  Another possiblity would be to invent a new fake class for each field, thus
     52class Has_x (r :: *) (t :: *) where
     53  get_x :: r -> t
     55This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope).  I see no disadvantages to a `String` parameter, which will form part of GHC's new kind system.
     57== Scope control by generalising the `String` type in `Has` ==
     59The proposal above doesn't allow label names to be scoped: if one module internally uses `"field"` as a label name then another module can break the abstraction by using the same string `"field"`.
     61We can fix this by instead of having
     63class Has (r :: *) (f :: String)       (t :: *) where
     65having something like
     67class Has (r :: *) (ft :: *) (f :: ft) (t :: *) where
     69(where `ft` stands for field type).
     71The expression
     75(starting with a lowercase letter) would behave as described above, with `ft ~ String`.
     81(starting with an uppercase letter) would use the Field constructor that is in scope, e.g. if we had
     83data FieldT = Field
     85then `ft ~ FieldT`. Then we can choose whether or not to export `FieldT(Field)`.
     87In my opinion, this is ugly, since the selector can be either a type name or a label and the semantics are nonsame.
     88Rather, we need scoped instances.
     91== Should `get` have a proxy argument? ==
     93The type of `get` is very odd:
     95get :: Has r f t => r -> t
     97It's odd because "f" is not nentioned in the type after the "`=>`".  That in turn means you could not write an unambiguous call of `get`, at least not without support for explicit type application (which is another story).
     99But I think it's OK not to be able to write a call to `get`, ''because that's what the dot-notation generates''.  The dot-notation produces a suitably-instantiated call to `get`. 
     101It would be equally possible to have the slightly clunkier definition
     103data Proxy (a :: String)
     105class Has (r :: *) (f :: String) (t :: *) where
     106   get :: r -> Proxy f -> t
     108Now `r.x` would typecheck and desugar as `get r (undefined :: Proxy "x")`.  This choice does not make a major difference either way.
     110== Higher rank types and type functions ==
     112It is very tempting to define `Has` with an associated type, like this:
     114class Has (r :: *) (f :: String) where
     115  type FieldTy r f :: *
     116  get :: r -> FieldTy r f
     118After all, the field type is fixed by the record type, isn't it?
     119For example, we could get these instance declarations:
     121instance Has (T a) "x" where
     122  type FieldTy (T a) "x" = a
     123  get = before...
     124instance Has (T a) "y" where
     125  type FieldTy (T a) "y" = Bool
     126  get = before...
     128But this approach fails for fields with higher rank types.  For example, this should work:
     130data HR = HR { rev :: forall a. [a] -> [a] }
     132f :: HR -> ([Bool], [Char])
     133f r = (r.rev [True], r.rev "hello")
     135Records with polymorphic fields are very important in practice, so it would be a
     136major wart not to support them.
     138However we are not allowed to say
     140instance Has HR "rev" where
     141  type FieldTy HR "rev" = forall a. [a] -> [a]
     142  get (HR rev) = rev
     144because we are not allowed to have a type instance whose RHS is a polytype, because then
     145we don't konw when to instantiate it.  This restriction is '''not''' easy to lift.
     147The base design instead gives `Has` ''three'' parameters, and uses
     148a functional-dependency-like mechanism (but using equalities) for the result type.
     149Using this we can deal with `HR`:
     151instance (t ~ [a] -> [a]) => Has HR "rev" t where
     152  get (HR rev) = rev
     154and all is well.
     156== Virtual record selectors ==
     158Suppose we have
     160data Shape = Rect Float Float | Circle Float | ...
     162area :: Shape -> Float
     163area (Rect x y) = x*y
     164area (Circle r) = ...
     167Can the user write this?
     169instance Has Shape "area" Float where
     170  get = area
     172Then he can write `shape.area` with the expected result.  I'll call
     173`area` a '''virtual record selector''' because it can be used as if it
     174is a record selector, but is implemented with more complex behaviour.
     176The same idea works
     177even for overloaded functions like `fullName` (whose definition appears above):
     179instance (Has r "firstName", Has r "lastName")
     180       => Has r "fullName" String where
     181  get = fullName
     183I see no reason to prohibit virtual record selectors; indeed
     184we would have to go to extra trouble to do so. 
     185On the contrary, one might want a pragma to generate the
     186two lines of boilerplate.
     188== Unboxed fields ==
     190The mechanism does not work at all for records with unboxed fields:
     192data T = MkT { x :: Int# }
     194Reason: type-class constraints can only be instantiate with lifted types.  There is a good reason for this restriction, because unboxed types can have varying widths, so you can't generate code that works uniformly for boxed and unboxed types.
     196This is a real problem, but it is one that we might be willing to live with.  In particular, it's fine to have UNPACKed fields:
     198data T = MkT { x :: {-# UNPACK #-} Int }
     200So you can have efficiently-represented records without having to expose the unboxed types.
     203== Selectors as functions ==
     205Currently a record declaration brings into scope its selectors, so that
     207data T = MkT { x,y :: Int }
     209brings into scope (as top-level functions) `x :: T -> Int` and `y :: T -> Int`.
     211Does that still happen under the new design?  The simple answer is "yes,for backard compatibility",
     212but note that if there were more than one record
     213with the same field name, then there would be ''two'' top level functions both
     214called `x`, so you could not call either of them directly,
     216== Representation hiding ==
     218Currently if you say
     220module M( T( x ) )
     221  data T = MkT { x,y :: Int }
     223then the selector `x` is exported, but not `y`.  This is quite important; we need
     224a way to hide the representation of T, include the existence or otherwise of a 'y' field.
     226What happens under the new system?  If we generate instances for `(Has
     227T "x" Int)` and `(Has T "y" Int)`, then presumably they'd ''both'' be
     228exported (since we have no way to restrict instances).  But that means
     229we cannot ever hide the existence of a field, which seems Plain Wrong.
     231I can think of three solutions, all unsatisfactory in one way or another:
     233 * An instance `(Has T "x" ty)` is exported only if the record selector for
     234   T is exported.  Again, this doesn't work for virtual record selectors;
     235   and it would require a new implementation mechanism to record exactly
     236   which instances were exported from a module.
     238 * An instance `(Has T "x" ty)` is ignored unless the record selector
     239   `x` of type `T` is in scope.  This would require an ''ad hoc'' hack to
     240   the instance lookup mechanism; and it doesn't work at all for
     241   virtual record selectors. 
     243 * Require that every instance of `Has` looks like
     245instance blah => Has T "f" ty where
     246   get = the_getter
     248  where `the_getter` is a named function. 
     249  An instance `(Has T "x" ty)` is ignored unless the function
     250  witnessing the instance (`the_getter`) is in scope.  This has the
     251  merit of working for virtual selectors, but it means that instances
     252  of `Has` must be restricted to a special form (more to specify). 
     253  However it comes with a reasonable story: the instances are simply
     254  a way to choose among the many in-scope "x"'s.
     256 * Use the "Scope control by generalising the `String` type in `Has`" proposal above
     258Solving this issue is important.
     261= Syntax =
     263== The dot notation ==
     265It is critical to support dot-notation. There are quite a few things to think about:
     267 * Dot notation must work in cascades (left-associatively), and with an expression to the left:
     269  r.x
     270  r.x.y
     271  (foo v).y
     274 * There is an overlap with the function composition operator, but that is already true with qualified names.  It comes down  this:
     275   * Function composition will only work when surrounded by spaces, thus `(f . g)`.
     276   * Dot-notation for record selection only works with no spaces.
     278 * The expression `(f r.x)` must, I think, mean `(f (r.x))`.  That is, dot notation binds more tightly than function application.
     280 * Should it be possible to partly apply dot-notation, in a section, like this: `map (.x) ts`?  (Note that the dual, `map (r.) fs` makes no sense.).  I'm inclined to take the simple approach and say "no", you have to write `map (\r -> r.x) ts`.
     282== Syntactic sugar for `Has` ==
     284The type of `fullName` above is rather verbose, and it might be nice to have
     285some syntactic sugar for `Has`, perhaps something like this:
     287fullName :: r { firstName :: String, lastName :: String }
     288         => r -> String
     289fullName r = r.firstName ++ " " ++ r.lastName
     291instance r { firstName :: String, lastName :: String }
     292      => r { fullName :: String } where
     293  get = fullName
     295We replace "`Has r f t`" by "`r { f :: t }`", and allow multiple such constaints
     296with the same `r` to be combined.  Similarly, multiple constraints with the
     297same `r` and `t` could be combined further, just as they are in record declarations:
     299fullName :: r { firstName, lastName :: String }
     300         => r -> String
     305= Record updates =
     307At first it seems that record update could be handled readily, by adding a 'set' field to `Has`:
     309class Has (r :: *) (f :: String) (t :: *) where
     310  get :: r -> t
     311  set :: t -> r -> r
     313Now, similar to dot-notation, record update syntax `(r { x = e1, y = e2 })` would typecheck and
     314desugar as if you had written `(set @ "x" e1 (set @ "y" e2 r))`.  (I've left out a couple of
     315type parameters to save clutter.)
     317There are three problems:
     318 * There is no natural `set` for "pseudo" fields like `area` or `fullName` above.  Maybe this could be solved by stratifying the classes:
     320class Has (r :: *) (f :: String) (t :: *) where
     321  get :: r -> t
     323class Has r f t => Upd (r :: *) (f :: String) (t :: *) where
     324  set :: t -> r -> r
     326 But that adds complexity, and casts into doubt the proposed syntactic sugar for `Has` predicates.
     328 * Haskell 98's record update can change the type of the record
     329 (there is a type-checked response to this bullet point [ here]):
     331data R a = R { x :: a }
     333upd :: R Int -> R Bool
     334upd r = r { x = True }
     336 But `set` cannot accomodate this change of type, at least not without much more complexity.  Moreover, '''NO''' field-at-a-time encoding can deal with the case when more than one field shares the changed type. For example, this is legal Haskell:
     338data S a = S { x,y :: a }
     339upd :: S Int -> S Bool
     340upd s = s { x = True, y = False }
     342 But any system that updates `x` and then `y` will have an ill-typed intermediate in which `x` has a `Bool` while `y` has an `Int`.  This is really the death-knell for any field-at-a-time story that seeks to be compatible with Haskell as she is now.
     344 * This approach doesn't work at all in the higher-rank case.  For example, if `r :: HR`, we are currently allowed to say
     346   r { rev = reverse }
     348 but if we say `set @ "rev" reverse r`, the `reverse` will be instantiated.  Another way to say it is that the putative  instance declaration
     350instance (t ~ [a] -> [a]) => Has HR "rev" t where
     351  set rv r = r { rev = rv }
     353 will be rejected because `rv` is not polymorphic.  Again, this is '''not''' easy to fix.   
     355 This problem seems to be a killer: if record-update syntax is interpreted as a call to `set`, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of `r { rev = reverse }` we could say
     357  case r of { HR { .. } -> HR { rev = reverse, .. } }
     359 where I'm using GHC's dot-dot notation to fill in the missing fields.  But it's clumsy and would get worse if the data type had several constructors.)  And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of `Has` and `Upd`.
     361I'm not very happy with any of this.  Faced with these difficulties,
     362another alternative is to stick with the status quo for record
     363updates; that is, not to support any sort of overloading.  But even
     364''that'' is problematic: what does `e { x = True }` mean if there are lots of "x" fields
     365in scope (which is precisely what we want to allow). Haskell's current record-update
     366syntax really relies on learning which type is involved, from the record selector; but if
     367there are many x's, it can't figure it out.  Insisting that only one "x" selector must
     368be in scope puts us back to square one.  I think the only decent solution is to
     369invent a new syntax for monomorphic (non-overloaded) record update, something like this:
     371   HR { r | rev = reverse }
     373In general
     375   <type constructor> { expression | field = expression, ... }
     377Of course this isn't very satisfactory either
     378 * New syntax required
     379 * Not overloaded, so you can't abstract over it.
     381== Alternative Proposal ==
     383First, we define a class for types with a member at 'k':
     386class Has r k v where select :: r -> k -> v;
     389Next, we define a class for types with a mutable member of certain type:
     392class Quasifunctor r s k u v where qfmap :: k -> (u -> v) -> r -> s;
     395`Quasifunctor r s k u v` means that `r` and `s` have members of types `u` and `v`, in turn, both with selector `k`; thus, one can mutate the member at 'k' with an arbitrary function of type `u -> v`, and the overall function is of type `r -> s`; i.e. one can lift a function of type `u -> v` to a function of type `r -> s`. This is the record update.
     397`qfmap` is the lifter function. The first argument serves to specify which member is meant.
     399This ought to allow polymorphic mutation; to set member at "x" to value `x` is simply `qfmap (undefined :: "x") (const x)`, though this mechanism is of course more general; one could `qfmap` any arbitrary function, not just a constant function.
     401For example:
     403data R a = R { x :: a };
     405-- automatically-generated instances
     406instance Has (R a) "x" a where ...
     407instance Quasifunctor (R a) (R b) "x" a b where ...
     409r = R { x = 256 };
     410-- assign string value to r.x
     411let s = qfmap (undefined :: "x") (const "Hello, world!") r in ...
     415= Relationship to Type Directed Name Resolution =
     417This proposal is quite closely related to the [ Type Directed Name Resolution] idea, becuase TDNR
     418would internally generate `Has` constraints exactly as described above.  The difference is
     419that TDNR wasn't intended to support ''abstraction'' over the constraint, and was explained
     420rather differently.