Changes between Initial Version and Version 1 of Records/OverloadedRecordFields/SORF


Ignore:
Timestamp:
Mar 2, 2014 1:27:19 PM (13 months ago)
Author:
adamgundry
Comment:

Import Records/OverloadedRecordFields and update introduction

Legend:

Unmodified
Added
Removed
Modified
  • Records/OverloadedRecordFields/SORF

    v1 v1  
     1= Simple Overloaded Record Fields (SORF)  = 
     2 
     3'''This page describes the original design by Simon PJ for Simple Overloaded Record Fields, which eventually became the [wiki:Records/OverloadedRecordFields OverloadedRecordFields] extension. It is retained for historical purposes, and does not reflect the final design as implemented.''' 
     4 
     5This page summarises a possible design that would allow different records to share a single field label.  Although it's a simple enough idea there are numerous ramifications.  Records are a swamp! Nevertheless, this is the simplest proposal that I know that satisfies the main request, that of having multiple records with the same field name. 
     6 
     7See also a similar [http://research.microsoft.com/en-us/um/people/simonpj/Haskell/records.html 2003 proposal by Simon PJ and Greg Morrisset].  It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous record types.   Anonymous type could be an add-on feature to the design described here. 
     8 
     9= The base design = 
     10 
     11The '''base design''' has the following distinct components: 
     12 
     13 * A library class 
     14{{{ 
     15class Has (r :: *) (f :: String) (t :: *) where 
     16  get :: r -> t 
     17}}} 
     18 
     19 * A record declaration generates an instance declaration for each field. For example 
     20{{{ 
     21data T a = T1 { x :: a, y :: Bool } 
     22         | T2 { x :: a } 
     23}}} 
     24 would generate 
     25{{{ 
     26instance (t~a) => Has (T a) "x" t where 
     27  get (T1 x _) = x 
     28  get (T2 x)   = x 
     29instance (t~Bool) => Has (T a) "y" t where 
     30  get (T1 _ y) = y 
     31}}} 
     32 
     33 * Dot-notation would typecheck (and desugar to) as a call of the `get` method.  So `r.x` would typecheck just as if you had written `(get @ T _ @ "x" @ _ r)`, where the `@` means type application, and `_` means "some type".  (GHC doesn't support explicit type appplication yet, but you get the idea.) 
     34 
     35Each of these components embodies choices which are described, with their variants, below. 
     36 
     37Note that the proposal naturally supports record-polymorphic functions, such as 
     38{{{ 
     39fullName :: (Has r "firstName" String, Has r "lastName" String) 
     40         => r -> String 
     41fullName r = r.firstName ++ " " ++ r.lastName 
     42}}} 
     43admittedly with a rather verbose type (but see below). 
     44 
     45----------------------------------- 
     46= Design choices and variations = 
     47 
     48== The `String` type parameter to `Has` == 
     49 
     50In the base design, the field is a `String` type parameter to the `Has` class.  Another possiblity would be to invent a new fake class for each field, thus 
     51{{{ 
     52class Has_x (r :: *) (t :: *) where 
     53  get_x :: r -> t 
     54}}} 
     55This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope).  I see no disadvantages to a `String` parameter, which will form part of GHC's new kind system. 
     56 
     57== Scope control by generalising the `String` type in `Has` == 
     58 
     59The proposal above doesn't allow label names to be scoped: if one module internally uses `"field"` as a label name then another module can break the abstraction by using the same string `"field"`. 
     60 
     61We can fix this by instead of having 
     62{{{ 
     63class Has (r :: *) (f :: String)       (t :: *) where 
     64}}} 
     65having something like 
     66{{{ 
     67class Has (r :: *) (ft :: *) (f :: ft) (t :: *) where 
     68}}} 
     69(where `ft` stands for field type). 
     70 
     71The expression 
     72{{{ 
     73foo.field 
     74}}} 
     75(starting with a lowercase letter) would behave as described above, with `ft ~ String`. 
     76 
     77But 
     78{{{ 
     79foo.Field 
     80}}} 
     81(starting with an uppercase letter) would use the Field constructor that is in scope, e.g. if we had 
     82{{{ 
     83data FieldT = Field 
     84}}} 
     85then `ft ~ FieldT`. Then we can choose whether or not to export `FieldT(Field)`. 
     86 
     87In my opinion, this is ugly, since the selector can be either a type name or a label and the semantics are nonsame. 
     88Rather, we need scoped instances. 
     89–strake888 
     90 
     91== Should `get` have a proxy argument? == 
     92 
     93The type of `get` is very odd: 
     94{{{ 
     95get :: Has r f t => r -> t 
     96}}} 
     97It's odd because "f" is not nentioned in the type after the "`=>`".  That in turn means you could not write an unambiguous call of `get`, at least not without support for explicit type application (which is another story). 
     98 
     99But I think it's OK not to be able to write a call to `get`, ''because that's what the dot-notation generates''.  The dot-notation produces a suitably-instantiated call to `get`.   
     100 
     101It would be equally possible to have the slightly clunkier definition 
     102{{{ 
     103data Proxy (a :: String) 
     104 
     105class Has (r :: *) (f :: String) (t :: *) where 
     106   get :: r -> Proxy f -> t 
     107}}} 
     108Now `r.x` would typecheck and desugar as `get r (undefined :: Proxy "x")`.  This choice does not make a major difference either way. 
     109 
     110== Higher rank types and type functions == 
     111 
     112It is very tempting to define `Has` with an associated type, like this: 
     113{{{ 
     114class Has (r :: *) (f :: String) where 
     115  type FieldTy r f :: * 
     116  get :: r -> FieldTy r f 
     117}}} 
     118After all, the field type is fixed by the record type, isn't it? 
     119For example, we could get these instance declarations: 
     120{{{ 
     121instance Has (T a) "x" where 
     122  type FieldTy (T a) "x" = a 
     123  get = ...as before... 
     124instance Has (T a) "y" where 
     125  type FieldTy (T a) "y" = Bool  
     126  get = ...as before... 
     127}}} 
     128But this approach fails for fields with higher rank types.  For example, this should work: 
     129{{{ 
     130data HR = HR { rev :: forall a. [a] -> [a] } 
     131 
     132f :: HR -> ([Bool], [Char]) 
     133f r = (r.rev [True], r.rev "hello") 
     134}}} 
     135Records with polymorphic fields are very important in practice, so it would be a 
     136major wart not to support them. 
     137 
     138However we are not allowed to say 
     139{{{ 
     140instance Has HR "rev" where 
     141  type FieldTy HR "rev" = forall a. [a] -> [a] 
     142  get (HR rev) = rev 
     143}}} 
     144because we are not allowed to have a type instance whose RHS is a polytype, because then 
     145we don't konw when to instantiate it.  This restriction is '''not''' easy to lift. 
     146 
     147The base design instead gives `Has` ''three'' parameters, and uses  
     148a functional-dependency-like mechanism (but using equalities) for the result type. 
     149Using this we can deal with `HR`: 
     150{{{ 
     151instance (t ~ [a] -> [a]) => Has HR "rev" t where 
     152  get (HR rev) = rev 
     153}}} 
     154and all is well. 
     155 
     156== Virtual record selectors == 
     157 
     158Suppose we have 
     159{{{ 
     160data Shape = Rect Float Float | Circle Float | ... 
     161 
     162area :: Shape -> Float 
     163area (Rect x y) = x*y 
     164area (Circle r) = ... 
     165...etc... 
     166}}} 
     167Can the user write this? 
     168{{{ 
     169instance Has Shape "area" Float where 
     170  get = area  
     171}}} 
     172Then he can write `shape.area` with the expected result.  I'll call 
     173`area` a '''virtual record selector''' because it can be used as if it 
     174is a record selector, but is implemented with more complex behaviour. 
     175 
     176The same idea works 
     177even for overloaded functions like `fullName` (whose definition appears above): 
     178{{{ 
     179instance (Has r "firstName", Has r "lastName")  
     180       => Has r "fullName" String where 
     181  get = fullName 
     182}}} 
     183I see no reason to prohibit virtual record selectors; indeed 
     184we would have to go to extra trouble to do so.   
     185On the contrary, one might want a pragma to generate the  
     186two lines of boilerplate. 
     187 
     188== Unboxed fields == 
     189 
     190The mechanism does not work at all for records with unboxed fields: 
     191{{{ 
     192data T = MkT { x :: Int# } 
     193}}} 
     194Reason: type-class constraints can only be instantiate with lifted types.  There is a good reason for this restriction, because unboxed types can have varying widths, so you can't generate code that works uniformly for boxed and unboxed types. 
     195 
     196This is a real problem, but it is one that we might be willing to live with.  In particular, it's fine to have UNPACKed fields: 
     197{{{ 
     198data T = MkT { x :: {-# UNPACK #-} Int } 
     199}}} 
     200So you can have efficiently-represented records without having to expose the unboxed types. 
     201 
     202 
     203== Selectors as functions == 
     204 
     205Currently a record declaration brings into scope its selectors, so that 
     206{{{ 
     207data T = MkT { x,y :: Int } 
     208}}} 
     209brings into scope (as top-level functions) `x :: T -> Int` and `y :: T -> Int`. 
     210 
     211Does that still happen under the new design?  The simple answer is "yes,for backard compatibility", 
     212but note that if there were more than one record 
     213with the same field name, then there would be ''two'' top level functions both 
     214called `x`, so you could not call either of them directly,  
     215 
     216== Representation hiding == 
     217 
     218Currently if you say 
     219{{{ 
     220module M( T( x ) ) 
     221  data T = MkT { x,y :: Int } 
     222}}} 
     223then the selector `x` is exported, but not `y`.  This is quite important; we need  
     224a way to hide the representation of T, include the existence or otherwise of a 'y' field. 
     225 
     226What happens under the new system?  If we generate instances for `(Has 
     227T "x" Int)` and `(Has T "y" Int)`, then presumably they'd ''both'' be 
     228exported (since we have no way to restrict instances).  But that means 
     229we cannot ever hide the existence of a field, which seems Plain Wrong. 
     230 
     231I can think of three solutions, all unsatisfactory in one way or another: 
     232 
     233 * An instance `(Has T "x" ty)` is exported only if the record selector for 
     234   T is exported.  Again, this doesn't work for virtual record selectors; 
     235   and it would require a new implementation mechanism to record exactly 
     236   which instances were exported from a module. 
     237 
     238 * An instance `(Has T "x" ty)` is ignored unless the record selector 
     239   `x` of type `T` is in scope.  This would require an ''ad hoc'' hack to 
     240   the instance lookup mechanism; and it doesn't work at all for  
     241   virtual record selectors.   
     242 
     243 * Require that every instance of `Has` looks like 
     244{{{ 
     245instance blah => Has T "f" ty where 
     246   get = the_getter 
     247}}} 
     248  where `the_getter` is a named function.   
     249  An instance `(Has T "x" ty)` is ignored unless the function  
     250  witnessing the instance (`the_getter`) is in scope.  This has the 
     251  merit of working for virtual selectors, but it means that instances 
     252  of `Has` must be restricted to a special form (more to specify).   
     253  However it comes with a reasonable story: the instances are simply 
     254  a way to choose among the many in-scope "x"'s. 
     255 
     256 * Use the "Scope control by generalising the `String` type in `Has`" proposal above 
     257 
     258Solving this issue is important. 
     259 
     260------------------------ 
     261= Syntax =  
     262 
     263== The dot notation == 
     264 
     265It is critical to support dot-notation. There are quite a few things to think about: 
     266 
     267 * Dot notation must work in cascades (left-associatively), and with an expression to the left: 
     268{{{ 
     269  r.x 
     270  r.x.y 
     271  (foo v).y 
     272}}} 
     273 
     274 * There is an overlap with the function composition operator, but that is already true with qualified names.  It comes down  this:  
     275   * Function composition will only work when surrounded by spaces, thus `(f . g)`. 
     276   * Dot-notation for record selection only works with no spaces. 
     277 
     278 * The expression `(f r.x)` must, I think, mean `(f (r.x))`.  That is, dot notation binds more tightly than function application. 
     279 
     280 * Should it be possible to partly apply dot-notation, in a section, like this: `map (.x) ts`?  (Note that the dual, `map (r.) fs` makes no sense.).  I'm inclined to take the simple approach and say "no", you have to write `map (\r -> r.x) ts`. 
     281 
     282== Syntactic sugar for `Has` == 
     283 
     284The type of `fullName` above is rather verbose, and it might be nice to have 
     285some syntactic sugar for `Has`, perhaps something like this: 
     286{{{ 
     287fullName :: r { firstName :: String, lastName :: String } 
     288         => r -> String 
     289fullName r = r.firstName ++ " " ++ r.lastName 
     290 
     291instance r { firstName :: String, lastName :: String } 
     292      => r { fullName :: String } where 
     293  get = fullName 
     294}}} 
     295We replace "`Has r f t`" by "`r { f :: t }`", and allow multiple such constaints 
     296with the same `r` to be combined.  Similarly, multiple constraints with the  
     297same `r` and `t` could be combined further, just as they are in record declarations: 
     298{{{ 
     299fullName :: r { firstName, lastName :: String } 
     300         => r -> String 
     301}}} 
     302 
     303 
     304------------------------ 
     305= Record updates = 
     306 
     307At first it seems that record update could be handled readily, by adding a 'set' field to `Has`: 
     308{{{ 
     309class Has (r :: *) (f :: String) (t :: *) where 
     310  get :: r -> t 
     311  set :: t -> r -> r 
     312}}} 
     313Now, similar to dot-notation, record update syntax `(r { x = e1, y = e2 })` would typecheck and 
     314desugar as if you had written `(set @ "x" e1 (set @ "y" e2 r))`.  (I've left out a couple of  
     315type parameters to save clutter.)  
     316 
     317There are three problems: 
     318 * There is no natural `set` for "pseudo" fields like `area` or `fullName` above.  Maybe this could be solved by stratifying the classes: 
     319{{{ 
     320class Has (r :: *) (f :: String) (t :: *) where 
     321  get :: r -> t 
     322 
     323class Has r f t => Upd (r :: *) (f :: String) (t :: *) where 
     324  set :: t -> r -> r 
     325}}} 
     326 But that adds complexity, and casts into doubt the proposed syntactic sugar for `Has` predicates. 
     327 
     328 * Haskell 98's record update can change the type of the record 
     329 (there is a type-checked response to this bullet point [https://raw.github.com/ntc2/haskell-records/master/GHCWiki_SimpleOverloadedRecordFields.lhs here]): 
     330{{{ 
     331data R a = R { x :: a } 
     332 
     333upd :: R Int -> R Bool 
     334upd r = r { x = True } 
     335}}} 
     336 But `set` cannot accomodate this change of type, at least not without much more complexity.  Moreover, '''NO''' field-at-a-time encoding can deal with the case when more than one field shares the changed type. For example, this is legal Haskell: 
     337{{{ 
     338data S a = S { x,y :: a } 
     339upd :: S Int -> S Bool 
     340upd s = s { x = True, y = False } 
     341}}} 
     342 But any system that updates `x` and then `y` will have an ill-typed intermediate in which `x` has a `Bool` while `y` has an `Int`.  This is really the death-knell for any field-at-a-time story that seeks to be compatible with Haskell as she is now. 
     343 
     344 * This approach doesn't work at all in the higher-rank case.  For example, if `r :: HR`, we are currently allowed to say 
     345{{{ 
     346   r { rev = reverse }  
     347}}} 
     348 but if we say `set @ "rev" reverse r`, the `reverse` will be instantiated.  Another way to say it is that the putative  instance declaration 
     349{{{ 
     350instance (t ~ [a] -> [a]) => Has HR "rev" t where 
     351  set rv r = r { rev = rv } 
     352}}} 
     353 will be rejected because `rv` is not polymorphic.  Again, this is '''not''' easy to fix.    
     354 
     355 This problem seems to be a killer: if record-update syntax is interpreted as a call to `set`, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of `r { rev = reverse }` we could say 
     356{{{ 
     357  case r of { HR { .. } -> HR { rev = reverse, .. } } 
     358}}} 
     359 where I'm using GHC's dot-dot notation to fill in the missing fields.  But it's clumsy and would get worse if the data type had several constructors.)  And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of `Has` and `Upd`. 
     360 
     361I'm not very happy with any of this.  Faced with these difficulties, 
     362another alternative is to stick with the status quo for record 
     363updates; that is, not to support any sort of overloading.  But even 
     364''that'' is problematic: what does `e { x = True }` mean if there are lots of "x" fields 
     365in scope (which is precisely what we want to allow). Haskell's current record-update 
     366syntax really relies on learning which type is involved, from the record selector; but if 
     367there are many x's, it can't figure it out.  Insisting that only one "x" selector must 
     368be in scope puts us back to square one.  I think the only decent solution is to  
     369invent a new syntax for monomorphic (non-overloaded) record update, something like this: 
     370{{{ 
     371   HR { r | rev = reverse } 
     372}}} 
     373In general 
     374{{{ 
     375   <type constructor> { expression | field = expression, ... } 
     376}}} 
     377Of course this isn't very satisfactory either 
     378 * New syntax required 
     379 * Not overloaded, so you can't abstract over it. 
     380 
     381== Alternative Proposal == 
     382 
     383First, we define a class for types with a member at 'k': 
     384 
     385{{{ 
     386class Has r k v where select :: r -> k -> v; 
     387}}} 
     388 
     389Next, we define a class for types with a mutable member of certain type: 
     390 
     391{{{ 
     392class Quasifunctor r s k u v where qfmap :: k -> (u -> v) -> r -> s; 
     393}}} 
     394 
     395`Quasifunctor r s k u v` means that `r` and `s` have members of types `u` and `v`, in turn, both with selector `k`; thus, one can mutate the member at 'k' with an arbitrary function of type `u -> v`, and the overall function is of type `r -> s`; i.e. one can lift a function of type `u -> v` to a function of type `r -> s`. This is the record update. 
     396 
     397`qfmap` is the lifter function. The first argument serves to specify which member is meant. 
     398 
     399This ought to allow polymorphic mutation; to set member at "x" to value `x` is simply `qfmap (undefined :: "x") (const x)`, though this mechanism is of course more general; one could `qfmap` any arbitrary function, not just a constant function. 
     400 
     401For example: 
     402{{{ 
     403data R a = R { x :: a }; 
     404 
     405-- automatically-generated instances 
     406instance Has (R a) "x" a where ... 
     407instance Quasifunctor (R a) (R b) "x" a b where ... 
     408 
     409r = R { x = 256 }; 
     410-- assign string value to r.x 
     411let s = qfmap (undefined :: "x") (const "Hello, world!") r in ... 
     412}}} 
     413 
     414------------------------ 
     415= Relationship to Type Directed Name Resolution = 
     416 
     417This proposal is quite closely related to the [http://hackage.haskell.org/trac/haskell-prime/wiki/TypeDirectedNameResolution Type Directed Name Resolution] idea, becuase TDNR  
     418would internally generate `Has` constraints exactly as described above.  The difference is 
     419that TDNR wasn't intended to support ''abstraction'' over the constraint, and was explained 
     420rather differently.