Version 6 (modified by simonpj, 6 years ago) (diff)


Overloaded record fields: a design proposal

This page summarises a possible design that would allow different records to share a single field label. Although it's a simple enough idea there are numerous ramifications. Records are a swamp!

See also a similar 2003 proposal by Simon PJ and Greg Morrisset. It is essentially the same as the proposal below, but (a) has less detail and (b) adds anonymous reccord types. Anonymous type could be an add-on feature to the design described here.

The base design

The base design has the folllowing distinct components:

  • A library class
    class Has (r :: *) (f :: String) (t :: *) where
      get :: r -> t
  • A record declaration generates an instance declaration for each field. For example
    data T a = T1 { x :: a, y :: Bool }
             | T2 ( x :: a }
    would generate
    instance (t~a) => Has (T a) "x" t where
      get (T1 x _) = x
      get (T2 x)   = x
    instance (t~Bool) => Has (T a) "y" t where
      get (T1 _ y) = y
  • Dot-notation would typecheck (and desugar to) as a call of the get method. So r.x would typecheck just as if you had written (get @ T _ @ "x" @ _ r), where the @ means type application, and _ means "some type". (GHC doesn't support explicit type appplication yet, but you get the idea.)

Each of these components embodies choices which are described, with their variants, below.

Note that the proposal naturally supports record-polymorphic functions, such as

fullName :: (Has r "firstName" String, Has r "lastName" String)
         => r -> String
fullName r = r.firstName ++ " " ++ r.lastName

admittedly with a rather verbose type (but see below).

Design choices and variations

The String type parameter to Has

In the base design, the field is a String type parameter to the Has class. Another possiblity would be to invent a new fake class for each field, thus

class Has_x (r :: *) (t :: *) where
  get_x :: r -> t

This seems clunky to me, and I don't really want to deal with this infinite family of classes (eg how are the brought into scope). I see no disadvantages to a String parameter, which will form part of GHC's new kind system.

Should get have a proxy argument?

The type of get is very odd:

get :: Has r f t => r -> t

It's odd because "f" is not nentioned in the type after the "=>". That in turn means you could not write an unambiguous call of get, at least not without support for explicit type application (which is another story).

But I think it's OK not to be able to write a call to get, because that's what the dot-notation generates. The dot-notation produces a suitably-instantiated call to get.

It would be equally possible to have the slightly clunkier definition

data Proxy (a :: String)

class Has (r :: *) (f :: String) (t :: *) where
   get :: r -> Proxy f -> t

Now r.x would typecheck and desugar as get r (undefined :: Proxy "x"). This choice does not make a major difference either way.

Higher rank types and type functions

It is very tempting to define Has with an associated type, like this:

class Has (r :: *) (f :: String) where
  type FieldTy r f :: *
  get :: r -> FieldTy r f

After all, the field type is fixed by the record type, isn't it? For example, we could get these instance declarations:

instance Has (T a) "x" where
  type FieldTy (T a) "x" = a
  get = before...
instance Has (T a) "y" where
  type FieldTy (T a) "y" = Bool 
  get = before...

But this approach fails for fields with higher rank types. For example, this should work:

data HR = HR { rev :: forall a. [a] -> [a] }

f :: HR -> ([Bool], [Char])
f r = (r.rev [True], r.rev "hello")

Records with polymorphic fields are very important in practice, so it would be a major wart not to support them.

However we are not allowed to say

instance Has HR "rev" where
  type FieldTy HR "rev" = forall a. [a] -> [a]
  get (HR rev) = rev

because we are not allowed to have a type instance whose RHS is a polytype, because then we don't konw when to instantiate it. This restriction is not easy to lift.

The base design instead gives Has three parameters, and uses a functional-dependency-like mechanism (but using equalities) for the result type. Using this we can deal with HR:

instance (t ~ [a] -> [a]) => Has HR "rev" t where
  get (HR rev) = rev

and all is well.

Virtual record selectors

Suppose we have

data Shape = Rect Float Float | Circle Float | ...

area :: Shape -> Float
area (Rect x y) = x*y
area (Circle r) = ...

Can the user write this?

instance Has Shape "area" Float where
  get = area 

Then he can write shape.area with the expected result. I'll call area a virtual record selector because it can be used as if it is a record selector, but is implemented with more complex behaviour.

The same idea works even for overloaded functions like fullName (whose definition appears above):

instance (Has r "firstName", Has r "lastName") 
       => Has r "fullName" String where
  get = fullName

I see no reason to prohibit virtual record selectors; indeed we would have to go to extra trouble to do so. On the contrary, one might want a pragma to generate the two lines of boilerplate.

Selectors as functions

Currently a record declaration brings into scope its selectors, so that

data T = MkT { x,y :: Int }

brings into scope (as top-level functions) x :: T -> Int and y :: T -> Int.

Does that still happen under the new design? The simple answer is "yes,for backard compatibility", but note that if there were more than one record with the same field name, then there would be two top level functions both called x, so you could not call either of them directly,

Representation hiding

Currently if you say

module M( T( x ) )
  data T = MkT { x,y :: Int }

then the selector x is exported, but not y. This is quite important; we need a way to hide the representation of T, include the existence or otherwise of a 'y' field.

What happens under the new system? If we generate instances for `(Has T "x" Int) and (Has T "y" Int)`, then presumably they'd both be exported (since we have no way to restrict instances). But that means we cannot ever hide the existence of a field, which seems Plain Wrong.

I can think of three solutions, all unsatisfactory in one way or another:

  • An instance (Has T "x" ty) is exported only if the record selector for T is exported. Again, this doesn't work for virtual record selectors; and it would require a new implementation mechansim to record exactly which instances were exported from a module.
  • An instance (Has T "x" ty) is ignored unless the record selector x of type T is in scope. This would require an ad hoc hack to the instance lookup mechanism; and it doesn't work at all for virtual record selectors.
  • Require that every instance of Has looks like
    instance blah => Has T "f" ty where
       get = the_getter
    where the_getter is a named function. An instance (Has T "x" ty) is ignored unless the function witnessing the instance (the_getter) is in scope. This has the merit of working for virtual selectors, but it means that instances of Has must be restricted to a special form (more to specify). However it comes with a reasonable story: the instances are simply a way to choose among the many in-scope "x"'s.

Solving this issue is important.


The dot notation

It is critical to support dot-notation. There are quite a few things to think about:

  • Dot notation must work in cascades (left-associatively), and with an expression to the left:
      (foo v).y
  • There is an overlap with the function composition operator, but that is already true with qualified names. It comes down this:
    • Function composition will only work when surrounded by spaces, thus (f . g).
    • Dot-notation for record selection only works with no spaces.
  • The expression (f r.x) must, I think, mean (f (r.x)). That is, dot notation binds more tightly than function application.
  • Should it be possible to partly apply dot-notation, in a section, like this: map (.x) ts? (Note that the dual, map (r.) fs makes no sense.). I'm inclined to take the simple approach and say "no", you have to write map (\r -> r.x) ts.

Syntactic sugar for Has

The type of fullName above is rather verbose, and it might be nice to have some syntactic sugar for Has, perhaps something like this:

fullName :: r { firstName :: String, lastName :: String }
         => r -> String
fullName r = r.firstName ++ " " ++ r.lastName

instance r { firstName :: String, lastName :: String }
      => r { fullName :: String } where
  get = fullName

We replace "Has r f t" by "r { f :: t }", and allow multiple such constaints with the same r to be combined. Similarly, multiple constraints with the same r and t could be combined further, just as they are in record declarations:

fullName :: r { firstName, lastName :: String }
         => r -> String

Record updates

At first it seems that record update could be handled readily, by adding a 'set' field to Has:

class Has (r :: *) (f :: String) (t :: *) where
  get :: r -> t
  set :: t -> r -> r

Now, similar to dot-notation, record update syntax (r { x = e1, y = e2 }) would typecheck and desugar as if you had written (set @ "x" e1 (set @ "y" e2 r)). (I've left out a couple of type parameters to save clutter.)

There are three problems:

  • There is no natural set for "pseudo" fields like area or fullName above. Maybe this could be solved by stratifying the classes:
    class Has (r :: *) (f :: String) (t :: *) where
      get :: r -> t
    class Has r f t => Upd (r :: *) (f :: String) (t :: *) where
      set :: t -> r -> r
    But that adds complexity, and casts into doubt the proposed syntactic sugar for Has predicates.
  • Haskell 98's record update can change the type of the record:
    data R a = R { x :: a }
    upd :: R Int -> R Bool
    upd r = r { x = True }
    But set cannot accomodate this change of type, at least not without much more complexity. Maybe one solution here is to give up on type-checking updates.
  • This approach doesn't work at all in the higher-rank case. For example, if r :: HR, we are currently allowed to say
       r { rev = reverse } 
    but if we say set @ "rev" reverse r, the reverse will be instantiated. Another way to say it is that the putative instance declaration
    instance (t ~ [a] -> [a]) => Has HR "rev" t where
      set rv r = r { rev = rv }
    will be rejected because rv is not polymorphic. Again, this is not easy to fix.

This problem seems to be a killer: if record-update syntax is interprested as a call to set, we cannot, ever, use record-update syntax to update a record with a polymorphic field. (We could use alternative notation; instead of r { rev = reverse } we could say

  case r of { HR { .. } -> HR { rev = reverse, .. } }

where I'm using GHC's dot-dot notation to fill in the missing fields. But it's clumsy and would get worse if the data type had several constructors.) And even if we were willing to give up on record-update syntax for polymorphic fields, we'd need a rule to say which fields are or are not made an instance of Has and Upd.

I'm not very happy with any of this. Faced with these difficulties, another alternative is to stick with the status quo for record updates; that is, not to support any sort of overloading. But even that is problemantic: what does e { x = True } mean if there are lots of "x" fields in scope (which is precisely what we want to allow). Haskell's current record-update syntax really relies on learning which type is involved, from the record selector; but if there are many x's, it can't figure it out. Insisting that only one "x" selector must be in scope puts us back to square one. I think the only decent solution is to invent a new syntax for monomorphic (non-overloaded) record update, something like this:

   HR { r | rev = reverse }

In general

   <type constructor> { expression | field = expression, ... }

Of course this isn't very satisfactory either

  • New syntax required
  • Not overloaded, so you can't abstract over it.

Relationship to Type Directed Name Resolution

This proposal is quite closely related to the Type Directed Name Resolution idea, becuase TDNR would internally generate Has constraints exactly as described above. The difference is that TDNR wasn't intended to support abstraction over the constraint, and was explained rather differently.