Version 5 (modified by guest, 6 years ago) (diff)


Declared Overloaded Record Fields (DORF)

Explained in 5 wiki pages (these proposals are linked but somewhat orthogonal):

Application Programmer's view

This proposal is addressing the "narrow issue" of namespacing for record field names.

I'm avoiding giving implementation details here -- see:

<DORF -- Implementor's view>; and <DORF -- comparison to SORF>

I'm not saying anything about field selection via pattern matching or record construction using explicit data constructors -- those are to behave as currently (using the approach per ‑XDisambiguateRecordFields and friends).

Currently in Haskell two records in the same module can't share a field name. This is because declaring a field name within a data decl creates a monomorphic selector function; and if it's monomorphic, we can only have one. I think the wiki is characterising the problem incorrectly:

  • it's not that the field name appearing in different record decls is ambiguous between the two record types so we need some (syntactical) way of choosing between the different definitions;
  • rather, we have one field name, and we lack the syntax/semantics for sharing it between different records.

An example: let's say I have a database application with a field (meaning type) customer_id. Then it appears in records for name and address, pricing, order entry, etc. This is not a name 'clash', it's 'intended sharing'. (It really galls me to even put it that way for explanatory purposes. Really it's the same customer_id.)

In data model design you'd typically go about identifying all the fields (types aka attributes) and putting them in a data dictionary. Then you'd construct your records from them. You might (possibly) put the data dictionary in a distinct module, for easy maintenance. But you'd certainly want all the customer-related records in the same module. So a data decl:

    data Customer_NameAddress = Cust_NA { customer_id :: Int, ... } 

is not declaring customer_id, it's using (or instancing) an already-declared field for customer_id. Similarly, if I have a family of objects, all with a rese' method, that's not umpteen methods with a 'clash' of names, it's one method with umpteen instances. (And I might create a family of record structures to describe each object, and store the reset method within it.)

What's more, the Haskell 98 field selector (auto-created from the data decl) is half-way to what we want. It's a function:

    customer_id :: Customer_NameAddress -> Int

The DORF proposal generalises that signature: if you want to share a field across different records, its selector function needs to be overloaded to this type:

    customer_id :: r{ customer_id :: Int } => r -> Int

The r{ ... } is syntactic sugar for the constraint meaning "record r has field customer_id at type Int".

We need a way to declare that a name is available as an overloadable field name (roughly speaking, a class/method definition), proposed syntax:

    fieldLabel customer_id :: r -> Int

(The r{ ... } is added by the desugarer.)

The -> Int means the field's domain (type) is Int -- it's just a type. We might also want to constrain the record -- for example to be sure it is savable to persistent storage:

    fieldLabel unitPrice :: (Save r, Num t) => r -> t

(Again the r{ ... } gets added as a further constraint.)

Now we can use the field in a record, and that in effect declares an instance for the field/record. All these definitions are in the same module:

    data Customer_NameAddress = ... (as above)
    data Customer_Price a = Num a => Cust_Price {
                                       customer_id :: Int,
                                       product_id  :: Int,
                                       unit_Price  :: a,
                                       ... }
    data Customer_Order = Cust_Order { customer_id :: Int, ... }

Then a field selection expression like:

... (customer_id r) ... -- H98 style field application

uses familiar type instance resolution to figure out from record type r how to extract the customer_id.

[Possibly that expression could be:

... r.customer_id ...

See <Dot as Postfix Func Apply> for that dot notation, but note that nothing in this proposal assumes dot notation will be needed.]

From here upwards, the r{ ... } constraint is just a constraint, and gets merged with other constraints. For example, you could define a function:

    fullName r = (firstName r) ++ " " ++ (lastName r)  -- per SPJ

The type inferred would be:

    fullName :: r{ firstName, lastName :: String} => r -> String

which is eliding:

    fullName :: (r{ firstName :: String}, r{ lastName :: String })
                 => r -> String

And if you think that's very close to the type of a field selector function, you'd be right. Here's some more examples of pseudo- or 'virtual' fields, using dot notation:

    date.dayOfWeek        -- not a field: calculated from the date
    name.middleInitial    -- extract from the name field
    tuple.fst             -- Prelude functions

[Since they're just functions, they can use dot notation -- or not: personal preference.]

Modules and qualified names for records

Do these field selector functions have a special scope in some way? No! They're just functions. They can be exported/imported.

We can't stop some other developer creating an application/package with a field customer_id which is incompatible with ours. (Say a Sales Order entry application where customer_id is a String, to merge with our Accounts Receivable.) So do we have a problem if someone wants to import both?

No! This is regular business-as-usual familiar name clash, and it's what the module system is designed to handle. The field selectors are just functions, we can use them qualified:

    (My.customer_id myCust)        <===> myCust.My.customer_id
    (Their.customer_id theirCust)  <===> theirCust.Their.customer_id
    (My.customer_id r)       -- fails if r is from the 'other' module

Import/Export and Representation hiding

[See <No Mono Record Fields>, which is implied by DORF.]

Since there is only a single (overloaded) field selector function created, we either have to export it always, or hide it always (that is, we can't control which record instances get exported).

The field selector function is separately declared vs. the records and their fields, so must be exported separately. For example:

{-# OPTIONS_GHC -XDeclaredOverloadedRecordFields             #-}
module M( x )       where
    fieldLabel x,y :: r -> Int
    data T = MkT { x, y :: Int }

Here only the field selector function x is exported. The representation is abstract, the client can't construct or dismantle a record type T;

field y is hidden altogether.

If you say:

{-# OPTIONS_GHC -XDeclaredOverloadedRecordFields
                -XNoMonoRecordFields                   #-}
module M( T( x ) )       where
    fieldLabel x,y :: r -> Int
    data T = MkT { x, y :: Int }

then you are exporting the x field within record type T, but _not_ the field selector x (nor the generated type 'peg' Proxy_x).

Type T and field label x are exported, but not data constructor MkT, so x is unusable.

The existence of field y is hidden altogether.

Field Update for Overloadable Record Fields

You can (continue to) use pattern matching and data constructor tagging for record update:

    case r of {
     Cust_Price {unit_Price, ..}
          -> Cust_Price {unit_Price = unit_Price * 1.05, .. }
    }         -- increases Price by 5%

(This uses ‑XDisambiguateRecordFields, -XRecordWildCards and ‑XNamedFieldPuns -- all mature GHC extensions.)

The new part is polymorphic record update:

    myPrice{ unit_Price = 72 :: Int }

Returns a record with same fields as myPrice, except a different unit_Price. Note that the update can change the type of a field (if the record declaration is polymorphic).

Note that upon first encountering that expression, we don't know the record types (because unit_Price is overloaded). So the types initially inferred are:

    <expr>  :: r { unit_Price :: Int } => r
    myPrice :: _r{ unit_Price :: t }   => _r

That is, the update might be changing the record type as well as the field type -- in case that the record type is parametric over the field type.

Behind the scenes, the update syntax with an expression prefix to the { ... } is syntactic sugar for a call to the polymorphic record update method set:

    set (undefined :: Proxy_unit_Price) (72 :: Int) myPrice

[See <DORF -- Implementor's view> for what the Proxy is doing.]

Normal type inference/instance resolution will find the record type for myPrice, and therefore the correct instance to apply the update.

You can update multiple fields at the same time:

    myCustNA { firstName = "Fred", lastName = "Dagg" }

[There's a poor story to tell here in implementation terms: we split into two calls to set, one nested inside the other. It's wasteful to build the intermediate record. Worse, the two fields' types might be parametric in the record type or polymorphically related (perhaps one is a method to apply to the other), then we get a type failure on the intermediate record.]

Some discussion threads have argued that Haskell's current record update syntax is awkward. The DORF proposal is to implement field update using a polymorphic function. Once this is implemented, alternative syntax could be explored, providing it desugars to a call to set.

This is a brief point-by-point comparison to

I'll only dwell on where the DORF proposal differs.

The 'Base Design' is very similar: a library class Has', methods get/set, with an instance generated for each declared record/field. Field selection (either prefix per H98 or dot notation) is desugared to a call to the get' method.

The String' type parameter to Has', and Scope control

DORF uses a Proxy type for the type/kind-level 'peg' on which to hang the instances. Using a Proxy is a superficial difference, and mostly for practical reasons -- I was building a proptotype in 7.2.1 (which doesn't have user-definable or `String' Kinds).

There is, however, a significant difference on scoping/namespacing: the Proxy type (being a type) is bound by the usual module scoping, export/import and module qualification, etc. Are (`String') Kinds bound by module scope?

The module namespacing in DORF is deliberate, and for exactly the same reason as namespacing in general: a different developer in a different module might 'accidentally' create a clashing name which is unrelated (that is, a name for a field or a record or both). As far as DORF is concerned, these are different names, so need the module name qualification.

In contrast: "The [SORF] proposal ... doesn't allow label names to be scoped: if one module internally uses "field" as a label name then another module can break the abstraction by using the same string "field"." SPJ goes on to discuss a work-round which he sees as "ugly", and concludes "we need scoped instances". DORF does not need any innovations around instances or type inference.

Should `get' have a Proxy argument? "This choice does not make a major difference either way." [SPJ]. DORF likewise doesn't care. The prototype does use a Proxy argument, because it's implemented using types not Kinds, and GHC doesn't (yet) have type application.

Higher Rank Types and Type Functions DORF follows SORF in using a "functional-dependency-like mechanism (but using equalities) " to manage the type inference for Has/get/set.

Virtual record selectors Are a valuable feature, and particularly valuable because they 'look' just like record field selectors. DORF supports them as ordinary (overloaded) functions, defined in terms of field selectors (so that their types are inferred to need the records/fields). Under DORF, virtual record selectors do not need Has' instances, so don't run into the problem of what to do about the definition for set'. (Perhaps DORF has a contrary problem: what if we do want a `set' for a virtual field? -- for example to update a fullName into firstName and lastName. There's food for thought in Wadler's original paper that led to View Patterns "Views: A way for pattern matching to cohabit with data abstraction" [1987], 4. "Viewing a complex number in cartesian and polar coordinates". We may want our implementation of complex to be abstract. We provide (pseudo-) fields to select the coordinates. Then they're ever-so like methods for an (abstract) object. Also we want the (pseudo-) fields to be updatable, which means we need an instance for Has/get/set for fields that are not explicit in the record.) Selectors as functions We need to support H98-style record selectors. One of the design objectives for DORF is that field selectors continue to be functions, just as H98. (The difference is that DORF's are overloaded whereas H98's are monomorphic.) You can use dot notation with H98 field selectors without changing the H98 record definition.

Representation hiding See the discussion under <No Mono Record Fields>. SORF has an issue, because (in effect) it needs to control export of instances. "Solving this issue is important" [SPJ]. DORF doesn't have this issue, because the field selector is just a function, so the module system can control its visibility, as usual. Syntax - The dot notation DORF pretty much agrees with SORF, including in keeping it simple by not supporting part-applied dot. Note that DORF does not rely on dot notation whatsoever -- it's purely syntactic sugar for function application. See <Dot as Postfix Function Apply> Syntax - Syntactic sugar for `Has' SPJ is spot-on. DORF follows SORF.

Record updates DORF follows SORF in having a set' method within class Has'. It's definition and instances are tricky, to support changing the type [] of records/fields, and higher-ranked fields. SPJ is "not very happy with any of this" (that is, any of the SORF approach to field update). DORF has implemented record update using (admitted) hacks. DORF has tried to hide the implementation behind syntactic sugar (for the `Has' constraint), to make available some design space for improvements. (Including possibly the Alternative proposal using Quasifunctor.) To answer SPJ's question "what does e { x = True } mean if there are lots of "x" fields in scope? ": Under DORF, there is only a single "x" in scope(overloaded to appear in many record types). So

e { x = True } :: r{ x :: Bool} => r

[] changing the type of records fields at update seems to be a required feature. DORF has implemented it to be compatible with H98, but this adds considerable complexity. We avoid the question of whether it's desirable. DORF does not have the issues trying to support or avoid set' for virtual fields, because those are not defined using Has'. (Source code that attempts to update via them, such as :

myCust { fullName = "Fred Dagg" }

will get type failure:

no instance Has Customer_NameAddress Proxy_fullName String

or earlier still (if we implement DORF using proxies):

no type Proxy_fullName

Relationship to Type Directed Name Resolution [TDNR] DORF is in some ways a reversion to something closer to TDNR (especially dot notation "works with any function"). DORF, SORF and TDNR all use a `Has' constraint, generated from the record/field declaration. Unlike TDNR, DORF has no special syntax to trigger name resolution. (DORF's dot postfix notation is just syntactic sugar for function application, and name 'resolution' is type-directed only in the sense of familiar instance resolution.) But the crucial difference is that TDNR still treats multiple declarations of the same field name (in different record types) as being different names. DORF treats these as multiple instances of the _same_ name.

Currently (in Haskell 98), declaring a record field generates a field selector function, for example:

data Rec = Rec { f :: String } -- generates:

Rec -> String

And f is 'just a function', to be applied in prefix form.

The DORF Overloaded Record Fields proposal also generates field selectors as 'just functions' (to be overloaded for all the record types thay appear in).

As with TDNR, I propose an additional syntactic form r.f for selecting field f from record r (of type Rec). The dynamic syntax is simply reverse application r.f <===> (f r) (So for overloaded field selectors, we use usual type-directed instance resolution.)

But dot notation must be written with no spaces around the dot -- because it has strange syntactic properties that mean it isn't 'just an operator'.

The reasoning for using dot is exactly as SPJ adumbrates in TDNR ("it is critical to support dot notation") -- essentially, it's standard practice (both for selecting fields from records in data manipulation languages such as SQL, and selecting methods from objects on OO languages). Postfix application supports an object -> action pattern of thinking: focus on the object, then apply some action; or the UNIX pipes concept of the data 'flowing' left to right.

The dot syntax is to be optional (and is orthogonal to DORF -- you can use DORF purely pre-fix). (There are voiciferous calls from some on the discussion thread not to use dot syntax, and/or not to use postfix function application. I'd rather get DORF adopted, so I don't want to get into a confrontation over lexical syntax.)

Suggested compiler flag ‑XDotPostfixFuncApply.

The syntax could use some other character than dot (hash # has been suggested), but there's a danger that is already a user-defined operator in existing code (or in some unrelated GHC extension). There are good syntactic reasons for using dot (see below), besides that it is a convention familiar from other programming paradigms.

Note this proposal differs significantly from others for dot syntax, such as: (TDNR)

Dot notation can be used for any function, not just record fields (as with TDNR, but different to SORF). This supports pseudo- or virtual fields. The declaration:

fullName r = r.firstName ++ " " ++ r.lastName -- SPJ's example

creates fullName as just a function, not a field. But we can still use dot syntax, and here are some other similar examples:

customer.fullName shape.area -- also from SPJ date.dayOfWeek -- not a field: calculated from the date name.middleInitial -- extract from the name field tuple.fst -- Prelude functions list.head list.length

That is, we can use this syntax to 'select' attributes or properties from structures. (Because these are 'virtual', there is no update function.)

Dot notation's "strange syntactic properties"

Dot Apply must bind tighter than function application. This is unlike any other operator. We want:

map toUpper customer.lastName ===> map toUpper (lastName customer)

m.lookup key -- method `lookup' from object m ===> (lookup m) key

Postfix dots can be stacked up, and bind leftwards:

shape.position.xCoord ===> (shape.position).xCoord -- not! shape.(position.xCoord) ===> (xCoord (position shape))

But to facilitate postfix style, there are occasions where we want a loose binding form. We could of course use parentheses, but the requirement is just like loose-binding prefix function application provided by Prelude ($). Suggested operator:

(.$) = flip ($)

(This is an ordinary Haskell operator, but this proposal is asking to reserve the namespace.)

Two examples adapted from SPJ's TDNR wiki, and avoiding the 'something odd' he notices:

m.lookup key

.$ snd .$ reverse

record.list .$ reverse

.$ filter isEven .$ map double .$ foldr (+) 0 -- sum .$ ( 2) -- square

Using Dot notation amongst qualified names

This must be valid (note no embedded spaces):


The syntax rule is:

A name to the left of a dot starting upper case is a module, and the dot binds most tightly. A name to the left starting lower case is postfix apply, and binds less tightly, but tighter than usual function apply. A name at the rightmost end starting upper case is valid, it must be a data constructor. You can use parentheses to override the binding. (And parens would be needed where embedding a data constructor.)

Why no embedded spaces? -- And a code-breaking change

In case anybody's worrying about parsing the long dotted block above, this is already valid Haskell 98:


"It graphically appeals to the notion of a function composed of several functions", according to a poster resistant to dot as postfix function apply.

(Applying the "function" to "hello" yields 'E'.) This is equivalent:

Data.Char.toUpper . Prelude.head . Prelude.tail

That is, there's already a dot operator (function composition), so the 'good syntactic reason' (per above) for using dot as postfix function apply is that we can be sure it's already reserved as Haskell syntax.

The code-breaking change is:

Function composition will only work when the dot is surrounded by spaces.

Dot with no space either side is to change to mean postfix function application (tight-binding).

Dot with space one side only is to change to be invalid syntax -- it's too confusing to assume what's meant.

Note that if 'one-sided' dot were valid as partial application of postfix notation (section style):

(.f) ===> (\r -> r.f) -- eta-expand

===> (\r -> (f r)) -- desugar the dot ===> f -- eta-reduce

So map (.f) ts would mean map f ts.

(r.) -- makes no sense as a section, as SPJ points out.

If you really, really want a section:

(.$ f) (r .$)

There has been lengthy discussion about the interaction of dot syntax and record/field selection -- see the thread starting:

Thank you for everybody's contributions, they've significantly tuned the proposal as it went along.

Relationship to the proposal for Declared Overloaded Record Fields (DORF)

I've concluded that dot notation is controversial (contrary to SPJ's assumption when TDNR started).

So to repeat: DORF does not rely on postfix dot notation, neither does postfix dot notation rely on DORF.

They're related because DORF relies on field selectors being functions; and field selection being function application -- for which postifx dot provides familiar syntax. (That is, familiar from other programming paradigms.)