wiki:Records

Version 35 (modified by GregWeber, 2 years ago) (diff)

--

Records in Haskell

This Yesod blog post, and accompanying Reddit discussion brought to the surface again the thorny issue of records in Haskell.

There are two rather different sets of issues:

  • The narrow issue: namespacing for record field names. Currently in Haskell two records in the same module can't share a field name. This is sometimes extremely painful. This page is about the narrow issue.
  • The broad issue: first class record types. In Haskell there is no "record type" per se. Rather, you can simply give names to the fields of a constructor. Records are not extensible and there is no polymorphism on records.

This page focuses exclusively on the first, narrow issue of disambiguating record field names. We have a separate Wiki page, ExtensibleRecords, on the broad issue of first class record types.

On this page I'd like to summarise the problem, and specify alternative designs. So far it is mostly a skeleton: please fill it out. The idea is to hold a discussion by email (ghc-users?) but to collect results (alternative designs, trade-offs, pros and cons) here, because email threads quickly get lost. Simon PJ.

The problem: record name spacing

(Quoting the Yesod blog.) Consider

data Record = Record { a :: String }
data RecordClash = RecordClash { a :: String }

Compiling this file results in:

record.hs:2:34:
    Multiple declarations of `Main.a'
    Declared at: record.hs:1:24
                 record.hs:2:34

In the Persistent data store library, Yesod works around the issue by having the standard of prefixing every record field with the record name (recordA and recordClashA). But besides being extremely verbose, it also limits us from experimenting with more advanced features like a partial record projection or an unsaved and saved record type.

The verbose name-spacing required is an in-your-face, glaring weakness telling you there is something wrong with Haskell. This issue has been solved in almost every modern programming languages, and there are plenty of possible solutions available to Haskell.

Solutions

So we have decided to avoid the extensible record debate, but how can we have multiple record field selectors in scope and correctly resolve the type of the record?

  1. Overloading: polymorphic selection & update; see Records/OverloadedRecordFields
  2. Namespacing: simple name-spacing & type resolution; see Records/NameSpacing
  3. Are there any other approaches?

The benefit of Overloading over Namespacing is being able to write code that works against any Record with a given field. So I can have a function:

getA = r.a

and that can work for both Record and RecordClash because they both have a field a. With Namespacing this will fail to type check unless the compiler can determine the type of r is either Record or RecordClash. The advantage of Namespacing is that the implementation is clear, straightforward, and has already been done (in the Frege language, whereas there are still questions as to the feasibility of Overloading). Overloading has seen other downsides in practice. In the words of the Frege author, who abandoned Overloading:

  • only very inefficient code could be generated, if you have to access or update a field of some unknown record. In the end, every record type was basically a map.
  • it turned out that errors stemming from mistyping a field name often could not be diagnosed at the point where they were committed, but led to inferred types with crazy signatures and an incomprehensible type error at the use side of the function that contained the error.
  • the extra constraints complicated the type checker and did not play well with higher kinded type variables (at least in the code I had then, I do not claim that this is nessecarily so).

Type directed name resolution

The discussion has many similarities with the original Type directed name resolution proposal: the question seems to be largely about nailing down a concrete implementation. The original TDNR proposal had Overloading in mind, but Namespacing ends up having similarities. -- Greg Weber

All of the name-space mechanisms require some level of user-supplied disambiguation: if there are two fields a in scope, you must use a qualified name to disambiguate them. What is tantalising about this is that the type of the argument immediately specifies which one you mean. There is really no ambiguity at all, so it is frustrating to have to type qualified names to redundantly specify that information. Object-oriented languages take for granted this form of type-directed disambiguation.

One particular way of integrating this idea into Haskell is called Type Directed Name Resolution (TDNR). Proposed a couple of years ago, the Haskell community didn't like it much. (But I still do; SLPJ.)

I believe the community rejected TDNR because they wanted extensible records. I think it is a shame that the debate over *extensible* records has resulted in holding back any form of progress , but I do think that the TDNR proposal seems a little weak for some reasons pointed out in the proposal itself, but also because it proposes not to solve name-spacing record updates. Note that the Frege proposal incorporates the TDNR syntax, and it tackles record updates. -- Greg Weber

Attachments (1)

Download all attachments as: .zip