Version 22 (modified by GregWeber, 6 years ago) (diff)


See Records for the bigger picture. This is a proposal to solve the records name-spacing issue with simple name-spacing and simple type resolution.

This approach is an attempt to port the records solution in Frege, a haskell-like language on the JVM. Please read Sections 3.2 (primary expressions) and 4.2.1 (Algebraic Data type Declaration - Constructors with labeled fields) of the Frege user manual Many thanks to the Frege author, Ingo Wechsung for explaining his implementation and exploring this implementation territory for us.

The DDC language (again, very much like Haskell, but focused on effect tracking and an overall different conceptual approach to purity) puts forth a similar solution. See the thesis section 2.7 - 2.7.4 pages 115 - 119

Better name spacing

In Haskell, you can look at an occurrence of any identifier f or M.f and decide where it is bound without thinking about types at all. Broadly speaking it works like this:

  • For qualified names, M.f, find an import that binds M.f.
  • For unqualified names, f, find the innermost binding of f; or, if that takes you to top level, look for top level binding of f or an import that binds f.

If there is ambiguity (eg two imports both import something called f) then an error is reported. And that's what happens for the Record and RecordClash example above.

So the proposed solution for record field names is to specify more precisely which one you mean by using the type name. Note that a data declaration now creates a module-like namespace, so we aren't so much using the type name as using the data type namespace in the same way we use a module namespace.

So you could say Record.a or RecordClash.a rather than a, to specify which field selector you mean. The difficulty here is that it's hard to know whether you are writing <module-name>.f or <record-name>.f. That is, is Record the name of a type or of a module? (Currently it legally could be both.)

The module/record ambiguity is dealt with in Frege by preferring modules and requiring a module prefix for the record if there is ambiguity. So if your record named Record was inside a module named Record you would need Record.Record.a. I think we could improve upon this case to prefer a record rather than the name of the existing module, which should not need to be referenced. So just Record.a.

However, we still have the case of conflicting imports between the names of modules and records. We have the choice of either requiring a module prefix or making this a compilation error. Generally, programmers will avoid this situation by doing what they do now: structuring their programs to avoid name collisions. We can try and give the greater assistance in this regard by providing simpler ways for them to alter the names of import types.

One way to avoid the Module.Record.x problem is to use type alias names, for example:

data InconvenientName = X { f :: Int }
type IN = InconvenientName
-- IN.f is the same as InconvenientName.f

Alternative name-spacing techiniques

Use the module name space mechanism. But putting each record definition in its own module is a bit heavyweight. So maybe we need local modules (just for name space control) and local import declarations. Details are unclear. (This was proposed in 2008 in this discussion on the Haskell cafe mailing list and in #2551. - Yitz).

Rather than strictly re-use modules it may make more sense to have a name-spacing implementation construct that is shared between both records and modules - hopefully this would make implementation easier and unify behavior. In the Frege approach, each data declaration is its own namespace - if we were to go this far (instead of stopping purely at records) there may be much less need for local namespaces. Overall this seems to be more of an interesting implementation detail than a concrete design proposal relating to records. -- Greg Weber.

Getting rid of the Verbosity with the dot operator

We have name-spaces, but the equivalent is already being accomplished by adding prefixes to record fields: data Record = Record { recordA :: String }

Verbosity is solved in Frege by using the dot syntax concept. In data Record = Record {a::String};r = Record "A"; r.a The final r.a resolves to Record.a r. See below for how we resolve the type of this code.

Details on the dot

This proposal requires the current Haskell function composition dot operator to have spaces on both sides. No spaces around the dot are reserved for name-spacing: this use and the current module namespace use. No space to the right would be partial application (see TDNR. The dot operator should bind as tightly as possible.

Given the dot's expanded use here, plus its common use in custom operators, it is possible to end up with dot-heavy code.

quux (y . (foo>.<  bar).baz (f . g)) moo

It's not that easy to distinguish from

quux (y . (foo>.<  bar) . baz (f . g)) moo

What then is the future of the dot if this proposal is accepted? I think the community need to chose among 2 alternatives:

1) discourage the use of dot in custom operators: >.< is bad, use a different character or none: >< 2) discourage the use of dot for function composition - use a different operator for that task. Indeed, Frege users have the choice between <~ or the proper unicode dot.

Simple type resolution

Frege has a detailed explanation of the semantics of its record implementation, and the language is *very* similar to Haskell. After reading the Frege manual sections, one is still left wondering: how does Frege implement type resolution for its dot syntax. The answer is fairly simple: overloaded record fields are not allowed. So you can't write code that works against multiple record types. Please see the comparison with Overloading in [wiki Records], which includes a discussion of the relative merits. Note that the DDC thesis takes the same approach.

Back to simple type resolution. From the Frege Author:

  • Expressions of the form T.n are trivial, just look up n in the namespace T.
  • Expressions of the form x.n: first infer the type of x. If this is just an unbound type variable (i.e. the type is unknown yet), then check if n is an overloaded name (i.e. a class operation). If this is not the case, then x.n is not typeable. OTOH, if the type of x can be inferred, find the type constructor and look up n in the associated name space.

Under no circumstances, however, will the notation x.n contribute in any way in inferring the type of x, except for the case when n is a class operation, where an appropriate class constraint is generated.

Note that this means it is possible to improve upon Frege in the number of cases where the type can be inferred - we could look to see if there is only one record namespace containing n, and if that is the case infer the type of x -- Greg Weber

So lets see examples behavior from the Frege Author:

For example, lets say we have:

data R = R { f :: Int }

bar R{f=42} = true
bar R{} = false

foo r = bar r || r.f==43
baz r = r.f==47 || bar r

foobaz r = r.f

Function bar has no difficulties, after desugaring of the record patterns it's just plain old pattern matching.

Function foo is also ok, because through the application of r to bar the type checker knows already that r must be an R when it arrives at r.f

Function baz is ok as long as the type checker does not have a left to right bias (Frege currently does have this bias, but will hopefully be improved).

The last function foobaz gives a type error too, as there is no way to find out the type of r.

Hence, the records in Frege are a very conservative extension to plain old algebraic data types, actually all record constructs will be desugared and reduced to non-record form in the way I have described in the language reference. For example, the data R above will become:

data R = R Int where
    f (R x) = x

To be sure, the where clause is the crucial point here. It puts f in the name space R. The global scope is not affected, there is nothing named f outside the R namespace.

The record namespace is searched only in 3 cases:

  • when some name is explicitly qualifed with R: R.f
  • when the type checker sees x.f and knows that x::R
  • In code that lives itself in the namespace R, here even an unqualified f will resolve to R.f (unless, of course, if there is a local binding for f)

Increased need for type annotation

This is the only real downside of the proposal. The Frege author says:

I estimate that in 2/3 of all cases one does not need to write T.e x in sparsely type annotated code, despite the fact that the frege type checker has a left to right bias and does not yet attempt to find the type of x in the code that "follows" the x.e construct (after let unrolling etc.) I think one could do better and guarantee that, if the type of x is inferrable at all, then so will be x.e (Still, it must be more than just a type variable.)

Syntax for updates (in the Frege manual)

  • the function that updates field x of data type T is T.{x=}
  • the function that sets field x in a T to 42 is T.{x=42}
  • If a::T then a.{x=} and a.{x=42} are valid
  • the function that changes field x of a T by applying some function to it is T.{x <-}

The function update syntax is a new addition to Haskell that we do not need to immediately implement.

Alternative approach: using tuple selectors

let { r.x = x'; r.y = y'; r.z = z'; } in r

If we allow tuples of selectors:

r.(x, y, z) = (r.x, r.y, r.z)

then one can simply write

let r.(x, y, z) = (x', y', z') in r   

Interaction with Typeclasses

In the Frege system, the record's namespace is closed where it is defined. However, making a record an instance of a class puts the class functions in the record name-space.

module RExtension where

import original.M(R)    -- access the R record defined in module original.M

class Rextension1 r where
      f :: .....
      g :: .....

instance Rextension1 R where
     -- implementation for new functions

And now, in another module one could

import RExtension()      -- equivalent to qualified import in Haskell

the new functions f and g are accessible (only) through R. So we have a technique for lifting new functions into the Record namespace. For the initial records implementaion we probably want to maintain f and g at both the top-level and through the name-space. See below for a discussion of future directions.

Compatibility with existing records

Seems like it should be OK to use old records in the new system playing by the new rules, although those records likely already include some type of prefixing and would be quite verbose. There is a chance for deeper though on this issue.

Partial application

see TDNR syntax discusion. .a r == r.a

Potential Downside: mixing of 2 styles of code

data Record = Record { a::String }
b :: Record -> String

let r = Record "a" in b r.a 

It bothers some that the code does not look like the previous b a r - chiefly that the record is now in the middle. Chaining can make this perception even worse: (e . d) r.a.b.c

Is it possible we can have an equivalent of the dot that changes the ordering? b a.@r is possible, but requires an operator that binds tightly to the right.

Partial Application

Partial application provides a potential solution: b . .a $ r

So if we have a function f r = b r.a then one can write it points-free: b . .a

Our longer example from above: e . d . .c . .b . .a

At first glance it may look odd, but it is starting to grow on me. Also let us consider real use with longer names:

echo . delta . .charlie . .beta . .alpha

Is there are more convenient syntax for this? b <.a Note that a move to a different operator for function composition (see brief discussion of the dot operator above) would make things much clearer: b <~ .a, where the unicode dot might be even nicer

Extending data name-spacing and dot syntax

This is mostly just something interesting to contemplate.

Dot syntax does not have to be limited to records (although it probably should be for the initial implementation until this new record system is vetted). I think it is a bad idea to attempt to attempt to extend the dot syntax to accomplish general function chaining through extending the dot syntax. However, it is consistent to extend the function name-spaced to a record data type concept to any data type (as it is in Frege), and use dot syntax for that. The dot (without spaces) *always* means tapping into a namespace (and simple type resolution).

Placing functions within a data name-space can make for nicer data-structure oriented code where the intent is clearer. It can help to achieve the data-oriented goal of OO (without the entanglement of state). With control over how the data namespace is exported (similar to controlling module namesapces), it is possible to create virtual record field setters and getters that can be accessed through dot syntax.

Both Frege and the DDC thesis take this approach.

In this brave new world (see above where typeclass functions are also placed under the namespace of the data), there are few functions that *absolutlely must* be at the top level of a module. Although a library author might take attempt the approach of no top-level functions, obviously it will still be most convenient for users to define functions at the top level of modules rather than have to lift them into data structures.