wiki:TypeDirectedNameResolution

Version 8 (modified by simonpj@…, 4 years ago) (diff)

--

Proposal: TypeDirectedNameResolution

Ticket #129
Dependencies
Related ExistingRecords

See also

Exploiting the power of the dot

What do I envy about object-oriented languages? Not state, not subtyping, not inheritance. But I do envy the power of type-based name resolution. Here's what I mean:

  • Programers can explore a library, with the help of an IDE, by typing "x.", at which point a list of x's methods pops up.
  • x's methods can have short, perspicous names. "x.reset" calls x's reset method, while "y.reset" calls y's. The types (ie classes) to which x and y belong may be completely different; but both classes can have a method called "reset" without confusion. (And similarly for fields.)

What I only realised recently is that I was envying a feature that has a cultural connection to OO, but that turns out to be fully compatible with a functional language. This page describes a possible extension to Haskell, which I'll call Type-directed name resolution, or TDNR, intended to add the power of the dot to Haskell.

The problem

In Haskell, when we have several similarly named functions, we reply on the module system to say which one we mean. For example:

  module Foo where
    import Button( Button, reset ) as B
    import Canvas( Canvas, reset ) as C

    f :: Button -> Canvas -> IO ()
    f b c = do { B.reset b; C.reset c }

Here I'm using a qualified name to say which reset I mean. And in principle an IDE could pop up a list of what's imported from B if I type "B.".

This is sufficient, but it is just sufficiently inconvenient that people don't use it much. It is tantalising that knowing that b is a Button, there is only one reset that can possibly work. Indeed, any red-blooded Java programmer would expect to write

    f :: Button -> Canvas -> IO ()
    f b c = do { b.reset; c.reset }

The same story applies in spades if 'reset' is actually the name of a record field. Object-oriented folk are (literally) incredulous when they find that two different data types cannot use the same field name! To get that effect we must define the two data types in different modules, and import them as above. For example, maybe module Button looks like this:

  module Button where
    data Button = MkButton { reset :: IO (), .... }

(and similarly Canvas). In Haskell98, here is how the client looks:

  module Foo where
    import Button( Button(..) ) as B
    import Canvas( Canvas(..) ) as C

    mkBut :: IO () -> Button
    mkBut r = Button { B.reset = r, .... }

    updBut :: IO () -> Button -> Button
    updBut r b = b { B.reset = r, .... }

Even though it is crystal clear in mkBut that the only reset that could possibly make sense is the one from B, we must still say which one. It's slightly less crystal clear in updBut, so Haskell 98 is being consistent here. Nevertheless, GHC has an extension -XDisambiguateRecordFields that allows you to omit teh "B." in mkBut.

The proposal

The goal of TDNR is identical to that of qualified names: to offer an alternative way to specify which reset (among perhaps several that are in scope) is intended by an occurrence of "reset" in the source code. TDNR is not "overloading" in the sense of type classes.

The basic design is simple:

  • There is a new lexeme, of the form '.' varid, which we call dot_var. The part after the dot a simple unqualified name (not a qualified name).
  • Extend the syntax of atoms:
    atom ::= var_qual              -- Qualified variable name
          |  var_unqual            -- Unqualified variable name
          |  '(' expr ')'          -- Parenthesised expression
          |  atom dot_var          -- TDNR invocation
    
    For example, a, a.f and a.f.g are atoms.
  • The dynamic semantics of a.f is simply reverse application (f a).
  • Unlike normal unqualified variable occurrences, it is legal for there to be many f's in scope. To resolve which one is intended, find the type of a, and the type of all of the in-scope f's. If there is exactly one f whose type matches that of a, that resolve the occurrence of f. Otherwise the program is in error.

That's it.

Syntax

TDNR is driven by a new syntactic form "a.f" which, for lack of a better term, we call a "TDNR invocation". That is what makes the type-directed name resolution spring into action. Otherwise it is inactive. You are free to write "(f a)" instead, but then you must use Haskell's module system to disambiguate which f you mean.

I have deliberately used dot "." for this feature, for several reasons:

  • It's standard practice, and that counts for a lot.
  • Selecting a field from a record is a particularly convenient special case, and the standard notation for that is "r.f".
  • TDNR is doing the same job as Haskell's existing qualified names, so it makes sense to use the sane notation.

Of course, there is a problem: Haskell already uses dot for (a) qualified names, and (b) the function composition operator.

Concerning function composition, TDNR uses the same solution as for qualified names: if the dot is surrounded by spaces it means function composition; qualified names and TDNR are both written without spaces.

The overlap with qualified names is a bit more tiresome. A qualified name for a non-data-constructor (such as "M.a") always finishes with a lower-case name, or operator. So a subsequent ".f" must be a use of TDNR. That is why I only allow var_unqual (e.g. "a") or var_qual (e.g. "M.a") to the left of the dot, but not con_unqual (e.g. "C") or con_unqual (e.g. "M.C"). The latter are indistinguishable from qualified names. If you want a TDNR invocation for a constructor you must use parentheses, thus "(C).f" or "(M.C).f". But it's rare to want this.

Another choice I have made is that you can say (h 3).f, i.e. a TNDR application with a function call to the left. In Java, people often write strings of dotted things, like "x.f(3).g(v,w).h". In Haskell with TDNR this would be rendered "((x.f 3).g v w).h". I can't say I like all those stacked-up parentheses, but I can't see a better way to do it.

Does TDNR bind more or less tightly than function application. That is, does "f x.g" mean "(f x).g" or "f (x.g)"? If TDNR is regarded as a kind of extension of qualified names, it ought to be the latter.

Stacking operations

Another possibility for syntax is this:

expr ::= atom
      |  expr atom       -- Application
      |  expr dot_occ    -- TDNR invocation
      |  ...

atom ::= var             -- Unchanged
      | '(' expr ')'
      | ...

-- dot_occ is a lexeme of form ".v"

Here the form ".v" is a lexical token. The two forms (f x) and (x .f) are application and TDNR invocation respectively. Both forms associate to the left, so x .f 3 .g 7 5 means (((((x .f) 3) .g ) 7) 5). This means that you can stack up successive TDNR invocations without parens:

  m .lookup key
    .snd
    .reverse

which means reverse(snd (lookup m key)). And that in turn means the same as

  reverse . snd . (\m -> lookup m key) $ m

There is something odd about this, however. We'd really like to write

  x .reverse
    .filter isEven
    .map double

but that doesn't work because the list is filter's last argument, not its first. Maybe you should be able to write

  x .reverse .(filter isEven) .(map double)

(Of course the spaces can be omitted). And that would be fine provided we allowed this:

expr ::= atom | expr atom 
       | expr dot_occ
       | expr '.(' var expr1 .. exprn ')
       ...

where f .(g x) means (g x f). The oddness here is that the TDNR invocation can "look inside" the .(..) to see the function at the head. (And it had better BE a function, too.)

I think this is probably worth it, although it's a little odd. To me, the ability to "stack up" postfix operations is rather important, and the fact that it doesn't fit nicely is the biggest shortcoming of this whole proposal. Can anyone improve it?

Discussion

Works with any function

Notice that, although record field names work great with TDNR, nothing requires the "x" in "a.x" to be a record field name. The function "x" can be any old function whose type looks like "t1 -> t2". So the first argument of a curried function is special. For example, consider a finite map module with signatures like this:

data Map k v
lookup :: Ord k => Map k v -> k -> Maybe v
insert :: Ord k => Map k v -> k -> v -> Map k v
elems  :: Map k v -> [v]

Then, if m :: Map k v, we can write

  m.lookup :: k -> Maybe v
  m.insert :: k -> v -> Map k v
  m.elems  :: [v]

This is good if there are several sorts of maps or association lists floating around, each with their own lookup and insert functions.

Details of name resolution

It's very important that whether or not a program typechecks should be independent of the order in which the typechecker traverses the program. Consider

  f x = (x.reset, x::Button)

In a right-to-left order the type checker discovers that x has type Button, so it can resolve which reset you mean. But if it worked left to right, it would know nothing about x. To maintain the same behaviour in both cases, the type inference engine should generate a name-resolution constraint if it can't immediately figure out the answer, in order to defer the choice. Then the constraint can be solve at the end.

In a sense, this is like type-class constraints, except that these special constraints are never quantified over; they serve only to defer name resolution until the type of x is known.

Multiple records with the same field

So far we have assumed Haskell 98 rules, apart from the TDNR extension. So it is still illegal to say

  data S = MkS { x,y :: Int }
  data T = MkT { x,y :: Float }

because that would define two top-level functions, both called x. (And similarly y.) But we probably really want to allow this. The problem is that you could then only refer to x and y using TDNR, and I rather dislike that; I would prefer TDNR to be a convenient abbreviation for something one could write more verbosely. If you like, what is their "original name"?

I can't see any decent way to be able to directly name "the x from S". But it's not untenable. If you wanted to pass S's selector x to map, say, you'd have to eta-expand:

  map (\s::S. s.x) ss

Multiple values with the same name

Do we want to allow this?

  get :: S -> Int
  get s = s.x + s.y

  get :: T -> Int
  get t = t.x + t.y

That is, do we want to allow two top-level functions, both called get? Arguably yes: if we allow both S and T to be defined in the same module, even though they have the same field names, it'd make sense to allow their accessor functions to defined too.

But, again I could only refer to them using TDNR and, worse, there is nothing in general to force them to have types that are amenable to TDNR. This latter is different to record fields (whose types are inherently suitable). So I propose to allow a module to contain muliple types with the same field; but not multiple functions with the same name.

Record syntax

TDNR can apply to record construction and pattern-matching, as indeed already happens in GHC with -XDisambiguteRecordFields. However I propose not to apply it to record update, a construct that is already rather complicated to typecheck. I do not want to make it worse.

Section-style selection

We have the option to allow '(.x)' as a valid expression, with its meaning given by the translation

  (.x)  ===  (\f -> f.x)

This allows things like

  map (.x) rs

and means that (.x) f is equivalent to f.x. The syntax and meaning is consistent with right-section, although it is not really a right-section.

What about (.x.y)? Does that expand to (\f -> f.x.y)?