wiki:Holes

Version 20 (modified by spl, 2 years ago) (diff)

Remove now-unused section

This page discusses the design and potential implementation of "holes" in GHC.

Introduction

Informally, a "hole" is something to be filled. A hole in the ground can cause an injury if you stepped into it, but you can also build a house around it without a problem. You can easily measure a hole to determine how much material (e.g. dirt or concrete) is needed to fill it.

The analogy of a hole in the ground can be transferred to a hole in a program. If you run the program and encounter a hole, the runtime will halt (e.g. as if you encountered undefined). But you can still compile a program with holes. The compiler reports information about the hole, so that the programmer knows what code should replace the hole.

These are the requirements of the problem that holes solve:

  1. Extract information about subterms in a program.
  2. Do not interrupt compilation.

The extracted information from a hole can include, among other things:

  • The expected type of the hole
  • The local bindings (and their types) in the scope of the hole

We first describe related work, including concepts that are similar in other languages as well as other approaches to solving the problem proposed. Then, we discuss the proposal in detail.

Related Work

Goals in Agda

One of the features of the Emacs mode for Agda is the ability to insert a goal, a placeholder for code that is yet to be written. By inserting a ? in an expression, the compiler will introduce a goal. After loading the file (which typechecks it), Agda gives an overview of the goals in the file and their types.

For example:

test : List Bool
test = Cons ? (Cons ? Nil)

gets turned into

test : List Bool
test = Cons { }0 (Cons { }1 Nil)

with the output

?0 : Bool
?1 : Bool

As can be seen here, goals are numbered, and the typechecker returns the inferred type for each of these goals.

Goals can make it a lot easier to write code. They allow typechecking to continue although certain parts of code are missing. They also work as a to-do list for incomplete programs.

Deferring type errors to runtime

The proposal DeferErrorsToRuntime implements a flag that turns every type error into a warning with an associated bit of code that is run when the offending ill-typed term is encountered at runtime. At the time of writing, this is implemented in GHC as -fdefer-type-errors.

Deferring type errors alone is not a solution that fits the problem description; however, it can be used in conjunction with other things to solve the problem.

Inserting deliberate type errors

One technique for finding the type of a subterm that has been often seen on the mailing lists is deliberately inserting ill-typed terms, so that the type error reports the expected type of the term.

Here is an example:

1 + ()

We get an error that there is no instance for Num ().

We can do this is a more refined manner by using undefined with a type annotation:

test :: [Bool]
test = undefined : ((undefined :: ()) ++ [])

This gives the error:

test.hs:2:22:
    Couldn't match expected type `[Bool]' with actual type `()'
    In the first argument of `(++)', namely `(undefined :: ())'
    In the second argument of `(:)', namely `((undefined :: ()) ++ [])'
    In the expression: undefined : ((undefined :: ()) ++ [])
Failed, modules loaded: none.

The advantage of using undefined is that we can remove the type annotation, and the program will probably compile.

The clear problem with deliberate type errors is that the program does not type-check. We cannot use this technique on multiple locations at one time. We also only get type errors and not other information.

With deliberate errors and deferred type errors, we do get a program that type-checks. This is actually a reasonable solution; however, it still has two problems:

  • Deferring type errors is indiscriminate: you defer both the deliberate and unintended type errors.
  • It does not provide useful information other than type errors.

Implicit parameters

The implicit parameters extension allows a programmer to delay the binding of a variable while still preserving its type.

Here is an example:

$ ghci -XImplicitParams
Prelude> :t ?x : (?y ++ [])
?x : (?y ++ []) :: (?x::a, ?y::[a]) => [a]

The implicit parameters ?a and ?b appear as constraints in the type of the term containing them.

Implicit parameters must be bound, either in a term, e.g. let ?x = ... in ..., or in a type signature, e.g. let f :: (?x::a, Num a) => a; f = 1 + ?x. If an implicit parameter is not bound, it results in a type error. Of course, we can defer the type errors, but then we have the same problem with indiscriminate deferral.

Implicit parameters do not serve very well for debugging. Due to the binding requirement, they are not suitable for typing things with unknown types. Either you must change type signatures or you must deal with the unbound implicit parameter type errors (or warnings in case the errors are deferred). Lastly, since implicit parameters are meant for usage in programs, it does not seem like they should be used for extracting additional information about the parameter's location. (This is not a technical argument, of course.)

A proposed concrete design for Haskell

Here is a specification of the proposed user-level view of the design.

A hole in a term is written _?thing. Example:

test : List Bool
test = Cons _?x (Cons _?x Nil)

If -XHoles is set, we want the following:

  1. The program should be type-checked as if every hole _?h is replaced by undefined, except:
    • If type-checking would fail due to unsolved constraints that could be solved by giving a type to a hole.
  2. If the program is well-typed (as above), then:
    • The types of all holes should be reported.
    • Reporting the hole types should not cause type-checking (or compiling in general) to stop (in error).
  3. (optional) If the program is ill-typed, then:
    • The types of all holes should be reported.

The above should hold true with and without the monomorphism restriction. In particular, if an undefined somewhere in a program type-checked with the monomorphism restriction would cause type-checking to fail, then a hole in that same position should also cause type-checking to fail.

The type of a hole should be the resolved type with minimum constraints. That is, the type of a hole should only have constraints that have not be solved but are either inferred from the context (e.g. show _?h) or given in a type annotation/signature (e.g. _?h :: Show a => a.


Stuff below here might be out of date; please edit

How this could be implemented in GHC

These two approaches are not ideal, they either don't give enough information, or they hinder using the code.

Agda-style

The simplest way would be to implement them in the same way as Agda: add a new syntax (we shall use two underscores as an example here, __) to denote a hole, and after typechecking, show the user a list of all the types of all the holes in their source files. In what cases "after typechecking" we do this is still subject of discussion. We expect at the very least to show it after (re)loading a module into GHCi or typechecking an expression in GHCi directly (:t).

Example:

test :: [Bool]
test = __ : (__ ++ [])

Theoretical output:

> :l test.hs
[1 of 1] Compiling Main             ( test.hs, interpreted )
Found a hole at test.hs:2:6-7: Bool
Found a hole at test.hs:2:12-13: [Bool]
> 

Named holes

Implicit parameters have some good features too: they can be named, and so used in multiple places. In the Agda-style, this would require let-binding a hole, which is a lot of effort for something that should be a temporary placeholder. So one idea is to allow giving holes a name, just like implicit parameters.

For example:

test :: [Bool]
test = _a : (_b ++ [])

test2 = _c ++ _b

Theoretical output:

> :l test.hs
[1 of 1] Compiling Main             ( test.hs, interpreted )
Found a hole _a: Bool
Found a hole _b: [Bool]
Found a hole _c: [Bool]
> 

These could either be made shared within all functions in module, or not shared between functions at all (so not the confusing situation with implicit parameters, which are only shared when required).

Not holes, ranges

Holes can be useful for finding the type of something that still needs to be written, but a more general way of looking at it is this: it is currently quite easy to typecheck a complete expression, for example with :t, in GHCi. However, finding the type of a part of an expression within a module is hard. In a large, complicated function it could be useful to ask the compiler for the types of certain subexpressions. :t does not help here, as the function's parameters and where/let/lambda-bound terms are not in scope. It would be useful if it were possible to annotate code to ask the compiler to give you the type found for the annotated expression.

Simple example (let {_ _} denote a request for the type here):

test :: [Bool]
test = {_ undefined _} : ({_ undefined ++ [] _})

Could result in:

> :l test.hs
[1 of 1] Compiling Main             ( test.hs, interpreted )
Found type of undefined (test.hs:2:11-19): Bool
Found type of undefined ++ [] (test.hs:2:30-38): [Bool]

The same effect of holes can then be achieved by using {_ undefined _} . To return to the conciseness of holes, __ could be syntactic sugar for {_ undefined _}. (Note that defining __ = {_ undefined _} in Haskell would not do this. The type would be forall a. a.)

Not ranges, but types

A variation of the previous proposal that is a bit more powerful and less syntactically intrusive is to implement this in the type language. So giving an expression (or even a pattern) a type of __ would leave this type arbitrary and make GHC print the type. This subsumes the previous proposals, e.g.

test :: [Bool]
test = (undefined::__) : (undefined ++ [] ::__)

and again a __ on the term level could be syntactic sugar for (undefined::__).

The benefit from this variant is that __ could occur as parts of types as well, e.g.

test :: [__]
test = (undefined::__) : (undefined ++ [True] ::__)