wiki:Holes

Version 36 (modified by spl, 2 years ago) (diff)

Respond to SLPJ on runtime error

This page discusses the design and potential implementation of "holes" in GHC. Discussion on this feature in GHC is in #5910. The development repository is here, and implementation issues are here.

Introduction

Informally, a "hole" is something to be filled. A hole in the ground can cause an injury if you stepped into it, but you can also build a house around it without a problem. You can easily measure a hole to determine how much material (e.g. dirt or concrete) is needed to fill it.

The analogy of a hole in the ground can be transferred to a hole in a program. If you run the program and encounter a hole, the runtime will halt (e.g. as if you encountered undefined). But you can still compile a program with holes. The compiler reports information about the hole, so that the programmer knows what code should replace the hole.

These are the requirements of the problem that holes solve:

  1. Extract information about subterms in a program.
  2. Do not interrupt compilation.

The extracted information from a hole can include, among other things:

  • The expected type of the hole
  • The local bindings (and their types) in the scope of the hole

We first describe related work, including concepts that are similar in other languages as well as other approaches to solving the problem proposed. Then, we discuss the proposal in detail.

Related Work

Goals in Agda

One of the features of the Emacs mode for Agda is the ability to insert a goal, a placeholder for code that is yet to be written. By inserting a ? in an expression, the compiler will introduce a goal. After loading the file (which typechecks it), Agda gives an overview of the goals in the file and their types.

For example:

test : List Bool
test = Cons ? (Cons ? Nil)

gets turned into

test : List Bool
test = Cons { }0 (Cons { }1 Nil)

with the output

?0 : Bool
?1 : Bool

As can be seen here, goals are numbered, and the typechecker returns the inferred type for each of these goals.

Goals can make it a lot easier to write code. They allow typechecking to continue although certain parts of code are missing. They also work as a to-do list for incomplete programs.

Deferring type errors to runtime

The proposal DeferErrorsToRuntime implements a flag that turns every type error into a warning with an associated bit of code that is run when the offending ill-typed term is encountered at runtime. At the time of writing, this is implemented in GHC as -fdefer-type-errors.

Deferring type errors alone is not a solution that fits the problem description; however, it can be used in conjunction with other things to solve the problem.

Inserting deliberate type errors

One technique for finding the type of a subterm that has been often seen on the mailing lists is deliberately inserting ill-typed terms, so that the type error reports the expected type of the term.

Here is an example:

1 + ()

We get an error that there is no instance for Num ().

We can do this is a more refined manner by using undefined with a type annotation:

test :: [Bool]
test = undefined : ((undefined :: ()) ++ [])

This gives the error:

test.hs:2:22:
    Couldn't match expected type `[Bool]' with actual type `()'
    In the first argument of `(++)', namely `(undefined :: ())'
    In the second argument of `(:)', namely `((undefined :: ()) ++ [])'
    In the expression: undefined : ((undefined :: ()) ++ [])
Failed, modules loaded: none.

The advantage of using undefined is that we can remove the type annotation, and the program will probably compile.

The clear problem with deliberate type errors is that the program does not type-check. We cannot use this technique on multiple locations at one time. We also only get type errors and not other information.

With deliberate errors and deferred type errors, we do get a program that type-checks. This is actually a reasonable solution; however, it still has two problems:

  • Deferring type errors is indiscriminate: you defer both the deliberate and unintended type errors.
  • It does not provide useful information other than type errors.

Implicit parameters

The implicit parameters extension allows a programmer to delay the binding of a variable while still preserving its type.

Here is an example:

$ ghci -XImplicitParams
Prelude> :t ?x : (?y ++ [])
?x : (?y ++ []) :: (?x::a, ?y::[a]) => [a]

The implicit parameters ?a and ?b appear as constraints in the type of the term containing them.

Implicit parameters must be bound, either in a term, e.g. let ?x = ... in ..., or in a type signature, e.g. let f :: (?x::a, Num a) => a; f = 1 + ?x. If an implicit parameter is not bound, it results in a type error. Of course, we can defer the type errors, but then we have the same problem with indiscriminate deferral.

Implicit parameters do not serve very well for debugging. Due to the binding requirement, they are not suitable for typing things with unknown types. Either you must change type signatures or you must deal with the unbound implicit parameter type errors (or warnings in case the errors are deferred). Lastly, since implicit parameters are meant for usage in programs, it does not seem like they should be used for extracting additional information about the parameter's location. (This is not a technical argument, of course.)


Proposal

In this section, we discuss the proposed extension to GHC.

Language extension

Since we are changing the syntax and semantics of Haskell, we feel that this should become a language extension (rather than another kind of compiler flag). For now, we proposed the name Holes (as in -XHoles), though this could change after discussion.

Syntactic placement of holes

We view a hole as a piece of syntax that is inserted in code as a placeholder until the programmer fills that location with something else. That placeholder can not only allow the program to be compiled but also allow the compiler to report information that is significant to the location of the placeholder.

Numerous views on the syntax and semantics of holes have been suggested. We first classify the various options by the syntactic categories of types or expressions. Then, we classify them within each category and describe the various trade-offs.

Types

Here is an example:

f :: Bool -> _ ()
f x = if x then return () else (undefined :: _) >>= guard

We show a hole (with the wildcard syntax _) in several locations: a type signature and a type annotation. In the signature, a hole fills in for a type constructor, and in the annotation, the hole fills in for the type of undefined. Upon typing, we expect to learn the types that are inferred for these holes. For example, we might get the following report:

Found a hole in the signature: test :: Bool -> _ ()
Source location: ...
Inferred type: MonadPlus m => m :: * -> *

Found a hole in the annotation: undefined :: _
Source location: ...
Inferred type: MonadPlus m => m Bool :: *

Note that all type variables should be the same for all reports, so all types above refer to the same m.

Expressions

Here is an example:

f x = do
  y <- _ x
  y `mplus` _

We show a monadic expression with holes (_) in the function position of an application and as an argument to mplus. Upon typing, we expect to learn the types that would be inferred for the missing expressions in these holes. For example, we might get the following report:

Found a hole in the expression: _ x
Source location: ...
Inferred type: MonadPlus m => a -> m b

Found a hole in the expression: y `mplus` _
Source location: ...
Inferred type: MonadPlus m => m b

Again, note that the type variables are universally quantified over all reports, not each report separately.

Comparison

If we had to choose, should holes be in types or expressions? Let's look at the advantages and disadvantages.

Kinds. With types, we can put a hole in a type constructor position (see example above), so we have more flexibility in the restrictions we can place on a particular type: e.g. _ () or Monad m => m _. With expressions, we are limited to the kinds of types that expressions can have (e.g. * and not * -> *).

Information reporting. We would like to extract additional information from a hole other than just the type. The primary consideration is the set of available bindings and their types. With type (annotation) holes, it is not clear if that is possible. With expression holes, it seems intuitive.

Evaluation. With expressions, we must consider how holes are evaluated. We would expect _ would be treated the same as undefined, so this may cause unexpected program death when evaluating a hole. With types, we are not concerned with this. Type holes are never evaluated, so they cannot break a well-typed program.

Expression size. Type annotation holes can be used to find the type of arbitrarily large expressions, relying only on the normal expression bracketing. Expression holes (as presented so far) only replace a whole expression with a placeholder. Another approach would be needed to support expression hole bracketing.

Verbosity. Type annotation holes, e.g. undefined :: _, are possibly too verbose for simple uses, while expression holes are straightforward.

Extensions to basic holes

In the previous section, we described holes as simple "wildcard" syntax that can be put in either a type or an expression. There are a few basic extensions or variations to this idea.

Names

We might consider the wildcard syntax _ to be an "anonymous" or "unnamed" hole. We could also support a named hole, e.g. _?x. Named holes have two advantages over wildcards.

Better reporting. The reports can now explicitly mention the hole (in addition to the source location). For example:

Found the hole _?x in the expression: _?x + 4
...

Multiple uses. When a named hole is used in multiple source locations, intuitively, that named hole should have the type resulting from the unification of the hole types at each location. For example:

f y = print (y + _?x)
g z = mappend z _?x

This might produce the following report:

Found the hole _?x in the expressions:
  print (y + _?x) at source location ...
  mappend z _?x at source location ...
Inferred type: Show a, Num a, Monoid a => a

It seems intuitive how named holes work in expressions. How would they work in types?

Brackets (ranges)

As we mentioned in #Comparison, the expression wildcard does not allow finding out the type of an existing expression. A special pair of brackets could be used to do this.

Assuming {_ and _} are the brackets for a hole, we have this example:

f y = print {_ y + mempty _}

And the report might be:

Found hole brackets around the expression: y + mempty
Source location: ...
Inferred type: Num a, Monoid a => a

We could combine the naming and bracketing extensions into one.

User's view

For this specification, we use the named term variables variant (though it may also apply to other variants).

When using holes (i.e. -XHoles is set), we expect the following:

  1. The program should type-check as if every hole _?h is replaced with undefined. See Ambiguous types for an exception to this rule.
  2. If the program is well-typed (as above), then:
    • The types of all holes should be reported.
    • Assuming no other errors, the program should compile and run.
    • If running a program causes a hole to be evaluated, the evaluation should fail with an runtime error. See Runtime error for an example.
  3. (optional) If the program is ill-typed, then:
    • The types of all holes should be reported.

Ambiguous types

Suppose that we replace every hole with undefined and type-check the program without -XHoles. Some programs would not type-check due to unsolved class constraints that result in ambiguous types. For example, show _?h becomes show undefined, and the Show constraint is not instantiated, so the program does not type-check.

We think holes can be extremely useful with ambiguous types. We prefer that a program, in which there is a hole with unsolved constraints on the type of the hole, still be considered well-typed, assuming the rest of the program is well-typed. In the above example, we would expect show _?h to have the type String with the reported hole type _?h :: Show a => a.

SLPJ do you want programs with these ambiguity errors to run too? Or what? Can you give a complete little example module, with the error messages you expect, whether you expect it to run, and if so what should happen?

Monomorphism restriction

Some ambiguous types fall under the monomorphism restriction. That is, the following program will not type under Haskell2010 due to the restriction that f have a monomorphic type:

f = undefined >>= undefined

We also expect holes to be very useful in these cases, for example by replacing each undefined with a hole:

f = _?h >>= _?i

Thus, we prefer that this program be considered well-typed with f :: Monad m => m b and the holes _?h :: Monad m => m a and _?i :: Monad m => a -> m b.

If -XNoMonomorphismRestriction is used, we expect that the typing of the holes will not change.

Type of a hole

The type of a hole should be the resolved type with minimum constraints. That is, the type of a hole should only have constraints that have not been solved but are either inferred from the context (e.g. show _?h) or given in a type annotation/signature (e.g. _?h :: Show a => a).

Runtime error

Given the following module:

main = _?x

we expect (something like) the runtime error:

blah: blah.hs:2:1:
    Found the hole `_?x' with type `IO t'
    In the expression: _?x
    In the definition of `main': main = _?x