Add saturated constructor applications to Core
A long-standing performance bug in GHC is its behaviour on nested tuples.
- The poster-child case is #5642.
- The problem is that when you have a nested tuple application, e.g.
((4,True),('c','d'))
, the size of the type arguments grows quadratically with the expression size. This bites badly in some approaches to generic programming, which use nested tuples heavily. - It's explained in detail in Section 2.3 of Scrap your type applications.
The same paper describes a solution, a modification of System F called System IF. It's neat, and I did once implement in in GHC. But it was quite complicated; most of the time the win was not big; and I don't know how it'd interact with casts, coercions, and type dependency in the latest version of GHC's Core.
So here's another idea, which came up in conversation at ICFP'16: add a staturated constructor application to Core. So Core looks like
data Expr v = Var v
| App (Expr v) (Expr v)
...
| ConApp DataCon [Expr v] -- NEW
Now I hate the idea of adding a new constructor to Core; I often brag about how few constructors it has. But the idea is this:
- A
ConApp
is always saturated; think of it as an uncurried application. - Every data constructor has a wrapper Id. For simple constructors like
(:)
, the wrapper is just a curried version:
(:) = /\a. \(x:a). \(y:[a]). (:) [x,y]
where the "
(:) [x,y]
" is just my concrete syntax for aConApp
node.
- We are used to having an intro and elim form for each type former. So for
(->)
the intro form is\x.e
and the elim form is(e1 e2)
. For a data type likeMaybe
, the elim form iscase
, but what's the intro form?ConApp
of course. - A
ConApp
mentions theDataCon
explicitly, rather than having it buried inside theIdDetails
of anId
, which somehow seems more honest.
The proximate reason for doing this in the first place is to allow us to omit type arguments. Consider Just True
. Curently we represent this as
Var "Just" `App` Type boolTy `App` Var "True"
But with ConApp
we can say
ConApp "Just" [ConApp "True" []]
because it's easy to figure out the boolTy
type argument from the argument. (We don't really have strings there, but you get the idea.)
Can we omit all the type arguments? No: we can omit only those that appear free in any of the argument types. That is usally all of them (including existentials) but not always. Consider:
data (,) a b where
(,):: forall a b. a -> b -> (a,b)
data [] a where
[] :: forall a. [a]
(:) :: forall a. a -> [a] -> [a]
data Either a b where
Left :: forall a b. a -> Either a b
Right :: forall a b. b -> Either a b
data (:=:) a b where
Refl :: forall a b. (a~b) -> :=: a b
data Foo where
MkFoo :: a -> (a -> Int) -> Foo
For all of these data constructors except []
(nil), Left
and Right
we can omit all the type arguments, because we can recover them by simple matching against the types of the arguments. A very concrete way to think about this is how to implement
exprType :: Expr Id -> Type
exprType (Var v) = varType v
exprType (Lam b e) = mkFunTy (varType b) (exprType e)
exprType (App f a) = funResultTy (exprType f)
...
exprType (ConApp con args) = mkTyConApp (dataConTyCon con) ???
We know that the result type of type of a ConApp
will be T t1 ..tn
where T
is the parent type constructor of the DataCon
. But what about the (universal) type args ???
? We can get them from the types of the arguments map exprType args
:
- For
ConApp "(:)" [e1, e2]
, the type argument is justexprType e1
. - For
ConApp "(:=:)" [e]
", we expectexprType e
to return a type looking likeTyConApp "~" [t1, t2]
. Thent1
andt2
are the types we want. So matching is required. - What about an application of
Left
?? We need to recover two type args, butexprType e1
gives us only one. So we must retain the other one in the application:ConApp "Left" [Type ty2, e1]
. Similarly for the empty list.
A simple once-and-for-all analysis on the DataCon
will establish how to do the matching, which type args to retain, etc.
Tradeoffs:
- Pro: We can eliminate almost all type args of almost all data constructors; and for nested tuples we can eliminate all of them.
- Pro: it's elegant having the intro/elim duality.
- Pro: in GHC we often ask "is this expression a saturated constructor application?" (see
exprIsConApp_maybe
) andConApp
makes it easier to answer that question. - Pro: we do exactly this in types: we have
AppTy
andTyConApp
. (In types aTyConApp
is not required to be saturated, but we could review that choice.) - Con: adding a constructor is a big deal. In lots of places we'd end up saying
f (App e1 e2) = App (f e1) (f e2)
f (ConApp c es) = ConApp c (map f es)
In the olden days GHC's
App
had multiple arguments and the continual need to work over the list was a bit tiresome. StillConApp
is very simple and uniform; quite often addingmap
won't be difficult; and it may well be that constructors need to be treated differently anyway.
- Con: it's not a general solution to the type-argument problem. See #9198 (closed) for example. System IF is the only general solution I know, but it seems like too big a sledgehammer.
We'll only know if we try it. I estimate that it would take less than a week to work this change through all of GHC. 90% of it is routine.
Other possibly-relevant tickets are #8523, #7428, #9630 (closed).