GHC: Ticket #2092: Quadratic amount of code generated
http://ghc.haskell.org/trac/ghc/ticket/2092
<p>
Originally discovered by Twan van Laarhoven, here:
<a class="ext-link" href="http://www.haskell.org/pipermail/cvs-ghc/2008-February/040981.html"><span class="icon"></span>http://www.haskell.org/pipermail/cvs-ghc/2008-February/040981.html</a>
</p>
<p>
On the HEAD, compiling this module:
</p>
<pre class="wiki">{-# LANGUAGE MagicHash #-}
module M1 where
import GHC.Exts
type FastInt = Int#
data U = Mk1 { a :: (), b :: FastInt, c :: () }
| Mk2 { a :: (), b :: FastInt, c :: () }
instance Eq U where
x == y = b x ==# b y
</pre><p>
with
</p>
<pre class="wiki">ghc -c M1.hs -O -ddump-simpl
</pre><p>
we see
</p>
<pre class="wiki">M1.== :: M1.U -> M1.U -> GHC.Base.Bool
[GlobalId]
[Arity 2
NoCafRefs
Str: DmdType SS]
M1.== =
\ (x_a5J :: M1.U) (y_a5L :: M1.U) ->
case case y_a5L of tpl_B2 {
M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
}
of wild_B1 { __DEFAULT ->
case case x_a5J of tpl_B2 {
M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
}
of wild1_Xk { __DEFAULT ->
GHC.Prim.==# wild1_Xk wild_B1
}
}
</pre><p>
which looks good: Extract the FastInt from one value, then the other, then compare.
</p>
<p>
However, if we have this module instead:
</p>
<pre class="wiki">module M2 where
import GHC.Exts
newtype FastInt = FastInt Int
deriving Eq
data U = Mk1 { a :: (), b :: {-# UNPACK #-} !FastInt, c :: () }
| Mk2 { a :: (), b :: {-# UNPACK #-} !FastInt, c :: () }
instance Eq U where
x == y = b x == b y
</pre><p>
again compiling with
</p>
<pre class="wiki">ghc -c M2.hs -O -ddump-simpl
</pre><p>
we see
</p>
<pre class="wiki">M2.== :: M2.U -> M2.U -> GHC.Base.Bool
[GlobalId]
[Arity 2
NoCafRefs
Str: DmdType SS]
M2.== =
\ (x_a5M :: M2.U) (y_a5O :: M2.U) ->
case x_a5M of tpl_Xj {
M2.Mk1 ipv_Xn rb_B6 ipv1_B5 ->
case y_a5O of tpl1_Xl {
M2.Mk1 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw;
M2.Mk2 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw
};
M2.Mk2 ipv_Xn rb_B6 ipv1_B5 ->
case y_a5O of tpl1_Xl {
M2.Mk1 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw;
M2.Mk2 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw
}
}
</pre><p>
where the extraction of the second FastInt happens inside the branches of the extraction of the first FastInt, giving a quadratic (in the number of constructors) amount of code. We would expect to get code like that in the first example.
</p>
en-usGHChttp://ghc.haskell.org/trac/ghc/chrome/site/ghc_logo.png
http://ghc.haskell.org/trac/ghc/ticket/2092
Trac 1.2simonmarWed, 18 Jun 2008 13:53:15 GMTstatus changed; resolution set
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:1
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:1
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
</ul>
<p>
I looked into this, it's not as straightforward as it seems. In the second example given above, the amount of code generated is not in fact quadratic: try adding another constructor. GHC is making sensible choices about when to inline and duplicate code; in this case in fact the code it generates is about as good as you can get.
</p>
<p>
However, in the first example, we got this:
</p>
<pre class="wiki">M1.== =
\ (x_a5J :: M1.U) (y_a5L :: M1.U) ->
case case y_a5L of tpl_B2 {
M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
}
of wild_B1 { __DEFAULT ->
case case x_a5J of tpl_B2 {
M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
}
of wild1_Xk { __DEFAULT ->
GHC.Prim.==# wild1_Xk wild_B1
}
}
</pre><p>
which is in fact quite bad. There's an unnecessary case-of-case, where the outer case is doing nothing at all except serving as a join point. This leads to an extra call/return in the generated code compared to the second example where the case-of-case has been expanded out. Even if we were to express the join point as a let(-no-escape), that would be better than the case-of-case.
</p>
<p>
The problem was that GHC has some heuristics to avoid expanding case-of-case in some cases (see <code>Note single-alternative</code> in <code>compiler/simplCore/Simplify.lhs</code>), and it was also catching this example. I've now fixed it:
</p>
<pre class="wiki">Tue Jun 17 05:35:10 PDT 2008 Simon Marlow <marlowsd@gmail.com>
* Fix an example where we weren't doing case-of-case when we should
</pre>
TicketsimonmarTue, 30 Sep 2008 15:40:31 GMTarchitecture changed
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:2
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:2
<ul>
<li><strong>architecture</strong>
changed from <em>Unknown</em> to <em>Unknown/Multiple</em>
</li>
</ul>
TicketsimonmarTue, 30 Sep 2008 15:51:40 GMTos changed
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:3
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:3
<ul>
<li><strong>os</strong>
changed from <em>Unknown</em> to <em>Unknown/Multiple</em>
</li>
</ul>
TicketsimonpjFri, 28 Nov 2008 15:05:06 GMT
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:4
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:4
<p>
Although this bug is closed, I'm not convinced we're done. Some time ago I stopped the simplifier creating quite so many join points. The patch that closed this bug narrowed the cases for which my heuristic applied. And with good reason. But I think that if we had bad cases before adding the "don't create a joint point" heuristic, we'll get bad cases with Simon's change too.
</p>
<p>
Looking at the comments, though, I can't see a compelling case for <em>not</em> always doing the case-of-case and creating a join point. I think it arose in a program that Roman cared about, so I asked him. He replies:
</p>
<pre class="wiki">This is what I can find in my archives:
> I think this is what happens. We start out with (essentially)
>
> primes n = f (g h (abs n)))
>
> where h is a function (which is passed as an argument to g) and f, g and
> h are all inlined at some point.
>
> We end up with
>
> primes n = let j = \h m -> <inlined f> (<inlined g> h m)
> in
> case n >= 0 of
> True -> j <inlined h> n
> False -> j <inlined h> (-n)
>
> This is bad because we really want the code for f, g and h to be
> combined to produce an efficient loop. So what we want is probably
>
> primes n = let j = \m -> <inlined f> (<inlined g> <inlined h> m)
> in
> case n >= 0 of
> True -> j n
> False -> j (-n)
and:
> PROBLEM.
> Suppose we have
>
> f (abs x `div` y)
>
> and a RULE
>
> f (a `div` b) = ...
>
> Then you'd expect the RULE to fire.
> But if we inline abs, then, because div is strict, we'll get>
> f (case x>0 of
> T -> x `div` y
> F -> negate x `div y)
>
> (or, worse, a join point might created if the code was bigger).
> Anyway, the f/div RULE can't fire any more.
>
> Roman also thinks there may be a problem even in the absence
> of RULES, but we're not sure it's real. See message attached.
> ***Roman will try to find an example***
>
> POSSIBLE SOLUTION.
>
> a) Never push strict functions inside case
>
> f (case .. of alts)
> stays just like that.
> b) Only push strict functions inside a case with one alternative
> f (case … of p -> e) è case … of p -> f e
>
> (b) seems less draconian, but if f is lazy and g is strict then
> f (g (case…)) è f (case .. of p-> g..)
> and now an f/g rule won’t fire.
</pre>
TicketsimonmarMon, 16 Nov 2009 13:28:35 GMTfailure set
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:5
http://ghc.haskell.org/trac/ghc/ticket/2092#comment:5
<ul>
<li><strong>failure</strong>
set to <em>Runtime performance bug</em>
</li>
</ul>
Ticket