## #2092 closed bug (fixed)

# Quadratic amount of code generated

Reported by: | igloo | Owned by: | |
---|---|---|---|

Priority: | normal | Milestone: | 6.10 branch |

Component: | Compiler | Version: | 6.9 |

Keywords: | Cc: | ||

Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |

Type of failure: | Runtime performance bug | Test Case: | |

Blocked By: | Blocking: | ||

Related Tickets: | Differential Rev(s): | ||

Wiki Page: |

### Description

Originally discovered by Twan van Laarhoven, here: http://www.haskell.org/pipermail/cvs-ghc/2008-February/040981.html

On the HEAD, compiling this module:

{-# LANGUAGE MagicHash #-} module M1 where import GHC.Exts type FastInt = Int# data U = Mk1 { a :: (), b :: FastInt, c :: () } | Mk2 { a :: (), b :: FastInt, c :: () } instance Eq U where x == y = b x ==# b y

with

ghc -c M1.hs -O -ddump-simpl

we see

M1.== :: M1.U -> M1.U -> GHC.Base.Bool [GlobalId] [Arity 2 NoCafRefs Str: DmdType SS] M1.== = \ (x_a5J :: M1.U) (y_a5L :: M1.U) -> case case y_a5L of tpl_B2 { M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4; M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4 } of wild_B1 { __DEFAULT -> case case x_a5J of tpl_B2 { M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4; M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4 } of wild1_Xk { __DEFAULT -> GHC.Prim.==# wild1_Xk wild_B1 } }

which looks good: Extract the FastInt from one value, then the other, then compare.

However, if we have this module instead:

module M2 where import GHC.Exts newtype FastInt = FastInt Int deriving Eq data U = Mk1 { a :: (), b :: {-# UNPACK #-} !FastInt, c :: () } | Mk2 { a :: (), b :: {-# UNPACK #-} !FastInt, c :: () } instance Eq U where x == y = b x == b y

again compiling with

ghc -c M2.hs -O -ddump-simpl

we see

M2.== :: M2.U -> M2.U -> GHC.Base.Bool [GlobalId] [Arity 2 NoCafRefs Str: DmdType SS] M2.== = \ (x_a5M :: M2.U) (y_a5O :: M2.U) -> case x_a5M of tpl_Xj { M2.Mk1 ipv_Xn rb_B6 ipv1_B5 -> case y_a5O of tpl1_Xl { M2.Mk1 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw; M2.Mk2 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw }; M2.Mk2 ipv_Xn rb_B6 ipv1_B5 -> case y_a5O of tpl1_Xl { M2.Mk1 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw; M2.Mk2 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw } }

where the extraction of the second FastInt happens inside the branches of the extraction of the first FastInt, giving a quadratic (in the number of constructors) amount of code. We would expect to get code like that in the first example.

### Change History (5)

### comment:1 Changed 9 years ago by

Resolution: | → fixed |
---|---|

Status: | new → closed |

### comment:2 Changed 9 years ago by

Architecture: | Unknown → Unknown/Multiple |
---|

### comment:3 Changed 9 years ago by

Operating System: | Unknown → Unknown/Multiple |
---|

### comment:4 Changed 9 years ago by

Although this bug is closed, I'm not convinced we're done. Some time ago I stopped the simplifier creating quite so many join points. The patch that closed this bug narrowed the cases for which my heuristic applied. And with good reason. But I think that if we had bad cases before adding the "don't create a joint point" heuristic, we'll get bad cases with Simon's change too.

Looking at the comments, though, I can't see a compelling case for *not* always doing the case-of-case and creating a join point. I think it arose in a program that Roman cared about, so I asked him. He replies:

This is what I can find in my archives: > I think this is what happens. We start out with (essentially) > > primes n = f (g h (abs n))) > > where h is a function (which is passed as an argument to g) and f, g and > h are all inlined at some point. > > We end up with > > primes n = let j = \h m -> <inlined f> (<inlined g> h m) > in > case n >= 0 of > True -> j <inlined h> n > False -> j <inlined h> (-n) > > This is bad because we really want the code for f, g and h to be > combined to produce an efficient loop. So what we want is probably > > primes n = let j = \m -> <inlined f> (<inlined g> <inlined h> m) > in > case n >= 0 of > True -> j n > False -> j (-n) and: > PROBLEM. > Suppose we have > > f (abs x `div` y) > > and a RULE > > f (a `div` b) = ... > > Then you'd expect the RULE to fire. > But if we inline abs, then, because div is strict, we'll get> > f (case x>0 of > T -> x `div` y > F -> negate x `div y) > > (or, worse, a join point might created if the code was bigger). > Anyway, the f/div RULE can't fire any more. > > Roman also thinks there may be a problem even in the absence > of RULES, but we're not sure it's real. See message attached. > ***Roman will try to find an example*** > > POSSIBLE SOLUTION. > > a) Never push strict functions inside case > > f (case .. of alts) > stays just like that. > b) Only push strict functions inside a case with one alternative > f (case … of p -> e) è case … of p -> f e > > (b) seems less draconian, but if f is lazy and g is strict then > f (g (case…)) è f (case .. of p-> g..) > and now an f/g rule won’t fire.

### comment:5 Changed 8 years ago by

Type of failure: | → Runtime performance bug |
---|

**Note:**See TracTickets for help on using tickets.

I looked into this, it's not as straightforward as it seems. In the second example given above, the amount of code generated is not in fact quadratic: try adding another constructor. GHC is making sensible choices about when to inline and duplicate code; in this case in fact the code it generates is about as good as you can get.

However, in the first example, we got this:

which is in fact quite bad. There's an unnecessary case-of-case, where the outer case is doing nothing at all except serving as a join point. This leads to an extra call/return in the generated code compared to the second example where the case-of-case has been expanded out. Even if we were to express the join point as a let(-no-escape), that would be better than the case-of-case.

The problem was that GHC has some heuristics to avoid expanding case-of-case in some cases (see

`Note single-alternative`

in`compiler/simplCore/Simplify.lhs`

), and it was also catching this example. I've now fixed it: