Opened 10 years ago

Closed 9 years ago

Last modified 8 years ago

#2092 closed bug (fixed)

Quadratic amount of code generated

Reported by: igloo Owned by:
Priority: normal Milestone: 6.10 branch
Component: Compiler Version: 6.9
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Originally discovered by Twan van Laarhoven, here: http://www.haskell.org/pipermail/cvs-ghc/2008-February/040981.html

On the HEAD, compiling this module:

{-# LANGUAGE MagicHash #-}

module M1 where

import GHC.Exts

type FastInt = Int#

data U = Mk1 { a :: (), b :: FastInt, c :: () }
       | Mk2 { a :: (), b :: FastInt, c :: () }

instance Eq U where
    x == y = b x ==# b y

with

ghc -c M1.hs -O -ddump-simpl

we see

M1.== :: M1.U -> M1.U -> GHC.Base.Bool
[GlobalId]
[Arity 2
 NoCafRefs
 Str: DmdType SS]
M1.== =
  \ (x_a5J :: M1.U) (y_a5L :: M1.U) ->
    case case y_a5L of tpl_B2 {
           M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
           M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
         }
    of wild_B1 { __DEFAULT ->
    case case x_a5J of tpl_B2 {
           M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
           M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
         }
    of wild1_Xk { __DEFAULT ->
    GHC.Prim.==# wild1_Xk wild_B1
    }
    }

which looks good: Extract the FastInt from one value, then the other, then compare.

However, if we have this module instead:

module M2 where

import GHC.Exts

newtype FastInt = FastInt Int
    deriving Eq

data U = Mk1 { a :: (), b :: {-# UNPACK #-} !FastInt, c :: () }
       | Mk2 { a :: (), b :: {-# UNPACK #-} !FastInt, c :: () }

instance Eq U where
    x == y = b x == b y

again compiling with

ghc -c M2.hs -O -ddump-simpl

we see

M2.== :: M2.U -> M2.U -> GHC.Base.Bool
[GlobalId]
[Arity 2
 NoCafRefs
 Str: DmdType SS]
M2.== =
  \ (x_a5M :: M2.U) (y_a5O :: M2.U) ->
    case x_a5M of tpl_Xj {
      M2.Mk1 ipv_Xn rb_B6 ipv1_B5 ->
        case y_a5O of tpl1_Xl {
          M2.Mk1 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw;
          M2.Mk2 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw
        };
      M2.Mk2 ipv_Xn rb_B6 ipv1_B5 ->
        case y_a5O of tpl1_Xl {
          M2.Mk1 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw;
          M2.Mk2 ipv2_Xp rb1_Xw ipv3_XX -> GHC.Prim.==# rb_B6 rb1_Xw
        }
    }

where the extraction of the second FastInt happens inside the branches of the extraction of the first FastInt, giving a quadratic (in the number of constructors) amount of code. We would expect to get code like that in the first example.

Change History (5)

comment:1 Changed 9 years ago by simonmar

Resolution: fixed
Status: newclosed

I looked into this, it's not as straightforward as it seems. In the second example given above, the amount of code generated is not in fact quadratic: try adding another constructor. GHC is making sensible choices about when to inline and duplicate code; in this case in fact the code it generates is about as good as you can get.

However, in the first example, we got this:

M1.== =
  \ (x_a5J :: M1.U) (y_a5L :: M1.U) ->
    case case y_a5L of tpl_B2 {
           M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
           M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
         }
    of wild_B1 { __DEFAULT ->
    case case x_a5J of tpl_B2 {
           M1.Mk1 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4;
           M1.Mk2 ipv_B3 ipv1_B4 ipv2_B5 -> ipv1_B4
         }
    of wild1_Xk { __DEFAULT ->
    GHC.Prim.==# wild1_Xk wild_B1
    }
    }

which is in fact quite bad. There's an unnecessary case-of-case, where the outer case is doing nothing at all except serving as a join point. This leads to an extra call/return in the generated code compared to the second example where the case-of-case has been expanded out. Even if we were to express the join point as a let(-no-escape), that would be better than the case-of-case.

The problem was that GHC has some heuristics to avoid expanding case-of-case in some cases (see Note single-alternative in compiler/simplCore/Simplify.lhs), and it was also catching this example. I've now fixed it:

Tue Jun 17 05:35:10 PDT 2008  Simon Marlow <marlowsd@gmail.com>
  * Fix an example where we weren't doing case-of-case when we should

comment:2 Changed 9 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:3 Changed 9 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:4 Changed 9 years ago by simonpj

Although this bug is closed, I'm not convinced we're done. Some time ago I stopped the simplifier creating quite so many join points. The patch that closed this bug narrowed the cases for which my heuristic applied. And with good reason. But I think that if we had bad cases before adding the "don't create a joint point" heuristic, we'll get bad cases with Simon's change too.

Looking at the comments, though, I can't see a compelling case for not always doing the case-of-case and creating a join point. I think it arose in a program that Roman cared about, so I asked him. He replies:

This is what I can find in my archives:

> I think this is what happens. We start out with (essentially)
>
> primes n = f (g h (abs n)))
>
> where h is a function (which is passed as an argument to g) and f, g and
> h are all inlined at some point.
>
> We end up with
>
> primes n = let j = \h m -> <inlined f> (<inlined g> h m)
>            in
>            case n >= 0 of
>              True  -> j <inlined h> n
>              False -> j <inlined h> (-n)
>
> This is bad because we really want the code for f, g and h to be
> combined to produce an efficient loop. So what we want is probably
>
> primes n = let j = \m -> <inlined f> (<inlined g> <inlined h> m)
>            in
>            case n >= 0 of
>              True  -> j n
>              False -> j (-n)

and:

> PROBLEM.
> Suppose we have
>
>           f (abs x `div` y)
>
> and a RULE
>
>           f (a `div` b) = ...
>
> Then you'd expect the RULE to fire.  
> But if we inline abs, then, because div is strict, we'll get>
>           f (case x>0 of
>                    T -> x `div` y
>                    F -> negate x `div y)
>
> (or, worse, a join point might created if the code was bigger).  
> Anyway, the f/div RULE can't fire any more.
>
> Roman also thinks there may be a problem even in the absence 
> of RULES, but we're not sure it's real.  See message attached.
>  ***Roman will try to find an example***
>
> POSSIBLE SOLUTION.
>
> a)    Never push strict functions inside case
>
> f (case .. of alts)
>           stays just like that.

> b)    Only push strict functions inside a case with one alternative
> f (case … of p -> e)  è  case … of p -> f e
>
> (b) seems less draconian, but if f is lazy and g is strict then
>           f (g (case…)) è  f (case .. of p-> g..)
> and now an f/g rule won’t fire.

comment:5 Changed 8 years ago by simonmar

Type of failure: Runtime performance bug
Note: See TracTickets for help on using tickets.