Opened 2 years ago

Closed 2 years ago

Last modified 23 months ago

#10788 closed bug (fixed)

performance regression involving minimum

Reported by: rwbarton Owned by: ekmett
Priority: normal Milestone: 8.0.1
Component: Core Libraries Version: 7.10.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

This program (taken from http://stackoverflow.com/questions/32158319/difference-in-performance-for-coin-change-between-haskell-and-c) runs about 50% slower when compiled with ghc-7.10.1 -O compared to ghc-7.8.4 -O.

import Data.Vector.Unboxed as U ((!), constructN, length)

coinchangev :: Int -> [Int] -> Int
coinchangev n cs = v ! n
 where v = constructN (n+1) f
       f w = case U.length w of
             0 -> 0
             m -> 1 + minimum [w ! x | x <- map (m-) cs, x >= 0]

main = print $ coinchangev 10000000 [1, 5, 10, 25, 100]

However if I change minimum to sum, while the runtime in 7.8.4 is unchanged, the runtime in 7.10.1 drops by a factor of 5! Allocations also decrease by a large factor. So my guess is that something has gone wrong with call arity analysis for minimum (and gone very right for sum).

Change History (17)

comment:1 Changed 2 years ago by nomeata

In 7.8, both minimum and sum were non-fusing left-folds. In 7.10, sum, via foldl, is fusing, so this is the 5× improvement you observe. minimum is not easily foldable: It is a foldl1, which treats the first (:) different from the rest, and it is not clear how to fix that.

So performance difference between minimum and sum in 7.10 can be explained. What needs to be investigated is why 7.10 degraded by 50% over 7.8. I do not expect Call Arity/foldl fusion to play a role here, but I might be wrong.

BTW: One could reasonably expect the compiler to transform minimum (x:xs) into foldl' min x xs, which could then maybe fuse, but that does not seem to be the case.

comment:2 Changed 2 years ago by nomeata

GHC 7.8 will actually inline minimum (and hence foldl1 and foldl), allowing the compiler to specialize it for the type at hand:

Rec {
$wlgo_r6X4 :: GHC.Prim.Int# -> [GHC.Types.Int] -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <L,U><S,1*U>]
$wlgo_r6X4 =
  \ (ww_s6TV :: GHC.Prim.Int#) (w_s6TS :: [GHC.Types.Int]) ->
    case w_s6TS of _ [Occ=Dead] {
      [] -> ww_s6TV;
      : x_a52E xs_a52F ->
        case x_a52E of _ [Occ=Dead] { GHC.Types.I# y1_a52Q ->
        case GHC.Prim.tagToEnum#
               @ GHC.Types.Bool (GHC.Prim.<=# ww_s6TV y1_a52Q)
        of _ [Occ=Dead] {
          GHC.Types.False -> $wlgo_r6X4 y1_a52Q xs_a52F;
          GHC.Types.True -> $wlgo_r6X4 ww_s6TV xs_a52F
        }
        }
    }
end Rec }

GHC-7.10 will not inline minimum, but only replace it by minimumStrict via a rule, and the latter then called polymorphically:

...
            case strictMinimum @ Int GHC.Classes.$fOrdInt (go_a6gJ cs_r9bl)
            of _ [Occ=Dead] { I# y_a6hm ->

GHC-7.8’s inlining seems to be a little excessive, but in 7.10 there is certainly a lack of specialization. Maybe some INLINEABLE pragma would help? Not sure...

comment:3 Changed 2 years ago by rwbarton

It looks like more inlining happened in 7.10.1 at the definition site of strictMinimum, and the result was that the unfolding became too large for GHC to want to inline it at use sites.

7.8.4:

Considering inlining: Data.List.strictMinimum
    arg infos [ValueArg, NonTrivArg]
    uf arity 2
    interesting continuation CaseCtxt
    some_benefit True
    is exp: True
    is work-free: True
    guidance IF_ARGS [30 30] 80 0
    discounted size = -10
    ANSWER = YES

  strictMinimum :: GHC.Classes.Ord a -> [a] -> a
    {- Arity: 2, Strictness: <L,1*U(A,A,A,A,A,A,A,1*C(C1(U)))><S,1*U>,
       Unfolding: (\ @ a $dOrd :: GHC.Classes.Ord a ds :: [a] ->
                   case ds of wild {
                     [] -> Data.List.minimum1 @ a
                     : ipv ipv1
                     -> Data.List.foldl'
                          @ a
                          @ a
                          (GHC.Classes.min @ a $dOrd)
                          ipv
                          ipv1 }) -}

7.10.1:

Considering inlining: strictMinimum
  arg infos [ValueArg, NonTrivArg]
  interesting continuation CaseCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [30 30] 200 0
  discounted size = 110
  ANSWER = NO

  strictMinimum :: Ord a => [a] -> a
  {- Arity: 2, Strictness: <L,1*U(A,A,A,A,A,A,A,1*U)><S,1*U>,
     Unfolding: (\ @ a $dOrd :: Ord a ds :: [a] ->
                 case ds of wild {
                   [] -> minimum1 @ a
                   : ipv ipv1
                   -> let {
                        k :: a -> a -> a = min @ a $dOrd
                      } in
                      letrec {
                        go :: [a] -> a -> a {- Arity: 2, Strictness: <S,1*U><S,1*U> -}
                        = \ ds1 :: [a] eta :: a ->
                          case ds1 of wild1 {
                            [] -> eta : y ys -> case eta of z { DEFAULT -> go ys (k z y) } }
                      } in
                      go ipv1 ipv }) -}

comment:4 Changed 2 years ago by rwbarton

Component: CompilerCore Libraries
Owner: set to ekmett
Summary: performance regression involving minimum (and maybe Vector)performance regression involving minimum

comment:5 Changed 2 years ago by nomeata

So one solution would be to mark strictMinimum as INLINE, so that its unfolding stays small and both strictMinimum and foldl will be inlined at the use site?

comment:6 Changed 2 years ago by nomeata

So one solution would be to mark strictMinimum as INLINE, so that its unfolding stays small and both strictMinimum and foldl will be inlined at the use site?

Tried that. The unfolding will then mention foldr instead of the above code, but is still too large. I was expecting the unfolding to mention foldl1', as I thought the unfolding of something marked INLINE is never changed? Weird.

Maybe someone else can give this a shot?

comment:7 Changed 2 years ago by nomeata

With

{-# SPECIALIZE  minimum :: [Int] -> Int #-}

instead of the rule rewriting it to strictMinimum, and *not* adding an INLINE pragma to minimum, I get good code in GHC.List, and this is being used here without too much inlining (it inlines the wrapper that distinguishes [] from a non-empty list and unboxes the int, but then calls the wrapper in GHC.List.$wgo1.

Removing INLINE is important, as otherwise we’d be having this worker in every use of minimum.

Even without INLINE we have this in the interface

  minimum :: Ord a => [a] -> a
  {- Arity: 2, Strictness: <L,1*U(A,A,A,A,A,A,A,1*U)><S,1*U>,
     Unfolding: (\ @ a11 $dOrd :: Ord a11 ds :: [a11] ->
                 case ds of wild {
                   [] -> minimum3 @ a11
                   : ipv ipv1
                   -> let {
                        k :: a11 -> a11 -> a11 = min @ a11 $dOrd
                      } in
                      letrec {
                        go :: [a11] -> a11 -> a11 {- Arity: 2, Strictness: <S,1*U><L,U> -}
                        = \ ds1 :: [a11] eta :: a11 ->
                          case ds1 of wild1 { [] -> eta : y ys -> go ys (k eta y) }
                      } in
                      go ipv1 ipv }) -}

so more specialization in some client module seems to be possible. Adding INLINEABLE for reliability does not hurt, though.

This seems to be the right thing to be doing here, but maybe not in general – how can I tell?

This is all so brittle...

comment:8 Changed 2 years ago by simonpj

This is all so brittle...

I agree that brittle-ness is bad. Can you stand back and give a description of the brittle-ness?

  • Of course, if a function is inlined, it can be specialised for the call site, and without pragmas that decision is indeed dependent on how big the function is. I see no way to avoid that.
  • But pragmas should not be brittle. Can you show an example in which they seem to be.

comment:9 Changed 2 years ago by nomeata

Can you stand back and give a description of the brittle-ness?

I actually planned to write something to the list about this... and I now have.

In this particular case, I am quite happy with the INLINEABLE/SPECIALIZE solution, and will submit a DR soon.

comment:10 Changed 2 years ago by nomeata

In this particular case, I am quite happy with the INLINEABLE/SPECIALIZE solution, and will submit a DR soon.

Spoke too soon. Looking at the core of List, the maximum for Int is great (worker with a strict unboxed Int), but for Integer, the strictness analyzer is failing me, and I get this loop:

Rec {
-- RHS size: {terms: 12, types: 8, coercions: 0}
maximum_go [Occ=LoopBreaker] :: [Integer] -> Integer -> Integer
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><S,U>]
maximum_go =
  \ (ds_a2d4 :: [Integer]) (eta_B1 :: Integer) ->
    case ds_a2d4 of _ [Occ=Dead] {
      [] -> eta_B1;
      : y_a2d9 ys_a2da ->
        maximum_go
          ys_a2da
          (integer-gmp-1.0.0.0:GHC.Integer.Type.$fOrdInteger_$cmax
             eta_B1 y_a2d9)
    }
end Rec }

So it sees that go is strict in its second argument. Why is it then not strictly evaluated before the recursive call, avoiding this obvious space leak?

Note that max is not inlined (as it is for Int), but the strictness data is there, so that should not make a difference.

Anyways, I guess this discussion is derailing for this particular ticket. I’ll look into a combination of rewriting with RULES to strictMinimum to avoid relying on the strictness analyzer to produce good code.

Update: I looked into this, and it seems that even if -ddump-simpl looks like this, with a space-leaky way of accumulating the argument, that in the CorePrep stage the evaluation of cmax is pulled before the call to go and alls is well. I did not know that there are still such things going on in that stage.

Last edited 2 years ago by nomeata (previous) (diff)

comment:11 Changed 2 years ago by Joachim Breitner <mail@…>

In dc671a1c/ghc:

SPECIALIZE strictMinimum for Int and Integer

This fixes a regression reported in #10788, where due to less inlining
compared to earlier versions, we’d get worse code. With the SPECIALIZE,
we get the good code, and moreover, the good code is in List.hs and
_not_ inlined to the use site, so smaller code size and less compilation
time.

comment:12 Changed 2 years ago by nomeata

Resolution: fixed
Status: newclosed

I applied a small (and probably uncontroversial) change that improves upon this particular issue.

There is no good story for minimum applied to any other data type with a strict min, though.

comment:13 Changed 2 years ago by nomeata

And a slightly larger patch, which avoids the use of strictMinimum, for this is currently tested via Phabricator at Phab:1229.

comment:14 Changed 2 years ago by Joachim Breitner <mail@…>

In c6b82e99/ghc:

Further simplify the story around minimum/maximum

After I have found out that I should look at -ddump-prep and not
-ddump-core, I noticed that these days, GHC is perfectly capeable of
turning (the equivalent) of foldl to (the equivalent) of foldl' if the
operation in question is strict. So instead of using rewrite rules to
rewrite maximum to a strictMaximum for certain types, we simply use
SPECIALIZE.

This also marks maximum/minimum as INLINEABLE, so that client code can
get similar specializations, hopefully even automatically. And inded,
minimum applied to [Double] produces good code (although due to
inlineing, not due to specialization, it seems).

I checked (by looking at the core) that this still fixes #10788.

Differential revision: https://phabricator.haskell.org/D1229

comment:15 Changed 2 years ago by simonpj

Update: I looked into this, and it seems that even if -ddump-simpl looks like this, with a space-leaky way of accumulating the argument, that in the CorePrep stage the evaluation of cmax is pulled before the call to go and alls is well. I did not know that there are still such things going on in that stage.

Suppose we have a function call f (g x) and f is strict. Then GHC keeps it looking like that so that rewrite rules apply easily. In CorePrep GHC just makes the order of evaluation explicit, by moving to

 case (g x) of y -> f y

There is no new analysis; the call always was call-by-value; but after CorePrep that fact is 100% explicit.

I believe that the conclusion here is that GHC is behaving perfectly predictably; but library authors need to take a little care with pragmas and rules if they want to get predictable fusion. That's what your email in comment:9 is about, if I read it right. That is, you are not seeking any change to GHC. Correct?

comment:16 Changed 2 years ago by nomeata

That is, you are not seeking any change to GHC. Correct?

Correct! I was a bit confused along the way, due to mis-reading the Core, but there is purely a library issue at hand.

comment:17 Changed 23 months ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.