#8766 closed bug (fixed)

length [Integer] is twice as slow but length [Int] is 10 times faster

Reported by: George Owned by:
Priority: normal Milestone: 7.8.1
Component: Compiler Version: 7.8.1-rc1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: T8755
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

Compared to 7.6.3 length in 7.8.1-rc1 has a performance regression:

 ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.20140130
bash-3.2$ ghc -O2 LengthIntegerList.hs
[1 of 1] Compiling Main             ( LengthIntegerList.hs, LengthIntegerList.o )
Linking LengthIntegerList ...
bash-3.2$ time ./LengthIntegerList
1073741824

real	0m45.344s
user	0m44.230s
sys	0m0.494s
bash-3.2$ /usr/bin/ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
bash-3.2$ /usr/bin/ghc -O2 LengthIntegerList.hs
[1 of 1] Compiling Main             ( LengthIntegerList.hs, LengthIntegerList.o )
Linking LengthIntegerList ...
bash-3.2$ time ./LengthIntegerList
1073741824

real	0m22.769s
user	0m22.042s
sys	0m0.385s
bash-3.2$ cat LengthIntegerList.hs
{-# OPTIONS_GHC -Wall #-}

module Main where

main :: IO()
main = print $ length [1..(2^(30::Int)::Integer)]


thus length of [Integer] is twice as slow in rc1 but length of [Int] is 10 times faster:

 ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.20140130
bash-3.2$ ghc -O2 LengthIntList.hs
[1 of 1] Compiling Main             ( LengthIntList.hs, LengthIntList.o )
Linking LengthIntList ...
bash-3.2$ time ./LengthIntList
1073741824

real	0m0.723s
user	0m0.693s
sys	0m0.003s
bash-3.2$ /usr/bin/ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
bash-3.2$ /usr/bin/ghc -O2 LengthIntList.hs
[1 of 1] Compiling Main             ( LengthIntList.hs, LengthIntList.o )
Linking LengthIntList ...
bash-3.2$ time ./LengthIntList
1073741824

real	0m11.805s
user	0m10.900s
sys	0m0.351s
bash-3.2$ cat LengthIntList.hs
{-# OPTIONS_GHC -Wall #-}


module Main where

main :: IO()
main = print $ length [1..(2^(30::Int)::Int)]


Change History (15)

comment:1 Changed 16 months ago by hvr

...here's what simplified Core of

intlen :: Int
intlen = length [1..(2^(30::Int))::Int]

looks like for GHC 7.6.3:

intlen =
  case $wf 2 30 of ww { __DEFAULT ->
  case $wlen (eftInt 1 ww) 0 of ww1 { __DEFAULT -> I# ww1 }
  }

vs. GHC 7.8.20140130:

intlen =
  case $wf 2 30 of ww4 { __DEFAULT ->
  case tagToEnum# (># 1 ww4) of _ {
    False ->
      letrec {
        $wgo
        $wgo =
          \ w w1 ->
            case tagToEnum# (==# w ww4) of _ {
              False -> $wgo (+# w 1) (+# w1 1);
              True -> +# w1 1
            }; } in
      case $wgo 1 0 of ww { __DEFAULT -> I# ww };
    True -> I# 0
  }
  }

So the significant runtime reduction for the Int-case is seemingly due to proper inlining of eftInt and thus turning a length [a..b] into a tighter counting-loop w/o any [Int] involved anymore.

comment:2 follow-up: Changed 16 months ago by nomeata

The improvement for Int is most likely from 82f56e5 (#876). But I guess the important point of the ticket is the regression for Integer.

Maybe it is because of #8638? There might now be many checks whether the value fits in a S# that were not done before.

comment:3 Changed 16 months ago by nomeata

Nevermind, [82f56e5a/base] did not change addition, so it should not be the problem here.

Last edited 16 months ago by hvr (previous) (diff)

comment:4 in reply to: ↑ 2 Changed 16 months ago by hvr

Replying to nomeata:

The improvement for Int is most likely from 82f56e5 (#876). But I guess the important point of the ticket is the regression for Integer.

I was getting to that (needed to verify something first though):

The Core for

intlen = length [1..(2^(30::Int))::Integer]

Otoh, differs as following:

GHC 7.6.3:

intlen1 = enumDeltaToInteger intlen3 intlen3 intlen2

vs.

GHC 7.8.20140130:

intlen1 = enumDeltaToIntegerFB (incLen) I# intlen3 intlen3 intlen2

The use of enumDeltaToIntegerFB instead of enumDeltaToInteger accounts for the speed difference. I've verified that by copying the definition from GHC.Enum into the test-code and using enumDeltaToInteger directly with GHC 7.8.20140130 which resulted in a slightly better runtime than for GHC 7.6.3

Last edited 16 months ago by hvr (previous) (diff)

comment:5 Changed 15 months ago by George

Right, the important point of the ticket is the regression for [Integer] but implicit in mentioning the order of magnitude improvement for [Int] is the question of why [Integer] is, even after this bug is fixed, an order of magnitude slower than [Int] and would it be possible to make the time for [Integer] much closer to [Int]. After all length only traverses the spine of the list so why the big difference? This is probably obvious to many but not to me and I'd like to know why.

comment:6 Changed 15 months ago by nomeata

hvr, your analysis is a bit misleading:

In GHC 7.6.3 we have

intlen1 :: [Integer]
intlen1 = enumDeltaToInteger intlen4 intlen4 intlen2
intlen :: Int
intlen = case $wlen @ Integer intlen1 0 of ww { __DEFAULT -> I# ww }

and in 7.9 we have

intlen1 :: Int# -> Int
intlen1 = enumDeltaToIntegerFB @ (Int# -> Int) (incLen @ Integer) I# intlen4 intlen4 intlen2
intlen :: Int
intlen = intlen1 0

Note how the intlen1 do not directly correspond to each other.

So we have a case of successful list fusion that does not speed up the program.

The two enumDeltaToInteger... functions (at the end of source:base/GHC/Enum.lhs) are almost the same; both call auxiliary functions, the only difference is whether : is used, or an explicitly passed c.

I believe the problem is that we are using enumDeltaToIntegerFB in a higher order way (note the Int# -> Int), which allocates partial function applications – basically the same that happens when foldl is implemented with foldr (#7994).

The fix for that would to make sure that stuff is inlined far enough for the go from up_fb can be visible (as it is for Int in comment:1). Comparing the code for Int and Integer, I think the key is that for Int we have special code for enumFromTo, while for Integer we go via enumFromThenTo, which is slower.

Last edited 15 months ago by nomeata (previous) (diff)

comment:7 Changed 15 months ago by nomeata

Ok, with this change (i.e. a special eftIntegerFB) I get good results: Run time decreases from 16.3s in 7.6.3 to 11.7s in GHC HEAD. But if I use-fno-call-arity (Call arity analysis is not in 7.8), I still get 23s, which is still a regression over 16s, but better than the 30 I got with the unmodified library code.

comment:8 Changed 15 months ago by nomeata

The same is achieved with a much simpler change, namely the rule

"enumDeltaToInteger1"   [0] forall c n x . enumDeltaToIntegerFB c n x 1 = up_fb c n x 1

now I get 11.7s with -fcall-arity and 19.7s without, which is quite close the original 11.7s, and non-invasive enough to go into 7.8, I’d say. I’ll prepare a patch.

comment:9 Changed 15 months ago by nomeata

  • Milestone set to 7.8.1
  • Owner set to nomeata

comment:10 Changed 15 months ago by Joachim Breitner <mail@…>

In a60eeccf06c28bee3b87c561450320d17c7399e3/base:

Improve list fusion for [n::Integer..m]

enumFromTo for Integers goes via enumDeltaToInteger, which is less
efficient, as the "delta > = 0" check prevents more inlining which is
required for good fusion code. This rule avoids tihs check for the
common case of "delta = 1", makes up_fb visible and hence inlineable,
which greatly improves "length [n:Integer..m]"; even more so with
CallArity enabled. (#8766)

comment:11 Changed 15 months ago by Joachim Breitner <mail@…>

In f28ee96d7c246f78d5e8cb9c853f417e54ff3d91/base:

Wrong bug number

the previous commit added a testcase for #8766, not #8374 -- too many
tabs open.

comment:12 Changed 15 months ago by nomeata

  • Status changed from new to merge
  • Test Case set to T8755

Pushed, please merge to 7.8.

If you also merge the test case (the first performance test in libraries/base/tests – I hope there was no good reason why there weren’t any before) you probably have to adjust the numbers, unless we also merge the call-arity analysis (which I don’t think has seen enough testing to go into a stable release).

comment:13 Changed 15 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from merge to closed

Merged.

comment:14 Changed 15 months ago by nomeata

  • Owner nomeata deleted
  • Resolution fixed deleted
  • Status changed from closed to new

Hmm, this is not fully fixed, it seems. Consider

f :: Integer -> Integer
f n = n
{-# NOINLINE f #-}

main :: IO ()
main = sum (map f [0 .. 10000]) `seq` return ()

Here, the rule

{-# RULES
"enumDeltaToInteger1"   [0] forall c n x . enumDeltaToIntegerFB c n x 1 = up_fb c n x 1
 #-}

does not fire and we end up with

main3 :: Integer -> Integer
main3 =
  enumDeltaToIntegerFB
    @ (Integer -> Integer) main6 (id @ Integer) main2 main5 main4

main5 :: Integer
main5 = __integer 1

and that despite that we have this code in Phase = 0:

    enumDeltaToIntegerFB
           @ (Integer -> Integer)
           (\ (x :: Integer) (ys [OS=OneShot] :: Integer -> Integer) ->
              let {
                ds [OS=ProbOneShot] :: Integer
                ds = f x } in
              \ (ds2 :: Integer) -> ys (plusInteger ds2 ds))
           (id @ Integer)
           (__integer 0)
           (__integer 1)
           (__integer 10000)
           (__integer 0)

The rule engine is still a bit of a mystery to me. Why does the rule not match reliably here?

comment:15 Changed 15 months ago by nomeata

  • Resolution set to fixed
  • Status changed from new to closed

Darn, ignore me. Used the binary from the wrong working copy. That’s what happens when I start work in ghc-master/ that I don’t get to push right away, and ghc/ becomes my working copy that tracks master...

Note: See TracTickets for help on using tickets.