Opened 11 months ago

Last modified 10 months ago

#13193 new bug

Integer (gmp) performance regression?

Reported by: j.waldmann Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: newcomer Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

For a simple program that uses Integer, but actual data would fit in an Int, ghc-8.0.2 seems to have more runtime overhead than ghc-6.12.3 had: http://www.imn.htwk-leipzig.de/~waldmann/etc/mob/

I am not really sure what to make of it - an answer could be: "if you know that your Integer fits in an Int, then just use Int".

Change History (12)

comment:1 Changed 11 months ago by simonpj

That's odd

  • Can you explain how to reproduce the performance change you see?
  • Can you say how big the perf change is?
  • Can you say why you think it's tied up with "does your Integer fit into an Int"?

Thanks

comment:2 Changed 11 months ago by j.waldmann

Hi.

  • how to reproduce: add the type declaration that prevents the defaulting
    a :: Int -> ST s Int -> ST s Int -> ST s Int -> ST s Int -> ST s Int -> ST s Int`
    
  • how big the change is
    ghc-6.12.3:  Int:  1.28 sec  Integer: 1.56 sec
    ghc-8.0.2 :  Int:  1.30 sec  Integer: 2.30 sec
    
  • "Integer fit in Int" - when I replace Integer by Int, the program produces the exact same result, so I assume there were no overflows. When I replace the numerals 1, -1 by something really large (+/- 1000000000000000000), the execution path should be identical (affine transformation). The Int result does no longer agree with the Integer result, and the Integer runtime goes up slightly.

comment:3 Changed 11 months ago by simonpj

So to paraphase: the difference between Int and Integer perf for 6.10 is smaller than 8.0.

Thanks. Does anyone feel able to take a few minutes to dig into why this is the case?

comment:4 Changed 11 months ago by mpickering

Keywords: newcomer added

comment:5 Changed 11 months ago by rwbarton

The allocations go up between 7.8 and 7.10, so I'm sure it has to do with the new integer-gmp library in 7.10. Will take a look.

comment:6 Changed 11 months ago by rwbarton

Yes, the additional allocations do have to do with the new integer-gmp, though they aren't entirely the integer-gmp library's fault. The additional allocations occur in subtraction (in pred).

-- | Subtract one 'Integer' from another.
minusInteger :: Integer -> Integer -> Integer
minusInteger x       (S# 0#)            = x
minusInteger (S# 0#) (S# INT_MINBOUND#) = Jp# (wordToBigNat ABS_INT_MINBOUND##)
minusInteger (S# 0#) (S# y#)            = S# (negateInt# y#)
minusInteger (S# x#) (S# y#)
  = case subIntC# x# y# of
    (# z#, 0# #) -> S# z#
    (# 0#, _  #) -> Jn# (wordToBigNat2 1## 0##)
    (# z#, _  #)
      | isTrue# (z# ># 0#) -> Jn# (wordToBigNat ( (int2Word# (negateInt# z#))))
      | True               -> Jp# (wordToBigNat ( (int2Word# z#)))
-- more cases, that aren't (S# _) (S# _)

Based on looking at the assembly, this seems to have been compiled in 7.10.1 to something like

minusInteger x y = case y of {
  S# y# -> case y# of {
    0# -> ...
    INT_MINBOUND# -> ...
    __DEFAULT__ -> let minusY = S# (negateInt# y#)
                   in case x of {
                      S# x# -> ...
                      Jp# {} -> ...
                      Jn# {} -> ...
                   }
    }
  Jp# {} -> ...
  Jn# {} -> ...
  }

Note that we allocate S# (negateInt# y#) even though we're unlikely to actually need it, and will never need it more than once.

I see two issues here:

  • There are many special cases in minusInteger, that are almost never helpful, and harmful in the common case of small integers (i.e., S#). The old code which began
    minusInteger (S# i)      (S# j)    = case subIntC# i j of ...
    
    was much better. But, by itself this shouldn't cause additional allocations, just additional work in the form of conditionals and branches.
  • GHC floated out S# (negateInt# y#), which was a bad choice: increased allocations, and never any gain, as far as I can see.

Aha, now that I look at the rest of minusInteger, (negateInt# y#) does appear again in some branches like

minusInteger (Jp# x) (S# y#)
  | isTrue# (y# >=# 0#) = bigNatToInteger (minusBigNatWord x (int2Word# y#))
  | True                = Jp# (plusBigNatWord x (int2Word# (negateInt# y#)))

But it's never used more than once per branch, and the S# box that we allocated is still not used.

comment:7 Changed 11 months ago by rwbarton

Aha! the source code I was looking at changed since the version of the object code I was examining. So disregard the second point above.

Phab:D2278 is new in HEAD; the old implementation was basically x - y = x + (-y), which has to allocate -y. I think the code could still be made somewhat shorter/better, but that would be a relatively minor improvement.

j.waldmann, could you test with a perf build of HEAD?

comment:8 Changed 11 months ago by rwbarton

Since I spent all that time looking at the wrong version of minusInteger anyways, I went ahead and simlified it slightly: Phab:D3034.

comment:9 Changed 11 months ago by simonpj

Thanks for working on this. If you think that HEAD is compiling code that is sub-optimal, can you boil it out to an example? But if it is just that the original Haskell forces GHC to generate bad code, then yes let's just improve the implementation of minusInteger.

I couldn't quite figure out which is the case from the notes above.

I did try to compile the code from comment:6. For the record, here's what I compiled; and it seemed to give good code (no allocation)

{-# LANGUAGE MagicHash, UnboxedTuples #-}

module Foo where
import Prelude ()
import GHC.Classes
import GHC.Magic
import GHC.Prim
import GHC.Types
#if WORD_SIZE_IN_BITS < 64
import GHC.IntWord64
#endif

#define INT_MINBOUND -300
#define ABS_INT_MINBOUND 300

wordToBigNat :: Word# -> BigNat
wordToBigNat x = 7

wordToBigNat2 :: Word# -> Word# -> BigNat
wordToBigNat2 _ lw# = 3

data Integer  = S#                !Int#
                -- ^ iff value in @[minBound::'Int', maxBound::'Int']@ range
              | Jp# {-# UNPACK #-} !BigNat
                -- ^ iff value in @]maxBound::'Int', +inf[@ range
              | Jn# {-# UNPACK #-} !BigNat
                -- ^ iff value in @]-inf, minBound::'Int'[@ range
type BigNat = Int

-- | Subtract one 'Integer' from another.
minusInteger :: Integer -> Integer -> Integer
minusInteger x       (S# 0#)            = x
minusInteger (S# 0#) (S# INT_MINBOUND#) = Jp# (wordToBigNat ABS_INT_MINBOUND##)
minusInteger (S# 0#) (S# y#)            = S# (negateInt# y#)
minusInteger (S# x#) (S# y#)
  = case subIntC# x# y# of
    (# z#, 0# #) -> S# z#
    (# 0#, _  #) -> Jn# (wordToBigNat2 1## 0##)
    (# z#, _  #)
      | isTrue# (z# ># 0#) -> Jn# (wordToBigNat ( (int2Word# (negateInt# z#))))
      | True               -> Jp# (wordToBigNat ( (int2Word# z#)))
-- more cases, that aren't (S# _) (S# _)

comment:10 in reply to:  9 Changed 11 months ago by rwbarton

Replying to simonpj:

Thanks for working on this. If you think that HEAD is compiling code that is sub-optimal, can you boil it out to an example? But if it is just that the original Haskell forces GHC to generate bad code, then yes let's just improve the implementation of minusInteger.

I couldn't quite figure out which is the case from the notes above.

Please disregard the second bullet point of comment:6. I was comparing compiled code from 7.10 with the source code from HEAD, but there was a change that I didn't know about in the source between those versions.

My hope is that this change (Phab:D2278 from this past May) will bring HEAD's performance back in line with 7.8's, but I don't have a suitable setup to test that at the moment. If it does so then we can close this ticket (optionally: after landing Phab:D3034, improving the implementation of minusInteger slightly).

comment:11 Changed 11 months ago by j.waldmann

test with perf build of HEAD

ghc-8.1.20170128  :  2.14 sec

so this is roughly the performance of 7.10, better than 8.0, but still not 7.8 or earlier. I also updated the data at http://www.imn.htwk-leipzig.de/~waldmann/etc/mob/

comment:12 Changed 10 months ago by rwbarton

Interesting. Looks like your test was before Phab:D3034 landed, so maybe that will help some.

There was another difference unrelated to integer-gmp when I compared the Core for 7.8 and 7.10, with something getting marked as one-shot in 7.10. I didn't investigate that closely (and I also found it very hard to understand what this benchmark program does). I may take another look later, but unlikely to be in the immediate future.

Note: See TracTickets for help on using tickets.