Opened 4 months ago

Closed 4 months ago

Last modified 4 months ago

#8638 closed feature request (fixed)

Optimize by demoting "denormalized" Integers (i.e. J# -> S#)

Reported by: hvr Owned by: hvr
Priority: normal Milestone: 7.8.1
Component: libraries (other) Version: 7.7
Keywords: integer-gmp Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets: #8647

Description (last modified by hvr)

In the course of a recent recent reddit discussion it was highlighted, that integer-gmp doesn't try to demote J# result-values to the more efficient S# even though they would fit into a machine word.

The attached proof-of-concept patch introduces a "smart" J# constructor which constructs a S# value instead (if possible):

-- | Demote 'J#' to 'S#' if possible. See also 'smartJ#'.
toSmall :: Integer -> Integer
toSmall i@(S# _)  = i
toSmall (J# 0# _) = S# 0#
toSmall (J# 1# mb#)  | isTrue# (v ># 0#) = S# v
    where
      v = indexIntArray# mb# 0#
toSmall (J# -1# mb#) | isTrue# (v <# 0#) = S# v
    where
      v = negateInt# (indexIntArray# mb# 0#)
toSmall i         = i

-- | Smart 'J#' constructor which tries to construct 'S#' if possible
smartJ# :: Int# -> ByteArray# -> Integer
smartJ# s# mb# = toSmall (J# s# mb#)

The patch replaces a couple of J#-invocations which are likely to produce a S#-fitting Integer. A nofib comparison for vanilla GHC HEAD vs. patched GHC HEAD is attached for further discussion.

Attachments (5)

0001-Try-harder-to-demote-results-from-J-to-S-re-8638.patch (12.3 KB) - added by hvr 4 months ago.
nofib-smartJ-constructor.txt (186.6 KB) - added by hvr 4 months ago.
nofib comparision GHC HEAD vanilla vs. patched integer-gmp
nofib-smartJ-constructor-noDecodeFloat.txt (186.6 KB) - added by hvr 4 months ago.
0001-Try-harder-to-demote-results-from-J-to-S-re-8638.v2.patch (12.9 KB) - added by hvr 4 months ago.
nofib-smartJ-constructor-v2.txt​ (186.6 KB) - added by hvr 4 months ago.

Download all attachments as: .zip

Change History (19)

Changed 4 months ago by hvr

nofib comparision GHC HEAD vanilla vs. patched integer-gmp

comment:1 Changed 4 months ago by hvr

  • Description modified (diff)

comment:2 Changed 4 months ago by hvr

  • Status changed from new to patch

comment:3 follow-up: Changed 4 months ago by simonpj

  • Status changed from patch to infoneeded

Interesting. In general it seems like a good idea. Although making the test has a cost, if you get a win, it's a big win.

I'd first like to know why mandel started allocating nearly 50% more store. (I use -ticky for answering this kind of question.) Maybe investigating that will show how the patch can be improved further.

Simon

comment:4 in reply to: ↑ 3 ; follow-up: Changed 4 months ago by hvr

Replying to simonpj:

I'd first like to know why mandel started allocating nearly 50% more store. (I use -ticky for answering this kind of question.) Maybe investigating that will show how the patch can be improved further.

I'm not sure yet why mandel allocates more as I couldn't see find any significant use of Integers in the implementation. After some profiling I found out that swapping out the implementation of magnitude inside Mandel.diverge with a more naive one had a huge effect on the allocation.

Now the report reads:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
...
       maillist          +0.0%     +0.0%      0.04      0.04     +2.2%
         mandel          -0.2%    -45.4%      0.02      0.02     +0.0%
        mandel2          +0.0%     +0.0%      0.00      0.00     +0.0%
...
--------------------------------------------------------------------------------
            Min          -0.2%    -45.4%    -32.3%    -32.3%    -20.0%
            Max          +0.1%     +7.2%     +1.6%     +1.6%    +23.1%
 Geometric Mean          +0.1%     -1.3%     -1.7%     -1.8%     +0.0%

The modification I made was simply

  • spectral/mandel/Mandel.lhs

    diff --git a/spectral/mandel/Mandel.lhs b/spectral/mandel/Mandel.lhs
    index 3f460ce..cc601f5 100644
    a b is the prelude function that calculates the euclidean norm 
    109109 
    110110diverge::Complex Double -> Double -> Bool 
    111111diverge cmplx radius =  magnitude cmplx > radius 
     112  where 
     113    magnitude (x :+ y) = sqrt (x*x + y*y) 
    112114\end{code} 
    113115 

comment:5 in reply to: ↑ 4 Changed 4 months ago by hvr

Replying to hvr:

I'm not sure yet why mandel allocates more as I couldn't see find any significant use of Integers in the implementation. After some profiling I found out that swapping out the implementation of magnitude inside Mandel.diverge with a more naive one had a huge effect on the allocation.

Ok, found the culprit; magnitude uses scaleFloat which in turn uses decodeFloat/encodeFloat. I removed the smartJ# I put into decodeFloat and the allocation-delta for the mandel-benchmark went back to more or less 0. As the main purpose of decodeFloat seems to be to be used in combination with encodeFloat I guess there's actually little benefit trying to demote the significant to S# anyway...

I'll attach the new nofib report shortly.

comment:6 Changed 4 months ago by hvr

  • Status changed from infoneeded to patch

comment:7 follow-up: Changed 4 months ago by simonpj

Great. Now, what's happening in symalg and kahan (+4.7 and 6.7% resp)?

Working on these small deltas seems tiresome, but what is a small delta in one program can be a massive one in another. It's not that we must have uniform improvement; just that we like to know why something gets worse, and deem it an acceptable compromise.

What is puzzling to me is that I don't think smartJ# does any allocation (apart from its result) so I don't see why allocation should ever increase.

Thanks

Simon

comment:8 Changed 4 months ago by simonpj

  • Status changed from patch to infoneeded

comment:9 in reply to: ↑ 7 Changed 4 months ago by hvr

Replying to simonpj:

Great. Now, what's happening in symalg and kahan (+4.7 and 6.7% resp)?

Thunks are happening... (see below)

What is puzzling to me is that I don't think smartJ# does any allocation (apart from its result) so I don't see why allocation should ever increase.

...because I missed the fact that (# smartJ# ... , smartJ# ... #) creates a thunk (there was even a source code comment about that pitfall in quotRemInteger#)

However, I replaced all smartJ#-in-(#,#) occurences (incl. decodeDouble) by let !x = smartJ# ... in (# x, ... #) constructs, and the results look better now:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
...
         gamteb          +0.1%    -19.0%      0.03      0.03     +0.0%
...

          kahan          +0.2%     -1.2%      0.17      0.17     +0.0%
...
         mandel          +0.1%     -7.7%      0.05      0.05     +0.0%
...
          power          +0.1%    -40.8%    -32.5%    -32.5%     +0.0%
...
         symalg          +0.2%     -0.5%      0.01      0.01     +0.0%
...
--------------------------------------------------------------------------------
            Min          +0.0%    -40.8%    -32.5%    -32.5%     -5.1%
            Max          +0.2%     +0.1%     +2.0%     +2.0%     +0.0%
 Geometric Mean          +0.1%     -1.0%     -2.5%     -2.5%     -0.1%

Changed 4 months ago by hvr

comment:10 Changed 4 months ago by hvr

  • Status changed from infoneeded to patch

comment:11 Changed 4 months ago by simonpj

Terrific, thanks. I'll commit.

Simon

comment:12 Changed 4 months ago by hvr

I've investigated some more how to optimize integer-gmp, see also #8647

The scary part is, that the Integer is involved in seemingly unrelated operations, such as operating on Double or showing Double values, causing several avoidable allocations.

comment:13 follow-up: Changed 4 months ago by simonpj

  • Resolution set to fixed
  • Status changed from patch to closed

Thanks. I committed two patches below. I'll close this one since you have opened a new ticket for #8647.

Simon

commit 301269aef0fb331bf272de4f6592eb71471a3b16
Author: Herbert Valerio Riedel <hvr@gnu.org>
Date:   Mon Dec 30 16:05:20 2013 +0100

    Try harder to demote results from `J#` to `S#` (re #8638)
    
    Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>


>---------------------------------------------------------------

301269aef0fb331bf272de4f6592eb71471a3b16
 GHC/Integer/Type.lhs |   92 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 33 deletions(-)

and

commit 3c93d7f61821345f29b9ee8a99346fa464d708a4
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Fri Jan 3 16:13:19 2014 +0000

    Refactor and comment the smartJ# changes (re Trac #8638)


>---------------------------------------------------------------

3c93d7f61821345f29b9ee8a99346fa464d708a4
 GHC/Integer/Type.lhs |   50 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/GHC/Integer/Type.lhs b/GHC/Integer/Type.lhs index c206462..77d529a 100644
--- a/GHC/Integer/Type.lhs
+++ b/GHC/Integer/Type.lhs
@@ -152,21 +152,52 @@ toBig i@(J# _ _) = i
 
 -- | Demote 'J#' to 'S#' if possible. See also 'smartJ#'.
 toSmall :: Integer -> Integer
-toSmall i@(S# _)  = i
-toSmall (J# 0# _) = S# 0#
-toSmall (J# 1# mb#)  | isTrue# (v ># 0#) = S# v
+toSmall i@(S# _)    = i
+toSmall (J# s# mb#) = smartJ# s# mb#
+
+
+-- | Smart 'J#' constructor which tries to construct 'S#' if possible 
+smartJ# :: Int# -> ByteArray# -> Integer smartJ# 0# _ = S# 0# smartJ# 
+1# mb#  | isTrue# (v ># 0#) = S# v
     where
       v = indexIntArray# mb# 0#
-toSmall (J# -1# mb#) | isTrue# (v <# 0#) = S# v
+smartJ# (-1#) mb# | isTrue# (v <# 0#) = S# v
     where
       v = negateInt# (indexIntArray# mb# 0#)
-toSmall i         = i
-
--- | Smart 'J#' constructor which tries to construct 'S#' if possible -smartJ# :: Int# -> ByteArray# -> Integer -smartJ# s# mb# = toSmall (J# s# mb#)
+smartJ# s# mb# = J# s# mb#
 \end{code}
 
+Note [Use S# if possible]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+It's a big win to use S#, rather than J#, whenever possible.  Not only 
+does it take less space, but (probably more important) subsequent 
+operations are more efficient. See Trac #8638.
+
+'smartJ#' is the smart constructor for J# that performs the necessary 
+tests.  When returning a nested result, we always use smartJ# strictly, 
+thus
+       let !r = smartJ# a b in (# r, somthing_else #) to avoid creating 
+a thunk that is subsequently evaluated to a J#.
+smartJ# itself does a pretty small amount of work, so it's not worth 
+thunking it.
+
+We call 'smartJ#' in places like quotRemInteger where a big input might 
+produce a small output.
+
+Just using smartJ# in this way has good results:
+
+        Program           Size    Allocs   Runtime   Elapsed  TotalMem
+--------------------------------------------------------------------------------
+         gamteb          +0.1%    -19.0%      0.03      0.03     +0.0%
+          kahan          +0.2%     -1.2%      0.17      0.17     +0.0%
+         mandel          +0.1%     -7.7%      0.05      0.05     +0.0%
+          power          +0.1%    -40.8%    -32.5%    -32.5%     +0.0%
+         symalg          +0.2%     -0.5%      0.01      0.01     +0.0%
+--------------------------------------------------------------------------------
+            Min          +0.0%    -40.8%    -32.5%    -32.5%     -5.1%
+            Max          +0.2%     +0.1%     +2.0%     +2.0%     +0.0%
+ Geometric Mean          +0.1%     -1.0%     -2.5%     -2.5%     -0.1%
 
 %*********************************************************
 %*                                                      *
@@ -200,6 +231,7 @@ quotRemInteger (J# s1 d1) (J# s2 d2)
           (# s3, d3, s4, d4 #) -> let !q = smartJ# s3 d3
                                       !r = smartJ# s4 d4
                                   in (# q, r #)
+                           -- See Note [Use S# if possible]
 
 {-# NOINLINE divModInteger #-}
 divModInteger :: Integer -> Integer -> (# Integer, Integer #)

comment:14 in reply to: ↑ 13 Changed 4 months ago by hvr

Replying to simonpj:

Thanks. I committed two patches below.

For convenience, here are Trac links for the two commits:

Note: See TracTickets for help on using tickets.