Opened 9 years ago

Closed 7 years ago

#3181 closed bug (invalid)

Regression in unboxing

Reported by: dolio Owned by:
Priority: normal Milestone: 6.12.3
Component: Compiler Version: 6.10.2
Keywords: unboxing boxing Cc: pumpkingod@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Greetings,

Finally having added some benchmarking to my uvector-algorithms package, I came across a regression in performance in the heap sort implementation. With 6.10.2, it did significantly more allocation than I'd remembered, and looking at the core, I spotted a boxed type that I have since determined to be the culprit. I'll attach the relevant portion of the algorithm as a test case (I apologize that it's rather large, but I haven't yet figured out how to make a smaller example that works).

The key lines are:

do (child' :*: ac) <- maximumChild cmp a off child len
   case cmp val ac of
     LT -> writeMU a (root + off) ac >> sift val child' len
     _  -> writeMU a (root + off) val

The monadic (ST) pattern match against the strict pair gets compiled to a function that accepts the arguments of the pair (as well as the ST state parameter) like so:

$w$j_s118 :: State# RealWorld
             -> Int
             -> Int#
             -> (# State# RealWorld, () #)
[Arity 3

$w$j_s118 =
  \ (w_s108 :: State# RealWorld)
    (ww_s10b :: Int)
    (ww1_s10e :: Int#) ->
    case <# sc3_s11t ww1_s10e of wild11_aY2 {
      False ->
        case writeIntArray#
               @ RealWorld
               marr#_aVT
               (+# sc2_s11s 40)
               sc3_s11t
               w_s108
        of s2#1_aX6 { __DEFAULT ->
        (# s2#1_aX6, () #)
        };
      True ->
        case writeIntArray#
               @ RealWorld
               marr#_aVT
               (+# sc2_s11s 40)
               ww1_s10e
               w_s108
        of s2#1_aX6 { __DEFAULT ->
        case ww_s10b of w1_X10F { I# ww2_X11k ->
        $s$wa_s11D s2#1_aX6 sc1_s11r ww2_X11k sc3_s11t
        }
        }
    }

As can be seen, all that is done with the boxed Int is that it is taken apart in one branch (this identifies the boxed argument as the child' variable, the Int being returned by maximumChild; I verified this by modifying the code to use a custom type:

data IP e = IP {-# UNPACK #-} !Int !e

this results in the desired unboxing behavior). Further, all calls to this function look like:

$w$j_s118 s2#3_XZ1 (I# x_aTU) r#_aXD

So this seems to be unnecessary boxing. Further, I have reports that 6.8.2 *did* spot this unboxing opportunity, which would explain why my algorithm wasn't performing as well as I had remembered, since the last time I'd seriously inspected the performance was in the 6.8 series.

One theory I had was that the fact that the value was only used in one branch of the case was causing it to look non-strict somehow, despite the pair being strict in its fields (perhaps the pair was eliminated before strictness analysis was done?). However, I've added bang patterns to the child' match, and changed the case statement to:

case child' `seq` cmp val ac of ...

without it making a difference. So I am a bit stumped as to why exactly GHC isn't eliminating this box.

As noted, I can get the desired unboxing with a custom unpacked type, but fixing the regression would be preferable. Thanks for your time and help!

Attachments (2)

StrictPair.hs (1.9 KB) - added by dolio 9 years ago.
A test case: a portion of a heap sort algorithm
Vector.hs (1.1 KB) - added by dolio 9 years ago.
A minimal implementation of Data.Array.Vector for this test.

Download all attachments as: .zip

Change History (16)

Changed 9 years ago by dolio

Attachment: StrictPair.hs added

A test case: a portion of a heap sort algorithm

comment:1 Changed 9 years ago by simonpj

difficulty: Unknown

Thanks for a nice well-characterised bug report.

I can't reproduce it because you didn't include Data.Vector. Could you do so? (It's v helpful for tickets to be self contained where possible.) If it's big, you may be able to cut it down to just what the test case needs. I don't necessarily even need to run it: just to compile it. So you may be able to go

module Vector( foo ) where
  foo :: Int -> Int
  foo = error "stub"

or stuff like that.

Anyway, I have seen cases like this before. Here's the relevant comment from Simplify.lhs:

Note [Case binders and join points]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider this 
   case (case .. ) of c {
     I# c# -> ....c....

If we make a join point with c but not c# we get
  $j = \c -> ....c....

But if later inlining scrutines the c, thus

  $j = \c -> ... case c of { I# y -> ... } ...

we won't see that 'c' has already been scrutinised.
An alternative plan is this:

   $j = \c# -> let c = I# c# in ...c....

but that is bad if 'c' is *not* later scrutinised.  

Whether or not it happens depends delicately on exactly when inlining takes place. Hence (I think) the regression.

I have a fix for my test HEAD, but I'd like to use your example as a test case.

Simon

comment:2 Changed 9 years ago by dolio

Ah, sorry. Data.Array.Vector is from the uvector package, which is available on hackage. If that's unacceptable, I can try to construct a new test case, or make a dummy module as you said, but my initial attempts writing do-nothing loops similar to the one above didn't trigger the bug (so obviously results are quite variable).

comment:3 Changed 9 years ago by simonpj

Not unacceptable, just less convenient. If you've failed to cut it down with a modest investment, then don't worry.

However, can you pls say exactly which version of uvector you were using? That at least helps to ensure the bug is remains reproducible.

Simon

comment:4 Changed 9 years ago by dolio

I was using uvector-0.2, which is only available via darcs currently. However, I just tested with 0.1.0.3 (currently the latest) from hackage and get the same results.

comment:5 Changed 9 years ago by simonpj

OK that's good enough I guess. I assume that 0.1.0.3 will stay there even if later versions get uploaded. Thx

Changed 9 years ago by dolio

Attachment: Vector.hs added

A minimal implementation of Data.Array.Vector for this test.

comment:6 Changed 9 years ago by dolio

I took the plunge and implemented a very reduced version of Data.Array.Vector (should be attached now).

It appears to generate similar code to me, and it does result in the unboxing bug, so I think this should be self-contained now. Obviously Vector.hs needs to go in Data/Array/ below StrictPair.hs

Hope that helps.

comment:7 Changed 9 years ago by pumpkin

Cc: pumpkingod@… added

comment:8 Changed 8 years ago by igloo

Milestone: 6.12.1

comment:9 Changed 8 years ago by igloo

Thanks for all your work in boiling down the testcase.

comment:10 Changed 8 years ago by simonmar

Type of failure: Runtime performance bug

comment:11 Changed 8 years ago by igloo

Milestone: 6.12.16.12.2

comment:12 Changed 8 years ago by igloo

I can't reproduce this. With 6.8.2:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.8.2
$ ghc -O -c Vector.hs
$ ghc -O -c StrictPair.hs -ddump-simpl

I get this:

                  $w$j_sBq =
                    \ (w2_sAq :: GHC.Prim.State# GHC.Prim.RealWorld)
                      (ww2_sAt :: GHC.Base.Int)
                      (ww3_sAw :: GHC.Prim.Int#) ->
                      case w_sAC of wild11_auu { GHC.Base.I# x#_auw ->
                      case GHC.Prim.<# x#_auw ww3_sAw of wild2_axd {
                        GHC.Base.False ->
                          case GHC.Prim.==# x#_auw ww3_sAw of wild12_axg {
                            GHC.Base.False ->
                              case GHC.Prim.writeIntArray#
                                     @ GHC.Prim.RealWorld arr_avS (GHC.Prim.+# ww_sAF 40) x#_auw w2_sAq
                              of s'1_awJ { __DEFAULT ->
                              (# s'1_awJ, GHC.Base.() #)
                              };
                            GHC.Base.True ->
                              case GHC.Prim.writeIntArray#
                                     @ GHC.Prim.RealWorld arr_avS (GHC.Prim.+# ww_sAF 40) x#_auw w2_sAq
                              of s'1_awJ { __DEFAULT ->
                              (# s'1_awJ, GHC.Base.() #)
                              }
                          };
                        GHC.Base.True ->
                          case GHC.Prim.writeIntArray#
                                 @ GHC.Prim.RealWorld arr_avS (GHC.Prim.+# ww_sAF 40) ww3_sAw w2_sAq
                          of s'1_awJ { __DEFAULT ->
                          case ww2_sAt of w3_XAV { GHC.Base.I# ww4_XBA ->
                          $wa_sBo wild11_auu ww4_XBA ww1_sAJ s'1_awJ
                          }
                          }
                      }

which seems to have the same boxed Int. Can you say exactly how you are compiling it to get the result you want with 6.8.2 please?

comment:13 Changed 7 years ago by igloo

Milestone: 6.12.26.12.3
Status: newinfoneeded

comment:14 Changed 7 years ago by igloo

Resolution: invalid
Status: infoneededclosed

No response from submitter, so closing.

Note: See TracTickets for help on using tickets.