Opened 9 years ago
Closed 8 years ago
#3181 closed bug (invalid)
Regression in unboxing
Reported by: | dolio | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 6.12.3 |
Component: | Compiler | Version: | 6.10.2 |
Keywords: | unboxing boxing | Cc: | pumpkingod@… |
Operating System: | Linux | Architecture: | x86_64 (amd64) |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
Greetings,
Finally having added some benchmarking to my uvector-algorithms package, I came across a regression in performance in the heap sort implementation. With 6.10.2, it did significantly more allocation than I'd remembered, and looking at the core, I spotted a boxed type that I have since determined to be the culprit. I'll attach the relevant portion of the algorithm as a test case (I apologize that it's rather large, but I haven't yet figured out how to make a smaller example that works).
The key lines are:
do (child' :*: ac) <- maximumChild cmp a off child len case cmp val ac of LT -> writeMU a (root + off) ac >> sift val child' len _ -> writeMU a (root + off) val
The monadic (ST) pattern match against the strict pair gets compiled to a function that accepts the arguments of the pair (as well as the ST state parameter) like so:
$w$j_s118 :: State# RealWorld -> Int -> Int# -> (# State# RealWorld, () #) [Arity 3 $w$j_s118 = \ (w_s108 :: State# RealWorld) (ww_s10b :: Int) (ww1_s10e :: Int#) -> case <# sc3_s11t ww1_s10e of wild11_aY2 { False -> case writeIntArray# @ RealWorld marr#_aVT (+# sc2_s11s 40) sc3_s11t w_s108 of s2#1_aX6 { __DEFAULT -> (# s2#1_aX6, () #) }; True -> case writeIntArray# @ RealWorld marr#_aVT (+# sc2_s11s 40) ww1_s10e w_s108 of s2#1_aX6 { __DEFAULT -> case ww_s10b of w1_X10F { I# ww2_X11k -> $s$wa_s11D s2#1_aX6 sc1_s11r ww2_X11k sc3_s11t } } }
As can be seen, all that is done with the boxed Int is that it is taken apart in one branch (this identifies the boxed argument as the child'
variable, the Int
being returned by maximumChild
; I verified this by modifying the code to use a custom type:
data IP e = IP {-# UNPACK #-} !Int !e
this results in the desired unboxing behavior). Further, all calls to this function look like:
$w$j_s118 s2#3_XZ1 (I# x_aTU) r#_aXD
So this seems to be unnecessary boxing. Further, I have reports that 6.8.2 *did* spot this unboxing opportunity, which would explain why my algorithm wasn't performing as well as I had remembered, since the last time I'd seriously inspected the performance was in the 6.8 series.
One theory I had was that the fact that the value was only used in one branch of the case was causing it to look non-strict somehow, despite the pair being strict in its fields (perhaps the pair was eliminated before strictness analysis was done?). However, I've added bang patterns to the child'
match, and changed the case statement to:
case child' `seq` cmp val ac of ...
without it making a difference. So I am a bit stumped as to why exactly GHC isn't eliminating this box.
As noted, I can get the desired unboxing with a custom unpacked type, but fixing the regression would be preferable. Thanks for your time and help!
Attachments (2)
Change History (16)
Changed 9 years ago by
Attachment: | StrictPair.hs added |
---|
comment:1 Changed 9 years ago by
difficulty: | → Unknown |
---|
Thanks for a nice well-characterised bug report.
I can't reproduce it because you didn't include Data.Vector
. Could you do so? (It's v helpful for tickets to be self contained where possible.) If it's big, you may be able to cut it down to just what the test case needs. I don't necessarily even need to run it: just to compile it. So you may be able to go
module Vector( foo ) where foo :: Int -> Int foo = error "stub"
or stuff like that.
Anyway, I have seen cases like this before. Here's the relevant comment from Simplify.lhs
:
Note [Case binders and join points] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Consider this case (case .. ) of c { I# c# -> ....c.... If we make a join point with c but not c# we get $j = \c -> ....c.... But if later inlining scrutines the c, thus $j = \c -> ... case c of { I# y -> ... } ... we won't see that 'c' has already been scrutinised. An alternative plan is this: $j = \c# -> let c = I# c# in ...c.... but that is bad if 'c' is *not* later scrutinised.
Whether or not it happens depends delicately on exactly when inlining takes place. Hence (I think) the regression.
I have a fix for my test HEAD, but I'd like to use your example as a test case.
Simon
comment:2 Changed 9 years ago by
Ah, sorry. Data.Array.Vector is from the uvector package, which is available on hackage. If that's unacceptable, I can try to construct a new test case, or make a dummy module as you said, but my initial attempts writing do-nothing loops similar to the one above didn't trigger the bug (so obviously results are quite variable).
comment:3 Changed 9 years ago by
Not unacceptable, just less convenient. If you've failed to cut it down with a modest investment, then don't worry.
However, can you pls say exactly which version of uvector you were using? That at least helps to ensure the bug is remains reproducible.
Simon
comment:4 Changed 9 years ago by
I was using uvector-0.2, which is only available via darcs currently. However, I just tested with 0.1.0.3 (currently the latest) from hackage and get the same results.
comment:5 Changed 9 years ago by
OK that's good enough I guess. I assume that 0.1.0.3 will stay there even if later versions get uploaded. Thx
Changed 9 years ago by
A minimal implementation of Data.Array.Vector for this test.
comment:6 Changed 9 years ago by
I took the plunge and implemented a very reduced version of Data.Array.Vector (should be attached now).
It appears to generate similar code to me, and it does result in the unboxing bug, so I think this should be self-contained now. Obviously Vector.hs needs to go in Data/Array/ below StrictPair.hs
Hope that helps.
comment:7 Changed 9 years ago by
Cc: | pumpkingod@… added |
---|
comment:8 Changed 9 years ago by
Milestone: | → 6.12.1 |
---|
comment:10 Changed 8 years ago by
Type of failure: | → Runtime performance bug |
---|
comment:11 Changed 8 years ago by
Milestone: | 6.12.1 → 6.12.2 |
---|
comment:12 Changed 8 years ago by
I can't reproduce this. With 6.8.2:
$ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.2 $ ghc -O -c Vector.hs $ ghc -O -c StrictPair.hs -ddump-simpl
I get this:
$w$j_sBq = \ (w2_sAq :: GHC.Prim.State# GHC.Prim.RealWorld) (ww2_sAt :: GHC.Base.Int) (ww3_sAw :: GHC.Prim.Int#) -> case w_sAC of wild11_auu { GHC.Base.I# x#_auw -> case GHC.Prim.<# x#_auw ww3_sAw of wild2_axd { GHC.Base.False -> case GHC.Prim.==# x#_auw ww3_sAw of wild12_axg { GHC.Base.False -> case GHC.Prim.writeIntArray# @ GHC.Prim.RealWorld arr_avS (GHC.Prim.+# ww_sAF 40) x#_auw w2_sAq of s'1_awJ { __DEFAULT -> (# s'1_awJ, GHC.Base.() #) }; GHC.Base.True -> case GHC.Prim.writeIntArray# @ GHC.Prim.RealWorld arr_avS (GHC.Prim.+# ww_sAF 40) x#_auw w2_sAq of s'1_awJ { __DEFAULT -> (# s'1_awJ, GHC.Base.() #) } }; GHC.Base.True -> case GHC.Prim.writeIntArray# @ GHC.Prim.RealWorld arr_avS (GHC.Prim.+# ww_sAF 40) ww3_sAw w2_sAq of s'1_awJ { __DEFAULT -> case ww2_sAt of w3_XAV { GHC.Base.I# ww4_XBA -> $wa_sBo wild11_auu ww4_XBA ww1_sAJ s'1_awJ } } }
which seems to have the same boxed Int
. Can you say exactly how you are compiling it to get the result you want with 6.8.2 please?
comment:13 Changed 8 years ago by
Milestone: | 6.12.2 → 6.12.3 |
---|---|
Status: | new → infoneeded |
comment:14 Changed 8 years ago by
Resolution: | → invalid |
---|---|
Status: | infoneeded → closed |
No response from submitter, so closing.
A test case: a portion of a heap sort algorithm