Regression in unboxing
Greetings,
Finally having added some benchmarking to my uvector-algorithms package, I came across a regression in performance in the heap sort implementation. With 6.10.2, it did significantly more allocation than I'd remembered, and looking at the core, I spotted a boxed type that I have since determined to be the culprit. I'll attach the relevant portion of the algorithm as a test case (I apologize that it's rather large, but I haven't yet figured out how to make a smaller example that works).
The key lines are:
do (child' :*: ac) <- maximumChild cmp a off child len
case cmp val ac of
LT -> writeMU a (root + off) ac >> sift val child' len
_ -> writeMU a (root + off) val
The monadic (ST) pattern match against the strict pair gets compiled to a function that accepts the arguments of the pair (as well as the ST state parameter) like so:
$w$j_s118 :: State# RealWorld
-> Int
-> Int#
-> (# State# RealWorld, () #)
[Arity 3
$w$j_s118 =
\ (w_s108 :: State# RealWorld)
(ww_s10b :: Int)
(ww1_s10e :: Int#) ->
case <# sc3_s11t ww1_s10e of wild11_aY2 {
False ->
case writeIntArray#
@ RealWorld
marr#_aVT
(+# sc2_s11s 40)
sc3_s11t
w_s108
of s2#1_aX6 { __DEFAULT ->
(# s2#1_aX6, () #)
};
True ->
case writeIntArray#
@ RealWorld
marr#_aVT
(+# sc2_s11s 40)
ww1_s10e
w_s108
of s2#1_aX6 { __DEFAULT ->
case ww_s10b of w1_X10F { I# ww2_X11k ->
$s$wa_s11D s2#1_aX6 sc1_s11r ww2_X11k sc3_s11t
}
}
}
As can be seen, all that is done with the boxed Int is that it is taken apart in one branch (this identifies the boxed argument as the child'
variable, the Int
being returned by maximumChild
; I verified this by modifying the code to use a custom type:
data IP e = IP {-# UNPACK #-} !Int !e
this results in the desired unboxing behavior). Further, all calls to this function look like:
$w$j_s118 s2#3_XZ1 (I# x_aTU) r#_aXD
So this seems to be unnecessary boxing. Further, I have reports that 6.8.2 *did* spot this unboxing opportunity, which would explain why my algorithm wasn't performing as well as I had remembered, since the last time I'd seriously inspected the performance was in the 6.8 series.
One theory I had was that the fact that the value was only used in one branch of the case was causing it to look non-strict somehow, despite the pair being strict in its fields (perhaps the pair was eliminated before strictness analysis was done?). However, I've added bang patterns to the child'
match, and changed the case statement to:
case child' `seq` cmp val ac of ...
without it making a difference. So I am a bit stumped as to why exactly GHC isn't eliminating this box.
As noted, I can get the desired unboxing with a custom unpacked type, but fixing the regression would be preferable. Thanks for your time and help!
Trac metadata
Trac field | Value |
---|---|
Version | 6.10.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |