Easy SpecConstr opportunity that is nonetheless missed
I was looking at the code using that uses unsafe indexing into STUArrays from http://www.lix.polytechnique.fr/~kaustuv/expo/incr_uarray/
One of the reasons that this code runs so much more slowly than his C version is that the inner loop is not fully unboxed. It turns out that a simple SpecConstr opportunity is being missed, and I'm not sure why.
There is a local recursive function function that looks like this:
letrec {
$wa_X1yE [Occ=LoopBreaker]
:: forall s_aCz.
Data.Array.Base.STUArray
s_aCz GHC.Types.Int GHC.Types.Int
-> GHC.Prim.Int#
-> GHC.Prim.State# s_aCz
-> (# GHC.Prim.State# s_aCz, () #)
[LclId,
Arity=3,
Str=DmdType LLL,
Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=3,
Value=True, ConLike=True, Cheap=True,
Expandable=True,
Guidance=IF_ARGS [2 0 0] 33 3}]
$wa_X1yE =
\ (@ s_XDW)
(w_X1yn
:: Data.Array.Base.STUArray
s_XDW GHC.Types.Int GHC.Types.Int)
(ww_X1yr :: GHC.Prim.Int#)
(w_X1yu :: GHC.Prim.State# s_XDW) ->
case GHC.Prim.># ww_X1yr y_aJN of _ {
GHC.Bool.False ->
case w_X1yn
of wild_XKN [Dmd=Just L]
{ Data.Array.Base.STUArray ds1_XKQ [Dmd=Just U]
ds2_XKT [Dmd=Just U]
n_XKW [Dmd=Just U(L)]
ds3_XKZ [Dmd=Just L] ->
case n_XKW
of _ { GHC.Types.I# x_XNb [Dmd=Just L] ->
letrec {
$wa_X1z0 [Occ=LoopBreaker]
:: GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.State# s_XDW
-> (# GHC.Prim.State# s_XDW, () #)
[LclId,
Arity=3,
Str=DmdType LLL,
Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=3,
Value=True, ConLike=True, Cheap=True,
Expandable=True,
Guidance=IF_ARGS [0 0 0] 13 3}]
$wa_X1z0 =
\ (ww_s1wO :: GHC.Prim.Int#)
(ww_s1wS :: GHC.Prim.Int#)
(w_s1wU :: GHC.Prim.State# s_XDW) ->
case GHC.Prim.># ww_s1wO ww_s1wS of _ {
GHC.Bool.False ->
case GHC.Prim.readIntArray#
@ s_XDW ds3_XKZ ww_s1wO w_s1wU
of _
{ (# s2#_aK9 [Dmd=Just L], e#_aKa [Dmd=Just L] #) ->
case GHC.Prim.writeIntArray#
@ s_XDW
ds3_XKZ
ww_s1wO
(GHC.Prim.+# e#_aKa ww_X1yr)
s2#_aK9
of s2#_aKF [Dmd=Just L] { __DEFAULT ->
$wa_X1z0
(GHC.Prim.+# ww_s1wO 1)
ww_s1wS
s2#_aKF
}
};
GHC.Bool.True ->
(# w_s1wU, GHC.Unit.() #)
}; } in
case $wa_X1z0 0 (GHC.Prim.-# x_XNb 1) w_X1yu
of _ { (# new_s_XMM [Dmd=Just L], _ #) ->
$wa_X1yE
@ s_XDW
wild_XKN
(GHC.Prim.+# ww_X1yr 1)
new_s_XMM
}
}
};
GHC.Bool.True -> (# w_X1yu, GHC.Unit.() #)
}; } in
$wa_X1yE
@ s_aCz wild_aJj (GHC.Prim.+# ww_s1x2 1) new_s_aL3
}
}
};
Is being specialised with a rule like:
[LclId,
Arity=3,
Str=DmdType LLL,
RULES: "SC:$wa0" [ALWAYS]
forall {sc_s1yQ
:: GHC.Prim.State#
GHC.Prim.RealWorld
sc_s1yR :: GHC.Prim.Int#
sc_s1yS :: GHC.Types.Int
sc_s1yT
:: GHC.Prim.MutableByteArray#
GHC.Prim.RealWorld}
$wa_X1yE @ GHC.Prim.RealWorld
(Data.Array.Base.STUArray
@ GHC.Prim.RealWorld
@ GHC.Types.Int
@ GHC.Types.Int
lvl_sSK
ww4_a1ss
sc_s1yS
sc_s1yT)
sc_s1yR
sc_s1yQ
= $s$wa_s1zl
sc_s1yQ sc_s1yR sc_s1yS sc_s1yT]
To the final code:
letrec {
$s$wa_s1zl
:: GHC.Prim.State# GHC.Prim.RealWorld
-> GHC.Prim.Int#
-> GHC.Types.Int
-> GHC.Prim.MutableByteArray# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[LclId, Arity=4, Str=DmdType LLLL]
$s$wa_s1zl =
\ (sc_s1yQ :: GHC.Prim.State# GHC.Prim.RealWorld)
(sc_s1yR :: GHC.Prim.Int#)
(sc_s1yS :: GHC.Types.Int)
(sc_s1yT
:: GHC.Prim.MutableByteArray#
GHC.Prim.RealWorld) ->
case GHC.Prim.># sc_s1yR y_aJN of _ {
GHC.Bool.False ->
case sc_s1yS
of _ { GHC.Types.I# x_XNb [Dmd=Just L] ->
letrec {
$wa_X1z0 [Occ=LoopBreaker]
:: GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld,
() #)
[LclId, Arity=3, Str=DmdType LLL]
$wa_X1z0 =
\ (ww_s1wO :: GHC.Prim.Int#)
(ww_s1wS :: GHC.Prim.Int#)
(w_s1wU
:: GHC.Prim.State#
GHC.Prim.RealWorld) ->
case GHC.Prim.># ww_s1wO ww_s1wS of _ {
GHC.Bool.False ->
case GHC.Prim.readIntArray#
@ GHC.Prim.RealWorld
sc_s1yT
ww_s1wO
w_s1wU
of _
{ (# s2#_aK9 [Dmd=Just L], e#_aKa [Dmd=Just L] #) ->
case GHC.Prim.writeIntArray#
@ GHC.Prim.RealWorld
sc_s1yT
ww_s1wO
(GHC.Prim.+# e#_aKa sc_s1yR)
s2#_aK9
of s2#_aKF [Dmd=Just L] { __DEFAULT ->
$wa_X1z0
(GHC.Prim.+# ww_s1wO 1)
ww_s1wS
s2#_aKF
}
};
GHC.Bool.True ->
(# w_s1wU, GHC.Unit.() #)
}; } in
case $wa_X1z0 0 (GHC.Prim.-# x_XNb 1) sc_s1yQ
of _ { (# new_s_XMM [Dmd=Just L], _ #) ->
$wa_X1yE
@ GHC.Prim.RealWorld
(Data.Array.Base.STUArray
@ GHC.Prim.RealWorld
@ GHC.Types.Int
@ GHC.Types.Int
lvl_sSK
ww4_a1ss
sc_s1yS
sc_s1yT)
(GHC.Prim.+# sc_s1yR 1)
new_s_XMM
}
};
GHC.Bool.True -> (# sc_s1yQ, GHC.Unit.() #)
};
But this is daft! We can see from $wa_X1yE
that the third component of the STUArray is always (I# x_XNb)
. Why not unpack the constructor in the specialisation too?
(In fact, exactly the same pattern occurs at the original call site of the local recursive function, so this problem isn't because the specialisations are being seeded from the call site rather than the loop body).
To reproduce, compile the attached code with:
ghc -O2 -fforce-recomp --make STUArray.hs -ddump-simpl