Fun with the demand analyser
Max writes: I've been trying to speed up the supercompiler, and in the process I've noticed some infelicities in the demand analyser that might be worth looking at. One is #5949 (closed). The other is:
Infelicity 2: a case where demand summarisation hurts us
I found practical examples where summarising the demand transformer of a function as a single strictness signature hurts us. The examples involve code like:
h :: (Int, Int) -> Int -> (Int, Int)
h x y = if y > 10
        then x
        else h (case h x 0 of (y1, y2) -> (y2, y1)) (y + 1)
If, at the innermost use of h, we re-analyse h in the demand C(C(U(LL))) rather than just unleashing the vanilla DmdSig computed from the demand C(C(S)), then we can unbox the pair argument. Currently GHC just gives h the DmdType SU(L), which doesn't eliminate the allocation of the pair at all.
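To illustrate what re-analysis would buy, here is a hand-written sketch of the worker/wrapper split that unboxing the pair argument would enable. The names hWrap and hWork, and the use of unboxed tuples, are my own illustration, not GHC output:

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Hypothetical worker/wrapper split: the wrapper takes the pair apart
-- once, and the worker threads the components as bare Ints, so the
-- recursive calls allocate no pairs.
hWrap :: (Int, Int) -> Int -> (Int, Int)
hWrap (a, b) y = case hWork a b y of (# r1, r2 #) -> (r1, r2)

hWork :: Int -> Int -> Int -> (# Int, Int #)
hWork a b y
  | y > 10    = (# a, b #)
  | otherwise = case hWork a b 0 of
                  (# y1, y2 #) -> hWork y2 y1 (y + 1)
```

Like the original h, this diverges when the result is forced for y <= 10 (the inner call re-enters h x 0); the point is only the shape of the recursive path.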
This situation occurs in practice with code like:
c :: M.Map Int Int -> (Int, Int)
c m = M.foldrWithKey (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a)) (0, 1) m
The difference from my earlier example d is that I'm using the version of foldrWithKey that doesn't call seq. Current GHC generates this inner loop:
Rec {
CPR2.c_go2 [Occ=LoopBreaker]
  :: (GHC.Types.Int, GHC.Types.Int)
     -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int
     -> (GHC.Types.Int, GHC.Types.Int)
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=DmdType U(L*)S,
 Unf=OtherCon []]
CPR2.c_go2 =
  \ (z_spW :: (GHC.Types.Int, GHC.Types.Int))
    (ds_spU :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) ->
    case ds_spU of _ {
      Data.Map.Base.Tip -> z_spW;
      Data.Map.Base.Bin rb_sqq kx_sq2 x_sq9 l_sqj r_sq5 ->
        case kx_sq2 of _ { GHC.Types.I# x1_sqc ->
        case CPR2.c_go2 z_spW r_sq5 of wild2_sqk { (a_sqh, b_sqg) ->
        case x_sq9 of _ { GHC.Types.I# y_sqd ->
        case GHC.Prim.+# x1_sqc y_sqd of sat_sqp { __DEFAULT ->
        case GHC.Prim.># sat_sqp 2 of _ {
          GHC.Types.False ->
            let {
              sat_sqo :: (GHC.Types.Int, GHC.Types.Int)
              [LclId]
              sat_sqo = (b_sqg, a_sqh) } in
            CPR2.c_go2 sat_sqo l_sqj;
          GHC.Types.True -> CPR2.c_go2 wild2_sqk l_sqj
        }
        }
        }
        }
        }
    }
end Rec }
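For comparison, here is a sketch of the loop we would like, written by hand at the source level (my own code; a simplified local Map type, without the size field, stands in for Data.Map.Base.Map). The accumulator pair travels as two bare Ints and an unboxed-tuple result, so neither the swap on the False branch nor the recursive returns allocate a pair:

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Simplified stand-in for Data.Map.Base.Map (size field omitted).
data Map k v = Tip | Bin k v (Map k v) (Map k v)

-- The loop we would like GHC to produce after unboxing the accumulator.
c :: Map Int Int -> (Int, Int)
c m = case go 0 1 m of (# a, b #) -> (a, b)
  where
    go :: Int -> Int -> Map Int Int -> (# Int, Int #)
    go a b Tip = (# a, b #)
    go a b (Bin k v l r) =
      case go a b r of
        (# a', b' #)
          | k + v > 2 -> go a' b' l    -- keep the components as they are
          | otherwise -> go b' a' l    -- swap without allocating a pair
```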
I implemented this, but the overhead of re-analysing a function at each occurrence site is prohibitive: with my changes, paraffins took 2.5s to compile instead of 0.3s. It does fix the problem, though.
A scheme like the one in "stricterness more relevant", but with let-bindings that are polymorphic in the use-site demand, might be able to detect the extra strictness here. I think Stefan Holdermans has a paper describing such a system. But this is in any case much harder to fix than my first infelicity.