but later, when foo1 has been w/w’ed, we inline it (i.e. the wrapper) in the post-w/w simplifier phase 0.
    Considering inlining: foo1
      arg infos [ValueArg, ValueArg]
      interesting continuation BoringCtxt
      some_benefit True
      is exp: True
      is work-free: True
      guidance ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
      ANSWER = YES
    Inlining done: foo1
        Inlined fn:
          \ @ a @ a w_s12e w_s12f ->
            case w_s12e of { (ww_s12i, ww_s12j) ->
            case w_s12f of { (ww_s12n, ww_s12o) ->
            case $wfoo1 ww_s12i ww_s12j ww_s12n ww_s12o of
            { (# ww_s12u, ww_s12v #) ->
            (ww_s12u, ww_s12v)
            } } }
        Cont:
          ApplyToTy a
          ApplyToTy a
          ApplyToVal nodup lvl_s11y
          ApplyToVal nodup (x, xs)
          Stop[BoringCtxt] (a, [a])
and shortly after, we inline the worker:
    Considering inlining: $wfoo1_s12t
      arg infos [ValueArg, ValueArg, TrivArg, TrivArg]
      interesting continuation CaseCtxt
      some_benefit True
      is exp: True
      is work-free: True
      guidance IF_ARGS [60 0 0 0] 180 30
      discounted size = -5
      ANSWER = YES
    Inlining done: $wfoo1
        Inlined fn:
          \ @ a @ a ww_s12i ww_s12j ww_s12n ww_s12o ->
            let { ww_s12v =
                    let { z = map ww_s12i ww_s12o } in
                    letrec {
                      go = \ ds ->
                             case ds of {
                               []     -> z;
                               : y ys -> : (y ww_s12n) (go ys)
                             };
                    } in
                    go ww_s12j
            } in
            (# ww_s12i ww_s12n, ww_s12v #)
        Cont:
          ApplyToTy a
          ApplyToTy a
          ApplyToVal nodup ww_s12i
          ApplyToVal nodup ww_s12j
          ApplyToVal nodup ww_s12n
          ApplyToVal nodup ww_s12o
          Select nodup ww_s12s
          Stop[BoringCtxt] (a, [a])
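For reference, here is a source definition consistent with the Core above. This is a guess reconstructed from the worker's body (the name foo1 is from the dump; the type signature is hypothetical), not the actual test case:

```haskell
module Foo1 where

-- The worker builds (# ww_s12i ww_s12n, go ww_s12j #), where go maps
-- (\y -> y ww_s12n) over the first list and appends map ww_s12i over
-- the second; a source definition of that shape might be:
foo1 :: (a -> b, [a -> b]) -> (a, [a]) -> (b, [b])
foo1 (f, gs) (x, xs) = (f x, map ($ x) gs ++ map f xs)
```

Both tuple arguments are scrutinised at the top of the body, which is what makes the strictness analyser split the function into the wrapper and $wfoo1 seen above.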
So it seems that after splitting the function into two pieces, it is small enough(?) so that both pieces inline? But that seems to be suboptimal: if we are going to inline both pieces anyway, can we not do it earlier, and thus enable useful fusion?
> So it seems that after splitting the function into two pieces, it is small enough(?) so that both pieces inline?
Yes that is galling I agree.
Part of the trouble is that strictness analysis does a deep semantic analysis, pulls all the evals to the top, inlines them unconditionally, leaving behind a worker that may now be a lot smaller. The sizeExpr code in CoreUnfold is necessarily much simpler.
The discount we award for a scrutinised argument is computed in size_up here:
    alts_size (SizeIs tot tot_disc tot_scrut)  -- Size of all alternatives
              (SizeIs max _        _)          -- Size of biggest alternative
      = SizeIs tot
               (unitBag (v, 20 + tot - max) `unionBags` tot_disc)
               tot_scrut
For a single-alternative case (and you have a tuple arg here) tot = max, so there's a fixed discount of 20. You could make that into a controllable flag and try varying it.
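Concretely, plugging numbers into the formula quoted above (a toy recomputation for illustration, not the actual CoreUnfold code):

```haskell
-- The per-alternative discount from alts_size: 20 + tot - max, where
-- tot is the total size of all alternatives and max the size of the
-- biggest one.
altDiscount :: Int -> Int -> Int
altDiscount tot mx = 20 + tot - mx

-- Single-alternative case (e.g. scrutinising a tuple argument):
-- tot == max, so the discount is the fixed 20.
singleAlt :: Int
singleAlt = altDiscount 30 30

-- Two alternatives of sizes 10 and 30 (tot = 40, max = 30): the
-- smaller alternative's size is added on top of the fixed 20.
twoAlts :: Int
twoAlts = altDiscount 40 30
```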
Idea. If a function starts with case x of blah (even if wrapped in lets), we know that the strictness analyser will find it strict in x. So we know it'll generate a wrapper, and the wrapper will inline. So in the end it'll be as if that case cost nothing at all. It would not be hard to make sizeExpr simply count zero for the cost of such cases, including nested ones (certainly for single-alternative ones).
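The idea can be sketched over a toy expression type (this is not GHC's real Core or sizeExpr, just an illustration of the proposed accounting):

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- A toy expression language; Case scrutinises a variable and carries
-- only its alternative bodies.
data Expr
  = Var String
  | App Expr Expr
  | Let String Expr Expr
  | Case String [Expr]

-- Size with the proposed tweak: a case that scrutinises a lambda-bound
-- argument costs nothing itself, because w/w plus wrapper inlining will
-- make it vanish; we only charge for the alternative body. Here we
-- apply the zero cost only to single-alternative cases, the certainly
-- safe subset; other cases pay a nominal 10 plus their alternatives.
size :: Set String -> Expr -> Int
size args e = case e of
  Var _       -> 1
  App f x     -> size args f + size args x
  Let _ rhs b -> 1 + size args rhs + size args b
  Case v alts
    | v `Set.member` args
    , [alt] <- alts
                -> size args alt
    | otherwise -> 10 + sum (map (size args) alts)
```

Nested cases on arguments (as in the foo1 wrapper, which scrutinises both tuple arguments) come out free under this scheme, since the recursion re-applies the same rule.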
I just had a project where it made a difference whether I add {-# RULES "foldr/nil" forall k n . GHC.Base.foldr k n [] = n #-} to my file or not, despite this rule already being present in the library.
Might be a misunderstanding in how I use the GHC API. (It seems that the instance of map on the RHS of a rule has no rules in its IdInfo, and so it does not fire. I’ll ask on the mailing list about that once I get to it.)
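For completeness, restating the rule in a user module looks like this (the module name is made up; the rule text is exactly the one from the comment above, so GHC may warn that it overlaps the library's own "foldr/nil" rule):

```haskell
module FoldrNil where  -- hypothetical module name

import qualified GHC.Base

-- Locally restated copy of the library rule; it rewrites
-- foldr k n [] to n, which can unblock further fusion.
{-# RULES "foldr/nil" forall k n . GHC.Base.foldr k n [] = n #-}
```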