Performance regressions from removing LNE analysis
CoreToStg
's LNE analysis has been superseded by the occurrence analyzer's join-point analysis. However, there are a few cases where the occurrence analyzer doesn't find a join point where the old analysis would've found an LNE. Known causes include:
- The LNE analysis was untyped, which (very) occasionally allows an LNE where a join point would've fallen afoul of the polymorphism rule (see Note [The polymorphism rule of join points] in
CoreSyn.hs
). - Sometimes a rewrite rule prevents a binding from being christened a join point; such a binding would've been found by the LNE analysis (which runs after Core Tidy has removed rules from local binders).
The biggest loss in NoFib is spectral/cryptarithm2
, which saw a 0.3% increase in allocations because a rule prevented detection of a join point. Specifically, there's a go
that gets SpecConstr'd as an $sgo
. All the calls to $sgo
happen to be tail calls, but some calls to go
aren't, so it's only the rule that keeps $sgo
from becoming a join point.
In summary (adapted from the cryptarithm2
situation):
letrec $sgo = \x y z -> ... go {- non-tail call -} ...
{-# RULE "SPEC go" forall x y z. go ((x,y):z) = $sgo x y z #-}
go = \xs -> ... go ...
in ... go ... $sgo a b c {- tail call -} ...
The only thing keeping $sgo
from becoming a join point is the rule.
Possible solutions:
- Run the occurrence analyzer and the simplifier (or
simpleOptPgm
) after Tidy Core. Seems like the solution needing the least code. Problems:
* Tidy Core turns top-level binders into GlobalId
s. Would need to stop doing that there or adapt the occurrence analyzer and the simplifier (or simpleOptPgm
).
* Only covers case 2.
- Write a new pass that just throws out all the rules from the local binders in the module, then run it (and then the simplifier) at the end of the Core-to-Core pipeline, just before Core Tidy. (Local rules don't appear in ifaces anyway.)
* Needs a whole new traversal, albeit a very simple one. Though it's possible that dropping the rules would lead to other optimizations ($sgo
would end up pre-inlined in the above example), so it might be helpful anyway.
* Also only covers case 2.
- Write the LNE analysis as an STG-to-STG pass. Only idea that covers both cases.
* Needs even more code, and duplicative of the occurrence analyzer.
- Loosen the join-point invariants somehow. For example, if we could flag a rule as applying only to tail calls, that rule could contain a jump in its RHS even though its LHS isn't a jump.
* But this is already problematic in the above example—we also can't make $sgo
a join point so long as it's mutually recursive with go
, which it still is so long as the rule is there. In other words, even if the rule doesn't directly keep $sgo
from being a join point, it's still the only thing keeping it from breaking free of go
's recursive group, which it must do to become a join point.
* So it doesn't cover case 2 very well and it doesn't cover case 1 at all.