Version 17 (modified by nomeata, 23 months ago)
This is nomeata’s notepad about the nested CPR information:
- #1600: Main ticket where I mention progress.
Tickets with stuff that would make nested CPR better:
- #8598 CPR after IO (partly done)
- Everything in source:testsuite/tests/stranal/sigs/
- Does Nick Frisby’s late λ-lifting alleviate problems when CPR’ing join-points?
- Need to see if his branch can be merged onto master.
- Paper-Writeup of CPR
- Shouldn’t nested CPR help a lot with Complex-heavy code? Is there something in nofib?
- Try passing CPR information from the scrutinee to the pattern variables. For that, reverse the flow of analysis for complex scrutinees (for simple ones, we want the demand coming from the body; for complex ones, this is not so important).
- Use ticky-profiling to learn more about the effects of nested CPR.
- Look at DmdAnal-related [SLPJ-Tickets] and see which ones are affected by nested-cpr.
- Do not destroy join points (see below).
- Can we make sure more stuff gets the Converging flag, e.g. after a case of an unboxed value? Should case binders get the Converging flag? What about pattern match variables in strict data constructors? Unboxed values?
- Why does nested CPR make some stuff so bad?
- Possibly because of character reboxing. Try avoiding CPR’ing C# altogether!
Degradation exploration and explanation
At one point, I thought that a major contributor to increased allocations is nested-CPR’ing things returning String, causing them to return (# Char#, String #). But removing the CPR information from C# calls has zero effect on the allocations, both on master and on nested-cpr. It had a very small (positive) effect on code size. Will have to look at Core...
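To make the String case concrete, here is a hand-written sketch of what a nested CPR worker/wrapper split might look like for a function that always returns a cons cell. The names `greet`, `greetWorker` and `greetWrapper` are made up for illustration; the code GHC actually generates will differ in detail.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Char (C#), Char#)

-- Original function: always returns a cons cell whose head is a boxed Char.
greet :: Char -> String
greet c = c : "ello"

-- Hypothetical worker after nested CPR: the outer (:) and the inner C#
-- are both stripped, so the result is an unboxed pair.
greetWorker :: Char -> (# Char#, String #)
greetWorker (C# c#) = (# c#, "ello" #)

-- Hypothetical wrapper: reboxes the Char and rebuilds the cons cell.
-- If the caller does not immediately scrutinise the result, this
-- reboxing is pure overhead.
greetWrapper :: Char -> String
greetWrapper c = case greetWorker c of
  (# c#, rest #) -> C# c# : rest
```

Loaded in GHCi, `greetWrapper` should behave like `greet`; the interesting difference is only visible in Core and in allocation counts.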
Here are some case studies with extensive commenting of steps and results:
And here is a summary of the problems identified and the solution attempts:
- CPR kills join points, because the wrapper does not completely cancel with anything else.
- Detecting join-points at the position of its binding is not enough.
- A recursive function can have a CPR-beneficial recursive call that makes CPR worthwhile, even if it does not help at the initial call. But it is also not unlikely that the recursive call is a tail-call, and CPR-ing has zero effect on that. Then it all depends on the external call.
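The first problem can be illustrated with a hand-written sketch. The functions `before` and `after` and the local names `j` and `wj` are hypothetical; `after` mimics what the code might look like once `j` has been split into a worker and a wrapper and the wrapper has been inlined at the call sites.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Before CPR: `j` is only ever tail-called, so it can be compiled as a
-- join point (a jump, with no closure allocated for it).
before :: Int -> Int -> (Int, Int)
before x y =
  let j z = (z, z + 1)
  in case x of
       0 -> j y
       1 -> j (y * 2)
       _ -> (x, y)

-- After a hypothetical CPR split of `j`, with the wrapper inlined:
-- `wj` returns an unboxed pair, and each call site has to rebox the
-- result. The calls to `wj` are no longer in tail position, so `wj` is
-- not a join point any more, and the reboxing cancels with nothing
-- because `after` itself returns a boxed pair in the default branch.
after :: Int -> Int -> (Int, Int)
after x y =
  let wj :: Int -> (# Int, Int #)
      wj z = (# z, z + 1 #)
  in case x of
       0 -> case wj y of (# a, b #) -> (a, b)
       1 -> case wj (y * 2) of (# a, b #) -> (a, b)
       _ -> (x, y)
```

Both versions compute the same results; the difference lies in how `j` and `wj` are compiled.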
CPR can kill join points.
Idea to fix this, and possibly more general benefits: http://www.haskell.org/pipermail/ghc-devs/2013-December/003481.html; prototype in branch wip/common-context.
- On its own, improvements are present but very small: http://www.haskell.org/pipermail/ghc-devs/2013-December/003500.html
- Enabling CPR for sum types in non-top-level-bindings (which is currently disabled due to worries about lost join points) yields mixed results (min -3.8%, mean -0.0%, max 3.4%).
- Enabling sum types inside nested CPR: Also yields mixed, not very promising results (-6.9% / +0.0% / +11.3%).
Alternative: Detect join points during dmdAnal and make sure that their CPR info is not greater than that of the expression they are a join-point for. Would also fix #5075, see 5075#comment:19 for benchmark numbers.
- On its own, no changes.
- Enabling CPR for sum types: (min -3.8%, mean -0.0%, max 1.7%) (slightly better than with Common Context)
- Enabling sum types inside nested CPR: TBD
- Should runSTRep be inlined (see ticket:1600#comment:34)?
- Can we use Terminates CPR information to eagerly evaluate thunks? Yes, and there is a small gain there: #8655
- But why no allocation change? Understand this better!
- Can we statically and/or dynamically count the number of thunks, and the number of CBV’ed thunks?
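As a source-level analogy for the eager evaluation idea (a sketch only; the real transformation would happen on Core, and the function names here are invented), compare a lazily bound value with one that is forced up front. The transformation is only safe because `x + 1` on an already evaluated `Int` is known to terminate.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy version: `y` is allocated as a thunk and stored inside `Just`.
lazyVersion :: Bool -> Int -> Maybe Int
lazyVersion p !x =
  let y = x + 1
  in if p then Just y else Nothing

-- Eager (call-by-value) version: `y` is evaluated before the branch,
-- so `Just` holds an evaluated Int and no thunk is built. The price is
-- that `x + 1` is computed even when `p` is False.
eagerVersion :: Bool -> Int -> Maybe Int
eagerVersion p !x =
  let !y = x + 1
  in if p then Just y else Nothing
```

The two functions return the same values; they differ only in when `x + 1` is computed and in what gets allocated.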
- Why is cacheprof not deterministic? (→ #8611)
- What became of Simon’s better-ho-cardinality branch? See better-ho-cardinality.
- Try VTune to get better numbers.