Version 5 (modified by nomeata, 21 months ago)
This is nomeata’s notepad about the nested CPR information:
- #1600 Main ticket where I mention progress.
Tickets with stuff that would make nested CPR better:
- #8598 CPR after IO (partly done)
- Everything in source:testsuite/tests/stranal/sigs/
- Does Nick Frisby’s late λ-lifting alleviate problems when CPR’ing join-points?
- Paper-Writeup of CPR
- Shouldn’t nested CPR help a lot with Complex-heavy code? Is there something in nofib?
- Try passing CPR information from the scrutinee to the pattern variables. For that: reverse the flow of analysis for complex scrutinees (for simple ones, we want the demand coming from the body; for complex ones, this is not so important).
- Why is cacheprof not deterministic? (→ #8611)
- Use ticky-profiling to learn more about the effects of nested CPR.
- Look at DmdAnal-related [SLPJ-Tickets] and see which ones are affected by nested-cpr.
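For orientation, here is a hand-written sketch of the kind of worker/wrapper split that nested CPR aims for. It is an illustration only, not compiler output: the names sumDiff, sumDiffFlat, wSumDiff and sumDiff' are hypothetical, and the nested worker evaluates both components eagerly, which is only equivalent here because Int addition and subtraction cannot diverge. Whether GHC would perform such a split depends on what the analysis can establish.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), Int#, (+#), (-#))

-- Source function: returns a pair whose components are boxed Ints.
sumDiff :: Int -> Int -> (Int, Int)
sumDiff x y = (x + y, x - y)

-- Flat CPR style (hypothetical worker): only the outer pair is unboxed,
-- the components are still heap-allocated Ints (or thunks).
sumDiffFlat :: Int -> Int -> (# Int, Int #)
sumDiffFlat x y = (# x + y, x - y #)

-- Nested CPR style (hypothetical worker): the pair and both Ints are
-- unboxed, so the worker itself allocates nothing on the heap.
wSumDiff :: Int# -> Int# -> (# Int#, Int# #)
wSumDiff x y = (# x +# y, x -# y #)

-- Wrapper: unboxes the arguments, calls the worker, reboxes the result.
-- If this is inlined at a call site that immediately takes the result
-- apart, the reboxing can cancel against the case analysis there.
sumDiff' :: Int -> Int -> (Int, Int)
sumDiff' (I# x) (I# y) =
  case wSumDiff x y of
    (# s, d #) -> (I# s, I# d)

main :: IO ()
main = do
  print (sumDiff 7 3)   -- prints (10,4)
  print (sumDiff' 7 3)  -- prints (10,4)
```

The same shape applies to Complex-heavy code, where the result is a constructor wrapping two boxed Doubles.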
It would be nice to merge the code structure improvements and notes into master, to keep my branch short. But it is based on better-ho-cardinality, and that is not suitable for merging because of unexpected regressions even in nofib and rtak. So I am investigating.
In these tests, it is related to reading and showing data. Small example:
main = (read "10" :: Int) `seq` return ()
Baseline: 49832, better-ho-cardinality: 49968. Unfortunately, the changes to, for example, GHC.Read are not small, and probably mostly benign...
Trying to minimize the problem. This code shows an increase in allocation (from 49448 to 49544):
import Text.Read
import Text.ParserCombinators.ReadPrec

main = (readPrec_to_S readPrec 0 "True" :: [(Bool, String)]) `seq` return ()
while copying the definition of readPrec here, i.e.
import qualified Text.Read.Lex as L
import Text.Read
import Text.ParserCombinators.ReadPrec

foo = parens
  ( do L.Ident s <- lexP
       case s of
         "True"  -> return True
         "False" -> return False
         _       -> pfail
  )

main = (readPrec_to_S readPrec 0 "True" :: [(Bool, String)]) `seq` return ()
yields a decrease (49240 → 49224).
It even happens with read "True" :: Bool. So I tried to minimize the problem, which seems to occur somewhere in the depths of the Read code. But after manually pasting all the related pieces from GHC.Read and Text.ParserCombinators.ReadP* into one file, the differences in allocation go away.
Maybe it is related to which inlining information about various functions crosses module boundaries. For example, fMonadP_$cfail and other functions from Text.ParserCombinators.ReadP lose the InlineRule (1, True, True) annotation. Is that expected? Also, functions returning a ShowS have their arity increased. Can that be a reason for the increase in allocations?
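To make the arity question concrete: a function of type a -> ShowS can be written with one argument (returning a function) or with two (taking the rest of the string directly). The sketch below shows both forms for a made-up type; Colour, showsColour1 and showsColour2 are hypothetical names, not taken from GHC.Read or GHC.Show. In general, raising the arity from 1 to 2 means that a partial application such as showsColour2 Red does no work until the String arrives, so any work that the arity-1 form shared between several uses of the same partial application would be repeated. Whether that is what happens in the regression above is an open question, not a diagnosis.

```haskell
data Colour = Red | Green

-- Arity 1: pattern-match on the colour, then return a ShowS value.
-- The match happens once per partial application.
showsColour1 :: Colour -> ShowS
showsColour1 Red   = showString "Red"
showsColour1 Green = showString "Green"

-- Arity 2: the same function, eta-expanded by hand.
-- The match happens only once both arguments are supplied.
showsColour2 :: Colour -> ShowS
showsColour2 Red   rest = "Red" ++ rest
showsColour2 Green rest = "Green" ++ rest

main :: IO ()
main = do
  putStrLn (showsColour1 Red "")    -- prints Red
  putStrLn (showsColour2 Green "!") -- prints Green!
```

Both definitions denote the same function; they differ only in how the compiled code is shaped and therefore in where closures and partial applications may be allocated.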
- Should runSTRep be inlined (see ticket:1600#comment:34)?