Strictness analysis regression
Edit: There were two issues discussed here. One is solved. I left the ticket open for the strictness analysis regression part. Analysis of strictness regression starts in comment 7 below.
I ran a simple benchmark that exercises Data.HashMap.Lazy.insert. It's 16% slower (0.43s vs. 0.37s total time) using HEAD compared to 7.6.3. The generated Core is a bit different and the generated Cmm is quite a bit different.
Steps to reproduce
- Download the attached HashMapInsert.hs benchmark.
- Install unordered-containers with both 7.6.3 and HEAD:
$ cabal install -w ghc-7.6.3 unordered-containers-0.2.3.3
$ cabal install -w inplace/bin/ghc-stage2 unordered-containers-0.2.3.3
- Compile the benchmark with both compilers:
$ ghc-7.6.3 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertOld
$ inplace/bin/ghc-stage2 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertNew
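The results in this ticket are reported as the best of 3 runs. A minimal sketch of how that could be automated, assuming "best" means the lowest user "Total time" reported by `+RTS -s` (the ticket does not say which metric was used) and using a hypothetical helper named `best_of_3`:

```shell
# best_of_3 CMD...: run CMD three times with +RTS -s and print the smallest
# user "Total time" in seconds. Hypothetical helper, not part of the ticket.
best_of_3() {
  for i in 1 2 3; do
    # The RTS writes its statistics to stderr, so merge it into stdout.
    # The matching line looks like: "Total time 0.37s ( 0.41s elapsed)".
    "$@" +RTS -s 2>&1 | awk '/Total +time/ { sub(/s$/, "", $3); print $3 }'
  done | sort -n | head -n 1
}
```

It would be called as `best_of_3 ./HashMapInsertOld` and `best_of_3 ./HashMapInsertNew`.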
Results (best of 3 runs)
7.6.3:
$ ./HashMapInsertOld +RTS -s
1,191,223,528 bytes allocated in the heap
141,978,520 bytes copied during GC
37,811,840 bytes maximum residency (8 sample(s))
22,378,432 bytes maximum slop
99 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2277 colls, 0 par 0.06s 0.06s 0.0000s 0.0002s
Gen 1 8 colls, 0 par 0.07s 0.10s 0.0127s 0.0479s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.24s ( 0.24s elapsed)
GC time 0.13s ( 0.17s elapsed)
EXIT time 0.00s ( 0.01s elapsed)
Total time 0.37s ( 0.41s elapsed)
%GC time 34.8% (40.3% elapsed)
Alloc rate 4,923,204,681 bytes per MUT second
Productivity 65.2% of total user, 59.0% of total elapsed
HEAD:
$ ./HashMapInsertNew +RTS -s
1,191,223,128 bytes allocated in the heap
231,158,688 bytes copied during GC
55,533,064 bytes maximum residency (13 sample(s))
22,378,488 bytes maximum slop
144 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2268 colls, 0 par 0.06s 0.07s 0.0000s 0.0003s
Gen 1 13 colls, 0 par 0.12s 0.16s 0.0127s 0.0468s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.25s ( 0.25s elapsed)
GC time 0.18s ( 0.23s elapsed)
EXIT time 0.00s ( 0.01s elapsed)
Total time 0.43s ( 0.49s elapsed)
%GC time 41.6% (47.5% elapsed)
Alloc rate 4,738,791,249 bytes per MUT second
Productivity 58.3% of total user, 51.9% of total elapsed
(Note that this is without the patches in #8885 (closed), so they're not the cause.)
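An interesting difference is that we spend more time in GC in HEAD (0.18s vs. 0.13s): it copies more bytes during GC (231,158,688 vs. 141,978,520) and has a higher maximum residency (55,533,064 vs. 37,811,840 bytes), even though the total heap allocation is nearly identical. I don't know if that's related.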
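To put the two runs side by side, the relative increase of each figure can be computed from the `+RTS -s` output above. This is only a sketch: `pct_change` is a hypothetical helper, and the numbers are copied by hand from the two result listings.

```shell
# pct_change OLD NEW: print how much NEW exceeds OLD, as a percentage of OLD.
# Hypothetical helper; the inputs below are taken from the +RTS -s output.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", (b - a) / a * 100 }'
}

pct_change 0.37 0.43             # total time (s), 7.6.3 vs. HEAD
pct_change 0.13 0.18             # GC time (s)
pct_change 141978520 231158688   # bytes copied during GC
pct_change 37811840 55533064     # maximum residency (bytes)
```

The first line gives the roughly 16% slowdown quoted at the top of the ticket; the remaining lines show that the GC-related figures grow by a larger fraction than the total run time.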