Opened 13 months ago

Closed 12 months ago

Last modified 12 months ago

#8900 closed bug (fixed)

Strictness analysis regression

Reported by: tibbe Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.1-rc2
Keywords: Cc: simonmar
Operating System: MacOS X Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description (last modified by tibbe)

Edit: There were two issues discussed here. One is solved; I left the ticket open for the strictness analysis regression. Analysis of that regression starts in comment 7 below.

I ran a simple benchmark that exercises Data.HashMap.Lazy.insert. It's 16% slower using HEAD compared to using 7.6.3. The generated Core is a bit different and the generated Cmm is quite a bit different.

Steps to reproduce

  1. Download the attached HashMapInsert.hs benchmark.
  2. Install unordered-containers with both 7.6.3 and HEAD:
$ cabal install -w ghc-7.6.3 unordered-containers-0.2.3.3
$ cabal install -w inplace/bin/ghc-stage2 unordered-containers-0.2.3.3
  3. Compile the benchmark with both compilers:
$ ghc-7.6.3 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertOld
$ inplace/bin/ghc-stage2 -O2 HashMapInsert.hs
$ mv HashMapInsert HashMapInsertNew

Results (best of 3 runs)

7.6.3

$ ./HashMapInsertOld +RTS -s
   1,191,223,528 bytes allocated in the heap
     141,978,520 bytes copied during GC
      37,811,840 bytes maximum residency (8 sample(s))
      22,378,432 bytes maximum slop
              99 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2277 colls,     0 par    0.06s    0.06s     0.0000s    0.0002s
  Gen  1         8 colls,     0 par    0.07s    0.10s     0.0127s    0.0479s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.24s  (  0.24s elapsed)
  GC      time    0.13s  (  0.17s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.37s  (  0.41s elapsed)

  %GC     time      34.8%  (40.3% elapsed)

  Alloc rate    4,923,204,681 bytes per MUT second

  Productivity  65.2% of total user, 59.0% of total elapsed

HEAD:

$ ./HashMapInsertNew +RTS -s
   1,191,223,128 bytes allocated in the heap
     231,158,688 bytes copied during GC
      55,533,064 bytes maximum residency (13 sample(s))
      22,378,488 bytes maximum slop
             144 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2268 colls,     0 par    0.06s    0.07s     0.0000s    0.0003s
  Gen  1        13 colls,     0 par    0.12s    0.16s     0.0127s    0.0468s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    0.18s  (  0.23s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    0.43s  (  0.49s elapsed)

  %GC     time      41.6%  (47.5% elapsed)

  Alloc rate    4,738,791,249 bytes per MUT second

  Productivity  58.3% of total user, 51.9% of total elapsed

(Note that this is without the patches in #8885, so they're not the cause.)

An interesting difference is that we spend more time in GC in HEAD. I don't know if that's related.

Attachments (7)

HashMapInsert.hs (737 bytes) - added by tibbe 13 months ago.
HashMapInsert-7.6.3.dump-simpl (27.7 KB) - added by tibbe 13 months ago.
HashMapInsert-7.6.3.dump-opt-cmm (64.0 KB) - added by tibbe 13 months ago.
HashMapInsert-HEAD.dump-simpl (28.4 KB) - added by tibbe 13 months ago.
HashMapInsert-HEAD.dump-opt-cmm (81.6 KB) - added by tibbe 13 months ago.
HashMapInsert-master-no-extra-case.dump-opt-cmm (80.3 KB) - added by tibbe 13 months ago.
HashMapInsert-7.6.3-no-extra-case.dump-opt-cmm (64.0 KB) - added by tibbe 13 months ago.

Change History (32)

Changed 13 months ago by tibbe

comment:1 Changed 13 months ago by tibbe

  • Description modified (diff)
  • Summary changed from unordered-containers 19% slower in HEAD vs 7.6.3 to unordered-containers 16% slower in HEAD vs 7.6.3

comment:2 Changed 13 months ago by tibbe

  • Version changed from 7.6.3 to 7.9

comment:3 Changed 13 months ago by tibbe

I also see some changes in the Core. I've attached both the Core and optimized Cmm output. The action is in $wpoly_go, which is the body of the Data.HashMap.Lazy.insert function.

Last edited 13 months ago by tibbe (previous) (diff)


comment:4 Changed 13 months ago by tibbe

  • Description modified (diff)

comment:5 Changed 13 months ago by tibbe

  • Description modified (diff)

comment:6 Changed 13 months ago by simonmar

So we have

  MUT     time    0.24s  (  0.24s elapsed)
  GC      time    0.13s  (  0.17s elapsed)

vs

  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    0.18s  (  0.23s elapsed)

i.e. almost all the difference is in GC. And:

     141,978,520 bytes copied during GC
              99 MB total memory in use (0 MB lost due to fragmentation)

vs

     231,158,688 bytes copied during GC
             144 MB total memory in use (0 MB lost due to fragmentation)

And

  Gen  1         8 colls,     0 par    0.07s    0.10s     0.0127s    0.0479s

vs

  Gen  1        13 colls,     0 par    0.12s    0.16s     0.0127s    0.0468s

My guess, based on seeing things like this before, is that the benchmark is very sensitive to exactly when a major GC strikes - perhaps it has a spike in memory usage at some point. You ought to be able to test this hypothesis by tweaking the GC options. Try with a very large allocation area (-A2G).

There's a very small change in the MUT time, which is probably accounted for by extra cache misses caused by the extra GC activity. So I suspect this is nothing to worry about.

comment:7 Changed 13 months ago by tibbe

I've found a likely culprit in the generated Core.

Compare the case for Full in $wpoly_go for 7.6.3:

        case indexArray# rb_a1cI i#_s1oq of _ { (# ipv2_a1cS #) ->
        case $wpoly_go ww_s2HT ww1_s2HX w_s2HZ (+# ww2_s2I2 4) ipv2_a1cS
        of st'_a1cV { __DEFAULT ->

vs the case for Full in $wpoly_go for HEAD:

        case indexArray# dt_a2Ly i#_a2LH
        of _ [Occ=Dead] { (# ipv2_a2LM #) ->
        case ipv2_a2LM of st_a2LO { __DEFAULT ->
        case $wpoly_go ww_s4A3 ww1_s4A7 w_s4zY (+# ww2_s4Ab 4) st_a2LO
        of st'_a2LP { __DEFAULT ->

Both cases correspond to this snippet in Data.HashMap.Base:

    go h k x s t@(Full ary) =
        let !st  = A.index ary i
            !st' = go h k x (s+bitsPerSubkey) st

Here's the definition of A.index:

index :: Array a -> Int -> a
index ary (I# i#) =
        case indexArray# (unArray ary) i# of (# b #) -> b

There's an extra case in the HEAD version, which translates into an extra tag-bits check in the Cmm. This happens in a couple of places in the Core. The case is unnecessary, since $wpoly_go scrutinizes st_a2LO immediately. This looks like a regression in strictness analysis.

Last edited 13 months ago by tibbe (previous) (diff)

comment:8 Changed 13 months ago by tibbe

While the extra case is definitely a regression, it doesn't seem to be the cause of the time difference. Changing A.index to:

index :: Array a -> Int -> (# a #)
index ary (I# i#) =
        indexArray# (unArray ary) i#

and the snippet from Data.HashMap.Base to:

    go h k x s t@(Full ary) =
        let !(# st #) = A.index ary i
            !st' = go h k x (s+bitsPerSubkey) st

which forces the indexing, but not the returned element, makes the difference in the Core go away. The time difference remains, however.
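The unboxed-tuple trick above can be sketched in a self-contained form. In the snippet below, the `Array` wrapper and `mkArray` are hypothetical stand-ins for unordered-containers' internal `Data.HashMap.Array` module; only `index` mirrors the definition from this ticket. The point is that `indexArray#` returns its result in a unary unboxed tuple, so forcing the tuple forces the *indexing* without forcing the element it points to:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts
import GHC.ST (ST(..), runST)

-- Boxed wrapper around an immutable array (hypothetical stand-in
-- for unordered-containers' internal Array type).
data Array a = Array (Array# a)

-- Build an array of the given size with every slot set to x.
mkArray :: Int -> a -> Array a
mkArray (I# n#) x = runST (ST (\s0 ->
  case newArray# n# x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, Array arr #)))

-- Returning a unary unboxed tuple means the caller can force the
-- read itself while leaving the element unevaluated.
index :: Array a -> Int -> (# a #)
index (Array ary#) (I# i#) = indexArray# ary# i#

main :: IO ()
main = case index (mkArray 3 (42 :: Int)) 1 of
  (# x #) -> print x
```

Pattern-matching the `(# a #)` result (or, since GHC 7.8, binding it with `let !(# st #) = ...`) is what replaces the extra `case` that the regression introduced.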

Last edited 13 months ago by tibbe (previous) (diff)

comment:9 Changed 13 months ago by tibbe

Here are the numbers with -A2G:

7.6.3:

$ ./HashMapInsert +RTS -s -A2G
   1,191,223,528 bytes allocated in the heap
           3,400 bytes copied during GC
          36,080 bytes maximum residency (1 sample(s))
          13,072 bytes maximum slop
            2081 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.01s    0.01s     0.0064s    0.0064s

  INIT    time    0.01s  (  0.02s elapsed)
  MUT     time    0.58s  (  0.93s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.60s  (  0.96s elapsed)

  %GC     time       1.0%  (0.7% elapsed)

  Alloc rate    2,063,221,325 bytes per MUT second

  Productivity  97.6% of total user, 61.0% of total elapsed

HEAD:

$ ./HashMapInsert +RTS -s -A2G
   1,191,223,096 bytes allocated in the heap
           3,312 bytes copied during GC
          35,992 bytes maximum residency (1 sample(s))
          13,160 bytes maximum slop
            2081 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.01s    0.02s     0.0158s    0.0158s

  INIT    time    0.01s  (  0.02s elapsed)
  MUT     time    0.60s  (  0.93s elapsed)
  GC      time    0.01s  (  0.02s elapsed)
  EXIT    time    0.01s  (  0.02s elapsed)
  Total   time    0.62s  (  0.99s elapsed)

  %GC     time       1.0%  (1.6% elapsed)

  Alloc rate    1,998,679,702 bytes per MUT second

  Productivity  97.7% of total user, 61.3% of total elapsed

I'll accept Simon M's explanation of the difference. I'll leave the ticket open for the strictness analysis issue.

comment:10 Changed 13 months ago by tibbe

  • Description modified (diff)
  • Summary changed from unordered-containers 16% slower in HEAD vs 7.6.3 to Strictness analysis regression

comment:11 Changed 13 months ago by tibbe

  • Version changed from 7.9 to 7.8.1-rc2

The strictness analysis regression is also present in the latest 7.8 snapshot I had lying around, ghc-7.8.0.20140228, so it's likely in the 7.8 RC as well.

comment:12 Changed 13 months ago by tibbe

  • Description modified (diff)

comment:13 Changed 13 months ago by simonmar

If that small difference in the MUT time is reliable, there might be a regression in the code generator we need to look into. Were those numbers with or without the extra case expression?

comment:14 Changed 13 months ago by tibbe

Without the extra case expression, there's still a small difference. Using the best of 5 runs, here are the numbers:

HEAD (46d05ba03d1491cade4a3fe33f0b8c404ad3c760):

$ ./HashMapInsert +RTS -s -A2G
   1,191,223,096 bytes allocated in the heap
           3,312 bytes copied during GC
          35,992 bytes maximum residency (1 sample(s))
          13,160 bytes maximum slop
            2081 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.01s    0.01s     0.0060s    0.0060s

  INIT    time    0.01s  (  0.02s elapsed)
  MUT     time    0.60s  (  0.89s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.62s  (  0.92s elapsed)

  %GC     time       0.9%  (0.6% elapsed)

  Alloc rate    1,991,359,179 bytes per MUT second

  Productivity  97.8% of total user, 66.0% of total elapsed

7.6.3:

$ ./HashMapInsert +RTS -s -A2G
   1,191,223,528 bytes allocated in the heap
           3,400 bytes copied during GC
          36,080 bytes maximum residency (1 sample(s))
          13,072 bytes maximum slop
            2081 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.01s    0.01s     0.0057s    0.0057s

  INIT    time    0.01s  (  0.02s elapsed)
  MUT     time    0.59s  (  0.86s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.61s  (  0.90s elapsed)

  %GC     time       0.9%  (0.6% elapsed)

  Alloc rate    2,033,686,150 bytes per MUT second

  Productivity  97.8% of total user, 66.4% of total elapsed

I've attached the Cmm for both HEAD and 7.6.3. They're not trivial to compare, as in HEAD everything is in one function, $wpoly_go_entry, while in 7.6.3 it's split over two, $wpoly_go_info and s2ZS_ret.

comment:15 Changed 12 months ago by simonpj

Johan, the "extra case" comes from commit 28d9a03253e8fd613667526a170b684f2017d299, whose comment says

    Make CaseElim a bit less aggressive
    
    See Note [Case elimination: lifted case]:
    
    We used to do case elimination if
            (c) the scrutinee is a variable and 'x' is used strictly
    But that changes
        case x of { _ -> error "bad" }
        --> error "bad"
    which is very puzzling if 'x' is later bound to (error "good").
    Where the order of evaluation is specified (via seq or case)
    we should respect it.

And indeed that's a good point. Note [Case binder next] in Simplify.lhs is relevant here.

You point out that $wpoly_go is strict in its last arg, so it really is "next". I wonder how picky we want to be. Consider

case x of y -> g (error "uk") y

and suppose that g is strict in both arguments. Would it be ok to drop the case? Then we might get error "uk" (if g evaluated its first arg first) rather than any error arising from x? Probably yes, I guess.

Indeed maybe the change is just wrong, or at least over-conservative. According to our paper "A semantics of imprecise exceptions" it is ok. I don't think anyone actually reported a bug. So maybe we should revert to the ghc 7.6 behaviour?
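A minimal source-level sketch of the transformation under discussion (the function names here are made up, not from the ticket). At the source level ``x `seq` f x`` compiles to Core of the shape `case x of y -> f y`; when `f` is strict in its argument, eliminating that case is valid under the imprecise-exceptions semantics, because `f` forces `x` anyway:

```haskell
module Main where

-- f is strict in its argument: it always evaluates y.
-- NOINLINE keeps the example from being simplified away entirely.
{-# NOINLINE f #-}
f :: Int -> Int
f y = y + 1

-- Compiles to Core of the form: case x of y -> f y
withSeq :: Int -> Int
withSeq x = x `seq` f x

-- The case-eliminated form: since f scrutinizes its argument
-- immediately, the explicit seq is redundant.
withoutSeq :: Int -> Int
withoutSeq x = f x

main :: IO ()
main = print (withSeq 5, withoutSeq 5)
```

The only observable difference between the two forms is *which* exception surfaces when `x` and another argument both diverge, and "A semantics for imprecise exceptions" licenses either answer.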

comment:16 Changed 12 months ago by tibbe

I don't know what the right approach is. If we decide to keep the new behavior I'll have to be more careful about any extra evaluation, as I can't trust GHC to remove it.

comment:17 Changed 12 months ago by simonmar

I'm in favour of being gung-ho about evaluation order, especially when there's an optimisation to be had. If the user wants to control evaluation order, then pseq is the right way to do it, not seq.
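To illustrate the seq/pseq distinction (a sketch; `ordered` is a made-up example): `seq` only promises that both values are evaluated before the result is used, while `pseq` (exported from GHC.Conc in base) additionally fixes which one is evaluated first:

```haskell
module Main where

import GHC.Conc (pseq)

-- pseq guarantees that a is evaluated before b; seq makes no
-- ordering promise, only that both end up evaluated before the
-- overall result is demanded.
ordered :: Int -> Int -> Int
ordered a b = a `pseq` (b `seq` (a + b))

main :: IO ()
main = print (ordered 2 3)
```

So code that relies on evaluation *order* (rather than mere strictness) should use `pseq`, leaving the optimiser free to rearrange plain `seq`s as in the fix for this ticket.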

comment:18 Changed 12 months ago by Simon Peyton Jones <simonpj@…>

In 0b6fa3e95078797f87302780a85607decab806fb/ghc:

Eliminate redundant seq's (Trac #8900)

This patch makes the simplifier eliminate a redundant seq like
    case x of y -> ...y....
where y is used strictly.  GHC used to do this, but I made it less
aggressive in

   commit 28d9a03253e8fd613667526a170b684f2017d299 (Jan 2013)

However #8900 shows that doing so sometimes loses good
transformations; and the transformation is valid according to "A
semantics for imprecise exceptions".  So I'm restoring the old
behaviour.

See Note [Eliminating redundant seqs]

comment:19 Changed 12 months ago by simonpj

OK I have restored the old behaviour. (No need to merge.)

I did a nofib run and found

  • No visible change in binary size
  • No change in allocation
  • Wibbles up and down in runtime, but nothing consistent. (On my machine, k-nucleotide varies by up to 10% in runtime across successive runs, for example.)

So I don't think there are significant losses, and Johan definitely has a gain to get.

I have not actually tried Johan's example, but perhaps Johan can?

Simon

comment:20 Changed 12 months ago by tibbe

  • Resolution set to fixed
  • Status changed from new to closed

My code now looks good. Thanks!

comment:21 follow-up: Changed 12 months ago by simonpj

In comment 9 you said "While the extra case is definitely a regression, it doesn't seem to be the cause of the time difference", and in comment 14 "without the extra case there is still a small difference".

Is that still true? If so there might still be something to track down.

Simon

comment:22 in reply to: ↑ 21 Changed 12 months ago by tibbe

Replying to simonpj:

In comment 9 you said "While the extra case is definitely a regression, it doesn't seem to be the cause of the time difference", and in comment 14 "without the extra case there is still a small difference".

Is that still true? If so there might still be something to track down.

There's still a (quite small) difference in mutator time. I haven't had time to investigate it.

The Cmm output by 7.6.3 (attachment:HashMapInsert-7.6.3.dump-opt-cmm) and HEAD (attachment:HashMapInsert-HEAD.dump-opt-cmm) look quite a bit different, at least superficially. For example, 7.6.3 has the core loop split into two functions, $wpoly_go_info and s30b_ret, while HEAD has just one function, $wpoly_go_entry.

comment:23 Changed 12 months ago by tibbe

I've looked a bit more closely and one difference between 7.6.3 and HEAD is the extra stack spilling before the eval check I reported in #8905.

Another difference is that the recursive call appears as

jump $wpoly_go_info; // [R6, R5, R4, R3, R2]

in 7.6.3 but as

call $wpoly_go_info(R6, R5, R4, R3, R2) returns to c56A, args: 8, res: 8, upd: 8;

in HEAD. I don't know if this is just a syntactic difference.

comment:24 follow-up: Changed 12 months ago by simonmar

one difference between 7.6.3 and HEAD is the extra stack spilling before the eval check I reported in #8905

I was surprised by this, because I would expect HEAD and 7.6.3 to generate very similar code with respect to spilling before a call. #8905 is about an improvement we can make in the new code generator, that wasn't possible in the old codegen.

Looking at your dumps I see this for HEAD in $wpoly_go_info:

     c5k9:
	          I64[Sp - 40] = PicBaseReg + block_c53R_info;
	          R1 = R6;
	          I64[Sp - 32] = R2;
	          I64[Sp - 24] = R3;
	          P64[Sp - 16] = R4;
	          I64[Sp - 8] = R5;
	          Sp = Sp - 40;
	          if (R1 & 7 != 0) goto c53R; else goto c53S;

and this for 7.6.3:

	        I64[Sp - 32] = R5;
	        I64[Sp - 24] = R4;
	        I64[Sp - 16] = R3;
	        I64[Sp - 8] = R2;
                R1 = R6;
	        I64[Sp - 40] = PicBaseReg + s30b_info;
	        Sp = Sp - 40;
	        if (R1 & 7 != 0) goto c3zM;

they look identical to me modulo reordering and things falling into different stack slots. Maybe the problematic bit is somewhere else - could you point to it?

Some of the other differences you're seeing are due to the fact that the new codegen (with the NCG) doesn't break up functions at proc-points, so you see larger chunks of code where 7.6.3 broke things into smaller pieces. Most of the time this won't make any difference to the generated code, unless there's a join point (a let-no-escape) where HEAD should generate better code.

comment:25 in reply to: ↑ 24 Changed 12 months ago by tibbe

Replying to simonmar:

I was surprised by this, because I would expect HEAD and 7.6.3 to generate very similar code with respect to spilling before a call. #8905 is about an improvement we can make in the new code generator, that wasn't possible in the old codegen.

My mistake. I've been reading too much Cmm lately.

they look identical to me modulo reordering and things falling into different stack slots. Maybe the problematic bit is somewhere else - could you point to it?

I have no idea then. There's a lot of Cmm and diff tools don't work well because blocks have been reordered a lot between 7.6.3 and HEAD. Since the difference is so small I suggest we ignore it for now. I'm more interested in getting #8905 into a production-ready state.
