I consider this to be more of a curiosity than a bug, but it might still be worth investigating. The following line,
sequence_(replicate100000000(return()))
causes ghci to grow quite big (here it's 6GB on 64 bit Linux; it's roughly proportional to the 100000000 number).
This used to be better in ancient times--before version 8.0.1, ghci ran this code in what looks like constant memory.
(This came up indirectly on #haskell; somebody had adapted Hutton's game of life program, http://www.cs.nott.ac.uk/~gmh/life.lhs which uses busy waiting for a delay.)
Trac metadata
Trac field
Value
Version
8.4.3
Type
Bug
TypeOfFailure
OtherFailure
Priority
low
Resolution
Unresolved
Component
GHCi
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
This regression was introduced between 7.10 and 8.0, and reproducible with 8.4 and HEAD. I compared +RTS -s outputs of 7.10 and 8.4.
10:
$ ./ghci +RTS -sGHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for helpλ:2> sequence_ (replicate 100000000 (return ()))λ:3>Leaving GHCi. 12,844,310,920 bytes allocated in the heap 26,753,232 bytes copied during GC 7,744,552 bytes maximum residency (6 sample(s)) 138,736 bytes maximum slop 18 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1806 colls, 0 par 0.155s 0.158s 0.0001s 0.0055s Gen 1 6 colls, 0 par 0.096s 0.126s 0.0211s 0.0415s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.000s ( 0.002s elapsed) MUT time 4.065s ( 6.690s elapsed) GC time 0.251s ( 0.285s elapsed) EXIT time 0.020s ( 0.023s elapsed) Total time 4.365s ( 7.000s elapsed) Alloc rate 3,159,574,299 bytes per MUT second Productivity 94.3% of total user, 58.8% of total elapsedgc_alloc_block_sync: 0whitehole_spin: 0gen[0].sync: 0gen[1].sync: 0
4:
$ ghci +RTS -sGHCi, version 8.4.3: http://www.haskell.org/ghc/ :? for helpLoaded GHCi configuration from /home/omer/rcbackup/.ghciλ:1> sequence_ (replicate 100000000 (return ()))λ:2>Leaving GHCi. 12,869,399,120 bytes allocated in the heap 13,233,009,504 bytes copied during GC 4,007,601,416 bytes maximum residency (14 sample(s)) 15,805,176 bytes maximum slop 8652 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 209 colls, 0 par 3.455s 3.486s 0.0167s 0.4864s Gen 1 14 colls, 0 par 7.993s 10.629s 0.7592s 4.4344s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.001s ( 0.001s elapsed) MUT time 4.252s ( 6.710s elapsed) GC time 11.448s ( 14.115s elapsed) EXIT time 0.000s ( 0.005s elapsed) Total time 15.701s ( 20.831s elapsed) Alloc rate 3,026,874,614 bytes per MUT second Productivity 27.1% of total user, 32.2% of total elapsedgc_alloc_block_sync: 0whitehole_spin: 0gen[0].sync: 0gen[1].sync: 0
Observations:
Productivity is 94% vs. 27%
Almost same amount of allocation in GHCs (very minor difference)
500x max. residency in GHC 8.4 (7,744,552 vs. 4,007,601,416)
So I think for some reason GHC 8.4 keeps retains more things which causes more GC cycles and longer pauses.
I compared the byte codes generated by both versions and they're identical so I suspect this is either a problem with the interpreter code or some library function.
Note to self: I think running the interpreter with profiled compiler should show us what's retaining all the extra objects in newer GHCs, but I currently can't build the compiler in prof way (the build seems to be broken). I'll try to see what's wrong with the build..
One thing I tried was to run compiled version of this program with GHC 7.10 and GHC 8.4 and the numbers were identical so maybe this isn't related with the library code.
(I have omitted the ghcStepIO stuff; it is essentially the
identity function, and I don't think it affects things.)
Now that main_action thunk will be updated, so if it
remains live for any reason, we'll retain a huge PAP
of the form described in an earlier comment return () >> (return () ...)).
Why is it being kept alive? I'm really not sure. The interpreter
should be tail-callingbindIO...
I wonder if main_action could be floated to top level as a CAF
before it gets to the bytecode generator?
I still have no idea what retains all those PAPs, but I realized that 8.0.1 is when we introduced -fexternal-interpreter. Perhaps that has to do with this issue (reminder: the leak was also introduced with 8.0.1).
With D5351 we can run retainer profiler on ghci so I tried this with ghci +RTS -hb and used 50000000 as the input. I've attached the raw output and a .svg generated with hp2pretty. It's not too helpful: the SYSTEM retainer is retaining GBs of data but we don't have more details on what the retainer is.