interpreter: sequence_ (replicate 100000000 (return ())) gobbles up memory

changed milestone to %8.6.1

changed weight to 3

changed milestone to %8.8.1

changed weight to 7

Bumping priority since this shouldn't be so hard to nail.

Trac metadata

Trac field	Value
Priority	low → high

assigned to @osa1

I'll look into this.

Confirmed that GHCi 7.10.3 runs this in very little memory while GHC HEAD uses about 6G.

Trac metadata

Trac field	Value
Version	8.4.3 → 8.5

This regression was introduced between 7.10 and 8.0, and reproducible with 8.4 and HEAD. I compared +RTS -s outputs of 7.10 and 8.4.

10:

$ ./ghci +RTS -s
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
λ:2> sequence_ (replicate 100000000 (return ()))
λ:3>
Leaving GHCi.
  12,844,310,920 bytes allocated in the heap
      26,753,232 bytes copied during GC
       7,744,552 bytes maximum residency (6 sample(s))
         138,736 bytes maximum slop
              18 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1806 colls,     0 par    0.155s   0.158s     0.0001s    0.0055s
  Gen  1         6 colls,     0 par    0.096s   0.126s     0.0211s    0.0415s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    4.065s  (  6.690s elapsed)
  GC      time    0.251s  (  0.285s elapsed)
  EXIT    time    0.020s  (  0.023s elapsed)
  Total   time    4.365s  (  7.000s elapsed)

  Alloc rate    3,159,574,299 bytes per MUT second

  Productivity  94.3% of total user, 58.8% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

4:

$ ghci +RTS -s
GHCi, version 8.4.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci
λ:1> sequence_ (replicate 100000000 (return ()))
λ:2>
Leaving GHCi.
  12,869,399,120 bytes allocated in the heap
  13,233,009,504 bytes copied during GC
   4,007,601,416 bytes maximum residency (14 sample(s))
      15,805,176 bytes maximum slop
            8652 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       209 colls,     0 par    3.455s   3.486s     0.0167s    0.4864s
  Gen  1        14 colls,     0 par    7.993s  10.629s     0.7592s    4.4344s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    4.252s  (  6.710s elapsed)
  GC      time   11.448s  ( 14.115s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time   15.701s  ( 20.831s elapsed)

  Alloc rate    3,026,874,614 bytes per MUT second

  Productivity  27.1% of total user, 32.2% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Observations:

Productivity is 94% vs. 27%
Almost same amount of allocation in GHCs (very minor difference)
500x max. residency in GHC 8.4 (7,744,552 vs. 4,007,601,416)

So I think for some reason GHC 8.4 keeps retains more things which causes more GC cycles and longer pauses.

I compared the byte codes generated by both versions and they're identical so I suspect this is either a problem with the interpreter code or some library function.

Note to self: I think running the interpreter with profiled compiler should show us what's retaining all the extra objects in newer GHCs, but I currently can't build the compiler in prof way (the build seems to be broken). I'll try to see what's wrong with the build..

One thing I tried was to run compiled version of this program with GHC 7.10 and GHC 8.4 and the numbers were identical so maybe this isn't related with the library code.

My attempts to profile ghci revealed another bug: #15608 (closed)

Looking at +RTS -h output, we allocate and retain GBs of PAPs. Not sure what are those PAPs for and what's retaining them though.

I speculate that the giant PAP is

return () >> (return () >> (return () >> (... >> return ()...)))

Simon, do you have any guesses on what may be retaining those PAPs?

Btw, here's the Core generated for this expression: (output of -ddump-simpl)

GHC.Base.bindIO
  @ ()
  @ [()]
  (GHC.GHCi.ghciStepIO
     @ GHC.Types.IO
     GHC.GHCi.$fGHCiSandboxIOIO
     @ ()
     (Data.Foldable.sequence_
        @ []
        @ GHC.Types.IO
        @ ()
        Data.Foldable.$fFoldable[]
        GHC.Base.$fMonadIO
        (GHC.List.replicate
           @ (GHC.Types.IO ())
           (GHC.Types.I# 100000000#)
           (GHC.Base.return
              @ GHC.Types.IO GHC.Base.$fMonadIO @ () GHC.Tuple.()))))
  (\ (it_a1UK :: ()) ->
     GHC.Base.returnIO
       @ [()] (GHC.Types.: @ () it_a1UK (GHC.Types.[] @ ())))

More readable version:

GHC.Base.bindIO
    (ghciStepIO <the expression>)
    (\i -> GHC.Base.returnIO [i])

The Core identical to the Core generated by GHC 7.10.

One other thing I tried is -fno-it flag, which disables binding the it variable. It did not fix the leak.

Hmm. Thanks for showing the full code that GHCi compiles to bytecode. It looks like this:

let main_action = ...
in bindIO main_action (\it -> return [it])

So GHCi will build a thunk for main_action and will pass it to the compiled code for GHC.Base.bindIO.

But alas main_action is essentially this

main_action :: IO ()
main_action = sequence_ (replicate ...)

(I have omitted the ghcStepIO stuff; it is essentially the identity function, and I don't think it affects things.)

Now that main_action thunk will be updated, so if it remains live for any reason, we'll retain a huge PAP of the form described in an earlier comment return () >> (return () ...)).

Why is it being kept alive? I'm really not sure. The interpreter should be tail-calling bindIO...

I wonder if main_action could be floated to top level as a CAF before it gets to the bytecode generator?

Anyway, I hope that may help a bit.

I still have no idea what retains all those PAPs, but I realized that 8.0.1 is when we introduced -fexternal-interpreter. Perhaps that has to do with this issue (reminder: the leak was also introduced with 8.0.1).

Attached file ghc-stage2.hp (download).

Raw retainer profiler output

Attached file ghc-stage2.svg (download).

.svg of the retainer profiler output (generated with hp2pretty)

With D5351 we can run retainer profiler on ghci so I tried this with ghci +RTS -hb and used 50000000 as the input. I've attached the raw output and a .svg generated with hp2pretty. It's not too helpful: the SYSTEM retainer is retaining GBs of data but we don't have more details on what the retainer is.

mentioned in issue #16011

Trac metadata

Trac field	Value
Related	- → #16011

added runtime perf label

changed milestone to %8.10.1

added Phigh label

Possibly related: #14789

changed milestone to %9.0.1

mentioned in issue #18352

changed milestone to %9.2.1

changed milestone to %⊥

This will not be fixed in time for %9.2.1

Trac field	Value
Version	8.4.3
Type	Bug
TypeOfFailure	OtherFailure
Priority	low
Resolution	Unresolved
Component	GHCi
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

interpreter: sequence_ (replicate 100000000 (return ())) gobbles up memory

Child items 0

Activity

interpreter: sequence_ (replicate 100000000 (return ())) gobbles up memory

Relates to

Activity