Opened 2 months ago

Closed 8 days ago

Last modified 44 hours ago

#14257 closed bug (fixed)

Heap profiling with ghc and hp2ps and strict function application ($!) gives samples out of sequence (regression)

Reported by: carlostome Owned by: bgamari
Priority: high Milestone: 8.4.1
Component: Runtime System Version: 8.2.1
Keywords: profiler Cc: mnislaih, alexandersgreen, angerman
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: T14257
Blocked By: Blocking:
Related Tickets: #14006, #11645 Differential Rev(s): Phab:D4184
Wiki Page:


The following program generates an invalid .hp file when compiled with ghc 8.2.1 but it does not when using ghc 8.0.2.

{-# LANGUAGE ScopedTypeVariables #-}
module Main where

  eval :: forall a b. (a -> b -> b) -> b -> [a] -> b
  eval f b xs = load xs []
      load :: [a] -> [a] -> b
      load [] stk          = unload b stk
      load (x:xs) stk      = load xs (x : stk)

      unload :: b -> [a] -> b
      unload  v []         = v
      unload  v (x  : stk) = unload ((f $! x) $! v) stk

  main :: IO ()
  main = print (eval (||) False (True : replicate 10000000 False))

If strict application ($!) is substituted for normal application ($) or removed then the .hp generated file is correct.

For reproducing the error:

ghc -O2 --make -prof -fprof-auto Example.hs -fforce-recomp
./Example +RTS -hc
hp2ps -e8in -c Example.hp

It outputs:

hp2ps: Example.hp, line 43, samples out of sequence

Change History (16)

comment:1 Changed 2 months ago by bgamari

Indeed I can reproduce this verbatim. Interestingly enough, the sample in question is at the very end up the hp file and has no cost centers in it,

MAIN	160
(198)GHC.IO.Handle.FD.CAF	680
(218)GHC.Conc.Signal.CAF	640
(225)main	152
(223)Main.CAF	16
(206)GHC.IO.Encoding.Iconv.CAF	120
(208)GHC.IO.Encoding.CAF	1096
(114)PINNED	36816
END_SAMPLE 0.455729
END_SAMPLE 0.455359

comment:2 Changed 2 months ago by bgamari

So the bad sample is being written by endHeapProfiling which gets its timestamp from mut_user_time. This is in contrast to the samples produced during the bulk of the program's execution, which come from dumpCensus, which takes its timestamp from mut_user_time_until(t), where t comes from getProcessTimes (by way of gct->gc_start_cpu).

I've added a bit of instrumentation,

  • rts/ProfHeap.c

    a b void endProfiling( void ) 
    363363static void
    364364printSample(bool beginSample, StgDouble sampleValue)
     366    static StgDouble lastValue = 0;
     367    CHECK(sampleValue >= lastValue);
     368    printf("sampleValue=%f\n", sampleValue);
     369    lastValue = sampleValue;
     370    (void) lastValue;
    366372    fprintf(hp_file, "%s %f\n",
    367373            (beginSample ? "BEGIN_SAMPLE" : "END_SAMPLE"),
    368374            sampleValue);
  • rts/Stats.c

    diff --git a/rts/Stats.c b/rts/Stats.c
    index 6a5f80130e..773c55e431 100644
    a b Time stat_getElapsedTime(void) 
    7878mut_user_time_until( Time t )
     80    printf("mut_user_time_until %f %f \n", TimeToSecondsDbl(t), TimeToSecondsDbl(stats.gc_cpu_ns));
    8081    return TimeToSecondsDbl(t - stats.gc_cpu_ns);
    8182    // heapCensus() time is included in GC_tot_cpu, so we don't need
    8283    // to subtract it here.
    stat_endGC (Capability *cap, gc_thread *gct, 
    331332        stats.cumulative_par_balanced_copied_bytes +=
    332333            stats.gc.par_balanced_copied_bytes;
    333334    }
     335    printf("end gc %f\n", TimeToSecondsDbl(NSToTime(stats.gc.cpu_ns)));
    334336    stats.gc_cpu_ns += stats.gc.cpu_ns;
    335337    stats.gc_elapsed_ns += stats.gc.elapsed_ns;

Which produces the following,

end gc 0.000205
end gc 0.000195
mut_user_time_until 0.930493 0.603123 
end gc 0.000200
end gc 0.014411
mut_user_time_until 0.945025 0.617734 
hi: internal error: ASSERTION FAILED: file rts/ProfHeap.c, line 367

    (GHC version 8.3.20170919 for x86_64_unknown_linux)
    Please report this as a GHC bug:

I believe what is happening here is that we are producing the second-to-last sample before stat_endGC has been run. This means that stats.gc_cpu_ns is stale as it does not include the time spent during this present GC. Then, some time later we go to emit the final sample. However, now stats.gc_cpu_ns includes the time spent in the final GC, meaning that it may appear that we have "gone backwards" in time.

comment:3 Changed 2 months ago by bgamari

Milestone: 8.4.1
Owner: set to bgamari
Priority: normalhigh

comment:4 Changed 2 months ago by mnislaih

Cc: mnislaih added

comment:5 Changed 2 months ago by alexandersgreen

Cc: alexandersgreen added

comment:6 Changed 7 weeks ago by angerman

Cc: angerman added

comment:7 Changed 5 weeks ago by bgamari

Keywords: profiler added

comment:8 Changed 11 days ago by duog

I had a little look at this. The description in comment:2 looks right to me, except that I think ALL censuses are using stale stats.gc.cpu_ns values. I moved the stats_endGC call before the heapCensus call in phab:D4183, and this issue seems to be fixed. Of course it's quite hard to tell that this doesn't break something else.

comment:9 Changed 11 days ago by bgamari

Thanks for picking this up, duog!

I was in the middle of writing this reply with

The issue with this approach is that we will now have runtime which won't be accounted for. The heap census can be a considerable amount of time...

when I noticed that we already have a separate set of fields for accounting heap census time. Presumably this means that we were double-counting time previously. Anyways, this suggests to me at least that your approach should be fine.

Last edited 11 days ago by bgamari (previous) (diff)

comment:10 in reply to:  9 Changed 11 days ago by duog

when I noticed that we already have a separate set of fields for accounting heap census time. Presumably this means that we were double-counting time previously. Anyways, this suggests to me at least that your approach should be fine.

I don't think we were double counting. stat_endGC is only accounting for time, it doesn't call getProcessTimes.

EDIT: This is wrong, getProcessTimes IS called. Moreover, several places subtract heap census times and retainer profiling times from gc times, expecting those times to be included. This is easy to fix though.

Last edited 11 days ago by duog (previous) (diff)

comment:11 Changed 11 days ago by duog

Having done some more investigation, my understanding has improved. The diff linked in comment:8 is quite wrong.

heapCensus should be called during a garbage collection, and it's time should be included in GC stats. It's parameter should be the start time of the current gc.

I now believe the cause of this is a bug in how gc times are accumulated, see Phab:4184, which does fix this issue.

comment:12 Changed 9 days ago by duog

Differential Rev(s): Phab:D4184
Status: newpatch

comment:13 Changed 8 days ago by Ben Gamari <ben@…>

In d9f0c24d/ghc:

rts: Fix gc timing

We were accumulating the gc times of the previous gc.
`stats.gc.{cpu,elappsed}_ns` were being accumulated into
`stats.gc_{cpu,elapsed}_ns` before they were set.

There is also a change in that heap profiling will no longer cause gc
events to
be emitted.

Reviewers: bgamari, erikd, simonmar

Reviewed By: bgamari

Subscribers: rwbarton, thomie

GHC Trac Issues: #14257, #14445

Differential Revision:

comment:14 Changed 8 days ago by bgamari

Resolution: fixed
Status: patchclosed

comment:15 Changed 8 days ago by bgamari

Test Case: T14257

Test coming.

comment:16 Changed 44 hours ago by Ben Gamari <ben@…>

In bb2a08e1/ghc:

testsuite: Add test for #14257

Subscribers: rwbarton, thomie, duog

GHC Trac Issues: #14257

Differential Revision:
Note: See TracTickets for help on using tickets.