Opened 3 years ago

Closed 3 years ago

#5314 closed bug (fixed)

"internal error: heapCensus, unknown object: 0" with retainer profiling

Reported by: akio
Owned by: simonmar
Priority: high
Milestone: 7.2.1
Component: Runtime System
Version: 7.0.4
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Runtime crash
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description (last modified by igloo)

Compile the attached file as:

ghc --make retainer.hs -prof -rtsopts

And run it with:

./retainer +RTS -hr -V0

you get:

retainer: internal error: heapCensus, unknown object: 0
    (GHC version 7.0.4 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

I found this test case while chasing an "Invalid object in isRetainer(): 39" problem, so this might be related to ticket #4820. Also, although this particular case needs -V0 to reproduce, the original problem happened without -V0 as well.

Attachments (1)

retainer.hs (280 bytes) - added by akio 3 years ago.
test case


Change History (9)

Changed 3 years ago by akio

test case

comment:1 Changed 3 years ago by simonmar

  • Milestone set to 7.2.1
  • Priority changed from normal to high

Looks bad, we'll investigate. Thanks.

comment:2 Changed 3 years ago by igloo

  • Description modified (diff)

comment:3 Changed 3 years ago by igloo

  • Owner set to igloo

comment:4 Changed 3 years ago by igloo

  • Owner changed from igloo to simonmar

Breakpoint 3, heapCensusChain (census=0xcb3210, bd=0x7ffff6903e00)
    at rts/ProfHeap.c:934
934                 info = get_itbl((StgClosure *)p);
(gdb) n
935                 prim = rtsFalse;
(gdb) p *info
$23 = {prof = {closure_type_off = 1090616, __pad_closure_type_off = 0, 
    closure_desc_off = 1090624, __pad_closure_desc_off = 0}, layout = {
    payload = {ptrs = 1, nptrs = 0}, bitmap = 1, large_bitmap_offset = 1, 
    __pad_large_bitmap_offset = 1, selector_offset = 1}, type = 28, 
  srt_bitmap = 0, code = 0x8eefa0 "H\213[\030H\203\343\370\377#f\017\037D"}
(gdb) p p
$24 = (StgPtr) 0x7ffff69f8000
(gdb) pmem p 64
[...]
0x7ffff69f8088: 0xaaaaaaaaaaaaaaaa
0x7ffff69f8080: 0xaaaaaaaaaaaaaaaa
0x7ffff69f8078: 0xaaaaaaaaaaaaaaaa
0x7ffff69f8070: 0xaaaaaaaaaaaaaaaa
0x7ffff69f8068: 0x0
0x7ffff69f8060: 0xc15b28 <Main_CAFs_cc_ccs>
0x7ffff69f8058: 0x404618 <frame_dummy+1672>
0x7ffff69f8050: 0xc8bdc8 <stg_END_TSO_QUEUE_closure>
0x7ffff69f8048: 0xc8bdc8 <stg_END_TSO_QUEUE_closure>
0x7ffff69f8040: 0xc8bdc8 <stg_END_TSO_QUEUE_closure>
0x7ffff69f8038: 0x0
0x7ffff69f8030: 0xc15b28 <Main_CAFs_cc_ccs>
0x7ffff69f8028: 0x8ef628 <stg_MVAR_CLEAN_info>
0x7ffff69f8020: 0x7ffff69f93e0
0x7ffff69f8018: 0xc8bdc8 <stg_END_TSO_QUEUE_closure>
0x7ffff69f8010: 0x0
0x7ffff69f8008: 0xc8b500 <CCS_SYSTEM>
0x7ffff69f8000: 0x8eefa0 <stg_IND_info>
(gdb) p *bd
$25 = {start = 0x7ffff69f8000, free = 0x7ffff69f8078, link = 0x0, u = {
    back = 0x7ffff69f8078, bitmap = 0x7ffff69f8078, scan = 0x7ffff69f8078}, 
  gen = 0xc9e1a0, gen_no = 0, dest_no = 1, _pad1 = 0, flags = 1, blocks = 1, 
  _padding = {0, 0, 0}}
(gdb) n
937                 switch (info->type) {
(gdb) 
981                     size = BLACKHOLE_sizeW();
(gdb) 
982                     break;
(gdb) p size
$26 = 4
(gdb) n
1066                heapProfObject(census,(StgClosure*)p,size,prim);
(gdb) p sizeof(StgInd)
$27 = 32
(gdb) n
1068                p += size;
(gdb) 
933             while (p < bd->free) {
(gdb) p p
$28 = (StgPtr) 0x7ffff69f8020

To my untrained eye, it looks like p should actually have been advanced by 5 words rather than 4. The census went through the IND case below; is its comment right?

            case IND:
                // Special case/Delicate Hack: INDs don't normally
                // appear, since we're doing this heap census right
                // after GC.  However, GarbageCollect() also does
                // resurrectThreads(), which can update some
                // blackholes when it calls raiseAsync() on the
                // resurrected threads.  So we know that any IND will
                // be the size of a BLACKHOLE.
                size = BLACKHOLE_sizeW();
                break;

comment:5 Changed 3 years ago by simonmar

It is an MVAR_TSO_QUEUE closure being overwritten by removeFromMVarBlockedQueue due to a resurrected thread at the end of GC. I need to think about how best to fix this.

comment:6 Changed 3 years ago by marlowsd@…

commit e903a09466c5a700baea8a34511cbdc2576b136e

Author: Simon Marlow <marlowsd@gmail.com>
Date:   Wed Jul 20 15:29:54 2011 +0100

    Move the call to heapCensus() into GarbageCollect(), just before
    calling resurrectThreads() (fixes #5314).
    
    This avoids a lot of problems, because resurrectThreads() may
    overwrite some closures in the heap, leaving slop behind.  The bug in
    instances, this fix avoids them all in one go.

 rts/Schedule.c |    9 ++++-----
 rts/sm/GC.c    |   10 ++++++++++
 rts/sm/GC.h    |    4 +++-
 3 files changed, 17 insertions(+), 6 deletions(-)

comment:7 Changed 3 years ago by simonmar

  • Status changed from new to merge

comment:8 Changed 3 years ago by igloo

  • Resolution set to fixed
  • Status changed from merge to closed