Opened 2 years ago

Closed 2 months ago

#5909 closed bug (fixed)

Segfault with multi-threaded retainer profiling

Reported by: akio
Owned by: simonmar
Priority: high
Milestone: 7.6.2
Component: Runtime System
Version: 7.4.1
Keywords:
Cc: jwlato@…, simonmar
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Runtime crash
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

The attached program (segfault.hs) often segfaults when compiled and run like this:

% ghc -threaded -O2 -prof -fprof-auto segfault.hs -rtsopts
[1 of 1] Compiling Main             ( segfault.hs, segfault.o )
Linking segfault ...
% ./segfault aaaaaaaaaaaaaaaaaaaaaaa +RTS -hr -N5 -V -A512K
# Segfaults, often within a minute.

It also seems to segfault more often, and sooner, when another process is actively running on the same machine. I kept "ghc -e 'last [1..]' +RTS -A13G" running while making the test case. The machine I used has 4 cores, 8 HT threads and 16GB RAM.

Attachments (2)

segfault.hs (678 bytes) - added by akio 2 years ago.
test case
evac.patch (1.7 KB) - added by akio 14 months ago.


Change History (13)

Changed 2 years ago by akio

test case

comment:1 Changed 2 years ago by simonmar

  • Difficulty set to Unknown
  • Milestone set to 7.4.2
  • Owner set to simonmar
  • Priority changed from normal to high

comment:2 Changed 23 months ago by igloo

  • Milestone changed from 7.4.2 to 7.4.3

comment:3 Changed 22 months ago by akio

If I run the program like this, I always get an immediate segfault:

./segfault +RTS -hrfoo -hc

comment:4 Changed 20 months ago by akio

The "-hrfoo -hc" one seems to be a separate bug, so I opened #7149 for it.

comment:5 Changed 19 months ago by igloo

  • Milestone changed from 7.4.3 to 7.6.2

comment:6 Changed 18 months ago by jwlato

  • Cc jwlato@… added

comment:7 Changed 14 months ago by akio

  • Status changed from new to patch

I think I have found a bug.

Segfaults often happen in the isMember function in rts/RetainerSet.h. It segfaults when it tries to dereference rs, whose value is 0x4 or 0x6:

(gdb) run aaaaaaaaaaaaaaaaaaaaaaa +RTS -hr -N5 -V -A512K
Starting program: /home/akio/src/test/segfault aaaaaaaaaaaaaaaaaaaaaaa +RTS -hr -N5 -V -A512K
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6acd700 (LWP 14398)]
[New Thread 0x7ffff62cc700 (LWP 14399)]
[New Thread 0x7fffe7fff700 (LWP 14400)]
[New Thread 0x7fffeffff700 (LWP 14401)]
[New Thread 0x7ffff56ff700 (LWP 14402)]
[New Thread 0x7ffff4efe700 (LWP 14403)]

Program received signal SIGSEGV, Segmentation fault.
isMember (rs=<optimized out>, r=0xd46640) at rts/RetainerSet.h:140
140	  if (rs->num < BINARY_SEARCH_THRESHOLD) {
(gdb) bt
#0  isMember (rs=<optimized out>, r=0xd46640) at rts/RetainerSet.h:140
#1  retainClosure (c0=<optimized out>, cp0=<optimized out>, r0=<optimized out>)
    at rts/RetainerProfile.c:1625
#2  0x0000000000946c4e in retain_PAP_payload (pap=0x7ffff5768190, c_child_r=0xd46640, 
    fun=<optimized out>, payload=0x7ffff57681b8, n_args=1) at rts/RetainerProfile.c:1413
#3  0x0000000000946674 in retainClosure (c0=<optimized out>, cp0=<optimized out>, 
    r0=<optimized out>) at rts/RetainerProfile.c:1686
#4  0x0000000000951e4f in markStablePtrTable (evac=0x946a60 <retainRoot>, user=0x0)
    at rts/Stable.c:364
#5  0x00000000009471b5 in computeRetainerSet () at rts/RetainerProfile.c:1775
#6  retainerProfile () at rts/RetainerProfile.c:1979
#7  0x0000000000943be0 in heapCensus (t=<optimized out>) at rts/ProfHeap.c:1086
#8  0x000000000095bd0a in GarbageCollect (collect_gen=<optimized out>, do_heap_census=rtsTrue, 
    gc_type=<optimized out>, cap=0xd57240) at rts/sm/GC.c:735
#9  0x000000000094e7a8 in scheduleDoGC (pcap=<optimized out>, task=0xdc7740, 
    force_major=rtsFalse) at rts/Schedule.c:1643
#10 0x000000000094f4b2 in schedule (initialCapability=<optimized out>, task=0xdc7740)
    at rts/Schedule.c:553
#11 0x0000000000950a15 in scheduleWaitThread (tso=<optimized out>, ret=<optimized out>, 
    pcap=0x7fffffffe040) at rts/Schedule.c:2345
#12 0x00000000009498ee in real_main () at rts/RtsMain.c:63
#13 0x00000000009499ea in hs_main (argc=7, argv=0x7fffffffe1a8, main_closure=0xcb99b0, 
    rts_config=...) at rts/RtsMain.c:114
#14 0x0000000000407f47 in main ()
(gdb) f 1
#1  retainClosure (c0=<optimized out>, cp0=<optimized out>, r0=<optimized out>)
    at rts/RetainerProfile.c:1625
1625		if (isMember(r, retainerSetOfc))
(gdb) p c->header.prof.hp 
$4 = {rs = 0x5, ldvw = 5}
(gdb) p flip
$5 = 1

The attached patch seems to fix this. It could probably be made more efficient by exploiting the fact that biographical profiling and retainer profiling are never turned on at the same time, but I'm not sure exactly how that should be implemented.
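
For readers following the gdb session above, here is a minimal sketch of how a header word of 0x5 with flip = 1 could turn into the bad pointer 0x4. It assumes that the retainer profiler keeps the current flip value in the low bit of the rs field and recovers the real pointer by XOR-ing with flip; that assumption, and the names prof_word, rs_field_valid and decode_rs, are illustrative only and are not taken from the RTS sources or from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the profiling header word that gdb prints as
 * {rs = 0x5, ldvw = 5}: a single machine word that can be viewed either as
 * a retainer-set pointer or as a plain integer. */
typedef union {
    void     *rs;    /* view used by retainer profiling */
    uintptr_t ldvw;  /* view used by biographical (LDV) profiling */
} prof_word;

/* Assumption: the field counts as valid for the current pass when its low
 * bit equals the current flip value. */
static int rs_field_valid(prof_word w, uintptr_t flip)
{
    return ((w.ldvw & 1) ^ flip) == 0;
}

/* Assumption: the pointer is recovered by XOR-ing the flip bit away.
 * Returned as an integer so that it is never dereferenced here. */
static uintptr_t decode_rs(prof_word w, uintptr_t flip)
{
    return w.ldvw ^ flip;
}
```

Under these assumptions, a stray small odd integer in the header word passes the validity check when flip is 1 and then decodes to a tiny even address: 0x5 gives 0x4 and 0x7 gives 0x6, which would match the two rs values seen in the crashes. The sketch says nothing about what wrote the integer into the word in the first place.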

Changed 14 months ago by akio

comment:8 Changed 14 months ago by simonmar

Interesting, thanks for the patch, I'll take a look.

comment:9 Changed 14 months ago by marlowsd@…

commit a5879a6c2412452fbda8c96e9d921c35279b9d9d

Author: Simon Marlow <marlowsd@gmail.com>
Date:   Tue Feb 19 09:58:31 2013 +0000

    Fix segfault in retainer profiling when using multiple cores (#5909)
    
    Thanks to @akio on the ticket for the diagnosis and the patch.  I
    modified the comments a bit.

 rts/sm/Evac.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

comment:10 Changed 14 months ago by simonmar

  • Status changed from patch to merge

Thanks for the patch, I think it was spot on. Well done!

comment:11 Changed 2 months ago by thoughtpolice

  • Cc simonmar added
  • Resolution set to fixed
  • Status changed from merge to closed