Opened 7 years ago

Closed 7 years ago

#2546 closed bug (duplicate)

Reliable crash in checkBlackHoles

Reported by: nogin
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 6.8.2
Keywords: crash
Cc:
Operating System: Linux
Architecture: x86
Type of failure:
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Revisions:

Description

I hit a fully reproducible crash in scheduleCheckBlackHoles using GHC 6.8.2 under CentOS 5.2. The program uses no FFI and no unsafe calls. Before reporting the bug, I deleted all the binaries and recompiled with ghc -Wall -Werror -fwarn-simple-patterns -fwarn-tabs -fwarn-incomplete-record-updates -fwarn-monomorphism-restriction -fno-warn-name-shadowing -threaded -O2 -dcore-lint -o XXX YYY/*.hs. I run the program with +RTS -N3 -A10m -sstderr. The gdb output is given below.


P.S. The code in question is unfortunately proprietary, and I doubt that I'd be able to share a test case. The code is very heavy on IORefs (including lots and lots of atomicModifyIORef) and uses a number of MVars (the model is: use atomicModifyIORef if possible, then use an MVar if atomicModifyIORef's output suggests we may need to block), but no STM.
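To make that model concrete, here is a minimal sketch of the general pattern. It is purely illustrative and is not the proprietary code: the Queue type and the pop and push functions are hypothetical names made up for this example, and the real program may structure things quite differently. The fast path does all its work inside atomicModifyIORef; only when the result says there is nothing to take does the caller block on an MVar.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.IORef (IORef, newIORef, atomicModifyIORef)

-- Hypothetical shared state: pending items plus the consumers
-- that are currently blocked waiting for one.
data Queue a = Queue { items :: [a], waiters :: [MVar a] }

-- Fast path: take an item atomically.
-- Slow path: register a fresh MVar as a waiter and block on it.
pop :: IORef (Queue a) -> IO a
pop ref = do
  box <- newEmptyMVar
  r <- atomicModifyIORef ref $ \st -> case items st of
         (x:xs) -> (st { items = xs }, Just x)
         []     -> (st { waiters = waiters st ++ [box] }, Nothing)
  case r of
    Just x  -> return x       -- no blocking was needed
    Nothing -> takeMVar box   -- block until a producer fills the MVar

-- Hand the item directly to a blocked consumer if there is one,
-- otherwise append it to the pending items.
push :: IORef (Queue a) -> a -> IO ()
push ref x = do
  w <- atomicModifyIORef ref $ \st -> case waiters st of
         (m:ms) -> (st { waiters = ms }, Just m)
         []     -> (st { items = items st ++ [x] }, Nothing)
  case w of
    Just m  -> putMVar m x
    Nothing -> return ()

main :: IO ()
main = do
  ref  <- newIORef (Queue [] [])
  done <- newEmptyMVar
  _ <- forkIO (pop ref >>= putMVar done)  -- consumer, may block
  push ref (42 :: Int)                    -- producer
  takeMVar done >>= print
```

Whichever of the consumer and the producer runs first, the item is delivered exactly once, because both the "register as waiter" and the "hand over or enqueue" decisions are made inside a single atomicModifyIORef call.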

P.P.S. I'd be more than happy to help in debugging, if somebody is willing to provide guidance (I am fairly new to Haskell, but have 10+ years of in-depth OCaml experience).

(gdb) run 4000 +RTS -N3 -A10m -sstderr
Starting program: XXX 4000 +RTS -N3 -A10m -sstderr
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
XXX 4000 +RTS -N3 -A10m -sstderr
[New Thread 1118336 (LWP 20985)]
[New Thread 24583056 (LWP 21019)]
[New Thread 86469520 (LWP 21020)]
[New Thread 59771792 (LWP 21021)]
[New Thread 117144464 (LWP 21022)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 59771792 (LWP 21021)]
0x080884b3 in scheduleCheckBlackHoles ()
(gdb) run  4000 +RTS -N3 -A10m -sstderr
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: XXX 4000 +RTS -N3 -A10m -sstderr
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
XXX 4000 +RTS -N3 -A10m -sstderr
[New Thread 1118336 (LWP 21029)]
[New Thread 24963984 (LWP 21061)]
[New Thread 68045712 (LWP 21062)]
[New Thread 78535568 (LWP 21063)]
[New Thread 89025424 (LWP 21064)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 78535568 (LWP 21063)]
0x080884b3 in scheduleCheckBlackHoles ()
(gdb) bt
#0  0x080884b3 in scheduleCheckBlackHoles ()
#1  0x0a0d1f34 in ?? ()
#2  0x00000004 in ?? ()
#3  0x00000001 in ?? ()
#4  0x0a0bf0d8 in ?? ()
#5  0x00000001 in ?? ()
#6  0x08089497 in schedule ()
#7  0x0a0d1ee8 in ?? ()
#8  0x0001ff78 in ?? ()
#9  0x00000000 in ?? ()
(gdb) run  4000 +RTS -N3 -A10m -sstderr
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: XXX 4000 +RTS -N3 -A10m -sstderr
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
XXX 4000 +RTS -N3 -A10m -sstderr
[New Thread 1118336 (LWP 21065)]
[New Thread 26430352 (LWP 21097)]
[New Thread 36920208 (LWP 21098)]
[New Thread 130771856 (LWP 21099)]
[New Thread 47410064 (LWP 21100)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 26430352 (LWP 21097)]
0x080884b3 in scheduleCheckBlackHoles ()
(gdb) bt
#0  0x080884b3 in scheduleCheckBlackHoles ()
#1  0x0977a4d4 in ?? ()
#2  0x00000004 in ?? ()
#3  0x00000001 in ?? ()
#4  0x0976e1f8 in ?? ()
#5  0x00000001 in ?? ()
#6  0x08089497 in schedule ()
#7  0x0977a488 in ?? ()
#8  0x00001f78 in ?? ()
#9  0x00000005 in ?? ()
#10 0x080bca50 in stg_NO_TREC_closure ()
#11 0x0977a4d4 in ?? ()
#12 0x0976e1f8 in ?? ()
#13 0x07126000 in ?? ()
#14 0x00001000 in ?? ()
#15 0x01934b5c in ?? ()
#16 0x07126000 in ?? ()
#17 0x0977a4d4 in ?? ()
#18 0x0977a488 in ?? ()
#19 0x0976e1f8 in ?? ()
#20 0x0976e1f8 in ?? ()
#21 0x0977a4d4 in ?? ()
#22 0x0977a488 in ?? ()
#23 0x0976e1f8 in ?? ()
#24 0x019344b8 in ?? ()
#25 0x08089b24 in workerStart ()
#26 0x080bdbdc in dummy_tso ()
#27 0x0977a488 in ?? ()
#28 0x00000000 in ?? ()

Change History (4)

comment:1 Changed 7 years ago by nogin

By adding the -debug -dppr-debug flags to the compilation command line, I was able to get a much better gdb backtrace:

(gdb) run 4000 +RTS -N3 -A10m -sstderr
Starting program: XXX 4000 +RTS -N3 -A10m -sstderr
[Thread debugging using libthread_db enabled]
XXX 4000 +RTS -N3 -A10m -sstderr
[New Thread 1118336 (LWP 21542)]
[New Thread 25070480 (LWP 21576)]
[New Thread 113068944 (LWP 21577)]
[New Thread 85584784 (LWP 21578)]
[New Thread 131009424 (LWP 21579)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 131009424 (LWP 21579)]
0x0808c25c in checkBlackHoles (cap=0x8daf278) at Schedule.c:2952
2952    Schedule.c: No such file or directory.
        in Schedule.c
(gdb) bt
#0  0x0808c25c in checkBlackHoles (cap=0x8daf278) at Schedule.c:2952
#1  0x0808a5b5 in scheduleCheckBlackHoles (cap=0x8daf278) at Schedule.c:941
#2  0x0808983a in schedule (initialCapability=0x8daf278, task=0x8dc4268) at Schedule.c:458
#3  0x0808b957 in workerStart (task=0x8dc4268) at Schedule.c:2528
#4  0x00d5f46b in start_thread () from /lib/libpthread.so.0
#5  0x00cb6dbe in clone () from /lib/libc.so.6
(gdb) run 4000 +RTS -N3 -A10m -sstderr
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: XXX 4000 +RTS -N3 -A10m -sstderr
[Thread debugging using libthread_db enabled]
XXX 4000 +RTS -N3 -A10m -sstderr
[New Thread 1118336 (LWP 21588)]
[New Thread 25312144 (LWP 21620)]
[New Thread 62921616 (LWP 21621)]
[New Thread 103971728 (LWP 21622)]
[New Thread 73411472 (LWP 21623)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 25312144 (LWP 21620)]
0x0808c25c in checkBlackHoles (cap=0x8787278) at Schedule.c:2952
2952    in Schedule.c
(gdb) bt
#0  0x0808c25c in checkBlackHoles (cap=0x8787278) at Schedule.c:2952
#1  0x0808a5b5 in scheduleCheckBlackHoles (cap=0x8787278) at Schedule.c:941
#2  0x0808983a in schedule (initialCapability=0x8787278, task=0x8793578) at Schedule.c:458
#3  0x0808b957 in workerStart (task=0x8793578) at Schedule.c:2528
#4  0x00d5f46b in start_thread () from /lib/libpthread.so.0
#5  0x00cb6dbe in clone () from /lib/libc.so.6

comment:2 Changed 7 years ago by nogin

  • Summary changed from Reliable crash in scheduleCheckBlackHoles to Reliable crash in checkBlackHoles

comment:3 Changed 7 years ago by nogin

This might be a duplicate of #1898. I will try upgrading to 6.8.3 to check whether the crash is still there.

comment:4 Changed 7 years ago by nogin

  • Resolution set to duplicate
  • Status changed from new to closed

Indeed, the crash does not seem to be present in 6.8.3.
