Opened 9 months ago

Last modified 5 months ago

#14329 new bug

GHC 8.2.1 segfaults while bootstrapping master

Reported by: bgamari Owned by:
Priority: highest Milestone: 8.2.2
Component: Compiler Version: 8.2.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #12960, #9065, #7762 Differential Rev(s): Phab:D4075
Wiki Page:

Description

Earlier this week the Linux/amd64 Harbormaster started failing somewhat reliably during validation. The stage0 compiler (GHC 8.2.1) often fails with a segmentation fault. The failures appear to have started with ef26182e2014b0a2a029ae466a4b121bf235e4e4, although I suspect that commit isn't the cause. I was able to capture a core dump of the crashing stage0 compiler, which implicates the allocator:

Reading symbols from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/ghc...(no debugging symbols found)...done.
[New LWP 25151]
[New LWP 25160]
[New LWP 25156]
[New LWP 25158]
[New LWP 25157]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/ghc/8.2.1/lib/ghc-8.2.1/bin/ghc -B/opt/ghc/8.2.1/lib/ghc-8.2.1 -hisuf hi -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f836aaa2c90 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
[Current thread is 1 (Thread 0x7f83711b5340 (LWP 25151))]
(gdb) bt
#0  0x00007f836aaa2c90 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#1  0x00007f836aaa3211 in allocGroupOnNode () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#2  0x00007f836aa9dd41 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#3  0x00007f836aa9deb9 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#4  0x00007f836aa82a39 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#5  0x00007f836aa7fc06 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#6  0x00007f836aa9d461 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#7  0x00007f836aaa423a in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#8  0x00007f836aaa4b3c in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#9  0x00007f836aa8bbc8 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#10 0x00007f836aa8c912 in ?? () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#11 0x00007f836aa8da01 in scheduleWaitThread () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#12 0x00007f836aa99fae in hs_main () from /opt/ghc/8.2.1/lib/ghc-8.2.1/bin/../rts/libHSrts_thr-ghc8.2.1.so
#13 0x0000000000427038 in ?? ()
#14 0x00007f83694fd2b1 in __libc_start_main (main=0x426fd0, argc=119, argv=0x7fffcfee8078, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffcfee8068) at ../csu/libc-start.c:291
#15 0x0000000000427069 in ?? ()

Change History (8)

comment:1 Changed 9 months ago by bgamari

Differential Rev(s): Phab:D4075
Status: new → patch

I wonder if we are running out of memory; the builder has only 4 GB of RAM and four vCPUs. I have seen GHC segfault due to OOM conditions in the past.

I took a look at the allocator and noticed that we never actually check whether commit was successful. I've fixed this in Phab:D4075.
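
For illustration, the kind of check described above can be sketched in C as follows. This is a minimal sketch and not the code from Phab:D4075: reserve_region and commit_region are hypothetical names, and it assumes a Linux-like mmap that supports MAP_ANONYMOUS, MAP_NORESERVE, and MAP_FIXED. The point is only that the commit step reports failure to the caller rather than handing back memory that was never mapped.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve a range of address space without
 * making it accessible. Returns NULL if the reservation fails. */
static void *reserve_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: make [at, at + len) readable and writable.
 * `at` must be page-aligned and lie inside a reserved region.
 * Returns 0 on success and -1 on failure, so the caller cannot
 * mistake a failed commit for usable memory. */
static int commit_region(void *at, size_t len)
{
    assert(at != NULL && len > 0);

    void *p = mmap(at, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        /* This is the check that was reportedly missing. */
        fprintf(stderr, "commit of %zu bytes failed: %s\n",
                len, strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller would reserve a region once, commit pieces of it on demand, and abort with a clear out-of-memory message when commit_region returns -1. Note that on systems with memory overcommit enabled the kernel may still accept the mapping and fail only when the pages are first touched, so a check like this narrows the failure window rather than closing it.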

comment:2 Changed 8 months ago by Ben Gamari <ben@…>

In a69fa544/ghc:

rts/posix: Ensure that memory commit succeeds

Previously we wouldn't check that mmap would succeed. I suspect this may
have been the cause of #14329.

Test Plan: Validate under low-memory condition

Reviewers: simonmar, austin, erikd

Reviewed By: simonmar

Subscribers: rwbarton, thomie

GHC Trac Issues: #14329

Differential Revision: https://phabricator.haskell.org/D4075

comment:3 Changed 8 months ago by bgamari

Resolution: fixed
Status: patch → closed

comment:4 Changed 5 months ago by bgamari

Resolution: fixed
Status: closed → new

It looks like the issue fixed in comment:2 isn't the only problem. We are still seeing segmentation faults on Harbormaster due to out-of-memory conditions. For instance,

(gdb) run
Starting program: /home/ben/ghc/inplace/lib/bin/ghc-stage1 -B/home/ben/ghc/inplace/lib -hisuf hi -osuf o -hcsuf hc -static -O0 -H64m -Wall -fllvm-fill-undef-with-garbage -Werror -Iincludes -Iincludes/dist -Iincludes/dist-derivedconstants/header -Iincludes/dist-ghcconstants/header -this-unit-id ghc-8.5 -hide-all-packages -i -icompiler/backpack -icompiler/basicTypes -icompiler/cmm -icompiler/codeGen -icompiler/coreSyn -icompiler/deSugar -icompiler/ghci -icompiler/hsSyn -icompiler/iface -icompiler/llvmGen -icompiler/main -icompiler/nativeGen -icompiler/parser -icompiler/prelude -icompiler/profiling -icompiler/rename -icompiler/simplCore -icompiler/simplStg -icompiler/specialise -icompiler/stgSyn -icompiler/stranal -icompiler/typecheck -icompiler/types -icompiler/utils -icompiler/vectorise -icompiler/stage2/build -Icompiler/stage2/build -icompiler/stage2/build/./autogen -Icompiler/stage2/build/./autogen -Icompiler/. -Icompiler/parser -Icompiler/utils -Icompiler/../rts/dist/build -Icompiler/stage2 -optP-DGHCI -optP-include -optPcompiler/stage2/build/./autogen/cabal_macros.h -package-id base-4.11.0.0 -package-id deepseq-1.4.3.0 -package-id directory-1.3.1.5 -package-id process-1.6.2.0 -package-id bytestring-0.10.8.2 -package-id binary-0.8.5.1 -package-id time-1.8.0.2 -package-id containers-0.5.10.2 -package-id array-0.5.2.0 -package-id filepath-1.4.1.2 -package-id template-haskell-2.13.0.0 -package-id hpc-0.6.0.3 -package-id transformers-0.5.5.0 -package-id ghc-boot-8.5 -package-id ghc-boot-th-8.5 -package-id ghci-8.5 -package-id unix-2.7.2.2 -package-id terminfo-0.4.1.1 -Wall -Wno-name-shadowing -Wnoncanonical-monad-instances -Wnoncanonical-monadfail-instances -Wnoncanonical-monoid-instances -this-unit-id ghc -XHaskell2010 -XNoImplicitPrelude -optc-DTHREADED_RTS -DGHCI_TABLES_NEXT_TO_CODE -DSTAGE=2 -Rghc-timing -O -dcore-lint -dno-debug-output -Wcpp-undef -no-user-package-db -rtsopts -Wnoncanonical-monad-instances -odir compiler/stage2/build -hidir compiler/stage2/build 
-stubdir compiler/stage2/build -dynamic-too -c compiler/types/OptCoercion.hs -o compiler/stage2/build/OptCoercion.o -dyno compiler/stage2/build/OptCoercion.dyn_o -fforce-recomp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
---Type <return> to continue, or q <return> to quit---
0x0000000002fcfe40 in alloc_mega_group ()
(gdb) bt
#0  0x0000000002fcfe40 in alloc_mega_group ()
#1  0x0000000002fd038d in allocGroupOnNode ()
#2  0x0000000002fe3dff in alloc_todo_block ()
#3  0x0000000002fe3f56 in todo_block_full ()
#4  0x0000000000406497 in evacuate ()
#5  0x00000000004074ec in scavenge_block ()
#6  0x0000000002fe3726 in scavenge_loop ()
#7  0x0000000002fd0ed8 in GarbageCollect ()
#8  0x0000000002fc5eeb in scheduleDoGC ()
#9  0x0000000002fc68ce in scheduleWaitThread ()
#10 0x0000000002fcf010 in hs_main ()
#11 0x0000000000422684 in main ()
(gdb) 

while building 1cb12eae648c964c411f4c83730f3db05e409f48.

comment:5 Changed 5 months ago by bgamari

comment:6 Changed 5 months ago by bgamari

Unfortunately, the issue occurs in fewer than one in ten runs, even under rather strong memory pressure.

comment:8 Changed 5 months ago by bgamari

The recent spate of Harbormaster crashes seemingly began with the merge of Phab:D4341. However, I've tried reverting this patch with no apparent effect.
