Segfault / Assertion failed in RTS (Compact.c)
Our application terminates with a segfault or an internal RTS error in about 80% of our test runs when we use the following runtime flags:
+RTS -G4 -H1g -c -I0
Without them the application runs fine. We only discovered the problem after making many performance improvements to our code, during stress tests on fast machines with many cores.
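For reference, here is a sketch of what the flags above mean and how the failing command line is put together. The flag descriptions follow the GHC user's guide. The build line in the comment, including the `Main.hs` file name and the use of `-threaded`, is an assumption for illustration and not the actual build command. The script only prints the command rather than running it.

```shell
# RTS flags from this report, with their documented meanings:
#   -G4   use 4 garbage-collector generations (the default is 2)
#   -H1g  suggested heap size of 1 GB
#   -c    use the compacting collector for the oldest generation
#   -I0   disable the idle-time major GC
RTS_FLAGS="-G4 -H1g -c -I0"

# Assumed build step (hypothetical, not the real build command).
# GHC 7.0 only accepts these RTS flags when linked with -rtsopts,
# and -debug links the debugging runtime that reports failed assertions.
#   ghc --make -threaded -rtsopts -debug Main.hs -o SalviaDerivationGateway

# Assemble the failing invocation and print it.
CMD="./SalviaDerivationGateway +RTS $RTS_FLAGS -RTS"
echo "$CMD"
```

Because the failed assertion is in `rts/sm/Compact.c`, removing `-c` from `RTS_FLAGS` first may be a natural way to narrow the problem down. This has not been verified.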
We compiled with the debugging runtime and got the following assertion failure:
SalviaDerivationGateway: internal error: ASSERTION FAILED: file rts/sm/Compact.c, line 171
(GHC version 7.0.1.20110121 for x86_64_unknown_linux)
Please report this as a GHC bug:
http://www.haskell.org/ghc/reportabug
We're testing with a custom GHC build from the GHC 7.0 branch (including patches up to yesterday).
Without the debugging runtime we sometimes get segfaults and sometimes errors like:
SalviaDerivationGateway: internal error: scavenge_mark_stack: unimplemented/strange closure type 1970861226 @ 0x7f7578f488f8
(GHC version 7.0.1.20110121 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
The last few system calls before a segfault are:
[pid 30727] rt_sigprocmask(SIG_BLOCK, [HUP INT], [], 8) = 0
[pid 30727] clock_gettime(0xfffffffa /* CLOCK_??? */, {147, 512463346}) = 0
[pid 30727] getrusage(RUSAGE_SELF, {ru_utime={126, 620000}, ru_stime={20, 890000}, ...}) = 0
[pid 30727] mmap(0x7fb643800000, 3145728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb643400000
[pid 30727] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
We were very concerned about the situation, because an unstable runtime system makes it feel as though we would be better off using Java for "serious" applications. It's no longer a problem for us, because we simply won't use the tuned runtime system flags. It might be a good idea to remove them entirely until they're known to work in busy applications (or at least to include a warning).
I don't understand any of the details, but maybe the problem with retainer profiling (issue #4820) has the same cause.
When testing new releases, it would probably be a good idea to also test various flag combinations (maybe the GHC compiler binary could just choose some random values during startup if none are given ;-).
I hope this information is of some help. We haven't tried to reproduce the problem with a small test program, as we're in a bit of a hurry with a release. If there is anything we can do to help find the cause of the problem, please let us know.
Trac metadata
Trac field | Value |
---|---|
Version | 7.0.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Runtime System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | wehr@factisresearch.com |
Operating system | |
Architecture | |