#7185 closed bug (fixed)

Compiled program crashes

Reported by: waldheinz
Owned by: simonmar
Priority: high
Milestone: 7.6.1
Component: Compiler
Version: 7.4.1
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Runtime crash
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

I have a program that compiles fine, but the resulting executable crashes. First, the steps to reproduce:

git clone git@github.com:waldheinz/bling.git
cd bling
git checkout e2bad3ca6be2409386d28796997709318cf6ff64
cabal configure
cabal build
./dist/build/bling/bling examples/cornell-box-underwater.bling

This results in either a segfault or an "internal error" message, at least for me and two others who tried it. The error message is

bling: internal error: scavenge_one: strange object -1083673327
    (GHC version 7.7.20120823 for x86_64_unknown_linux)

The "strange object" varies. Some random observations which might be useful:

  • this happens with both GHC 7.4.1 and a fresh build of HEAD (GHC version 7.7.20120823)
  • I sanitized the code to be completely free of unsafe* function calls. Before, there was quite a lot of unsafe array reading/writing using the vector package. This did not change anything, but the revision above reflects this "safe" state, just in case...
  • the problem first occurred when I changed the SPPM.mkHash function to use the Utils.GrowVec type instead of lists for its intermediate results. [1] GrowVec is basically a wrapper around a vector which doubles its size when space is exhausted.
  • the optimization level does not seem to affect the problem (it occurs even with -O0 and everything else removed)
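The actual Utils.GrowVec code from bling is not part of this ticket. The following is a minimal sketch, assuming the vector package's Data.Vector.Mutable API, of a growable vector that doubles its capacity when it runs out of space. GrowVec, gvNew, gvPush and gvFromList are hypothetical names; gvFreeze reuses the name mentioned in comment:2, but its body here is an assumption, not bling's implementation. In this sketch both growing (MV.grow) and freezing (V.freeze) copy the underlying boxed array, which is the kind of operation the eventual fix (in cloneArray/copyArray) concerned.

```haskell
import Control.Monad (forM_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Hypothetical growable vector: a mutable boxed vector (the capacity)
-- plus a count of how many slots are actually in use.
data GrowVec a = GrowVec
  { gvStore :: IORef (MV.IOVector a) -- backing store, length = capacity
  , gvCount :: IORef Int             -- number of elements pushed so far
  }

-- Create an empty GrowVec with the given initial capacity (at least 1,
-- so that doubling always makes progress).
gvNew :: Int -> IO (GrowVec a)
gvNew cap = do
  store <- MV.new (max 1 cap)
  GrowVec <$> newIORef store <*> newIORef 0

-- Append one element, doubling the capacity first if the store is full.
gvPush :: GrowVec a -> a -> IO ()
gvPush gv x = do
  store <- readIORef (gvStore gv)
  n <- readIORef (gvCount gv)
  store' <-
    if n < MV.length store
      then return store
      else do
        -- MV.grow allocates a larger vector and copies the old contents.
        bigger <- MV.grow store (MV.length store)
        writeIORef (gvStore gv) bigger
        return bigger
  MV.write store' n x
  writeIORef (gvCount gv) (n + 1)

-- Assumed behaviour: copy only the used prefix into an immutable vector.
gvFreeze :: GrowVec a -> IO (V.Vector a)
gvFreeze gv = do
  store <- readIORef (gvStore gv)
  n <- readIORef (gvCount gv)
  V.freeze (MV.take n store)

-- Hypothetical helper: build a GrowVec by pushing list elements in order.
gvFromList :: [a] -> IO (GrowVec a)
gvFromList xs = do
  gv <- gvNew 1
  forM_ xs (gvPush gv)
  return gv
```

For example, gvFromList [1 .. 1000 :: Int] >>= gvFreeze yields a 1000-element vector; along the way the store doubles from 1 to 1024 slots.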

Running under GDB gives this stack trace:

#1  0x00000031d12370d8 in abort () from /lib64/libc.so.6
#2  0x0000000000cb4765 in rtsFatalInternalErrorFn ()
#3  0x0000000000cb48dd in barf ()
#4  0x0000000000cd30e9 in scavenge_one ()
#5  0x0000000000cd3645 in scavenge_mutable_list ()
#6  0x0000000000cd3835 in scavenge_capability_mut_lists ()
#7  0x0000000000cb98bc in GarbageCollect ()
#8  0x0000000000cac043 in scheduleDoGC.isra.20 ()
#9  0x0000000000cacabf in scheduleWaitThread ()
#10 0x0000000000cb683e in real_main ()
#11 0x0000000000cb693a in hs_main ()
#12 0x0000000000407003 in main ()

I'm currently trying to debug this further, but my abilities on this front are limited...

[1] https://github.com/waldheinz/bling/commit/e1dc7b3c7e66cdd21a5aa46f9f96ca9448c52407#commitcomment-1761893

Attachments (2)

log.txt.bz2 (39.3 KB) - added by waldheinz 20 months ago.
strange stderr output when compiling the SPPM module
Main.hs (1.5 KB) - added by waldheinz 20 months ago.
self-contained example


Change History (8)

Changed 20 months ago by waldheinz

attachment log.txt.bz2 added: strange stderr output when compiling the SPPM module

comment:1 Changed 20 months ago by waldheinz

One thing I forgot to mention in the original report was that when compiling the SPPM module, GHC produces a huge message on stderr. It starts with

dmdFix loop
    10 Sigs: [(( bling-0.1:Graphics.Bling.Renderer.SPPM.traceCam{v r3kK} ...

and goes on for about 11000 lines in regular builds and about 5000 lines in profiling builds. Also, the problem can be reproduced relatively quickly by setting imageSize in the scene description file to something smaller, like "imageSize 50 50", and running with "+RTS -DS". This gives:

bling: internal error: ASSERTION FAILED: file rts/sm/Storage.c, line 697

    (GHC version 7.4.1 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Program received signal SIGABRT, Aborted.
0x0000003ee1235925 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.15-56.fc17.x86_64 gmp-5.0.2-6.fc17.x86_64 libffi-3.0.10-2.fc17.x86_64 zlib-1.2.5-7.fc17.x86_64
(gdb) bt
#0  0x0000003ee1235925 in raise () from /lib64/libc.so.6
#1  0x0000003ee12370d8 in abort () from /lib64/libc.so.6
#2  0x0000000000d6b95f in rtsFatalInternalErrorFn ()
#3  0x0000000000d6b597 in barf ()
#4  0x0000000000d6b5fa in _assertFail ()
#5  0x0000000000d7f017 in allocate ()
#6  0x0000000000d832a0 in stg_newArrayzh ()
#7  0x0000000000000000 in ?? ()

Changed 20 months ago by waldheinz

attachment Main.hs added: self-contained example

comment:2 Changed 20 months ago by waldheinz

I managed to create a self-contained example that reproduces the problem. Compiling with -debug and running with +RTS -DS triggers the same assertion failure in rts/sm/Storage.c as the original code:

convert
bling: internal error: ASSERTION FAILED: file rts/sm/Storage.c, line 697

    (GHC version 7.4.1 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted (core dumped)

This always happens in the "convert" step (see the trace in the code), which applies gvFreeze to the intermediate working set.

comment:3 Changed 20 months ago by simonmar

  • Difficulty set to Unknown
  • Milestone set to 7.6.1
  • Owner set to simonmar
  • Priority changed from normal to high

I have a fix in the pipeline for this.

comment:4 Changed 20 months ago by marlowsd@…

commit 8aabe8d06f7202c9a6cd1133e0b1ebc81338eed9

Author: Simon Marlow <marlowsd@gmail.com>
Date:   Tue Aug 28 15:52:38 2012 +0100

    Fix fencepost and byte/word bugs in cloneArray/copyArray (#7185)

 compiler/cmm/CmmUtils.hs       |    5 +++--
 compiler/codeGen/CgPrimOp.hs   |   36 ++++++++++++++++++++++--------------
 compiler/codeGen/StgCmmPrim.hs |   33 +++++++++++++++++++++------------
 3 files changed, 46 insertions(+), 28 deletions(-)
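The commit title mentions fencepost bugs in cloneArray/copyArray. As a hedged illustration of what a fencepost error would break, the sketch below calls GHC's copyMutableArray# and cloneMutableArray# primops (from GHC.Exts) directly and checks that copying n elements overwrites exactly the slots from dstOff to dstOff + n - 1 and nothing around them, and that a clone contains exactly the requested n elements. The wrapper names (MArr, newMArr, copyMArr, cloneMArr, copyCase and so on) are hypothetical helpers for this example; it is a regression-style check, not the test that accompanied the fix, and it does not try to reproduce the byte/word part of the bug, which lived in the code generator's size arithmetic.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- Hypothetical thin wrapper around a boxed mutable array primitive.
data MArr a = MArr (MutableArray# RealWorld a)

newMArr :: Int -> a -> IO (MArr a)
newMArr (I# n) x = IO $ \s -> case newArray# n x s of
  (# s', arr #) -> (# s', MArr arr #)

readMArr :: MArr a -> Int -> IO a
readMArr (MArr arr) (I# i) = IO $ \s -> readArray# arr i s

writeMArr :: MArr a -> Int -> a -> IO ()
writeMArr (MArr arr) (I# i) x = IO $ \s -> case writeArray# arr i x s of
  s' -> (# s', () #)

sizeMArr :: MArr a -> Int
sizeMArr (MArr arr) = I# (sizeofMutableArray# arr)

-- copyMutableArray# src srcOff dst dstOff n
copyMArr :: MArr a -> Int -> MArr a -> Int -> Int -> IO ()
copyMArr (MArr src) (I# so) (MArr dst) (I# d) (I# n) = IO $ \s ->
  case copyMutableArray# src so dst d n s of s' -> (# s', () #)

-- cloneMutableArray# src off n: a fresh array holding n elements from off
cloneMArr :: MArr a -> Int -> Int -> IO (MArr a)
cloneMArr (MArr src) (I# off) (I# n) = IO $ \s ->
  case cloneMutableArray# src off n s of (# s', arr #) -> (# s', MArr arr #)

toListMArr :: MArr a -> IO [a]
toListMArr arr = mapM (readMArr arr) [0 .. sizeMArr arr - 1]

-- Copy n elements from a source holding [0..9] into a destination filled
-- with -1 and return the destination. An off-by-one in the copy length
-- would show up as one element too many or too few being overwritten.
copyCase :: Int -> Int -> Int -> IO [Int]
copyCase srcOff dstOff n = do
  src <- newMArr 10 (0 :: Int)
  mapM_ (\i -> writeMArr src i i) [0 .. 9]
  dst <- newMArr 10 (-1)
  copyMArr src srcOff dst dstOff n
  toListMArr dst

-- What copyCase should return if the copy is exact.
expectedCopy :: Int -> Int -> Int -> [Int]
expectedCopy srcOff dstOff n =
  [ if i >= dstOff && i < dstOff + n then srcOff + (i - dstOff) else -1
  | i <- [0 .. 9] ]

-- Clone n elements starting at off from an array holding [0..9].
cloneCase :: Int -> Int -> IO [Int]
cloneCase off n = do
  src <- newMArr 10 (0 :: Int)
  mapM_ (\i -> writeMArr src i i) [0 .. 9]
  toListMArr =<< cloneMArr src off n
```

On a correct GHC, copyCase 2 3 4 should equal expectedCopy 2 3 4, and cloneCase 3 4 should return the four elements 3, 4, 5 and 6; all offsets and lengths passed in must stay within the 10-element arrays, since the primops do no bounds checking.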

comment:5 Changed 20 months ago by simonmar

  • Status changed from new to merge

comment:6 Changed 20 months ago by pcapriotti

  • Resolution set to fixed
  • Status changed from merge to closed