Opened 4 years ago

Closed 3 years ago

#4318 closed bug (fixed)

Crash while building HEAD on OS X

Reported by: gwright
Owned by:
Priority: normal
Milestone: 7.2.1
Component: Compiler
Version: 6.13
Keywords:
Cc: pho@…
Operating System: MacOS X
Architecture: x86_64 (amd64)
Type of failure: Building GHC failed
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Building HEAD on OS X 10.6.3 using 6.10.4 as a bootstrap results in a crash. The 6.10.4 compiler was built 64 bit using MacPorts.

The crash happens quite late in the build, in stage2. Here's the end of the build log:

"inplace/bin/ghc-stage2"   -H32m -O -Wall  -H64m -O0 -v -keep-tmp-files    -package-name dph-seq-0.4.0 -hide-all-packages -i -ilibraries/dph/dph-seq/../dph-common -ilibraries/dph/dph-seq/dist-install/build -ilibraries/dph/dph-seq/dist-install/build/autogen -Ilibraries/dph/dph-seq/dist-install/build -Ilibraries/dph/dph-seq/dist-install/build/autogen -Ilibraries/dph/dph-seq/.    -optP-include -optPlibraries/dph/dph-seq/dist-install/build/autogen/cabal_macros.h -package array-0.3.0.0 -package base-4.3.0.0 -package dph-base-0.4.0 -package dph-prim-seq-0.4.0 -package ghc-6.13.20100904 -package ghc-prim-0.2.0.0 -package random-1.0.0.2 -package template-haskell-2.4.0.0  -Odph -funbox-strict-fields -fcpr-off -fdph-this -package-name dph-seq -XTypeFamilies -XGADTs -XRankNTypes -XBangPatterns -XMagicHash -XUnboxedTuples -XTypeOperators -no-user-package-conf -rtsopts -O2 -XGenerics -O -dcore-lint -fno-warn-deprecated-flags -Wwarn    -odir libraries/dph/dph-seq/dist-install/build -hidir libraries/dph/dph-seq/dist-install/build -stubdir libraries/dph/dph-seq/dist-install/build -hisuf hi -osuf  o -hcsuf hc -c libraries/dph/dph-seq/../dph-common/Data/Array/Parallel/Lifted/PArray.hs -o libraries/dph/dph-seq/dist-install/build/Data/Array/Parallel/Lifted/PArray.o
Glasgow Haskell Compiler, Version 6.13.20100904, for Haskell 98, stage 2 booted by GHC version 6.10.4
Using binary package database: /Users/gwright/tmp/ghc/inplace/lib/package.conf.d/package.cache
wired-in package ghc-prim mapped to ghc-prim-0.2.0.0-inplace
wired-in package integer-gmp mapped to integer-gmp-0.2.0.0-inplace
wired-in package base mapped to base-4.3.0.0-inplace
wired-in package rts mapped to builtin_rts
wired-in package haskell98 mapped to haskell98-1.0.1.1-inplace
wired-in package template-haskell mapped to template-haskell-2.4.0.0-inplace
wired-in package dph-seq mapped to dph-seq-0.4.0-inplace
wired-in package dph-par mapped to dph-par-0.4.0-inplace
Hsc static flags: -fcpr-off -static
Created temporary directory: /var/folders/3v/3vsAKkKAGBOxE02+vfDFS++++TI/-Tmp-/ghc53820_0
*** C pre-processor:
/usr/bin/gcc -E -undef -traditional -v -I libraries/dph/dph-seq/dist-install/build -I libraries/dph/dph-seq/dist-install/build -I libraries/dph/dph-seq/dist-install/build/autogen -I libraries/dph/dph-seq/. -I /Users/gwright/tmp/ghc/compiler/../libffi/build/include -I /Users/gwright/tmp/ghc/compiler/stage2 -I /Users/gwright/tmp/ghc/compiler/../libraries/base/cbits -I /Users/gwright/tmp/ghc/compiler/../libraries/base/include -I /Users/gwright/tmp/ghc/compiler/. -I /Users/gwright/tmp/ghc/compiler/parser -I /Users/gwright/tmp/ghc/compiler/utils -I /Users/gwright/tmp/ghc/libraries/bytestring/include -I /Users/gwright/tmp/ghc/libraries/process/include -I /Users/gwright/tmp/ghc/libraries/directory/include -I /Users/gwright/tmp/ghc/libraries/unix/include -I /Users/gwright/tmp/ghc/libraries/old-time/include -I /Users/gwright/tmp/ghc/libraries/containers/include -I /Users/gwright/tmp/ghc/libraries/dph/dph-prim-interface/interface -I /Users/gwright/tmp/ghc/libraries/dph/dph-base/include -I /Users/gwright/tmp/ghc/libraries/time/include -I /Users/gwright/tmp/ghc/libraries/array/include -I /Users/gwright/tmp/ghc/libraries/base/include -I /Users/gwright/tmp/ghc/includes -I /Users/gwright/tmp/ghc/libffi/dist-install/build -D__HASKELL1__=5 -D__GLASGOW_HASKELL__=613 -D__HASKELL98__ -D__CONCURRENT_HASKELL__ -Ddarwin_BUILD_OS=1 -Dx86_64_BUILD_ARCH=1 -Ddarwin_HOST_OS=1 -Dx86_64_HOST_ARCH=1 -U __PIC__ -D__PIC__ -include libraries/dph/dph-seq/dist-install/build/autogen/cabal_macros.h -x c libraries/dph/dph-seq/../dph-common/Data/Array/Parallel/Lifted/PArray.hs -o /var/folders/3v/3vsAKkKAGBOxE02+vfDFS++++TI/-Tmp-/ghc53820_0/ghc53820_0.hscpp
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~38/src/configure --disable-checking --enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 --with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)
 /usr/libexec/gcc/i686-apple-darwin10/4.2.1/cc1 -E -traditional-cpp -quiet -v -I libraries/dph/dph-seq/dist-install/build -I libraries/dph/dph-seq/dist-install/build -I libraries/dph/dph-seq/dist-install/build/autogen -I libraries/dph/dph-seq/. -I /Users/gwright/tmp/ghc/compiler/../libffi/build/include -I /Users/gwright/tmp/ghc/compiler/stage2 -I /Users/gwright/tmp/ghc/compiler/../libraries/base/cbits -I /Users/gwright/tmp/ghc/compiler/../libraries/base/include -I /Users/gwright/tmp/ghc/compiler/. -I /Users/gwright/tmp/ghc/compiler/parser -I /Users/gwright/tmp/ghc/compiler/utils -I /Users/gwright/tmp/ghc/libraries/bytestring/include -I /Users/gwright/tmp/ghc/libraries/process/include -I /Users/gwright/tmp/ghc/libraries/directory/include -I /Users/gwright/tmp/ghc/libraries/unix/include -I /Users/gwright/tmp/ghc/libraries/old-time/include -I /Users/gwright/tmp/ghc/libraries/containers/include -I /Users/gwright/tmp/ghc/libraries/dph/dph-prim-interface/interface -I /Users/gwright/tmp/ghc/libraries/dph/dph-base/include -I /Users/gwright/tmp/ghc/libraries/time/include -I /Users/gwright/tmp/ghc/libraries/array/include -I /Users/gwright/tmp/ghc/libraries/base/include -I /Users/gwright/tmp/ghc/includes -I /Users/gwright/tmp/ghc/libffi/dist-install/build -imultilib x86_64 -D__DYNAMIC__ -D__HASKELL1__=5 -D__GLASGOW_HASKELL__=613 -D__HASKELL98__ -D__CONCURRENT_HASKELL__ -Ddarwin_BUILD_OS=1 -Dx86_64_BUILD_ARCH=1 -Ddarwin_HOST_OS=1 -Dx86_64_HOST_ARCH=1 -U __PIC__ -D__PIC__ -include libraries/dph/dph-seq/dist-install/build/autogen/cabal_macros.h libraries/dph/dph-seq/../dph-common/Data/Array/Parallel/Lifted/PArray.hs -o /var/folders/3v/3vsAKkKAGBOxE02+vfDFS++++TI/-Tmp-/ghc53820_0/ghc53820_0.hscpp -fPIC -mmacosx-version-min=10.6.4 -m64 -mtune=core2 -undef
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/i686-apple-darwin10/4.2.1/../../../../i686-apple-darwin10/include"
ignoring duplicate directory "libraries/dph/dph-seq/dist-install/build"
ignoring duplicate directory "/Users/gwright/tmp/ghc/libraries/base/include"
#include "..." search starts here:
#include <...> search starts here:
 libraries/dph/dph-seq/dist-install/build
 libraries/dph/dph-seq/dist-install/build/autogen
 libraries/dph/dph-seq/.
 /Users/gwright/tmp/ghc/compiler/../libffi/build/include
 /Users/gwright/tmp/ghc/compiler/stage2
 /Users/gwright/tmp/ghc/compiler/../libraries/base/cbits
 /Users/gwright/tmp/ghc/compiler/../libraries/base/include
 /Users/gwright/tmp/ghc/compiler/.
 /Users/gwright/tmp/ghc/compiler/parser
 /Users/gwright/tmp/ghc/compiler/utils
 /Users/gwright/tmp/ghc/libraries/bytestring/include
 /Users/gwright/tmp/ghc/libraries/process/include
 /Users/gwright/tmp/ghc/libraries/directory/include
 /Users/gwright/tmp/ghc/libraries/unix/include
 /Users/gwright/tmp/ghc/libraries/old-time/include
 /Users/gwright/tmp/ghc/libraries/containers/include
 /Users/gwright/tmp/ghc/libraries/dph/dph-prim-interface/interface
 /Users/gwright/tmp/ghc/libraries/dph/dph-base/include
 /Users/gwright/tmp/ghc/libraries/time/include
 /Users/gwright/tmp/ghc/libraries/array/include
 /Users/gwright/tmp/ghc/includes
 /Users/gwright/tmp/ghc/libffi/dist-install/build
 /usr/lib/gcc/i686-apple-darwin10/4.2.1/include
 /usr/include
 /System/Library/Frameworks (framework directory)
 /Library/Frameworks (framework directory)
End of search list.
*** Checking old interface for dph-seq:Data.Array.Parallel.Lifted.PArray:
*** Parser:
*** Renamer/typechecker:
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1qa :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1pZ
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1pZ]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1rk :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc,
 Wanted t_a1rp :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1s9 :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1,
 Wanted t_a1se :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1sY :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1sQ
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1sQ]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1tw :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1to
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1to,
 Wanted t_a1tB :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1to
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1to]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1ul :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud,
 Wanted t_a1uq :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1va :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1v2
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1v2]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1vI :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1vA
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1vA]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1wi :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1w8
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1w8]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1wW :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1wI
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1wI]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1xw :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1xl
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1xl]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1y6 :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1xV
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1xV]
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package containers-0.4.0.0 ... linking ... done.
Loading package filepath-1.2.0.0 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.5 ... linking ... done.
Loading package unix-2.4.0.1 ... linking ... done.
Loading package directory-1.0.1.2 ... linking ... done.
Loading package pretty-1.0.1.1 ... linking ... done.
Loading package process-1.0.1.3 ... linking ... done.
Loading package Cabal-1.9.2 ... linking ... done.
Loading package bytestring-0.9.1.7 ... linking ... done.
Loading package binary-0.5.0.2 ... linking ... done.
Loading package bin-package-db-0.0.0.0 ... linking ... done.
Loading package hpc-0.5.0.5 ... linking ... done.
Loading package template-haskell ... linking ... done.
Loading package ghc-6.13.20100904 ... linking ... done.
Loading package time-1.2.0.3 ... linking ... done.
Loading package random-1.0.0.2 ... linking ... done.
Loading package dph-base-0.4.0 ... linking ... done.
Loading package dph-prim-interface-0.4.0 ... linking ... done.
Loading package dph-prim-seq-0.4.0 ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
make[1]: *** [libraries/dph/dph-seq/dist-install/build/Data/Array/Parallel/Lifted/PArray.o] Segmentation fault
make: *** [all] Error 2

The above is from running the validate script with configuration
file validate.mk:

#
# Override some validate settings
#

WERROR =
SRC_HC_OPTS += -v -keep-tmp-files
#GhcStage1HcOpts += -DDEBUG
GhcStage2HcOpts += -DDEBUG -ddump-stg
GhcThreaded = NO
#GhcStage2HcOpts += -DDEBUG -ddump-simpl
#EXTRA_CABAL_CONFIGURE_FLAGS += --verbose=3
HADDOCK_DOCS = NO

Gdb and the Crash Reporter log show the proximate cause to be an attempt to jump to location 0x00000000. The last instruction executed is jmp *0(%rbp), but the stack pointer %rbp is pointing past the end of the valid stack.

I used the attached gdb_ghc file to run the crashing compilation under
gdb (source gdb_ghc). To sneak up on the crash I did the following:

redwing-dakota> cat gdb_trace 
redwing-apache:ghc gwright$ gdb
GNU gdb 6.3.50-20050815 (Apple version gdb-1469) (Wed May  5 04:36:56 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin".
(gdb) source gdbinit
(gdb) source gdb_ghc
Reading symbols for shared libraries .... done
(gdb) break GarbageCollect
Breakpoint 1 at 0x101ef3371
(gdb) ignore 1 13
Will ignore next 13 crossings of breakpoint 1.
(gdb) cont
The program is not being run.
(gdb) run
Starting program: /Users/gwright/tmp/ghc/inplace/lib/ghc-stage2 +RTS -V0 -i0 -RTS -B/Users/gwright/tmp/ghc/inplace/lib -pgmc /usr/bin/gcc -pgma /usr/bin/gcc -pgml /usr/bin/gcc -pgmP "/usr/bin/gcc -E -undef -traditional" -Wall -H512m -O0 -keep-tmp-files -package-name dph-seq-0.4.0 -hide-all-packages -i -ilibraries/dph/dph-seq/../dph-common -ilibraries/dph/dph-seq/dist-install/build -ilibraries/dph/dph-seq/dist-install/build/autogen -Ilibraries/dph/dph-seq/dist-install/build -Ilibraries/dph/dph-seq/dist-install/build/autogen -Ilibraries/dph/dph-seq/.    -optP-include -optPlibraries/dph/dph-seq/dist-install/build/autogen/cabal_macros.h -package array-0.3.0.0 -package base-4.3.0.0 -package dph-base-0.4.0 -package dph-prim-seq-0.4.0 -package ghc-6.13.20100904 -package ghc-prim-0.2.0.0 -package random-1.0.0.2 -package template-haskell-2.4.0.0  -Odph -funbox-strict-fields -fcpr-off -fdph-this -package-name dph-seq -XTypeFamilies -XGADTs -XRankNTypes -XBangPatterns -XMagicHash -XUnboxedTuples -XTypeOperators -no-user-package-conf -rtsopts -O2 -XGenerics -O -dcore-lint -fno-warn-deprecated-flags -Wwarn    -odir libraries/dph/dph-seq/dist-install/build -hidir libraries/dph/dph-seq/dist-install/build -stubdir libraries/dph/dph-seq/dist-install/build -hisuf hi -osuf  o -hcsuf hc -c libraries/dph/dph-seq/../dph-common/Data/Array/Parallel/Lifted/PArray.hs -o libraries/dph/dph-seq/dist-install/build/Data/Array/Parallel/Lifted/PArray.o
Reading symbols for shared libraries +++. done
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1qa :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1pZ
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1pZ]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1rk :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc,
 Wanted t_a1rp :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1rc]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1s9 :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1,
 Wanted t_a1se :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1s1]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1sY :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1sQ
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1sQ]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1tw :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1to
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1to,
 Wanted t_a1tB :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1to
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1to]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1ul :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud,
 Wanted t_a1uq :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1ud]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1va :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1v2
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1v2]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1vI :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1vA
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1vA]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1wi :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1w8
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1w8]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1wW :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1wI
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1wI]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1xw :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1xl
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1xl]
WARNING: file compiler/typecheck/TcTyFuns.lhs line 318
[Wanted t_a1y6 :: Data.Array.Parallel.Lifted.PArray.PRepr a_a1xV
                    ~
                  Data.Array.Parallel.Lifted.PArray.PRepr a_a1xV]

Breakpoint 1, 0x0000000101ef3371 in GarbageCollect ()
(gdb) break stg_ap_p_info
Breakpoint 2 at 0x101f07868
(gdb) ignore 2 23059
Will ignore next 23059 crossings of breakpoint 2.
(gdb) cont
Continuing.
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package containers-0.4.0.0 ... linking ... done.
Loading package filepath-1.2.0.0 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.5 ... linking ... done.
Loading package unix-2.4.0.1 ... linking ... done.
Loading package directory-1.0.1.2 ... linking ... done.
Loading package pretty-1.0.1.1 ... linking ... done.
Loading package process-1.0.1.3 ... linking ... done.
Loading package Cabal-1.9.2 ... linking ... done.
Loading package bytestring-0.9.1.7 ... linking ... done.
Loading package binary-0.5.0.2 ... linking ... done.
Loading package bin-package-db-0.0.0.0 ... linking ... done.
Loading package hpc-0.5.0.5 ... linking ... done.
Loading package template-haskell ... linking ... done.
Loading package ghc-6.13.20100904 ... linking ... done.
Loading package time-1.2.0.3 ... linking ... done.
Loading package random-1.0.0.2 ... linking ... done.
Loading package dph-base-0.4.0 ... linking ... done.
Loading package dph-prim-interface-0.4.0 ... linking ... done.
Loading package dph-prim-seq-0.4.0 ... linking ... done.
Loading package ffi-1.0 ... linking ... done.

Breakpoint 2, 0x0000000101f07868 in stg_ap_p_info ()
(gdb) break stg_ap_pp_info
Breakpoint 3 at 0x101f081f8
(gdb) ignore 3 75
Will ignore next 75 crossings of breakpoint 3.
(gdb) cont
Continuing.

Breakpoint 3, 0x0000000101f081f8 in stg_ap_pp_info ()
(gdb) s
Single stepping until exit from function stg_ap_pp_info, 
which has no line number information.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x0000000000000000 in ?? ()
(gdb) 

I backed up one call to stg_ap_pp_info and stepped through the instructions to get the following synthetic backtrace:

stg_ap_pp_info
ghczm6zi13zi20100904_DynFlags_zdfEqDynFlagzuzdczeze_info
ssBd_info
snRA_info
ssBe_info
snRB_info
sYE_info
sP6_info
sYD_info
s9nN_info
sdjh_info
sbyG_info
stg_ap_0_fast
stg_AP_info
stg_yield_to_interpreter
stg_returnToSchedNotPaused
StgReturn
scheduleWaitThread
dyld_stub___error
__error
scheduleWaitThread
stopHeapProfTimer
scheduleWaitThread
startHeapProfTimer
scheduleWaitThread
dyld_stub___error
__error
scheduleWaitThread
dirty_TSO
scheduleWaitThread
interpretBCO
scheduleWaitThread
dyld_stub___error
__error
scheduleWaitThread
stopHeapProfTimer
scheduleWaitThread
startHeapProfTimer
scheduleWaitThread
dyld_stub___error
__error
scheduleWaitThread
dirty_TSO
scheduleWaitThread
StgRunIsImplementedInAssembler
stg_returnToStackTop
stg_enter_info
stg_ap_pp_info
??
stg_upd_frame_info
sdoK_info
??
sdoL_info
base_DataziTypeable_typeOf_info
stg_ap_0_fast
??
newDynCAF
??

I discovered later that the fastest way to zero in on the crash was to set a breakpoint at newDynCAF. The crash occurs immediately after the first invocation of newDynCAF in the program.

The symbols sdoK_info, sdoL_info and sbyG_info occur in a number of the saved temporary files, but they are common to the file generated by compiling TcSplice.lhs. So perhaps that has something to do with it.

I'm also suspicious that the crash happens right after the first invocation of newDynCAF, but that is mere suspicion right now.

The crash itself is here:

Breakpoint 1, 0x0000000101ef7d50 in newDynCAF ()
(gdb) display/i $rip 
1: x/i $rip  0x101ef7d50 <newDynCAF>:	mov    (%rsi),%rax
(gdb) si
0x0000000101ef7d53 in newDynCAF ()
1: x/i $rip  0x101ef7d53 <newDynCAF+3>:	mov    %rax,0x18(%rsi)
(gdb) 
0x0000000101ef7d57 in newDynCAF ()
1: x/i $rip  0x101ef7d57 <newDynCAF+7>:	mov    0x33ca72(%rip),%rax        # 0x1022347d0 <revertible_caf_list>
(gdb) 
0x0000000101ef7d5e in newDynCAF ()
1: x/i $rip  0x101ef7d5e <newDynCAF+14>:	mov    %rax,0x10(%rsi)
(gdb) 
0x0000000101ef7d62 in newDynCAF ()
1: x/i $rip  0x101ef7d62 <newDynCAF+18>:	mov    %rsi,0x33ca67(%rip)        # 0x1022347d0 <revertible_caf_list>
(gdb) 
0x0000000101ef7d69 in newDynCAF ()
1: x/i $rip  0x101ef7d69 <newDynCAF+25>:	retq   
(gdb) 
0x0000000126ec63dd in ?? ()
1: x/i $rip  0x126ec63dd:	lea    -0x8(%r12),%rax
(gdb) 
0x0000000126ec63e2 in ?? ()
1: x/i $rip  0x126ec63e2:	mov    %rax,0x8(%rbx)
(gdb) 
0x0000000126ec63e6 in ?? ()
1: x/i $rip  0x126ec63e6:	mov    0x2bc30a3(%rip),%rax        # 0x129a89490
(gdb) 
0x0000000126ec63ed in ?? ()
1: x/i $rip  0x126ec63ed:	mov    %rax,(%rbx)
(gdb) 
0x0000000126ec63f0 in ?? ()
1: x/i $rip  0x126ec63f0:	mov    0x2bc32a9(%rip),%rax        # 0x129a896a0
(gdb) 
0x0000000126ec63f7 in ?? ()
1: x/i $rip  0x126ec63f7:	mov    %rax,-0x10(%rbp)
(gdb) 
0x0000000126ec63fb in ?? ()
1: x/i $rip  0x126ec63fb:	lea    -0x8(%r12),%rax
(gdb) 
0x0000000126ec6400 in ?? ()
1: x/i $rip  0x126ec6400:	mov    %rax,-0x8(%rbp)
(gdb) 
0x0000000126ec6404 in ?? ()
1: x/i $rip  0x126ec6404:	lea    0x760abd(%rip),%rax        # 0x127626ec8
(gdb) 
0x0000000126ec640b in ?? ()
1: x/i $rip  0x126ec640b:	lea    0x1(%rax),%rbx
(gdb) 
0x0000000126ec640f in ?? ()
1: x/i $rip  0x126ec640f:	add    $0xfffffffffffffff0,%rbp
(gdb) 
0x0000000126ec6413 in ?? ()
1: x/i $rip  0x126ec6413:	jmpq   *0x0(%rbp)
(gdb) 
0x0000000000000000 in ?? ()
Disabling display 1 to avoid infinite recursion.
1: x/i $rip  0x0:	warning: Got an error handling event: "Cannot access memory at address 0x0".
(gdb) p8 $rbp
0x12345da30:	0x127627540
0x12345da28:	0x127626ec3
0x12345da20:	0x127627139
0x12345da18:	0x1003163a0 <sbyE_info>
0x12345da10:	0x127627540
0x12345da08:	0x101f07868 <stg_ap_p_info>
0x12345da00:	0x121bb3510
0x12345d9f8:	0x0
(gdb) 
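
One way to read the last few instructions of the trace above: the code that runs after newDynCAF returns stores two words just below the Haskell stack pointer, moves the pointer down by 16 bytes, and jumps through the new top of stack. The C below is only a hypothetical model of that sequence, not RTS code; the names push_frame_and_get_target, sp, hp and slot are made up for illustration. It shows how a zero in the rip-relative word at 0x129a896a0 turns into a jump to address 0, which matches the 0x0 at 0x12345d9f8 in the stack dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the code at 0x126ec63f0..0x126ec6413.
 *   sp   models %rbp (the Haskell stack pointer; the stack grows down),
 *   hp   models %r12,
 *   slot models the rip-relative word at 0x129a896a0, which the linker
 *        was presumably meant to fill with a real address.
 * Returns the address that jmpq *0x0(%rbp) would transfer control to. */
static uintptr_t push_frame_and_get_target(uintptr_t **sp, uintptr_t hp,
                                           const uintptr_t *slot)
{
    assert(sp != NULL && *sp != NULL && slot != NULL);

    (*sp)[-2] = *slot;   /* mov slot(%rip),%rax ; mov %rax,-0x10(%rbp) */
    (*sp)[-1] = hp - 8;  /* lea -0x8(%r12),%rax ; mov %rax,-0x8(%rbp)  */
    *sp -= 2;            /* add $0xfffffffffffffff0,%rbp (i.e. -16)    */
    return (*sp)[0];     /* jmpq *0x0(%rbp)                            */
}
```

With slot holding zero the function returns 0, which is the address gdb reports for the fault. With any non-zero value in the slot, the jump would go there instead.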

This might be the same bug as one of the other OS X crashes, but the surface symptoms are different, so I'm making it a separate ticket.

Attachments (1)

gdb_ghc (1.5 KB) - added by gwright 4 years ago.
GDB commands to reproduce crash of HEAD build

Change History (23)

Changed 4 years ago by gwright

GDB commands to reproduce crash of HEAD build

comment:1 Changed 4 years ago by gwright

In the synthetic backtrace I gave, the crash occurs at the bottom. This is opposite from the usual way a C stack is shown in gdb.

comment:2 Changed 4 years ago by gwright

Another piece of information is that the crash occurs not long after the last library (ffi-1.0) is loaded. If I set breakpoints at loadObj and stg_ap_p_info before starting the program, when loading ffi-1.0 I've hit the stg_ap_p_info breakpoint 208833 times. The crash occurs after hitting stg_ap_p_info 209211 times. So the failure happens (relatively) quickly after the libraries are loaded.

comment:3 Changed 4 years ago by simonmar

I expect it's a bug in the linker. newDynCAF is a clue that you're running dynamically-loaded code, and it looks like the code has tried to jump to address zero: probably an address that should have been patched by the linker.

You can turn on verbose debug output in the linker by compiling stage2 with debugging on (cd ghc; rm stage2/build/tmp/ghc-stage2; make 2 GhcDebugged=YES), and then run it with +RTS -Dl. That will show you details about all the object files and relocations.

comment:4 Changed 4 years ago by PHO

  • Cc pho@… added

comment:5 Changed 4 years ago by gwright

This does appear to be a linker bug. I can reproduce it just by running ghc-stage2 --interactive. The incorrectly resolved symbol is _stg_bh_upd_frame_info. This address is pushed on the stack in _base_GHCziIOziHandleziFD_stdin_info (referenced in initInterpBuffering). _stg_bh_upd_frame_info is resolved as zero, and when the null pointer on the stack is dereferenced the crash occurs. In the simple case of just trying to start ghc-stage2 interactively, it happens when threadPaused scans the stack.

I'll try +RTS -Dl and report what I find.

comment:6 Changed 4 years ago by gwright

I rebuilt ghc-stage2 according to SimonM's instructions above. The additional output is meagre and unilluminating. I get

redwing-apache:ghc gwright$ inplace/bin/ghc-stage2 --interactive +RTS -Dl
GHCi, version 6.13.20100904: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... loadObj /Users/gwright/tmp/ghc/libraries/ghc-prim/dist-install/build/HSghc-prim-0.2.0.0.o
linking ... done.
Loading package integer-gmp ... loadObj /Users/gwright/tmp/ghc/libraries/integer-gmp/dist-install/build/HSinteger-gmp-0.2.0.0.o
linking ... done.
Loading package base ... addDLL: dll_name = 'libiconv.dylib'
internal_dlopen: dll_name = 'libiconv.dylib'
loadObj /Users/gwright/tmp/ghc/libraries/base/dist-install/build/HSbase-4.3.0.0.o
linking ... done.
Loading package ffi-1.0 ... loadObj /Users/gwright/tmp/ghc/libffi/dist-install/build/HSffi.o
linking ... done.
Segmentation fault

This isn't telling me much. Is there an enhanced interrogation method to make the linker tell us what it's been up to?

comment:7 Changed 4 years ago by simonmar

The author of the Mach-O linker was a bit light on the debugging output. I see a couple of debugBelch calls in there, but that's all. I think you'll need to add some more debug output to see what's going on, or debug it directly in gdb.

comment:8 Changed 4 years ago by igloo

  • Milestone set to 7.0.1

comment:9 Changed 4 years ago by gwright

At last some progress: there's more than one bug here, and I've located and fixed one of them. The check for symbol->n_value == 0 in relocateSection is just wrong. (In fact, according to the header file comments and my reading of the Mach-O object file format, it is simply nonsensical.) This is probably a remnant of the original reverse-engineering approach used to get the Mach-O linker going. I've fixed it to check the symbol->n_type flags correctly.
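
To illustrate why n_value on its own cannot tell a defined symbol from an undefined one, here is a minimal sketch that classifies a symbol-table entry by its n_type bits. It is not the change made to Linker.c. The struct and the two helper functions are hypothetical names, and the constants are local copies of the values in <mach-o/nlist.h> so that the sketch compiles anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of constants from <mach-o/nlist.h>. */
#define N_TYPE 0x0e  /* mask selecting the type bits of n_type */
#define N_EXT  0x01  /* external symbol bit                    */
#define N_UNDF 0x00  /* type: undefined, n_sect is NO_SECT     */
#define N_SECT 0x0e  /* type: defined in section n_sect        */

/* Same field layout as struct nlist_64; renamed to mark it as a sketch. */
struct nlist_64_sketch {
    uint32_t n_strx;   /* offset of the name in the string table   */
    uint8_t  n_type;   /* type flags                               */
    uint8_t  n_sect;   /* section number, or 0                     */
    uint16_t n_desc;
    uint64_t n_value;  /* address, or size for a common symbol     */
};

/* A symbol defined in this object file. Its n_value may well be 0,
 * for example the first symbol of a section that starts at address 0. */
static bool is_defined_in_section(const struct nlist_64_sketch *sym)
{
    assert(sym != NULL);
    return (sym->n_type & N_TYPE) == N_SECT;
}

/* A symbol that has to be found elsewhere (another object, the RTS, ...).
 * An undefined external with a non-zero n_value is a common symbol,
 * where n_value is its size, so it is excluded here. */
static bool needs_external_lookup(const struct nlist_64_sketch *sym)
{
    assert(sym != NULL);
    return (sym->n_type & N_TYPE) == N_UNDF
        && (sym->n_type & N_EXT) != 0
        && sym->n_value == 0;
}
```

A symbol with n_type = N_SECT | N_EXT and n_value = 0 is defined, even though a bare n_value == 0 test would treat it as missing.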

I still don't know why the RTS symbols are not being resolved. It's not a failure to find the RTS symbols; instead, it appears as if there is no attempt to look them up at all. I've added extensive debugging output to the Mach-O linker and will include it with my eventual patch.

comment:10 Changed 4 years ago by gwright

Now I know what the bug is: relocations of type X86_64_RELOC_GOT and X86_64_RELOC_GOT_LOAD are not handled by the Mach-O linker. There is a stub of code for these cases, but it does something nonsensical. I'm guessing it was simply never finished.
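
For readers unfamiliar with these relocation types, the following is a rough sketch of what resolving one involves. The linker keeps a table slot containing the symbol's 64-bit address and patches the instruction's 32-bit pc-relative field to point at that slot. This is an illustration under simplifying assumptions, not the code that went into Linker.c. The function name apply_got_reloc and its parameters are hypothetical, and it assumes the patched field is the last four bytes of its instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from <mach-o/x86_64/reloc.h>. */
enum { X86_64_RELOC_GOT_LOAD = 3, X86_64_RELOC_GOT = 4 };

/* Patch the 32-bit pc-relative field at image[offset] so that it refers
 * to the GOT slot at got_slot_addr.
 *   image_base is the address at which image[0] is mapped.
 * The value already in the field is treated as the addend.
 * Returns 0 on success, -1 if the slot is out of 32-bit range. */
static int apply_got_reloc(uint8_t *image, uint64_t image_base,
                           size_t offset, uint64_t got_slot_addr)
{
    int32_t addend;
    assert(image != NULL);
    memcpy(&addend, image + offset, sizeof addend);

    /* %rip points just past the 4-byte field when the instruction runs
     * (assumption: the field ends the instruction). */
    uint64_t next_insn = image_base + offset + 4;
    int64_t disp = (int64_t)(got_slot_addr - next_insn) + addend;
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    int32_t disp32 = (int32_t)disp;
    memcpy(image + offset, &disp32, sizeof disp32);
    return 0;
}
```

The range check matters for a runtime linker: if the slot ends up more than 2 GB away from the code, the displacement cannot be encoded, so the table has to be allocated near the loaded object.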

This doesn't seem too hard to fix, but it's fiddly. If I'm lucky and my current hunch works, maybe a patch in a couple of days. Otherwise, I'll need to try to understand Apple's ld64 code to understand these relocations in more detail.

comment:11 Changed 4 years ago by igloo

Good stuff, thanks Greg. If nothing else is possible in time, it would be good to make the linker fail (or at least warn) upon seeing one of these relocations.
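
A minimal sketch of the "fail or at least warn" idea, assuming a pre-pass over the relocation types before any patching is done. The names reloc_supported and check_relocs are hypothetical and do not exist in the RTS. Which types count as supported is an assumption chosen to mirror the state described in this ticket, and the numeric values are those from <mach-o/x86_64/reloc.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relocation type numbers from <mach-o/x86_64/reloc.h>. */
enum reloc_type_sketch {
    RELOC_UNSIGNED = 0, RELOC_SIGNED = 1, RELOC_BRANCH = 2,
    RELOC_GOT_LOAD = 3, RELOC_GOT = 4, RELOC_SUBTRACTOR = 5
};

/* Assumption for this sketch: the GOT types are not implemented. */
static int reloc_supported(int type)
{
    switch (type) {
    case RELOC_UNSIGNED:
    case RELOC_SIGNED:
    case RELOC_BRANCH:
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 if every relocation type is supported. Otherwise reports the
 * first unsupported one on err and returns 0, so that the caller can
 * refuse to load the object instead of leaving a zero behind. */
static int check_relocs(const int *types, size_t n, FILE *err)
{
    assert(types != NULL && err != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!reloc_supported(types[i])) {
            fprintf(err, "unsupported relocation type %d at index %zu\n",
                    types[i], i);
            return 0;
        }
    }
    return 1;
}
```

Failing at load time in this way would turn the late segfault reported above into an immediate, attributable error.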

comment:12 Changed 4 years ago by gwright

I have ghci from HEAD working, built 64 bit on OS X 10.6. At least it works well enough to define fact and fib at the prompt and get the right answers. (Certainly an improvement over a segfault or abort trap!)

I'll resync with HEAD and apply my patches, then run the testsuite. There may still be some other problems with the linker lurking, since there have clearly been paths through the code that were never exercised in the 64 bit Mach-O case.

The final change to fix the crash wasn't much; better to be lucky than smart.

comment:13 Changed 3 years ago by gwright

Okay, my changes to Linker.c merged into a fresh pull from 18 Oct 2010 are quite encouraging. Here are the results of the validate script (with stop on error disabled in mk/validate.mk):

OVERALL SUMMARY for test run started at Tue Oct 19 20:38:56 EDT 2010
    2610 total tests, which gave rise to
    9751 test cases, of which
       0 caused framework failures
    7444 were skipped

    2194 expected passes
      79 expected failures
       1 unexpected passes
      33 unexpected failures

Unexpected passes:
   simplrun006(optc)

Unexpected failures:
   1372(normal)
   1959(normal)
   2578(normal)
   T1969(normal)
   T3007(normal)
   T3245(normal)
   T3294(normal)
   T4059(normal)
   break001(ghci)
   break006(ghci)
   bug1465(normal)
   cabal01(normal)
   cabal04(normal)
   derefnull(normal)
   driver062a(normal)
   driver062b(normal)
   driver062c(normal)
   driver062d(normal)
   driver062e(normal)
   driver081a(normal)
   driver081b(normal)
   gadt23(normal)
   ghcpkg05(normal)
   hs-boot(normal)
   mod179(normal)
   outofmem(normal)
   print019(ghci)
   prog003(ghci)
   recomp004(normal)
   rn.prog006(normal)
   rtsOpts(normal)
   tcfail138(normal)
   withRtsOpts(normal)

-------------------------------------------------------------------
Oops!  Looks like you have some unexpected test results or framework failures.
Please fix them before pushing/sending patches.
-------------------------------------------------------------------

Before the changes to Linker.c, every ghci test ended in a segfault or abort trap.

I'll tidy the debug messages I added and send the patch upstream tomorrow. I'll also
run a full build and the testsuite.

comment:14 Changed 3 years ago by gwright

For completeness, the ghc version in the above was 7.1.20101018.

comment:15 Changed 3 years ago by gwright

The result of running the testsuite on the complete build:

OVERALL SUMMARY for test run started at Tue Oct 19 21:32:59 EDT 2010
    2610 total tests, which gave rise to
    9754 test cases, of which
       0 caused framework failures
    1864 were skipped

    7544 expected passes
     248 expected failures
       2 unexpected passes
      96 unexpected failures

Unexpected passes:
   simplrun006(optc,optasm)

Unexpected failures:
   1372(normal)
   1959(normal)
   2578(normal)
   4038(ghci)
   CPUTime001(optc,optasm,ghci,threaded2)
   IPRun(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
   T1735(ghci)
   T1969(normal)
   T3007(normal)
   T3245(normal,optc,hpc,optasm,threaded1,threaded2)
   T3294(normal)
   T4059(normal)
   ThreadDelay001(threaded1,threaded2)
   apirecomp001(normal)
   arith005(ghci)
   arith012(ghci)
   arith015(ghci)
   break001(ghci)
   break006(ghci)
   bug1465(normal)
   cabal01(normal)
   cabal04(normal)
   cgrun014(ghci)
   cgrun034(ghci)
   cgrun044(ghci)
   cholewo-eval(ghci)
   derefnull(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
   driver062a(normal)
   driver062b(normal)
   driver062c(normal)
   driver062d(normal)
   driver062e(normal)
   driver081a(normal)
   driver081b(normal)
   dynamic_flags_001(normal)
   gadt23(normal)
   ghcpkg05(normal)
   hClose003(threaded2)
   hpc_markup_multi_001(normal)
   hpc_markup_multi_002(normal)
   hpc_markup_multi_003(normal)
   hs-boot(normal,optc,hpc,optasm)
   mod179(normal)
   num010(ghci)
   numrun014(ghci)
   outofmem(normal)
   print019(ghci)
   process003(threaded2)
   prog003(ghci)
   rand001(ghci)
   readRun002(ghci)
   readRun003(ghci)
   recomp001(normal)
   recomp004(normal)
   recomp007(normal)
   rn.prog006(normal)
   rtsOpts(normal)
   showDouble(ghci)
   signals002(ghci)
   signals004(ghci,threaded1,threaded2)
   tc003(hpc)
   tcfail138(normal)
   tcrun020(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
   withRtsOpts(normal)

More tests failed than with the validate script. Overall, the situation doesn't look too bad, though.

comment:16 follow-up: Changed 3 years ago by simonmar

Looks good, though of course we should be aiming to get the failing tests to be exactly the same as the Linux builds. FYI last night's HEAD build results:

Unexpected failures:
   IndTypesPerf(normal)
   T1735(ghci)
   T1969(normal)
   T3294(normal)
   T3330a(normal)
   hpc_markup_multi_001(normal)
   hpc_markup_multi_002(normal)
   hpc_markup_multi_003(normal)
   tc003(hpc,profc,profasm)
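
One way to see how far the OS X results are from the Linux ones is to diff the two "Unexpected failures" lists per test and way. The following is only a rough sketch, not part of the testsuite: `parse_failures` is a hypothetical helper, and the two inputs are short excerpts of the lists posted in this ticket rather than the full summaries.

```python
import re

# One summary entry looks like "name(way1,way2,...)".
ENTRY = re.compile(r"(\S+)\(([^)]*)\)")

def parse_failures(summary_text):
    """Collect (test, way) pairs from the 'Unexpected failures:' block."""
    pairs = set()
    in_block = False
    for line in summary_text.splitlines():
        stripped = line.strip()
        if stripped == "Unexpected failures:":
            in_block = True
            continue
        if not in_block or not stripped:
            continue
        match = ENTRY.fullmatch(stripped)
        if match is None:
            break  # first non-entry line ends the block
        name, ways = match.groups()
        pairs.update((name, way) for way in ways.split(","))
    return pairs

# Excerpts only, copied from the summaries in this ticket.
osx_excerpt = """
Unexpected failures:
   T1735(ghci)
   T1969(normal)
   T3294(normal)
   break001(ghci)
   tc003(hpc)
"""

linux_excerpt = """
Unexpected failures:
   IndTypesPerf(normal)
   T1735(ghci)
   T1969(normal)
   T3294(normal)
   tc003(hpc,profc,profasm)
"""

osx = parse_failures(osx_excerpt)
linux = parse_failures(linux_excerpt)

only_osx = sorted(osx - linux)      # failing on OS X but not on Linux
only_linux = sorted(linux - osx)    # failing on Linux but not on OS X

print("Only on OS X:", only_osx)
print("Only on Linux:", only_linux)
```

Comparing per way rather than per test keeps a case like tc003 visible, where both platforms fail but in different ways.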

comment:17 in reply to: ↑ 16 Changed 3 years ago by gwright

Replying to simonmar:

Looks good, though of course we should be aiming to get the failing tests to be exactly the same as the Linux builds. FYI last night's HEAD build results...

Toward that end, should I file a bug for each test failure separately, or make some attempt to bundle them into groups of related failures?

One bug per test case has the advantage of being easy to parcel out. Now that we can build 64 bit on OS X SL, we should be able to involve more people in bug hunts.

comment:18 Changed 3 years ago by simonmar

Often it's obvious when several tests are failing for the same reason (identical error messages, for instance); in that case, file a single ticket for all the failures. If you're not sure, or the error message is generic (e.g. segfault), then file separate tickets.
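
The rule of thumb above could be sketched roughly as follows. This is an illustration only: `plan_tickets`, the list of "generic" markers, and the test names and messages in the example are all made up for the sketch and are not taken from the actual test output.

```python
from collections import defaultdict

# Assumed set of messages too generic to justify bundling failures together.
GENERIC_MARKERS = ("segmentation fault", "abort trap", "bus error")

def plan_tickets(failures):
    """Map {test name: error message} to a list of tickets (lists of tests).

    Tests with identical, specific messages share one ticket; tests with an
    empty or generic message each get a ticket of their own.
    """
    by_message = defaultdict(list)
    tickets = []
    for test, message in sorted(failures.items()):
        normalized = " ".join(message.lower().split())
        if not normalized or any(m in normalized for m in GENERIC_MARKERS):
            tickets.append([test])
        else:
            by_message[normalized].append(test)
    tickets.extend(by_message.values())
    return sorted(tickets)

# Hypothetical failures, for illustration only.
example = {
    "testA": "Segmentation fault",
    "testB": "Segmentation fault",
    "testC": "unknown symbol `_foo'",
    "testD": "unknown  symbol `_foo'",
}

tickets = plan_tickets(example)
for ticket in tickets:
    print(ticket)
```

Whitespace and case are normalized before comparing, so messages that differ only in spacing still land in the same ticket.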

comment:19 Changed 3 years ago by igloo

  • Milestone changed from 7.0.1 to 7.0.2

comment:20 Changed 3 years ago by igloo

  • Milestone changed from 7.0.2 to 7.2.1

comment:21 Changed 3 years ago by gwright

This bug should be closed. The patch that fixed most of the problem (incorrect lookup of external symbols) was applied, although I didn't attach it to this ticket. The most obvious remaining linker bug was repaired by #4867.

comment:22 Changed 3 years ago by igloo

  • Resolution set to fixed
  • Status changed from new to closed

Thanks, Greg.
