Opened 8 years ago

Closed 5 years ago

#951 closed bug (wontfix)

stage2 on sparc dies with "schedule: re-entered unsafely"

Reported by: duncan
Owned by: benl
Priority: normal
Milestone: 6.10.2
Component: Build System
Version: 6.6
Keywords:
Cc: duncan.coutts@…
Operating System: Unknown/Multiple
Architecture: sparc
Type of failure:
Difficulty: Unknown
Test Case: N/A
Blocked By:
Blocking:
Related Tickets:

Description

Building a registerised GHC-6.6 on Sparc Solaris or Sparc Linux gives a stage2 with the following problem:

$ ghc --version
ghc-6.6: schedule: re-entered unsafely.
   Perhaps a 'foreign import unsafe' should be 'safe'?

This happens for Christian Maeder on Solaris and for me on Linux. It does not seem to be related to SPLIT_OBJS= in the mk/build.mk.

Christian reports that it works for him with gcc-4.0.3 but not gcc-3.4.4, though that was not the only thing different between the two configurations:

It works with gcc_4.0.3_s10 on
"SunOS leo 5.10 Generic_118833-20 sun4u sparc SUNW,Sun-Fire-280R"

It crashes as above on
"SunOS cni 5.10 Generic_118833-24 sun4u sparc SUNW,Sun-Fire-V240"
with gcc_3.4.4_s10.

My results do not contradict this as I am using gcc-3.4.6 on Sparc Linux.

Attachments (2)

cni-warnings.txt (2.5 KB) - added by maeder@… 8 years ago.
Warning messages of failing build
leo-warnings.txt (1.4 KB) - added by maeder@… 8 years ago.
warning messages of successful build

Change History (14)

comment:1 Changed 8 years ago by simonmar

  • Milestone set to 6.6.1

This should be debuggable. First try to find a program that isn't ghc and still crashes - start with testsuite/tests/ghc-regress/codeGen. You'll need to run them the threaded1 way: make WAY=threaded1.

When you find a test that crashes, run it with +RTS -Ds. Take a look with gdb and see if you can see where the cap->in_haskell field is not being set as it should be - there are only two places in the RTS where it gets set to true, both in rts/Schedule.c.

Changed 8 years ago by maeder@…

  • Attachment cni-warnings.txt added

Warning messages of failing build

Changed 8 years ago by maeder@…

  • Attachment leo-warnings.txt added

Warning messages of successful build

comment:2 Changed 8 years ago by maeder@…

I could reproduce the error on leo as well (so now on both machines) by using gcc_3.4.4_s10, so I suppose gcc is the problem (and not NumCPU or something else).

comment:3 Changed 7 years ago by igloo

  • Test Case set to N/A

comment:4 Changed 7 years ago by maeder@…

  • Cc duncan.coutts@… added

I was able to install gcc-4.0.3 locally (configured with --enable-languages=c only) and create a working stage2 compiler (on our machine named cni)!

N.B. The old compiler gcc_3.4.4, which caused "schedule: re-entered unsafely", even generates a seg-faulting stage2 compiler if "-threaded" is commented out in the compiler Makefiles.

I've no idea why these gcc versions behave so differently.

comment:5 Changed 7 years ago by maeder@…

If I use

SRC_HC_OPTS += -threaded -debug -optl-L/usr/local/lib -optl-lbfd -optl-liberty

in compiler/Makefile.ghcbin then the error is gone.

comment:6 Changed 7 years ago by guest

Without -threaded in compiler/Makefile.ghcbin gdb shows:

Starting program: /home/maeder/haskell/V240-solaris/ghc/compiler/stage2/ghc-6.7
warning: Lowest section in /lib/libdl.so.1 is .dynamic at 00000094

Program received signal SIGSEGV, Segmentation fault.
0x00da4ca8 in stg_ap_v_fast ()

Using -debug without -threaded also works.

comment:7 Changed 7 years ago by igloo

  • Milestone changed from 6.6.1 to _|_

Set milestone to _|_ as this is a registerised bug in a non-actively-maintained port.

comment:8 Changed 6 years ago by simonmar

  • Operating System changed from Multiple to Unknown/Multiple

comment:9 Changed 5 years ago by simonmar

  • Component changed from Runtime System to Build System
  • Milestone changed from _|_ to 6.10.2

We should update the building guide to say that gcc 4.x is required when building on Sparc, and point to this bug.

comment:10 Changed 5 years ago by benl

  • Owner set to benl
  • Status changed from new to assigned

comment:11 Changed 5 years ago by duncan

I tried using ghc-6.8.3 and gcc-3.4.3 (the Solaris 10 /usr/sfw/bin/gcc). The build went through fine but the stage2/ghc-inplace just hangs when run. GDB reports that we're waiting on something:

(gdb) bt
#0  0xff044a30 in __lwp_park () from /lib/libc.so.1
#1  0xff03e968 in cond_sleep_queue () from /lib/libc.so.1
#2  0xff03ea84 in cond_wait_queue () from /lib/libc.so.1
#3  0xff03f004 in cond_wait () from /lib/libc.so.1
#4  0xff03f040 in pthread_cond_wait () from /lib/libc.so.1
#5  0x018ef654 in waitCondition ()
#6  0x018e52c4 in waitForReturnCapability ()
#7  0x018e86c4 in rts_lock ()
#8  0x018e7de0 in real_main ()
#9  0x018e7f64 in main ()

Relinking stage2/ghc-6.8.3 without -threaded we get a segfault instead. GDB reports that it occurs in stg_ap_v_fast:

(gdb) bt
#0  0x019033b4 in stg_ap_v_fast ()
#1  0x018e9a74 in scheduleWaitThread ()
#2  0x018e6b6c in real_main ()
#3  0x018e6cc8 in main ()

Relinking stage2 with the debug rts seems to make it work ok. At least it does not die on startup. In fact stage2/ghc-inplace --interactive also works in that it can eval 1+1.

Anyway, the conclusion does seem to be that this version of gcc does not like the rts. It's interesting that the debug one works. Does it get built with less aggressive gcc optimisations perhaps?

For the moment I think I'd just recommend that we make ghc's ./configure reject gcc-3.x on Sparc platforms. We may well also like to warn if people are using gcc-4.2 as it is known to be very slow. We've not tested 4.3 because the mangler didn't like it last time we tried.
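
A minimal sketch of what such a check could look like, assuming a hypothetical helper named gcc_rejected_on_sparc and assuming the version string would come from something like `gcc -dumpversion`. This is only an illustration of the idea, not GHC's actual configure code:

```shell
#!/bin/sh
# Hypothetical helper (not part of GHC's configure script).
# Arguments: target CPU name (e.g. "sparc") and a gcc version string
# (e.g. "3.4.4"). Exit status 0 means "reject this compiler".
gcc_rejected_on_sparc() {
    cpu=$1
    version=$2
    major=${version%%.*}    # text before the first dot, e.g. "3"
    case "$cpu" in
        sparc*)
            # Reject any gcc older than 4.x on Sparc.
            [ "$major" -lt 4 ]
            ;;
        *)
            # Other architectures are not affected by this ticket.
            return 1
            ;;
    esac
}

# Usage sketch with the gcc versions mentioned in this ticket.
for v in 3.4.4 3.4.6 4.0.3; do
    if gcc_rejected_on_sparc sparc "$v"; then
        echo "gcc $v: rejected on sparc"
    else
        echo "gcc $v: accepted on sparc"
    fi
done
```

A warning for gcc-4.2 could be added in the same place with a second case on the version string; it is left out here to keep the sketch short.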

comment:12 Changed 5 years ago by benl

  • Resolution set to wontfix
  • Status changed from assigned to closed

See the Solaris building guide at Building/Solaris.