Opened 7 years ago

Last modified 7 months ago

#2805 new bug

Test ffi009(ghci) fails on PPC Mac OS X

Reported by: thorkilnaur
Owned by:
Priority: lowest
Milestone: 7.12.1
Component: GHCi
Version: 6.11
Keywords:
Cc: pho@…, dterei
Operating System: MacOS X
Architecture: powerpc
Type of failure: None/Unknown
Test Case: ffi009(ghci)
Blocked By:
Blocking:
Related Tickets:
Differential Revisions:

Description

The test ffi009(ghci) has failed for a while on PPC Mac OS X (http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20head%202/builds/156/steps/runtestsuite/logs/unexpected):

=====> ffi009(ghci)
cd ./ffi/should_run && '/Volumes/tn18_HD_1/Users/thorkilnaur/tn/buildbot/ghc/tnaur-ppc-osx-2/tnaur-ppc-osx-head-2/build/ghc/stage2-inplace/ghc' -fforce-recomp -dcore-lint -dcmm-lint -Dpowerpc_apple_darwin  -dno-debug-output ffi009.hs --interactive -v0 -ignore-dot-ghci  -fglasgow-exts <ffi009.genscript 1>ffi009.interp.stdout 2>ffi009.interp.stderr
/bin/sh: line 1: 98633 Illegal instruction     '/Volumes/tn18_HD_1/Users/thorkilnaur/tn/buildbot/ghc/tnaur-ppc-osx-2/tnaur-ppc-osx-head-2/build/ghc/stage2-inplace/ghc' -fforce-recomp -dcore-lint -dcmm-lint -Dpowerpc_apple_darwin -dno-debug-output ffi009.hs --interactive -v0 -ignore-dot-ghci -fglasgow-exts < ffi009.genscript > ffi009.interp.stdout 2> ffi009.interp.stderr
Wrong exit code (expected 0 , actual 132 )
Stdout:
Testing 5 Int arguments...
True
True
True
True
True
True
True
True
True
True
Testing 11 Double arguments...

Stderr:

*** unexpected failure for ffi009(ghci)

An extract from the so-called crash report indicates a jump into the wild:

Exception Type:  EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000002, 0x00000000027ffd04
Crashed Thread:  2
...
Thread 2 Crashed:
0   ???                                 0x027ffd04 0 + 41942276
1   ghc                                 0x012da320 setThreadLocalVar + 16
2   ghc                                 0x012fa87c ffi_call_DARWIN + 204 (darwin.S:131)
3   ghc                                 0x012fa3a0 ffi_call + 208 (ffi_darwin.c:457)
4   ghc                                 0x012cacb8 interpretBCO + 4984
5   ghc                                 0x012d46d0 schedule + 1024
6   ghc                                 0x012d4d84 workerStart + 84
7   libSystem.B.dylib                   0x9292f658 _pthread_start + 316

When the test is run with a ghc built with GhcDebugged=YES (see http://hackage.haskell.org/trac/ghc/wiki/Building/Hacking and mk/config.mk), an assertion failure is reported instead:

=====> ffi009(ghci)
cd . && '/Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-complete-for-pulling-and-copying-20070713_1212/ghc/ghc/stage2-inplace/ghc' -fforce-recomp -dcore-lint -dcmm-lint -Dpowerpc_apple_darwin  -dno-debug-output ffi009.hs --interactive -v0 -ignore-dot-ghci  -fglasgow-exts <ffi009.genscript 1>ffi009.interp.stdout 2>ffi009.interp.stderr
/bin/sh: line 1: 43988 Abort trap              '/Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-complete-for-pulling-and-copying-20070713_1212/ghc/ghc/stage2-inplace/ghc' -fforce-recomp -dcore-lint -dcmm-lint -Dpowerpc_apple_darwin -dno-debug-output ffi009.hs --interactive -v0 -ignore-dot-ghci -fglasgow-exts < ffi009.genscript > ffi009.interp.stdout 2> ffi009.interp.stderr
Wrong exit code (expected 0 , actual 134 )
Stdout:

Stderr:
ffi009: internal error: ASSERTION FAILED: file Linker.c, line 4380

    (GHC version 6.11.20081121 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

*** unexpected failure for ffi009(ghci)

The assertion failure is reported from this context in Linker.c:

                    if(reloc->r_pcrel)
                    {
#ifdef powerpc_HOST_ARCH
                            // In the .o file, this should be a relative jump to NULL
                            // and we'll change it to a relative jump to the symbol
                        ASSERT(word + reloc->r_address == 0);
                        jumpIsland = (unsigned long)
                                        &makeSymbolExtra(oc,
                                                         reloc->r_symbolnum,
                                                         (unsigned long) symbolAddress)
                                         -> jumpIsland;
                        if(jumpIsland != 0)
                        {
                            offsetToJumpIsland = word + jumpIsland
                                - (((long)image) + sect->offset - sect->addr);
                        }
#endif
                        word += (unsigned long) symbolAddress
                                - (((long)image) + sect->offset - sect->addr);
                    }

The relocations leading to the assertion failure are required by branch instructions that gcc generates for ffi009_stub.c. The targets of these branches are expressions of the form symbol+constant (where symbol is an external symbol), and the distance from the instruction to that target needs to be packed into a 24-bit field. An example is

        bl saveFP+56 ; save f28-f31

and there are actually 4 cases like this in the code generated by gcc for ffi009_stub.c.
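To make the 24-bit limit concrete, here is a minimal sketch of how a displacement is packed into a PowerPC `b`/`bl` instruction word. The function names are made up for illustration and are not taken from Linker.c; the field layout (24-bit LI field holding the byte displacement divided by 4, sign-extended) is the standard I-form branch encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PowerPC I-form branch (b, bl): 6-bit opcode, 24-bit LI field, AA and LK bits.
   The byte displacement is LI shifted left by 2 and sign-extended, so it must
   be a multiple of 4 in the range [-2^25, 2^25 - 4] (roughly +/- 32 MB). */
#define BRANCH_LI_MASK 0x03FFFFFCu

/* Hypothetical helper: can this byte displacement be encoded at all? */
static bool fits_branch24(int64_t disp)
{
    return (disp & 3) == 0
        && disp >= -(INT64_C(1) << 25)
        && disp <   (INT64_C(1) << 25);
}

/* Hypothetical helper: replace the displacement of an existing branch word,
   keeping the opcode and the AA/LK bits. */
static uint32_t patch_branch24(uint32_t insn, int64_t disp)
{
    assert(fits_branch24(disp));
    return (insn & ~BRANCH_LI_MASK) | ((uint32_t)disp & BRANCH_LI_MASK);
}

/* Hypothetical helper: recover the signed byte displacement from a branch word. */
static int32_t branch24_disp(uint32_t insn)
{
    uint32_t field = insn & BRANCH_LI_MASK;
    /* Sign-extend from bit 25 of the 26-bit byte displacement. */
    return (int32_t)((field ^ 0x02000000u) - 0x02000000u);
}
```

Anything outside that range, which is easily the case for code loaded by GHCi at an arbitrary address, cannot be reached by patching the branch alone.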

This problem does not appear particularly easy to solve: The mechanism used when such a branch needs to address code that cannot be addressed using a 24-bit relative address is to create so-called jump islands, which are small, close-by pieces of code that (hopefully, but see #1845) *can* be reached using 24-bit relative addressing. The branch is changed to address the jump island which, in turn, constructs the actual 32-bit address and branches to it. Currently, however, this mechanism, for the PPC Mac OS X architecture, is limited to a single jump island per external symbol and is not capable of handling the addressing of external symbols with constants added to them. Handling the adding of a constant is doable, I believe, but the problem is that the same external symbol may appear multiple times with different constants added. For example, in addition to the above case, the code for ffi009_stub.c also includes

        bl saveFP+28 ; save f21-f31

which would require creating two jump islands for the single symbol saveFP.

Possible solutions:

  1. Make a special case out of the specific symbols concerned here. This would involve creating a limited list of different jump islands for these symbols, to be used when different constants were added.
  2. Generalize, somehow, the present jump island mechanism to allow more flexibility. It is undoubtedly possible to do this, but it does not seem to be particularly easy to do.
  3. The -mlongcall option actually causes gcc to replace the critical relative branch instructions by inline code, at the expense of generating longer and potentially slower code for all calls and possibly other branches as well. And if we try this, we get:
    ffi009: internal error: 
    unknown relocation 13
        (GHC version 6.11.20081121 for powerpc_apple_darwin)
        Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
    
    (relocation type 13 is PPC_RELOC_JBSR) so Linker.c needs to be extended to handle this type also.
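As a rough illustration of what option 2 might look like, the sketch below keys jump islands on the pair (symbol, constant) instead of on the symbol alone, so that saveFP+56 and saveFP+28 each get their own island. The types and function names are hypothetical and do not correspond to the SymbolExtra structure in Linker.c; a fixed-capacity table with linear search is assumed only to keep the example short. The four instruction words are the usual lis/ori/mtctr/bctr sequence through r12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical island record: one per distinct (symbol, addend) pair. */
typedef struct {
    uint32_t symbol;   /* index of the external symbol */
    int32_t  addend;   /* constant added to the symbol */
    uint32_t code[4];  /* lis r12 / ori r12 / mtctr r12 / bctr */
} Island;

/* Hypothetical table; the caller provides the storage. */
typedef struct {
    Island *slots;
    size_t  used;
    size_t  capacity;
} IslandTable;

/* Emit code that loads a full 32-bit target into r12 and jumps to it. */
static void fill_island(uint32_t code[4], uint32_t target)
{
    code[0] = 0x3D800000u | (target >> 16);      /* lis   r12, hi16(target)      */
    code[1] = 0x618C0000u | (target & 0xFFFFu);  /* ori   r12, r12, lo16(target) */
    code[2] = 0x7D8903A6u;                       /* mtctr r12                    */
    code[3] = 0x4E800420u;                       /* bctr                         */
}

/* Return the island for (symbol, addend), creating it on first use.
   Returns NULL when the table is full. */
static Island *island_for(IslandTable *t, uint32_t symbol, int32_t addend,
                          uint32_t symbol_address)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->used; i++) {
        if (t->slots[i].symbol == symbol && t->slots[i].addend == addend)
            return &t->slots[i];
    }
    if (t->used == t->capacity)
        return NULL;
    Island *island = &t->slots[t->used++];
    island->symbol = symbol;
    island->addend = addend;
    fill_island(island->code, symbol_address + (uint32_t)addend);
    return island;
}
```

The open question with this approach is how many islands to reserve per object file, since the number of distinct constants is not known until all relocations have been scanned.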

Any advice on how to proceed in this matter, additional ideas and views, would be most welcome.

Best regards Thorkil

Change History (16)

comment:1 Changed 7 years ago by simonmar

  • difficulty set to Unknown

Good analysis Thorkil!

I expect the quickest way to work around this problem is to implement the missing relocation type in the Linker and use the -mlongcall option to gcc. But presumably this could apply to any compiled C code that we need to load up into GHCi, not just stub files?

comment:2 Changed 7 years ago by thorkilnaur

You are right: The limitations in Linker.c (for example, that it is unable to handle relocations of external symbol+constant) apply to any compiled code that we try to load up into GHCi. So, for example, to trigger this problem, we could code some C function with lots of double arguments and try to call that function from Haskell using GHCi. And to work around this, assuming we had implemented the PPC_RELOC_JBSR relocation type, we would require the C function to be compiled with -mlongcall.

I have implemented rudimentary PPC_RELOC_JBSR support that simply always uses the branch island, and with -optc-mlongcall added to the ffi009 test case, the test succeeds. To complete this workaround, I would suggest that we change the ASSERT(word + reloc->r_address == 0); into an actual error message that, perhaps, advises the use of the -mlongcall option.
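A minimal sketch of what such a check could look like is given below. The function name and its parameters are hypothetical, and the message is written into a caller-supplied buffer only to keep the example self-contained; in Linker.c it would presumably be reported through the RTS error reporting instead. The sketch also assumes that a nonzero value of word + r_address is the constant that was added to the symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical replacement for ASSERT(word + reloc->r_address == 0).
   Returns 1 if the branch targets the bare symbol, 0 (with a message in msg)
   if a constant has been added to it. */
static int check_pcrel_addend(const char *object_name, int32_t word,
                              uint32_t r_address, char *msg, size_t msg_len)
{
    assert(msg != NULL && msg_len > 0);

    /* Assumption: the sum is the constant added to the external symbol. */
    int32_t addend = (int32_t)((uint32_t)word + r_address);
    if (addend == 0) {
        msg[0] = '\0';
        return 1;
    }
    snprintf(msg, msg_len,
             "%s: unsupported relocation: branch to external symbol%+d "
             "at offset 0x%x; try compiling the C code with -mlongcall",
             object_name, (int)addend, (unsigned)r_address);
    return 0;
}
```

With this in place, a normal (non-debug) build would report the problem instead of jumping into the wild.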

But I am still uncertain about which direction to take here. Using the -mlongcall option solves the problem in the present case, but there is no guarantee that it will continue to do so in the future. The fact that the bl xxxxFP+yy instructions are replaced by inline code when using -mlongcall is not documented, as far as I have been able to tell. Some other mechanism could be used in later gcc versions. In addition, man gcc says:

       -mlongcall
           ...
           In the future, we may cause GCC to ignore all longcall specifica-
           tions when the linker is known to generate glue.

where the "glue" is code, like the jump islands generated by Linker.c, to enable branching with a 24-bit relative address to reach any 32-bit address via a branch island. So, ultimately, we may have to do this anyway.

Another idea which I have not looked into at all would be to try to use the linker itself to do all these complex things, instead of having to duplicate the functionality ourselves.

As before, any advice, comments, views, new ideas about how to proceed with this are most welcome.

Best regards Thorkil

comment:3 Changed 7 years ago by igloo

  • Milestone set to 6.10 branch

comment:4 Changed 6 years ago by igloo

  • Milestone changed from 6.10 branch to 6.12 branch

Low priority as not a tier 1 arch.

comment:5 Changed 5 years ago by igloo

  • Milestone changed from 6.12 branch to 6.12.3

comment:6 Changed 5 years ago by igloo

  • Milestone changed from 6.12.3 to 6.14.1
  • Priority changed from normal to low

comment:7 Changed 5 years ago by PHO

  • Cc pho@… added
  • Type of failure set to None/Unknown

comment:8 Changed 5 years ago by igloo

  • Milestone changed from 7.0.1 to 7.0.2

comment:9 Changed 4 years ago by igloo

  • Milestone changed from 7.0.2 to 7.2.1

comment:10 Changed 4 years ago by dterei

  • Cc dterei added

comment:11 Changed 4 years ago by igloo

  • Milestone changed from 7.2.1 to 7.4.1

comment:12 Changed 3 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1
  • Priority changed from low to lowest

comment:13 Changed 3 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:14 Changed 13 months ago by thoughtpolice

  • Milestone changed from 7.6.2 to 7.10.1

Moving to 7.10.1.

comment:15 Changed 7 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:16 Changed 7 months ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.
