Opened 3 years ago

Closed 3 years ago

Last modified 22 months ago

#9920 closed bug (fixed)

Segfault in arm binary with llvm 3.5

Reported by: erikd
Owned by: erikd
Priority: normal
Milestone: 7.10.2
Component: Compiler
Version: 7.10.1
Keywords:
Cc: bgamari, rwbarton, juhpetersen, erikd
Operating System: Unknown/Multiple
Architecture: arm
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Using ghc HEAD (6713f0d9a), I compiled a very simple program that segfaults immediately. Stepping through it with GDB, I find it's crashing on the instruction

0x3f5a98 <stg_init_finish$def+12>               ldr    r5, [r5]

and that just before this instruction the value of r5 is zero. That means it's trying to load into r5 the value stored at the address held in r5, i.e. it dereferences a null pointer. Obviously that's going to segfault.

Attachments (1)

arm-lower-tail-calls.patch (6.7 KB) - added by erikd 3 years ago.
Patch against llvm-3.5 that fixes the problem.


Change History (23)

comment:1 Changed 3 years ago by erikd

Version: 7.9 → 7.11

comment:2 Changed 3 years ago by erikd

This problem occurs both with and without the two gold linker patches from #9873.

comment:3 Changed 3 years ago by erikd

The function stg_init_finish is defined in rts/StgStartup.cmm as:

stg_init_finish /* no args: explicit stack layout */
{
  jump StgReturn [];
}

and the generated assembly (from a different binary) looks like:

0x211fc4 <stg_init_finish$def>           ldr    r5, [r4, #792]  ; 0x318
0x211fc8 <stg_init_finish$def+4>         ldr    r0, [r5], #4  

but when I step through the code, it looks like the first of these two instructions is not executed. However, if I set a breakpoint at address 0x211fc4, it does indeed halt there, and executing that instruction loads a value of 0 into r5.

Last edited 3 years ago by erikd

comment:4 Changed 3 years ago by bgamari

For the record this is built with LLVM 3.5.

comment:5 Changed 3 years ago by erikd

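With the help of @bgamari and @rwbarton, we found that the function stg_init_finish ends up being zero length, so it and the function stg_init have the same address. That is why the instructions GDB shows at stg_init_finish$def in comment:3 are really the body of stg_init.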
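This kind of aliasing can be spotted mechanically from a binary's symbol table. The Python sketch below assumes BSD-style output from GNU nm --print-size (address, size, type and name on each line) and reports every zero-size symbol that shares its address with another symbol. The function name zero_size_aliases and the sample listing are hypothetical; neither comes from GHC, LLVM, or the binaries in this ticket.

```python
from collections import defaultdict


def zero_size_aliases(nm_output: str) -> list[tuple[str, list[str]]]:
    """Find zero-size symbols that share an address with other symbols.

    Assumes BSD-style `nm --print-size` lines: "<addr> <size> <type> <name>".
    Lines without a size column (undefined symbols, symbols lacking a .size
    directive) have fewer than four fields and are skipped.
    """
    by_addr: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        addr, size, _kind, name = parts
        by_addr[int(addr, 16)].append((name, int(size, 16)))

    result = []
    for symbols in by_addr.values():
        for name, size in symbols:
            others = [other for other, _ in symbols if other != name]
            if size == 0 and others:
                result.append((name, others))
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical listing, not taken from the binary in this ticket.
    listing = (
        "00010000 00000000 T stg_init_finish$def\n"
        "00010000 00000010 T stg_init$def\n"
        "         U StgReturn\n"
    )
    print(zero_size_aliases(listing))  # [('stg_init_finish$def', ['stg_init$def'])]
```

Grouping by address, rather than comparing neighbouring lines, keeps the check independent of nm's output order (nm sorts by name unless asked to sort numerically).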

comment:6 Changed 3 years ago by erikd

Cc: bgamari rwbarton added

I captured the various temporary files produced when compiling rts/StgStartup.cmm. The disassembled LLVM bitcode for stg_init_finish and stg_init looks like this:

; Function Attrs: nounwind
define cc10 void @"stg_init_finish$def"(i32* noalias nocapture %Base_Arg
                , i32* noalias nocapture %Sp_Arg, i32* noalias nocapture %Hp_Arg
                , i32 %R1_Arg, i32 %R2_Arg, i32 %R3_Arg
                , i32 %R4_Arg, i32 %SpLim_Arg) #0 align 4 {
cF:
  tail call cc10 void bitcast (i8* @StgReturn to void
                   (i32*, i32*, i32*, i32, i32, i32, i32, i32)*)(i32* %Base_Arg
                   , i32* %Sp_Arg, i32* %Hp_Arg, i32 %R1_Arg, i32 undef
                   , i32 undef, i32 undef, i32 %SpLim_Arg) #0
  ret void
}

; Function Attrs: nounwind
define cc10 void @"stg_init$def"(i32* noalias nocapture %Base_Arg
                , i32* noalias nocapture readnone %Sp_Arg
                , i32* noalias nocapture %Hp_Arg, i32 %R1_Arg, i32 %R2_Arg
                , i32 %R3_Arg, i32 %R4_Arg, i32 %SpLim_Arg) #0 align 4 {
cH:
  %ln5z = getelementptr inbounds i32* %Base_Arg, i32 198

....

which is fine, but when that gets run through llc we get the following assembly code:

	.text
	.globl	stg_init_finish$def
	.align	2
	.type	stg_init_finish$def,%function
stg_init_finish$def:                    @ @"stg_init_finish$def"
	.fnstart
.Leh_func_begin7:
@ BB#0:                                 @ %cF
	
.Ltmp7:
	.size	stg_init_finish$def, .Ltmp7-stg_init_finish$def
	.cantunwind
	.fnend

	.globl	stg_init$def
	.align	2
	.type	stg_init$def,%function
stg_init$def:                           @ @"stg_init$def"
	.fnstart
.Leh_func_begin8:
@ BB#0:                                 @ %cH
	ldr	r5, [r4, #792]
	ldr	r0, [r5], #4
	
.Ltmp8:
	.size	stg_init$def, .Ltmp8-stg_init$def
	.cantunwind
	.fnend

For some reason, llc drops the body of stg_init_finish entirely: the tail call to StgReturn is never emitted, leaving the function with zero instructions.
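The symptom is visible directly in llc's textual output, so it can be checked without running the program. Below is a minimal Python sketch, assuming GNU-as style ARM assembly like the listing above, where each function is declared with .type name,%function and its body ends at a .size directive; it counts the lines in between that are not directives, ARM comments or labels. The helper names are hypothetical, and this is not taken from GHC's configure script.

```python
import re

# A label at the start of a line, e.g. "stg_init_finish$def:" or ".Ltmp7:".
LABEL_RE = re.compile(r"^([A-Za-z_.$][\w.$]*):")
# Function declarations of the form ".type name,%function".
FUNC_TYPE_RE = re.compile(r"\.type\s+([^,\s]+),\s*%function")


def instruction_counts(asm: str) -> dict[str, int]:
    """Count instruction lines per function in GNU-as style ARM assembly.

    A function body runs from its label to its ".size" directive. Blank
    lines, directives (leading "."), ARM comments (leading "@") and labels
    are skipped; every other line is counted as one instruction.
    """
    functions = set(FUNC_TYPE_RE.findall(asm))
    counts: dict[str, int] = {}
    current = None  # function whose body we are currently inside, if any
    for raw in asm.splitlines():
        line = raw.strip()
        if not line:
            continue
        if current is not None and line.startswith(".size"):
            current = None  # end of the current function body
            continue
        label = LABEL_RE.match(line)
        if label:
            if label.group(1) in functions:
                current = label.group(1)
                counts[current] = 0
            continue
        if current is not None and not line.startswith((".", "@")):
            counts[current] += 1
    return counts


def empty_functions(asm: str) -> list[str]:
    """Names of functions for which the assembler output has no instructions."""
    return sorted(name for name, n in instruction_counts(asm).items() if n == 0)


if __name__ == "__main__":
    # Abridged from the llc-3.5 output shown above.
    broken = (
        "\t.type\tstg_init_finish$def,%function\n"
        "stg_init_finish$def:\n"
        "\t.fnstart\n"
        ".Leh_func_begin7:\n"
        "@ BB#0:\n"
        ".Ltmp7:\n"
        "\t.size\tstg_init_finish$def, .Ltmp7-stg_init_finish$def\n"
        "\t.type\tstg_init$def,%function\n"
        "stg_init$def:\n"
        "\tldr\tr5, [r4, #792]\n"
        "\tldr\tr0, [r5], #4\n"
        ".Ltmp8:\n"
        "\t.size\tstg_init$def, .Ltmp8-stg_init$def\n"
    )
    print(empty_functions(broken))  # ['stg_init_finish$def']
```

Because it only inspects text, the same helper could be pointed at llc's output for a reduced .ll file such as the one described in comment:7; how that llc invocation should look is left open here.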

comment:7 Changed 3 years ago by erikd

I managed to reduce the input .ll file to about 30 lines of code containing just the functions stg_init_finish$def and stg_init$def. If I remove the cc10 calling convention (which, as I understand it, is only used by GHC) from the LLVM IR, the stg_init_finish$def function no longer has zero instructions.

llc from LLVM 3.6 has the same problem. LLVM 3.2 fails to compile this code.

Last edited 3 years ago by erikd

comment:8 Changed 3 years ago by erikd

llc from llvm git HEAD (3681929e116d9b 2014/12/24) seems to work and produces the following assembly language:

stg_init_finish$def:                    @ @"stg_init_finish$def"
        .fnstart
.Leh_func_begin0:
@ BB#0:                                 @ %cF
        b       StgReturn
.Ltmp0:
        .size   stg_init_finish$def, .Ltmp0-stg_init_finish$def

which seems correct.

However, using LLVM from git HEAD requires changing the metadata definitions from this:

!0 = metadata !{metadata !1, metadata !1, i64 0}

to this:

!0 = !{!1, !1, i64 0}
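For hand-maintained .ll test files, this rewrite is mechanical for numbered metadata node definitions. A minimal Python sketch, assuming one definition per line and that only the !N = metadata !{...} form needs changing; intrinsic call arguments, which still use metadata as a type in newer LLVM, are deliberately left untouched. upgrade_metadata_line is a hypothetical helper, not an LLVM tool.

```python
import re

# A numbered metadata definition in pre-3.6 syntax, e.g.
#   !0 = metadata !{metadata !1, metadata !1, i64 0}
OLD_DEF_RE = re.compile(r"^(\s*!\d+\s*=\s*)metadata\s+(!\{.*)$")
# A "metadata" keyword that directly precedes a metadata operand (!N or !"...").
OPERAND_KEYWORD_RE = re.compile(r"\bmetadata\s+(?=!)")


def upgrade_metadata_line(line: str) -> str:
    """Rewrite one numbered metadata definition into the newer syntax.

    Lines that are not old-style numbered definitions are returned unchanged.
    """
    match = OLD_DEF_RE.match(line)
    if match is None:
        return line
    prefix, node = match.groups()
    # Drop the "metadata" keyword in front of each operand inside the node.
    return prefix + OPERAND_KEYWORD_RE.sub("", node)


if __name__ == "__main__":
    old = "!0 = metadata !{metadata !1, metadata !1, i64 0}"
    print(upgrade_metadata_line(old))  # !0 = !{!1, !1, i64 0}
```

A line-based regex is enough for single-line definitions like the one above; multi-line or debug-info metadata would call for a proper parser.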

comment:9 Changed 3 years ago by erikd

This seems to have been fixed in LLVM git by:

commit f7f88095a32d1ac5bc7778204fd9a37a9fb8082c
Author: Tim Northover <tnorthover@apple.com>
Date:   Mon Dec 1 17:46:39 2014 +0000

    ARM: lower tail calls correctly when using GHC calling convention.
    
    Patch by Ben Gamari.

Last edited 3 years ago by erikd

comment:10 Changed 3 years ago by erikd

If I compile my test file test.ll with llc from llvm-3.5 built from source, I get an assertion failure:

llc: /home/erikd/LLVM/llvm-3.5.0.src/lib/Target/ARM/InstPrinter/../ARMGenAsmWriter.inc:6048:
    void llvm::ARMInstPrinter::printInstruction(const llvm::MCInst *, llvm::raw_ostream &):
    Assertion `Bits != 0 && "Cannot print this instruction."' failed.
0  llc             0x0000000001391255 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
1  llc             0x0000000001391a43
2  libpthread.so.0 0x00007ffa635718d0
3  libc.so.6       0x00007ffa6259e107 gsignal + 55
4  libc.so.6       0x00007ffa6259f4e8 abort + 328
5  libc.so.6       0x00007ffa62597226
6  libc.so.6       0x00007ffa625972d2
7  llc             0x0000000000a3df99 llvm::ARMInstPrinter::printInstruction(llvm::MCInst
                                      const*, llvm::raw_ostream&) + 17673
8  llc             0x0000000000a49052 llvm::ARMInstPrinter::printInst(llvm::MCInst const*,
                                      llvm::raw_ostream&, llvm::StringRef) + 4322
9  llc             0x00000000013171f5
10 llc             0x00000000008fa01c
11 llc             0x0000000000dbbba0 llvm::AsmPrinter::EmitFunctionBody() + 3840
12 llc             0x00000000008ebad6
13 llc             0x0000000000ea44fc llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
14 llc             0x00000000012c8cbb llvm::FPPassManager::runOnFunction(llvm::Function&) + 539
15 llc             0x00000000012c8f2b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
16 llc             0x00000000012c94a7 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 967
17 llc             0x00000000005926ea main + 6682
18 libc.so.6       0x00007ffa6258ab45 __libc_start_main + 245
19 llc             0x000000000058f62d
Stack dump:
0.      Program arguments: /home/erikd/LLVM/3.5/bin/llc -O3 -relocation-model=static
                --enable-tbaa=true -mattr=+v7,+vfp3 -float-abi=hard test.ll -o test.s 
1.      Running pass 'Function Pass Manager' on module 'test.ll'.
2.      Running pass 'ARM Assembly / Object Emitter' on function '@"stg_init_finish$def"'

This does not happen with llvm-3.5 installed from Debian.

Changed 3 years ago by erikd

Attachment: arm-lower-tail-calls.patch added

Patch against llvm-3.5 that fixes the problem.

comment:11 Changed 3 years ago by erikd

If I grab the llvm 3.5 release source tarball, apply the attached arm-lower-tail-calls.patch to the llvm sources, build them and then copy the new llc binary to /usr/bin/llc-3.5 on my Debian system, I get a working amd64-linux to armhf-linux cross compiler.

comment:12 Changed 3 years ago by erikd

Summary: Segfault in arm binary → Segfault in arm binary with llvm 3.5

comment:13 Changed 3 years ago by juhpetersen

Cc: juhpetersen added

comment:14 Changed 3 years ago by erikd

Resolution: fixed
Status: new → closed

LLVM 3.6 has been released with @bgamari's arm-lower-tail-calls patch, and GHC git HEAD now also expects llvm-3.6. I've been compiling GHC git HEAD on arm without this problem for weeks now.

comment:15 Changed 3 years ago by erikd

Milestone: 7.10.2
Resolution: fixed
Status: closed → new
Version: 7.11 → 7.10.1

Re-opening this because it affects the 7.10 branch when used with upstream llvm-3.5.0 (llvm-3.5.1 works correctly).

Debian Testing and Unstable currently ship upstream llvm-3.5.0 with some patches as Debian version 1:3.5-10. This Debian version is not capable of compiling GHC code to run on ARM. A bug has been raised against Debian's llvm-3.5 as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782868, along with a request to roll a new version with the arm-lower-tail-calls.patch patch applied.

I am also working on a configure-time test to detect this problem in the llc command, which I hope will make it into the 7.10.2 release.

Last edited 3 years ago by erikd

comment:16 Changed 3 years ago by erikd

Cc: erikd added
Owner: set to erikd

comment:17 Changed 3 years ago by erikd

Resolution: fixed
Status: new → closed

Detection of this problem was added to the ghc-7.10 branch in:

commit b856f3f3d7850ca0456dd80aaa59241b3d297ab9
Author: Erik de Castro Lopo <erikd@mega-nerd.com>
Date:   Sat Apr 25 08:27:49 2015 +0200

    configure: Test for #9920 when compiling for arm
    
    The ghc-7.10 branch requires use of llvm-3.5, but the llvm-3.5.0
    release had a bug that was fixed in llvm-3.5.1.
    
    When we are targeting arm, test for this bug in the llvm program
    `llc` during confgure and if present, abort configuration with
    an informative error message.
    
    Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
    
    Differential Revision: https://phabricator.haskell.org/D857

comment:18 Changed 3 years ago by erikd

Version 3.5.2 of llvm went into Debian Unstable on 2015/04/30 and will likely hit Debian Testing in 2-3 weeks.

If anyone hits this in Debian Stable, they should reopen https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782868 asking for a Debian Stable backport of the bug fix.

comment:19 Changed 2 years ago by bgamari

Indeed, LLVM 3.5.0-10 is very much unusable on ARM; sadly this is what Debian Jessie appears to ship with. 3.5.2 appears to work fine. We probably should have merged that configure check into ghc-7.10 but it's water under the bridge now.

comment:20 Changed 22 months ago by nomeata

> We probably should have merged that configure check into ghc-7.10 but it's water under the bridge now.

JFTR, the configure check did make it into ghc-7.10.3; at least, I can see it in this Debian build log:

checking for llc-3.4... /usr/bin/llc-3.4
checking for opt-3.4... /usr/bin/opt-3.4
checking whether bootstrap compiler is affected by bug 9439... no
checking if llvm version is affected by bug 9920... yes

configure: error: in `/«PKGBUILDDIR»':
configure: error: Cannot compile for ARM with llc-3.5. See GHC trac ticket #9920.
See `config.log' for more details
debian/rules:50: recipe for target 'override_dh_auto_configure' failed

https://buildd.debian.org/status/fetch.php?pkg=ghc&arch=armel&ver=7.10.3-6~bpo8%2B1&stamp=1453205456

comment:21 Changed 22 months ago by andrewufrank

I installed ghc 7.10.3 from jessie-backports (together with the corresponding LLVM 1:3.5.2.3~bpo8+2), and for the minimal program main = putStrLn "hello" I get the error message:

testthree: schedule: re-entered unsafely.
Perhaps a 'foreign import unsafe' should be 'safe'?

I am not certain whether this is related: cabal install for the same minimal program (with a minimal cabal file) hangs. cabal install -v gives:

/home/frank/.cabal/setup-exe-cache/setup-Simple-Cabal-1.22.5.0-arm-linux-ghc-7.10.3
configure --verbose=2 --ghc --prefix=/home/frank/.cabal
--bindir=/home/frank/.cabal/bin --libdir=/home/frank/.cabal/lib
--libsubdir=arm-linux-ghc-7.10.3/testthree-0.0.6-3AgAf9bDkWJGZdbGLvO875
--libexecdir=/home/frank/.cabal/libexec --datadir=/home/frank/.cabal/share
--datasubdir=arm-linux-ghc-7.10.3/testthree-0.0.6
--docdir=/home/frank/.cabal/share/doc/arm-linux-ghc-7.10.3/testthree-0.0.6
--htmldir=/home/frank/.cabal/share/doc/arm-linux-ghc-7.10.3/testthree-0.0.6/html
--haddockdir=/home/frank/.cabal/share/doc/arm-linux-ghc-7.10.3/testthree-0.0.6/html
--sysconfdir=/home/frank/.cabal/etc --user
--extra-prog-path=/home/frank/.cabal/bin
--dependency=base=base-4.8.2.0-2f1f71a7fcf013cd47fd21f489064f9a
--disable-tests --exact-configuration --disable-benchmarks
Redirecting build log to {handle: /home/frank/.cabal/logs/testthree-0.0.6.log}

Any comments on how this can be fixed? Thank you; a running 7.10.3 for armhf is a great achievement!

comment:22 Changed 22 months ago by rwbarton

Maybe look at #11190, rather than here?
