Opened 10 months ago

Closed 6 months ago

#9391 closed bug (fixed)

LLVM 3.2 crash (AVX messes up GHC calling convention)

Reported by: scpmw Owned by:
Priority: normal Milestone: 7.10.1
Component: Compiler (LLVM) Version: 7.8.2
Keywords: Cc:
Operating System: MacOS X Architecture: Unknown/Multiple
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

I stumbled across the problem of LLVM 3.2 builds seemingly randomly crashing on my Mac. After quite a bit of investigation I found that the source of the problem was somewhere in the base library (span to be precise), where it encountered Cmm like follows:

       cp7n:
           if ((Sp + -56) < SpLim) goto cp8c; else goto cp8d;
       ...
       cp8d:
           I64[Sp - 40] = block_cp7g_info;
           _smKL::P64 = P64[R1 + 7];
           _smKO::P64 = P64[R1 + 15];
           _smKP::P64 = P64[R1 + 23];
           _smL0::P64 = P64[R1 + 31];
           R1 = R2;
           P64[Sp - 32] = _smKL::P64;
           P64[Sp - 24] = _smKO::P64;
           P64[Sp - 16] = _smKP::P64;
           P64[Sp - 8] = _smL0::P64;
           Sp = Sp - 40;
           if (R1 & 7 != 0) goto up8R; else goto cp7h;
       up8R:
           call block_cp7g_info(R1) args: 0, res: 0, upd: 0;
       ...

which leads LLVM 3.2 to produce the following assembly:

_smL1_info:                             ## @smL1_info
## BB#0:                                ## %cp7n
        pushq   %rbp
        movq    %rsp, %rbp
        movq    %r14, %rax
        movq    %rbp, %rcx
        leaq    -56(%rcx), %rdx
        cmpq    %r15, %rdx
        jae     LBB160_1
...
LBB160_1:                               ## %cp8d
        leaq    _cp7g_info(%rip), %rdx
        movq    %rdx, -40(%rcx)
        vmovups 7(%rbx), %ymm0
        vmovups %ymm0, -32(%rcx)
        addq    $-40, %rcx
        testb   $7, %al
        je      LBB160_4
## BB#2:                                ## %up8R
        movq    %rcx, %rbp
        movq    %rax, %rbx
        popq    %rbp
        vzeroupper
        jmp     _cp7g_info              ## TAILCALL

So here LLVM has figured out that it can use vmovups in order to move 4 words at the same time. However, there is a puzzling side effect: All of sudden we have a pushq %rbp at the start of the function with a matching popq %rbp at the very end. This overwrites the stack pointer update (movq %rcx, %rbp) and - unsurprisingly - causes the program to crash rather quickly.

My interpretation is that LLVM 3.2 erroneously thinks that AVX instructions are incompatible with frame pointer elimination. The reasoning is that this is exactly the kind of code LLVM generates if we disable this "optimisation" (--disable-fp-elim). Furthermore, disabling AVX instructions (-mattr=-avx) fixes the problem - LLVM falls back to the less efficient movups, with pushq $rbp vanishing as well. Finally, this bug seems to happen exactly with LLVM 3.2, with 3.3 upwards generating correct code.

My proposed fix would be to add -mattr=-avx to the llc command line by default for LLVM 3.2. This issue might be related to #7694.

Attachments (2)

repro.ll (2.5 KB) - added by scpmw 10 months ago.
LLVM code that triggers the bug
avx-llvm-32.patch (2.4 KB) - added by scpmw 10 months ago.
Proposed fix (... with correct trac ID)

Download all attachments as: .zip

Change History (9)

Changed 10 months ago by scpmw

LLVM code that triggers the bug

Changed 10 months ago by scpmw

Proposed fix (... with correct trac ID)

comment:1 Changed 9 months ago by rwbarton

  • Status changed from new to patch

comment:2 Changed 9 months ago by thoughtpolice

  • Status changed from patch to infoneeded

Comments:

For the check on line 1444 for ver == 32, why is this last? If we only issue a warning, then it will simply match on isAvxEnabled dflags (line 1443) and still pass in the wrong arguments. Would it not be more correct to have this branch checked first and instead turn off AVX completely?

Last edited 9 months ago by thoughtpolice (previous) (diff)

comment:3 Changed 9 months ago by scpmw

My reasoning was that if the user explicitly requests AVX using -mavx, it makes sense not to override it. After all, it is a relatively uncommon bug, and there might be ways to work around it for a given module. I might be overthinking this...

comment:4 Changed 7 months ago by thomie

  • Milestone set to 7.10.1
  • Status changed from infoneeded to patch

scpmw's explanation sounds reasonable.

Does GHC actually support LLVM 3.2? For example, in https://ghc.haskell.org/trac/ghc/ticket/9555#comment:7, rwbarton says:

There are known problems with LLVM 3.2 and GHC, can you try with 3.3 or 3.4?

comment:5 Changed 7 months ago by scpmw

LLVM 3.2 isn't blacklisted, so I think we are still "supporting" it, at least on paper. The comment might be referring to #7694? It's quite possible that this is actually the fix for that issue - crashes at virtually random places sounds about right.

All I can say is that after applying the patch I was able to run the full test suite using LLVM 3.2. Without it, we get at least a dozen segfaults in there.

comment:6 Changed 6 months ago by Austin Seipp <austin@…>

In c557f991a9fa6f6afad4850e4f5db6a08fa1cb6c/ghc:

Disable AVX for LLVM 3.2 by default (#9391)

Due to a bug LLVM generates a C-like frame pointer prelude for functions
that use AVX instructions. This causes programs using the GHC calling
convention to crash, therefore we simply disable them. People that want
to use AVX should consider upgrading to a more current LLVM version.

Signed-off-by: Austin Seipp <[email protected]>

comment:7 Changed 6 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from patch to closed

OK, merged. Thanks Peter.

Note: See TracTickets for help on using tickets.