Opened 3 years ago

Closed 2 years ago

Last modified 4 months ago

#7192 closed bug (fixed)

Bug in -fregs-graph with -fnew-codegen

Reported by: simonmar Owned by: benl
Priority: highest Milestone: 7.8.1
Component: Compiler Version: 7.7
Keywords: Cc: benl, pho@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking: #4258
Related Tickets: Differential Revisions:

Description

This is triggered by running the test dph-diophantine-opt with -fnew-codegen. It segfaults for me on x86_64/Linux. I verified that adding -fno-regs-graph fixes it, as does -fllvm.

This failure is blocking the new code generator switchover. I don't want to break DPH, although perhaps we could disable -fregs-graph temporarily until this bug is fixed.

Here's the bit of code that is miscompiled. I think there are probably several instances of a similar sequence in the module, because I saw the failure happening in several different places.

Cmm code:

sT8r_info:
    {offset
      c1cnV:
          _sT8r::I64 = R1;
          _sT85::I64 = R2;
          _sT82::I64 = R3;
          _sT8m::I64 = R4;
          _sT87::I64 = R5;
          _sT8a::I64 = R6;
          if (Sp - 96 < SpLim) goto c1co3; else goto c1cnZ;
      c1cnZ:
          Hp = Hp + 16;
          if (Hp > HpLim) goto c1co2; else goto c1coR;
      c1co2:
          HpAlloc = 16;
          goto c1co3;
      c1co3:
          R1 = _sT8r::I64;
          I64[Sp - 8] = _sT8a::I64;
          I64[Sp - 16] = _sT87::I64;
          I64[Sp - 24] = _sT8m::I64;
          I64[Sp - 32] = _sT82::I64;
          I64[Sp - 40] = _sT85::I64;
          Sp = Sp - 40;
          call (stg_gc_fun)() args: 48, res: 0, upd: 8;

asm code:

   0x0000000000438f38 <+0>:     mov    %rbx,%r10    # R1 saved in %r10
   0x0000000000438f3b <+3>:     mov    %rdi,0x48(%rsp)
   0x0000000000438f40 <+8>:     mov    %r8,0x40(%rsp)
   0x0000000000438f45 <+13>:    mov    %r9,%rbx     # %rbx clobbered
   0x0000000000438f48 <+16>:    lea    -0x60(%rbp),%rax
   0x0000000000438f4c <+20>:    cmp    %r15,%rax
   0x0000000000438f4f <+23>:    jb     0x438ff5 <sT8r_info+189>
   ...
   0x0000000000438ff5 <+189>:   mov    %rbx,-0x8(%rbp)
   0x0000000000438ff9 <+193>:   mov    %r8,%rax
   0x0000000000438ffc <+196>:   mov    %rax,-0x10(%rbp)
   0x0000000000439000 <+200>:   mov    %rdi,%rax
   0x0000000000439003 <+203>:   mov    %rax,-0x18(%rbp)
   0x0000000000439007 <+207>:   mov    %rsi,-0x20(%rbp)
   0x000000000043900b <+211>:   mov    %r14,-0x28(%rbp)
   0x000000000043900f <+215>:   add    $0xffffffffffffffd8,%rbp
   0x0000000000439013 <+219>:   jmpq   *-0x8(%r13)

The problem is that the contents of %rbx (aka R1) has been lost at the jump. It is supposed to contain the same value it had on entry to the proc.

BTW I made a change recently which might be relevant: the registers R1-R8 can now be used by the register allocator. See f857f0741515b9ebf186beb38fe64448de355817

The linear register allocator is generating much better code here:

    leaq -96(%rbp),%rax
    cmpq %r15,%rax
    jb .Lc12k1
        ...
    movq %r9,-8(%rbp)
    movq %r8,-16(%rbp)
    movq %rdi,-24(%rbp)
    movq %rsi,-32(%rbp)
    movq %r14,-40(%rbp)
    addq $-40,%rbp
    jmp *-8(%r13)

Change History (18)

comment:1 Changed 3 years ago by simonmar

  • Blocking 4258 added
  • Cc benl added
  • Version changed from 7.4.2 to 7.7

comment:2 Changed 3 years ago by marlowsd@…

commit 4f656e89e802f3238c5d6cf1cd9bc420cd7f42ff

Author: Simon Marlow <[email protected]>
Date:   Tue Aug 28 16:10:48 2012 +0100

    disable -fregs-graph (#7192)

 compiler/main/DynFlags.hs |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

comment:3 Changed 3 years ago by benl

  • Owner set to benl

comment:4 Changed 2 years ago by benl

On x86-32 compiling with the NCG with and without -fregs-graph causes a stack overflow in the compiled program, but with -fllvm it works fine. Maybe the register liveness determinator is broken, because this is used by both the linear and graph coloring allocators.

comment:5 Changed 2 years ago by simonmar

@benl: are you saying the dph-diophantine-opt test is broken in master at the moment?

comment:6 Changed 2 years ago by benl

@simonmar: yes. It looks like the NCG (or something) is broken independently from the graph allocator.

When compiling with -fllvm I get the right answer:

desire:diophantine benl$ pwd
/Users/benl/devel/ghc/ghc-head-devel/testsuite/tests/dph/diophantine

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 --version
The Glorious Glasgow Haskell Compilation System, version 7.7.20121112

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 \
  -fforce-recomp -dcore-lint -dcmm-lint \
  --make -o dph-diophantine-copy-fast Main \
  -O -fno-enable-rewrite-rules -package dph-lifted-copy -fllvm

desire:diophantine benl$ ./dph-diophantine-copy-fast 
(1260,[2,2,1,1,0])
(1260,[2,2,1,1,0])
(1260,fromList<PArray> [2,2,1,1,0])

But with with NCG -fno-regs-graph it gives a different answer:

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 \
  -fforce-recomp -dcore-lint -dcmm-lint \
  --make -o dph-diophantine-copy-fast Main \
  -O -fno-enable-rewrite-rules -package dph-lifted-copy -fno-regs-graph

desire:diophantine benl$ ./dph-diophantine-copy-fast 
dph-diophantine-copy-fast: Prelude.minimum: empty list

Compiling in different ways by typing make in that same directory sometimes causes a stack overflow instead of Prelude.minimum: empty list

I looked through the output assembly code but didn't find code code sequence in your initial report. My approach was to compile with -ddump-cmmz-sp -ddump-to-file then grep

comment:7 Changed 2 years ago by benl

... (continued from previous)... the cmm code for the sequence you had. For this I compiled the Haskell code with -Odph to get array fusion and -fregs-graph to turn the graph allocator back on.

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 \
  -fforce-recomp -dcore-lint -dcmm-lint --make -o dph-diophantine-copy \
  Main -Odph -fregs-graph -package dph-lifted-copy \
  -ddump-cmmz-sp -ddump-asm -ddump-to-file

desire:diophantine benl$ ./dph-diophantine-copy
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

desire:diophantine benl$ grep -B 8 -A 16 "if (Sp - 96 < SpLim)" \
  DiophantineVect.dump-cmmz-sp > dump-entry.cmm

From this I get about 12 blocks of cmm code that look like the one in your report. I checked the corresponding asm code and didn't see any register allocation problems. A few of the proc entry blocks assign to %rbx, but the original value this register had on entry to the proc is restored before issuing jmp *-8(%r13), which I assume invokes the GC.

However, I do notice that some of the calls to stg_gc_fun in the cmm code have R1 arguments, and some don't.

c1cr6:
      _s12rI::P64 = R6;
      _s12rF::I64 = R5;
      _s12rU::I64 = R4;
      _s12rA::I64 = R3;
      _s12rD::I64 = R2;
      _s12rZ::P64 = R1;
      if (Sp - 96 < SpLim) goto c1crZ; else goto c1cs2;
      ...

c1crZ:
      R1 = _s12rZ::P64;
      I64[Sp - 40] = _s12rD::I64;
      I64[Sp - 32] = _s12rA::I64;
      I64[Sp - 24] = _s12rU::I64;
      I64[Sp - 16] = _s12rF::I64;
      P64[Sp - 8] = _s12rI::P64;
      Sp = Sp - 40;
      call (stg_gc_fun)() args: 48, res: 0, upd: 8;      *** no R1 argument here

But then:

offset
  c1eWQ:
      _s17H9::P64 = R1;
      if (Sp - 96 < SpLim) goto c1eXm; else goto c1eXl;
      ...
  c1eXm:
      R1 = _s17H9::P64;
      call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;      *** got an R1 here

If R1 needs to valid at *every* call to stg_gc_fun, then you need to pass it as an argument or the register liveness determinator will mark it as dead -- and no good will come from that.

        c1crZ:
            	movq %vI_s12rZ,%rbx
                    # born:    %r1
                    # r_dying: %vI_s12rZ
                    # w_dying: %r1                 **** R1 dies here 
                     
            	movq %vI_s12rD,-40(%rbp)
                    # r_dying: %vI_s12rD
                     
            	movq %vI_s12rA,-32(%rbp)
                    # r_dying: %vI_s12rA
                     
            	movq %vI_s12rU,-24(%rbp)
                    # r_dying: %vI_s12rU
                     
            	movq %vI_s12rF,-16(%rbp)
                    # r_dying: %vI_s12rF
                     
            	movq %vI_s12rI,-8(%rbp)
                    # r_dying: %vI_s12rI
                     
            	addq $-40,%rbp
                     
            	jmp *-8(%r13)

If the stg_gc_fun() thing is correct then can you tell me how to find the assembly sequence in your initial report? I can hack on it this week.

comment:8 Changed 2 years ago by benl

Yeah, if the liveness determinator can't see R1 being read in the block that calls stg_gc_fun, then it the allocator isn't obliged to preserve it's value across the jump. I think your original cmm code is malformed.

comment:9 follow-up: Changed 2 years ago by simonmar

Aha! You're absolutely right, it's a bug, sorry about that. I'm validating a fix right now. I don't seem to be able to reproduce the original problem, but I've definitely fixed the missing R1 dependency.

I still think we need to look at the graph-colouring allocator though, because I think it is interacting badly with the code generated by the new code generator. The code I've seen doesn't look great. I'm leaving it turned off for the time being, and I'll make a separate ticket.

If you could verify that you don't see the wrong answers any more after my patch, that would be great. Patch coming shortly...

comment:10 Changed 2 years ago by marlowsd@…

commit 4270d7e7485b124dd153399dfe3f571253dc0d1d

Author: Simon Marlow <[email protected]>
Date:   Tue Nov 13 11:43:09 2012 +0000

    Fix the Slow calling convention (#7192)
    
    The Slow calling convention passes the closure in R1, but we were
    ignoring this and hoping it would work, which it often did.  However,
    this bug seems to have been the cause of #7192, because the
    graph-colouring allocator is more sensitive to having correct liveness
    information on jumps.

 compiler/cmm/CmmCallConv.hs      |    6 +++---
 compiler/cmm/CmmParse.y          |    2 +-
 compiler/cmm/MkGraph.hs          |   32 ++++++++------------------------
 compiler/codeGen/StgCmmBind.hs   |   21 +++++++++++----------
 compiler/codeGen/StgCmmExpr.hs   |    4 ++--
 compiler/codeGen/StgCmmHeap.hs   |   12 ++++--------
 compiler/codeGen/StgCmmLayout.hs |    2 +-
 7 files changed, 30 insertions(+), 49 deletions(-)

comment:11 Changed 2 years ago by PHO

  • Cc pho@… added

comment:12 Changed 2 years ago by simonmar

  • Resolution set to fixed
  • Status changed from new to closed

We think this is fixed after the patch above.

comment:13 in reply to: ↑ 9 ; follow-up: Changed 2 years ago by shelarcy

Replying to simonmar:

I still think we need to look at the graph-colouring allocator though, because I think it is interacting badly with the code generated by the new code generator. The code I've seen doesn't look great. I'm leaving it turned off for the time being, and I'll make a separate ticket.

Where is a separated ticket?

comment:14 in reply to: ↑ 13 ; follow-up: Changed 2 years ago by simonmar

Replying to shelarcy:

Where is a separated ticket?

#7679

comment:15 in reply to: ↑ 14 Changed 2 years ago by shelarcy

Replying to simonmar:

#7679

I see. Thank you.

comment:16 Changed 4 months ago by jstolarek

This is marked as fixed, but the -fregs-graph flag is still disabled for -O2. Can I re-enabled it?

Last edited 4 months ago by jstolarek (previous) (diff)

comment:17 follow-up: Changed 4 months ago by simonmar

@jstolarek we don't want to turn it on by default due to #7679. The comment is wrong, I'll fix it.

comment:18 in reply to: ↑ 17 Changed 4 months ago by jstolarek

Replying to simonmar:

The comment is wrong, I'll fix it.

Please don't. I'll fix it. I'm just working on that code in DynFlags.

Note: See TracTickets for help on using tickets.