Bug in -fregs-graph with -fnew-codegen

changed milestone to %7.8.1

changed weight to 10

Trac metadata

Trac field	Value
Version	7.4.2 → 7.7
Blocking	→ #4258 (closed)

mentioned in commit 4f656e89

assigned to @trac-benl

On x86-32, compiling with the NCG, both with and without -fregs-graph, causes a stack overflow in the compiled program, but with -fllvm it works fine. Maybe the register liveness determinator is broken, since it is used by both the linear and graph coloring allocators.

@benl: are you saying the dph-diophantine-opt test is broken in master at the moment?

@simonmar: yes. It looks like the NCG (or something) is broken independently from the graph allocator.

When compiling with -fllvm I get the right answer:

desire:diophantine benl$ pwd
/Users/benl/devel/ghc/ghc-head-devel/testsuite/tests/dph/diophantine

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 --version
The Glorious Glasgow Haskell Compilation System, version 7.7.20121112

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 \
  -fforce-recomp -dcore-lint -dcmm-lint \
  --make -o dph-diophantine-copy-fast Main \
  -O -fno-enable-rewrite-rules -package dph-lifted-copy -fllvm

desire:diophantine benl$ ./dph-diophantine-copy-fast 
(1260,[2,2,1,1,0])
(1260,[2,2,1,1,0])
(1260,fromList<PArray> [2,2,1,1,0])

But with the NCG and -fno-regs-graph it gives a different answer:

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 \
  -fforce-recomp -dcore-lint -dcmm-lint \
  --make -o dph-diophantine-copy-fast Main \
  -O -fno-enable-rewrite-rules -package dph-lifted-copy -fno-regs-graph

desire:diophantine benl$ ./dph-diophantine-copy-fast 
dph-diophantine-copy-fast: Prelude.minimum: empty list

Compiling in different ways by typing make in that same directory sometimes causes a stack overflow instead of Prelude.minimum: empty list.

I looked through the output assembly code but didn't find the code sequence in your initial report. My approach was to compile with -ddump-cmmz-sp -ddump-to-file, then grep the cmm code for the sequence you had. For this I compiled the Haskell code with -Odph to get array fusion and -fregs-graph to turn the graph allocator back on.

desire:diophantine benl$ /Users/benl/devel/ghc/ghc-head-devel/inplace/bin/ghc-stage2 \
  -fforce-recomp -dcore-lint -dcmm-lint --make -o dph-diophantine-copy \
  Main -Odph -fregs-graph -package dph-lifted-copy \
  -ddump-cmmz-sp -ddump-asm -ddump-to-file

desire:diophantine benl$ ./dph-diophantine-copy
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

desire:diophantine benl$ grep -B 8 -A 16 "if (Sp - 96 < SpLim)" \
  DiophantineVect.dump-cmmz-sp > dump-entry.cmm

From this I get about 12 blocks of cmm code that look like the one in your report. I checked the corresponding asm code and didn't see any register allocation problems. A few of the proc entry blocks assign to %rbx, but the original value this register had on entry to the proc is restored before issuing jmp *-8(%r13), which I assume invokes the GC.

However, I do notice that some of the calls to stg_gc_fun in the cmm code have R1 arguments, and some don't.

c1cr6:
      _s12rI::P64 = R6;
      _s12rF::I64 = R5;
      _s12rU::I64 = R4;
      _s12rA::I64 = R3;
      _s12rD::I64 = R2;
      _s12rZ::P64 = R1;
      if (Sp - 96 < SpLim) goto c1crZ; else goto c1cs2;
      ...

c1crZ:
      R1 = _s12rZ::P64;
      I64[Sp - 40] = _s12rD::I64;
      I64[Sp - 32] = _s12rA::I64;
      I64[Sp - 24] = _s12rU::I64;
      I64[Sp - 16] = _s12rF::I64;
      P64[Sp - 8] = _s12rI::P64;
      Sp = Sp - 40;
      call (stg_gc_fun)() args: 48, res: 0, upd: 8;      *** no R1 argument here

But then:

offset
  c1eWQ:
      _s17H9::P64 = R1;
      if (Sp - 96 < SpLim) goto c1eXm; else goto c1eXl;
      ...
  c1eXm:
      R1 = _s17H9::P64;
      call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;      *** got an R1 here
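
To check a whole dump rather than eyeballing the grep output, something like the following could list the blocks whose stg_gc_fun call carries no R1. This is only a sketch: the function name `calls_missing_r1` and the two regular expressions are made up for illustration, and they assume the dump looks like the excerpts above (a label on its own line, then a `call (stg_gc_fun)(...)` line).

```python
import re

# Matches lines such as:
#   call (stg_gc_fun)() args: 48, res: 0, upd: 8;
#   call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
CALL_RE = re.compile(r"call \(stg_gc_fun\)\(([^)]*)\)")
# A block label on its own line, e.g. "c1crZ:" (assumed dump layout).
LABEL_RE = re.compile(r"^\s*(\w+):\s*$")


def calls_missing_r1(cmm_text):
    """Return labels of blocks whose stg_gc_fun call does not list R1."""
    missing = []
    label = None
    for line in cmm_text.splitlines():
        m = LABEL_RE.match(line)
        if m:
            label = m.group(1)
            continue
        m = CALL_RE.search(line)
        if m:
            regs = [r.strip() for r in m.group(1).split(",") if r.strip()]
            if "R1" not in regs:
                missing.append(label)
    return missing


# Trimmed copies of the two blocks quoted above.
SAMPLE = """\
c1crZ:
      R1 = _s12rZ::P64;
      Sp = Sp - 40;
      call (stg_gc_fun)() args: 48, res: 0, upd: 8;
c1eXm:
      R1 = _s17H9::P64;
      call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
"""

print(calls_missing_r1(SAMPLE))
```

On the two blocks above it reports only c1crZ. To run it over a real dump, read the .dump-cmmz-sp file into a string and pass that in instead of SAMPLE.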

If R1 needs to be valid at *every* call to stg_gc_fun, then you need to pass it as an argument or the register liveness determinator will mark it as dead -- and no good will come from that.

        c1crZ:
            	movq %vI_s12rZ,%rbx
                    # born:    %r1
                    # r_dying: %vI_s12rZ
                    # w_dying: %r1                 **** R1 dies here 
                     
            	movq %vI_s12rD,-40(%rbp)
                    # r_dying: %vI_s12rD
                     
            	movq %vI_s12rA,-32(%rbp)
                    # r_dying: %vI_s12rA
                     
            	movq %vI_s12rU,-24(%rbp)
                    # r_dying: %vI_s12rU
                     
            	movq %vI_s12rF,-16(%rbp)
                    # r_dying: %vI_s12rF
                     
            	movq %vI_s12rI,-8(%rbp)
                    # r_dying: %vI_s12rI
                     
            	addq $-40,%rbp
                     
            	jmp *-8(%r13)

If the stg_gc_fun() thing is correct then can you tell me how to find the assembly sequence in your initial report? I can hack on it this week.

Yeah, if the liveness determinator can't see R1 being read in the block that calls stg_gc_fun, then the allocator isn't obliged to preserve its value across the jump. I think your original cmm code is malformed.
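
To make the reasoning concrete, here is a toy backward liveness pass over a single block. It is a simplified model, not GHC's actual liveness code: the function `dead_writes`, the tuple encoding of instructions, and treating Sp as always live out are all assumptions made for the example.

```python
def dead_writes(block, live_out):
    """Backward liveness over one basic block.

    block: list of (text, reads, writes) tuples, in program order.
    live_out: registers the block's final jump is declared to pass on.
    Returns the instructions whose written registers are never read later.
    """
    live = set(live_out)
    dead = []
    for text, reads, writes in reversed(block):
        # A write is dead if nothing after it reads any written register.
        if writes and not (set(writes) & live):
            dead.append(text)
        live -= set(writes)
        live |= set(reads)
    return list(reversed(dead))


# Hand-written model of the start and end of block c1crZ.
block = [
    ("R1 = _s12rZ", ["_s12rZ"], ["R1"]),
    ("I64[Sp - 40] = _s12rD", ["_s12rD", "Sp"], []),
    ("Sp = Sp - 40", ["Sp"], ["Sp"]),
]

print(dead_writes(block, live_out=["Sp"]))        # jump lists no R1
print(dead_writes(block, live_out=["Sp", "R1"]))  # jump lists R1
```

With no R1 in the jump's live-out set the assignment to R1 is reported as dead, which matches the "w_dying: %r1" annotation in the asm dump. Once R1 is listed, nothing is reported.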

mentioned in commit 4270d7e7

Aha! You're absolutely right, it's a bug, sorry about that. I'm validating a fix right now. I don't seem to be able to reproduce the original problem, but I've definitely fixed the missing R1 dependency.

I still think we need to look at the graph-colouring allocator though, because I think it is interacting badly with the code generated by the new code generator. The code I've seen doesn't look great. I'm leaving it turned off for the time being, and I'll make a separate ticket.

If you could verify that you don't see the wrong answers any more after my patch, that would be great. Patch coming shortly...

closed

We think this is fixed after the patch above.

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedFixed

Replying to [ticket:7192#comment:66380 simonmar]:

I still think we need to look at the graph-colouring allocator though, because I think it is interacting badly with the code generated by the new code generator. The code I've seen doesn't look great. I'm leaving it turned off for the time being, and I'll make a separate ticket.

Where is the separate ticket?

mentioned in issue #7679

Replying to [ticket:7192#comment:68140 shelarcy]:

Where is the separate ticket?

#7679

Replying to [ticket:7192#comment:69077 simonmar]:

#7679

I see. Thank you.

This is marked as fixed, but the -fregs-graph flag is still disabled for -O2. Can I re-enable it?

@jstolarek we don't want to turn it on by default due to #7679. The comment is wrong, I'll fix it.

Replying to [ticket:7192#comment:90623 simonmar]:

The comment is wrong, I'll fix it.

Please don't. I'll fix it. I'm just working on that code in DynFlags.

mentioned in issue #13085 (closed)

added Phighest label

Trac field	Value
Version	7.4.2
Type	Bug
TypeOfFailure	OtherFailure
Priority	highest
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
