wiki:Commentary/Compiler/NewCodeGenStupidity

Version 4 (modified by ezyang, 3 years ago) (diff)

more things wrong

Stupidity in the New Code Generator

Presently compiling using the new code generator results in a fairly sizable performance hit, because the new code generator produces sub-optimal (and sometimes absolutely terrible code.) There are a lot of ideas for how to make things better; the idea for this wiki page is to document all of the stupid things the new code generator is doing, to later be correlated with specific refactorings and fixes that will hopefully eliminate classes of these stupid things. The hope here is to develop a sense for what the most endemic problems with the newly generated code is.

Cosmetic issues

The new code generator tends to generate C-- in the following style (which should be optimized away by the backends but can increase "line-noise"):

  • Lots of temporary variables (these can tickle other issues when the temporaries are long-lived, but otherwise would be optimized away). You can at least eliminate some of them by looking at the output of -ddump-opt-cmm, which utilizes some basic temporary inlining when used with the native backend -fasm, but this doesn't currently apply to the GCC or LLVM backends.

Rewriting stacks

3586.hs emits the following code:

 Main.$wa_entry()
         { [const Main.$wa_slow-Main.$wa_info;, const 3591;, const 0;,
    const 458752;, const 0;, const 15;]
         }
     c17W:
         _s16B::F64 = F64[Sp + 20];
         F64[Sp - 8] = _s16B::F64;
         _s16h::I32 = I32[Sp + 16];
         _s16j::I32 = I32[Sp + 12];
         _s16y::I32 = I32[Sp + 8];
         _s16x::I32 = I32[Sp + 4];
         _s16w::I32 = I32[Sp + 0];
         if (Sp - 12 < SpLim) goto u1bR;
         Sp = Sp + 32;
// [SNIP]
         // directEntry else
         // emitCall: Sequel: Return
         F64[Sp - 12] = _s17a::F64;
         I32[Sp - 16] = _s17b::I32;
         I32[Sp - 20] = _s16j::I32;
         I32[Sp - 24] = _s16y::I32;
         I32[Sp - 28] = _s16x::I32;
         I32[Sp - 32] = _s16w::I32;
         Sp = Sp - 32;
         jump Main.$wa_info ();
     u1bR:
         Sp = Sp + 32;
         // outOfLine here
         R1 = Main.$wa_closure;
         F64[Sp - 12] = _s16B::F64;
         I32[Sp - 16] = _s16h::I32;
         I32[Sp - 20] = _s16j::I32;
         I32[Sp - 24] = _s16y::I32;
         I32[Sp - 28] = _s16x::I32;
         I32[Sp - 32] = _s16w::I32;
         Sp = Sp - 32;
         jump stg_gc_fun ();

We see that these temporary variables are being repeatedly rewritten to the stack, even when there are no changes.

Spilling Hp/Sp?

3586.hs emits the following code:

     _c1ao::I32 = Hp - 4;
     I32[Sp - 20] = _c1ao::I32;
     foreign "ccall"
       newCAF((BaseReg, PtrHint), (R1, PtrHint))[_unsafe_call_];
     _c1ao::I32 = I32[Sp - 20];

We see Hp - 4 being allocated to a temp, and then consequently being spilled to the stack even though newCAF definitely will not change Hp, so we could have floated the expression down.

Up and Down

A frequent pattern is the stack pointer being bumped up and then back down again, for no particular reason.

         Sp = Sp + 4;
         Sp = Sp - 4;
         jump block_c7xh_entry ();

This is mentioned at the very top of cmm-notes.

Sp is generally stupid

Here is an optimized C-- sample from arr016.hs.

Main.D:Arbitrary_entry()
        { [const 131084;, const 0;, const 15;]
        }
    c7J5:
        _B1::I32 = I32[Sp + 4];
        _B2::I32 = I32[Sp + 0];
        if ((Sp + 0) < I32[BaseReg + 84]) goto u7Jf;
        Sp = Sp + 12;
        Hp = Hp + 12;
        if (Hp > I32[BaseReg + 92]) goto c7Jc;
        I32[Hp - 8] = Main.D:Arbitrary_con_info;
        I32[Hp - 4] = _B2::I32;
        I32[Hp + 0] = _B1::I32;
        _c7J4::I32 = Hp - 8;
        _c7J4::I32 = _c7J4::I32 + 1;
        R1 = _c7J4::I32;
        Sp = Sp - 4;
        jump (I32[Sp + 0]) ();
    u7Jf:
        Sp = Sp + 12;
        goto c7Jd;
    c7Jc:
        I32[BaseReg + 112] = 12;
        goto c7Jd;
    c7Jd:
        R1 = Main.D:Arbitrary_closure;
        Sp = Sp - 12;
        jump (I32[BaseReg - 4]) ();
}

Compare with the old code:

Main.D:Arbitrary_entry()
        { [const 131084;, const 0;, const 15;]
        }
    c4pX:
        Hp = Hp + 12;
        if (Hp > I32[BaseReg + 92]) goto c4pU;
        I32[Hp - 8] = Main.D:Arbitrary_con_info;
        I32[Hp - 4] = I32[Sp + 0];
        I32[Hp + 0] = I32[Sp + 4];
        R1 = Hp - 7;
        Sp = Sp + 8;
        jump (I32[Sp + 0]) ();
    c4pV:
        R1 = Main.D:Arbitrary_closure;
        jump (I32[BaseReg - 4]) ();
    c4pU:
        I32[BaseReg + 112] = 12;
        goto c4pV;
}

There are a lot of things wrong

  • We do an unnecessary stack check on entry to this function
  • Sp should be bumped before the stack check (but we need this fishy code due to ncg spilling before the check)
  • The R1 assignment is silly (although that's my fault; I should probably fix up the offending patch)
  • Sp is getting bumped too much, and then being adjusted back down again