wiki:Commentary/Compiler/NewCodeGenPipeline

Version 44 (modified by jstolarek, 17 months ago) (diff)

Move CmmPipeline description to CodeGenerator page

Historical page

This page used to store information about Cmm Pipeline in the new code generator. This description has been updated, moved to the Code Generator page and is maintained there. The rest of this page has a few historical notes, mostly about Adams optimisation. That optimisation is also described in Note [sharing continuations] in compiler/codeGen/StgCmmMonad.hs and probably deserves its own wiki page.

Design of the new code generator

This page contains notes about the design of the new code generator. See also: overview of the module structure in the new code generator.

Overview

Code generation now has three stages:

  1. Convert STG to Cmm, with implicit stack implicit, and native Cmm calls.
  2. Optimise the Cmm, and CPS-convert it to have an explicit stack, and no native calls. This part of the pipeline is stitched together in cmm/CmmPipeline.hs.
  3. Feed the CPS-converted Cmm to the existing, unmodified native code generators.

Ultimately our plan is to expand the capability of the new pipeline so that it does native code generation too, and we can ultimately discard the existing code generators. The design of this stage is here: Commentary/Compiler/IntegratedCodeGen

The Cmm pipeline

Description of Cmm pipeline has been moved to Code Generator page. Below are descriptions of two passes that at some point existed in the Cmm pipeline of the new code generator but were later removed and didn't make it into GHC 7.8 (first stable release with the new codegen).

  • (OUTDATED - CmmSpillReload does not exist any more) Add spill/reload, implemented in CmmSpillReload, to spill live C-- variables before a call and reload them afterwards. The spill and reload instructions are simply memory stores and loads respectively, using symbolic stack offsets (see stack layout). For example, a spill of variable 'x' would look like Ptr32[SS(x)] = x.
    • dualLivenessWithInsertion does two things:
      • Spills at the definition of any variable that is subequently live across a call (uses a backward analysis)
      • Adds a reload at each return (or proc) point
      At this point, no (LocalReg) variables are live across a call.
    • TODO avoid f();g() turning into spill x; f(); reload x; spill x; g(); reload x.
  • (OUTDATED - CmmRewriteAssignments is not used any more) Rewrite assignments (assignments to local regs, that is, not stores).
    • Convert graph to annotated graph whose nodes are CmmRewriteAssignments.WithRegUsage. Specifically, CmmAssign is decorated with a flag RegUsage saying whether it is used once or many times.
    • Sink or inline assignments nearer their use points
    • Do constant mach-op folding. This is done in this phase, because folded mach-ops can be inlined, and inlining exposes opportunities for mach-op folding.

Branches to continuations and the "Adams optimisation"

A GC block for a heap check after a call should only take one or two instructions. However the natural code:

    ...put params in R1 R2 etc...
    call foo returns to L
 L: r = R1   -- get return value
    goto M
 M: Hp = Hp + 20
    if (Hp > HpLim) { call do_gc returns to K;
                   K: goto M; }

The label M is the head of the call-gc-and-try-again loop. If we do this, we'll generate two info tables, one for L and one for K.

We can do better like this:

    ...put params in R1 R2 etc...
    call foo returns to L
 L: r = R1
    goto M
 M: Hp = Hp + 20
    if (Hp > HpLim) { R1 = r
                      call do_gc_p returns to K;
                   K: r = R1; goto M; }

Now the do_gc_p call has the same return signature as foo and can use the same continuation, thus:

    ...put params in R1 R2 etc...
    call foo returns to L
 L: r = R1
    goto M
 M: Hp = Hp + 20
    if (Hp > HpLim) { R1 = r
                      call do_gc_p returns to L }

Now we can coalesce the uniquely-used block M into L, thus:

    ...put params in R1 R2 etc...
    call foo returns to L
 L: r = R1
    Hp = Hp + 20
    if (Hp > HpLim) { R1 = r
                      call do_gc_p returns to L }

(A call followed by a goto thus gets optimized down to just the call.)

Now things are good. Simple common block elimination (CBE) will common up K and L, so both calls share the same info table.

Runtime system

  • Garbage collector entry points: see Note [Heap checks] in StgCmmHeapery.
  • PAPs

  • Update frames and exception handling. Also STM frames.
  • Primitives can be rewritten:
    • Use parameters
    • In a few cases, use native calls (notably eval)