Overview of GHC's code generator

The root page for codegen stuff is Commentary/Compiler/CodeGen.

The goal of the code generator is to convert program from STG representation to Cmm representation. STG is a functional language with explicit stack. Cmm is a low-level imperative language - something between C and assembly - that is suitable for machine code generation. Note that terminology might be a bit confusing here: the term "code generator" can refer both to STG->Cmm pass and the whole STG->Cmm->assembly pass. The Cmm->assembly conversion is performed by one the backends, eg. NCG (Native Code Generator or LLVM.

The top-most entry point to the codegen is located in compiler/main/HscMain.hs in the tryNewCodegen function. Code generation is done in two stages:

  1. Convert STG to Cmm with implicit stack, and native Cmm calls. This whole stage lives in compiler/codeGen directory with the entry point being codeGen function in compiler/codeGen/StgCmm.hs module.
  2. Optimise the Cmm, and CPS-convert it to have an explicit stack, and no native calls. This lives in compiler/cmm directory with the cmmPipeline function from compiler/cmm/CmmPipeline.hs module being the entry point.

The CPS-converted Cmm is fed to one of the backends. This is done by codeOutput function (compiler/main/CodeOutput.hs) called from hscGenHardCode after returning from tryNewCodegen.

First stage: STG to Cmm conversion

  • Code generator converts STG to CmmGraph. Implemented in StgCmm* modules (in directory codeGen).
    • Cmm.CmmGraph is pretty much a Hoopl graph of CmmNode.CmmNode nodes. Control transfer instructions are always the last node of a basic block.
    • Parameter passing is made explicit; the calling convention depends on the target architecture. The key function is CmmCallConv.assignArgumentsPos.
      • Parameters are passed in virtual registers R1, R2 etc. [These map 1-1 to real registers.]
      • Overflow parameters are passed on the stack using explicit memory stores, to locations described abstractly using the ''Stack Area'' abstraction.
      • Making the calling convention explicit includes an explicit store instruction of the return address, which is stored explicitly on the stack in the same way as overflow parameters. This is done (obscurely) in StgCmmMonad.mkCall.

Second stage: the Cmm pipeline

The core of the Cmm pipeline is implemented by the cpsTop function in compiler/cmm/CmmPipeline.hs module. Below is a high-level overview of the pipeline. See source code comments in respective modules for a more in-depth explanation of each pass.

  • Control Flow Optimisations, implemented in CmmContFlowOpt, simplifies the control flow graph by:
    • Eliminating blocks that have only one predecessor by concatenating them with that predecessor
    • Shortcuting targets of branches and calls (see Note [What is shortcutting])

If a block becomes unreachable because of shortcutting it is eliminated from the graph. However, it is theoretically possible that this pass will produce unreachable blocks. The reason is the label renaming pass performed after block concatenation has been completed.

This pass might be optionally called for the second time at the end of the pipeline.

  • Common Block Elimination, implemented in CmmCommonBlockElim, eliminates blocks that are identical (except for the label on their first node). Since this pass traverses blocks in depth-first order any unreachable blocks introduced by Control Flow Optimisations are eliminated. This pass is optional.
  • Determine proc-points, implemented in CmmProcPoint. The idea behind the "proc-point splitting" is that we first determine proc-points, ie. blocks in the graph that can be turned into entry points of procedures, and then split a larger function into many smaller ones, each having a proc-point as its entry point. This is required for the LLVM backend. The proc-point splitting itself is done later in the pipeline, but here we only determine the set of proc-points. We first call callProcPoints, which assumes that entry point to a Cmm graph and every continuation of a call is a procpoint. If we are splitting proc-points we update the list of proc-points by calling minimalProcPointSet, which adds all blocks reachable from more than one block in the graph. The set of proc-points is required by the stack layout pass.
  • Figure out the stack layout, implemented in CmmStackLayout. The job of this pass is to:
    • replace references to abstract stack Areas with fixed offsets from Sp.
    • replace the CmmHighStackMark constant used in the stack check with the maximum stack usage of the proc.
    • save any variables that are live across a call, and reload them as necessary.

Important: It may happen that stack layout will invalidate the computed set of proc-points by making a proc-point unreachable. This unreachable block is eliminated by one of subsequent passes that performs depth-first traversal of a graph: sinking pass (if optimisations are enabled), proc-point analysis (if optimisations are disabled and we're doing proc-point splitting) or at the very end of the pipeline (if optimisations are disabled and we're not doing proc-point splitting). This means that starting from this point in the pipeline we have inconsistent data and subsequent steps must be prepared for it.

  • Sinking assignments, implemented in CmmSink, performs these optimizations:
    • moves assignments closer to their uses, to reduce register pressure
    • pushes assignments into a single branch of a conditional if possible
    • inlines assignments to registers that are mentioned only once
    • discards dead assignments

This pass is optional. It currently does not eliminate dead code in loops (#8327) and has some other minor deficiencies (eg. #8336).

  • CAF analysis, implemented in CmmBuildInfoTables. Computed CAF information is returned from cmmPipeline and used to create Static Reference Tables (SRT). See here for some more detail on CAFs and SRTs. This pass is implemented using Hoopl (see below).
  • Proc-point analysis and splitting (only when splitting proc-points), implemented by procPointAnalysis in CmmProcPoint, takes a list of proc-points and for each block and determines from which proc-point the block is reachable. This is implemented using Hoopl. Then the call to splitAtProcPoints splits the Cmm graph into multiple Cmm graphs (each represents a single function) and build info tables to each of them. When doing this we must be prepared for the fact that a proc-point does not actually exist in the graph since it was removed by stack layout pass (see #8205).
  • Attach continuations' info tables (only when NOT splitting proc-points), implemented by attachContInfoTables in CmmProcPoint attaches info tables for the continuations of calls in the graph. [PLEASE WRITE MORE IF YOU KNOW WHY THIS IS DONE]
  • Update info tables to include stack liveness, implemented by setInfoTableStackMap in CmmLayoutStack. Populates info tables of each Cmm function with stack usage information. Uses stack maps created by the stack layout pass.
  • Control Flow Optimisations, same as the beginning of the pipeline, but this pass runs only with -O1 and -O2. Since this pass might produce unreachable blocks it is followed by a call to removeUnreachableBlocksProc (also in CmmContFlowOpt.hs)

Dumping and debugging Cmm

You can dump all stages of Cmm processing. This is helpful for debugging Cmm problems.

Currently we have two ways to run the Cmm pipeline:

  • The .cmm file passed as input

Here we get into hscCompileFile and run the parser first. Ouput from the parser can be dumped via the -ddump-cmm-verbose flag:

    |-> parseCmmFile
    (dump) [-ddump-cmm-verbose]  "== Parsed Cmm =="
    |-> cmmPipeline (..)
  • The Haskell source passed as input

The compiler reached the HscRecomp phase and started to produce hard code through the native codegen:

    |-> doCodegen
    .     |-> StgCmm.codeGen
    .     |
    .     (dump) [-ddump-cmm-from-stg]  "== Cmm produced by codegen =="
    .     |
    .     |-> cmmPipeline (..)
    (dump) [-ddump-cmm]         "== Output Cmm =="
    (dump) [-ddump-cmm-raw]     "== Raw Cmm =="

"Cmm produced by codegen" is emited in HscMain module after converting STG to Cmm. This Cmm has not been processed in any way by the Cmm pipeline. If you see that something is incorrect in that dump it means that the problem is located in the STG->Cmm pass. The last section, "Output Cmm", is also dumped in HscMain but this is done after the Cmm has been processed by the whole Cmm pipeline.

All stages of the Cmm pipeline can be dumped separately (with set of the cmm subflags) or together (when -ddump-cmm-verbose specified). Note, that there is still problem with output into file (see ToDo in CmmPipeline.hs:dumpWith). This dump is divided into several sections:

==================== Post control-flow optimisations ====================

==================== Post common block elimination ====================

==================== Post switch plan ====================

==================== Proc points ====================

==================== Layout Stack ====================

==================== Sink assignments ====================

==================== CAFEnv ====================

==================== procpoint map ====================

==================== Post splitting ====================

==================== after setInfoTableStackMap ====================

==================== Post control-flow optimisations ====================

==================== Post CPS Cmm ====================

As was mentioned you can dump only selected passes with more specific flags. For example, if you know (or suspect) that the sinking pass is performing some incorrect transformations you can make the dump shorter by adding -ddump-cmm-sp -ddump-cmm-sink flags. This will produce only the "Layout Stack" dump (just before sinking pass) and "Sink assignments" dump (just after the sinking pass) allowing you to focus on the changes introduced by the sinking pass.

Last modified 8 months ago Last modified on Apr 11, 2017 11:38:47 AM