Changes between Version 23 and Version 24 of Commentary/Compiler/CodeGen

Feb 1, 2014 3:51:46 PM (4 years ago)

A whole new description of code generator (as implemented by GHC 7.8)


  • Commentary/Compiler/CodeGen

    v23 v24  
     1= Code Generator =
     3This page describes code generator ("codegen") in GHC. It is meant to reflect current state of the implementation. If you notice any inacurracies please update the page (if you know how) or complain on ghc-devs.
    3 = GHC Commentary: The Code Generator =
     5== A brief history of code generator ==
    5 The material below describes the old code generator, which was retired in 2012.  New stuff is here:
    6   * [wiki:Commentary/Compiler/CPS Michael Adams CPS conversion]
    7   * [wiki:Commentary/Compiler/NewCodeGen New code generator]
     7You might ocasionally hear about "old" and "new" code generator. GHC 7.6 and earlier used the old code generator. New code generator was being developed since 2007 and it was [changeset:832077ca5393d298324cb6b0a2cb501e27209768/ghc enabled by default on 31 August 2012] after the release of GHC 7.6.1. The first stable GHC to use the new code generator is 7.8.1 released in early 2014. The commentary on the old code generator can be found [wiki:Commentary/Compiler/OldCodeGen here]. Notes from the development process of the new code generator are located in a couple of pages on the wiki - go to [wiki:TitleIndex Index] and look for pages starting with "!NewCodeGen".
    9 The code generator lives in [[GhcFile(compiler/codeGen)]]
     9There are some plans for the future development of code generator. One plan is to expand the capability of the pipeline so that it does native code generation too so that existing backends can be discarded - see [wiki:Commentary/Compiler/IntegratedCodeGen IntegratedCodeGen] for discussion of the design. It is hard to say if this will ever happen as currently there is no work being done on that subject and in the meanwhile there was an alternative proposal to [wiki:Commentary/Compiler/Backends/LLVM/ReplacingNCG replace native code generator with LLVM].
    11 The stuff below is probably out of date, although there were some similarities between the old and new code-generator.
     11== Overview ==
     13The goal of the code generator is to convert program from [wiki:Commentary/Compiler/GeneratedCode STG] representation to [wiki:Commentary/Compiler/CmmType Cmm] representation. STG is a functional language with explicit stack. Cmm is a low-level imperative language - something between C and assembly - that is suitable for machine code generation. Note that terminology might be a bit confusing here: the term "code generator" refers to STG->Cmm pass but it may also be used to refer to the Cmm->assembly pass. Referring to the latter as the "backend" eliminates the confusion.
    14 == Storage manager representations ==
     15The top-most entry point to the codegen is located in [[GhcFile(compiler/main/HscMain.hs)]] in the `tryNewCodegen` function. Code generation now has three stages:
     16  1. Convert STG to Cmm with implicit stack, and native Cmm calls. This whole stage lives in [[GhcFile(compiler/codeGen)]] directory with the entry point being `codeGen` function in [[GhcFile(compiler/codeGen/StgCmm.hs)]] module.
     17  2. Optimise the Cmm, and CPS-convert it to have an explicit stack, and no native calls. This lives in [[GhcFile(compiler/cmm)]] directory with the `cmmPipeline` function from [[GhcFile(compiler/cmm/CmmPipeline.hs)]] module being the entry point. This is described below in full detail.
     18  3. Feed the CPS-converted Cmm to the existing, unmodified native code generators. This is done by `codeOutput` function ([[GhcFile(compiler/main/CodeOutput.lhs)]] called from `hscGenHardCode` after returning from `tryNewCodegen`.
    16 See [wiki:Commentary/Rts/Storage The Storage Manager] for the [wiki:Commentary/Rts/Storage/Stack Layout of the stack].
     20== Structure of Cmm pipeline ==
    18 The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects.  The runtime also needs to know about the layout of heap objects, because it contains the garbage collector.  How can we share the definition of storage layout such that the code generator and the runtime both have access to it, and so that we don't have to keep two independent definitions in sync?
     22The first two steps are described in more detail here:
    20 Currently we solve the problem this way:
     24 * '''Code generator''' converts STG to `CmmGraph`.  Implemented in `StgCmm*` modules (in directory `codeGen`).
     25   * `Cmm.CmmGraph` is pretty much a Hoopl graph of `CmmNode.CmmNode` nodes. Control transfer instructions are always the last node of a basic block.
     26   * Parameter passing is made explicit; the calling convention depends on the target architecture.  The key function is `CmmCallConv.assignArgumentsPos`.
     27     * Parameters are passed in virtual registers R1, R2 etc. [These map 1-1 to real registers.]
     28     * Overflow parameters are passed on the stack using explicit memory stores, to locations described abstractly using the [wiki:Commentary/Compiler/StackAreas ''Stack Area'' abstraction.].   
     29     * Making the calling convention explicit includes an explicit store instruction of the return address, which is stored explicitly on the stack in the same way as overflow parameters. This is done (obscurely) in `MkGraph.mkCall`.
    22  * C types representing heap objects are defined in the C header files, see for example [[GhcFile(includes/rts/storage/Closures.h)]].
     31 * '''Simple control flow optimisation''', implemented in `CmmContFlowOpt`.  It's called both at the beginning and end of the pipeline.
     32   * Branch chain elimination.
     33   * Remove unreachable blocks.
     34   * Block concatenation.  branch to K; and this is the only use of K. 
    24  * A C program, [[GhcFile(includes/mkDerivedConstants.c)]],  `#includes` the runtime headers.
    25    This program is built and run when you type `make` or `make boot` in `includes/`.  It is
    26    run twice: once to generate `includes/DerivedConstants.h`, and again to generate
    27    `includes/GHCConstants.h`.
     36 * '''More control flow optimisations'''.
     37   * Common Block Elimination (like CSE). This essentially implements the Adams optimisation, we believe.
     38   * Consider (sometime): block duplication.  branch to K; and K is a short block.  Branch chain elimination is just a special case of this.
    29  * The file `DerivedConstants.h` contains lots of `#defines` like this:
    30 {{{
    31 #define OFFSET_StgTSO_why_blocked 18
    32 }}}
    33    which says that the offset to the why_blocked field of an `StgTSO` is 18 bytes.  This file
    34    is `#included` into [[GhcFile(includes/Cmm.h)]], so these offests are available to the
    35    [wiki:Commentary/Rts/Cmm hand-written .cmm files].
     40 * '''Proc-point analysis''' and '''transformation''', implemented in `CmmProcPoint`. The transformation part adds a function prologue to the front of each proc-point, following a standard entry convention.
     41    * The analysis produces a set of `BlockId` that should become proc-points
     42    * The transformation inserts a function prologue at the start of each proc-point, and a function epilogue just before each branch to a proc-point.
    37  * The file `GHCConstants.h` contains similar definitions:
    38 {{{
    39 oFFSET_StgTSO_why_blocked = 18::Int
    40 }}}
    41   This time the definitions are in Haskell syntax, and this file is `#included` directly into
    42   [[GhcFile(compiler/main/Constants.lhs)]].  This is the way that these offsets are made
    43   available to GHC's code generator.
     44 * '''Remove dead assignments and stores''', implemented in `CmmLive`, removes assignments to dead variables and things like ``a = a`` or ``I32[Hp] = I32[Hp]``. The latter may more appropriately be done in a general optimization pass, as it doesn't take advantage of liveness information.
    45 == Generated Cmm Naming Convention ==
     46 * '''Figure out the stack layout''', implemented in `CmmStackLayout`.
     47   * Each variable 'x', and each proc-point label 'K', has an associated ''Area'', written SS(x) and SS(k) resp, that names a contiguous portion of the stack frame. 
     48   * The stack layout pass produces a mapping of: ''(`Area` -> `StackOffset`)''. For more detail, see [wiki:Commentary/Compiler/StackAreas#Layingoutthestack the description of stack layout.]
     49   * A `StackOffset` is the byte offset of a stack slot from the old end (high address) of the frame.  It doesn't vary as the physical stack pointer moves.
    47 See [[GhcFile(compiler/cmm/CLabel.hs)]]
     51 * '''Manifest the stack pointer''', implemented in `CmmStackLayout`.  Once the stack layout mapping has been determined, a second pass walks over the graph, making the stack pointer, `Sp` explicit. Before this pass, there is no `Sp` at all.  After this, `Sp` is completely manifest.
     52   * replacing references to `Areas` with offsets from `Sp`.
     53   * adding adjustments to `Sp`.
     55 * '''Split into multiple !CmmProcs''', implemented in `CmmProcPointZ`.  At this point we build an info-table for each of the !CmmProcs, including SRTs.  Done on the basis of the live local variables (by now mapped to stack slots) and live CAF statics.
     56   * `LastCall` and `LastReturn` nodes are replaced by `Jump`s.
    49 Labels generated by the code generator are of the form {{{<name>_<type>}}}
    50 where {{{<name>}}} is {{{<Module>_<name>}}} for external names and {{{<unique>}}} for
    51 internal names. {{{<type>}}} is one of the following:
     58 * '''Build info tables''', implemented in `CmmBuildInfoTables`.. 
     59   * Find each safe `MidForeignCall` node, "lowers" it into the suspend/call/resume sequence (see `Note [Foreign calls]` in `CmmNode.hs`.), and build an info table for them.
     60   * Convert the `CmmInfo` for each `CmmProc` into a `[CmmStatic]`, using the live variable information computed just before "Figure out stack layout". 
    53   info::                   Info table
    54   srt::                    Static reference table
    55   srtd::                   Static reference table descriptor
    56   entry::                  Entry code (function, closure)
    57   slow::                   Slow entry code (if any)
    58   ret::                    Direct return address   
    59   vtbl::                   Vector table
    60   ''n''_alt::              Case alternative (tag ''n'')
    61   dflt::                   Default case alternative
    62   btm::                    Large bitmap vector
    63   closure::                Static closure
    64   con_entry::              Dynamic Constructor entry code
    65   con_info::               Dynamic Constructor info table
    66   static_entry::           Static Constructor entry code
    67   static_info::            Static Constructor info table
    68   sel_info::               Selector info table
    69   sel_entry::              Selector entry code
    70   cc::                     Cost centre
    71   ccs::                    Cost centre stack
     62== Dumping Cmm ==
    73 Many of these distinctions are only for documentation reasons.  For
    74 example, _ret is only distinguished from _entry to make it easy to
    75 tell whether a code fragment is a return point or a closure/function
    76 entry.
     64== Hoopl ==
    78 == Modules ==
    79 === {{{CodeGen}}} ===
    80 Top level, only exports {{{codeGen}}}.
    82 Called from {{{HscMain}}} for each module that needs to be converted from Stg to Cmm.
    84 For each such module {{{codeGen}}} does three things:
    85  * {{{cgTopBinding}}} for the {{{StgBinding}}}
    86  * {{{cgTyCon}}} for the {{{TyCon}}} (These are constructors not constructor calls).
    87  * {{{mkModuleInit}}} for the module
    89 {{{mkModuleInit}}} generates several boilerplate initialization functions
    90 that:
    91  * regiser the module,
    92  * creates an Hpc table,
    93  * setup its profiling info ({{{InitConstCentres}}}, code coverage info {{{initHpc}}}), and
    94  * calls the initialization functions of the modules it imports.
    96 If neither SCC profiling or HPC are used,
    97 then the initialization code short circuits to return.
    99 If the module has already been initialized,
    100 the initialization function just returns.
    102 The {{{Ghc.TopHandler}}} and {{{Ghc.Prim}}} modules get special treatment.
    104 {{{cgTopBinding}}} is a small wrapper around {{{cgTopRhs}}}
    105 which in turn disptaches to:
    106  * {{{cgTopRhsCons}}} for {{{StgRhsCons}}}
    107    (these are bindings of constructor applications not constructors themselves) and
    108  * {{{cgTopRhsClosure}}} for {{{StgRhsClosure}}}.
    110 {{{cgTopRhsCons}}} and {{{cgTopRhsClosure}}} are located in {{{CgCon}}} and {{{CgClosure}}}
    111 which are the primary modules called by {{{CodeGen}}}.
    113 === {{{CgCon}}} ===
    114 TODO
    116 === {{{CgClosure}}} ===
    117 TODO
    119 === {{{CgMonad}}} ===
    120 The monad that most of codeGen operates inside
    121  * Reader
    122  * State
    123  * (could be Writer?)
    124  * fork
    125  * flatten
    127 === {{{CgExpr}}} ===
    128 Called by {{{CgClosure}}} and {{{CgCon}}}.
    130 Since everything in STG is an expression, almost everything branches off from here.
    132 This module exports only one function {{{cgExpr}}},
    133 which for the most part just dispatches
    134 to other functions to handle each specific constructor in {{{StgExpr}}}.
    136 Here are the core functions that each constructor is disptached to
    137 (though some may have little helper functions called in addition to the core function):
    138  {{{StgApp}}}:: Calls to {{{cgTailCall}}} in {{{CgTailCall}}}
    139  {{{StgConApp}}}:: Calls to {{{cgReturnDataCon}}} in {{{CgCon}}}
    140  {{{StgLit}}}::
    141    Calls to {{{cgLit}}} in {{{CgUtil}}}
    142     and {{{performPrimReturn}}} in {{{CgTailCall}}}
    143  {{{StgOpApp}}}::
    144    Is a bit more complicated see below.
    145  {{{StgCase}}}:: Calls to {{{cgCase}}} in {{{CgCase}}}
    146  {{{StgLet}}}:: Calls to {{{cgRhs}}} in {{{CgExpr}}}
    147  {{{StgLetNoEscape}}}::
    148    Calls to {{{cgLetNoEscapeBindings}}} in {{{CgExpr}}}, but with a little bit of wrapping
    149    by {{{nukeDeadBindings}}} and {{{saveVolatileVarsAndRegs}}}.
    150  {{{StgSCC}}}:: Calls to  {{{emitSetCCC}}} in {{{CgProf}}}
    151  {{{StgTick}}}:: Calls to {{{cgTickBox}}} in {{{CgHpc}}}
    152  {{{StgLam}}}::
    153    Does not have a case because it is only for {{{CoreToStg}}}'s work.
    155 Some of these cases call to functions defined in {{{cgExpr}}}.
    156 This is because they need a little bit of wrapping and processing
    157 before calling out to their main worker function.
    159  {{{cgRhs}}}::
    160  * For {{{StgRhsCon}}} calls out to {{{buildDynCon}}} in {{{CgCon}}}.
    161  * For {{{StgRhsClosure}}} calls out to {{{mkRhsClosure}}}.
    162    In turn, {{{mkRhsClosure}}} calls out to {{{cgStdRhsClosure}}} for selectors and thunks,
    163    and calls out to {{{cgRhsClosure}}} in the default case.
    164    Both these are defined in {{{CgClosure}}}.
    166  {{{cgLetNoEscapeBindings}}}::
    167  * Wraps a call to {{{cgLetNoEscapeRhs}}} with {{{addBindsC}}}
    168    depending on whether it is called on a recursive or a non-recursive binding.
    169    In turn {{{cgLetNoEscapeRhs}}} wraps {{{cgLetNoEscapeClosure}}}
    170    defined in {{{CgLetNoEscapeClosure}}}.
    172 {{{StgOpApp}}} has a number of sub-cases.
    173  * {{{StgFCallOp}}}
    174  * {{{StgPrimOp}}} of a !TagToEnumOp
    175  * {{{StgPrimOp}}} that is primOpOutOfLine
    176  * {{{StgPrimOp}}} that returns Void
    177  * {{{StgPrimOp}}} that returns a single primitive
    178  * {{{StgPrimOp}}} that returns an unboxed tuple
    179  * {{{StgPrimOp}}} that returns an enumeration type
    181 (It appears that non-foreign-call, inline [wiki:Commentary/PrimOps PrimOps] are not allowed to return complex data types (e.g. a |Maybe|), but this fact needs to be verified.)
    183 Each of these cases centers around one of these three core calls:
    184  * {{{emitForeignCall}}} in {{{CgForeignCall}}}
    185  * {{{tailCallPrimOp}}} in {{{CgTailCall}}}
    186  * {{{cgPrimOp}}} in {{{CgPrimOp}}}
    188 There is also a little bit of argument and return marshelling with the following functions
    189  Argument marshelling::
    190    {{{shimForeignCallArg}}}, {{{getArgAmods}}}
    191  Return marshelling::
    192    {{{dataReturnConvPrim}}}, {{{primRepToCgRep}}}, {{{newUnboxedTupleRegs}}}
    193  Performing the return::
    194    {{{emitReturnInstr}}}, {{{performReturn}}},
    195    {{{returnUnboxedTuple}}}, {{{ccallReturnUnboxedTuple}}}
    197 In summary the modules that get called in order to handle a specific expression case are:
    198 ==== Also called for top level bindings by {{{CodeGen}}} ====
    199  {{{CgCon}}}:: for {{{StgConApp}}} and the {{{StgRhsCon}}} part of {{{StgLet}}}
    200  {{{CgClosure}}}:: for the {{{StgRhsClosure}}} part of {{{StgLet}}}
    202 ==== Core code generation ====
    203  {{{CgTailCall}}}:: for {{{StgApp}}}, {{{StgLit}}}, and {{{StgOpApp}}}
    204  {{{CgPrimOp}}}:: for {{{StgOpApp}}}
    205  {{{CgLetNoEscapeClosure}}}:: for {{{StgLetNoEscape}}}
    206  {{{CgCase}}}:: for {{{StgCase}}}
    208 ==== Profiling and Code coverage related ====
    209  {{{CgProf}}}:: for {{{StgSCC}}}
    210  {{{CgHpc}}}:: for {{{StgTick}}}
    212 ==== Utility modules that happen to have the functions for code generation ====
    213  {{{CgForeignCall}}}:: for {{{StgOpApp}}}
    214  {{{CgUtil}}}:: for {{{cgLit}}}
    216 Note that the first two are
    217 the same modules that are called for top level bindings by {{{CodeGen}}},
    218 and the last two are really utility modules,
    219 but they happen to have the functions
    220 needed for those code generation cases.
    222 === Memory and Register Management ===
    223  {{{CgBindery}}}::
    224    Module for {{{CgBindings}}} which maps variable names
    225    to all the volitile or stable locations where they are stored
    226    (e.g. register, stack slot, computed from other expressions, etc.)
    227    Provides the {{{addBindC}}}, {{{modifyBindC}}} and {{{getCgIdInfo}}} functions
    228    for adding, modifying and looking up bindings.
    230  {{{CgStackery}}}::
    231    Mostly utility functions for allocating and freeing stack slots.
    232    But also has things on setting up update frames.
    234  {{{CgHeapery}}}::
    235    Functions for allocating objects that appear on the heap such as closures and constructors.
    236    Also includes code for stack and heap checks and {{{emitSetDynHdr}}}.
    238 === Function Calls and Parameter Passing ===
    239 (Note: these will largely go away once CPS conversion is fully implemented.)
    241  {{{CgPrimOp}}}, {{{CgTailCall}}}, {{{CgForeignCall}}}::
    242    Handle different types of calls.
    243  {{{CgCallConv}}}::
    244    Use by the others in this category to determine liveness and
    245    to select in what registers and stack locations arguments and return
    246    values get stored.
    248 === Misc utilities ===
    249  {{{Bitmap}}}::
    250    Utility functions for making bitmaps (e.g. {{{mkBitmap}}} with type {{{[Bool] -> Bitmap}}})
    251  {{{ClosureInfo}}}::
    252    Stores info about closures and bindings.
    253    Includes information about memory layout, how to call a binding ({{{LambdaFormInfo}}})
    254    and information used to build the info table ({{{ClosureInfo}}}).
    255  {{{SMRep}}}::
    256    Storage manager representation of closures.
    257    Part of !ClosureInfo but kept separate to "keep nhc happy."
    258  {{{CgUtils}}}:: TODO
    259  {{{CgInfoTbls}}}:: TODO
    261 === Special runtime support ===
    262  {{{CgTicky}}}:: Ticky-ticky profiling
    263  {{{CgProf}}}:: Cost-centre profiling
    264  {{{CgHpc}}}:: Support for the Haskell Program Coverage (hpc) toolkit, inside GHC.
    265  {{{CgParallel}}}::
    266    Code generation for !GranSim (GRAN) and parallel (PAR).
    267    All the functions are dead stubs except {{{granYield}}} and {{{granFetchAndReschedule}}}.
     66write about Hoopl (link paper, mention which modules are implemented with Hoopl)