Changes between Version 23 and Version 24 of Commentary/Compiler/CodeGen

Feb 1, 2014 3:51:46 PM (17 months ago)

A whole new description of code generator (as implemented by GHC 7.8)


  • Commentary/Compiler/CodeGen

    v23 v24  
     1= Code Generator = 
     3This page describes code generator ("codegen") in GHC. It is meant to reflect current state of the implementation. If you notice any inacurracies please update the page (if you know how) or complain on ghc-devs. 
    3 = GHC Commentary: The Code Generator = 
     5== A brief history of code generator == 
    5 The material below describes the old code generator, which was retired in 2012.  New stuff is here: 
    6   * [wiki:Commentary/Compiler/CPS Michael Adams CPS conversion] 
    7   * [wiki:Commentary/Compiler/NewCodeGen New code generator] 
     7You might ocasionally hear about "old" and "new" code generator. GHC 7.6 and earlier used the old code generator. New code generator was being developed since 2007 and it was [changeset:832077ca5393d298324cb6b0a2cb501e27209768/ghc enabled by default on 31 August 2012] after the release of GHC 7.6.1. The first stable GHC to use the new code generator is 7.8.1 released in early 2014. The commentary on the old code generator can be found [wiki:Commentary/Compiler/OldCodeGen here]. Notes from the development process of the new code generator are located in a couple of pages on the wiki - go to [wiki:TitleIndex Index] and look for pages starting with "!NewCodeGen". 
    9 The code generator lives in [[GhcFile(compiler/codeGen)]] 
     9There are some plans for the future development of code generator. One plan is to expand the capability of the pipeline so that it does native code generation too so that existing backends can be discarded - see [wiki:Commentary/Compiler/IntegratedCodeGen IntegratedCodeGen] for discussion of the design. It is hard to say if this will ever happen as currently there is no work being done on that subject and in the meanwhile there was an alternative proposal to [wiki:Commentary/Compiler/Backends/LLVM/ReplacingNCG replace native code generator with LLVM]. 
    11 The stuff below is probably out of date, although there were some similarities between the old and new code-generator. 
     11== Overview == 
     13The goal of the code generator is to convert program from [wiki:Commentary/Compiler/GeneratedCode STG] representation to [wiki:Commentary/Compiler/CmmType Cmm] representation. STG is a functional language with explicit stack. Cmm is a low-level imperative language - something between C and assembly - that is suitable for machine code generation. Note that terminology might be a bit confusing here: the term "code generator" refers to STG->Cmm pass but it may also be used to refer to the Cmm->assembly pass. Referring to the latter as the "backend" eliminates the confusion. 
    14 == Storage manager representations == 
     15The top-most entry point to the codegen is located in [[GhcFile(compiler/main/HscMain.hs)]] in the `tryNewCodegen` function. Code generation now has three stages: 
     16  1. Convert STG to Cmm with implicit stack, and native Cmm calls. This whole stage lives in [[GhcFile(compiler/codeGen)]] directory with the entry point being `codeGen` function in [[GhcFile(compiler/codeGen/StgCmm.hs)]] module. 
     17  2. Optimise the Cmm, and CPS-convert it to have an explicit stack, and no native calls. This lives in [[GhcFile(compiler/cmm)]] directory with the `cmmPipeline` function from [[GhcFile(compiler/cmm/CmmPipeline.hs)]] module being the entry point. This is described below in full detail. 
     18  3. Feed the CPS-converted Cmm to the existing, unmodified native code generators. This is done by `codeOutput` function ([[GhcFile(compiler/main/CodeOutput.lhs)]] called from `hscGenHardCode` after returning from `tryNewCodegen`. 
    16 See [wiki:Commentary/Rts/Storage The Storage Manager] for the [wiki:Commentary/Rts/Storage/Stack Layout of the stack]. 
     20== Structure of Cmm pipeline == 
    18 The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects.  The runtime also needs to know about the layout of heap objects, because it contains the garbage collector.  How can we share the definition of storage layout such that the code generator and the runtime both have access to it, and so that we don't have to keep two independent definitions in sync? 
     22The first two steps are described in more detail here: 
    20 Currently we solve the problem this way: 
     24 * '''Code generator''' converts STG to `CmmGraph`.  Implemented in `StgCmm*` modules (in directory `codeGen`).  
     25   * `Cmm.CmmGraph` is pretty much a Hoopl graph of `CmmNode.CmmNode` nodes. Control transfer instructions are always the last node of a basic block. 
     26   * Parameter passing is made explicit; the calling convention depends on the target architecture.  The key function is `CmmCallConv.assignArgumentsPos`.  
     27     * Parameters are passed in virtual registers R1, R2 etc. [These map 1-1 to real registers.]  
     28     * Overflow parameters are passed on the stack using explicit memory stores, to locations described abstractly using the [wiki:Commentary/Compiler/StackAreas ''Stack Area'' abstraction.].    
     29     * Making the calling convention explicit includes an explicit store instruction of the return address, which is stored explicitly on the stack in the same way as overflow parameters. This is done (obscurely) in `MkGraph.mkCall`. 
    22  * C types representing heap objects are defined in the C header files, see for example [[GhcFile(includes/rts/storage/Closures.h)]]. 
     31 * '''Simple control flow optimisation''', implemented in `CmmContFlowOpt`.  It's called both at the beginning and end of the pipeline. 
     32   * Branch chain elimination. 
     33   * Remove unreachable blocks. 
     34   * Block concatenation.  branch to K; and this is the only use of K.   
    24  * A C program, [[GhcFile(includes/mkDerivedConstants.c)]],  `#includes` the runtime headers. 
    25    This program is built and run when you type `make` or `make boot` in `includes/`.  It is 
    26    run twice: once to generate `includes/DerivedConstants.h`, and again to generate  
    27    `includes/GHCConstants.h`. 
     36 * '''More control flow optimisations'''. 
     37   * Common Block Elimination (like CSE). This essentially implements the Adams optimisation, we believe. 
     38   * Consider (sometime): block duplication.  branch to K; and K is a short block.  Branch chain elimination is just a special case of this. 
    29  * The file `DerivedConstants.h` contains lots of `#defines` like this: 
    30 {{{ 
    31 #define OFFSET_StgTSO_why_blocked 18 
    32 }}} 
    33    which says that the offset to the why_blocked field of an `StgTSO` is 18 bytes.  This file 
    34    is `#included` into [[GhcFile(includes/Cmm.h)]], so these offests are available to the 
    35    [wiki:Commentary/Rts/Cmm hand-written .cmm files]. 
     40 * '''Proc-point analysis''' and '''transformation''', implemented in `CmmProcPoint`. The transformation part adds a function prologue to the front of each proc-point, following a standard entry convention. 
     41    * The analysis produces a set of `BlockId` that should become proc-points 
     42    * The transformation inserts a function prologue at the start of each proc-point, and a function epilogue just before each branch to a proc-point. 
    37  * The file `GHCConstants.h` contains similar definitions: 
    38 {{{ 
    39 oFFSET_StgTSO_why_blocked = 18::Int 
    40 }}} 
    41   This time the definitions are in Haskell syntax, and this file is `#included` directly into 
    42   [[GhcFile(compiler/main/Constants.lhs)]].  This is the way that these offsets are made 
    43   available to GHC's code generator. 
     44 * '''Remove dead assignments and stores''', implemented in `CmmLive`, removes assignments to dead variables and things like ``a = a`` or ``I32[Hp] = I32[Hp]``. The latter may more appropriately be done in a general optimization pass, as it doesn't take advantage of liveness information. 
    45 == Generated Cmm Naming Convention == 
     46 * '''Figure out the stack layout''', implemented in `CmmStackLayout`. 
     47   * Each variable 'x', and each proc-point label 'K', has an associated ''Area'', written SS(x) and SS(k) resp, that names a contiguous portion of the stack frame.   
     48   * The stack layout pass produces a mapping of: ''(`Area` -> `StackOffset`)''. For more detail, see [wiki:Commentary/Compiler/StackAreas#Layingoutthestack the description of stack layout.] 
     49   * A `StackOffset` is the byte offset of a stack slot from the old end (high address) of the frame.  It doesn't vary as the physical stack pointer moves. 
    47 See [[GhcFile(compiler/cmm/CLabel.hs)]] 
     51 * '''Manifest the stack pointer''', implemented in `CmmStackLayout`.  Once the stack layout mapping has been determined, a second pass walks over the graph, making the stack pointer, `Sp` explicit. Before this pass, there is no `Sp` at all.  After this, `Sp` is completely manifest. 
     52   * replacing references to `Areas` with offsets from `Sp`. 
     53   * adding adjustments to `Sp`. 
     55 * '''Split into multiple !CmmProcs''', implemented in `CmmProcPointZ`.  At this point we build an info-table for each of the !CmmProcs, including SRTs.  Done on the basis of the live local variables (by now mapped to stack slots) and live CAF statics. 
     56   * `LastCall` and `LastReturn` nodes are replaced by `Jump`s. 
    49 Labels generated by the code generator are of the form {{{<name>_<type>}}} 
    50 where {{{<name>}}} is {{{<Module>_<name>}}} for external names and {{{<unique>}}} for 
    51 internal names. {{{<type>}}} is one of the following: 
     58 * '''Build info tables''', implemented in `CmmBuildInfoTables`..   
     59   * Find each safe `MidForeignCall` node, "lowers" it into the suspend/call/resume sequence (see `Note [Foreign calls]` in `CmmNode.hs`.), and build an info table for them. 
     60   * Convert the `CmmInfo` for each `CmmProc` into a `[CmmStatic]`, using the live variable information computed just before "Figure out stack layout".   
    53   info::                   Info table 
    54   srt::                    Static reference table 
    55   srtd::                   Static reference table descriptor 
    56   entry::                  Entry code (function, closure) 
    57   slow::                   Slow entry code (if any) 
    58   ret::                    Direct return address     
    59   vtbl::                   Vector table 
    60   ''n''_alt::              Case alternative (tag ''n'') 
    61   dflt::                   Default case alternative 
    62   btm::                    Large bitmap vector 
    63   closure::                Static closure 
    64   con_entry::              Dynamic Constructor entry code 
    65   con_info::               Dynamic Constructor info table 
    66   static_entry::           Static Constructor entry code 
    67   static_info::            Static Constructor info table 
    68   sel_info::               Selector info table 
    69   sel_entry::              Selector entry code 
    70   cc::                     Cost centre 
    71   ccs::                    Cost centre stack 
     62== Dumping Cmm == 
    73 Many of these distinctions are only for documentation reasons.  For 
    74 example, _ret is only distinguished from _entry to make it easy to 
    75 tell whether a code fragment is a return point or a closure/function 
    76 entry. 
     64== Hoopl == 
    78 == Modules == 
    79 === {{{CodeGen}}} === 
    80 Top level, only exports {{{codeGen}}}. 
    82 Called from {{{HscMain}}} for each module that needs to be converted from Stg to Cmm. 
    84 For each such module {{{codeGen}}} does three things: 
    85  * {{{cgTopBinding}}} for the {{{StgBinding}}} 
    86  * {{{cgTyCon}}} for the {{{TyCon}}} (These are constructors not constructor calls). 
    87  * {{{mkModuleInit}}} for the module 
    89 {{{mkModuleInit}}} generates several boilerplate initialization functions 
    90 that: 
    91  * regiser the module, 
    92  * creates an Hpc table, 
    93  * setup its profiling info ({{{InitConstCentres}}}, code coverage info {{{initHpc}}}), and 
    94  * calls the initialization functions of the modules it imports. 
    96 If neither SCC profiling or HPC are used, 
    97 then the initialization code short circuits to return. 
    99 If the module has already been initialized, 
    100 the initialization function just returns. 
    102 The {{{Ghc.TopHandler}}} and {{{Ghc.Prim}}} modules get special treatment. 
    104 {{{cgTopBinding}}} is a small wrapper around {{{cgTopRhs}}} 
    105 which in turn disptaches to: 
    106  * {{{cgTopRhsCons}}} for {{{StgRhsCons}}} 
    107    (these are bindings of constructor applications not constructors themselves) and 
    108  * {{{cgTopRhsClosure}}} for {{{StgRhsClosure}}}. 
    110 {{{cgTopRhsCons}}} and {{{cgTopRhsClosure}}} are located in {{{CgCon}}} and {{{CgClosure}}} 
    111 which are the primary modules called by {{{CodeGen}}}. 
    113 === {{{CgCon}}} === 
    114 TODO 
    116 === {{{CgClosure}}} === 
    117 TODO 
    119 === {{{CgMonad}}} === 
    120 The monad that most of codeGen operates inside 
    121  * Reader 
    122  * State 
    123  * (could be Writer?) 
    124  * fork 
    125  * flatten 
    127 === {{{CgExpr}}} === 
    128 Called by {{{CgClosure}}} and {{{CgCon}}}. 
    130 Since everything in STG is an expression, almost everything branches off from here. 
    132 This module exports only one function {{{cgExpr}}}, 
    133 which for the most part just dispatches 
    134 to other functions to handle each specific constructor in {{{StgExpr}}}. 
    136 Here are the core functions that each constructor is disptached to 
    137 (though some may have little helper functions called in addition to the core function): 
    138  {{{StgApp}}}:: Calls to {{{cgTailCall}}} in {{{CgTailCall}}} 
    139  {{{StgConApp}}}:: Calls to {{{cgReturnDataCon}}} in {{{CgCon}}} 
    140  {{{StgLit}}}:: 
    141    Calls to {{{cgLit}}} in {{{CgUtil}}} 
    142     and {{{performPrimReturn}}} in {{{CgTailCall}}} 
    143  {{{StgOpApp}}}:: 
    144    Is a bit more complicated see below. 
    145  {{{StgCase}}}:: Calls to {{{cgCase}}} in {{{CgCase}}} 
    146  {{{StgLet}}}:: Calls to {{{cgRhs}}} in {{{CgExpr}}} 
    147  {{{StgLetNoEscape}}}:: 
    148    Calls to {{{cgLetNoEscapeBindings}}} in {{{CgExpr}}}, but with a little bit of wrapping 
    149    by {{{nukeDeadBindings}}} and {{{saveVolatileVarsAndRegs}}}. 
    150  {{{StgSCC}}}:: Calls to  {{{emitSetCCC}}} in {{{CgProf}}} 
    151  {{{StgTick}}}:: Calls to {{{cgTickBox}}} in {{{CgHpc}}} 
    152  {{{StgLam}}}:: 
    153    Does not have a case because it is only for {{{CoreToStg}}}'s work. 
    155 Some of these cases call to functions defined in {{{cgExpr}}}. 
    156 This is because they need a little bit of wrapping and processing 
    157 before calling out to their main worker function. 
    159  {{{cgRhs}}}:: 
    160  * For {{{StgRhsCon}}} calls out to {{{buildDynCon}}} in {{{CgCon}}}. 
    161  * For {{{StgRhsClosure}}} calls out to {{{mkRhsClosure}}}. 
    162    In turn, {{{mkRhsClosure}}} calls out to {{{cgStdRhsClosure}}} for selectors and thunks, 
    163    and calls out to {{{cgRhsClosure}}} in the default case. 
    164    Both these are defined in {{{CgClosure}}}. 
    166  {{{cgLetNoEscapeBindings}}}:: 
    167  * Wraps a call to {{{cgLetNoEscapeRhs}}} with {{{addBindsC}}} 
    168    depending on whether it is called on a recursive or a non-recursive binding. 
    169    In turn {{{cgLetNoEscapeRhs}}} wraps {{{cgLetNoEscapeClosure}}} 
    170    defined in {{{CgLetNoEscapeClosure}}}. 
    172 {{{StgOpApp}}} has a number of sub-cases. 
    173  * {{{StgFCallOp}}} 
    174  * {{{StgPrimOp}}} of a !TagToEnumOp 
    175  * {{{StgPrimOp}}} that is primOpOutOfLine 
    176  * {{{StgPrimOp}}} that returns Void 
    177  * {{{StgPrimOp}}} that returns a single primitive 
    178  * {{{StgPrimOp}}} that returns an unboxed tuple 
    179  * {{{StgPrimOp}}} that returns an enumeration type 
    181 (It appears that non-foreign-call, inline [wiki:Commentary/PrimOps PrimOps] are not allowed to return complex data types (e.g. a |Maybe|), but this fact needs to be verified.) 
    183 Each of these cases centers around one of these three core calls: 
    184  * {{{emitForeignCall}}} in {{{CgForeignCall}}} 
    185  * {{{tailCallPrimOp}}} in {{{CgTailCall}}} 
    186  * {{{cgPrimOp}}} in {{{CgPrimOp}}} 
    188 There is also a little bit of argument and return marshelling with the following functions 
    189  Argument marshelling:: 
    190    {{{shimForeignCallArg}}}, {{{getArgAmods}}} 
    191  Return marshelling:: 
    192    {{{dataReturnConvPrim}}}, {{{primRepToCgRep}}}, {{{newUnboxedTupleRegs}}} 
    193  Performing the return:: 
    194    {{{emitReturnInstr}}}, {{{performReturn}}}, 
    195    {{{returnUnboxedTuple}}}, {{{ccallReturnUnboxedTuple}}} 
    197 In summary the modules that get called in order to handle a specific expression case are: 
    198 ==== Also called for top level bindings by {{{CodeGen}}} ==== 
    199  {{{CgCon}}}:: for {{{StgConApp}}} and the {{{StgRhsCon}}} part of {{{StgLet}}} 
    200  {{{CgClosure}}}:: for the {{{StgRhsClosure}}} part of {{{StgLet}}} 
    202 ==== Core code generation ==== 
    203  {{{CgTailCall}}}:: for {{{StgApp}}}, {{{StgLit}}}, and {{{StgOpApp}}} 
    204  {{{CgPrimOp}}}:: for {{{StgOpApp}}} 
    205  {{{CgLetNoEscapeClosure}}}:: for {{{StgLetNoEscape}}} 
    206  {{{CgCase}}}:: for {{{StgCase}}} 
    208 ==== Profiling and Code coverage related ==== 
    209  {{{CgProf}}}:: for {{{StgSCC}}} 
    210  {{{CgHpc}}}:: for {{{StgTick}}} 
    212 ==== Utility modules that happen to have the functions for code generation ==== 
    213  {{{CgForeignCall}}}:: for {{{StgOpApp}}} 
    214  {{{CgUtil}}}:: for {{{cgLit}}} 
    216 Note that the first two are 
    217 the same modules that are called for top level bindings by {{{CodeGen}}}, 
    218 and the last two are really utility modules, 
    219 but they happen to have the functions 
    220 needed for those code generation cases. 
    222 === Memory and Register Management === 
    223  {{{CgBindery}}}:: 
    224    Module for {{{CgBindings}}} which maps variable names 
    225    to all the volitile or stable locations where they are stored 
    226    (e.g. register, stack slot, computed from other expressions, etc.) 
    227    Provides the {{{addBindC}}}, {{{modifyBindC}}} and {{{getCgIdInfo}}} functions 
    228    for adding, modifying and looking up bindings. 
    230  {{{CgStackery}}}:: 
    231    Mostly utility functions for allocating and freeing stack slots. 
    232    But also has things on setting up update frames. 
    234  {{{CgHeapery}}}:: 
    235    Functions for allocating objects that appear on the heap such as closures and constructors. 
    236    Also includes code for stack and heap checks and {{{emitSetDynHdr}}}. 
    238 === Function Calls and Parameter Passing === 
    239 (Note: these will largely go away once CPS conversion is fully implemented.) 
    241  {{{CgPrimOp}}}, {{{CgTailCall}}}, {{{CgForeignCall}}}:: 
    242    Handle different types of calls. 
    243  {{{CgCallConv}}}:: 
    244    Use by the others in this category to determine liveness and 
    245    to select in what registers and stack locations arguments and return 
    246    values get stored. 
    248 === Misc utilities === 
    249  {{{Bitmap}}}:: 
    250    Utility functions for making bitmaps (e.g. {{{mkBitmap}}} with type {{{[Bool] -> Bitmap}}}) 
    251  {{{ClosureInfo}}}:: 
    252    Stores info about closures and bindings. 
    253    Includes information about memory layout, how to call a binding ({{{LambdaFormInfo}}}) 
    254    and information used to build the info table ({{{ClosureInfo}}}). 
    255  {{{SMRep}}}:: 
    256    Storage manager representation of closures. 
    257    Part of !ClosureInfo but kept separate to "keep nhc happy." 
    258  {{{CgUtils}}}:: TODO 
    259  {{{CgInfoTbls}}}:: TODO 
    261 === Special runtime support === 
    262  {{{CgTicky}}}:: Ticky-ticky profiling 
    263  {{{CgProf}}}:: Cost-centre profiling 
    264  {{{CgHpc}}}:: Support for the Haskell Program Coverage (hpc) toolkit, inside GHC. 
    265  {{{CgParallel}}}:: 
    266    Code generation for !GranSim (GRAN) and parallel (PAR). 
    267    All the functions are dead stubs except {{{granYield}}} and {{{granFetchAndReschedule}}}. 
     66write about Hoopl (link paper, mention which modules are implemented with Hoopl)