Changes between Initial Version and Version 1 of Commentary/Compiler/OldCodeGen

Feb 1, 2014 12:51:14 PM



= Old Code Generator (prior to GHC 7.8) =

The material below describes the old code generator, which was used up to GHC 7.6 and was retired in 2012. This page is not maintained and is kept only for historical purposes. See the [wiki:Commentary/Compiler/CodeGen Code generator] page for an up-to-date description of the current code generator.

== Storage manager representations ==

See [wiki:Commentary/Rts/Storage The Storage Manager] for the [wiki:Commentary/Rts/Storage/Stack layout of the stack].

The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects.  The runtime also needs to know about the layout of heap objects, because it contains the garbage collector.  How can we share the definition of storage layout so that both the code generator and the runtime have access to it, without having to keep two independent definitions in sync?

Currently we solve the problem this way:

 * C types representing heap objects are defined in the C header files; see for example [[GhcFile(includes/rts/storage/Closures.h)]].

 * A C program, [[GhcFile(includes/mkDerivedConstants.c)]], `#include`s the runtime headers.
   This program is built and run when you type `make` or `make boot` in `includes/`.  It is
   run twice: once to generate `includes/DerivedConstants.h`, and again to generate
   `includes/GHCConstants.h`.

 * The file `DerivedConstants.h` contains lots of `#define`s like this:
{{{
#define OFFSET_StgTSO_why_blocked 18
}}}
   which says that the offset to the `why_blocked` field of an `StgTSO` is 18 bytes.  This file
   is `#include`d into [[GhcFile(includes/Cmm.h)]], so these offsets are available to the
   [wiki:Commentary/Rts/Cmm hand-written .cmm files].

 * The file `GHCConstants.h` contains similar definitions:
{{{
oFFSET_StgTSO_why_blocked = 18::Int
}}}
   This time the definitions are in Haskell syntax, and this file is `#include`d directly into
   [[GhcFile(compiler/main/Constants.lhs)]].  This is how these offsets are made
   available to GHC's code generator.
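
The reason for generating both files from one C program is that a single set of numbers gets rendered twice, once per syntax. The minimal Haskell sketch below shows only that rendering step. The `FieldOffset` type and the `renderC`/`renderHs` helpers are hypothetical, the offset is hard-coded to the example value above instead of being computed from the real `StgTSO` struct, and the real `mkDerivedConstants.c` is a C program that does all of this itself.

```haskell
import Data.Char (toLower)

-- Hypothetical record for one field offset. In the real build the number
-- comes from compiling mkDerivedConstants.c against the RTS headers.
data FieldOffset = FieldOffset
  { structName :: String  -- e.g. "StgTSO"
  , fieldName  :: String  -- e.g. "why_blocked"
  , byteOffset :: Int     -- offset of the field, in bytes
  }

-- Shared base name, e.g. "OFFSET_StgTSO_why_blocked".
constName :: FieldOffset -> String
constName f = "OFFSET_" ++ structName f ++ "_" ++ fieldName f

-- One line of DerivedConstants.h (C preprocessor syntax).
renderC :: FieldOffset -> String
renderC f = "#define " ++ constName f ++ " " ++ show (byteOffset f)

-- One line of GHCConstants.h (Haskell syntax). The first letter is
-- lower-cased because a Haskell variable cannot start with an upper-case letter.
renderHs :: FieldOffset -> String
renderHs f = lowerFirst (constName f) ++ " = " ++ show (byteOffset f) ++ "::Int"
  where
    lowerFirst (c : cs) = toLower c : cs
    lowerFirst []       = []
```

For the example above, `renderC` yields `#define OFFSET_StgTSO_why_blocked 18` and `renderHs` yields `oFFSET_StgTSO_why_blocked = 18::Int`, which is why the Haskell constant starts with a lower-case `o`.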

== Generated Cmm Naming Convention ==

See [[GhcFile(compiler/cmm/CLabel.hs)]].

Labels generated by the code generator are of the form {{{<name>_<type>}}},
where {{{<name>}}} is {{{<Module>_<name>}}} for external names and {{{<unique>}}} for
internal names. {{{<type>}}} is one of the following:

  info::                   Info table
  srt::                    Static reference table
  srtd::                   Static reference table descriptor
  entry::                  Entry code (function, closure)
  slow::                   Slow entry code (if any)
  ret::                    Direct return address
  vtbl::                   Vector table
  ''n''_alt::              Case alternative (tag ''n'')
  dflt::                   Default case alternative
  btm::                    Large bitmap vector
  closure::                Static closure
  con_entry::              Dynamic constructor entry code
  con_info::               Dynamic constructor info table
  static_entry::           Static constructor entry code
  static_info::            Static constructor info table
  sel_info::               Selector info table
  sel_entry::              Selector entry code
  cc::                     Cost centre
  ccs::                    Cost centre stack

Many of these distinctions are only for documentation reasons.  For
example, _ret is only distinguished from _entry to make it easy to
tell whether a code fragment is a return point or a closure/function
entry.
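
The naming scheme can be pinned down with a small Haskell sketch. `LabelName`, `LabelType`, `renderName`, `renderType` and `mkLabel` are hypothetical stand-ins, not the API of `CLabel.hs`; the sketch also skips the Z-encoding of names and prints uniques in decimal, both simplifications.

```haskell
-- Hypothetical model of the <name>_<type> label scheme.

-- <name>: "<Module>_<name>" for external names, "<unique>" for internal ones.
data LabelName
  = External String String  -- module name, entity name
  | Internal Int            -- unique of an internal name

-- <type>: one constructor per row of the table above.
data LabelType
  = Info | Srt | Srtd | Entry | Slow | Ret | Vtbl
  | Alt Int                 -- "<n>_alt": case alternative for tag n
  | Dflt | Btm | Closure
  | ConEntry | ConInfo | StaticEntry | StaticInfo
  | SelInfo | SelEntry | Cc | Ccs

renderName :: LabelName -> String
renderName (External m n) = m ++ "_" ++ n
renderName (Internal u)   = show u   -- simplification: decimal unique

renderType :: LabelType -> String
renderType t = case t of
  Info        -> "info"
  Srt         -> "srt"
  Srtd        -> "srtd"
  Entry       -> "entry"
  Slow        -> "slow"
  Ret         -> "ret"
  Vtbl        -> "vtbl"
  Alt n       -> show n ++ "_alt"
  Dflt        -> "dflt"
  Btm         -> "btm"
  Closure     -> "closure"
  ConEntry    -> "con_entry"
  ConInfo     -> "con_info"
  StaticEntry -> "static_entry"
  StaticInfo  -> "static_info"
  SelInfo     -> "sel_info"
  SelEntry    -> "sel_entry"
  Cc          -> "cc"
  Ccs         -> "ccs"

-- Put the two halves together: <name>_<type>.
mkLabel :: LabelName -> LabelType -> String
mkLabel n t = renderName n ++ "_" ++ renderType t
```

So `mkLabel (External "Main" "main") Closure` gives `Main_main_closure`, while an internal binder with unique 42 and a case alternative for tag 3 gives `42_3_alt`.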

== Modules ==
=== {{{CodeGen}}} ===
Top level; exports only {{{codeGen}}}.

Called from {{{HscMain}}} for each module that needs to be converted from STG to Cmm.

For each such module, {{{codeGen}}} does three things:
 * {{{cgTopBinding}}} for each {{{StgBinding}}}
 * {{{cgTyCon}}} for each {{{TyCon}}} (these are constructors, not constructor calls)
 * {{{mkModuleInit}}} for the module

{{{mkModuleInit}}} generates several boilerplate initialization functions that:
 * register the module,
 * create an HPC table,
 * set up its profiling info ({{{InitConstCentres}}}) and code coverage info ({{{initHpc}}}), and
 * call the initialization functions of the modules it imports.

If neither SCC profiling nor HPC is used,
then the initialization code short-circuits and returns immediately.

If the module has already been initialized,
the initialization function just returns.
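
The two early-exit rules can be sketched in Haskell. The real initializers are generated code, not Haskell; `ModuleInit`, `runInit` and the registration log are hypothetical, and the sketch assumes the "already initialized" flag is set before the imported modules are visited, so that import cycles terminate.

```haskell
import Control.Monad (unless)
import Data.IORef

-- Hypothetical model of a generated module initializer.
data ModuleInit = ModuleInit
  { modName     :: String
  , modImports  :: [ModuleInit]  -- initializers of imported modules
  , initialised :: IORef Bool    -- set once this module has been initialized
  }

-- Run a module's initializer. Appending to the log stands in for the real
-- work (registering the module, HPC table, cost-centre set-up).
runInit :: Bool            -- SCC profiling enabled?
        -> Bool            -- HPC enabled?
        -> IORef [String]  -- registration log
        -> ModuleInit
        -> IO ()
runInit prof hpc logRef m
  | not (prof || hpc) = pure ()          -- nothing to set up: return at once
  | otherwise = do
      done <- readIORef (initialised m)
      unless done $ do                   -- already initialized: just return
        writeIORef (initialised m) True  -- mark first, so cycles terminate
        modifyIORef logRef (++ [modName m])
        mapM_ (runInit prof hpc logRef) (modImports m)
```

For a module `A` importing `B` and `C`, where `B` also imports `C`, one call registers `A`, `B` and `C` exactly once each; a second call, or a call with both flags `False`, leaves the log unchanged.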

The {{{GHC.TopHandler}}} and {{{GHC.Prim}}} modules get special treatment.

{{{cgTopBinding}}} is a small wrapper around {{{cgTopRhs}}},
which in turn dispatches to:
 * {{{cgTopRhsCons}}} for {{{StgRhsCon}}}
   (these are bindings of constructor applications, not constructors themselves), and
 * {{{cgTopRhsClosure}}} for {{{StgRhsClosure}}}.

{{{cgTopRhsCons}}} and {{{cgTopRhsClosure}}} are located in {{{CgCon}}} and {{{CgClosure}}},
which are the primary modules called by {{{CodeGen}}}.
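
The overall shape of `codeGen` and of the `cgTopRhs` dispatch can be sketched as follows. Every type and function body here is a hypothetical simplification (the real functions run in `CgMonad` and produce Cmm), and treating a recursive group by simply mapping over its bindings is an assumption; only the names of the dispatch targets come from the text above.

```haskell
-- Hypothetical, heavily simplified stand-ins for the STG types.
data StgRhs
  = StgRhsCon String [String]      -- constructor applied to atoms
  | StgRhsClosure [String] String  -- free variables, body (opaque here)

data StgBinding
  = StgNonRec String StgRhs
  | StgRec [(String, StgRhs)]

-- Stand-in for generated Cmm: each entry names the module that would
-- have produced the code.
type Code = [String]

cgTopRhsCons :: String -> String -> [String] -> Code      -- lives in CgCon
cgTopRhsCons x con args = ["CgCon: " ++ x ++ " = " ++ unwords (con : args)]

cgTopRhsClosure :: String -> [String] -> Code             -- lives in CgClosure
cgTopRhsClosure x fvs = ["CgClosure: " ++ x ++ " with free vars " ++ unwords fvs]

-- The dispatch on the two kinds of right-hand side.
cgTopRhs :: String -> StgRhs -> Code
cgTopRhs x (StgRhsCon con args)      = cgTopRhsCons x con args
cgTopRhs x (StgRhsClosure fvs _body) = cgTopRhsClosure x fvs

-- Small wrapper handling non-recursive and recursive groups.
cgTopBinding :: StgBinding -> Code
cgTopBinding (StgNonRec x rhs) = cgTopRhs x rhs
cgTopBinding (StgRec pairs)    = concatMap (uncurry cgTopRhs) pairs

-- The three things codeGen does for one module.
codeGen :: String -> [StgBinding] -> [String] -> Code
codeGen modName binds tyCons =
  concatMap cgTopBinding binds
    ++ map ("cgTyCon " ++) tyCons
    ++ ["mkModuleInit " ++ modName]
```

Splitting the work this way keeps `CodeGen` itself thin: it only walks the bindings and type constructors, and leaves the actual code generation to `CgCon` and `CgClosure`.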

=== {{{CgCon}}} ===

=== {{{CgClosure}}} ===

=== {{{CgMonad}}} ===
The monad that most of {{{codeGen}}} operates inside. It provides:
 * Reader
 * State
 * (could be Writer?)
 * fork
 * flatten
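
The Reader/State combination and the idea of forking can be illustrated with a toy monad. This is a hypothetical sketch, not the real `CgMonad`: the environment and state are plain type parameters, `fork` is assumed to run a sub-computation on a copy of the current state and hand back the state it finished in without committing it, and `flatten` is not modelled.

```haskell
-- Toy code-generation monad: a read-only environment (Reader) plus a
-- threaded state (State). Hypothetical; the real CgMonad is much richer.
newtype CG env st a = CG { runCG :: env -> st -> (a, st) }

instance Functor (CG env st) where
  fmap f (CG g) = CG $ \e s -> let (a, s') = g e s in (f a, s')

instance Applicative (CG env st) where
  pure a = CG $ \_ s -> (a, s)
  CG f <*> CG g = CG $ \e s ->
    let (h, s1) = f e s
        (a, s2) = g e s1
    in (h a, s2)

instance Monad (CG env st) where
  CG g >>= k = CG $ \e s ->
    let (a, s1) = g e s
    in runCG (k a) e s1

-- Reader part.
ask :: CG env st env
ask = CG $ \e s -> (e, s)

-- State part.
get :: CG env st st
get = CG $ \_ s -> (s, s)

put :: st -> CG env st ()
put s = CG $ \_ _ -> ((), s)

-- Append one line of "code" to a list-of-lines state.
emit :: String -> CG env [String] ()
emit line = CG $ \_ s -> ((), s ++ [line])

-- Run a sub-computation from a copy of the current state; return its
-- result and final state, leaving the outer state unchanged (assumed
-- reading of "fork").
fork :: CG env st a -> CG env st (a, st)
fork (CG g) = CG $ \e s -> (g e s, s)
```

For instance, running `emit "a" >> fork (emit "b") >> emit "c"` from an empty state leaves `["a", "c"]` as the outer state; the code emitted inside `fork` is only visible in the state that `fork` returns.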

=== {{{CgExpr}}} ===
Called by {{{CgClosure}}} and {{{CgCon}}}.

Since everything in STG is an expression, almost everything branches off from here.

This module exports only one function, {{{cgExpr}}},
which for the most part just dispatches
to other functions to handle each specific constructor in {{{StgExpr}}}.

Here are the core functions that each constructor is dispatched to
(though some may have little helper functions called in addition to the core function):
 {{{StgApp}}}:: Calls {{{cgTailCall}}} in {{{CgTailCall}}}
 {{{StgConApp}}}:: Calls {{{cgReturnDataCon}}} in {{{CgCon}}}
 {{{StgLit}}}::
   Calls {{{cgLit}}} in {{{CgUtils}}}
   and {{{performPrimReturn}}} in {{{CgTailCall}}}
 {{{StgOpApp}}}::
   Is a bit more complicated; see below.
 {{{StgCase}}}:: Calls {{{cgCase}}} in {{{CgCase}}}
 {{{StgLet}}}:: Calls {{{cgRhs}}} in {{{CgExpr}}}
 {{{StgLetNoEscape}}}::
   Calls {{{cgLetNoEscapeBindings}}} in {{{CgExpr}}}, but with a little bit of wrapping
   by {{{nukeDeadBindings}}} and {{{saveVolatileVarsAndRegs}}}.
 {{{StgSCC}}}:: Calls {{{emitSetCCC}}} in {{{CgProf}}}
 {{{StgTick}}}:: Calls {{{cgTickBox}}} in {{{CgHpc}}}
 {{{StgLam}}}::
   Has no case, because it is only needed during {{{CoreToStg}}}.
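
The dispatch table above amounts to a single `case` over the `StgExpr` constructors. In this hypothetical sketch the constructor payloads are simplified, and `cgExprTarget` merely names the worker function(s) each case is handed to; the real `cgExpr` also does the wrapping mentioned above and runs in `CgMonad`.

```haskell
-- Hypothetical, simplified STG expressions: only the constructor shapes
-- matter here, so payloads are reduced to strings and sub-expressions.
data StgExpr
  = StgApp String [String]        -- function applied to atoms
  | StgConApp String [String]     -- constructor application
  | StgLit Integer                -- literal
  | StgOpApp String [String]      -- primop or foreign call
  | StgCase StgExpr [StgExpr]     -- scrutinee and alternatives
  | StgLet StgExpr                -- bindings omitted
  | StgLetNoEscape StgExpr        -- bindings omitted
  | StgSCC String StgExpr         -- cost-centre annotation
  | StgTick String Int StgExpr    -- HPC tick box: module, index
  | StgLam [String] StgExpr       -- only used by CoreToStg

-- Name the worker(s) each constructor is dispatched to, per the table.
cgExprTarget :: StgExpr -> [String]
cgExprTarget e = case e of
  StgApp{}         -> ["cgTailCall"]
  StgConApp{}      -> ["cgReturnDataCon"]
  StgLit{}         -> ["cgLit", "performPrimReturn"]
  StgOpApp{}       -> ["(one of several sub-cases)"]
  StgCase{}        -> ["cgCase"]
  StgLet{}         -> ["cgRhs"]
  StgLetNoEscape{} -> ["cgLetNoEscapeBindings"]
  StgSCC{}         -> ["emitSetCCC"]
  StgTick{}        -> ["cgTickBox"]
  StgLam{}         -> error "cgExprTarget: StgLam should not reach the code generator"
```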

Some of these cases call functions defined in {{{CgExpr}}} itself.
This is because they need a little bit of wrapping and processing
before calling out to their main worker function.

 {{{cgRhs}}}::
 * For {{{StgRhsCon}}}, calls out to {{{buildDynCon}}} in {{{CgCon}}}.
 * For {{{StgRhsClosure}}}, calls out to {{{mkRhsClosure}}}.
   In turn, {{{mkRhsClosure}}} calls out to {{{cgStdRhsClosure}}} for selector thunks and application (AP) thunks,
   and calls out to {{{cgRhsClosure}}} in the default case.
   Both of these are defined in {{{CgClosure}}}.

 {{{cgLetNoEscapeBindings}}}::
 * Wraps a call to {{{cgLetNoEscapeRhs}}} with {{{addBindsC}}},
   depending on whether it is called on a recursive or a non-recursive binding.
   In turn, {{{cgLetNoEscapeRhs}}} wraps {{{cgLetNoEscapeClosure}}},
   defined in {{{CgLetNoEscapeClosure}}}.

{{{StgOpApp}}} has a number of sub-cases:
 * {{{StgFCallOp}}}
 * {{{StgPrimOp}}} of a !TagToEnumOp
 * {{{StgPrimOp}}} that is {{{primOpOutOfLine}}}
 * {{{StgPrimOp}}} that returns Void
 * {{{StgPrimOp}}} that returns a single primitive
 * {{{StgPrimOp}}} that returns an unboxed tuple
 * {{{StgPrimOp}}} that returns an enumeration type

(It appears that non-foreign-call, inline [wiki:Commentary/PrimOps PrimOps] are not allowed to return complex data types (e.g. a {{{Maybe}}}), but this fact needs to be verified.)

Each of these cases centers on one of these three core calls:
 * {{{emitForeignCall}}} in {{{CgForeignCall}}}
 * {{{tailCallPrimOp}}} in {{{CgTailCall}}}
 * {{{cgPrimOp}}} in {{{CgPrimOp}}}
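
The sub-case analysis can be sketched as a classification function that checks the sub-cases in the order listed above. The types are hypothetical simplifications. The mapping from sub-case to core call is our reading of the old code generator and should be treated as an assumption; the tag-to-enum case is left unmapped (`Nothing`) in this sketch.

```haskell
-- Hypothetical summary of an StgOpApp, just detailed enough to classify it.
data Op
  = FCallOp                 -- StgFCallOp: a foreign call
  | PrimOp PrimOpInfo       -- StgPrimOp

data PrimOpInfo = PrimOpInfo
  { isTagToEnum :: Bool       -- is this TagToEnumOp?
  , isOutOfLine :: Bool       -- does primOpOutOfLine hold?
  , resultRep   :: ResultRep  -- shape of the result
  }

data ResultRep = VoidResult | SinglePrim | UnboxedTuple | EnumType

data SubCase
  = ForeignCall | TagToEnum | OutOfLine
  | ReturnsVoid | ReturnsPrim | ReturnsUnboxedTuple | ReturnsEnum
  deriving (Eq, Show)

-- Classify in the order the sub-cases are listed above.
classify :: Op -> SubCase
classify FCallOp = ForeignCall
classify (PrimOp p)
  | isTagToEnum p = TagToEnum
  | isOutOfLine p = OutOfLine
  | otherwise = case resultRep p of
      VoidResult   -> ReturnsVoid
      SinglePrim   -> ReturnsPrim
      UnboxedTuple -> ReturnsUnboxedTuple
      EnumType     -> ReturnsEnum

-- Assumed mapping to the three core calls (TagToEnum left unmapped).
coreCall :: SubCase -> Maybe String
coreCall ForeignCall = Just "CgForeignCall.emitForeignCall"
coreCall OutOfLine   = Just "CgTailCall.tailCallPrimOp"
coreCall TagToEnum   = Nothing
coreCall _           = Just "CgPrimOp.cgPrimOp"  -- remaining inline primops
```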

There is also a little bit of argument and return marshalling with the following functions:
 Argument marshalling::
   {{{shimForeignCallArg}}}, {{{getArgAmods}}}
 Return marshalling::
   {{{dataReturnConvPrim}}}, {{{primRepToCgRep}}}, {{{newUnboxedTupleRegs}}}
 Performing the return::
   {{{emitReturnInstr}}}, {{{performReturn}}},
   {{{returnUnboxedTuple}}}, {{{ccallReturnUnboxedTuple}}}

In summary, the modules that get called in order to handle a specific expression case are:
==== Also called for top-level bindings by {{{CodeGen}}} ====
 {{{CgCon}}}:: for {{{StgConApp}}} and the {{{StgRhsCon}}} part of {{{StgLet}}}
 {{{CgClosure}}}:: for the {{{StgRhsClosure}}} part of {{{StgLet}}}

==== Core code generation ====
 {{{CgTailCall}}}:: for {{{StgApp}}}, {{{StgLit}}}, and {{{StgOpApp}}}
 {{{CgPrimOp}}}:: for {{{StgOpApp}}}
 {{{CgLetNoEscapeClosure}}}:: for {{{StgLetNoEscape}}}
 {{{CgCase}}}:: for {{{StgCase}}}

==== Profiling and code coverage related ====
 {{{CgProf}}}:: for {{{StgSCC}}}
 {{{CgHpc}}}:: for {{{StgTick}}}

==== Utility modules that happen to have the functions for code generation ====
 {{{CgForeignCall}}}:: for {{{StgOpApp}}}
 {{{CgUtils}}}:: for {{{cgLit}}}

Note that the first two are
the same modules that are called for top-level bindings by {{{CodeGen}}},
and the last two are really utility modules
that happen to contain the functions
needed for those code generation cases.

=== Memory and Register Management ===
 {{{CgBindery}}}::
   Module for {{{CgBindings}}}, which maps variable names
   to all the volatile or stable locations where they are stored
   (e.g. register, stack slot, computed from other expressions, etc.).
   Provides the {{{addBindC}}}, {{{modifyBindC}}} and {{{getCgIdInfo}}} functions
   for adding, modifying and looking up bindings.

 {{{CgStackery}}}::
   Mostly utility functions for allocating and freeing stack slots,
   but it also has code for setting up update frames.

 {{{CgHeapery}}}::
   Functions for allocating objects that appear on the heap, such as closures and constructors.
   Also includes code for stack and heap checks and {{{emitSetDynHdr}}}.
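
A pure sketch of the kind of environment `CgBindery` maintains. The real `addBindC`, `modifyBindC` and `getCgIdInfo` run inside `CgMonad` and use richer location types; here the environment is an explicit `Data.Map`, and `Location`, the toy `CgIdInfo` record and `nukeVolatile` are illustrative assumptions.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical places where a variable's value may be found.
data Location
  = InRegister String   -- e.g. "R1"
  | OnStack Int         -- stack slot offset
  | Recompute String    -- can be recomputed from an expression
  deriving (Eq, Show)

-- Toy per-variable record: a volatile location (may be clobbered, e.g.
-- across a call) and a stable one (survives).
data CgIdInfo = CgIdInfo
  { volatileLoc :: Maybe Location
  , stableLoc   :: Maybe Location
  } deriving (Eq, Show)

-- The environment: variable name -> where to find its value.
type CgBindings = Map.Map String CgIdInfo

-- Pure analogues of the CgBindery operations named above.
addBindC :: String -> CgIdInfo -> CgBindings -> CgBindings
addBindC = Map.insert

modifyBindC :: String -> (CgIdInfo -> CgIdInfo) -> CgBindings -> CgBindings
modifyBindC name f = Map.adjust f name

getCgIdInfo :: String -> CgBindings -> Maybe CgIdInfo
getCgIdInfo = Map.lookup

-- Forget every volatile location and keep only the stable ones, as is
-- needed once registers may have been clobbered.
nukeVolatile :: CgBindings -> CgBindings
nukeVolatile = Map.map (\info -> info { volatileLoc = Nothing })
```

Keeping both a volatile and a stable location lets the generator use a fast register copy while it is valid and fall back to the stack copy afterwards.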

=== Function Calls and Parameter Passing ===
(Note: these will largely go away once CPS conversion is fully implemented.)

 {{{CgPrimOp}}}, {{{CgTailCall}}}, {{{CgForeignCall}}}::
   Handle different types of calls.
 {{{CgCallConv}}}::
   Used by the others in this category to determine liveness and
   to select the registers and stack locations in which arguments and return
   values are stored.
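
To make "selecting registers and stack locations for arguments" concrete, here is a hypothetical Haskell sketch of a simple assignment policy: give each argument the next free register of its class, then spill to successive stack slots. The register classes, the register counts and the one-word-per-slot rule are assumptions rather than GHC's actual convention, and liveness is not modelled.

```haskell
-- Hypothetical argument classes: pointers and non-pointer words share
-- the "vanilla" registers R1, R2, ...; floats use F1, F2, ...
data ArgRep = PtrArg | NonPtrArg | FloatArg
  deriving (Eq, Show)

data ArgLoc = Reg String | StackSlot Int
  deriving (Eq, Show)

-- Assign arguments left to right to the next free register of their
-- class; once a class runs out, use successive one-word stack slots.
assignArgs :: Int        -- number of vanilla registers available
           -> Int        -- number of float registers available
           -> [ArgRep]
           -> [ArgLoc]
assignArgs nVanilla nFloat = go 1 1 0
  where
    go :: Int -> Int -> Int -> [ArgRep] -> [ArgLoc]
    go _ _ _ [] = []
    go r f sp (a : rest)
      | a == FloatArg, f <= nFloat   = Reg ('F' : show f) : go r (f + 1) sp rest
      | a /= FloatArg, r <= nVanilla = Reg ('R' : show r) : go (r + 1) f sp rest
      | otherwise                    = StackSlot sp : go r f (sp + 1) rest
```

With two vanilla registers and one float register, `assignArgs 2 1 [PtrArg, FloatArg, NonPtrArg, PtrArg, FloatArg]` gives `[Reg "R1", Reg "F1", Reg "R2", StackSlot 0, StackSlot 1]`.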

=== Misc utilities ===
 {{{Bitmap}}}::
   Utility functions for making bitmaps (e.g. {{{mkBitmap}}} with type {{{[Bool] -> Bitmap}}}).
 {{{ClosureInfo}}}::
   Stores info about closures and bindings.
   Includes information about memory layout, how to call a binding ({{{LambdaFormInfo}}}),
   and information used to build the info table ({{{ClosureInfo}}}).
 {{{SMRep}}}::
   Storage manager representation of closures.
   Part of !ClosureInfo but kept separate to "keep nhc happy."
 {{{CgUtils}}}:: TODO
 {{{CgInfoTbls}}}:: TODO
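
The type of `mkBitmap` suggests packing a list of flags into machine words. A minimal sketch, assuming 64-bit words and least-significant-bit-first order (both assumptions; what a set bit means is left to the caller):

```haskell
import Data.Bits (setBit)
import Data.Word (Word64)

-- Assumption: one bitmap word is 64 bits wide.
type Bitmap = [Word64]

wordBits :: Int
wordBits = 64

-- Pack the flags into words: element i of each 64-element chunk sets bit i.
mkBitmap :: [Bool] -> Bitmap
mkBitmap [] = []
mkBitmap bs = packWord chunk : mkBitmap rest
  where
    (chunk, rest) = splitAt wordBits bs

    packWord :: [Bool] -> Word64
    packWord flags = foldr setIf 0 (zip [0 ..] flags)

    setIf :: (Int, Bool) -> Word64 -> Word64
    setIf (i, b) w = if b then setBit w i else w
```

For example, `mkBitmap [True, False, True]` is `[5]` (bits 0 and 2 set), and a list of 65 flags needs two words.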

=== Special runtime support ===
 {{{CgTicky}}}:: Ticky-ticky profiling
 {{{CgProf}}}:: Cost-centre profiling
 {{{CgHpc}}}:: Support for the Haskell Program Coverage (HPC) toolkit, inside GHC.
 {{{CgParallel}}}::
   Code generation for !GranSim (GRAN) and parallel (PAR).
   All the functions are dead stubs except {{{granYield}}} and {{{granFetchAndReschedule}}}.