
Version 11 (modified by guest, 7 years ago)

Info about CgStackery

GHC Commentary: The Code Generator

compiler/codeGen

See The Storage Manager for the layout of the stack.

Storage manager representations

The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects. The runtime also needs to know about the layout of heap objects, because it contains the garbage collector. How can we share the definition of storage layout such that the code generator and the runtime both have access to it, and so that we don't have to keep two independent definitions in sync?

Currently we solve the problem this way:

  • C types representing heap objects are defined in the C header files; see, for example, includes/Closures.h.
  • A C program, includes/mkDerivedConstants.c, #includes the runtime headers. This program is built and run when you type make or make boot in includes/. It is run twice: once to generate includes/DerivedConstants.h, and again to generate includes/GHCConstants.h.
  • The file DerivedConstants.h contains lots of #defines like this:
    #define OFFSET_StgTSO_why_blocked 18
    
    which says that the offset of the why_blocked field within an StgTSO is 18 bytes. This file is #included into includes/Cmm.h, so these offsets are available to the hand-written .cmm files.
  • The file GHCConstants.h contains similar definitions:
    oFFSET_StgTSO_why_blocked = 18::Int
    
    This time the definitions are in Haskell syntax, and this file is #included directly into compiler/main/Constants.lhs. This is the way that these offsets are made available to GHC's code generator.
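
The scheme above can be sketched in a few lines of C. This is only an illustration, not the contents of mkDerivedConstants.c: the struct DemoTSO is a made-up stand-in (it is not the real StgTSO layout), and the helpers format_c_define, format_hs_def and demo_why_blocked_offset are hypothetical names. The only facility relied on is the standard offsetof macro, which lets the C compiler itself report where a field lives.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>    /* snprintf */

/* Hypothetical stand-in for a runtime object; NOT the real StgTSO layout. */
typedef struct {
    void          *link;
    unsigned short what_next;
    unsigned short why_blocked;
} DemoTSO;

/* Ask the compiler for a field offset, so the layout is defined only once,
   in the struct declaration itself. */
size_t demo_why_blocked_offset(void)
{
    return offsetof(DemoTSO, why_blocked);
}

/* Format an offset as a C #define, in the style quoted for DerivedConstants.h. */
int format_c_define(char *buf, size_t len, const char *name, size_t off)
{
    return snprintf(buf, len, "#define OFFSET_%s %zu", name, off);
}

/* Format the same offset as a Haskell definition, in the style quoted for
   GHCConstants.h. Haskell variable names start in lower case, hence "oFFSET_". */
int format_hs_def(char *buf, size_t len, const char *name, size_t off)
{
    return snprintf(buf, len, "oFFSET_%s = %zu::Int", name, off);
}
```

Calling format_c_define with the name "StgTSO_why_blocked" and the offset 18 produces the #define line shown above, and format_hs_def with the same arguments produces the Haskell line. Because both outputs are derived from one struct declaration, the two generated files cannot drift apart.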

Generated Cmm Naming Convention

See compiler/cmm/CLabel.hs

Labels generated by the code generator are of the form <name>_<type> where <name> is <Module>_<name> for external names and <unique> for internal names. <type> is one of the following:

info
Info table
srt
Static reference table
srtd
Static reference table descriptor
entry
Entry code (function, closure)
slow
Slow entry code (if any)
ret
Direct return address
vtbl
Vector table
<n>_alt
Case alternative (tag n)
dflt
Default case alternative
btm
Large bitmap vector
closure
Static closure
con_entry
Dynamic Constructor entry code
con_info
Dynamic Constructor info table
static_entry
Static Constructor entry code
static_info
Static Constructor info table
sel_info
Selector info table
sel_entry
Selector entry code
cc
Cost centre
ccs
Cost centre stack

Many of these distinctions are only for documentation reasons. For example, _ret is only distinguished from _entry to make it easy to tell whether a code fragment is a return point or a closure/function entry.
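
As a rough illustration of the naming scheme, the sketch below assembles labels from their parts. The function names external_label and internal_label are hypothetical and are not taken from CLabel.hs; the sketch also assumes the name components are already valid symbol text, whereas GHC additionally encodes names before they appear in symbols.

```c
#include <stddef.h>   /* size_t */
#include <stdio.h>    /* snprintf */

/* External name: <Module>_<name>_<type>, e.g. module "Foo", name "bar",
   type "info". */
int external_label(char *buf, size_t len,
                   const char *module, const char *name, const char *type)
{
    return snprintf(buf, len, "%s_%s_%s", module, name, type);
}

/* Internal name: <unique>_<type>, where the unique is a compiler-generated
   identifier (passed in here as a plain string). */
int internal_label(char *buf, size_t len,
                   const char *unique, const char *type)
{
    return snprintf(buf, len, "%s_%s", unique, type);
}
```

With these helpers, external_label(buf, sizeof buf, "Foo", "bar", "info") yields Foo_bar_info, and internal_label(buf, sizeof buf, "s1x", "ret") yields s1x_ret; the unique "s1x" is an invented example.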

Modules

CodeGen
Top level. Called by the HscMain module.
CgMonad
The monad that most of codeGen operates inside. It provides:
  • Reader
  • State
  • (could be Writer?)
  • fork
  • flatten
CgExpr
Seems to be the core module, since everything in STG is an expression.

Misc utilities

Bitmap
Utility functions for making bitmaps (e.g. mkBitmap with type [Bool] -> Bitmap)
ClosureInfo
Stores info about the memory layouts of closures
SMRep
Storage manager representation of closures. Part of ClosureInfo but kept separate to "keep nhc happy."
CgUtils
TODO
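
To make the Bitmap entry more concrete, here is a minimal sketch of what a function like mkBitmap does: it packs a sequence of booleans into machine words. The C function mk_bitmap is a hypothetical analogue, not GHC's code. It assumes fixed 32-bit words and least-significant-bit-first packing; the compiler itself works with the target's word size.

```c
#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t */

/* Pack n_bits booleans into 32-bit words, least significant bit first.
   Returns the number of words written, or 0 if `out` is too small
   (or if n_bits is 0). */
size_t mk_bitmap(const bool *bits, size_t n_bits,
                 uint32_t *out, size_t out_len)
{
    size_t n_words = (n_bits + 31) / 32;   /* round up to whole words */
    if (n_words > out_len)
        return 0;

    for (size_t w = 0; w < n_words; w++)
        out[w] = 0;

    for (size_t i = 0; i < n_bits; i++) {
        if (bits[i])
            out[i / 32] |= (uint32_t)1 << (i % 32);
    }
    return n_words;
}
```

For the input {true, false, true, true} this writes a single word with bits 0, 2 and 3 set, that is, the value 13.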

Special runtime support

CgTicky
Ticky-ticky profiling
CgProf
Cost-centre profiling
CgHpc
Support for the Haskell Program Coverage (hpc) toolkit, inside GHC.
CgParallel
Code generation for GranSim (GRAN) and parallel (PAR). All the functions are dead stubs except granYield and granFetchAndReschedule.

Not yet classified

Please help classify these if you know what they are.

CgBindery
Module for CgBindings, which maps variable names to all the volatile or stable locations where they are stored (e.g. register, stack slot, computed from other expressions). Provides the addBindC, modifyBindC and getCgIdInfo functions for adding, modifying and looking up bindings.
CgStackery
Mostly utility functions for allocating and freeing stack slots, but also contains code for setting up update frames.

CgHeapery

Maybe top-level
It seems that CodeGen calls these two, which in turn call CgExpr:
  • CgClosure
  • CgCon

CgCase CgLetNoEscape

CgInfoTbls CgCallConv

CgPrimOp CgTailCall CgForeignCall