GHC Commentary: The Code Generator
Storage manager representations
The code generator needs to know the layout of heap objects, because it generates code that accesses and constructs those heap objects. The runtime also needs to know about the layout of heap objects, because it contains the garbage collector. How can we share the definition of storage layout such that the code generator and the runtime both have access to it, and so that we don't have to keep two independent definitions in sync?
Currently we solve the problem this way:
- C types representing heap objects are defined in the C header files, see for example includes/Closures.h.
- A C program, includes/mkDerivedConstants.c, #includes the runtime headers. This program is built and run as part of the build in includes/. It is run twice: once to generate includes/DerivedConstants.h, and again to generate GHCConstants.h.
- The file DerivedConstants.h contains lots of definitions such as #define OFFSET_StgTSO_why_blocked 18, which says that the offset to the why_blocked field of an StgTSO is 18 bytes. This file is #included into includes/Cmm.h, so these offsets are available to the hand-written .cmm files.
- The file GHCConstants.h contains similar definitions, such as oFFSET_StgTSO_why_blocked = 18::Int. This time the definitions are in Haskell syntax, and this file is #included directly into compiler/main/Constants.lhs. This is how these offsets are made available to GHC's code generator.
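The idea of deriving both sets of constants from a single C program can be sketched as follows. This is a minimal illustration, not the actual contents of mkDerivedConstants.c: the ExampleTSO struct, the OutputMode type and the format_offset function are hypothetical names introduced here, and the real StgTSO has a different layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical heap object used only for illustration;
 * this is NOT the layout of the real StgTSO. */
typedef struct {
    uint32_t id;
    uint32_t why_blocked;
} ExampleTSO;

/* One mode per run of the generator: C header output or Haskell output. */
typedef enum { MODE_C_HEADER, MODE_HASKELL } OutputMode;

/* Format one field offset into buf, either as a C #define or as a
 * Haskell binding. Returns the value returned by snprintf. */
int format_offset(char *buf, size_t len, OutputMode mode,
                  const char *name, size_t offset)
{
    assert(buf != NULL && name != NULL);
    if (mode == MODE_C_HEADER) {
        return snprintf(buf, len, "#define OFFSET_%s %zu", name, offset);
    }
    return snprintf(buf, len, "oFFSET_%s = %zu::Int", name, offset);
}
```

Calling format_offset with MODE_C_HEADER, the name "ExampleTSO_why_blocked" and offsetof(ExampleTSO, why_blocked) yields a #define line; the same call with MODE_HASKELL yields the matching Haskell binding. Because both lines come from the same offsetof expression, the two outputs cannot drift apart.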
Generated Cmm Naming Convention
Labels generated by the code generator are of the form <name>_<type>, where <name> is <Module>_<name> for external names and <unique> for internal names. <type> is one of the following:
- info: Info table
- srt: Static reference table
- srtd: Static reference table descriptor
- entry: Entry code (function, closure)
- slow: Slow entry code (if any)
- ret: Direct return address
- vtbl: Vector table
- <n>_alt: Case alternative (tag n)
- dflt: Default case alternative
- btm: Large bitmap vector
- closure: Static closure
- con_entry: Dynamic Constructor entry code
- con_info: Dynamic Constructor info table
- static_entry: Static Constructor entry code
- static_info: Static Constructor info table
- sel_info: Selector info table
- sel_entry: Selector entry code
- cc: Cost centre
- ccs: Cost centre stack
Many of these distinctions are only for documentation reasons. For example, _ret is only distinguished from _entry to make it easy to tell whether a code fragment is a return point or a closure/function entry.
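As a rough illustration of this naming scheme, the sketch below builds a label from a name and a type suffix and checks which suffix a label carries. It assumes that a label is simply the name, an underscore, and the suffix, as the _ret and _entry examples suggest. The functions make_label and label_has_suffix and the name Main_foo are hypothetical and do not come from GHC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "<name>_<suffix>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int make_label(char *buf, size_t len, const char *name, const char *suffix)
{
    assert(buf != NULL && name != NULL && suffix != NULL);
    int n = snprintf(buf, len, "%s_%s", name, suffix);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Return 1 if label ends with "_<suffix>", otherwise 0. */
int label_has_suffix(const char *label, const char *suffix)
{
    assert(label != NULL && suffix != NULL);
    size_t label_len = strlen(label);
    size_t suffix_len = strlen(suffix);
    if (label_len < suffix_len + 1) {
        return 0;
    }
    return label[label_len - suffix_len - 1] == '_'
        && strcmp(label + label_len - suffix_len, suffix) == 0;
}
```

With these helpers, a reader of generated code could tell a return point from an entry point by testing for the ret or entry suffix, which is the documentation purpose described above.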
- Top level. Called by the
- The monad that most of codeGen operates inside
- (could be Writer?)
- Seems to be the core function since everything in STG is an expression
Memory and Register Management
- CgBindings, which maps variable names to all the volatile or stable locations where they are stored (e.g. register, stack slot, computed from other expressions, etc.). Provides functions such as getCgIdInfo for adding, modifying and looking up bindings.
- Mostly utility functions for allocating and freeing stack slots. But also has things on setting up update frames.
- Functions for allocating objects that appear on the heap, such as closures and constructors. Also includes code for stack and heap checks.
- Utility functions for making bitmaps (e.g. [Bool] -> Bitmap)
- Stores info about the memory layouts of closures
- Storage manager representation of closures. Part of ClosureInfo but kept separate to "keep nhc happy."
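The bitmap utilities mentioned in this list turn a sequence of booleans into packed words. A minimal sketch of that idea in C follows. It assumes a fixed 32-bit word and least-significant-bit-first packing; both are assumptions made for illustration, and the function make_bitmap is hypothetical rather than part of GHC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u  /* assumed word size; a real runtime word may differ */

/* Pack n flags into 32-bit words: flag i is stored in bit (i % 32)
 * of word (i / 32). Returns the number of words written, or -1 if
 * the output array is too small. */
int make_bitmap(const bool *flags, size_t n, uint32_t *out, size_t out_words)
{
    assert(flags != NULL || n == 0);
    size_t need = (n + WORD_BITS - 1) / WORD_BITS;
    if (need > out_words) {
        return -1;
    }
    for (size_t w = 0; w < need; w++) {
        out[w] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (flags[i]) {
            out[i / WORD_BITS] |= (uint32_t)1 << (i % WORD_BITS);
        }
    }
    return (int)need;
}
```

A garbage collector can then test a single bit to decide whether a given stack or heap slot holds a pointer, instead of storing one flag per slot.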
Special runtime support
- Ticky-ticky profiling
- Cost-centre profiling
- Support for the Haskell Program Coverage (hpc) toolkit, inside GHC.
- Code generation for GranSim (GRAN) and parallel (PAR). Nearly all the functions are dead stubs.
Not yet classified
Please help classify these if you know what they are.
- Maybe top-level
- It seems that codeGen calls these two which in turn call CgExpr
CgPrimOp CgTailCall CgForeignCall