This page describes the code generator ("codegen") in GHC. It is meant to reflect the current state of the implementation. If you notice any inaccuracies, please update the page (if you know how) or complain on ghc-devs.
A brief history of the code generator
You might occasionally hear about the "old" and "new" code generators. GHC 7.6 and earlier used the old code generator. The new code generator had been in development since 2007 and was enabled by default on 31 August 2012, after the release of GHC 7.6.1. The first stable GHC to use the new code generator is 7.8.1, released in early 2014. The commentary on the old code generator can be found here. Notes from the development process of the new code generator are spread over several wiki pages: go to the Index and look for pages starting with "NewCodeGen".
There are some plans for the future development of the code generator. One plan is to expand the capability of the pipeline so that it also does native code generation, which would allow the existing backends to be discarded; see IntegratedCodeGen for discussion of the design. It is hard to say whether this will ever happen: currently no work is being done on it, and in the meantime there has been an alternative proposal to replace the native code generator with LLVM.
The goal of the code generator is to convert a program from the STG representation to the Cmm representation. STG is a functional language with an explicit stack. Cmm is a low-level imperative language, something between C and assembly, that is suitable for machine code generation. Note that the terminology can be a bit confusing here: the term "code generator" can refer both to the STG->Cmm pass and to the whole STG->Cmm->assembly pass. The Cmm->assembly conversion is performed by one of the backends, e.g. NCG (Native Code Generator) or LLVM.
The top-most entry point to the codegen is located in compiler/main/HscMain.hs in the tryNewCodegen function. Code generation is done in two stages:
- Convert STG to Cmm with an implicit stack and native Cmm calls. This whole stage lives in the compiler/codeGen directory, with the entry point being the codeGen function in the compiler/codeGen/StgCmm.hs module.
- Optimise the Cmm, and CPS-convert it to have an explicit stack and no native calls. This lives in the compiler/cmm directory, with the cmmPipeline function from the compiler/cmm/CmmPipeline.hs module being the entry point.
The CPS-converted Cmm is fed to one of the backends. This is done by the codeOutput function (compiler/main/CodeOutput.lhs), called from
hscGenHardCode after returning from
First stage: STG to Cmm conversion
- The code generator converts STG to CmmGraph. Implemented in StgCmm* modules (in directory
Cmm.CmmGraph is pretty much a Hoopl graph of CmmNode.CmmNode nodes. Control transfer instructions are always the last node of a basic block.
- Parameter passing is made explicit; the calling convention depends on the target architecture. The key function is
- Parameters are passed in virtual registers R1, R2 etc. [These map 1-1 to real registers.]
- Overflow parameters are passed on the stack using explicit memory stores, to locations described abstractly using the Stack Area abstraction.
- Making the calling convention explicit includes an explicit store of the return address, which is written to the stack in the same way as overflow parameters. This is done (obscurely) in
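The parameter-passing scheme described above can be illustrated with a small toy model. This is only a sketch and not the code GHC uses: the names `ArgLoc`, `assignArgs`, `returnAddrLoc` and `wordSize` are hypothetical, the number of argument registers is passed in as a plain parameter, and placing the return address in the first stack slot is an assumption made for the example.

```haskell
module Main where

-- Where a value ends up under this toy calling convention.
data ArgLoc
  = InReg Int    -- virtual register R1, R2, ...
  | OnStack Int  -- byte offset inside an abstract stack area
  deriving (Eq, Show)

-- Hypothetical word size in bytes (assumption; target dependent).
wordSize :: Int
wordSize = 8

-- In this toy model the return address occupies the first slot of the
-- area (assumption); overflow parameters follow it.
returnAddrLoc :: ArgLoc
returnAddrLoc = OnStack 0

-- Assign a location to every argument: the first numRegs arguments go to
-- registers R1..Rn, the remaining ones to successive stack slots.
assignArgs :: Int -> [a] -> [(a, ArgLoc)]
assignArgs numRegs args = zip args (regs ++ slots)
  where
    regs  = map InReg [1 .. numRegs]
    slots = map (OnStack . (* wordSize)) [1 ..]

main :: IO ()
main = do
  print returnAddrLoc
  mapM_ print (assignArgs 2 ["a", "b", "c", "d"])
```

With two registers available, the first two arguments land in R1 and R2, and the remaining two are given stack offsets after the return address slot.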
Second stage: the Cmm pipeline
The core of the Cmm pipeline is implemented by the cpsTop function in the compiler/cmm/CmmPipeline.hs module. The pipeline consists of the following passes:
- Control Flow Optimisations, implemented in CmmContFlowOpt, simplifies the control flow graph by:
- Eliminating blocks that have only one predecessor by concatenating them with that predecessor
- Shortcutting targets of branches and calls (see Note [What is shortcutting])
If a block becomes unreachable because of shortcutting, it is eliminated from the graph. However, it is theoretically possible that this pass will still produce unreachable blocks. The reason is the label renaming pass performed after block concatenation has been completed.
This pass may optionally be run a second time at the end of the pipeline.
- Common Block Elimination, implemented in CmmCommonBlockElim, eliminates blocks that are identical (except for the label on their first node). Since this pass traverses blocks in depth-first order, any unreachable blocks introduced by Control Flow Optimisations are eliminated. This pass is optional.
- Determine proc-points, implemented in CmmProcPoint. The idea behind "proc-point splitting" is that we first determine proc-points, i.e. blocks in the graph that can be turned into entry points of procedures, and then split a larger function into many smaller ones, each having a proc-point as its entry point. This is required for the LLVM backend. The proc-point splitting itself is done later in the pipeline; here we only determine the set of proc-points. We first call callProcPoints, which assumes that the entry point to a Cmm graph and every continuation of a call is a proc-point. If we are splitting proc-points, we update the list of proc-points by calling minimalProcPointSet, which adds all blocks reachable from more than one block in the graph. The set of proc-points is required by the stack layout pass.
- Figure out the stack layout, implemented in CmmLayoutStack. The job of this pass is to:
- replace references to abstract stack Areas with fixed offsets from Sp.
- replace the CmmHighStackMark constant used in the stack check with the maximum stack usage of the proc.
- save any variables that are live across a call, and reload them as necessary.
Invariant violation: It may happen that stack layout invalidates the computed set of proc-points by removing a block that is a proc-point. This means that at this point in the pipeline we have inconsistent data, and subsequent steps must be prepared for it.
- Sinking assignments, implemented in CmmSink, performs these optimizations:
- moves assignments closer to their uses, to reduce register pressure
- pushes assignments into a single branch of a conditional if possible
- inlines assignments to registers that are mentioned only once
- discards dead assignments
- CAF analysis, implemented in CmmBuildInfoTables. Computed CAF information is returned from cmmPipeline and used to create Static Reference Tables (SRT). See here for some more detail on CAFs and SRTs. This pass is implemented using Hoopl (see below).
- Proc-point analysis and splitting (only when splitting proc-points), implemented by CmmProcPoint, takes a list of proc-points and determines, for each block, from which proc-point the block is reachable. This is implemented using Hoopl. Then the call to splitAtProcPoints splits the Cmm graph into multiple Cmm graphs (each representing a single function) and builds info tables for each of them. When doing this we must be prepared for the fact that a proc-point may not actually exist in the graph, since it may have been removed by the stack layout pass (see #8205).
- Attach continuations' info tables (only when NOT splitting proc-points), implemented by CmmProcPoint, attaches info tables for the continuations of calls in the graph. [PLEASE WRITE MORE IF YOU KNOW WHY THIS IS DONE]
- Update info tables to include stack liveness, implemented by CmmLayoutStack. Populates the info tables of each Cmm function with stack usage information. Uses stack maps created by the stack layout pass.
- Control Flow Optimisations, same as at the beginning of the pipeline, but this pass runs only with -O2. Since this pass might produce unreachable blocks, it is followed by a call to
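To make the interaction between shortcutting and unreachable blocks more concrete, here is a minimal sketch on a toy control flow graph. It is not the code from CmmContFlowOpt: the types `Term`, `Block` and `Graph` and the functions `resolve`, `shortcut` and `removeUnreachable` are hypothetical names invented for this illustration, and a block is treated as "shortcuttable" only if it has an empty body and ends in an unconditional jump.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Label = String

-- Control transfer is always the last node of a block.
data Term = Goto Label | Cond String Label Label | Ret
  deriving (Eq, Show)

data Block = Block { body :: [String], term :: Term }
  deriving (Eq, Show)

type Graph = Map.Map Label Block

-- Follow chains of empty goto-only blocks to their final destination.
-- The visited set guards against cycles of empty blocks.
resolve :: Graph -> Label -> Label
resolve g = go Set.empty
  where
    go seen l
      | l `Set.member` seen = l
      | otherwise = case Map.lookup l g of
          Just (Block [] (Goto l')) -> go (Set.insert l seen) l'
          _                         -> l

-- Redirect every branch target past empty goto-only blocks.
shortcut :: Graph -> Graph
shortcut g = Map.map retarget g
  where
    retarget b = b { term = mapTerm (term b) }
    mapTerm (Goto l)     = Goto (resolve g l)
    mapTerm (Cond c t e) = Cond c (resolve g t) (resolve g e)
    mapTerm Ret          = Ret

successors :: Term -> [Label]
successors (Goto l)     = [l]
successors (Cond _ t e) = [t, e]
successors Ret          = []

-- Keep only the blocks reachable from the entry label.
removeUnreachable :: Label -> Graph -> Graph
removeUnreachable entry g = Map.filterWithKey (\l _ -> l `Set.member` live) g
  where
    live = reach Set.empty [entry]
    reach seen [] = seen
    reach seen (l : ls)
      | l `Set.member` seen = reach seen ls
      | otherwise = case Map.lookup l g of
          Nothing -> reach seen ls
          Just b  -> reach (Set.insert l seen) (successors (term b) ++ ls)

example :: Graph
example = Map.fromList
  [ ("entry", Block ["x = 1"] (Cond "x > 0" "hop" "exit"))
  , ("hop",   Block []        (Goto "work"))
  , ("work",  Block ["y = x"] (Goto "exit"))
  , ("exit",  Block []        Ret)
  ]

main :: IO ()
main = mapM_ print (Map.toList (removeUnreachable "entry" (shortcut example)))
```

In the example, the branch to "hop" is redirected straight to "work", which leaves "hop" without predecessors; the reachability sweep then drops it from the graph.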
Dumping and debugging Cmm
You can dump the generated Cmm code using the -ddump-cmm flag. This is helpful for debugging Cmm problems. The Cmm dump is divided into several sections:
==================== Cmm produced by new codegen ====================
...
==================== Post control-flow optimisations ====================
...
==================== Post common block elimination ====================
...
==================== Layout Stack ====================
...
==================== Sink assignments ====================
...
==================== CAFEnv ====================
...
==================== after setInfoTableStackMap ====================
...
==================== Post control-flow optimisations ====================
...
==================== Post CPS Cmm ====================
...
==================== Output Cmm ====================
...
"Cmm produced by new codegen" is emitted in the HscMain module after converting STG to Cmm. This Cmm has not been processed in any way by the Cmm pipeline. If you see that something is incorrect in that dump, the problem is located in the STG->Cmm pass. The last section, "Output Cmm", is also dumped in HscMain, but only after the Cmm has been processed by the whole Cmm pipeline. All other sections are dumped by the CmmPipeline. You can dump only selected passes with more specific flags. For example, if you know (or suspect) that the sinking pass is performing some incorrect transformations, you can make the dump shorter by using the -ddump-cmm-sp -ddump-cmm-sink flags. This will produce only the "Layout Stack" dump (just before the sinking pass) and the "Sink assignments" dump (just after the sinking pass), allowing you to focus on the changes introduced by the sinking pass.
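When comparing two stages of a saved dump, it can help to split the text into its sections first. The following is a small helper sketch, not part of GHC: `headerTitle` and `splitSections` are hypothetical names, and the code assumes that every section header is a single line that starts and ends with a run of `=` characters, as in the listing above.

```haskell
module Main where

import Data.List (isPrefixOf, isSuffixOf)
import Data.Maybe (isJust)

-- A dump section: its title and the lines that follow its header.
type Section = (String, [String])

-- Recognise a header line such as
-- "==================== Sink assignments ====================".
headerTitle :: String -> Maybe String
headerTitle line
  | bar `isPrefixOf` line && bar `isSuffixOf` line && not (null title) = Just title
  | otherwise = Nothing
  where
    bar   = "===="
    strip = dropWhile (`elem` "= ")
    title = reverse (strip (reverse (strip line)))

-- Split dump lines into sections; text before the first header is dropped.
splitSections :: [String] -> [Section]
splitSections [] = []
splitSections (l : ls) = case headerTitle l of
  Nothing    -> splitSections ls
  Just title -> (title, bodyLines) : splitSections rest
    where (bodyLines, rest) = break (isJust . headerTitle) ls

-- A tiny stand-in for real dump output (the section bodies are placeholders).
sample :: String
sample = unlines
  [ "==================== Layout Stack ===================="
  , "before"
  , "==================== Sink assignments ===================="
  , "after"
  ]

main :: IO ()
main = print (lookup "Sink assignments" (splitSections (lines sample)))
```

Note that `lookup` returns the first section with a given title, which matters for "Post control-flow optimisations" because that title can appear twice in a full dump.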