Changes between Version 11 and Version 12 of Commentary/Compiler/IntegratedCodeGen


Ignore:
Timestamp:
Jun 5, 2008 10:30:41 PM (6 years ago)
Author:
nr
Comment:

wordsmithing and clarifications inserted by NR

Legend:

Unmodified
Added
Removed
Modified
  • Commentary/Compiler/IntegratedCodeGen

    v11 v12  
= An Integrated Code Generator for GHC =

We propose reworking GHC's back end into an '''Integrated Code Generator''', which will widen the interface between the machine-independent and machine-dependent parts of the back end.  We wish to '''dissolve the barrier''' between the current machine-independent transformations (CPS conversion, stack layout, etc) and the native-code generators (instruction selection, calling conventions, register allocation -- including spilling to the C stack, etc).  The goal is instead to have a code generator that '''integrates both machine-independent and machine-dependent components''', which will interact through wide but well-specified interfaces.  From this refactoring we expect the following benefits:
 * '''The back end will be simpler overall''', primarily because the refactoring will reduce or eliminate duplication of code.

 * '''Complexity will be isolated''' in two modules with well-defined interfaces: a dataflow engine and a register allocator.
 * '''GHC will generate better machine code''', primarily because important decisions about register usage will be made at a later stage of translation and will exploit knowledge of the actual target machine.
== Design elements ==

The important elements of our design are as follows:

 0. Build two big hammers, and hit as many nails as possible.  (The big hammers are the '''dataflow rewriting engine''' and a '''coalescing register allocator'''.)  The hammer itself may be big and complicated, but '''using a big hammer should be easy''' and should give easily predictable results.
 0. Load all back ends into every instance of the compiler, and '''treat every compilation as a cross-compilation.'''  Despite having been used in production compilers for at least twenty years, this technique is still seen as somewhat unorthodox, but it removes many {{{#ifdef}}}s and saves significant complexity at compiler-configuration time.
== Design philosophy ==

State-of-the-art dataflow optimization and register allocation both require complex implementations.  We live with this complexity because '''creating new clients is easy.'''

 * We can define a new optimization simply by defining a lattice of dataflow facts (akin to a specialized logic) and then writing the dataflow-transfer functions found in compiler textbooks.  Handing these functions to the dataflow engine produces a new optimization that is not only useful on its own, but that can easily be composed with other optimizations to create an integrated "superoptimization" that is strictly more powerful than any sequence of individual optimizations, no matter how many times they are re-run [http://portal.acm.org/citation.cfm?id=503298 (Lerner, Grove, and Chambers 2002)].

 * The back end can use fresh temporaries and register-register moves with abandon, knowing that a state-of-the-art register allocator will eliminate almost all move instructions.

 * Our ultimate goal is to make adding a new back end easy as well.  In the long run, we wish to apply John Dias's dissertation work to GHC.  In the short run, however, we think it more sensible to represent each target-machine instruction set with an algebraic datatype.  We propose to use type classes to define common functions such as identifying the registers read and written by each instruction.
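To make the "creating new clients is easy" claim concrete, here is a minimal sketch of what a client of a dataflow engine supplies, using constant propagation as the textbook example.  All names here ({{{Fact}}}, {{{transfer}}}, and so on) are hypothetical illustrations, not the actual interface of GHC's dataflow engine.

```haskell
-- Hypothetical sketch of a dataflow client: a lattice of facts plus a
-- transfer function.  The engine (not shown) iterates these to a fixed
-- point; the client writes only what appears below.
import qualified Data.Map as Map

type Var = String

-- The lattice of dataflow facts: each variable is either a known
-- constant or Top (unknown); absence from the map means "no fact yet".
data Const = Known Integer | Top
  deriving (Eq, Show)

type Fact = Map.Map Var Const

-- The lattice join: facts that disagree at a control-flow merge go to Top.
join :: Fact -> Fact -> Fact
join = Map.unionWith (\a b -> if a == b then a else Top)

-- A toy instruction set, standing in for Cmm Middle nodes.
data Instr
  = LoadImm Var Integer   -- v := constant
  | Move    Var Var       -- v := w
  | Add     Var Var Var   -- v := w + x

-- The transfer function: how one instruction transforms a fact.
transfer :: Instr -> Fact -> Fact
transfer (LoadImm v k) f = Map.insert v (Known k) f
transfer (Move v w)    f = Map.insert v (Map.findWithDefault Top w f) f
transfer (Add v w x)   f =
  case (Map.lookup w f, Map.lookup x f) of
    (Just (Known a), Just (Known b)) -> Map.insert v (Known (a + b)) f
    _                                -> Map.insert v Top f

-- Over a straight-line block, transfer is just a fold.
transferBlock :: [Instr] -> Fact -> Fact
transferBlock block fact = foldl (flip transfer) fact block
```

For example, {{{transferBlock [LoadImm "x" 1, LoadImm "y" 2, Add "z" "x" "y"] Map.empty}}} yields a fact mapping {{{z}}} to {{{Known 3}}}; composing such transfer functions for several optimizations is how the "superoptimization" of Lerner, Grove, and Chambers is built.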
== Proposed compilation pipeline ==

 0. Convert from {{{STG}}} to an abstract {{{CmmAgraph}}} ([[GhcFile(compiler/cmm/ZipCfg.hs)]], [[GhcFile(compiler/cmm/ZipCfgCmmRep.hs)]]).  This step is Simon PJ's "new code generator" from September 2007.  One departure from the old code generator is that '''we do not build a {{{Cmm}}} abstract-syntax tree;''' instead we go straight to a control-flow graph.
 0. Reify the control-flow graph in a non-abstract form that can be analyzed, transformed, and optimized: convert {{{CmmAgraph -> CmmGraph}}}.  This conversion may introduce new variables, stack slots, and compile-time constants.
   * Implements calling conventions for call, jump, and return instructions: all parameter passing is turned into data-movement instructions (register-to-register move, load, or store), and stack-pointer adjustments are inserted.  After this point, calls, returns, and jumps are just control-transfer instructions -- the parameter passing has been compiled away.
   * How do we refer to locations on the stack when we haven't laid it out yet?  The compiler names a stack slot using the idea of a "late compile-time constant," which is just a symbolic constant that will be replaced with an actual stack offset when the stack layout is chosen.
 0. Instruction selection: each {{{Cmm}}} {{{Middle}}} and {{{Last}}} node in the control-flow graph is replaced with a new graph in which the nodes are machine instructions.  The {{{MachineMiddle}}} type represents computational machine instructions; the {{{MachineLast}}} type represents control-transfer instructions.  The choice of representation is up to the author of the back end, but for continuity with the existing native code generators, we expect to begin by using algebraic data types inspired by the existing definitions in [[GhcFile(compiler/nativeGen/MachInstrs.hs)]].
   * An '''extremely important distinction''' from the existing code is that we plan to eliminate {{{#ifdef}}} and instead provide multiple datatypes, e.g., in {{{I386Instrs.hs}}}, {{{PpcInstrs.hs}}}, {{{SparcInstrs.hs}}}, and so on.  We expect to '''abstract away from the details of these representations''' by borrowing some abstractions from [http://www.eecs.harvard.edu/hube/software/nci/overview.html Machine SUIF].  In the longer term we would like to support RTL-based representations such as are used in gcc, vpo, and Quick C--. [[BR]][[BR]] We expect a set of types and an instruction selector for ''each'' back end, so for example, we might have a transformation that maps {{{LGraph Cmm.Middle Cmm.Last}}} (with variables, stack slots, and compile-time constants) {{{-> LGraph I386Instrs.Middle I386Instrs.Last}}} (with variables, stack slots, and compile-time constants).
 0. Optimizer: {{{LGraph Instrs}}} (with variables, stack slots, and compile-time constants) {{{-> LGraph Instrs}}} (with variables, stack slots, and compile-time constants)
   * Obvious code improvements include lazy code motion and dead-code elimination, as well as various optimizations based on forward substitution (copy propagation, constant propagation, peephole optimization).  We intend to follow the style of Machine SUIF and '''make code improvements machine-independent''', using machine-dependent functions to capture the semantics of instructions.
   * The difficulty of implementing a good peephole optimizer varies greatly with the representation of instructions.  We propose to postpone serious work on peephole optimization until we have a back end capable of representing machine instructions as RTLs, which makes peephole optimization trivial.
 0. Proc-point analysis: {{{LGraph Instrs}}} (with variables, stack slots, and compile-time constants) {{{-> LGraph Instrs}}} (with variables, stack slots, and compile-time constants)
   * Proc points are found, and the appropriate control-transfer instructions are inserted.
   * Why so early?  Depending on the back end (think of C as the worst case), the proc-point analysis might have to satisfy some horrible calling convention.  We want to make these requirements explicit before we get to the register allocator.  We also want to '''exploit the register allocator''' to make the best possible decisions about ''which live variables (if any) should be in registers at a proc point''.
 0. Register allocation: {{{LGraph Instrs}}} (with variables, stack slots, and compile-time constants) {{{-> LGraph Instrs}}} (with stack slots and compile-time constants)
   * Replace variable references with machine registers and stack slots.
 0. Stack layout: {{{LGraph Instrs}}} (with stack slots and compile-time constants) {{{-> LGraph Instrs}}}
   * Choose a stack layout.
   * Replace references to stack slots with addresses on the stack.
   * Replace compile-time constants with offsets into the stack.
 0. Proc-point splitting: {{{LGraph Instrs -> [LGraph Instrs]}}}
   * Each proc point gets its own procedure.
 0. Code layout: {{{LGraph Instrs -> [String]}}}
   * A reverse postorder depth-first traversal simultaneously converts the graph to sequential code and converts each instruction into an assembly-code string: '''Assembly code ahoy'''!
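The "late compile-time constant" idea that threads through the pipeline (introduced in step 2, consumed by stack layout) can be sketched as follows.  All types here are hypothetical illustrations, not GHC's actual {{{Cmm}}} representation: stack slots are named symbolically, and the stack-layout pass later rewrites each symbolic constant to a concrete offset from the stack pointer.

```haskell
-- Hypothetical sketch: symbolic stack-slot names that a later pass
-- resolves to concrete offsets, once a stack layout has been chosen.
import qualified Data.Map as Map

newtype SlotName = SlotName String
  deriving (Eq, Ord, Show)

-- An address is either Sp plus a late compile-time constant (before
-- stack layout) or Sp plus a known byte offset (after stack layout).
data Addr
  = SpPlusSym SlotName
  | SpPlus    Int
  deriving (Eq, Show)

-- A toy instruction set; real instructions would come from a back end.
data Instr
  = Load  String Addr    -- reg := mem[addr]
  | Store Addr String    -- mem[addr] := reg
  deriving (Eq, Show)

-- Stack layout produces a mapping from slot names to byte offsets ...
type Layout = Map.Map SlotName Int

-- ... and a trivial rewrite then eliminates every symbolic constant.
resolve :: Layout -> Instr -> Instr
resolve lay (Load r a)  = Load r (resolveAddr lay a)
resolve lay (Store a r) = Store (resolveAddr lay a) r

resolveAddr :: Layout -> Addr -> Addr
resolveAddr lay (SpPlusSym n) =
  case Map.lookup n lay of
    Just off -> SpPlus off
    Nothing  -> error ("unallocated stack slot: " ++ show n)
resolveAddr _ a = a
```

For example, with the layout {{{[(SlotName "ret", 8)]}}}, the instruction {{{Load "r1" (SpPlusSym (SlotName "ret"))}}} resolves to {{{Load "r1" (SpPlus 8)}}}.  The point is that every pass before stack layout can name slots without knowing where they will land.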

=== Machine-dependence ===

A key property of the design is that the scopes of machine-dependent code and machine-dependent static types are limited as much as possible:
 0. The representation of machine instructions may be machine-dependent (an algebraic data type), or we may use a machine-independent representation that satisfies a machine-dependent dynamic invariant (RTLs).  The back end should be designed in such a way that most passes don't know the difference; we intend to borrow heavily from Machine SUIF.  To define the interface used to conceal the difference, Machine SUIF uses C++ classes; we will use Haskell's type classes.
 0. Instruction selection is necessarily machine-dependent, and moreover, it must know the representation of machine instructions.
 0. Most of the optimizer need not know the representation of machine instructions.
 0. Other passes, including register allocation, stack layout, and so on, should be completely machine-independent.
 0. RTLs are not a new representation; they are a trivial extension of existing {{{Cmm}}} representations.
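The type-class interface alluded to above might look roughly like the following sketch.  None of these names are settled GHC designs (a real interface would be much wider), but it shows how a machine-independent pass can be written against the class while each target supplies its own instruction datatype.

```haskell
-- Hypothetical sketch of the interface concealing instruction
-- representations: each target's instruction type is an instance,
-- and machine-independent passes see only this class.
class Instruction instr where
  regsRead    :: instr -> [Reg]   -- registers read by the instruction
  regsWritten :: instr -> [Reg]   -- registers written by the instruction
  isMove      :: instr -> Bool    -- register-to-register move?

newtype Reg = Reg Int
  deriving (Eq, Show)

-- A toy "target" with two instruction forms, standing in for a real
-- per-target module such as I386Instrs.hs.
data ToyInstr
  = ToyMove Reg Reg        -- dst := src
  | ToyAdd  Reg Reg Reg    -- dst := src1 + src2
  deriving Show

instance Instruction ToyInstr where
  regsRead    (ToyMove _ s)    = [s]
  regsRead    (ToyAdd _ s1 s2) = [s1, s2]
  regsWritten (ToyMove d _)    = [d]
  regsWritten (ToyAdd d _ _)   = [d]
  isMove      (ToyMove _ _)    = True
  isMove      _                = False

-- A machine-independent client: one step of liveness analysis, the
-- registers live before an instruction given those live after it
-- (duplicates are ignored in this sketch).
liveBefore :: Instruction instr => instr -> [Reg] -> [Reg]
liveBefore i liveAfter =
  regsRead i ++ filter (`notElem` regsWritten i) liveAfter
```

Here {{{liveBefore}}} is written once, against the class, yet works for every target; {{{isMove}}} is the sort of query a coalescing register allocator needs in order to eliminate move instructions.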