Changes between Version 1 and Version 2 of Commentary/Compiler/Backends/LLVM/Design

Feb 25, 2010 3:44:06 AM (4 years ago)

Fleshed out


  • Commentary/Compiler/Backends/LLVM/Design

    v1 v2  
    22= LLVM Back-end Design = 
    4 The initial design tries to fit into GHC's current pipeline stages as seamlessly as possible. This allows for quicker development and focus on the core task of LLVM code generation. 
     4The current design tries to fit into GHC's pipeline stages as an alternative to the C and NCG back-ends as seamlessly as possible. This allows for quicker development and focus on the core task of LLVM code generation. 
    66The LLVM pipeline works as follows: 
    19   Cmm -> (codeOutput) --->(ncg) Assembler                -->(mangler, splitter) --> ('As' phase) -----> Object Code --> (link) --> executable 
    20                           \---> LLVM Assembler          --> LLVM Optimizer     --> ('llc' phase) -----/ 
     19Cmm -> (codeOutput) --->(ncg) Assembler      -->(mangler, splitter) --> ('As' phase) -----> Object Code --> (link) --> executable 
     20                        \---> LLVM Assembler --> LLVM Optimizer     --> ('llc' phase) -----/ 
    2323This approach was the easiest and thus quickest way to initially implement the LLVM back-end. Now that it is working, there is some room for additional optimisations. A potential optimisation would be to add a new linker phase for LLVM. Instead of each module just being compiled to native object code ASAP, it would be better to keep them in the LLVM bitcode format and link all the modules together using the LLVM linker. This enable all of LLVM's link time optimisations. All the user program LLVM bitcode will then be compiled to a native object file and linked with the runtime using the native system linker. 
     25= Implementation = 
     27== Framework == 
     29  * New -fllvm code generation pipeline, involved modifying: 
     30    * {{{main/CodeOutput.lhs}}} - Selects appropriate back-end for code generation (C, NCG, LLVM). 
     31    * {{{main/DynFlags.hs}}} - Stores GHC configuration (command line options, compile time options... ect). Added `HscLlvm` target type. 
     32    * {{{}}} - Stores modules/files to compile for ghc. Added new LLVM files and directory stored under `llvmGen`, and new CPP flag to enable the LLVM code generator (`-DLLVM`). 
     33    * {{{}}} - Added new `GhcWithLlvmCodeGen` option which can be set in `` to `YES` to enable the LLVM code generator. 
     34    * {{{main/DriverPhases.hs}}} - Added `LlvmAs` phase to invoke the compilation of LLVM bitcode/IR to an object file. After this phase linking can occur. 
     35    * {{{main/DriverPipeline.hs}}} - Added code for new `LlvmAs`, `LlvmOpt` and `LlvmLlc` phases. 
     36      * {{{LlvmAs}}} - Invokes `llvm-as` tool to compile a llvm assembly file ('.ll') to a bitcode file (`.bc`). 
     37      * {{{LlvmOpt}}} - Invokes the llvm `opt` tool to optimise the module. Just use the llvm standard optimisation groups of `O1`, `O2`, `O3`, depending on the optimisation level passed to 'ghc' by the user. 
     38      * {{{LlvmLlc}}} - Invokes the llvm `llc` tool to generate the machine code ('.s' file) from the optimised bitcode. 'As' stage runs next, part of existing 'ghc' pipeline. 
     39    * {{{SysTools.lhs}}} - Stores the path and default settings of the system tools needed, so for LLVM back-end this is `llvm-as`, `opt` and `llc`. 
     41The LLVM pipeline works as specified above. Code generation phase occurs, using the {{{dflags}}} option data the appropriate generator is selected (which is the Llvm back-end is `-fllvm` has been specified on the command line). After code generation, the next phase is determined, this is done from the `HscLlvm` target data constructor which is selected at ghc startup by {{{DynFlags.hs}}}. The next phase is `LlvmAs` which will compile the text IR to an LLVM bitcode file (equivalent to `llvm-as` tool). After this the `LlvmLlc` phase is run, which produces a native object file from the llvm bitcode file (equivalanet to the `llc` tool). At this stage, the output from all three back-ends should be 'equivalent'. After this phase, the `StopLn`, or linking phase occurs which should result in the end result. Compiling some Haskell code with the c-backend and some with the llvm-backend and linking them together is supported. 
     43== LLVM Code Generation == 
     45For LLVM code generation we need a method for representing and generating LLVM code. The [ LLVM FAQ] suggest the following possible approaches:  
     46  * Call into LLVM Libraries using FFI (can probably use [ Haskell LLVM Bindings])  
     47  * Emit LLVM Assembly (approach taken by [ EHC's] LLVM Back-end, can use the [ module] developed by them for this)  
     48  * Emit LLVM Bitcode (can't see any reason to do this) 
     50The approach taken was to use the LLVM module from [ EHC]. This module contains an abstract syntax representation of LLVM Assembly and the ability to pretty print it. It has been heavily modified to increase its language coverage as it was missing several LLVM constructs which were needed. Ideally we would like to add a second pretty printer which calls into the LLVM C++ API to generate LLVM Bitcode. This should hopefully decrease the compile times and make the back-end more resilient to future changes to LLVM Assembly. The LLVM Haskell binding (first option) wasn't used as it represents LLVM at a very high level, which isn't appropriate for the back-end. 
     52== Code Generation == 
     54Code generation consists of translating a list of `GenCmmTop` data types to LLVM code. `GenCmmTop` has the following form: 
     57data GenCmmTop d h g 
     58  = CmmProc          -- Function 
     59        h            -- Extra header such as the info table 
     60        CLabel       -- Procedure name 
     61        CmmFormals   -- Argument locals live on entry 
     62        g            -- Control flow graph 
     64 | CmmData           -- Static data 
     65        Section      -- Type 
     66        [d]          -- Data 
     68data GenBasicBlock i = BasicBlock BlockId [i] 
     69newtype ListGraph i = ListGraph [[GenBasicBlock i] 
     71type RawCmmTop = GenCmmmTop CmmStatic [CmmStatic] (ListGraph CmmStmt) 
     74That is, it consists of two types, static data and functions. Each can largely be handled separately. Just enough information is needed such that pointers can be constructed to them and in many cases this information can be gathered from assumptions and constraints on Cmm. 
     76The code generator lives in `llvmGen` with the driver being `llvmGen/LlvmCodeGen.lhs`. 
     78A large part of the code generation is keeping track of defined variables/functions and their type. An `LlvmEnv` construct is used for this. It is simply a dictionary storing function/variable names with their corresponding type information. This is used to create correct references/pointers between variables and functions. 
     80=== Unregistered Vs. Registered === 
     82Code generation can take place in two general modes, `unregistered` and `registered`. There are two major differences from a back-end code generation point of view. Firstly, in unregistered mode a optimisation features called {{{TABLES_NEXT_TO_CODE}}} is disabled. This means that the `h` field of `CmmProc` is empty. In registered mode it instead contains the `CmmStatic` data for the procedures info table which must be placed just before the procedure in the generated code so that both the info table and procedure can be accessed through one pointer. This optimisation can be disabled separately though in `registered` mode. 
     84The other major change is the use of pinned global registers. The `Cmm` language includes a concept called registers. These are used like machine registers or variables in C to store the result of expressions. Unlike `LLVM` they are mutable. `Cmm` includes two types of registers as you can see below:  
     87data CmmReg  
     88  = CmmLocal  LocalReg 
     89  | CmmGlobal GlobalReg 
     90  deriving( Eq, Ord ) 
     93A `LocalReg` is a temporary general purpose registered used in a procedure with scope of a single procedure. A `GlobalReg` on the other hand has global scope and a specific use. They are used just like machine registers, with a Stack Pointer and Heap Pointer registers creating a virtual machine (`STG`). `GlobalReg` is of the form: 
     96data GlobalReg 
     97  -- Argument and return registers 
     98  = VanillaReg                  -- pointers, unboxed ints and chars 
     99        {-# UNPACK #-} !Int     -- its number 
     100        VGcPtr 
     102  | FloatReg            -- single-precision floating-point registers 
     103        {-# UNPACK #-} !Int     -- its number 
     105  | DoubleReg           -- double-precision floating-point registers 
     106        {-# UNPACK #-} !Int     -- its number 
     108  | LongReg             -- long int registers (64-bit, really) 
     109        {-# UNPACK #-} !Int     -- its number 
     111  -- STG registers 
     112  | Sp                  -- Stack ptr; points to last occupied stack location. 
     113  | SpLim               -- Stack limit 
     114  | Hp                  -- Heap ptr; points to last occupied heap location. 
     115  | HpLim               -- Heap limit register 
     116  | CurrentTSO          -- pointer to current thread's TSO 
     117  | CurrentNursery      -- pointer to allocation area 
     118  | HpAlloc             -- allocation count for heap check failure 
     120                -- We keep the address of some commonly-called  
     121                -- functions in the register table, to keep code 
     122                -- size down: 
     123  | EagerBlackholeInfo  -- stg_EAGER_BLACKHOLE_info 
     124  | GCEnter1            -- stg_gc_enter_1 
     125  | GCFun               -- stg_gc_fun 
     127  -- Base offset for the register table, used for accessing registers 
     128  -- which do not have real registers assigned to them.  This register 
     129  -- will only appear after we have expanded GlobalReg into memory accesses 
     130  -- (where necessary) in the native code generator. 
     131  | BaseReg 
     133  -- Base Register for PIC (position-independent code) calculations 
     134  -- Only used inside the native code generator. It's exact meaning differs 
     135  -- from platform to platform (see module PositionIndependentCode). 
     136  | PicBaseReg 
     138  deriving( Show ) 
     141In unregistered mode these global registers are all just stored in memory in the heap. A specific pass operating on Cmm that takes place just before code generation thus transforms code such as: 
     144__stginit_ZCMain() { 
     145        { update_frame: <none> 
     146        } 
     147    cfF: 
     148        Sp = Sp + 4; 
     149        jump (I32[Sp - 4]) (); 
     153into the following unregistered form for code generation: 
     156__stginit_main::Main() { 
     157        { [] 
     158        } 
     159    crF: 
     160        I32[MainCapability+92] = I32[MainCapability+92] + 4; 
     161        jump (I32[I32[MainCapability+92] - 4]) (); 
     165Where `MainCapability` is a label to the start of a RTS defined structure storing all the global registers. 
     167In registered mode as many of these global registers are assigned permanently to fixed hardware registers. This is done as it greatly improves performance. As these registers are accessed very frequently needing to load and store to memory for accessing adds a great cost. So for example on `x86` the following map between `Cmm` global registers and `x86` hardware registers exists: 
     170Base -> %EBX 
     171Sp   -> %EBP 
     172Hp   -> %EDI 
     173R1   -> %ESI 
     176These are all the available `callee save` registers on x86. `callee save` are used as in ghc generated code now saving and restoring of these registers are needed due to there new special use and because GHC uses continuation passing style, so a `'ret'` statement is never actually generated. And since they are `callee save`, foreign code can also be called without any need to handle the `Cmm` registers. 
     178== `CmmData` == 
     180`CmmData` takes the following form: 
     183newtype CmmData = CmmData Section [CmmStatic] 
     185data CmmStatic 
     186  = CmmStaticLit      CmmLit  -- static value 
     187  | CmmUninitialised  Int     -- n bytes of uninitialised data 
     188  | CmmAlign          Int     -- align to next N byte boundary 
     189  | CmmDataLabel      CLabel  -- label current position in code 
     190  | CmmString         [Word8] -- string of 8 bit values 
     192data CmmLit 
     193  = CmmInt            Integer  Width    -- 2 compliments, truncated int 
     194  | CmmFloat          Rational Width    -- float 
     195  | CmmLabel          CLabel            -- &l1 
     196  | CmmLabelOff       CLabel Int        -- &l1 + offset 
     197  | CmmLabelDiffOff   CLabel CLabel Int -- &l1 - &l2 + offset 
     198  | CmmBlock          BlockId           -- address of code label 
     199  | CmmHighStackMark                    -- max stack space used during a procedure 
     202Code generation takes place mainly in {{{llvmGen/LlvmCodeGen/Data.hs}}}, driven by the main Llvm compiler driver, {{llvmGen/LlvmCodeGen.lhs}}}. 
     204The code generation for data occurs in two phases, firstly the types and all data is generated except for address values. Then the address values are resolved. This two step method is used as in the first pass, we don't know if a address refers to an external address or a procedure/data structure in the current LLVM module. We also need the type information in LLVM to create a pointer. 
     206=== 1st Pass : Generation === 
     208All `CmmStatic` is translated to LLVM structures. 
     210== `CmmStaticLit` == 
     212These are translated when possible as follows: 
     213  * `CmmInt` -> Reduced to Int and then an appropriate `LMInt` of correct size is created. As LLVM supports any bit size, this is very straight forward. 
     214  * `CmmFloat` -> Translated to a double, detecting NAN and INFINITY correctly. Then correct LLVM type (`float`, `double`, `float80`, `float128`) is selected. 
     215  * `CmmLabel` -> Left untranslated at first, later resolved once we have determined types. As pointers are cast to word size ints, we can still determine types. 
     216  * `CmmLabelOff` -> As above. 
     217  * `CmmLabelDiffOff` -> As above. 
     218  * `CmmBlock` -> `BlockId` is changed to a `CLabel` and then treated as a `CmmLabel` static type. 
     219  * `CmmHighStackMark` -> Panic occurs if this type is encountered. 
     221==== `CmmUninitialised` ==== 
     223For this, a zeroed array of `8bit` values is created of correct size. 
     225==== `CmmAlign` & `CmmDataLabel` ==== 
     227The LLVM back-end can't handle `CmmAlign` or `CmmDataLabel`. A panic occurs if either is encountered. A `CmmDataLabel` is expected at the very start of each list of `CmmStatic`. It is removed and used as the name for the structure and constant instance. 
     229==== `CmmString` ==== 
     231This is translated into a LLVM string. Ascii characters are used when they are printable, escaped hex values otherwise. A null termination is added. 
     233=== 2nd Pass : Resolution === 
     235After the first pass, all types have been determined and all data translated except for address values (CLabel's). All generated llvm data is added to a Map of string to `LlvmType`, string being the data structure name. All `CmmProc's` are added to the map as well, they don't need to be properly passed though, just their names retrieved as they have a constant type of void return and no parameters. 
     237Now appropriate pointers can be generated using the type information from the map and LLVM's `getelementptr` instruction. These are then all passed to int's to allow the types of structures to be determined in advance. If a pointer doesn't have a match in the Map, it is assumed to refer to an external (outside of this module) address. An external reference is declared for this address as: 
     240@label = external global [0 * i32] 
     243Where i32 is the pointer size. (i64 if on 64 bit). 
     245== `CmmProc` ==