Changes between Version 8 and Version 9 of Commentary/Compiler/Backends/LLVM


Timestamp: Feb 24, 2010 12:27:38 AM
Author: dterei
Comment: Check all information and fix up old email style.

David Terei wrote a new code generator for GHC which targets the LLVM compiler infrastructure. Most of the work was done as part of an honours thesis at the University of New South Wales under the supervision of Manuel Chakravarty. It is now at a stage where it is under consideration to be merged into GHC mainline.

The thesis paper, which offers a detailed performance evaluation as well as the motivation and design of the back-end, can be found at:

http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf

As the back-end isn't currently in GHC head, you need to follow the steps below to get it up and running.

= Patches =

 * GHC patch (applies to GHC): http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
 * LLVM patch (applies to LLVM): http://www.cse.unsw.edu.au/~davidt/downloads/llvm-ghc.patch

These are the patches you should be working with; they are 'stable'. The back-end also lives in a Git repository where the actual development work is done, which can be found at https://cgi.cse.unsw.edu.au/~davidt/cgit/cgit.cgi/Thesis%20GHC%20Dev/

= Installing =

Apply the darcs patch linked above to GHC head. This will make some changes across GHC, with the bulk of the new code ending up in 'compiler/llvmGen'.
     
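A minimal sketch of applying the bundle to a darcs checkout of GHC head might look like the following (file and directory names are illustrative, and the usual GHC build steps may differ for your setup):

{{{
# hypothetical sketch -- adjust names and paths to your checkout
cd ghc
gunzip ghc-llvmbackend-full.gz        # the download is a gzipped darcs patch bundle
darcs apply ghc-llvmbackend-full      # apply the patches to GHC head
./boot && ./configure && make         # then build GHC as usual
}}}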

At this time the LLVM code generator doesn't support the {{{TABLES_NEXT_TO_CODE}}} optimisation, due to limitations with LLVM.

You will also need LLVM installed on your computer to use the back-end. Version 2.6 or SVN trunk is supported. If you want to use the back-end in an unregistered GHC build, then you can use a vanilla build of LLVM. However, if you want to use a registered GHC build (very likely), then you need to patch LLVM using the LLVM patch provided above.

LLVM is very easy to build and install.
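
A minimal sketch of one way to do this, assuming the LLVM 2.6 release tarball and the LLVM patch linked above (exact configure options and the patch level may differ):

{{{
# hypothetical sketch -- LLVM 2.6 uses a configure/make build
tar -xzf llvm-2.6.tar.gz
cd llvm-2.6
patch -p0 < ../llvm-ghc.patch              # only needed for registered GHC builds
./configure --enable-optimized --prefix=$HOME/llvm
make && make install
}}}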
     
Just make sure this modified version of LLVM is on your path and takes precedence over any other builds.

= Using =

Once GHC is built, you can trigger GHC to use the LLVM back-end with the {{{-fllvm}}} flag. There is also a new {{{-ddump-llvm}}} flag which will dump out the LLVM IR code generated (it must be used in combination with the {{{-fllvm}}} flag; alternatively, use the {{{-keep-tmp-files}}} flag).

{{{ghc --info}}} should also now report that it includes the LLVM code generator.
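
For example, compiling a hypothetical module {{{Main.hs}}} through the LLVM back-end and dumping the generated IR might look like:

{{{
ghc -O2 -fllvm -ddump-llvm Main.hs   # compile via the LLVM back-end, dumping the LLVM IR
ghc --info                           # check that the LLVM code generator is listed
}}}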
     
= Supported Platforms & Correctness =

 * Linux x86-32/x86-64 are currently well supported. The back-end can pass the test suite and build a working version of GHC (bootstrap test).

 * Mac OS X 10.5 currently has a rather nasty bug with any dynamic library calls (all the libffi stuff) [for the curious, this is due to the stack not being 16-byte aligned when the calls are made, as required by the OS X ABI]. The test suite passes except for most of the FFI tests.

 * Other platforms haven't been tested at all. As using the back-end with a registered build of GHC requires a modified version of LLVM, people wanting to try it out on those platforms will need to either make the needed changes to LLVM themselves, or use an unregistered build of GHC, which will work with a vanilla install of LLVM. (The LLVM patch for x86 is linked above.)

= Performance =

(All benchmarking was done on Linux/x86-32.)

A quick summary of the results: for the 'nofib' benchmark suite, the LLVM code generator was 3.8% slower than the NCG (the C code generator was 6.9% slower than the NCG). The DPH project includes a benchmark suite which I (David Terei) also ran, and for this type of code the LLVM back-end shortened the runtime by an average of 25% compared to the NCG. Also, while not included in my thesis paper as I ran out of time, I did do some benchmarking with the 'nobench' benchmark suite. It gave performance ratios for the back-ends of around:
||'''Back-end''' ||'''Ratio''' ||
||NCG || 1.11||
||C || 1.05||
||LLVM || 1.14||

A nice demonstration of the improvements the LLVM back-end can bring to some code can be seen at http://donsbot.wordpress.com/2010/02/21/smoking-fast-haskell-code-using-ghcs-new-llvm-codegen/

= LLVM Back-end Design =

The initial design tries to fit into GHC's current pipeline stages as seamlessly as possible. This allows for quicker development and a focus on the core task of LLVM code generation.
     
  * New path for LLVM generation, separate from C and NCG (the path forks at compiler/main/CodeOutput.lhs, the same place where C and NCG fork).
  * LLVM code generation outputs LLVM assembly code.
  * The LLVM assembly code is translated to an object file as follows (see the sketch below):
     * First, an '!LlvmAs' phase generates LLVM bitcode from the LLVM assembly code (using the {{{llvm-as}}} tool).
     * The LLVM optimiser is then run, which is a series of bitcode-to-bitcode optimisation passes (using the {{{opt}}} tool).
     * Finally, an object file is created from the LLVM bitcode (using the {{{llc}}} tool).
  * This brings the LLVM path back to the other back-ends.
  * The final stage is the Link stage, which uses the system linker, as with the other back-ends.
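
As a rough sketch, the per-module tool sequence in this pipeline corresponds to something like the following (file names are hypothetical; GHC drives these tools itself when given {{{-fllvm}}}, and the exact flags it passes may differ):

{{{
llvm-as Main.ll -o Main.bc        # LlvmAs phase: LLVM assembly -> bitcode
opt -O2 Main.bc -o Main.opt.bc    # optimisation passes: bitcode -> bitcode
llc Main.opt.bc -o Main.s         # code generation: bitcode -> native assembly
gcc -c Main.s -o Main.o           # assemble into an object file
}}}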

This approach was the easiest and thus quickest way to initially implement the LLVM back-end. Now that it is working, there is some room for additional optimisations. A potential optimisation would be to add a new linker phase for LLVM: instead of each module just being compiled to native object code as soon as possible, it would be better to keep the modules in LLVM bitcode format and link them all together using the LLVM linker. This would enable all of LLVM's link-time optimisations. All of the user program's LLVM bitcode would then be compiled to a native object file and linked with the runtime using the native system linker.
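
A hypothetical sketch of what such a linker phase might do with the per-module bitcode, using LLVM's {{{llvm-link}}} tool (module names are illustrative):

{{{
llvm-link Main.bc Foo.bc Bar.bc -o program.bc   # link all modules at the bitcode level
opt -O2 program.bc -o program.opt.bc            # whole-program optimisation passes
llc program.opt.bc -o program.s                 # generate native assembly
gcc program.s -o program                        # final native link (RTS and libraries omitted here)
}}}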

= Implementation Issues =

== LLVM Changes ==

The biggest problem is that LLVM doesn't provide all the features we need. The two issues below, 'Register Pinning' and 'TNTC', are the primary examples of this. While there is a patch for LLVM that partially fixes this, the patch is a problem in itself, as we must now include our own version of LLVM with GHC. Eventually we need to either get the changes we need included in LLVM, or improve LLVM so that the features we require can be added dynamically.

== Register Pinning ==

The new back-end supports a custom calling convention which places the STG virtual registers into specific hardware registers. The approach taken by the C back-end and NCG, of having a fixed assignment of STG virtual registers to hardware registers for performance gains, is not implemented in the LLVM back-end. Instead, the custom calling convention provides something semantically equivalent to register pinning: it passes the first N variables in specific hardware registers, thus guaranteeing on every function entry that the STG virtual registers can be found in the expected hardware registers. This approach is believed to provide better performance than the register pinning used by the NCG/C back-ends, as it keeps the STG virtual registers mostly in hardware registers while allowing the register allocator more flexibility and access to all machine registers.

== TABLES_NEXT_TO_CODE ==

For heap objects, GHC places the info table (metadata) and the code adjacent to each other. That is, in memory, the object first has a header structure, which consists of a pointer to an info table, followed by a payload structure. The pointer points to the bottom of the info table, and the closure's code is placed straight after the info table, so to jump to the code we can just jump one past the info table pointer. The other way to do this would be to have the info table contain a pointer to the closure code; however, this would then require two jumps to get to the code instead of just one in the optimised layout. Achieving this layout can create some difficulty, and the current back-ends handle it as follows:

  * The NCG can create this layout itself.
  * The C code generator can't, so the [wiki:Commentary/EvilMangler Evil Mangler] rearranges the GCC assembly code to achieve the layout.

There is a build option in GHC to use the unoptimised layout and instead store a pointer to the code in the info table. This layout can be enabled/disabled using the {{{TABLES_NEXT_TO_CODE}}} compiler define. As LLVM has no means to achieve the optimised layout and we don't wish to write an LLVM sister for the Evil Mangler, the LLVM back-end currently uses the unoptimised layout. This apparently incurs a performance penalty of 5% (source: ''Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-order Languages'', Simon Marlow and Simon Peyton Jones, 2004).

== Shared Code with NCG ==
     
The LLVM IR is modelled in GHC using an algebraic data type to represent the first-order abstract syntax of the LLVM assembly code. The LLVM representation lives in the 'Llvm' subdirectory and also contains code for pretty printing. This is the same approach taken by [http://www.cs.uu.nl/wiki/Ehc/WebHome EHC]'s LLVM back-end, and we adapted the [https://subversion.cs.uu.nl/repos/project.UHC.pub/trunk/EHC/src/ehc/LLVM.cag module] developed by them for this purpose.

It is an open question as to whether this binding should be split out into its own cabal package. Please contact the GHC mailing list if you think you might be a user of such a package.

= Validate =

The GHC patch has been validated to make sure it won't break anything. This is just compiling and running GHC normally, but with the LLVM back-end code included. It doesn't actually test the LLVM code generator; it just makes sure the NCG and C code generators haven't been broken.
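
Validation here refers to GHC's usual {{{validate}}} script; a minimal sketch of running it from the root of the GHC tree (assuming the standard script location) is:

{{{
./validate   # build GHC and run the test suite in validate mode
}}}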

'''Linux/x86-32:'''
     
All of the test failures fail for me with an unmodified GHC head build as well as when the LLVM patch is included, so the LLVM patch isn't introducing any new failures.