|Version 4 (modified by dterei, 6 years ago) (diff)|
LLVM Back-end Design
This document aims to describe the current proposed design for a new back-end for GHC that will generate bitcode. This will then be compiled to actual machine code using the LLVM tools. This description is relevant for GHC developers and attempts to address how the back-end will intergrate with the rest of GHC.
This work is being undertaken by myself (David Terei), an undergraduate honours student at The University of New South Wales, under the supervision of Manuel Chakravarty.
The aim is to contribute a new back-end to GHC which outperforms the existing native code generator (NCG) and C code generator back-ends. The use of LLVM will also hopefully allow for some new optimisations to be implemented that are currently very difficult with the existing back-ends.
The new back-end will at first be more of a proof-of-concept due to the significant time restraints with my thesis. I hope though to continue working on it as time permits after the thesis is due and as much as I can resist spending the entire summer at the beach surfing.
The initial design will try to fit into GHC's current pipeline stages as seamlessly as possible. This should allow for quicker development and focus on the core task of LLVM code generation which is required no matter the overall design.
The LLVM pipeline will works as follows:
- New path for LLVM generation, sepperate from C and NCG. (path forks at compiler/main/CodeOutput.lhs, same place where C and NCG fork).
- LLVM code generation will output LLVM bitcode.
- Following this, instead of an 'As' phase as will C and NCG (which generates object file from assembler), will have an equivalent 'LlvmAs' phase which generates an object file from LLVM bitcode. The LLVM path won't provide an option of running the splitter or mangler.
- This should bring the LLVM path back to the other back-ends, the next stage is the Link stage.
Here is a diagram of the pipeline:
; Cmm -> (codeOutput) --->(ncg) Assembler -->(mangler, splitter) --> ('As' phase) -----> Object Code --> (link) --> executable \---> LLVM BitCode --> ('LlvmAs' phase) -/
This should be the easiest and thus quickest to initially implement. Ideally though once this approach is working well I would like to add a new linker phase for LLVM. Instead of each module just being compiled to native object code ASAP, it would be better to keep them in the LLVM bitcode format and link all the modules together using the LLVM linker. This enable all of LLVM's link time optimisations. All the user program LLVM bitcode will then be compiled to a native object file and linked with the runtime using the native system linker.
The new back-end will at first only operate in GHC's unregistered mode due to the lack of support in LLVM for 'pinning' virtual registers to actual machine registers. The current approach taken by the C back-end and NCG of having a fixed assignment of STG virtual registers to hardware registers for performance gains is impossible to implement in LLVM. Once the back-end is working for unregistered code, I will attempt to improve on the performance by using a custom calling convention to support something semantically equivalent to register pinning. The custom calling convention will pass the first N variables in specific hardware registers, thus guaranteeing on all function entries that the STG virtual registers can be found in the expected hardware registers. This approach is hoped to provide better performance than the register pinning used by NCG/C back-ends as it keeps the STG virtual registers mostly in hardware registers but allows the register allocator more flexibility and access to all machine registers.
GHC for heap objects places the info table (meta data) and the code adjacent to each other. That is, in memory, the object firstly has a head structure, which consists of a pointer to an info table and a payload structure. The pointer points to the bottom of the info table and the closures code is placed to be straight after the info table, so to jump to the code we can just jump one past the info table pointer. The other way to do this would be to have the info table contain a pointer to the closure code. However this would then require two jumps to get to the code instead of just one jump in the optimised layout. Achieving this layout can create some difficulty, the current back-ends handle it as follows:
- The NCG can create this layout itself
- The C code generator can't. So the Evil Mangler rearranges the GCC assembly code to achieve the layout.
There is a build option in GHC to use the unoptimised layout and instead use a pointer to the code in the info table. This layout can be enabled/disabled by using the compiler def TABLES_NEXT_TO_CODE. As LLVM has no means to achieve the optimised layout and we don't wish to write an LLVM sister for the Evil Mangler, I will be using the unoptimised layout. This apparenlty incurs a performance penalty of 5% (source, Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-order Languages, Simon Marlow and Simon Peyton Jones, 2004).
Shared Code with NCG
It is probable that some of the code needed by the LLVM back-end is already implemented for the NCG back-end. Some examples of this code would be the following two functions in compiler/main/AsmCodeGen.lhs:
- Changes assignments to global registers to instead assign to the RegTable, used for non-pinned virtual registers.
- Optimises the cmm code, in particular it changes loads from global registers to instead load from the RegTable.
Until the amount of shared code can be determined and the LLVM back-end approaches a level of maturity that is suitable for inclusion in GHC HEAD, this is probably a mute issue.
LLVM IR Representation
Still undecided on best way to represent and emit the LLVM IR. There are 3 main options as outlined in the LLVM FAQ:
- Call into LLVM Libraries using FFI (can probably use Haskell LLVM Bindings)
- Emit LLVM Assembly (approach taken by EHC's LLVM Back-end, can use the module developed by them for this)
- Emit LLVM Bitcode (can't see any reason to do this)
This is something I am currently investigating. I think at the moment that using both EHC's LLVM module and the LLVM Bindings would be best. My main concern with the LLVM bindings is that I'm not sure how well it could handle representing the LLVM IR during translation from CmmStmts. This is, I need a representation of the LLVM IR to work with during translation from CmmStmts that can be easily built up and changed (an abstract syntax form of LLVM IR). However instead of pretty printing it to LLVM Assembly, it would be pretty printed to LLVM Bitcode using the LLVM Bindings.
As I may end up only using a small part of the LLVM Bindings then it will need to be decided eventually if its worth including them with GHC or extracting just the parts needed for inclusion.