Changes between Version 4 and Version 5 of Commentary/Compiler/Backends/LLVM/Design

Jun 11, 2010 12:05:28 PM (5 years ago)

add info on cmmproc handling


  • Commentary/Compiler/Backends/LLVM/Design

    v4 v5  
    7272        [d]          -- Data
     74data BlockId = BlockId Unique
    7475data GenBasicBlock i = BasicBlock BlockId [i]
     76type CmmBasicBlock = GenBasicBlock CmmStmt
    7578newtype ListGraph i = ListGraph [GenBasicBlock i]
    7780type RawCmmTop = GenCmmTop CmmStatic [CmmStatic] (ListGraph CmmStmt)
     81-- new type RawCmm = Cmm [RawCmmTop] : A list version of RawCmmTop. The actual code is different, but it is effectively this.
    8084That is, it consists of two types, static data and functions. Each can largely be handled separately. Just enough information is needed such that pointers can be constructed to them and in many cases this information can be gathered from assumptions and constraints on Cmm.
     86After all the polymorphic types are bound we get this:
     88RawCmm = [
     89    CmmProc [CmmStatic] CLabel [LocalReg] [BlockId [CmmStmt]]
     90  | CmmData Section [CmmStatic]
     93data Section = Text | Data | ReadOnlyData | RelocatableReadOnlyData | UninitialisedData | ReadOnlyData16 | OtherSection String
    8296The code generator lives in `llvmGen` with the driver being `llvmGen/LlvmCodeGen.lhs`.
    95109  | CmmGlobal GlobalReg
    96110  deriving( Eq, Ord )
     112data LocalReg = LocalReg Unique CmmType
    204220  | CmmBlock          BlockId           -- address of code label
    205221  | CmmHighStackMark                    -- max stack space used during a procedure
     223data Width = W8 | W16 | W32 | W64 | W80 | W128
    251269== !CmmProc ==
    253 TODO
     271A Cmm procedure is made up of a list of basic blocks, with each basic block consisting of a list of !CmmStmt’s.
     273Code generation takes place mainly in {{{llvmGen/LlvmCodeGen/CodeGen.hs}}}, driven by the main LLVM compiler driver, {{{llvmGen/LlvmCodeGen.lhs}}}.
     275While Cmm procedures include a specification for arguments and a return type, only one type is in fact used: a procedure which takes no arguments and returns void. The reason is that the STG registers are used instead for passing arguments and returning results. Another detail of the Cmm code produced by GHC is that
     276it doesn’t contain any return statements. Instead, a style of code called continuation passing is used, in which control is passed explicitly in the form of a continuation, and all Cmm procedures produced by GHC are terminated by tail calls.
     278Below is the Haskell definition for Cmm statements and expressions.
     281data CmmStmt
     282  = CmmNop                         
     283  | CmmComment    FastString
     284  | CmmAssign     CmmReg CmmExpr
     285  | CmmStore      CmmExpr CmmExpr
     286  | CmmCall       CmmCallTarget HintedCmmFormals HintedCmmActuals CmmSafety CmmReturnInfo
     287  | CmmBranch     BlockId
     288  | CmmCondBranch CmmExpr BlockId
     289  | CmmSwitch     CmmExpr [Maybe BlockId]
     290  | CmmJump       CmmExpr HintedCmmActuals
     292data CmmExpr
     293  = CmmLit       CmmLit
     294  | CmmLoad      CmmExpr CmmType
     295  | CmmReg       CmmReg
     296  | CmmMachOp    MachOp [CmmExpr]
     297  | CmmStackSlot Area Int
     298  | CmmRegOff    CmmReg Int
     300type CmmFormals = [CmmFormal]
     301type CmmFormal  = LocalReg
     304=== !CmmExpr ===
     305!CmmExpr’s are handled in a relatively straight-forward manner. The most interesting aspect of their compilation to LLVM is the return type of functions in the LLVM back-end which
     306compile !CmmExpr’s. This gives an idea of the compilation process, as while each expression must be handled differently, they all return the same type when compiled to LLVM code by
     307the back-end.
     310-- Return type of LLVM functions that compile CmmExpr's
     311type ExprData = (LlvmEnv , LlvmVar , LlvmStatements , [LlvmCmmTop] )
     314  * '''!LlvmEnv''': During code generation for an expression, an external Cmm label may be encountered for the first time. An external reference for it will be created and returned as part of the [!LlvmCmmTop] list. It is also added to the current environment.
     315  * '''!LlvmVar''': All expressions share the property that their execution results in a single value which can be stored in a variable. This LLVM local variable holds the result of the !CmmExpr. This allows statements to use and access the result of an expression very easily.
     316  * '''!LlvmStatements''': A !CmmExpr may require several LLVM statements to implement. They are returned in this list and must be executed before the !LlvmVar is accessed.
     317  * '''[!LlvmCmmTop]''': An externally declared Cmm Label can be encountered at any point as Cmm requires no external declaration. LLVM though requires that these labels do have an external declaration and in this list such declarations are returned. They add new global variables to the LLVM module.
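The way these four components fit together can be sketched with a toy compiler. This is only an illustration under stated assumptions: the type synonyms below are simplified stand-ins (plain strings and lists) for the back-end's real types, {{{Expr}}} is a hypothetical three-constructor language rather than the real !CmmExpr, and the explicit {{{Int}}} counter for fresh variable names is an assumption of this sketch. The emitted text uses the typed-pointer LLVM syntax of the LLVM 2.x era.

```haskell
-- Hypothetical, simplified stand-ins for the back-end's real types.
type LlvmEnv        = [String]  -- external labels already declared
type LlvmVar        = String    -- name of the LLVM local holding the result
type LlvmStatements = [String]  -- LLVM statements, as text
type LlvmCmmTop     = String    -- a top-level LLVM declaration, as text
type ExprData = (LlvmEnv, LlvmVar, LlvmStatements, [LlvmCmmTop])

-- A toy expression language (not the real CmmExpr).
data Expr = Lit Int | Label String | Add Expr Expr

-- Compile an expression. The Int is a counter used to make fresh names.
genExpr :: LlvmEnv -> Int -> Expr -> (Int, ExprData)
genExpr env n (Lit i) =
  let v = "%t" ++ show n
  in (n + 1, (env, v, [v ++ " = add i64 0, " ++ show i], []))
genExpr env n (Label l)
  -- Label seen before: no new declaration, environment unchanged.
  | l `elem` env = (n + 1, (env, v, [load], []))
  -- First encounter: emit an external declaration and record the label.
  | otherwise    = (n + 1, (l : env, v, [load], [decl]))
  where
    v    = "%t" ++ show n
    load = v ++ " = load i64* @" ++ l
    decl = "@" ++ l ++ " = external global i64"
genExpr env n (Add a b) =
  let (n1, (env1, va, sa, ta)) = genExpr env  n  a
      (n2, (env2, vb, sb, tb)) = genExpr env1 n1 b
      v = "%t" ++ show n2
      stmt = v ++ " = add i64 " ++ va ++ ", " ++ vb
  in (n2 + 1, (env2, v, sa ++ sb ++ [stmt], ta ++ tb))
```

Compiling {{{Add (Label "x") (Label "x")}}} from an empty environment yields one external declaration for {{{x}}}, not two, because the second sub-expression is compiled in the environment returned by the first.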
     319=== !CmmStmt ===
     320Statements are also handled in a fairly straight-forward manner. The process involved can be detailed most simply by studying the return type of functions in the LLVM back-end which deal with compiling !CmmStmt’s. Statements, just like expressions, all return the same basic type when compiled to LLVM code by the back-end. This type is shown below.
     323type StmtData = (LlvmEnv , [ LlvmStatement ] , [LlvmCmmTop ] )
     326  * '''!LlvmEnv''': As compiling a Cmm statement usually involves also compiling a Cmm expression, this LLVM environment serves the same purpose of returning an updated environment if new external Cmm labels have been encountered. This first case updates the environment's global map, as a new global variable has been created. In the case of a !CmmStore statement though, a Cmm local register may be encountered for the first time. It will be allocated on the stack and added to the local map of the environment.
     327  * '''[!LlvmStatement]''': A !CmmStmt is compiled to a list of LLVM statements.
     328  * '''[!LlvmCmmTop]''': Serves the same purpose as it does for Cmm expression code generation.
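A minimal sketch of how a list of statements might be compiled by threading the environment from one statement to the next. The types are simplified stand-ins for the real ones, {{{Stmt}}} is a hypothetical one-constructor statement type (not the real !CmmStmt), and {{{genStmt}}} and {{{genBlock}}} are names invented for this illustration. Only the local map of the environment is modelled.

```haskell
-- Hypothetical, simplified stand-ins for the back-end's real types.
type LlvmEnv       = [String]  -- local registers already allocated
type LlvmStatement = String    -- an LLVM statement, as text
type LlvmCmmTop    = String    -- a top-level LLVM declaration, as text
type StmtData = (LlvmEnv, [LlvmStatement], [LlvmCmmTop])

-- A toy statement: write a literal into a named local register.
data Stmt = Assign String Int

-- Compile one statement. A register seen for the first time is
-- allocated on the stack and added to the environment.
genStmt :: LlvmEnv -> Stmt -> StmtData
genStmt env (Assign r i)
  | r `elem` env = (env, [store], [])
  | otherwise    = (r : env, [alloc, store], [])
  where
    alloc = "%" ++ r ++ " = alloca i64"
    store = "store i64 " ++ show i ++ ", i64* %" ++ r

-- Compile a basic block: each statement is compiled in the
-- environment returned by the previous one, and results are appended.
genBlock :: LlvmEnv -> [Stmt] -> StmtData
genBlock env0 = foldl step (env0, [], [])
  where
    step (env, ss, ts) s =
      let (env', ss', ts') = genStmt env s
      in (env', ss ++ ss', ts ++ ts')
```

With this sketch, {{{genBlock [] [Assign "x" 1, Assign "x" 2]}}} produces a single {{{alloca}}} followed by two stores, since the second statement sees {{{x}}} in the environment returned by the first.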
     330=== Handling LLVM's SSA Form ===
     331One of the main differences between Cmm and LLVM Assembly is the requirement that LLVM Assembly be in static single assignment (SSA) form. Thankfully, this is actually quite easy to handle. LLVM allows data to be explicitly allocated on the stack, using its alloca instruction. This instruction provides an alternative to producing code in SSA form. If a mutable variable is needed, it is allocated on the stack with alloca. The value returned from this instruction is a pointer to the stack memory, and this memory location can be read from and written to just like any other memory location in LLVM, using the load and store instructions respectively. While this initially allocates all these variables on the stack and doesn’t use any registers, LLVM includes an optimisation pass called mem2reg which is designed to correct this, changing explicit stack allocation into SSA form, which can use machine registers when compiled to native code. This approach to handling LLVM’s SSA form is in fact the method that the LLVM developers themselves recommend.
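As a rough illustration, the following Haskell helpers render the LLVM text for a mutable variable handled this way. The helper names and the {{{%tN}}} naming scheme are assumptions of this sketch, not the back-end's actual code, and the emitted text uses the typed-pointer syntax of the LLVM 2.x era.

```haskell
-- Allocate a mutable variable of the given LLVM type on the stack.
allocaVar :: String -> String -> String
allocaVar name ty = "%" ++ name ++ " = alloca " ++ ty

-- Emit "name = name + 1" for a stack-allocated variable. The Int is
-- the index of the first fresh temporary to use.
incrementVar :: String -> String -> Int -> [String]
incrementVar name ty n =
  [ tmp1 ++ " = load " ++ ty ++ "* %" ++ name            -- read current value
  , tmp2 ++ " = add " ++ ty ++ " " ++ tmp1 ++ ", 1"      -- compute new value
  , "store " ++ ty ++ " " ++ tmp2 ++ ", " ++ ty ++ "* %" ++ name  -- write back
  ]
  where
    tmp1 = "%t" ++ show n
    tmp2 = "%t" ++ show (n + 1)
```

Every temporary is assigned exactly once, so the output is valid SSA even though the variable itself is updated. The mutation happens only through memory, which is what mem2reg later turns back into register values.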
     333=== Handling Registerised Code ===
     334Handling registerised Cmm code involves handling the pinning of the STG virtual registers and the TABLES_NEXT_TO_CODE optimisation.
     336To handle the TABLES_NEXT_TO_CODE optimisation, the LLVM back-end simply disables it. This can be done independently of enabling or disabling registerised mode as a whole. This is done by putting the following in your
     338GhcEnableTablesNextToCode = NO
     341To handle the pinning of the STG registers, the LLVM back-end uses a custom calling convention that passes the first n arguments
     342of a function call in the specific registers that the STG registers should be pinned to. Then, whenever there is a function call, the LLVM back-end generates a call with the correct STG
     343virtual registers as the first n arguments to that call. Why does this work? It guarantees that on entry to any function, the STG registers are stored in the correct hardware registers. It also guarantees this on function exit, since all Cmm functions that GHC generates are exited by tail calls. In the function itself, the STG registers can be treated just like normal variables, read and written to at will.
     345The new calling convention was included by the LLVM developers in LLVM 2.7. It uses calling convention number 10. At the moment it supports x86-32/64.
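The shape of the generated code can be sketched as follows. The helper names are invented for this illustration, and the register list is a hypothetical placeholder: the real set and order of pinned registers is fixed by the calling convention and is not reproduced here. The text {{{cc 10}}} is LLVM's syntax for a numbered calling convention.

```haskell
import Data.List (intercalate)

-- Hypothetical placeholder for the STG registers passed as the first
-- n arguments. The real list and order are defined by the convention.
stgRegs :: [String]
stgRegs = ["Base", "Sp", "Hp", "R1"]

-- Header of a Cmm procedure: every STG register arrives as an argument.
defineProc :: String -> String
defineProc f = "define cc 10 void @" ++ f ++ "(" ++ args ++ ")"
  where args = intercalate ", " ["i64 %" ++ r ++ "_Arg" | r <- stgRegs]

-- Exit of a Cmm procedure: a tail call passing the current values of
-- the STG registers in the same argument positions.
tailCall :: String -> [String] -> String
tailCall f vals = "tail call cc 10 void @" ++ f ++ "(" ++ args ++ ")"
  where args = intercalate ", " ["i64 " ++ v | v <- vals]
```

Because both the definition and every tail call use the same convention and argument order, the values land in the same hardware registers at each function boundary.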
     347== After Code Generation ==
     349After code generation there are three more stages, though they are simply calls to the LLVM tools:
     351  * '''LLVM Assembler''': This is a very simple stage in which the human readable text version of LLVM assembly code is translated to the binary bitcode format. This is done by simply invoking the LLVM llvm-as tool on the stage input file.
     352  * '''LLVM Optimisation''': In this stage a range of LLVM’s optimisations are applied to the bitcode file, resulting in a new optimised bitcode file. This is done by simply invoking the LLVM opt tool on the stage input file. The optimisations are selected using the standard optimisation groups of ’-O1’, ’-O2’, ’-O3’ provided by opt, depending on the level of optimisation requested by the user when they invoked GHC.
     353  * '''LLVM Compiler''': This is the final stage in which the input LLVM bitcode file is compiled to native assembly for the target machine. This is done by simply invoking the LLVM llc tool on the stage input file.
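The three stages can be summarised as a list of command lines. This is a sketch only: the {{{OptLevel}}} type, the {{{pipeline}}} function and the intermediate file names are assumptions made for illustration, not what GHC's driver actually uses. The only tool options relied on are {{{-o}}} for the output file and the {{{-O1}}}, {{{-O2}}}, {{{-O3}}} groups of opt mentioned above.

```haskell
-- Optimisation level requested by the user (hypothetical type).
data OptLevel = O1 | O2 | O3 deriving Show

-- A command is a program name plus its arguments.
type Command = (String, [String])

-- Build the three post-code-generation commands for a module whose
-- files share the given base name. File names are illustrative.
pipeline :: OptLevel -> FilePath -> [Command]
pipeline lvl base =
  [ ("llvm-as", [ll, "-o", bc])                       -- text to bitcode
  , ("opt",     [bc, "-" ++ show lvl, "-o", optBc])   -- optimise bitcode
  , ("llc",     [optBc, "-o", asm])                   -- bitcode to native assembly
  ]
  where
    ll    = base ++ ".ll"
    bc    = base ++ ".bc"
    optBc = base ++ ".opt.bc"
    asm   = base ++ ".s"
```

Each stage consumes the output file of the previous one, so the list can be run in order, for example with {{{System.Process}}}.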