Changes between Version 4 and Version 5 of SIMDPlan


Timestamp: Oct 10, 2011 4:05:21 PM
Author: pmonday

These clearly won't be all of the questions I have; a substantial amount of work goes through the entire GHC compiler stack before reaching the LLVM instructions.

 * How does one create one of the new vector types in a Haskell program (directly via a PrimOp, for testing: let x = ????)? See the example section below.
 * One discussion point was that the "vector lengths" should be set to 1 for non-LLVM code generation. Where does this happen? On my first survey of the code, the code generators seem to be partitioned from the main body of the compiler. This implies that each code generator will have to be modified to account for the new Cmm MachOps and translate them properly into non-vectorized instructions.
     
Simple usage of the new instructions to add two vectors of doubles:

'''Question:''' How does one create one of the new PrimOp types, so that it can be tested before the vector add operations are? This needs some investigation. The code should basically create a vector and then call insertDoubleVec# repeatedly to populate it. Until the subsequent steps are done, this will have to be done "by hand", without any additional operations defined. Manuel's response expands on this: ''I am not quite sure what the best approach is. The intention in LLVM is clearly to populate vectors using the 'insertIntVec#' etc functions. However, in LLVM you can just use an uninitialised register and insert elements into a vector successively. We could provide a vector "0" value in Haskell and insert into that. Any other ideas?''
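
The "vector 0 value plus successive inserts" approach can be modeled at the Haskell level as follows. This is a minimal sketch under stated assumptions. DoubleVec, zeroDoubleVec, insertDoubleVec, packDoubleVec and plusDoubleVec are hypothetical, list-backed stand-ins for the proposed unboxed PrimOp types and insertDoubleVec#. They only illustrate the population order and are not GHC primitives.

{{{
#!haskell
import Data.List (foldl')

-- Hypothetical, list-backed stand-in for a SIMD vector of Doubles.
-- The proposed PrimOp types would be unboxed; this only models their behavior.
newtype DoubleVec = DoubleVec [Double]
  deriving (Eq, Show)

-- The vector "0" value suggested in the thread: every lane starts at 0.
zeroDoubleVec :: Int -> DoubleVec
zeroDoubleVec width = DoubleVec (replicate width 0)

-- Model of insertDoubleVec#: replace lane i and leave the other lanes alone.
insertDoubleVec :: DoubleVec -> Int -> Double -> DoubleVec
insertDoubleVec (DoubleVec lanes) i x
  | i < 0 || i >= length lanes = error "insertDoubleVec: lane out of range"
  | otherwise = DoubleVec (before ++ x : drop 1 after)
  where
    (before, after) = splitAt i lanes

-- Populate a vector by inserting into the zero vector one lane at a time.
-- Elements beyond the width are ignored; lanes never inserted stay 0.
packDoubleVec :: Int -> [Double] -> DoubleVec
packDoubleVec width xs =
  foldl' (\v (i, x) -> insertDoubleVec v i x) (zeroDoubleVec width) (zip [0 .. width - 1] xs)

-- Lane-wise addition, the operation the plan wants to test.
plusDoubleVec :: DoubleVec -> DoubleVec -> DoubleVec
plusDoubleVec (DoubleVec a) (DoubleVec b) = DoubleVec (zipWith (+) a b)
}}}

For example, packDoubleVec 4 [1, 2, 3] yields DoubleVec [1.0,2.0,3.0,0.0]. The lane that is never inserted keeps the zero value.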
{{{
        let x = ????
     
(Note that over time, and over several generations of the integration, one would hope that the latter path would be “optimized” into SIMD instructions.)

== Modify Vector Libraries and Vector Compiler Optimization (Pragmas and Such) ==
Once we have shown and quantified a speed-up for the lower portions of the compiler, the upper half of the stack should be optimized to take advantage of the vectorization code that was added to the PrimOps and Cmm. This is handled in two primary locations. The first is the compiler code (compiler/vectorise) that vectorizes modules after the desugaring pass; it handles the VECTORISE pragmas as well as implicit vectorization of code. The second location that requires modification is the Vector library itself.

 1. Modify the Vector library (/libraries/vector/Data) to make use of the new PrimOps where possible, and adjust the VECTORISE pragmas if necessary
  * Modify the existing Vector code
  * We will likely also need vector versions of the array read/write/indexing operations to process Haskell arrays with vector operations (this may need to go into compiler/vectorise)
  * Use /libraries/vector/benchmarks to test the updated code, looking for:
    * slowdowns: vector operations that cannot benefit from SIMD should not slow down
    * speed-ups: all performance tests that map the common operators (+, -, *, etc.) over vectors should benefit from SIMD
 1. Modify the compiler/vectorise code to adjust the pragmas and the post-desugar vectorization. These modifications may not be needed on the first pass through the code; more evaluation is necessary.
  * /compiler/vectorise/Vectorise.hs
  * /compiler/vectorise/Vectorise/Env.hs
  * /compiler/vectorise/Vectorise/Type/Env.hs
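
Using the new PrimOps in the Vector library means restructuring element-wise loops. Full chunks of lanes go through a vector operation, and any leftover elements go through the scalar operation. The sketch below shows that loop shape over plain lists, as an illustration only. zipWithStrided and addChunk are hypothetical names, the lane width is an assumed parameter, and the chunk function stands in for a vector PrimOp such as an add. A real implementation would work on unboxed arrays rather than lists.

{{{
#!haskell
-- Hypothetical sketch: split the inputs into chunks of `width` lanes,
-- send each full chunk through `vecOp` (standing in for a SIMD PrimOp),
-- and finish any leftover elements with the scalar operation.
zipWithStrided
  :: Int                                  -- lane width, assumed >= 1
  -> ([Double] -> [Double] -> [Double])   -- "vector" op on one full chunk
  -> (Double -> Double -> Double)         -- scalar op for the remainder
  -> [Double]
  -> [Double]
  -> [Double]
zipWithStrided width vecOp scalarOp
  | width < 1 = error "zipWithStrided: width must be at least 1"
  | otherwise = go
  where
    go xs ys
      | length chunkX == width && length chunkY == width =
          vecOp chunkX chunkY ++ go restX restY
      | otherwise = zipWith scalarOp xs ys
      where
        (chunkX, restX) = splitAt width xs
        (chunkY, restY) = splitAt width ys

-- Stand-in for a lane-wise vector add on one chunk.
addChunk :: [Double] -> [Double] -> [Double]
addChunk = zipWith (+)
}}}

With a width of 4, ten elements are processed as two vector chunks plus a two-element scalar remainder. The result is the same as a plain zipWith.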

Once the benchmarks show measurable, reproducible behavior, move on to the DPH libraries. Note that the benchmarks in /libraries/vector/benchmarks need closer inspection to ensure that they reflect code that will be optimized by SIMD instructions. If they do not, add code that demonstrates the SIMD speed-up.
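
"Measurable, reproducible" suggests comparing repeated timings rather than single runs. Here is a minimal sketch of such a comparison, assuming several timings in the same unit for each build. It takes the median of each set, computes baseline / SIMD, and reports a speed-up or slowdown only outside a noise band. The names, the choice of median, and the tolerance value are all assumptions, not something the vector benchmarks define.

{{{
#!haskell
import Data.List (sort)

data Verdict = Speedup | Slowdown | NoChange
  deriving (Eq, Show)

-- Median of repeated timings; less sensitive to one-off outliers than the mean.
median :: [Double] -> Double
median [] = error "median: no samples"
median xs
  | odd n = sorted !! half
  | otherwise = (sorted !! (half - 1) + sorted !! half) / 2
  where
    sorted = sort xs
    n = length xs
    half = n `div` 2

-- Compare baseline timings with timings from the SIMD build.
-- A ratio above 1 means the SIMD build is faster; `tolerance` is the noise band.
classify :: Double -> [Double] -> [Double] -> Verdict
classify tolerance baselineRuns simdRuns
  | ratio > 1 + tolerance = Speedup
  | ratio < 1 - tolerance = Slowdown
  | otherwise = NoChange
  where
    ratio = median baselineRuns / median simdRuns
}}}

For example, classify 0.05 [10, 11, 10] [5, 5, 6] gives Speedup (median ratio 10 / 5 = 2). Timings within about 5% of each other give NoChange.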

== Modify DPH Libraries ==
The DPH libraries depend heavily on the previous vectorization step (modifying the Vector libraries, the compiler's vectorization options, and the post-desugar vectorization). The DPH steps should not be undertaken until the previous steps have shown significant performance improvements.

 1. The primary changes for DPH are in /libraries/dph/dph-common/Data/Array/Parallel/Lifted/*
 1. The VECTORISE SCALAR pragma is also heavily used in /libraries/dph/dph-common/Data/Array/Parallel/Prelude; these files (Double.hs, Float.hs, Int.hs, Word8.hs) should also be inspected for updates
 1. Modify pragmas as necessary based on the changes made above

'''Note to Self:''' Determine whether the VECTORISE pragmas need adjustment or enhancement based on the previous steps.

== Ensure Remaining Code Generators Function Properly ==
There are really two options for the remaining code generators:
 * Modify each code generator to understand the new Cmm instructions and translate them back into non-vectorized instructions
 * Add a compiler pre-pass that replaces all "length = 1" vectors, and the operations on them, with the corresponding scalar types and operations

The latter makes sense because it works for all code generators, including the LLVM code generator. Vectors of length 1 should not be put through SIMD instructions in the first place, since they incur substantial overhead for no return.
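
The pre-pass can be pictured as a bottom-up rewrite over expressions. Every vector of length 1, and every operation on one, becomes the corresponding scalar form, while longer vectors pass through untouched. The sketch below uses a tiny hypothetical expression type in place of GHC's real Cmm and MachOp types; Expr, Op, scalarise and evalExpr are illustrative names only. evalExpr is included to check that the rewrite does not change results.

{{{
#!haskell
-- Hypothetical, much-simplified stand-in for Cmm expressions and MachOps.
data Op = Add | Sub | Mul
  deriving (Eq, Show)

data Expr
  = ScalarLit Double
  | ScalarOp Op Expr Expr
  | VecLit [Double]          -- vector literal; its length is the list length
  | VecOp Int Op Expr Expr   -- lane-wise vector op tagged with its length
  deriving (Eq, Show)

-- Replace every length-1 vector literal and vector operation by its scalar
-- form, recursing into operands. Assumes well-formed input, i.e. the operands
-- of a VecOp n are themselves n lanes wide.
scalarise :: Expr -> Expr
scalarise (VecLit [x]) = ScalarLit x
scalarise (VecOp 1 op a b) = ScalarOp op (scalarise a) (scalarise b)
scalarise (VecOp n op a b) = VecOp n op (scalarise a) (scalarise b)
scalarise (ScalarOp op a b) = ScalarOp op (scalarise a) (scalarise b)
scalarise e = e

-- Reference semantics (a scalar is a one-lane result), used to check that
-- scalarise preserves meaning.
evalExpr :: Expr -> [Double]
evalExpr (ScalarLit x) = [x]
evalExpr (VecLit xs) = xs
evalExpr (ScalarOp op a b) = zipWith (applyOp op) (evalExpr a) (evalExpr b)
evalExpr (VecOp _ op a b) = zipWith (applyOp op) (evalExpr a) (evalExpr b)

applyOp :: Op -> Double -> Double -> Double
applyOp Add = (+)
applyOp Sub = (-)
applyOp Mul = (*)
}}}

The rewrite only touches length-1 vectors, so it could run unconditionally in front of every code generator, LLVM included.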

To make this work, a GHC compiler flag must be added that forces all vector lengths to 1 (this will be required in conjunction with any non-LLVM code generator). A user can also use this option to turn off SIMD optimization with LLVM.

 * Add the GHC compiler option: --vector-length=1
 * Modify compiler/vectorise to recognize the new option, or add this compiler pass, as appropriate
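
The rule from the discussion thread can be written as a small function. The vector length is 1 whenever a non-LLVM backend is selected; otherwise it is the target width, unless the user overrides it. This is a sketch only. --vector-length is the flag this plan proposes, not an existing GHC option. Backend, nativeVecLen (assumed here to be 2 lanes) and the other names are hypothetical and do not reflect how GHC's command-line handling is actually structured.

{{{
#!haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical backend selection; GHC's real representation differs.
data Backend = LLVM | NativeCodeGen
  deriving (Eq, Show)

-- Assumed native lane count for LLVM (e.g. two Doubles per 128-bit register).
nativeVecLen :: Int
nativeVecLen = 2

-- Look for the proposed "--vector-length=N" option; the last one wins.
parseVectorLength :: [String] -> Maybe Int
parseVectorLength args =
  case [v | arg <- args, Just v <- [stripPrefix "--vector-length=" arg]] of
    [] -> Nothing
    vs -> case reads (last vs) of
      [(n, "")] | n >= 1 -> Just n
      _ -> error ("invalid --vector-length value: " ++ last vs)

-- Non-LLVM backends always get length 1, so their vector code can be reduced
-- to scalar code; LLVM uses the flag if given, else the native width.
effectiveVecLen :: Backend -> [String] -> Int
effectiveVecLen NativeCodeGen _ = 1
effectiveVecLen LLVM args = fromMaybe nativeVecLen (parseVectorLength args)
}}}

Passing --vector-length=1 with the LLVM backend gives the "turn off SIMD" behavior described above.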

== Reference Discussion Threads ==
{{{
From: Manuel Chakravarty
Q: Should the existing pure Vector libraries (/libraries/vector/Data/*) be modified to use the vectorized code as a first priority, wait until DPH (/libraries/dph/) is modified, or leave the Vector library as is?

A: The DPH libraries ('dph-*') are based on the 'vector' library; i.e., for DPH to use SIMD instructions, we must modify 'vector' first.

Q: How does one create one of the new Vector Types in a Haskell program (direct PrimOp, for testing ... let x = ????)

A: I am not quite sure what the best approach is. The intention in LLVM is clearly to populate vectors using the 'insertIntVec#' etc functions. However, in LLVM you can just use an uninitialised register and insert elements into a vector successively. We could provide a vector "0" value in Haskell and insert into that. Any other ideas?

A: I just realised that we need vector versions of the array read/write/indexing operations as well to process Haskell arrays with vector operations.

Q: One discussion point was that the "Vector Lengths" should be "Set to 1" for non LLVM code generation, where does this happen? On my first survey of the code, it seems that the code generators are partitioned from the main body of code, implying that each of the code generators will have to be modified to account for the new Cmm MachOps and properly translate them to non-vectorized instructions.

A: Instead of doing the translation for every native code generator separately, we could have a pre-pass that replaces all length = 1 vectors and operations on them by the corresponding scalar type and operation.  Then, the actual native code generators wouldn't need to be changed.

A: The setting of the vector length to 1 needs to happen depending on the command line options passed to GHC; i.e., if a non-LLVM backend is selected.

Q: Can we reuse any of the existing MachOps when adding to Cmm?

A: I am not sure.
}}}