Changes between Version 18 and Version 19 of SIMDPlan


Timestamp: Oct 13, 2011 9:04:45 PM
Author: pmonday
Comment: --

Based on that design, the high-level tasks that must be accomplished include the following:
 1. Modify autoconf to determine SSE availability and vector size
 1. Add new PrimOps to allow Haskell to make use of vectors
 1. Add new MachOps to Cmm to communicate the use of vectors
     

Introduction of SIMD support to GHC will occur in stages, to demonstrate that the entire “vertical” stack is functional:
 1. Introduce “Double” PrimOps (as necessary to run an example showing SIMD usage in LLVM)
 1. Add appropriate Cmm support for the Double primtype / primop subset
 1. Introduce “Float” PrimOps (as necessary to run an example showing SIMD usage in LLVM)
 1. Add appropriate Cmm support for the Float primtype / primop subset
 1. Modify the LLVM code generator to support Double vectorization
 1. Demonstrate the PrimOps and do limited performance testing to ensure SIMD is functional
     
These clearly won't be all of the questions I have; a substantial amount of work goes through the entire GHC compiler stack before reaching the LLVM instructions.

== Modify autoconf ==
Determining whether a particular hardware architecture has SIMD instructions, which version of those instructions is available (SSE, SSE2, SSE3, SSE4, or an iteration of one of those), and consequently which vector sizes are supported happens during the configuration step, on the architecture where the build will occur. This is the same step in which the sizes of Ints, alignment constants, and other values critical to GHC are calculated.

Working backwards from the results to the place where the changes are introduced: the current alignment and primitive sizes are available in ./includes/ghcautoconf.h. Here is a sample:
{{{
...
/* The size of `char', as computed by sizeof. */
#define SIZEOF_CHAR 1

/* The size of `double', as computed by sizeof. */
#define SIZEOF_DOUBLE 8
...
}}}
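Each of these entries is the value of `sizeof` for the compiler used at configure time. As a hedged sketch in C (C11, for `_Static_assert`), the checks below confirm the sample values against the compiler and show where a vector-size entry could sit beside them. `SIZEOF_M128` is a hypothetical name, not something GHC's configure is known to emit, and the SSE part is only compiled when GCC or Clang define `__SSE__`.

{{{#!c
/* Values copied from the ghcautoconf.h sample above. */
#define SIZEOF_CHAR 1
#define SIZEOF_DOUBLE 8

/* Fail the build if the configured sizes disagree with this compiler. */
_Static_assert(SIZEOF_CHAR == sizeof(char), "SIZEOF_CHAR mismatch");
_Static_assert(SIZEOF_DOUBLE == sizeof(double), "SIZEOF_DOUBLE mismatch");

#if defined(__SSE__)
#include <xmmintrin.h>
/* Hypothetical entry for the 128-bit SSE vector type (16 bytes). */
#define SIZEOF_M128 16
_Static_assert(SIZEOF_M128 == sizeof(__m128), "SIZEOF_M128 mismatch");
#endif
}}}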

These are constructed from the mk/config.h* files, which are generated via configure.ac and autoheader. configure.ac (or a related file) can be extended to determine whether SSE is available and, consequently, the vector size that can be operated on and (later) how many data elements can be processed in parallel, which depends on the operation. The older MMX extension used 64-bit registers; SSE introduced 128-bit XMM registers, which SSE2 and all later versions also use (SSE itself only operates on packed 32-bit floats; SSE2 added packed 64-bit doubles and integers). A 128-bit register holds four 32-bit values, so any 32-bit type can have four data elements computed in a single instruction, and any 64-bit type two.
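The lane arithmetic in the paragraph above can be written down directly. A minimal sketch in C, assuming a fixed register width (128 bits for SSE XMM registers, 64 bits for MMX); the function and macro names are illustrative, not part of GHC.

{{{#!c
/* Register widths in bits: MMX registers are 64 bits, SSE XMM registers 128. */
#define MMX_REGISTER_BITS 64u
#define XMM_REGISTER_BITS 128u

/* Number of elements of element_bits bits that fit in one register of
   register_bits bits, e.g. 4 x 32-bit or 2 x 64-bit values per XMM
   register. Returns 0 if the element width does not divide the register. */
static unsigned lanes_per_register(unsigned register_bits, unsigned element_bits)
{
    if (element_bits == 0 || register_bits % element_bits != 0)
        return 0;
    return register_bits / element_bits;
}
}}}

For example, `lanes_per_register(XMM_REGISTER_BITS, 32)` gives 4 and `lanes_per_register(XMM_REGISTER_BITS, 64)` gives 2.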

An example of configure.ac modifications that detect SSE availability is available on the web; the primary body of the check is as follows (xmmintrin.h declares the SSE intrinsics):
{{{
AC_MSG_CHECKING(for SSE in current arch/CFLAGS)
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[
#include <xmmintrin.h>
__m128 testfunc(float *a, float *b) {
  return _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
}
]])],
[
has_sse=yes
],
[
has_sse=no
]
)
AC_MSG_RESULT($has_sse)
}}}
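The configure probe only has to link. For trying a compiler and CFLAGS combination by hand, the same intrinsics can be exercised in a small C file. This sketch assumes GCC or Clang, which define `__SSE__` when SSE code generation is enabled, and falls back to a scalar loop otherwise; it also adds an unaligned store (`_mm_storeu_ps`) so the result can be inspected. The name `add4` is illustrative.

{{{#!c
/* GCC and Clang define __SSE__ when SSE code generation is enabled
   (the default on x86-64). */
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Adds four floats from a and b into out. Returns 1 if the SSE
   intrinsics from the configure probe were used, 0 for the scalar
   fallback. Pointers need not be 16-byte aligned (loadu/storeu). */
static int add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return 1;
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
    return 0;
#endif
}
}}}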
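
More detailed explanations of how to use cpuid to determine the supported SSE instruction sets are also available on the web. cpuid may be more appropriate, but it is also much more complex. The two checks answer different questions: the link test above shows what the compiler accepts with the current CFLAGS, while cpuid reports what the processor running the check actually supports. Details for using cpuid are available at [http://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets/].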
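A minimal sketch of the cpuid approach, assuming GCC or Clang on x86 (both ship a `<cpuid.h>` header providing `__get_cpuid`). The bit positions are those of CPUID leaf 1 as documented by Intel; the struct and function names are illustrative. On other compilers or architectures the sketch simply reports no SSE support.

{{{#!c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* One flag per instruction set; each is 0 or 1. */
struct sse_support {
    int sse, sse2, sse3, ssse3, sse41, sse42;
};

/* Queries CPUID leaf 1. SSE and SSE2 are EDX bits 25 and 26; SSE3,
   SSSE3, SSE4.1 and SSE4.2 are ECX bits 0, 9, 19 and 20. Without
   <cpuid.h> (non-x86 or non-GCC/Clang) every flag stays 0. */
static struct sse_support detect_sse(void)
{
    struct sse_support s = {0, 0, 0, 0, 0, 0};
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 1 is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        s.sse   = (edx >> 25) & 1;
        s.sse2  = (edx >> 26) & 1;
        s.sse3  = (ecx >> 0) & 1;
        s.ssse3 = (ecx >> 9) & 1;
        s.sse41 = (ecx >> 19) & 1;
        s.sse42 = (ecx >> 20) & 1;
    }
#endif
    return s;
}
}}}

A check like this describes the processor it runs on, so at configure time it reports the build machine, which matches the plan of detecting SSE on the architecture where the build occurs; it would not be suitable when cross-compiling.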

It should be noted that, since the overall goal is to let LLVM handle the assembly code that does the actual vectorization, vectorization can only be supported up to the SSE version that LLVM supports.

== Add new MachOps to Cmm code ==
It may make more sense to add the MachOps to Cmm before implementing the PrimOps (or at least before adding the code to the CgPrimOp.hs file). A useful [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType#AdditionsinCmm Cmm Wiki Page] is available to aid in defining the new Cmm operations.

Modify compiler/cmm/CmmType.hs to add the new vector types that are required. Here is a basic outline of what needs to be done:
{{{
data CmmType    -- The important one!
  = CmmType CmmCat Width

type Multiplicity = Int      -- number of elements in a vector

data CmmCat     -- "Category" (not exported)
   = GcPtrCat                 -- GC pointer
   | BitsCat                  -- Non-pointer
   | FloatCat                 -- Float
   | VBitsCat  Multiplicity   -- Vector of non-pointers
   | VFloatCat Multiplicity   -- Vector of floats
   deriving( Eq )
        -- See Note [Signed vs unsigned] at the end
}}}
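The Multiplicity field determines how wide a vector value is. As an illustration only, the C model below (hypothetical names, not GHC code) mirrors the CmmCat categories and computes a type's total size as element width times multiplicity, treating the scalar categories as multiplicity 1.

{{{#!c
#include <stdbool.h>

/* Hypothetical C model of the CmmCat categories above (not GHC code). */
enum cmm_cat { GC_PTR_CAT, BITS_CAT, FLOAT_CAT, VBITS_CAT, VFLOAT_CAT };

struct cmm_type {
    enum cmm_cat cat;
    unsigned width;         /* element width in bits, e.g. 32 for W32 */
    unsigned multiplicity;  /* element count; only read for vector categories */
};

/* Total size in bits; scalar categories count as a single element. */
static unsigned cmm_type_bits(struct cmm_type t)
{
    bool is_vector = t.cat == VBITS_CAT || t.cat == VFLOAT_CAT;
    return t.width * (is_vector ? t.multiplicity : 1u);
}

/* Whether the whole value fits in one register of reg_bits bits,
   e.g. 128 for an SSE XMM register. */
static bool fits_in_register(struct cmm_type t, unsigned reg_bits)
{
    return cmm_type_bits(t) <= reg_bits;
}
}}}

For example, a 4-vector of 32-bit floats is exactly 128 bits, while a 4-vector of 64-bit floats is 256 bits, twice the width of an SSE register.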

Modify compiler/cmm/CmmMachOp.hs to add the MachOps that the SIMD PrimOp changes will use. Here is an example of adding SIMD versions of the integer arithmetic MachOps and of the MO_F_Add MachOp:
{{{
  -- Integer SIMD arithmetic (Width = element width, Int = number of elements)
  | MO_V_Add  Width Int
  | MO_V_Sub  Width Int
  | MO_V_Neg  Width Int         -- unary -
  | MO_V_Mul  Width Int
  | MO_V_Quot Width Int

  -- Floating point SIMD arithmetic
  | MO_VF_Add Width Int   -- MO_VF_Add W64 4: add 4-vector of 64-bit floats
  ...
}}}
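Because the goal is to hand vector operations to LLVM, each new MachOp will eventually need an LLVM counterpart. A rough sketch in C of that mapping, using LLVM IR's vector type syntax (`<4 x float>`, `<2 x double>`, `<4 x i32>`) and its `add`, `sub`, `mul`, `sdiv` and `fadd` instructions. The enum mirrors the MachOps above but is hypothetical; MO_V_Quot is assumed to be a signed quotient, and MO_V_Neg is left out because LLVM has no dedicated integer negation instruction (it is usually expressed as a subtraction from zero).

{{{#!c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C mirror of the binary vector MachOps sketched above. */
enum vec_machop { MO_V_ADD, MO_V_SUB, MO_V_MUL, MO_V_QUOT, MO_VF_ADD };

/* LLVM IR opcode for each MachOp. sdiv rounds toward zero, matching
   Haskell's quot, assuming MO_V_Quot is a signed quotient. */
static const char *llvm_opcode(enum vec_machop op)
{
    switch (op) {
    case MO_V_ADD:  return "add";
    case MO_V_SUB:  return "sub";
    case MO_V_MUL:  return "mul";
    case MO_V_QUOT: return "sdiv";
    case MO_VF_ADD: return "fadd";
    }
    return NULL;
}

/* Writes the LLVM vector type for `lanes` elements of `width` bits into
   buf, e.g. "<4 x float>", "<2 x double>" or "<4 x i32>". Returns the
   length written, or -1 for an unsupported float width or a buffer that
   is too small. */
static int llvm_vector_type(char *buf, size_t size, int is_float,
                            unsigned width, unsigned lanes)
{
    int n;
    if (is_float) {
        const char *elem = width == 32 ? "float"
                         : width == 64 ? "double" : NULL;
        if (elem == NULL)
            return -1;
        n = snprintf(buf, size, "<%u x %s>", lanes, elem);
    } else {
        n = snprintf(buf, size, "<%u x i%u>", lanes, width);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
}}}

With this mapping, `MO_VF_Add W32 4` would correspond to an `fadd` on `<4 x float>` operands.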

Some existing Cmm instructions may be reusable, but additional instructions will have to be added to account for vectorization primitives. Adding new instructions will also help keep SIMD and non-SIMD code generation separate for the time being, until we have it working.

== Add new PrimOps ==
     
}}}

== Modify LLVM Code Generator ==
