= SIMDPlan =
Version 19 (changed from version 18 on Oct 13, 2011 9:04:45 PM by pmonday)

Based on that design, the high-level tasks that must be accomplished include the following:
 1. Modify autoconf to determine SSE availability and vector size
 1. Add new PrimOps to allow Haskell to make use of vectors
 1. Add new MachOps to Cmm to communicate use of vectors

Introduction of SIMD support to GHC will occur in stages, to demonstrate that the entire “vertical” stack is functional:
 1. Introduce “Float” PrimOps (as necessary to run an example showing SIMD usage in the LLVM)
 1. Add appropriate Cmm support for the Float primtype / primop subset
 1. Modify the LLVM Code Generator to support the Double vectorization
 1. Demonstrate the PrimOps and do limited performance testing to ensure SIMD is functional

These clearly won't be all of my questions; a substantial amount of work passes through the entire GHC compiler stack before it reaches the LLVM instructions.

== Modify autoconf ==
Determining whether a particular hardware architecture has SIMD instructions, which version of those instructions is available (SSE, SSE2, SSE3, SSE4, or a revision of one of these), and consequently which vector sizes are supported happens during the configuration step, on the architecture the build will run on. This is the same step in which the sizes of Ints, alignment constants, and other values critical to GHC are calculated.

Working backward from the results to the place where the changes are introduced: the current alignment and primitive sizes are available in ./includes/ghcautoconf.h. Here is a sample:
{{{
...
/* The size of `char', as computed by sizeof. */
#define SIZEOF_CHAR 1

/* The size of `double', as computed by sizeof. */
#define SIZEOF_DOUBLE 8
...
}}}

These values come from the mk/config.h* files, which are generated from configure.ac by autoheader and configure. It should be possible to modify configure.ac (or a related file) to determine whether SSE is available and, consequently, the vector size that can be operated on and (later) how many pieces of data can be operated on in parallel, which depends on the operation. MMX used 64-bit registers, while SSE and all later versions use 128-bit XMM registers. A 128-bit register holds four 32-bit values (or two 64-bit values), so a single instruction can operate on four 32-bit values at once.
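The register arithmetic above can be sketched in Haskell. This is a standalone illustration, not GHC code: `Width` reuses the constructor names of GHC's Cmm `Width` type but is redefined here, and `widthInBits`, `mmxBits`, `xmmBits` and `lanes` are hypothetical helpers.

```haskell
-- Standalone sketch of the lane-count arithmetic; not GHC code.
-- Width copies the constructor names of Cmm's Width but is defined locally.
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

-- Size in bits of one element of the given width.
widthInBits :: Width -> Int
widthInBits W8  = 8
widthInBits W16 = 16
widthInBits W32 = 32
widthInBits W64 = 64

-- Register sizes discussed above: 64-bit MMX registers, 128-bit XMM registers (SSE and later).
mmxBits, xmmBits :: Int
mmxBits = 64
xmmBits = 128

-- Number of elements (lanes) of width w that fit in a register of regBits bits.
-- This only counts capacity: whether an instruction exists for that element type
-- depends on the SSE version (for example, packed doubles arrived with SSE2).
lanes :: Int -> Width -> Int
lanes regBits w = regBits `div` widthInBits w
```

For example, `lanes xmmBits W32` is 4 and `lanes xmmBits W64` is 2, matching the four 32-bit values per 128-bit register described above.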

An example of configure.ac modifications that detect SSE availability can be found on the web. The core of the check is as follows (xmmintrin.h declares the SSE intrinsics):
{{{
AC_MSG_CHECKING([for SSE in current arch/CFLAGS])
dnl Try to compile and link a small program that uses an SSE intrinsic.
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[
#include <xmmintrin.h>
__m128 testfunc(float *a, float *b) {
  return _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
}
]])],
[
has_sse=yes
],
[
has_sse=no
]
)
AC_MSG_RESULT([$has_sse])
}}}

More detailed explanations of how to use cpuid to determine the supported SSE instruction sets are also available on the web. cpuid may be more appropriate, but it is also much more complex. Details for using cpuid are available at [http://software.intel.com/en-us/articles/using-cpuid-to-detect-the-presence-of-sse-41-and-sse-42-instruction-sets/].
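With cpuid, the feature test itself comes down to reading bits. The sketch below only decodes the flags and assumes the EDX and ECX values returned by cpuid leaf 1 have already been obtained; executing cpuid needs C or inline assembly (for example a small C helper called through the FFI). The bit positions are the ones Intel documents for leaf 1; `SseLevel`, `sseFeatures` and `highestSse` are hypothetical names.

```haskell
import Data.Bits (testBit)
import Data.Word (Word32)

-- Hypothetical sketch: decode SSE support from the registers returned by CPUID leaf 1.
-- Executing CPUID itself needs C or inline assembly (e.g. a small C helper called via
-- the FFI); here the EDX and ECX values are simply passed in as arguments.
data SseLevel = NoSse | Sse | Sse2 | Sse3 | Ssse3 | Sse41 | Sse42
  deriving (Eq, Ord, Show)

-- Feature flags reported by CPUID leaf 1, in order of introduction.
-- Bit positions follow Intel's documentation for that leaf.
sseFeatures :: Word32 -> Word32 -> [SseLevel]
sseFeatures edx ecx = [lvl | (lvl, present) <- flags, present]
  where
    flags =
      [ (Sse,   testBit edx 25)
      , (Sse2,  testBit edx 26)
      , (Sse3,  testBit ecx 0)
      , (Ssse3, testBit ecx 9)
      , (Sse41, testBit ecx 19)
      , (Sse42, testBit ecx 20)
      ]

-- Newest SSE level whose flag is set, or NoSse when none is set.
highestSse :: Word32 -> Word32 -> SseLevel
highestSse edx ecx = maximum (NoSse : sseFeatures edx ecx)
```

Taking the maximum of the reported flags gives the newest SSE level the build machine supports; note that this describes the CPU, whereas the autoconf link test above checks what the C compiler accepts.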

Note that, since the overall goal is to let the LLVM generate the assembly code that performs vectorization, vectorization can only be supported up to the SSE version that the LLVM supports.

== Add new MachOps to Cmm code ==
It may make more sense to add the MachOps to Cmm before implementing the PrimOps (or at least before adding the code to the CgPrimOp.hs file). The [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType#AdditionsinCmm Cmm Wiki Page] is a useful aid when defining the new Cmm operations.

Modify compiler/cmm/CmmType.hs to add the required vector types. Here is a basic outline of what needs to be done:
{{{
data CmmType    -- The important one!
  = CmmType CmmCat Width

type Multiplicity = Int

data CmmCat     -- "Category" (not exported)
   = GcPtrCat                 -- GC pointer
   | BitsCat                  -- Non-pointer
   | FloatCat                 -- Float
   | VBitsCat  Multiplicity   -- Vector of non-pointers
   | VFloatCat Multiplicity   -- Vector of floats
   deriving( Eq )
        -- See Note [Signed vs unsigned] at the end
}}}
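The outline leaves open whether `Width` describes one element or the whole vector. The standalone model below assumes it is the element width and that the multiplicity is the lane count. It redefines `Width` locally, and `vecFloat`, `isVecType` and `cmmTypeBits` are hypothetical helpers, not part of GHC's CmmType module.

```haskell
-- Standalone model of the proposed vector categories; not GHC's CmmType module.
-- Width is redefined locally; vecFloat, isVecType and cmmTypeBits are hypothetical.
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

widthInBits :: Width -> Int
widthInBits W8  = 8
widthInBits W16 = 16
widthInBits W32 = 32
widthInBits W64 = 64

type Multiplicity = Int

data CmmCat
  = GcPtrCat                 -- GC pointer
  | BitsCat                  -- non-pointer
  | FloatCat                 -- float
  | VBitsCat  Multiplicity   -- vector of non-pointer values
  | VFloatCat Multiplicity   -- vector of floats
  deriving (Eq, Show)

-- Assumption: Width is the width of ONE element; the multiplicity is the lane count.
data CmmType = CmmType CmmCat Width
  deriving (Eq, Show)

-- Build a floating-point vector type, e.g. vecFloat 4 W32 for four 32-bit floats.
vecFloat :: Multiplicity -> Width -> CmmType
vecFloat n w = CmmType (VFloatCat n) w

isVecType :: CmmType -> Bool
isVecType (CmmType (VBitsCat _) _)  = True
isVecType (CmmType (VFloatCat _) _) = True
isVecType _                         = False

-- Total size in bits: element width times lane count (1 for scalar categories).
cmmTypeBits :: CmmType -> Int
cmmTypeBits (CmmType cat w) = lanesOf cat * widthInBits w
  where
    lanesOf (VBitsCat n)  = n
    lanesOf (VFloatCat n) = n
    lanesOf _             = 1
```

Under this assumption `vecFloat 4 W32` describes a 128-bit value, exactly one SSE register. Storing the total width instead would make the size query trivial but would require a division to recover the element size.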

Modify compiler/cmm/CmmMachOp.hs to add the MachOps that the new SIMD PrimOps will use. Here is an outline that adds vector counterparts of the integer arithmetic MachOps and a SIMD version of the MO_F_Add MachOp:
{{{
  -- Integer SIMD arithmetic (Width = element width, Int = number of elements)
  | MO_V_Add  Width Int
  | MO_V_Sub  Width Int
  | MO_V_Neg  Width Int         -- unary -
  | MO_V_Mul  Width Int
  | MO_V_Quot Width Int

  -- Floating-point SIMD arithmetic
  | MO_VF_Add Width Int   -- MO_VF_Add W64 4   Add 4-vector of 64-bit floats
  ...
}}}
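For the LLVM code generator, each vector MachOp eventually has to become an LLVM instruction on an LLVM vector type such as `<4 x float>`. The sketch below is hypothetical: the MachOp subset is redefined locally, and `intElem`, `floatElem`, `vecType` and `llvmVecOp` are illustrative helpers, not GHC's LLVM back end. The instructions it emits (`add`, `sub`, `mul`, `fadd` on vector operands) are standard LLVM IR.

```haskell
-- Hypothetical sketch: rendering the vector MachOps outlined above as LLVM IR.
-- Not GHC's LLVM back end; only the emitted LLVM syntax is standard.
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

data MachOp
  = MO_V_Add  Width Int   -- integer vector add: element width, lane count
  | MO_V_Sub  Width Int
  | MO_V_Mul  Width Int
  | MO_VF_Add Width Int   -- floating-point vector add
  deriving (Eq, Show)

-- LLVM element type for integer lanes.
intElem :: Width -> String
intElem W8  = "i8"
intElem W16 = "i16"
intElem W32 = "i32"
intElem W64 = "i64"

-- LLVM element type for floating-point lanes; only 32- and 64-bit floats exist here.
floatElem :: Width -> Maybe String
floatElem W32 = Just "float"
floatElem W64 = Just "double"
floatElem _   = Nothing

-- LLVM vector type syntax, e.g. vecType 4 "double" == "<4 x double>".
vecType :: Int -> String -> String
vecType n elt = "<" ++ show n ++ " x " ++ elt ++ ">"

-- Right-hand side of the LLVM instruction for a binary vector MachOp,
-- or Nothing when the element type has no LLVM equivalent in this sketch.
llvmVecOp :: MachOp -> String -> String -> Maybe String
llvmVecOp op a b = case op of
  MO_V_Add  w n -> Just (render "add" (vecType n (intElem w)))
  MO_V_Sub  w n -> Just (render "sub" (vecType n (intElem w)))
  MO_V_Mul  w n -> Just (render "mul" (vecType n (intElem w)))
  MO_VF_Add w n -> fmap (render "fadd" . vecType n) (floatElem w)
  where
    render instr ty = instr ++ " " ++ ty ++ " " ++ a ++ ", " ++ b
```

For example, `llvmVecOp (MO_VF_Add W64 4) "%x" "%y"` yields `Just "fadd <4 x double> %x, %y"`. A `<4 x double>` is 256 bits, wider than one 128-bit XMM register; LLVM's type legalization generally splits such operations into register-sized pieces, so a MachOp's multiplicity need not match the hardware register exactly.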

Some existing Cmm instructions may be reusable, but additional instructions will be needed for the vectorization primitives. Adding separate instructions also keeps SIMD and non-SIMD code generation apart until the SIMD path is working.

== Add new PrimOps ==

}}}

== Modify LLVM Code Generator ==
