Changes between Version 11 and Version 12 of SIMD


Timestamp:
Nov 10, 2011 10:57:43 PM (4 years ago)
Author:
duncan
Comment:

--

  • SIMD

    v11 v12  
    1313We are interested in the SIMD vector instructions on current and future generations of CPUs. This includes SSE and AVX on x86/x86-64 and NEON on ARM chips (targets like GPUs or FPGAs are out of scope for this project). These SIMD vector instruction sets are broadly similar in the sense of having relatively short vector registers and operations for various sizes of integer and/or floating point operation. In the details however they have different capabilities and different vector register sizes. 
    1414 
    15 We therefore want a design for SIMD support in GHC that will let us efficiently exploit current vector instructions but a design that is not tied too tightly to one CPU architecture or generation. In particular, it should be possible to write portable Haskell programs that use SIMD vectors. This implies that we need fallbacks for cases where certain types or operations are not supported directly in hardware. 
     15We therefore want a design for SIMD support in GHC that will let us efficiently exploit current vector instructions but a design that is not tied too tightly to one CPU architecture or generation. In particular, it should be possible to write portable Haskell programs that use SIMD vectors. 
    1616 
    1717On the other hand, we want to be able to write programs for maximum efficiency that exploit the native vector sizes, preferably while remaining portable. For example, algorithms on large variable length vectors are in principle agnostic about the size of the primitive vector operations. 
     
    4848 * Some strategy for making use of vector primops, e.g. DPH or Vector lib 
    4949 
     50=== Vector types === 
     51 
     52We intend to provide vectors of the following basic types: 
     53 
     54 || Int8  || Int16  || Int32  || Int64  || 
     55 || Word8 || Word16 || Word32 || Word64 || 
     56 ||       ||        || Float  || Double || 
     57 
    5058=== Fixed and variable sized vectors === 
    5159 
    52 The hardware supports only small fixed sized vectors. High level libraries would like to be able to use arbitrary sized vectors. Similar to the design in GCC and LLVM we provide primitive Haskell types and operations for fixed-size vectors. The task of implementing variable sized vectors in terms of fixed-size vector types and primops is left to the next layer up (DPH, vector lib). 
    53  
    54 That is, in the core primop layer and down, vector support is only for fixed-size vectors. The fixed sizes will be only powers of 2 and only up to some maximum size. The choice of maximum size should reflect the largest vector size supported by the current range of CPUs (e.g. 256bit with AVX). 
    55  
    56 === Fallbacks === 
    57  
    58 The portabilty strategy relies on fallbacks so that we can implement large vectors on machines with only small vector registers, or no vector support at all (either none at all, or none for that type, e.g. only support for integer vectors not floating point, or only 32bit floats not doubles). 
     60The hardware supports only small fixed-size vectors. High-level libraries would like to be able to use arbitrarily sized vectors. Similar to the design in GCC and LLVM, we will provide primitive Haskell types and operations for fixed-size vectors. The task of implementing variable-sized vectors in terms of fixed-size vector types and primops is left to the next layer up (DPH, vector lib). 
     61 
     62That is, in the core primop layer and down, vector support is only for fixed-size vectors. The fixed sizes will be only powers of 2 and only up to some maximum size. The choice of maximum size should reflect the largest vector size supported by the current range of CPUs (256bit with AVX): 
     63 
     64 || types ||        ||        || vector sizes    || 
     65 || Int8  || Word8  ||        || 2, 4, 8, 16, 32 || 
     66 || Int16 || Word16 ||        || 2, 4, 8, 16     || 
     67 || Int32 || Word32 || Float  || 2, 4, 8         || 
     68 || Int64 || Word64 || Double || 2, 4            || 
     69 
     70We could choose to support larger fixed sizes, or the same maximum size for all types, but there is no strict need to do so. 
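
As a rough illustration of how the sizes in the table follow from the 256-bit maximum, here is a minimal Haskell sketch. The names `maxVectorBits` and `vectorSizes` are hypothetical helpers for this page only, not part of GHC.

```haskell
-- Hypothetical helpers, not part of GHC: derive the vector sizes in the
-- table above from a maximum vector width in bits.

-- | Largest vector width considered (256 bits, as with AVX).
maxVectorBits :: Int
maxVectorBits = 256

-- | Multiplicities (powers of two, starting at 2) for an element of the
-- given width in bits, such that the whole vector fits in maxVectorBits.
vectorSizes :: Int -> [Int]
vectorSizes elemBits =
  takeWhile (\m -> m * elemBits <= maxVectorBits) (iterate (* 2) 2)
```

For example, `vectorSizes 8` gives `[2,4,8,16,32]` and `vectorSizes 64` gives `[2,4]`, matching the rows for `Int8` and `Int64`/`Double`.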
     71 
     72=== Portability and fallbacks === 
     73 
     74To enable portable Haskell code we will provide the same set of vector types and operations on all architectures. Again this follows the approach taken by GCC and LLVM. 
     75 
     76We will rely on fallbacks for the cases where certain types or operations are not supported directly in hardware. In particular we can implement large vectors on machines with only small vector registers. Where there is no vector hardware support at all for a type (e.g. an architecture with no vector instructions, or 64-bit doubles on ARM's NEON) we can implement it using scalar code. 
    5977 
    6078The obvious approach is a transformation to synthesize larger vector types and operations using smaller vector operations or scalar operations. This synthesis could plausibly be done at the core, Cmm or code generator layers, however the most natural choice would be as a Cmm -> Cmm transformation. This approach would reduce or eliminate the burden on code generators by allowing them to support only their architecture's native vector sizes and types, or none at all. 
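
The idea of the synthesis can be sketched at the Haskell level. This is only a model under stated assumptions: vectors are represented as lists of lanes, and `nativeAdd`, `chunksOf` and `wideAdd` are hypothetical names; the real transformation would operate on Cmm, not on Haskell lists.

```haskell
-- Model only: a vector is a list of lanes. All names here are
-- hypothetical and do not correspond to anything in GHC.
type Vec a = [a]

-- | Stand-in for a hardware add that works only on vectors of exactly
-- n lanes (the native vector size).
nativeAdd :: Num a => Int -> Vec a -> Vec a -> Vec a
nativeAdd n xs ys
  | length xs == n && length ys == n = zipWith (+) xs ys
  | otherwise = error "nativeAdd: wrong vector length"

-- | Split a vector into consecutive chunks of n lanes.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- | Synthesize an add on a large vector from native adds of n lanes.
-- With n <= 1 (no vector support for this type) fall back to scalar adds.
wideAdd :: Num a => Int -> Vec a -> Vec a -> Vec a
wideAdd n xs ys
  | n <= 1    = zipWith (+) xs ys
  | otherwise = concat (zipWith (nativeAdd n) (chunksOf n xs) (chunksOf n ys))
```

In this model an 8-lane add on a machine with 4-lane registers becomes two native adds, and the result is the same as the scalar fallback.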
     
    6280Using fallbacks does pose some challenges for a stable/portable ABI, in particular how vector registers should be used in the GHC calling convention. This is discussed in a later section. 
    6381 
    64 === Code generators === 
    65  
    66 We would not extend the portable C backend to emit vector instructions. It would rely on the higher layers transforming vector operations into scalar operations. The portable C backend is not ABI compatible with the other code generators so there is no concern about vector registers in the calling convention. 
    67  
    68 The LLVM C library supports vector types and instructions directly. The GHC LLVM backend could be extended to translate vector ops at the CMM level into LLVM vector ops. 
    69  
    70 The NCG (native code generator) may need at least minimal support for vector types if vector registers are to be used in the calling convention. This would be necessary if ABI compatibility is to be preserved with the LLVM backend. It is optional whether vector instructions are used to improve performance. 
     82== Code generators == 
     83 
     84We will not extend the portable C backend to emit vector instructions. It will rely on the higher layers transforming vector operations into scalar operations. The portable C backend is not ABI compatible with the other code generators so there is no concern about vector registers in the calling convention. 
     85 
     86The LLVM C library supports vector types and instructions directly. The GHC LLVM backend could be extended to translate vector ops at the Cmm level into LLVM vector ops. 
     87 
     88The NCG (native code generator) may need at least minimal support for vector types if vector registers are to be used in the calling convention (see below). If we choose a common calling convention where vectors are passed in registers rather than on the stack then minimal support in the NCG would be necessary if ABI compatibility is to be preserved with the LLVM backend. It is optional whether vector instructions are used to improve performance. 
    7189 
    7290== Cmm layer == 
     
    85103}}} 
    86104 
    87 The current code distinguishes floats, pointer and non-pointer data. These are distinguished primarily either because they need to be tracked separately (GC pointers) or because they live in special registers on many architectures (floats). 
     105The current code distinguishes floats, pointer and non-pointer data. These are distinguished primarily because either they need to be tracked separately (GC pointers) or they live in special registers on many architectures (floats). 
    88106 
    89107For vectors we add two new categories 
     
    94112type Multiplicity = Int 
    95113}}} 
    96 We keep vector types separate from scalars, rather than representing scalars as having multiplicty 1. This is to limit distruption to existing code paths and also because it is expected that vectors will often need to be treated differently from scalars. Again we distinguish float from integral types as these may use different classes of registers. 
     114We keep vector types separate from scalars, rather than representing scalars as having multiplicity 1. This is to limit disruption to existing code paths and also because it is expected that vectors will often need to be treated differently from scalars. Again we distinguish float from integral types as these may use different classes of registers. There is no need to support vectors of GC pointers. 
    97115 
    98116Vector operations on these machine vector types will be added to the Cmm `MachOp` type, e.g. 
     
    110128Our design is to provide a family of fixed size vector types and primitive operations, but not to provide any facility to parametrise this family on the vector length. 
    111129 
     130for width {w} in 8, 16, 32, 64 and "", (empty for native Int#/Word# width)[[BR]] 
     131for multiplicity {m} in 2, 4, 8, 16, 32 
     132 
     133`type Int`''{w}''`Vec`''{m}''`#`[[BR]] 
     134`type Word`''{w}''`Vec`''{m}''`#`[[BR]] 
     135`type FloatVec`''{m}''`#`[[BR]] 
     136`type DoubleVec`''{m}''`#`[[BR]] 
     137 
    112138Syntax note: here {m} is meta-syntax, not concrete syntax 
    113139 
    114 for width {w} in 8, 16, 32, 64 and "" -- empty for native Int#/Word# width 
    115 for multiplicity {m} in 2, 4, 8, 16, 32 
    116 {{{ 
    117 type Int{w}Vec{m}# 
    118 type Word{w}Vec{m}# 
    119 type FloatVec{m}# 
    120 type DoubleVec{m}# 
    121 }}} 
    122 It has not yet been decided if we will use a name convention such as: 
    123 {{{ 
    124 IntVec2#    IntVec4#  IntVec8# ... 
    125 Int8Vec2#   ... 
    126 Int16Vec2# 
    127 ... 
    128 }}} 
    129 Or if we will add a new concrete syntax to suggest a paramater, but have it really still part of the name, such as: 
    130  
    131 Syntax note: here <2> is concrete syntax 
    132 {{{ 
    133 IntVec<2>#    IntVec<4>#  IntVec<8># ... 
    134 Int8Vec<2>#   ... 
    135 Int16Vec<2># 
    136 .. 
    137 }}} 
    138 Similarly there would be families of primops: 
     140Hence we have individual type names with the following naming convention: 
     141 
     142 ||              || length 2     || length 4     || length 8     || etc || 
     143 || native `Int` || `IntVec2#`   || `IntVec4#`   || `IntVec8#`   || ... || 
     144 || `Int8`       || `Int8Vec2#`  || `Int8Vec4#`  || `Int8Vec8#`  || ... || 
     145 || `Int16`      || `Int16Vec2#` || `Int16Vec4#` || `Int16Vec8#` || ... || 
     146 || etc          || ...          || ...          || ...          || ... || 
     147 
     148Similarly there will be families of primops: 
    139149{{{ 
    140150extractInt{w}Vec{m}#  :: Int{w}Vec{m}# -> Int# -> Int{w}# 
    141151addInt{w}Vec{m}#      :: Int{w}Vec{m}# -> Int{w}Vec{m}# -> Int{w}Vec{m}# 
    142152}}} 
    143 From the point of view of the Haskell namespace for values and types, each member of each of these families is distinct. It is just a naming convention that suggests the relationship (with or without the addition of some concrete syntax to support the convention). 
     153From the point of view of the Haskell namespace for values and types, each member of each of these families is distinct. It is just a naming convention that suggests the relationship. 
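
To make the naming convention concrete, the following sketch builds the names as plain strings. The functions `intVecTypeName` and `addIntVecPrimOpName` are hypothetical and exist only for illustration; `Nothing` stands for the native `Int#` width.

```haskell
-- Hypothetical helpers illustrating the naming convention only.
-- The width is Nothing for the native Int# width, or Just 8/16/32/64.

-- | Name of the vector type, e.g. "Int8Vec2#" or "IntVec4#".
intVecTypeName :: Maybe Int -> Int -> String
intVecTypeName w m = "Int" ++ maybe "" show w ++ "Vec" ++ show m ++ "#"

-- | Name of the corresponding addition primop, e.g. "addInt16Vec8#".
addIntVecPrimOpName :: Maybe Int -> Int -> String
addIntVecPrimOpName w m = "add" ++ intVecTypeName w m
```

Each generated string is an ordinary, distinct identifier; nothing in the name is interpreted as a parameter.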
     154 
     155=== Optional extension: extra syntax === 
     156 
     157We could add a new concrete syntax using `<...>` to suggest a parameter, while it really remains part of the name: 
     158 
     159 ||              || length 2       || length 4       || length 8       || etc || 
     160 || native `Int` || `IntVec<2>#`   || `IntVec<4>#`   || `IntVec<8>#`   || ... || 
     161 || `Int8`       || `Int8Vec<2>#`  || `Int8Vec<4>#`  || `Int8Vec<8>#`  || ... || 
     162 || `Int16`      || `Int16Vec<2>#` || `Int16Vec<4>#` || `Int16Vec<8>#` || ... || 
     163 || etc          || ...            || ...            || ...            || ... || 
    144164 
    145165=== Primop generation and representation === 
    146166 
    147 Internally in GHC we can take advantage of the obvious parametrisation within the families of primitive types and operations. In particular we extend GHC's primop.txt.pp machinery to enable us to describe the family as a whole and to generate the members. 
    148  
    149 For example: 
    150 {{{ 
    151 paramater <w> Width 8,16,32,64 
    152 paramater <m> Multiplicity 2,4,8,16,32 
    153  
    154 primop VIntAddOp <w> <m> "addInt<w>Vec<m>#" Dyadic 
    155   Int{w}Vec{m}# -> Int{w}Vec{m}# -> Int{w}Vec{m}# 
    156   {doc comments} 
    157 }}} 
    158 This would generate a family of primops, and an internal representation using the obvious parameters: 
     167Internally in GHC we can take advantage of the obvious parametrisation within the families of primitive types and operations. In particular we extend GHC's `primop.txt.pp` machinery to enable us to describe the family as a whole and to generate the members. 
     168 
     169For example, here is some plausible concrete syntax for `primop.txt.pp`: 
     170{{{ 
     171parameter <w, m> Width Multiplicity 
     172  with <w, m> in <8, 2>,<8, 4>,<8, 8>,<8, 16>,<8, 32>, 
     173                 <16,2>,<16,4>,<16,8>,<16,16>, 
     174                 <32,2>,<32,4>,<32,8>, 
     175                 <64,2>,<64,4> 
     176}}} 
     177Note that we allow non-rectangular combinations of values for the parameters. We declare the range of values along with the parameter so that we do not have to repeat it for every primtype and primop. 
     178{{{ 
     179primtype <w,m> Int<w>Vec<m># 
     180 
     181primop VIntAddOp <w,m> "addInt<w>Vec<m>#" Dyadic 
     182  Int<w>Vec<m># -> Int<w>Vec<m># -> Int<w>Vec<m># 
     183  {Vector addition} 
     184}}} 
     185 
     186This would generate a family of primops, and an internal representation using the type names declared for the parameters: 
    159187{{{ 
    160188data PrimOp = ... 
     
    162190   | VIntQuotOp Width Multiplicity 
    163191}}} 
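
A minimal sketch of what the expansion of a family into its members might look like, assuming `Width` and `Multiplicity` are modelled as plain `Int`s (GHC's real `Width` is an enumeration) and using hypothetical names `widthMultiplicities` and `expandFamily`:

```haskell
-- Assumption: Width and Multiplicity are modelled as plain Ints here.
type Width        = Int
type Multiplicity = Int

-- One constructor per primop family, as in the representation above.
data PrimOp
  = VIntAddOp  Width Multiplicity
  | VIntQuotOp Width Multiplicity
  deriving (Eq, Show)

-- The non-rectangular <w, m> combinations from the parameter declaration.
widthMultiplicities :: [(Width, Multiplicity)]
widthMultiplicities =
     [ (8,  m) | m <- [2, 4, 8, 16, 32] ]
  ++ [ (16, m) | m <- [2, 4, 8, 16] ]
  ++ [ (32, m) | m <- [2, 4, 8] ]
  ++ [ (64, m) | m <- [2, 4] ]

-- Expand one family description into its individual members.
expandFamily :: (Width -> Multiplicity -> PrimOp) -> [PrimOp]
expandFamily mk = [ mk w m | (w, m) <- widthMultiplicities ]
```

Here `expandFamily VIntAddOp` yields 14 members, from `VIntAddOp 8 2` to `VIntAddOp 64 4`; combinations outside the declared set, such as width 64 with multiplicity 8, are never generated.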
    164  
    165 === Optional: primitive int sizes === 
    166  
    167 The same mechanism could be used to handle parametrisation between Int8#, Int16# etc. Currently these do not exist as primitive types. The types Int8, Int16 etc are implemented as a boxed native-sized Int# plus narrowing. 
     192It is not yet clear what syntax to use to produce the names of the native-sized types `Int` and `Word`. Perhaps we should use "", e.g. 
     193{{{ 
     194parameter <w, m> Width Multiplicity 
     195  with <w, m> in <8, 2>,<8, 4>,<8, 8>,<8, 16>,<8, 32>, 
     196                 <16,2>,<16,4>,<16,8>,<16,16>, 
     197                 <32,2>,<32,4>,<32,8>, 
     198                 <64,2>,<64,4>, 
     199                 <"",2>,<"",4>  
     200}}} 
     201 
     202=== Optional extension: primitive int sizes === 
     203 
     204The above mechanism could be used to handle parametrisation between Int8#, Int16# etc. Currently these do not exist as primitive types. The types Int8, Int16 etc are implemented as a boxed native-sized Int# plus narrowing. 
    168205 
    169206Note that while this change is possible and would make things more uniform it is not essential for vector support. 
     
    171208That is we might have: 
    172209{{{ 
     210parameter <w> Width 
     211  with <w> in <8>, <16>, <32>, <64>, <""> 
     212 
    173213primtype Int<w># 
    174214 
    175215primop   IntAddOp <w>    "addInt<w>#"    Dyadic 
    176    Int# -> Int# -> Int# 
     216   Int<w># -> Int<w># -> Int<w># 
    177217   with commutable = True 
    178218}}} 
     
    182222   | IntAddOp Width 
    183223}}} 
    184 We might want to specify the values <w> and <m> range over in each operation rather than globally, or override it locally. For example we might want to support Int8 vectors up to size 32 but Double vectors only up to size 8. Or we might want to distinguish the native size or treat it uniformly, e.g.: 
    185 {{{ 
    186 primtype Int# 
    187  
    188 primop   IntAddOp    "+#"    Dyadic 
    189    Int# -> Int# -> Int# 
    190    with commutable = True 
    191  
    192 primtype Int<w># 
    193   with w = 8,16,32,64 
    194  
    195 primop   IntAddOp <w>    "addInt<w>#"    Dyadic 
    196    Int# -> Int# -> Int# 
    197    with commutable = True 
    198         w = 8,16,32,64 
    199 }}} 
    200 Or we might want some other solution so we can use `+#` as well as `addInt<8>#` since `+<8>#` as an infix operator is more than a bit obscure! 
     224We might want some other solution so we can use `+#` as well as `addInt#` since `+8#` as an infix operator doesn't really work. 
    201225 
    202226