Changes between Version 11 and Version 12 of SIMD


Timestamp:
Nov 10, 2011 10:57:43 PM (4 years ago)
Author:
duncan
Comment:

--

  • SIMD

    v11 v12  
    1313We are interested in the SIMD vector instructions on current and future generations of CPUs. This includes SSE and AVX on x86/x86-64 and NEON on ARM chips (targets like GPUs or FPGAs are out of scope for this project). These SIMD vector instruction sets are broadly similar in the sense of having relatively short vector registers and operations for various sizes of integer and/or floating point operation. In the details however they have different capabilities and different vector register sizes.
    1414
    15 We therefore want a design for SIMD support in GHC that lets us efficiently exploit current vector instructions but is not tied too tightly to one CPU architecture or generation. In particular, it should be possible to write portable Haskell programs that use SIMD vectors. This implies that we need fallbacks for cases where certain types or operations are not supported directly in hardware.
     15We therefore want a design for SIMD support in GHC that lets us efficiently exploit current vector instructions but is not tied too tightly to one CPU architecture or generation. In particular, it should be possible to write portable Haskell programs that use SIMD vectors.
    1616
    1717On the other hand, we want to be able to write programs for maximum efficiency that exploit the native vector sizes, preferably while remaining portable. For example, algorithms on large variable length vectors are in principle agnostic about the size of the primitive vector operations.
     
    4848 * Some strategy for making use of vector primops, e.g. DPH or Vector lib
    4949
     50=== Vector types ===
     51
     52We intend to provide vectors of the following basic types:
     53
     54 || Int8  || Int16  || Int32  || Int64  ||
     55 || Word8 || Word16 || Word32 || Word64 ||
     56 ||       ||        || Float  || Double ||
     57
    5058=== Fixed and variable sized vectors ===
    5159
    52 The hardware supports only small fixed sized vectors. High level libraries would like to be able to use arbitrary sized vectors. Similar to the design in GCC and LLVM we provide primitive Haskell types and operations for fixed-size vectors. The task of implementing variable sized vectors in terms of fixed-size vector types and primops is left to the next layer up (DPH, vector lib).
    53 
    54 That is, in the core primop layer and down, vector support is only for fixed-size vectors. The fixed sizes will be only powers of 2 and only up to some maximum size. The choice of maximum size should reflect the largest vector size supported by the current range of CPUs (e.g. 256bit with AVX).
    55 
    56 === Fallbacks ===
    57 
    58 The portability strategy relies on fallbacks so that we can implement large vectors on machines with only small vector registers, or with no vector support (either none at all, or none for that type, e.g. only support for integer vectors not floating point, or only 32bit floats not doubles).
     60The hardware supports only small fixed sized vectors. High level libraries would like to be able to use arbitrary sized vectors. Similar to the design in GCC and LLVM we will provide primitive Haskell types and operations for fixed-size vectors. The task of implementing variable sized vectors in terms of fixed-size vector types and primops is left to the next layer up (DPH, vector lib).
     61
     62That is, in the core primop layer and down, vector support is only for fixed-size vectors. The fixed sizes will be only powers of 2 and only up to some maximum size. The choice of maximum size should reflect the largest vector size supported by the current range of CPUs (256bit with AVX):
     63
     64 || types ||        ||        || vector sizes    ||
     65 || Int8  || Word8  ||        || 2, 4, 8, 16, 32 ||
     66 || Int16 || Word16 ||        || 2, 4, 8, 16     ||
     67 || Int32 || Word32 || Float  || 2, 4, 8         ||
     68 || Int64 || Word64 || Double || 2, 4            ||
     69
     70We could choose to support larger fixed sizes, or the same maximum size for all types, but there is no strict need to do so.
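
The size table above follows from one rule: multiplicities are powers of 2, starting at 2, for as long as the whole vector fits in the largest supported register. A minimal sketch of that rule, assuming the 256-bit AVX maximum mentioned in the text; `maxVectorBits` and `vectorSizes` are illustrative names, not GHC identifiers:

```haskell
-- Illustrative only: maxVectorBits and vectorSizes are hypothetical names,
-- not part of GHC.

-- | Largest vector register width assumed here (256 bits, as with AVX).
maxVectorBits :: Int
maxVectorBits = 256

-- | Power-of-2 multiplicities for an element of the given bit width,
-- from 2 up to the number of elements that fit in maxVectorBits.
vectorSizes :: Int -> [Int]
vectorSizes elemBits =
  takeWhile (\m -> m * elemBits <= maxVectorBits) (iterate (* 2) 2)
```

With these assumptions, `vectorSizes 8` gives the five sizes listed for `Int8`/`Word8`, and `vectorSizes 64` gives the two sizes listed for `Int64`/`Word64`/`Double`.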
     71
     72=== Portability and fallbacks ===
     73
     74To enable portable Haskell code we will provide the same set of vector types and operations on all architectures. Again this follows the approach taken by GCC and LLVM.
     75
     76We will rely on fallbacks for the cases where certain types or operations are not supported directly in hardware. In particular we can implement large vectors on machines with only small vector registers. Where there is no vector hardware support at all for a type (e.g. arch with no vectors or 64bit doubles on ARM's NEON) we can implement it using scalar code.
    5977
    6078The obvious approach is a transformation to synthesize larger vector types and operations using smaller vector operations or scalar operations. This synthesis could plausibly be done at the Core, Cmm or code generator layers; however, the most natural choice would be a Cmm -> Cmm transformation. This approach would reduce or eliminate the burden on code generators by allowing them to support only their architecture's native vector sizes and types, or none at all.
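
The idea of synthesizing a larger vector operation from smaller ones can be modelled in plain Haskell. This is only a sketch of the principle, not of the actual Cmm transformation; the types `V2` and `V4` and the functions `addV2` and `addV4` are hypothetical and do not correspond to GHC primops or Cmm constructs:

```haskell
-- Illustrative model only: these types and names are hypothetical and do
-- not correspond to GHC primops or Cmm constructs.
import Data.Int (Int32)

-- | A 2-element vector, standing in for the widest size the target
-- is assumed to support directly.
data V2 = V2 Int32 Int32 deriving (Eq, Show)

-- | A 4-element vector synthesized from two 2-element halves.
data V4 = V4 V2 V2 deriving (Eq, Show)

-- | The operation the target is assumed to provide. In this model it is
-- written with scalar arithmetic, which is also what the fallback would
-- look like on a target with no vector support for this type.
addV2 :: V2 -> V2 -> V2
addV2 (V2 a b) (V2 c d) = V2 (a + c) (b + d)

-- | Fallback: a 4-wide add expressed as two 2-wide adds.
addV4 :: V4 -> V4 -> V4
addV4 (V4 lo1 hi1) (V4 lo2 hi2) = V4 (addV2 lo1 lo2) (addV2 hi1 hi2)
```

A code generator that only understands the smaller size would then never see the 4-wide operation, only the two 2-wide ones.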
     
    6280Using fallbacks does pose some challenges for a stable/portable ABI, in particular how vector registers should be used in the GHC calling convention. This is discussed in a later section.
    6381
    64 === Code generators ===
    65 
    66 We would not extend the portable C backend to emit vector instructions. It would rely on the higher layers transforming vector operations into scalar operations. The portable C backend is not ABI compatible with the other code generators so there is no concern about vector registers in the calling convention.
    67 
    68 The LLVM C library supports vector types and instructions directly. The GHC LLVM backend could be extended to translate vector ops at the CMM level into LLVM vector ops.
    69 
    70 The NCG (native code generator) may need at least minimal support for vector types if vector registers are to be used in the calling convention. This would be necessary if ABI compatibility is to be preserved with the LLVM backend. It is optional whether vector instructions are used to improve performance.
     82== Code generators ==
     83
     84We will not extend the portable C backend to emit vector instructions. It will rely on the higher layers transforming vector operations into scalar operations. The portable C backend is not ABI compatible with the other code generators so there is no concern about vector registers in the calling convention.
     85
     86The LLVM C library supports vector types and instructions directly. The GHC LLVM backend could be extended to translate vector ops at the Cmm level into LLVM vector ops.
     87
     88The NCG (native code generator) may need at least minimal support for vector types if vector registers are to be used in the calling convention (see below). If we choose a common calling convention where vectors are passed in registers rather than on the stack then minimal support in the NCG would be necessary if ABI compatibility is to be preserved with the LLVM backend. It is optional whether vector instructions are used to improve performance.
    7189
    7290== Cmm layer ==
     
    85103}}}
    86104
    87 The current code distinguishes floats, pointer and non-pointer data. These are distinguished primarily either because they need to be tracked separately (GC pointers) or because they live in special registers on many architectures (floats).
     105The current code distinguishes floats, pointer and non-pointer data. These are distinguished primarily because either they need to be tracked separately (GC pointers) or they live in special registers on many architectures (floats).
    88106
    89107For vectors we add two new categories
     
    94112type Multiplicty = Int
    95113}}}
    96 We keep vector types separate from scalars, rather than representing scalars as having multiplicity 1. This is to limit disruption to existing code paths and also because it is expected that vectors will often need to be treated differently from scalars. Again we distinguish float from integral types as these may use different classes of registers.
     114We keep vector types separate from scalars, rather than representing scalars as having multiplicity 1. This is to limit disruption to existing code paths and also because it is expected that vectors will often need to be treated differently from scalars. Again we distinguish float from integral types as these may use different classes of registers. There is no need to support vectors of GC pointers.
    97115
    98116Vector operations on these machine vector types will be added to the Cmm `MachOp` type, e.g.
     
    110128Our design is to provide a family of fixed size vector types and primitive operations, but not to provide any facility to parametrise this family on the vector length.
    111129
     130for width {w} in 8, 16, 32, 64 and "", (empty for native Int#/Word# width)[[BR]]
     131for multiplicity {m} in 2, 4, 8, 16, 32
     132
     133`type Int`''{w}''`Vec`''{m}''`#`[[BR]]
     134`type Word`''{w}''`Vec`''{m}''`#`[[BR]]
     135`type FloatVec`''{m}''`#`[[BR]]
     136`type DoubleVec`''{m}''`#`[[BR]]
     137
    112138Syntax note: here {m} is meta-syntax, not concrete syntax
    113139
    114 for width {w} in 8, 16, 32, 64 and "" -- empty for native Int#/Word# width
    115 for multiplicity {m} in 2, 4, 8, 16, 32
    116 {{{
    117 type Int{w}Vec{m}#
    118 type Word{w}Vec{m}#
    119 type FloatVec{m}#
    120 type DoubleVec{m}#
    121 }}}
    122 It has not yet been decided if we will use a name convention such as:
    123 {{{
    124 IntVec2#    IntVec4#  IntVec8# ...
    125 Int8Vec2#   ...
    126 Int16Vec2#
    127 ...
    128 }}}
    129 Or if we will add a new concrete syntax to suggest a parameter, but have it really still part of the name, such as:
    130 
    131 Syntax note: here <2> is concrete syntax
    132 {{{
    133 IntVec<2>#    IntVec<4>#  IntVec<8># ...
    134 Int8Vec<2>#   ...
    135 Int16Vec<2>#
    136 ..
    137 }}}
    138 Similarly there would be families of primops:
     140Hence we have individual type names with the following naming convention:
     141
     142 ||              || length 2     || length 4     || length 8     || etc ||
     143 || native `Int` || `IntVec2#`   || `IntVec4#`   || `IntVec8#`   || ... ||
     144 || `Int8`       || `Int8Vec2#`  || `Int8Vec4#`  || `Int8Vec8#`  || ... ||
     145 || `Int16`      || `Int16Vec2#` || `Int16Vec4#` || `Int16Vec8#` || ... ||
     146 || etc          || ...          || ...          || ...          || ... ||
     147
     148Similarly there will be families of primops:
    139149{{{
    140150extractInt{w}Vec{m}#  :: Int{w}Vec{m}# -> Int# -> Int{w}#
    141151addInt{w}Vec{m}#      :: Int{w}Vec{m}# -> Int{w}Vec{m}# -> Int{w}Vec{m}#
    142152}}}
    143 From the point of view of the Haskell namespace for values and types, each member of each of these families is distinct. It is just a naming convention that suggests the relationship (with or without the addition of some concrete syntax to support the convention).
     153From the point of view of the Haskell namespace for values and types, each member of each of these families is distinct. It is just a naming convention that suggests the relationship.
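
Because the relationship between family members is purely a naming convention, the names can be produced mechanically from the element width and the multiplicity. A small sketch, in which `ElemWidth`, `intVecTypeName` and `addPrimopName` are hypothetical helpers and only the resulting name strings follow the convention described here:

```haskell
-- Hypothetical helper names; only the resulting name strings follow the
-- naming convention described in the text.

-- | Element width in bits; Nothing stands for the native Int#/Word# width.
type ElemWidth = Maybe Int

-- | Type name for an integer vector, e.g. Int8Vec4# or IntVec2#.
intVecTypeName :: ElemWidth -> Int -> String
intVecTypeName w m = "Int" ++ maybe "" show w ++ "Vec" ++ show m ++ "#"

-- | Name of the corresponding addition primop, e.g. addInt8Vec4#.
addPrimopName :: ElemWidth -> Int -> String
addPrimopName w m = "add" ++ intVecTypeName w m
```

For example, `intVecTypeName (Just 8) 4` yields the name `Int8Vec4#`, and `intVecTypeName Nothing 2` yields `IntVec2#` for the native width.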
     154
     155=== Optional extension: extra syntax ===
     156
     157We could add a new concrete syntax using `<...>` to suggest a parameter, but have it really still part of the name:
     158
     159 ||              || length 2       || length 4       || length 8       || etc ||
     160 || native `Int` || `IntVec<2>#`   || `IntVec<4>#`   || `IntVec<8>#`   || ... ||
     161 || `Int8`       || `Int8Vec<2>#`  || `Int8Vec<4>#`  || `Int8Vec<8>#`  || ... ||
     162 || `Int16`      || `Int16Vec<2>#` || `Int16Vec<4>#` || `Int16Vec<8>#` || ... ||
     163 || etc          || ...            || ...            || ...            || ... ||
    144164
    145165=== Primop generation and representation ===
    146166
    147 Internally in GHC we can take advantage of the obvious parametrisation within the families of primitive types and operations. In particular we extend GHC's primop.txt.pp machinery to enable us to describe the family as a whole and to generate the members.
    148 
    149 For example:
    150 {{{
    151 parameter <w> Width 8,16,32,64
    152 parameter <m> Multiplicity 2,4,8,16,32
    153 
    154 primop VIntAddOp <w> <m> "addInt<w>Vec<m>#" Dyadic
    155   Int{w}Vec{m}# -> Int{w}Vec{m}# -> Int{w}Vec{m}#
    156   {doc comments}
    157 }}}
    158 This would generate a family of primops, and an internal representation using the obvious parameters:
     167Internally in GHC we can take advantage of the obvious parametrisation within the families of primitive types and operations. In particular we extend GHC's `primop.txt.pp` machinery to enable us to describe the family as a whole and to generate the members.
     168
     169For example, here is some plausible concrete syntax for `primop.txt.pp`:
     170{{{
     171parameter <w, m> Width Multiplicity
     172  with <w, m> in <8, 2>,<8, 4>,<8, 8>,<8, 16>,<8, 32>,
     173                 <16,2>,<16,4>,<16,8>,<16,16>,
     174                 <32,2>,<32,4>,<32,8>,
     175                 <64,2>,<64,4>
     176}}}
     177Note that we allow non-rectangular combinations of values for the parameters. We declare the range of values along with the parameter so that we do not have to repeat it for every primtype and primop.
     178{{{
     179primtype <w,m> Int<w>Vec<m>#
     180
     181primop VIntAddOp <w,m> "addInt<w>Vec<m>#" Dyadic
     182  Int<w>Vec<m># -> Int<w>Vec<m># -> Int<w>Vec<m>#
     183  {Vector addition}
     184}}}
     185
     186This would generate a family of primops, and an internal representation using the type names declared for the parameters:
    159187{{{
    160188data PrimOp = ...
     
    162190   | VIntQuotOp Width Multiplicity
    163191}}}
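
To make the non-rectangular parameter range concrete, the following sketch enumerates the members of one primop family. It assumes, purely for illustration, that `Width` and `Multiplicity` are plain `Int` synonyms (inside GHC `Width` is a separate type) and that `PrimOp` has only the one constructor shown; `familyParams` and `vIntAddOps` are hypothetical names:

```haskell
-- Illustrative sketch: Width and Multiplicity are simplified to Int here,
-- and PrimOp is cut down to a single constructor. familyParams and
-- vIntAddOps are hypothetical names.
type Width = Int
type Multiplicity = Int

data PrimOp = VIntAddOp Width Multiplicity
  deriving (Eq, Show)

-- | The permitted (width, multiplicity) pairs. The range is not a full
-- rectangle: wider elements allow fewer multiplicities.
familyParams :: [(Width, Multiplicity)]
familyParams =
  [ (w, m)
  | (w, ms) <- [ (8,  [2, 4, 8, 16, 32])
               , (16, [2, 4, 8, 16])
               , (32, [2, 4, 8])
               , (64, [2, 4]) ]
  , m <- ms
  ]

-- | Every member of the vector integer addition family.
vIntAddOps :: [PrimOp]
vIntAddOps = [ VIntAddOp w m | (w, m) <- familyParams ]
```

Under these assumptions the family has 14 members, one per permitted pair, rather than the 20 a full rectangle of 4 widths and 5 multiplicities would give.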
    164 
    165 === Optional: primitive int sizes ===
    166 
    167 The same mechanism could be used to handle parametrisation between Int8#, Int16# etc. Currently these do not exist as primitive types. The types Int8, Int16 etc are implemented as a boxed native-sized Int# plus narrowing.
     192It is not yet clear what syntax to use to obtain the names of the native-sized types `Int` and `Word`. Perhaps we should use "", e.g.
     193{{{
     194parameter <w, m> Width Multiplicity
     195  with <w, m> in <8, 2>,<8, 4>,<8, 8>,<8, 16>,<8, 32>,
     196                 <16,2>,<16,4>,<16,8>,<16,16>,
     197                 <32,2>,<32,4>,<32,8>,
     198                 <64,2>,<64,4>,
     199                 <"",2>,<"",4>
     200}}}
     201
     202=== Optional extension: primitive int sizes ===
     203
     204The above mechanism could be used to handle parametrisation between Int8#, Int16# etc. Currently these do not exist as primitive types. The types Int8, Int16 etc are implemented as a boxed native-sized Int# plus narrowing.
    168205
    169206Note that while this change is possible and would make things more uniform it is not essential for vector support.
     
    171208That is we might have:
    172209{{{
     210parameter <w> Width
     211  with <w> in <8>, <16>, <32>, <64>, <"">
     212
    173213primtype Int<w>#
    174214
    175215primop   IntAddOp <w>    "addInt<w>#"    Dyadic
    176    Int# -> Int# -> Int#
     216   Int<w># -> Int<w># -> Int<w>#
    177217   with commutable = True
    178218}}}
     
    182222   | IntAddOp Width
    183223}}}
    184 We might want to specify the values <w> and <m> range over in each operation rather than globally, or override it locally. For example we might want to support Int8 vectors up to size 32 but Double vectors only up to size 8. Or we might want to distinguish the native size or treat it uniformly, e.g.:
    185 {{{
    186 primtype Int#
    187 
    188 primop   IntAddOp    "+#"    Dyadic
    189    Int# -> Int# -> Int#
    190    with commutable = True
    191 
    192 primtype Int<w>#
    193   with w = 8,16,32,64
    194 
    195 primop   IntAddOp <w>    "addInt<w>#"    Dyadic
    196    Int# -> Int# -> Int#
    197    with commutable = True
    198         w = 8,16,32,64
    199 }}}
    200 Or we might want some other solution so we can use `+#` as well as `addInt<8>#` since `+<8>#` as an infix operator is more than a bit obscure!
     224We might want some other solution so we can use `+#` as well as `addInt#` since `+8#` as an infix operator doesn't really work.
    201225
    202226