== Vector operations ==

The following operations on vectors will be supported. Each will need to be implemented at the Haskell/Core primop layer and at the Cmm MachOp layer, with optional support in the individual code generators.

Extracting and inserting vector elements, where the `Int#` argument is the lane index:
{{{
extractInt<w>Vec<m># :: Int<w>Vec<m># -> Int# -> Int#
extractWord<w>Vec<m># :: Word<w>Vec<m># -> Int# -> Word#
extractFloatVec<m># :: FloatVec<m># -> Int# -> Float#
extractDoubleVec<m># :: DoubleVec<m># -> Int# -> Double#
}}}
{{{
insertInt<w>Vec<m># :: Int<w>Vec<m># -> Int# -> Int# -> Int<w>Vec<m>#
insertWord<w>Vec<m># :: Word<w>Vec<m># -> Int# -> Word# -> Word<w>Vec<m>#
insertFloatVec<m># :: FloatVec<m># -> Int# -> Float# -> FloatVec<m>#
insertDoubleVec<m># :: DoubleVec<m># -> Int# -> Double# -> DoubleVec<m>#
}}}
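As a rough guide to the intended semantics, the following boxed Haskell model treats a 4-lane vector as an ordinary data type. `Vec4`, `extractVec4` and `insertVec4` are hypothetical names, not proposed primops, and treating an out-of-range lane index as an error is an assumption, since the proposal does not specify that case.
{{{
import Data.Int (Int32)

-- Hypothetical boxed model of a 4-lane vector such as Int32Vec4#.
data Vec4 a = Vec4 a a a a
  deriving (Eq, Show)

-- Model of extractInt32Vec4#: read the lane at index i (0-based here).
-- Assumption: an out-of-range index is an error; the proposal leaves this open.
extractVec4 :: Vec4 a -> Int -> a
extractVec4 (Vec4 a b c d) i = case i of
  0 -> a
  1 -> b
  2 -> c
  3 -> d
  _ -> error ("extractVec4: lane index out of range: " ++ show i)

-- Model of insertInt32Vec4#: return a new vector with lane i replaced.
-- The input vector is not modified, matching the pure primop type.
insertVec4 :: Vec4 a -> Int -> a -> Vec4 a
insertVec4 (Vec4 a b c d) i x = case i of
  0 -> Vec4 x b c d
  1 -> Vec4 a x c d
  2 -> Vec4 a b x d
  3 -> Vec4 a b c x
  _ -> error ("insertVec4: lane index out of range: " ++ show i)

-- Example vector used below.
v :: Vec4 Int32
v = Vec4 10 20 30 40
}}}
Here `extractVec4 v 2` evaluates to `30`, and `insertVec4 v 0 99` evaluates to `Vec4 99 20 30 40` while `v` itself is unchanged.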
Vector shuffle, where the `Int32Vec<m'>#` argument supplies, for each of the m' result lanes, the index of the input lane to copy:
{{{
shuffleInt<w>Vec<m>ToVec<m'># :: Int<w>Vec<m># -> Int32Vec<m'># -> Int<w>Vec<m'>#
}}}
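A minimal list-based sketch of that reading of shuffle, assuming lane j of the result is the source lane named by lane j of the index vector. `shuffle` is a hypothetical name, lists stand in for vectors of m and m' lanes, and treating an out-of-range index as an error is an assumption.
{{{
import Data.Int (Int32)

-- Hypothetical model of shuffleInt<w>Vec<m>ToVec<m'>#:
-- the source has m lanes, the index vector has m' lanes, and
-- lane j of the result is the source lane selected by index j.
-- Assumption: indices outside [0, m) are an error.
shuffle :: [a] -> [Int32] -> [a]
shuffle src = map pick
  where
    m = length src
    pick i
      | i < 0 || fromIntegral i >= m =
          error ("shuffle: index out of range: " ++ show i)
      | otherwise = src !! fromIntegral i
}}}
For example, `shuffle [10, 20, 30, 40] [3, 3, 0, 1, 2, 2, 1, 0]` turns a 4-lane vector into the 8-lane vector `[40, 40, 10, 20, 30, 30, 20, 10]`, showing that m' may differ from m and that a source lane may be used more than once.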
For the fixed-size vectors (as opposed to the native-size ones) we may also want pack/unpack functions, for example:
{{{
unpackInt<w>Vec4# :: Int<w>Vec4# -> (# Int#, Int#, Int#, Int# #)
packInt<w>Vec4# :: (# Int#, Int#, Int#, Int# #) -> Int<w>Vec4#
}}}

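A sketch of the pack/unpack pair using GHC's `MagicHash` and `UnboxedTuples` extensions. `Vec4`, `packVec4`, `unpackVec4` and `lanes` are hypothetical names; the data type with four unboxed fields merely stands in for a value that would really live in one vector register.
{{{
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Int (I#), Int#)

-- Hypothetical stand-in for a 4-lane vector type such as Int<w>Vec4#.
data Vec4 = Vec4 Int# Int# Int# Int#

-- Model of packInt<w>Vec4#: build a vector from an unboxed 4-tuple of lanes.
packVec4 :: (# Int#, Int#, Int#, Int# #) -> Vec4
packVec4 (# a, b, c, d #) = Vec4 a b c d

-- Model of unpackInt<w>Vec4#: return all four lanes at once as an unboxed tuple.
unpackVec4 :: Vec4 -> (# Int#, Int#, Int#, Int# #)
unpackVec4 (Vec4 a b c d) = (# a, b, c, d #)

-- Helper for inspecting a vector from ordinary boxed code.
lanes :: Vec4 -> [Int]
lanes v = case unpackVec4 v of
  (# a, b, c, d #) -> [I# a, I# b, I# c, I# d]
}}}
Packing and then unpacking gives back the original lanes, so `lanes (packVec4 (# 1#, 2#, 3#, 4# #))` is `[1, 2, 3, 4]`. Using an unboxed tuple means the four lanes are returned without allocating a heap tuple.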
In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float` and `Double`.

Arithmetic operations:
{{{
plus<t>Vec<m>#, minus<t>Vec<m>#,
times<t>Vec<m>#, quot<t>Vec<m>#, rem<t>Vec<m># :: <t>Vec<m># -> <t>Vec<m># -> <t>Vec<m>#

negate<t>Vec<m># :: <t>Vec<m># -> <t>Vec<m>#
}}}
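The arithmetic operations act independently on corresponding lanes. The list-based sketch below models them for `Int8` lanes, chosen so that wrap-around is easy to see; it assumes the vector lanes wrap modulo 2^w in the same way as GHC's fixed-width `Int8` type. `Vec`, `lanewise` and the `...Vec` functions are hypothetical names.
{{{
import Data.Int (Int8)

-- Hypothetical model of Int8Vec<m>#: one list element per lane.
-- Assumption: lanes wrap modulo 2^8, as GHC's Int8 arithmetic does.
type Vec = [Int8]

-- Apply a binary operation to corresponding lanes of two vectors.
lanewise :: (Int8 -> Int8 -> Int8) -> Vec -> Vec -> Vec
lanewise op xs ys
  | length xs /= length ys = error "lanewise: vectors must have the same number of lanes"
  | otherwise = zipWith op xs ys

plusVec, minusVec, timesVec, quotVec, remVec :: Vec -> Vec -> Vec
plusVec  = lanewise (+)
minusVec = lanewise (-)
timesVec = lanewise (*)
quotVec  = lanewise quot  -- truncates toward zero, like quotInt#
remVec   = lanewise rem   -- result takes the sign of the dividend, like remInt#

negateVec :: Vec -> Vec
negateVec = map negate
}}}
For instance, `plusVec [1, 2, 127, -128] [1, 1, 1, -1]` gives `[2, 3, -128, 127]`: the last two lanes overflow and wrap without affecting their neighbours.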
Logic operations (`shiftRA` is an arithmetic right shift, for the signed types; `shiftRL` is a logical right shift, for the unsigned types):
{{{
andInt<w>Vec<m>#, orInt<w>Vec<m>#, xorInt<w>Vec<m># :: Int<w>Vec<m># -> Int<w>Vec<m># -> Int<w>Vec<m>#
andWord<w>Vec<m>#, orWord<w>Vec<m>#, xorWord<w>Vec<m># :: Word<w>Vec<m># -> Word<w>Vec<m># -> Word<w>Vec<m>#

notInt<w>Vec<m># :: Int<w>Vec<m># -> Int<w>Vec<m>#
notWord<w>Vec<m># :: Word<w>Vec<m># -> Word<w>Vec<m>#

shiftLInt<w>Vec<m>#, shiftRAInt<w>Vec<m># :: Int<w>Vec<m># -> Word# -> Int<w>Vec<m>#
shiftLWord<w>Vec<m>#, shiftRLWord<w>Vec<m># :: Word<w>Vec<m># -> Word# -> Word<w>Vec<m>#
}}}
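A list-based sketch of the bitwise and shift operations on 8-bit lanes, using `Data.Bits`. The function names are hypothetical; the `Int` variants of and/or/xor/not act on the same bits, so only the `Word8` versions are modelled. The sketch assumes every lane is shifted by the same scalar amount (mirroring the single `Word#` argument) and that the amount is smaller than the lane width.
{{{
import Data.Bits (complement, shiftL, shiftR, xor, (.&.), (.|.))
import Data.Int (Int8)
import Data.Word (Word8)

-- Hypothetical models of the bitwise primops, one list element per lane.
andVec, orVec, xorVec :: [Word8] -> [Word8] -> [Word8]
andVec = zipWith (.&.)
orVec  = zipWith (.|.)
xorVec = zipWith xor

notVec :: [Word8] -> [Word8]
notVec = map complement

-- All lanes are shifted by the same amount n.
-- Assumption: n is smaller than the lane width (8 bits here).
shiftLWordVec, shiftRLWordVec :: [Word8] -> Word -> [Word8]
shiftLWordVec  v n = map (`shiftL` fromIntegral n) v  -- high bits fall off
shiftRLWordVec v n = map (`shiftR` fromIntegral n) v  -- logical: zeros shifted in

shiftRAIntVec :: [Int8] -> Word -> [Int8]
shiftRAIntVec v n = map (`shiftR` fromIntegral n) v   -- arithmetic: sign bit copied in
}}}
The difference between the two right shifts shows up on lanes with the top bit set: `shiftRLWordVec [0x80] 4` gives `[0x08]`, whereas `shiftRAIntVec [-128] 4` gives `[-8]`.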
Comparison:
{{{
cmp<eq,ne,gt,ge,lt,le>Int<w>Vec<m># :: Int<w>Vec<m># -> Int<w>Vec<m># -> Word<w>Vec<m>#
cmp<eq,ne,gt,ge,lt,le>Word<w>Vec<m># :: Word<w>Vec<m># -> Word<w>Vec<m># -> Word<w>Vec<m>#
}}}
Note that LLVM does not yet support these vector comparison operations.

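The comparisons return a `Word<w>Vec<m>#` rather than a single boolean, one result per lane. The sketch below assumes the usual SIMD convention that a true lane is all ones and a false lane all zeros; the proposal itself only fixes the result type. All names are hypothetical, only `Int8` inputs are modelled, and `select` illustrates one typical use of such masks rather than a proposed primop.
{{{
import Data.Bits (complement, (.&.), (.|.))
import Data.Int (Int8)
import Data.Word (Word8)

-- Assumption: true lanes are all ones (0xFF for 8-bit lanes), false lanes are 0.
toMask :: Bool -> Word8
toMask True  = complement 0
toMask False = 0

-- Compare corresponding lanes and produce one mask lane per input lane.
cmpWith :: (Int8 -> Int8 -> Bool) -> [Int8] -> [Int8] -> [Word8]
cmpWith p = zipWith (\x y -> toMask (p x y))

-- Hypothetical models of cmp<eq,ne,gt,ge,lt,le>Int8Vec<m>#.
cmpEq, cmpNe, cmpGt, cmpGe, cmpLt, cmpLe :: [Int8] -> [Int8] -> [Word8]
cmpEq = cmpWith (==)
cmpNe = cmpWith (/=)
cmpGt = cmpWith (>)
cmpGe = cmpWith (>=)
cmpLt = cmpWith (<)
cmpLe = cmpWith (<=)

-- Take each lane from a where the mask is set, otherwise from b.
select :: [Word8] -> [Word8] -> [Word8] -> [Word8]
select = zipWith3 (\m a b -> (m .&. a) .|. (complement m .&. b))
}}}
With all-ones masks, a comparison followed by `select` chooses between two vectors lane by lane using only bitwise operations, with no branching.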
TODO:
 * conversion operations that change sign or width, e.g. Word <-> Int, Word8 <-> Word16, etc.
 * floating-point conversion operations, e.g. Float <-> Int
We should also consider:
 * vector constants, at least at the Cmm level
 * replicating a scalar across all lanes of a vector
 * AVX also supports a number of interesting operations:
   * permute, shuffle, "blend", masked moves
   * min and max within a vector
   * average
   * horizontal add/sub
   * shifting the whole vector left/right by n bytes
   * gather (AVX2; but not scatter) of 32- and 64-bit integer and floating-point values from memory (a base address plus a vector of offsets)

=== Int/Word size wrinkle ===

There is a wrinkle with the 32- and 64-bit Int and Word types: the scalar type used for a single element depends on the target architecture. For example, the types of the extract functions should be:
{{{
extractInt32Vec<m># :: Int32Vec<m># -> Int# -> INT32
extractInt64Vec<m># :: Int64Vec<m># -> Int# -> INT64
extractWord32Vec<m># :: Word32Vec<m># -> Int# -> WORD32
extractWord64Vec<m># :: Word64Vec<m># -> Int# -> WORD64
}}}
where `INT32`, `INT64`, `WORD32` and `WORD64` are CPP macros that expand in an architecture-dependent way: `INT32` and `INT64` to `Int#` or `Int64#`, and `WORD32` and `WORD64` to `Word#` or `Word64#`.

To describe this in the primop definition we might want something like:
{{{
primop ExtractWordVecOp <w,m,t> "extractWord<w>Vec<m>#" GenPrimOp
   Word<w>Vec<m># -> Int# -> <t>
   with <w, m, t> in <8, 2,Word#>,<8, 4,Word#>,<8, 8,Word#>,<8, 16,Word#>,<8, 32,Word#>,
                     <16,2,Word#>,<16,4,Word#>,<16,8,Word#>,<16,16,Word#>,
                     <32,2,WORD32>,<32,4,WORD32>,<32,8,WORD32>,
                     <64,2,WORD64>,<64,4,WORD64>,
                     <"",2,Word#>,<"",4,Word#>
}}}

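The `with <w, m, t> in ...` clause is meant to stand for one concrete primop per tuple. A small Haskell sketch of that expansion for a few of the tuples follows. `expand` and `someInstances` are hypothetical, and the textual output only illustrates the idea; a real implementation would presumably live in GHC's primop preprocessing (such as the `genprimopcode` tool) rather than produce strings.
{{{
-- Hypothetical sketch: expand one <w, m, t> tuple of the extractWord
-- template into the concrete primop name and type it stands for.
expand :: (String, Int, String) -> String
expand (w, m, t) = name ++ " :: " ++ vec ++ " -> Int# -> " ++ t
  where
    vec  = "Word" ++ w ++ "Vec" ++ show m ++ "#"
    name = "extract" ++ vec

-- A few of the instantiation tuples from the template.
someInstances :: [(String, Int, String)]
someInstances = [("8", 16, "Word#"), ("32", 4, "WORD32"), ("64", 2, "WORD64")]
}}}
Running `mapM_ (putStrLn . expand) someInstances` prints:
{{{
extractWord8Vec16# :: Word8Vec16# -> Int# -> Word#
extractWord32Vec4# :: Word32Vec4# -> Int# -> WORD32
extractWord64Vec2# :: Word64Vec2# -> Int# -> WORD64
}}}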
To iron out this wrinkle we would need the whole family of primitive types (`Int8#`, `Int16#`, `Int32#`, etc.), whereas currently only the native register-sized `Int#` type is provided, plus a primitive `Int64#` type on 32-bit systems.
