Changes between Version 28 and Version 29 of SIMD


Timestamp:
Nov 14, 2011 2:02:00 PM (4 years ago)
Author:
duncan
Comment:

add array load/store and widen/narrow and fp conversion operations

  • SIMD

    v28 v29  
    274274== Vector operations ==
    275275
    276 The following operations on vectors will be supported. They will need to be implemented at the Haskell/core primop layer, Cmm MachOp layer and optional support in the code generators.
     276The following operations on vectors will be supported. They will need to be implemented at the Haskell/core primop layer, Cmm `MachOp` layer and optional support in the code generators.
     277
     278In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float`, `Double`.
     279
     280Loading and storing vectors in arrays, ByteArray# and raw Addr#
     281{{{
     282readInt<w>Vec<m>Array#  :: MutableByteArray# d -> Int# -> State# d -> (# State# d, Int<w>Vec<m># #)
     283readWord<w>Vec<m>Array# :: MutableByteArray# d -> Int# -> State# d -> (# State# d, Word<w>Vec<m># #)
     284readFloatVec<m>Array#   :: MutableByteArray# d -> Int# -> State# d -> (# State# d, FloatVec<m># #)
     285readDoubleVec<m>Array#  :: MutableByteArray# d -> Int# -> State# d -> (# State# d, DoubleVec<m># #)
     286
     287writeInt<w>Vec<m>Array#  :: MutableByteArray# d -> Int# -> Int<w>Vec<m>#  -> State# d -> State# d
     288writeWord<w>Vec<m>Array# :: MutableByteArray# d -> Int# -> Word<w>Vec<m># -> State# d -> State# d
     289writeFloatVec<m>Array#   :: MutableByteArray# d -> Int# -> FloatVec<m>#   -> State# d -> State# d
     290writeDoubleVec<m>Array#  :: MutableByteArray# d -> Int# -> DoubleVec<m>#  -> State# d -> State# d
     291
     292readInt<w>Vec<m>OffAddr#  :: Addr# -> Int# -> State# d -> (# State# d, Int<w>Vec<m># #)
     293readWord<w>Vec<m>OffAddr# :: Addr# -> Int# -> State# d -> (# State# d, Word<w>Vec<m># #)
     294readFloatVec<m>OffAddr#   :: Addr# -> Int# -> State# d -> (# State# d, FloatVec<m># #)
     295readDoubleVec<m>OffAddr#  :: Addr# -> Int# -> State# d -> (# State# d, DoubleVec<m># #)
     296
     297writeInt<w>Vec<m>OffAddr#  :: Addr# -> Int# -> Int<w>Vec<m>#  -> State# d -> State# d
     298writeWord<w>Vec<m>OffAddr# :: Addr# -> Int# -> Word<w>Vec<m># -> State# d -> State# d
     299writeFloatVec<m>OffAddr#   :: Addr# -> Int# -> FloatVec<m>#   -> State# d -> State# d
     300writeDoubleVec<m>OffAddr#  :: Addr# -> Int# -> DoubleVec<m>#  -> State# d -> State# d
     301}}}
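As a rough illustration of the intended load/store behaviour, the following sketch models the `OffAddr#` variants with ordinary `base` functions. It is not the primop implementation: `Int32Vec4`, `readInt32Vec4OffPtr`, `writeInt32Vec4OffPtr` and `roundTrip` are hypothetical names, a vector is modelled as a list of lanes, and the sketch assumes that a read yields a whole vector, a write consumes one, and the offset counts whole vectors rather than scalar elements.

```haskell
import Data.Int (Int32)
import Foreign.Marshal.Array (advancePtr, allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- Hypothetical stand-in for Int32Vec4#: four lanes held in a list.
type Int32Vec4 = [Int32]

lanes :: Int
lanes = 4

-- Models readInt32Vec4OffAddr#.
-- ASSUMPTION: the offset i counts whole vectors, not scalar elements.
readInt32Vec4OffPtr :: Ptr Int32 -> Int -> IO Int32Vec4
readInt32Vec4OffPtr p i = peekArray lanes (p `advancePtr` (i * lanes))

-- Models writeInt32Vec4OffAddr# under the same indexing assumption.
-- At most `lanes` elements of the list are stored.
writeInt32Vec4OffPtr :: Ptr Int32 -> Int -> Int32Vec4 -> IO ()
writeInt32Vec4OffPtr p i v = pokeArray (p `advancePtr` (i * lanes)) (take lanes v)

-- Fill 8 elements, overwrite the second vector, then read both vectors back.
roundTrip :: IO (Int32Vec4, Int32Vec4)
roundTrip = allocaArray 8 $ \p -> do
  pokeArray p [0 .. 7]
  writeInt32Vec4OffPtr p 1 [40, 50, 60, 70]
  v0 <- readInt32Vec4OffPtr p 0
  v1 <- readInt32Vec4OffPtr p 1
  pure (v0, v1)
```

Under these assumptions `roundTrip` returns the untouched first vector together with the overwritten second one.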
    277302
    278303Extracting and inserting vector elements:
     
    280305extractInt<w>Vec<m>#   :: Int<w>Vec<m>#  -> Int# -> Int#
    281306extractWord<w>Vec<m>#  :: Word<w>Vec<m># -> Int# -> Word#
    282 extractFloatVec#       :: FloatVec<m>#   -> Int# -> Float#
    283 extractDoubleVec#      :: DoubleVec<m>#  -> Int# -> Double#
     307extractFloatVec<m>#    :: FloatVec<m>#   -> Int# -> Float#
     308extractDoubleVec<m>#   :: DoubleVec<m>#  -> Int# -> Double#
    284309}}}
    285310{{{
     
    289314insertDoubleVec<m>#   :: DoubleVec<m>#  -> Int# -> Double# -> DoubleVec<m>#
    290315}}}
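The extract and insert operations can be read as pure lane accessors. A minimal sketch, using a list as a hypothetical stand-in for the vector types and assuming lanes are numbered from 0 (`Vec`, `extractVec`, `insertVec` and `extractInsertExample` are illustrative names, not part of the proposal):

```haskell
-- Hypothetical list-based stand-in for a vector type such as DoubleVec<m>#.
type Vec a = [a]

-- Models extract<t>Vec<m>#: return the lane at index i.
-- ASSUMPTION: lanes are numbered from 0.
extractVec :: Vec a -> Int -> a
extractVec v i = v !! i

-- Models insert<t>Vec<m>#: a new vector with lane i replaced by x.
-- The input vector is left unchanged; all other lanes are copied.
insertVec :: Vec a -> Int -> a -> Vec a
insertVec v i x = [if j == i then x else y | (j, y) <- zip [0 :: Int ..] v]

-- Evaluates to (3.0, [1.0, 9.5, 3.0, 4.0]).
extractInsertExample :: (Double, Vec Double)
extractInsertExample = (extractVec v 2, insertVec v 1 9.5)
  where
    v = [1.0, 2.0, 3.0, 4.0]
```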
     316
     317Duplicating a scalar to a vector:
     318{{{
     319replicateToInt<w>Vec<m>#  :: Int#    -> Int<w>Vec<m>#
     320replicateToWord<w>Vec<m># :: Word#   -> Word<w>Vec<m>#
     321replicateToFloatVec<m>#   :: Float#  -> FloatVec<m>#
     322replicateToDoubleVec<m>#  :: Double# -> DoubleVec<m>#
     323}}}
     324
    291325Vector shuffle:
    292326{{{
    293 shuffleInt<w>Vec<m>ToVec<m'>  :: Int<w>Vec<m>#  -> Int32Vec<m'>#    -> Int<w>Vec<m'>#
     327shuffle<t>Vec<m>ToVec<m'> :: <t>Vec<m># -> Int32Vec<m'># -> <t>Vec<m'>#
    294328}}}
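One plausible reading of the shuffle signature is that the index vector selects, for each result lane, which source lane to copy. The sketch below models that reading with lists; `Vec`, `shuffleVec`, `reversed` and `broadcastLane0` are hypothetical names, and the 0-based index convention is an assumption rather than something the signature states.

```haskell
import Data.Int (Int32)

-- Hypothetical list-based stand-in for <t>Vec<m>#.
type Vec a = [a]

-- Models shuffle<t>Vec<m>ToVec<m'>.
-- ASSUMPTION: result lane k is a copy of source lane (idxs !! k), 0-based.
-- The result has as many lanes as the index vector (m'), not the source (m).
shuffleVec :: Vec a -> Vec Int32 -> Vec a
shuffleVec src idxs = [src !! fromIntegral i | i <- idxs]

-- m = 4, m' = 4: reverse the lanes, giving [4.0, 3.0, 2.0, 1.0].
reversed :: Vec Double
reversed = shuffleVec [1, 2, 3, 4] [3, 2, 1, 0]

-- m = 4, m' = 2: copy lane 0 into both result lanes, giving [1.0, 1.0].
broadcastLane0 :: Vec Double
broadcastLane0 = shuffleVec [1, 2, 3, 4] [0, 0]
```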
    295329For the fixed size vectors (not native size) we may also want to add pack/unpack functions like:
     
    299333}}}
    300334
    301 In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float`, `Double`.
    302 
    303335Arithmetic operations:
    304336{{{
     
    326358Note that LLVM does not yet support the comparison operations.
    327359
    328 TODO:
    329  * conversion sign/width operations, e.g. Word <-> Int, Word8 <-> Word16 etc.
    330  * conversion fp operations, e.g. Float <-> Int
    331 Should also consider:
     360Integer width narrow/widen operations:
     361{{{
     362narrowInt<w>To<w'>Vec<m>#  :: Int<w>Vec<m># -> Int<w'>Vec<m>#     -- for w' < w
     363narrowWord<w>To<w'>Vec<m># :: Word<w>Vec<m># -> Word<w'>Vec<m>#   -- for w' < w
     364
     365widenInt<w>To<w'>Vec<m>#   :: Int<w>Vec<m># -> Int<w'>Vec<m>#     -- for w' > w
     366widenWord<w>To<w'>Vec<m>#  :: Word<w>Vec<m># -> Word<w'>Vec<m>#   -- for w' > w
     367}}}
     368Note: LLVM calls these truncate (`trunc`) and extend (`sext` for signed, `zext` for unsigned).
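The difference between the signed and unsigned variants can be seen lane by lane with the fixed-width types in `base`. This is only a scalar model over lists (the function names are hypothetical and fix `w` and `w'` to 32 and 8); it relies on `fromIntegral` truncating when narrowing and sign- or zero-extending when widening.

```haskell
import Data.Int (Int32, Int8)
import Data.Word (Word32, Word8)

-- Models narrowInt32To8Vec<m>#: keep the low 8 bits of each lane.
narrowInt32To8 :: [Int32] -> [Int8]
narrowInt32To8 = map fromIntegral

-- Models narrowWord32To8Vec<m>#: keep the low 8 bits of each lane.
narrowWord32To8 :: [Word32] -> [Word8]
narrowWord32To8 = map fromIntegral

-- Models widenInt8To32Vec<m>#: sign extension preserves negative values.
widenInt8To32 :: [Int8] -> [Int32]
widenInt8To32 = map fromIntegral

-- Models widenWord8To32Vec<m>#: zero extension, so 0xFF stays 255.
widenWord8To32 :: [Word8] -> [Word32]
widenWord8To32 = map fromIntegral

-- narrowInt32To8 [1, 300, -1] == [1, 44, -1]   (300 wraps to 300 - 256)
-- widenInt8To32  [-1, 127]    == [-1, 127]     (bit pattern 0xFF -> 0xFFFFFFFF)
-- widenWord8To32 [255, 1]     == [255, 1]      (bit pattern 0xFF -> 0x000000FF)
```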
     369
     370Floating point conversion:
     371{{{
     372narrowDoubleToFloatVec<m>#  :: DoubleVec<m># -> FloatVec<m>#
     373widenFloatToDoubleVec<m>#   :: FloatVec<m>#  -> DoubleVec<m>#
     374
     375roundFloatToInt32Vec<m>#     :: FloatVec<m>#  -> Int32Vec<m>#
     376roundFloatToInt64Vec<m>#     :: FloatVec<m>#  -> Int64Vec<m>#
     377roundDoubleToInt32Vec<m>#    :: DoubleVec<m># -> Int32Vec<m>#
     378roundDoubleToInt64Vec<m>#    :: DoubleVec<m># -> Int64Vec<m>#
     379
     380truncateFloatToInt32Vec<m>#  :: FloatVec<m>#  -> Int32Vec<m>#
     381truncateFloatToInt64Vec<m>#  :: FloatVec<m>#  -> Int64Vec<m>#
     382truncateDoubleToInt32Vec<m># :: DoubleVec<m># -> Int32Vec<m>#
     383truncateDoubleToInt64Vec<m># :: DoubleVec<m># -> Int64Vec<m>#
     384
     385promoteInt32ToFloatVec<m>#   :: Int32Vec<m># -> FloatVec<m>#
     386promoteInt64ToFloatVec<m>#   :: Int64Vec<m># -> FloatVec<m>#
     387promoteInt32ToDoubleVec<m>#  :: Int32Vec<m># -> DoubleVec<m>#
     388promoteInt64ToDoubleVec<m>#  :: Int64Vec<m># -> DoubleVec<m>#
     389}}}
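To make the distinction between the round and truncate families concrete, here is a lane-wise model over lists using existing `base` functions. The names are hypothetical and cover only a few of the combinations above. The rounding mode is an assumption: the sketch uses Haskell's `round`, which rounds to nearest with ties going to the even integer, whereas the signatures above do not say which mode is intended.

```haskell
import Data.Int (Int32)
import GHC.Float (double2Float, float2Double)

-- Models narrowDoubleToFloatVec<m>#.
narrowDoubleToFloatVec :: [Double] -> [Float]
narrowDoubleToFloatVec = map double2Float

-- Models widenFloatToDoubleVec<m>#; every Float is exactly representable as a Double.
widenFloatToDoubleVec :: [Float] -> [Double]
widenFloatToDoubleVec = map float2Double

-- Models roundFloatToInt32Vec<m>.
-- ASSUMPTION: round to nearest, ties to even (the behaviour of Haskell's round).
roundFloatToInt32Vec :: [Float] -> [Int32]
roundFloatToInt32Vec = map round

-- Models truncateFloatToInt32Vec<m>: drop the fractional part (round toward zero).
truncateFloatToInt32Vec :: [Float] -> [Int32]
truncateFloatToInt32Vec = map truncate

-- Models promoteInt32ToFloatVec<m>.
promoteInt32ToFloatVec :: [Int32] -> [Float]
promoteInt32ToFloatVec = map fromIntegral

-- roundFloatToInt32Vec    [2.5, 3.5, -2.5, 2.7] == [2, 4, -2, 3]
-- truncateFloatToInt32Vec [2.7, -2.7]           == [2, -2]
```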
     390
     391TODO: Should consider:
    332392 * vector constants, at least at Cmm level
    333393 * replicating a scalar to a vector
    334394 * FMA (fused multiply-add): this is supported by NEON and AVX, but a software fallback may not be possible with the same precision. Tricky.
    335  * AVX also supports a bunch of interesting things:
     395 * SSE/AVX also supports a bunch of interesting things:
     396   * add/sub/mul/div of vector by a scalar
     397   * reciprocal, square root, reciprocal of square root
    336398   * permute, shuffle, "blend", masked moves.
     399   * abs
    337400   * min, max within a vector
    338401   * average
    339402   * horizontal add/sub
    340403   * shift whole vector left/right by n bytes
     404   * and-not logical op
    341405   * gather (but not scatter) of 32-bit and 64-bit int and fp values from memory (base + vector of offsets)
    342406