Changes between Version 28 and Version 29 of SIMD


Timestamp:
Nov 14, 2011 2:02:00 PM (2 years ago)
Author:
duncan
Comment:

add array load/store and widen/narrow and fp conversion operations

  • SIMD

    v28 v29  
    274274== Vector operations == 
    275275 
    276 The following operations on vectors will be supported. They will need to be implemented at the Haskell/core primop layer, Cmm MachOp layer and optional support in the code generators. 
     276The following operations on vectors will be supported. They will need to be implemented at the Haskell/core primop layer and the Cmm `MachOp` layer, with optional support in the code generators. 
     277 
     278In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float`, `Double`. 
     279 
     280Loading and storing vectors in arrays, ByteArray# and raw Addr#: 
     281{{{ 
     282readInt<w>Vec<m>Array#  :: MutableByteArray# d -> Int# -> State# d -> (# State# d, Int<w>Vec<m>#  #) 
     283readWord<w>Vec<m>Array# :: MutableByteArray# d -> Int# -> State# d -> (# State# d, Word<w>Vec<m># #) 
     284readFloatVec<m>Array#   :: MutableByteArray# d -> Int# -> State# d -> (# State# d, FloatVec<m>#   #) 
     285readDoubleVec<m>Array#  :: MutableByteArray# d -> Int# -> State# d -> (# State# d, DoubleVec<m>#  #) 
     286 
     287writeInt<w>Vec<m>Array#  :: MutableByteArray# d -> Int# -> Int<w>Vec<m>#  -> State# d -> State# d 
     288writeWord<w>Vec<m>Array# :: MutableByteArray# d -> Int# -> Word<w>Vec<m># -> State# d -> State# d 
     289writeFloatVec<m>Array#   :: MutableByteArray# d -> Int# -> FloatVec<m>#   -> State# d -> State# d 
     290writeDoubleVec<m>Array#  :: MutableByteArray# d -> Int# -> DoubleVec<m>#  -> State# d -> State# d 
     291 
     292readInt<w>Vec<m>OffAddr#  :: Addr# -> Int# -> State# d -> (# State# d, Int<w>Vec<m>#  #) 
     293readWord<w>Vec<m>OffAddr# :: Addr# -> Int# -> State# d -> (# State# d, Word<w>Vec<m># #) 
     294readFloatVec<m>OffAddr#   :: Addr# -> Int# -> State# d -> (# State# d, FloatVec<m>#   #) 
     295readDoubleVec<m>OffAddr#  :: Addr# -> Int# -> State# d -> (# State# d, DoubleVec<m>#  #) 
     296 
     297writeInt<w>Vec<m>OffAddr#  :: Addr# -> Int# -> Int<w>Vec<m>#  -> State# d -> State# d 
     298writeWord<w>Vec<m>OffAddr# :: Addr# -> Int# -> Word<w>Vec<m># -> State# d -> State# d 
     299writeFloatVec<m>OffAddr#   :: Addr# -> Int# -> FloatVec<m>#   -> State# d -> State# d 
     300writeDoubleVec<m>OffAddr#  :: Addr# -> Int# -> DoubleVec<m>#  -> State# d -> State# d 
     301}}} 
    277302 
    278303Extracting and inserting vector elements: 
     
    280305extractInt<w>Vec<m>#   :: Int<w>Vec<m>#  -> Int# -> Int# 
    281306extractWord<w>Vec<m>#  :: Word<w>Vec<m># -> Int# -> Word# 
    282 extractFloatVec#       :: FloatVec<m>#   -> Int# -> Float# 
    283 extractDoubleVec#      :: DoubleVec<m>#  -> Int# -> Double# 
     307extractFloatVec<m>#    :: FloatVec<m>#   -> Int# -> Float# 
     308extractDoubleVec<m>#   :: DoubleVec<m>#  -> Int# -> Double# 
    284309}}} 
    285310{{{ 
     
    289314insertDoubleVec#      :: DoubleVec<m>#  -> Int# -> Double# -> DoubleVec<m># 
    290315}}} 
     316 
     317Duplicating a scalar to a vector: 
     318{{{ 
     319replicateToInt<w>Vec<m>#  :: Int<w>Vec<m>#  -> Int#    -> Int<w>Vec<m># 
     320replicateToWord<w>Vec<m># :: Word<w>Vec<m># -> Word#   -> Word<w>Vec<m># 
     321replicateToFloatVec#      :: FloatVec<m>#   -> Float#  -> FloatVec<m># 
     322replicateToDoubleVec#     :: DoubleVec<m>#  -> Double# -> DoubleVec<m># 
     323}}} 
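The element-level operations above (extract, insert, replicate) can be illustrated with a small reference model. This is only a sketch: it uses ordinary lists instead of the proposed unboxed vector types, the names `Vec`, `extractModel`, `insertModel` and `replicateModel` are hypothetical, and 0-based indexing with an error on out-of-range indices is an assumption, since the page does not specify either.

```haskell
-- Hypothetical list model of a fixed-length vector; the proposal itself
-- uses unboxed primitive types such as Int32Vec4#.
type Vec a = [a]

-- Read the element at position i (0-based in this model, an assumption).
extractModel :: Vec a -> Int -> a
extractModel v i
  | i >= 0 && i < length v = v !! i
  | otherwise = error "extractModel: index out of range"

-- Return a copy of v with position i replaced by x.
insertModel :: Vec a -> Int -> a -> Vec a
insertModel v i x
  | i >= 0 && i < length v = take i v ++ [x] ++ drop (i + 1) v
  | otherwise = error "insertModel: index out of range"

-- Build a vector of m copies of the scalar x.
replicateModel :: Int -> a -> Vec a
replicateModel m x = replicate m x
```

Loaded into GHCi, `insertModel (replicateModel 4 0) 2 9` evaluates to `[0,0,9,0]`, and extracting position 2 from that result gives `9`.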
     324 
    291325Vector shuffle: 
    292326{{{ 
    293 shuffleInt<w>Vec<m>ToVec<m'>  :: Int<w>Vec<m>#  -> Int32Vec<m'>#    -> Int<w>Vec<m'># 
     327shuffle<t>Vec<m>ToVec<m'> :: <t>Vec<m># -> Int32Vec<m'># -> <t>Vec<m'># 
    294328}}} 
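A list-based sketch may help pin down what the shuffle signature suggests: the result has as many elements as the index vector, and each result element is picked from the source by the corresponding index. That reading is an assumption (the page gives only the type), and `Vec` and `shuffleModel` are hypothetical names rather than proposed primops.

```haskell
import Data.Int (Int32)

-- Hypothetical list model of a vector; not the proposed unboxed types.
type Vec a = [a]

-- Element i of the result is the source element selected by element i
-- of the mask. Out-of-range indices are an error in this model; the
-- page does not say what the primop should do with them.
shuffleModel :: Vec a -> Vec Int32 -> Vec a
shuffleModel src mask = map pick mask
  where
    n = length src
    pick i
      | i >= 0 && fromIntegral i < n = src !! fromIntegral i
      | otherwise = error "shuffleModel: index out of range"
```

In this model, `shuffleModel "abcd" [3,3,0,1]` yields `"ddab"` (a permutation with duplication, `m = m'`), while `shuffleModel [10,20,30,40] [0,2]` yields `[10,30]` (a shorter result, `m' < m`).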
    295329For the fixed size vectors (not native size) we may also want to add pack/unpack functions like: 
     
    299333}}} 
    300334 
    301 In the following, `<t>` ranges over `Int<w>`, `Word<w>`, `Float`, `Double`. 
    302  
    303335Arithmetic operations: 
    304336{{{ 
     
    326358Note that LLVM does not yet support the comparison operations. 
    327359 
    328 TODO: 
    329  * conversion sign/width operations, e.g. Word <-> Int, Word8 <-> Word16 etc. 
    330  * conversion fp operations, e.g. Float <-> Int 
    331 Should also consider: 
     360Integer width narrow/widen operations: 
     361{{{ 
     362narrowInt<w>To<w'>Vec<m>#  :: Int<w>Vec<m># -> Int<w'>Vec<m>#     -- for w' < w 
     363narrowWord<w>To<w'>Vec<m># :: Word<w>Vec<m># -> Word<w'>Vec<m>#   -- for w' < w 
     364 
     365widenInt<w>To<w'>Vec<m>#   :: Int<w>Vec<m># -> Int<w'>Vec<m>#     -- for w' > w 
     366widenWord<w>To<w'>Vec<m>#  :: Word<w>Vec<m># -> Word<w'>Vec<m>#   -- for w' > w 
     367}}} 
     368Note: LLVM calls these truncate (`trunc`) and extend (`sext` for sign extension, `zext` for zero extension). 
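The per-element behaviour of narrowing and widening can be sketched with `fromIntegral` on the fixed-width types from `Data.Int` and `Data.Word`. The function names below are hypothetical instances of the schemes above (with `w = 16`, `w' = 8` and the reverse), and lists stand in for the vector types.

```haskell
import Data.Int (Int8, Int16)
import Data.Word (Word8, Word16)

-- Narrowing keeps the low-order bits of each element (LLVM: trunc).
narrowInt16To8 :: [Int16] -> [Int8]
narrowInt16To8 = map fromIntegral

-- Widening a signed element sign-extends it (LLVM: sext).
widenInt8To16 :: [Int8] -> [Int16]
widenInt8To16 = map fromIntegral

-- Widening an unsigned element zero-extends it (LLVM: zext).
widenWord8To16 :: [Word8] -> [Word16]
widenWord8To16 = map fromIntegral
```

For example, `narrowInt16To8 [300, -1, 127]` gives `[44, -1, 127]`, since 300 is `0x012C` and only the low byte `0x2C` survives. The bit pattern `0xFF` widens to `-1` as an `Int8` but to `255` as a `Word8`, which is why the signed and unsigned widen operations are distinct.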
     369 
     370Floating point conversion: 
     371{{{ 
     372narrowDoubleToFloatVec<m>#  :: DoubleVec<m># -> FloatVec<m># 
     373widenFloatToDoubleVec<m>#   :: FloatVec<m>#  -> DoubleVec<m># 
     374 
     375roundFloatToInt32Vec<m>     :: FloatVec<m>#  -> Int32Vec<m># 
     376roundFloatToInt64Vec<m>     :: FloatVec<m>#  -> Int64Vec<m># 
     377roundDoubleToInt32Vec<m>    :: DoubleVec<m># -> Int32Vec<m># 
     378roundDoubleToInt64Vec<m>    :: DoubleVec<m># -> Int64Vec<m># 
     379 
     380truncateFloatToInt32Vec<m>  :: FloatVec<m>#  -> Int32Vec<m># 
     381truncateFloatToInt64Vec<m>  :: FloatVec<m>#  -> Int64Vec<m># 
     382truncateDoubleToInt32Vec<m> :: DoubleVec<m># -> Int32Vec<m># 
     383truncateDoubleToInt64Vec<m> :: DoubleVec<m># -> Int64Vec<m># 
     384 
     385promoteInt32ToFloatVec<m>   :: Int32Vec<m># -> FloatVec<m># 
     386promoteInt64ToFloatVec<m>   :: Int64Vec<m># -> FloatVec<m># 
     387promoteInt32ToDoubleVec<m>  :: Int32Vec<m># -> DoubleVec<m># 
     388promoteInt64ToDoubleVec<m>  :: Int64Vec<m># -> DoubleVec<m># 
     389}}} 
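The difference between the round and truncate families, and the fact that integer-to-float promotion can lose precision, can be seen in a scalar list model. This is a sketch only: the names are hypothetical, and using the Prelude's `round` (which rounds halves to the nearest even integer) is an assumption, since the page does not state which rounding mode the primops would use.

```haskell
import Data.Int (Int32)

-- Round each element to the nearest integer. Prelude's round sends
-- halves to the even neighbour; the primop's mode is not specified.
roundFloatToInt32 :: [Float] -> [Int32]
roundFloatToInt32 = map round

-- Drop the fractional part of each element (round toward zero).
truncateFloatToInt32 :: [Float] -> [Int32]
truncateFloatToInt32 = map truncate

-- Convert each integer to the nearest representable Float.
promoteInt32ToFloat :: [Int32] -> [Float]
promoteInt32ToFloat = map fromIntegral
```

On the input `[2.5, -2.7, 3.5]` the round model gives `[2, -3, 4]` and the truncate model gives `[2, -2, 3]`. Promotion is not always exact: `promoteInt32ToFloat [16777217]` gives `[1.6777216e7]`, because a `Float` has only 24 bits of significand.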
     390 
     391TODO: Should consider: 
    332392 * vector constants, at least at Cmm level 
    333393 * replicating a scalar to a vector 
    334394 * FMA (fused multiply-add): this is supported by NEON and AVX, but a software fallback may not be possible with the same precision. Tricky. 
    335  * AVX also suppports a bunch of interesting things: 
     395 * SSE/AVX also support a bunch of interesting things: 
     396   * add/sub/mul/div of vector by a scalar 
     397   * reciprocal, square root, reciprocal of square root 
    336398   * permute, shuffle, "blend", masked moves. 
     399   * abs 
    337400   * min, max within a vector 
    338401   * average 
    339402   * horizontal add/sub 
    340403   * shift whole vector left/right by n bytes 
     404   * and not logical op 
    341405   * gather (but not scatter) of 32-bit and 64-bit int and fp from memory (base + vector of offsets) 
    342406
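The precision concern in the FMA item above can be made concrete. A fused multiply-add rounds once, after the addition, whereas a naive fallback of multiply followed by add rounds twice. The sketch below compares the naive fallback with a reference that computes exactly in `Rational` and rounds once. All names are hypothetical, and the `Rational` route only illustrates the expected result; it is far too slow to serve as a real fallback.

```haskell
-- Naive fallback: two roundings, one after the multiply, one after the add.
mulThenAdd :: Double -> Double -> Double -> Double
mulThenAdd a b c = a * b + c

-- Reference result: compute exactly in Rational, then round once.
fusedReference :: Double -> Double -> Double -> Double
fusedReference a b c =
  fromRational (toRational a * toRational b + toRational c)

-- Inputs chosen so that the low bits of a*a are lost by the
-- intermediate rounding: a*a = 1 + 2^-26 + 2^-54 exactly.
exampleA, exampleC :: Double
exampleA = 1 + 2 ^^ (-27 :: Int)
exampleC = negate (1 + 2 ^^ (-26 :: Int))
```

With IEEE double arithmetic and no excess intermediate precision, `mulThenAdd exampleA exampleA exampleC` is `0.0`, while `fusedReference exampleA exampleA exampleC` is `2 ^^ (-54)`, so the two approaches disagree on every bit of the result.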