wiki:SIMD

Version 1 (modified by vivian, 3 years ago) (diff)

created page

SIMD

This page is to collect notes about adding SIMD instructions to GHC.

A number of processors support SIMD (Single Instruction Multiple Data) operations. The X86 instructions are provided by SSE (streaming SIMD extension), SSE2, SSE3, SSE4.1, and SSE4.2. PowerPC has the Altivec extension.

SSE

SSE adds 8 (or 16) 128 bit registers (xmm registers) and operations over those packed data types. The data types are packed floats (4), doubles (2), and 32 (4) and 64 (2) bit integers. In addition to the usual mathematical and logical operations, there are also operations such as horizontal add, horizontal subtract, and dot product operations.

Implementation

PrimOps?

I have added the following primitive packed data types:

  • Int32Packed4#
  • Int64Packed2#
  • FloatPacked4#
  • DoublePacked2#

There are primOps for the standard operations on these datatypes (add, mul, and, or, etc). These are relatively straightforward to add. All that is required is that the code generating backends need to output the assembler mnemonics (e.g. ADDPD for "add packed double").

Conversion between packed types is also covered by the MachineOps in CmmExpr?.hs

There remain two types of operations which are not covered by the current machine ops:

  • instructions such as horizontal add which have no unpacked equivalent
  • packing/unpacking instructions

I have decided to hardcode the packing operations, e.g. unPack1of4Float#, in order to avoid the need to do bound checking. In Cmm this is translated to a MachOp: MO_Unpack Int. Another reason in favour of this method is that the specific instruction MO_Unpack n does not use n as a register value, rather, this translates to a specific immediate argument. PrimOps? can only take #-kinded arguments.

The PackFloat4# operation gets translated into multiple assembler lines (for each element being packed) and so I provide the MO_Pack Int machine operation which takes two arguments, the scalar value and the packed value and inserts the scalar value. The specific machine operations depend on the value of the Int argument to the machine operation.

Cmm

There was a suggestion to add CmmVector as a data constructor to CmmType, unfortunately, this abstract type is not used for literals. Thus I have added new Width data constructors

Width = ...
      | W8_16 | W16_8 | W32_4 | W64_2
      ...

STG/RTS

These new operations require a new register size (128 bit), both in Cmm and in the RTS. So I have added a PackedReg register type for both Int and Float data types. This requires adding a 128 bit type to STG as well.

Issues

  • Not all processors provide these machine operations on packed datatypes and not all x86 processors provide all the SSE extensions. I propose to add capability tests (sse, sse2 etc.) to System.Info for the programmer to be able to determine whether to use these packed primOps. Thus if Haskell code uses primOps which the CPU does not support, there will be a compile-time error.
  • It would be possible to add a Cmm pass that converts operations on packed datatypes to operations on a set of scalar types, however, it is cleaner and more efficient to not use packed datatypes on machines that do not provide those SIMD instructions.