= DArrays - Haskell Support for Array Computations [OUT OF DATE] =
'''This page has been superseded by the Repa paper in ICFP'10: [http://www.cse.unsw.edu.au/~chak/papers/KCLPL10.html]'''
The library provides a layer on top of DPH unlifted arrays to support multi-dimensional arrays, and shape polymorphic
operations for fast sequential and parallel execution. The interface for delayed arrays is similar, but in contrast
to operations on the former, any operation on a delayed array is only actually evaluated when elements are accessed outside the DArray framework.
The current implementation of the library exposes some implementation details the user shouldn't
have to worry about. Once the design of the library is finalised, most of these will be hidden by distinguishing
between internal types and representation types.
== Strict Arrays, Delayed Array and Shape Data Type ==
Both strict and delayed arrays are parametrised with their shape - that is, their dimensionality and size
of each dimension. The shape type class has the following interface:
=== DArrays ===
DArrays are collections of values of `primitive' type, which are
member of the class Data.Parallel.Array.Unlifted.Elt, which includes
all basic types like Float, Double, Bool, Char, Integer, and pairs
constructed with Data.Parallel.Array.Unlifted.(:*:), including ().
=== The Shape class ===
Values of class Shape serve two purposes: they can describe the dimensionality and
size of an array (in which case we refer to them as 'shape'), and they can also refer
to the position of a particular element in the array (in which case we refer to them
as an 'index'). It provides the following methods:
{{{
class (Show sh, U.Elt sh, Eq sh, Rebox sh) => Shape sh where
dim :: sh -> Int
-- ^number of dimensions (>= 0)
size :: sh -> Int
-- ^for a *shape* yield the total number of elements in that array
toIndex :: sh -> sh -> Int
-- ^corresponding index into a linear representation of
-- the array (first argument is the shape)
fromIndex:: sh -> Int -> sh
-- ^given index into linear row major representation, calculates
-- index into array
range :: sh -> U.Array sh
-- ^all the valid indices in a shape. The following equality should hold:
-- map (toIndex sh) (range sh) = [:0..(size sh)-1:]
inRange :: sh -> sh -> Bool
-- ^Checks if a given index is in the range of an array shape. I.e.,
-- inRange sh ind == elem ind (range sh)
zeroDim :: sh
-- ^shape of an array of size zero of the particular dimensionality
intersectDim :: sh -> sh -> sh
-- ^shape of an array of size zero of the particular dimensionality
next:: sh -> sh -> Maybe sh
-- ^shape of an array of size zero of the particular dimensionality
}}}
Note that a `Shape` has to be in the type class `Elt` imported from `Data.Parallel.Array.Unboxed` so
that it can be an element of `Data.Parallel.Array.Unboxed.Array`.
The following instances are defined
{{{
instance Shape ()
instance Shape sh => Shape (sh :*: Int)
}}}
so we have inductively defined n-tuples of `Int` values to represent shapes. This somewhat unusual representation
is necessary to be able to inductively define operations on `Shape`. It should, however, be hidden from the library
user in favour of the common tuple representation.
The multiparameter type class `Subshape sh sh'` contains all pairs of shapes `sh` and `sh'`, for which the dimensionality of `sh'` is
less than that of `sh`.
{{{
class (Shape sh, Shape sh') => Subshape sh sh' where
addDim :: sh -> sh' -> sh
modDim :: sh -> sh' -> sh
inject :: sh -> sh' -> sh
}}}
The method `addDim` adds the sizes of two shapes (or positions of two indices). If `sh'` is a strict subshape of
`sh`, the fields of `sh` are copied when no corresponding fields of `sh'` exist, accordingly for `modDim`
=== Representation, Order of Elements, and Lifted Values ===
As mentioned when introducing the functions `toIndex` and `range`, the following relationship should hold:
{{{
map (toIndex sh) (range sh) = [:0..(size sh)-1:]
}}}
this means that, for example
{{{
range (() :*: 2 :*: 3) =
[() :*: 0 :*: 0, () :*: 0 :*: 1, ....., () :*: 0 :*: 2, () :*: 1 :*: 0,....
}}}
== Operations on Arrays and Delayed Arrays ==
=== Array Creation and Conversion ===
Strict arrays are simply defined as record containing a flat data array and shape information:
{{{
data Array dim e where
Array:: { arrayShape :: dim -- ^extend of dimensions
, arrayData :: U.Array e -- flat parallel array
} -> Array dim e
deriving Show
toArray :: (U.Elt e, Shape dim) => dim -> U.Array e -> Array dim e
fromArray:: (U.Elt e, Shape dim) => Array dim e -> U.Array e
}}}
Delayed arrays, in contrast, in addition to the shape, only contain a function which, given an index,
yields the corresponding element.
{{{
data DArray dim e where
DArray :: {dArrayShape::dim -> dArrayFn:: (dim -> e)} -> DArray dim e
}}}
Delayed arrays can be converted to and from strict arrays:
{{{
toDArray:: (U.Elt e, Array.Shape dim) => Array.Array dim e -> DArray dim e
fromDArray:: (U.Elt e, Array.Shape dim) => DArray dim e -> Array dim e
}}}
the result of `toDArray` is a DArray which contains an indexing function into
an array. In general, the function `dArrayFn` can be much more complex. The function
`forceDArray` (should this be called `normalizeDArray`?) forces the evaluation `dArrayFn` on
every index of the range, and replaces `dArrayFn` by a simple indexing function into an array
of the result values.
{{{
forceDArray:: (U.Elt e, A.Shape dim) => DArray dim e -> DArray dim e
}}}
Singular (zero-dimensional) arrays are isomorphic to scalar values and can be converted to
one using the following function:
{{{
toScalar:: U.Elt e => DArray () e -> e
}}}
Note that in contrast to all the previous operations, `toScalar` requires the array to be of a particular
dimensionality.
== Collection Oriented Operations ==
=== Elementwise Application of Functions ===
The `map` operation takes a function over element types and applies it to every
data element of the DArray, which can have arbitrary dimensionality.
{{{
map:: (U.Elt a, U.Elt b, A.Shape dim) => (a -> b) -> DArray dim a -> DArray dim b
}}}
Similarily, `zip` and `zipWith` apply to every data element in the array as well. Both arguments
have to have the same dimensionality (which is enforced by the type system). If they have a different
shape, the result will have the intersection of both shapes. For example, zipping an array of shape
`(() :*: 4 :*: 6)` and `(() :*: 2 :*: 8)` results in an array of shape `(() :*: 2 :*: 6)`.
{{{
zipWith:: (U.Elt a, U.Elt b, U.Elt c, A.Shape dim) =>
(a -> b -> c) -> DArray dim a -> DArray dim b-> DArray dim c
zip:: (U.Elt a, U.Elt b, A.Shape dim) =>
DArray dim a -> DArray dim b-> DArray dim (a :*: b)
}}}
The function `fold` collapses the values of the innermost rows of an array of at least dimensionality 1.
{{{
fold :: (U.Elt e, A.Shape dim) =>
(e -> e-> e) -> e -> DArray (dim :*: Int) e -> DArray dim e
}}}
Again, it's not possible to use `fold` directly to collapse an array along any other axis, but, as
we will see shortly, this can be easily done using other functions in combination with `fold`.
Related to `fold`, we have `scan`:
{{{
scan :: (U.Elt e, A.Shape dim) =>
(e -> e-> e) -> e -> DArray dim e -> Array dim e
}}}
Note that `scan` returns a value of type `Array`, not `DArray`: this means that, if we apply scan to an array and access one of its elements, the whole array will be created.
=== Support for Parallel Execution ===
Since the implementation of DArrays builds on the DPH library, all the array operations can be executed in parallel. That is, we compiling a DArray program, the compiler generates a sequential as well as a parallel executable. All collective operations, like `map`, `fold`, and so on are executed in parallel.
=== Shape Polymorphic Computations on Arrays ===
The library provides a range of operation where the dimensionality of
the result depends on the dimensionality of the argument in a
non-trivial manner, which we want to be reflected in the type system.
Examples of such functions are generalised selection, which allows for
extraction of subarrays of arbitrary dimension, and generalised replicate,
which allows replication of an array in any dimension (or dimensions). For example,
given a three dimensional matrix, we can use select to extract scalar element values,
rows, columns, as well as two dimensional matrices along any of the three axes.
For selection, we can informally state the relationship between dimensionality of
the argument, the selector, and the result as follows:
{{{
select:: Array dim e ->