Changes between Version 11 and Version 12 of DataParallel/Regular


Ignore:
Timestamp:
Aug 18, 2009 4:59:21 AM (6 years ago)
Author:
gckeller
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • DataParallel/Regular

    v11 v12  
     1= Shape Polymorphic Arrays and Delayed Arrays =
    12
    2 = Regular, Multi-dimensional parallel Arrays =
    3 The nested parallel arrays of DPH could be used to model regular arrays, as we could simply either create segment information using replicate, or define regular arrays in terms of flat parallel arrays and separately stored dimensionality information, and define operations on these arrays in a library in terms of the nested operations. However, there are two main reasons why this is unsatisfactory: convenience and efficiency.
     3The library provides a layer on top of DPH unlifted arrays to support multi-dimensional arrays, and operations
     4like maps, folds, permutations, shifts and so on. The interface for delayed arrays is similar, but in contrast
     5to operations on the former, any operation on a delayed array is not evaluated. To force evaluation, the programmer
     6has to explicitely convert a delayed array to a strict array.
    47
    5 === Convenience ===
     8The current implementation of the library exposes some implementation details the user of the library shouldn't
     9have to worry about. Once the design of the library is finalised, most of these will be hidden by distinguishing
     10between internal types and representation types
    611
    7 Languages like SAC, which provide high-level support for operations on multi-dimensional arrays, offer shape invariant operations. If we want to model this on a library level, we either have to give up type safety to a
    8 large extend (for example, by encoding the shape as a list of integer values whose length is proportionate to its dimensionality) or use sophisticated language features like GADTs, which may impede the usability of the library for inexperienced users.
     12== Strict Arrays, Delayed Array and Shape Data Type ==
     13Both strict and delayed arrays are parametrised with their shape - that is, their dimensionality and size
     14of each dimension. The shape type class has the following interface:
     15
     16{{{
     17class (Show sh, U.Elt sh) => Shape sh where
     18  dim   :: sh -> Int           -- ^number of dimensions (>= 0)
     19  size  :: sh -> Int           -- ^for a *shape* yield the total number of
     20                               -- elements in that array
     21  index :: sh -> sh -> Int     -- ^corresponding index into a linear, row-major
     22                               -- representation of the array (first argument
     23                               -- is the shape)
     24  indexInv:: sh -> Int -> sh   -- ^given index into linear row major representation,
     25                               -- calculates index into array
     26  range      :: sh -> U.Array sh  -- ^all the valid indices in a shape. The following
     27                                  -- equality should hold:
     28                                  -- map (index sh) (range sh) = [:0..(size sh)-1:]
     29  inRange    :: sh -> sh -> Bool  -- ^determines if a given index is in range
     30  zeroDim    :: sh               
     31  addDim     :: sh -> sh -> sh    -- ^adds two shapes of the same dimensionality
     32  modDim     :: sh -> sh -> sh    -- ^modulo operation lifted on shapes
     33  addModDim  :: sh -> sh -> sh
     34
     35  last    :: (sh :*: Int) -> Int  -- ^yields the innermost dimension of a shape
     36  inits   :: (sh :*: Int) -> sh   -- ^removes the innermost dimension from a shape
     37}}}
     38
     39Note that a `Shape` has to be in the type class `Elt` imported from `Data.Parallel.Array.Unboxed` so
     40that it can be an element of  `Data.Parallel.Array.Unboxed.Array`.
     41
     42The following instances are defined
     43{{{
     44instance Shape ()
     45instance Shape sh => Shape (sh :*: Int)
     46}}}
     47so we have inductively defined n-tuples of `Int` values to represent shapes. This somewhat unusual representation
     48is necessary to be able to inductively define operations on `Shape`. It should, however, be hidden from the library
     49user in favour of the common tuple representation.
     50
     51For convenience, we provide type synonyms for dimensionality up to five:
     52{{{
     53type DIM0 = ()
     54type DIM1 = (DIM0 :*: Int)
     55....
     56}}}
    957
    1058
    11 === Efficiency ===
     59== Operations on Arrays and Delayed Arrays ==
    1260
    13 When encoding multidimensional arrays using segment descriptors or by storing the dimensions separately. In the first case, this would mean significant memory overhead proportionate to the number of subarrays on each level. But even in the second case, segment descriptors have to be generated to call functions like segmented fold and scan. It is hard to predict the exact overhead for this step, as fusion might prevent the segment descriptor array to be actually built in many cases. More significant in terms of overhead is that, when using segment descriptors, parallel splits become significantly more complicated, as they require communication in the irregular case to determine the distribution, whereas the distributions of a regularly segmented array can be determined locally on each processor.
     61=== Array Creation and Conversion ===
    1462
    15 == Language Support ==
     63Strict arrays are simply defined as record containing a flat data array and shape information:
     64{{{
     65data Array dim e where
     66  Array { arrayShape    :: dim                -- ^extend of dimensions
     67        , arrayData     :: U.Array e          -- flat parallel array
     68        }               :: Array dim e
     69  deriving Show
    1670
    17 The remainder of this document is a first design draft for SaC style language support of multidimensional arrays in the context of DPH. The implementation is not completed yet, and there are several open questions.
     71toArray  :: (U.Elt e, Shape dim) => dim -> U.Array e -> Array dim e
     72fromArray:: (U.Elt e, Shape dim) => Array dim e -> U.Array e
     73}}}
    1874
    19 == The regular array type ==
    20 
    21 === SaC ===
    22 
    23 In SaC, multidimensional arrays are represented by two vectors, the shape and the data vector, where vectors are one dimensional arrays. Scalar values are viewed as 0-dimensional arrays. The function `reshape` takes as first argument a shape vector, as second an array, and creates an array with identical data vector and the given shape vector. For example:
     75Delayed arrays, in contrast, in addition to the shape, only contain a function which, given an index,
     76yields the corresponding element.
    2477{{{
    25   reshape ([3,2],[1,2,3,4,5,6])
     78data DArray dim e where
     79  DArray :: {dArrayShape::dim -> dArrayFn:: (dim -> e)} -> DArray dim e
    2680}}}
    27 produces a 3 times 2 matrix.
    28 
    29 === DPH ===
    30 Regular parallel arrays are similar to arrays in SaC, with one major
    31 difference: SaC employs a mix of static and dynamic type checking, combined with a form of shape inference, whereas we use GHC's type checker to ensure certain domain restrictions are not violated. 
    32 
    33 '''Note:''' currently, we are only able to statically check that restrictions regarding the dimensionality of and array are met, but not with respect to the size. SaC is, to a certain extend, able to do so. I still need to check if there are some cases where the DPH approach would statically find some dimensionality bugs where SaC wouldn't - need to check that.
     81Delayed arrays can be converted to and from strict arrays:
     82(TODO: there needs to be an darray constructor function accepting the shape and the function as arguments)
     83{{{
     84toDArray:: (U.Elt e, Array.Shape dim)   => Array.Array dim e -> DArray dim e
     85fromDArray:: (U.Elt e, Array.Shape dim) => DArray dim e      -> Array dim e
     86}}}
    3487
    3588
    36 array operations in DPH are fully typed, and consequently, what
    37 is called 'shape invariant programming' in SaC works differently in DPH. In particular, in DPH the dimensionality of an array (not its size, however) are encoded in its type.
     89=== Shape Invariant Computations on Arrays ===
    3890
    39 An multidimensional array is parametrised with its dimensionality and its
    40 element type:
     91The library provides a range of operation where the dimensionality of
     92the result depends on the dimensionality of the argument in a
     93non-trivial matter, which we want to be reflected in the type system.
     94Examples of such functions are generalised selection, which allows for
     95extraction of subarrays of arbitrary dimension, and generalised replicate,
     96which allows replication of an array in any dimension (or dimensions).
     97
     98For selection, we can informally state the relationship between dimensionality of
     99the argument, the selector, and the result as follows:
     100{{{
     101select:: Array dim e -> <select dim' of dim array> -> Array dim' e
     102}}}
     103
     104To express this relationship, the library provides the index GADT,
     105which expresses a relationship between the inital and the projected
     106dimensionality. It is defined as follows:
    41107
    42108{{{
    43 (Shape dim, U.Elt a)  => Array dim a
    44 }}}
    45 The element type of multidimensional arrays is restricted to the type class `Elt` exported from ` Data.Array.Parallel.Unlifted`, and contains all primitive types like `Bool`, `Int`, `Float`, and pairs thereof constructed with the type constructor `:*:`, also exported from the same module. The elements of the type class `Shape` describe the shape of a multidimensional array, but also indices into
    46 an array ('''Note:''' so, is `Shape` really the right name? `Ix` however, also doesn't seem to be right, since it is too different from the`Ix` defined in the Prelude)
    47 {{{
    48 class U.Elt sh => Shape sh where
    49   dim   :: sh -> Int                 -- number of dimensions (>= 0)
    50   size  :: sh -> Int                  -- for a shape, yield the total number of
    51                                                  -- elements in that array
    52   index :: sh -> sh -> Int     -- corresponding index into a linear, row-major
    53                                                  -- representation of the array (first argument
    54                                                  -- is the shape)
     109data Index a initialDim projectedDim where
     110  IndexNil   :: Index a () ()
     111  IndexAll   :: (Shape init, Shape proj) =>     
     112                   Index a init proj -> Index a (init :*: Int) (proj :*: Int)
     113  IndexFixed :: (Shape init, Shape proj) => a ->
     114                   Index a init proj -> Index a (init :*: Int)  proj
    55115
    56   shLast:: (sh :*: Int) -> Int  --  given an at least one dimensional shape, returns the innermost
    57                                                  --  dimension   
    58   shInit:: (sh :*: Int) -> sh    --  returns the outermost dimensions
    59   range:: sh -> U.Array sh    -- all indexes for a given shape
     116type SelectIndex = Index Int
     117type MapIndex    = Index ()
    60118}}}
    61119
    62 with the following instances:
    63120
    64 {{{
    65 instance Shape ()
    66 instance (Shape sh1, U.Elt sh1) => Shape (sh1 :*: Int)
    67 }}}
    68 So, for example a two dimensional array of three vectors of the length five has the shape `(() :*: 5) :*: 3`. This is a suitable internal representation, but it should be hidden from the user, who should be provided with a more familiar notation, but for now, we will stick with the internal representation.
     121Note that the library provides no way to statically check the pre- and
     122postconditions on the actual size of arguments and results. This has
     123to be done at run time using assertions.
    69124
    70 We use the following type synonyms to improve the readability of the code:
    71 {{{
    72 type DIM0 = ()
    73 type DIM1 = DIM0 :*: Int
    74 type DIM2 = DIM1 :*: Int
    75 type DIM3 = DIM2 :*: Int
    76 }}}
    77 == Operations ==
    78 
    79 === Array Shapes ===
    80 The `shape` function returns the shape of an n-dimensional array as n-tuple:
    81 {{{
    82 shape  :: Array dim e ->  dim
    83 }}}
    84 For example
    85 {{{
    86 matrixMult:: (Elt e, Num e) => Array DIM2 -> Array DIM2 e -> Array DIM2 e
    87 matrixMult m1 m2| snd (shape m1) == fst (shape m2) = multiply ....
    88                 | otherwise                                             = error "Incompatible array sizes in matrixMult..."
    89 
    90 }}}
    91 
    92 {{{
    93 -- size of both shapes have to be the same, otherwise runtime error
    94 reshape     ::(Shape dim, Shape dim') =>
    95                     dim                             -- new shape
    96               -> Array dim' a              -- array to be reshaped
    97               -> Array dim a
    98 }}}
    99 
    100 === Creating Arrays ===
    101 
    102 A new array can be created from a flat parallel array
    103 {{{
    104 fromNArray:: U.Elt r => U.Array r -> Array DIM1 r
    105 }}}
    106  
    107 {{{
    108 fromScalar::  U.Elt r => r -> Array DIM0 r
    109 }}}
    110 ''
    111 and  bpermuteR, which creates a new array of new shape, using values of the argument array.
    112 {{{
    113 bpermuteR:: Array dim e -> Shape dim' -> (Shape dim' -> Shape dim) -> Array dim'
    114 }}}
    115 For example, transposition of a two dimensional array can be defined in terms of bpermuteR as follows:
    116 {{{
    117 transpose:: Array DIM2 a -> Array DIM2 a
    118 transpose arr = bpermuteR arr (m,n) (\(i,j) -> (j,i))
    119   where (n,m) = shape arr
    120 }}}
    121 Or cutting a 3 x 3 tile starting at indices (0,0) out of a two dimensional matrix:
    122 {{{
    123 tile:: Array DIM2 a -> Array DIM2 a
    124 tile arr = bpermuteR arr (3,3) id
    125 }}}
    126 
    127 '''SLPJ: Does the `Shape` stuff need to be exposed at this level. Could we not work just in terms of the `(Int,Int)` indices the programmer expects, and hide the shapery?'''
    128 
    129 === Manipulating array values ===
    130 
    131 All the shape invariant operations available on parallel arrays are also defined on regular arrays:
    132 {{{
    133 map     :: (Elt a, Elt b) => (a -> b) -> Array dim a -> Array dim b
    134 zipWith :: (Elt a, Elt b, Elt c) => (a -> b -> c) -> Array dim a -> Array dim b -> Array dim c
    135 }}}
    136 Parallel array operations which can change the shape are restricted to one dimensional arrays, to make sure that the
    137 result is still a regular array.
    138 {{{
    139 filter :: Elt a => (a -> Bool) -> Array DIM1 a -> Array DIM1 a
    140 scan        :: Elt a => ((a, a) -> a)            -- combination function
    141               -> a                               -- default value
    142               -> Array (dim, Int) a              -- linear array
    143               -> (Array dim a, Array (dim, Int) a)
    144 }}}
    145 '''SLPJ: didn't understand scan'''.  Manipulating the shape of arrays:
    146 '''SLPJ: why doesn't `reshape` need the size of the result array, as `bpermuteR` did.'''
    147 
    148 === Changing the dimensionality of an array ===
    149 
    150 
    151 
    152 
    153 ==== The index type ====
    154 
    155 Two operations we provide change the dimensionality of an argument
    156 array, namely the generalised indexing function, which extracts
    157 subarrays from an multidimensional array, and generalised replicate,
    158 which expands the array along specified axes. To encode the
    159 relationship between the argument's dimensionality and the result's dimensionality,
    160 we use the Index type:
    161 {{{
    162 (!) :: Elt e => Arr dim e -> Index dim dim' -> Arr dim' e
    163 
    164 replicate   :: Index dim' dim                     -- ^specifies new dimensions
    165               -> Array dim a                      -- ^data to be replicated
    166               -> Array dim' a
    167 
    168 }}}
    169 where Index is defined as
    170 {{{
    171 data Index initialDim projectedDim where
    172   IndexNil   :: Index () ()
    173   IndexAll   :: Index init proj -> Index (init, Int) (proj, Int)
    174   IndexFixed :: Int -> Index init proj -> Index (init, Int)  proj
    175 }}}
    176 
    177 The index type is similar to generalized  selection in SaC. For example, the selection vector
    178 {{{
    179   (1,2,3)
    180 }}}
    181 which indexes the fourth element in the third subarray of the second matrix in a three dimensional array would correspond to the index value
    182 {{{
    183  IndexFixed 1 (IndexFixed 2 (IndexFixed 3 IndexNil)) :: Index ((((),Int), Int),,Int) ()
    184 }}}
    185 Where the type tells us that the index value describes a projection from a three dimensional array to a scalar value. More interestingly,  the SaC selection
    186 {{{
    187  (1,.,3)
    188 }}}
    189 specifies a subvector (i.e., all the values in position (1,i,3) for all i's in the arrays range. This corresponds to
    190 {{{
    191 IndexFixed 1 (IndexAll (IndexFixed 3 IndexNil)):: Index ((((),Int), Int),,Int) ((),Int)
    192 }}}
    193 ==== Examples ====
    194 
    195 To demonstrate the use of the Index type, consider the following replicates expressed in terms of generalised replicate:
    196 {{{
    197 -- 'chunk replicate' - create a two dimensional array by replicating the one dimensional
    198 -- argument array n times
    199 replicateC:: Int -> Array DIM1 a -> Array DIM2 a
    200 replicateC n arr = RArray.replicate       (IndexFixed n (IndexAll IndexNil))  arr
    201 
    202 -- create a two dimensional array by replicating each element n times
    203 -- 'replicateLifted'
    204 replicateL:: Int -> Array DIM1 a -> Array DIM2 a
    205 replicateL n arr = RArray.replicate (IndexAll (IndexFixed n IndexNil))  arr
    206 
    207 -- 'chunkreplicate' on a two dimensional array
    208 --
    209 replicateC2:: Int -> Array DIM2 a -> Array DIM3 a
    210 replicateC2 n arr = RArray.replicate       (IndexFixed n (IndexAll (IndexAll IndexNil)))  arr
    211  
    212 
    213 replicateLL:: Int -> Array DIM2 a -> Array DIM3 a
    214 replicateLL n arr = RArray.replicate (IndexAll (IndexAll (IndexFixed n IndexNil)))  arr
    215  }}}
    216 
    217 In the actual array language (user level) the DPH should provide a general selection-like notation for index expressions.
    218 
    219 === Comparison with SaC ===
    220 
    221 We started out with the goal to provide support for SaC like array programming. In this section compares SaC's functionality with the approach described in this document.
    222