Changes between Version 11 and Version 12 of DataParallel/Regular


Ignore:
Timestamp:
Aug 18, 2009 4:59:21 AM (5 years ago)
Author:
gckeller
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • DataParallel/Regular

    v11 v12  
     1= Shape Polymorphic Arrays and Delayed Arrays = 
    12 
    2 = Regular, Multi-dimensional parallel Arrays = 
    3 The nested parallel arrays of DPH could be used to model regular arrays, as we could simply either create segment information using replicate, or define regular arrays in terms of flat parallel arrays and separately stored dimensionality information, and define operations on these arrays in a library in terms of the nested operations. However, there are two main reasons why this is unsatisfactory: convenience and efficiency.  
     3The library provides a layer on top of DPH unlifted arrays to support multi-dimensional arrays, and operations  
     4like maps, folds, permutations, shifts and so on. The interface for delayed arrays is similar, but in contrast  
     5to operations on the former, any operation on a delayed array is not evaluated. To force evaluation, the programmer 
     6has to explicitely convert a delayed array to a strict array.  
    47 
    5 === Convenience === 
     8The current implementation of the library exposes some implementation details the user of the library shouldn't  
     9have to worry about. Once the design of the library is finalised, most of these will be hidden by distinguishing  
     10between internal types and representation types 
    611 
    7 Languages like SAC, which provide high-level support for operations on multi-dimensional arrays, offer shape invariant operations. If we want to model this on a library level, we either have to give up type safety to a 
    8 large extend (for example, by encoding the shape as a list of integer values whose length is proportionate to its dimensionality) or use sophisticated language features like GADTs, which may impede the usability of the library for inexperienced users. 
     12== Strict Arrays, Delayed Array and Shape Data Type == 
     13Both strict and delayed arrays are parametrised with their shape - that is, their dimensionality and size  
     14of each dimension. The shape type class has the following interface: 
     15 
     16{{{ 
     17class (Show sh, U.Elt sh) => Shape sh where 
     18  dim   :: sh -> Int           -- ^number of dimensions (>= 0) 
     19  size  :: sh -> Int           -- ^for a *shape* yield the total number of  
     20                               -- elements in that array 
     21  index :: sh -> sh -> Int     -- ^corresponding index into a linear, row-major  
     22                               -- representation of the array (first argument 
     23                               -- is the shape) 
     24  indexInv:: sh -> Int -> sh   -- ^given index into linear row major representation, 
     25                               -- calculates index into array 
     26  range      :: sh -> U.Array sh  -- ^all the valid indices in a shape. The following 
     27                                  -- equality should hold:  
     28                                  -- map (index sh) (range sh) = [:0..(size sh)-1:] 
     29  inRange    :: sh -> sh -> Bool  -- ^determines if a given index is in range 
     30  zeroDim    :: sh                 
     31  addDim     :: sh -> sh -> sh    -- ^adds two shapes of the same dimensionality 
     32  modDim     :: sh -> sh -> sh    -- ^modulo operation lifted on shapes 
     33  addModDim  :: sh -> sh -> sh 
     34 
     35  last    :: (sh :*: Int) -> Int  -- ^yields the innermost dimension of a shape 
     36  inits   :: (sh :*: Int) -> sh   -- ^removes the innermost dimension from a shape 
     37}}} 
     38 
     39Note that a `Shape` has to be in the type class `Elt` imported from `Data.Parallel.Array.Unboxed` so  
     40that it can be an element of  `Data.Parallel.Array.Unboxed.Array`. 
     41 
     42The following instances are defined 
     43{{{ 
     44instance Shape ()  
     45instance Shape sh => Shape (sh :*: Int)  
     46}}} 
     47so we have inductively defined n-tuples of `Int` values to represent shapes. This somewhat unusual representation 
     48is necessary to be able to inductively define operations on `Shape`. It should, however, be hidden from the library 
     49user in favour of the common tuple representation. 
     50 
     51For convenience, we provide type synonyms for dimensionality up to five: 
     52{{{ 
     53type DIM0 = () 
     54type DIM1 = (DIM0 :*: Int) 
     55.... 
     56}}} 
    957 
    1058 
    11 === Efficiency === 
     59== Operations on Arrays and Delayed Arrays == 
    1260 
    13 When encoding multidimensional arrays using segment descriptors or by storing the dimensions separately. In the first case, this would mean significant memory overhead proportionate to the number of subarrays on each level. But even in the second case, segment descriptors have to be generated to call functions like segmented fold and scan. It is hard to predict the exact overhead for this step, as fusion might prevent the segment descriptor array to be actually built in many cases. More significant in terms of overhead is that, when using segment descriptors, parallel splits become significantly more complicated, as they require communication in the irregular case to determine the distribution, whereas the distributions of a regularly segmented array can be determined locally on each processor. 
     61=== Array Creation and Conversion === 
    1462 
    15 == Language Support == 
     63Strict arrays are simply defined as record containing a flat data array and shape information: 
     64{{{ 
     65data Array dim e where 
     66  Array { arrayShape    :: dim                -- ^extend of dimensions 
     67        , arrayData     :: U.Array e          -- flat parallel array 
     68        }               :: Array dim e 
     69  deriving Show 
    1670 
    17 The remainder of this document is a first design draft for SaC style language support of multidimensional arrays in the context of DPH. The implementation is not completed yet, and there are several open questions. 
     71toArray  :: (U.Elt e, Shape dim) => dim -> U.Array e -> Array dim e 
     72fromArray:: (U.Elt e, Shape dim) => Array dim e -> U.Array e  
     73}}} 
    1874 
    19 == The regular array type == 
    20  
    21 === SaC === 
    22  
    23 In SaC, multidimensional arrays are represented by two vectors, the shape and the data vector, where vectors are one dimensional arrays. Scalar values are viewed as 0-dimensional arrays. The function `reshape` takes as first argument a shape vector, as second an array, and creates an array with identical data vector and the given shape vector. For example: 
     75Delayed arrays, in contrast, in addition to the shape, only contain a function which, given an index, 
     76yields the corresponding element. 
    2477{{{ 
    25   reshape ([3,2],[1,2,3,4,5,6]) 
     78data DArray dim e where  
     79  DArray :: {dArrayShape::dim -> dArrayFn:: (dim -> e)} -> DArray dim e 
    2680}}} 
    27 produces a 3 times 2 matrix. 
    28  
    29 === DPH ===  
    30 Regular parallel arrays are similar to arrays in SaC, with one major 
    31 difference: SaC employs a mix of static and dynamic type checking, combined with a form of shape inference, whereas we use GHC's type checker to ensure certain domain restrictions are not violated.   
    32  
    33 '''Note:''' currently, we are only able to statically check that restrictions regarding the dimensionality of and array are met, but not with respect to the size. SaC is, to a certain extend, able to do so. I still need to check if there are some cases where the DPH approach would statically find some dimensionality bugs where SaC wouldn't - need to check that.  
     81Delayed arrays can be converted to and from strict arrays: 
     82(TODO: there needs to be an darray constructor function accepting the shape and the function as arguments) 
     83{{{ 
     84toDArray:: (U.Elt e, Array.Shape dim)   => Array.Array dim e -> DArray dim e 
     85fromDArray:: (U.Elt e, Array.Shape dim) => DArray dim e      -> Array dim e 
     86}}} 
    3487 
    3588 
    36 array operations in DPH are fully typed, and consequently, what 
    37 is called 'shape invariant programming' in SaC works differently in DPH. In particular, in DPH the dimensionality of an array (not its size, however) are encoded in its type. 
     89=== Shape Invariant Computations on Arrays === 
    3890 
    39 An multidimensional array is parametrised with its dimensionality and its 
    40 element type: 
     91The library provides a range of operation where the dimensionality of 
     92the result depends on the dimensionality of the argument in a 
     93non-trivial matter, which we want to be reflected in the type system.  
     94Examples of such functions are generalised selection, which allows for  
     95extraction of subarrays of arbitrary dimension, and generalised replicate, 
     96which allows replication of an array in any dimension (or dimensions).  
     97 
     98For selection, we can informally state the relationship between dimensionality of 
     99the argument, the selector, and the result as follows: 
     100{{{ 
     101select:: Array dim e -> <select dim' of dim array> -> Array dim' e 
     102}}} 
     103 
     104To express this relationship, the library provides the index GADT, 
     105which expresses a relationship between the inital and the projected 
     106dimensionality. It is defined as follows: 
    41107 
    42108{{{ 
    43 (Shape dim, U.Elt a)  => Array dim a  
    44 }}} 
    45 The element type of multidimensional arrays is restricted to the type class `Elt` exported from ` Data.Array.Parallel.Unlifted`, and contains all primitive types like `Bool`, `Int`, `Float`, and pairs thereof constructed with the type constructor `:*:`, also exported from the same module. The elements of the type class `Shape` describe the shape of a multidimensional array, but also indices into 
    46 an array ('''Note:''' so, is `Shape` really the right name? `Ix` however, also doesn't seem to be right, since it is too different from the`Ix` defined in the Prelude)  
    47 {{{ 
    48 class U.Elt sh => Shape sh where 
    49   dim   :: sh -> Int                 -- number of dimensions (>= 0) 
    50   size  :: sh -> Int                  -- for a shape, yield the total number of  
    51                                                  -- elements in that array 
    52   index :: sh -> sh -> Int     -- corresponding index into a linear, row-major  
    53                                                  -- representation of the array (first argument 
    54                                                  -- is the shape) 
     109data Index a initialDim projectedDim where 
     110  IndexNil   :: Index a () () 
     111  IndexAll   :: (Shape init, Shape proj) =>       
     112                   Index a init proj -> Index a (init :*: Int) (proj :*: Int) 
     113  IndexFixed :: (Shape init, Shape proj) => a ->  
     114                   Index a init proj -> Index a (init :*: Int)  proj 
    55115 
    56   shLast:: (sh :*: Int) -> Int  --  given an at least one dimensional shape, returns the innermost  
    57                                                  --  dimension    
    58   shInit:: (sh :*: Int) -> sh    --  returns the outermost dimensions 
    59   range:: sh -> U.Array sh    -- all indexes for a given shape 
     116type SelectIndex = Index Int 
     117type MapIndex    = Index () 
    60118}}} 
    61119 
    62 with the following instances: 
    63120 
    64 {{{ 
    65 instance Shape () 
    66 instance (Shape sh1, U.Elt sh1) => Shape (sh1 :*: Int)  
    67 }}} 
    68 So, for example a two dimensional array of three vectors of the length five has the shape `(() :*: 5) :*: 3`. This is a suitable internal representation, but it should be hidden from the user, who should be provided with a more familiar notation, but for now, we will stick with the internal representation. 
     121Note that the library provides no way to statically check the pre- and 
     122postconditions on the actual size of arguments and results. This has 
     123to be done at run time using assertions. 
    69124 
    70 We use the following type synonyms to improve the readability of the code: 
    71 {{{ 
    72 type DIM0 = () 
    73 type DIM1 = DIM0 :*: Int 
    74 type DIM2 = DIM1 :*: Int 
    75 type DIM3 = DIM2 :*: Int 
    76 }}} 
    77 == Operations == 
    78  
    79 === Array Shapes === 
    80 The `shape` function returns the shape of an n-dimensional array as n-tuple: 
    81 {{{ 
    82 shape  :: Array dim e ->  dim 
    83 }}} 
    84 For example 
    85 {{{ 
    86 matrixMult:: (Elt e, Num e) => Array DIM2 -> Array DIM2 e -> Array DIM2 e  
    87 matrixMult m1 m2| snd (shape m1) == fst (shape m2) = multiply .... 
    88                 | otherwise                                             = error "Incompatible array sizes in matrixMult..." 
    89  
    90 }}} 
    91  
    92 {{{ 
    93 -- size of both shapes have to be the same, otherwise runtime error 
    94 reshape     ::(Shape dim, Shape dim') => 
    95                     dim                             -- new shape 
    96               -> Array dim' a              -- array to be reshaped 
    97               -> Array dim a 
    98 }}} 
    99  
    100 === Creating Arrays === 
    101  
    102 A new array can be created from a flat parallel array 
    103 {{{ 
    104 fromNArray:: U.Elt r => U.Array r -> Array DIM1 r 
    105 }}} 
    106   
    107 {{{ 
    108 fromScalar::  U.Elt r => r -> Array DIM0 r 
    109 }}} 
    110 '' 
    111 and  bpermuteR, which creates a new array of new shape, using values of the argument array. 
    112 {{{ 
    113 bpermuteR:: Array dim e -> Shape dim' -> (Shape dim' -> Shape dim) -> Array dim' 
    114 }}} 
    115 For example, transposition of a two dimensional array can be defined in terms of bpermuteR as follows: 
    116 {{{ 
    117 transpose:: Array DIM2 a -> Array DIM2 a 
    118 transpose arr = bpermuteR arr (m,n) (\(i,j) -> (j,i)) 
    119   where (n,m) = shape arr 
    120 }}} 
    121 Or cutting a 3 x 3 tile starting at indices (0,0) out of a two dimensional matrix: 
    122 {{{ 
    123 tile:: Array DIM2 a -> Array DIM2 a 
    124 tile arr = bpermuteR arr (3,3) id 
    125 }}} 
    126  
    127 '''SLPJ: Does the `Shape` stuff need to be exposed at this level. Could we not work just in terms of the `(Int,Int)` indices the programmer expects, and hide the shapery?''' 
    128  
    129 === Manipulating array values === 
    130  
    131 All the shape invariant operations available on parallel arrays are also defined on regular arrays: 
    132 {{{ 
    133 map     :: (Elt a, Elt b) => (a -> b) -> Array dim a -> Array dim b 
    134 zipWith :: (Elt a, Elt b, Elt c) => (a -> b -> c) -> Array dim a -> Array dim b -> Array dim c 
    135 }}} 
    136 Parallel array operations which can change the shape are restricted to one dimensional arrays, to make sure that the  
    137 result is still a regular array.  
    138 {{{ 
    139 filter :: Elt a => (a -> Bool) -> Array DIM1 a -> Array DIM1 a 
    140 scan        :: Elt a => ((a, a) -> a)            -- combination function 
    141               -> a                               -- default value 
    142               -> Array (dim, Int) a              -- linear array 
    143               -> (Array dim a, Array (dim, Int) a) 
    144 }}} 
    145 '''SLPJ: didn't understand scan'''.  Manipulating the shape of arrays: 
    146 '''SLPJ: why doesn't `reshape` need the size of the result array, as `bpermuteR` did.''' 
    147  
    148 === Changing the dimensionality of an array === 
    149  
    150  
    151  
    152  
    153 ==== The index type ==== 
    154  
    155 Two operations we provide change the dimensionality of an argument 
    156 array, namely the generalised indexing function, which extracts 
    157 subarrays from an multidimensional array, and generalised replicate, 
    158 which expands the array along specified axes. To encode the 
    159 relationship between the argument's dimensionality and the result's dimensionality,  
    160 we use the Index type: 
    161 {{{ 
    162 (!) :: Elt e => Arr dim e -> Index dim dim' -> Arr dim' e 
    163  
    164 replicate   :: Index dim' dim                     -- ^specifies new dimensions 
    165               -> Array dim a                      -- ^data to be replicated 
    166               -> Array dim' a 
    167  
    168 }}} 
    169 where Index is defined as 
    170 {{{ 
    171 data Index initialDim projectedDim where 
    172   IndexNil   :: Index () () 
    173   IndexAll   :: Index init proj -> Index (init, Int) (proj, Int) 
    174   IndexFixed :: Int -> Index init proj -> Index (init, Int)  proj 
    175 }}} 
    176  
    177 The index type is similar to generalized  selection in SaC. For example, the selection vector  
    178 {{{ 
    179   (1,2,3) 
    180 }}} 
    181 which indexes the fourth element in the third subarray of the second matrix in a three dimensional array would correspond to the index value 
    182 {{{ 
    183  IndexFixed 1 (IndexFixed 2 (IndexFixed 3 IndexNil)) :: Index ((((),Int), Int),,Int) () 
    184 }}} 
    185 Where the type tells us that the index value describes a projection from a three dimensional array to a scalar value. More interestingly,  the SaC selection 
    186 {{{ 
    187  (1,.,3) 
    188 }}} 
    189 specifies a subvector (i.e., all the values in position (1,i,3) for all i's in the arrays range. This corresponds to 
    190 {{{ 
    191 IndexFixed 1 (IndexAll (IndexFixed 3 IndexNil)):: Index ((((),Int), Int),,Int) ((),Int) 
    192 }}} 
    193 ==== Examples ==== 
    194  
    195 To demonstrate the use of the Index type, consider the following replicates expressed in terms of generalised replicate: 
    196 {{{ 
    197 -- 'chunk replicate' - create a two dimensional array by replicating the one dimensional  
    198 -- argument array n times 
    199 replicateC:: Int -> Array DIM1 a -> Array DIM2 a 
    200 replicateC n arr = RArray.replicate       (IndexFixed n (IndexAll IndexNil))  arr 
    201  
    202 -- create a two dimensional array by replicating each element n times 
    203 -- 'replicateLifted' 
    204 replicateL:: Int -> Array DIM1 a -> Array DIM2 a 
    205 replicateL n arr = RArray.replicate (IndexAll (IndexFixed n IndexNil))  arr 
    206  
    207 -- 'chunkreplicate' on a two dimensional array 
    208 -- 
    209 replicateC2:: Int -> Array DIM2 a -> Array DIM3 a 
    210 replicateC2 n arr = RArray.replicate       (IndexFixed n (IndexAll (IndexAll IndexNil)))  arr 
    211   
    212  
    213 replicateLL:: Int -> Array DIM2 a -> Array DIM3 a 
    214 replicateLL n arr = RArray.replicate (IndexAll (IndexAll (IndexFixed n IndexNil)))  arr 
    215  }}} 
    216  
    217 In the actual array language (user level) the DPH should provide a general selection-like notation for index expressions. 
    218  
    219 === Comparison with SaC === 
    220  
    221 We started out with the goal to provide support for SaC like array programming. In this section compares SaC's functionality with the approach described in this document. 
    222