313 | | |
| 314 | == Example 1: Matrix multiplication == |
| 315 | As a simple example, consider matrix-matrix multiplication. We can either implement it by directly manipulating the array |
| 316 | function, or use the operations provided by the DArray library. Let as start with the former, which is more fairly similar to |
| 317 | what we would write using loops over array indices: |
| 318 | {{{ |
| 319 | mmMult1:: |
| 320 | DArray (() :*: Int :*: Int) Double -> DArray (() :*: Int :*: Int) Double -> DArray (() :*: Int :*: Int) Double |
| 321 | mmMult1 arr1@(DArray (() :*: m1 :*: n1) _) arr2@(DArray (() :*: m2 :*: n2) _) = |
| 322 | mapFold (+) 0 arrDP |
| 323 | where |
| 324 | arrDP = DArray (():*: m1 :*: n2 :*:n1) |
| 325 | (\(() :*: i :*: j :*: k) -> (index arr1 (() :*: i :*: k)) * (index arr2 (() :*: k :*: j))) |
| 326 | }}} |
| 327 | In the first step, we create the intermediate three dimensional array which contains the products of all |
| 328 | sums and rows, and in the second step, we collapse each of the rows to it's sum, to obtain the two dimensional |
| 329 | result matrix. It is important to note that the elements of `arrDP` are never all in memory (otherwise, the memory |
| 330 | consumption would be cubic), but each value is consumed immediately by `mapFold`. |
| 331 | |
| 332 | This implementation suffers from the same problem a corresponding C implementation would - since we access one |
| 333 | array row-major, the other column major, the locality is poor. Therefore, first transposing `arr2` and adjusting the |
| 334 | access will actually improve the performance by approximately 40%: |
| 335 | {{{ |
| 336 | mmMult1:: |
| 337 | DArray (() :*: Int :*: Int) Double -> DArray (() :*: Int :*: Int) Double -> DArray (() :*: Int :*: Int) Double |
| 338 | mmMult1 arr1@(DArray (() :*: m1 :*: n1) _) arr2@(DArray (() :*: m2 :*: n2) _) = |
| 339 | mapFold (+) 0 arrDP |
| 340 | where |
| 341 | arr2T = forceDArray $ transpose arr2 |
| 342 | arrDP = DArray (():*: m1 :*: n2 :*:n1) |
| 343 | (\(() :*: i :*: j :*: k) -> (index arr1 (() :*: i :*: k)) * (index arr2T (() :*: j:*: k))) |
| 344 | }}} |
| 345 | However, we do need to force the actual creation of the transposed array, otherwise, the change would have no effect at all. We therefore |
| 346 | use `forceDArray`, which converts it into an array whose array function is a simple indexing operation (see description of `forceDArray` above). This means that the second version requires more memory, but this is offset by improving the locality for each of the multiplications. |
| 347 | |
| 348 | {{{ |
| 349 | -- mmMult:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) => |
| 350 | -- DArray (dim :*: Int :*: Int) Double -> DArray (dim :*: Int :*: Int) Double -> DArray (dim :*: Int :*: Int) Double |
| 351 | mmMult:: |
| 352 | DArray (() :*: Int :*: Int) Double -> DArray (() :*: Int :*: Int) Double -> DArray (() :*: Int :*: Int) Double |
| 353 | mmMult arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) = |
| 354 | assert ((m1 == n2) && (sh == sh')) $ |
| 355 | mapFold (+) 0 (arr1Ext * arr2Ext) |
| 356 | -- 'fold' doesn't fuse at the moment, so mapFold is significantly faster |
| 357 | -- fold (+) 0 $ zipWith (*) arr1Ext arr2Ext |
| 358 | where |
| 359 | arr2T = forceDArray $ transpose arr2 -- forces evaluation of 'transpose' |
| 360 | arr1Ext = replicate arr1 (Array.IndexAll (Array.IndexFixed m2 (Array.IndexAll Array.IndexNil))) |
| 361 | arr2Ext = replicate arr2T |
| 362 | (Array.IndexAll (Array.IndexAll (Array.IndexFixed n1 Array.IndexNil))) |
| 363 | |
| 364 | }}} |