Changes between Version 25 and Version 26 of DataParallel/Regular
 Timestamp:
 Jan 20, 2010 2:32:58 AM (9 years ago)

DataParallel/Regular
 v25  v26
 356  356    This implementation suffers from the same problem a corresponding C implementation would - since we access one
 357  357    array row-major, the other column major, the locality is poor. Therefore, first transposing `arr2` and adjusting the
 358       - access will actually improve the performance by approximately 40%:
      358  + access will actually improve the performance significantly:
 359  359    {{{
 360  360    mmMult1::
   …    …
 371  371
 372  372
 373       - {{{
 374       - mmMult:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) =>
      373  + An alternative way to define matrix-matrix multiplication is in terms of the collective library functions provided. First, we
      374  + expand both arrays and, in case of `arr2` transpose it such that the elements which have to be multiplied match up. Then,
      375  + we calculate the products using `zipWith`, and then use `fold` to compute the sums:
      376  + {{{
      377  + mmMult2:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) =>
 375  378      DArray (dim :*: Int :*: Int) Double -> DArray (dim :*: Int :*: Int) Double -> DArray (dim :*: Int :*: Int) Double
 376       - mmMult arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) =
      379  + mmMult2 arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) =
 377  380      fold (+) 0 (arr1Ext * arr2Ext)
 378  381      where
 379       -     arr2T = forceDArray $ transpose arr2 -- forces evaluation of 'transpose'
      382  +     arr2T = forceDArray $ transpose arr2
 380  383        arr1Ext = replicate arr1 (Array.IndexAll (Array.IndexFixed m2 (Array.IndexAll Array.IndexNil)))
 381  384        arr2Ext = replicate arr2T
   …    …
 383  386
 384  387    }}}
      388  + In this implementation, `transpose` is necessary to place the elements at the right position for `zipWith`, and we call `forceDArray` for
      389  + the same reason as in the previous implementation, to improve locality.
      390  +
 385  391
 386  392
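
The strategy described in the added paragraph for `mmMult2` (expand both operands, transpose the second, multiply element-wise with `zipWith`, then reduce with a fold) can be illustrated on plain Haskell lists. The following is only a minimal sketch under stated assumptions: it does not use the `DArray` library at all, and the names `Matrix`, `mmMultDirect` and `mmMultCollective` are hypothetical, introduced here for illustration. It models the data flow only, not the locality or parallelism behaviour discussed on the page.

```haskell
import Data.List (transpose)

-- Hypothetical stand-in for a two-dimensional array: a list of rows.
-- This is NOT the DArray type from the page.
type Matrix = [[Double]]

-- Direct formulation: transpose the second operand once, so that every
-- result element is the dot product of a row of 'a' and a row of 'bT'.
mmMultDirect :: Matrix -> Matrix -> Matrix
mmMultDirect a b = [ [ sum (zipWith (*) row col) | col <- bT ] | row <- a ]
  where
    bT = transpose b

-- Collective formulation: expand both operands to a common three-level
-- shape, multiply element-wise, then fold the innermost level.
mmMultCollective :: Matrix -> Matrix -> Matrix
mmMultCollective a b = map (map (foldr (+) 0)) products
  where
    bT       = transpose b
    m        = length a              -- number of rows of 'a'
    n        = length bT             -- number of columns of 'b'
    aExt     = map (replicate n) a   -- aExt !! i !! j is row i of 'a'
    bExt     = replicate m bT        -- bExt !! i !! j is column j of 'b'
    products = zipWith (zipWith (zipWith (*))) aExt bExt
```

Both functions assume the inner dimensions match (the row length of `a` equals the number of rows of `b`); neither checks this. In the list version, `transpose` is what lines up the elements that have to be multiplied, which mirrors the role the page assigns to `transpose` in `mmMult2`.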