Changes between Version 25 and Version 26 of DataParallel/Regular


Timestamp: Jan 20, 2010 2:32:58 AM
Author: gckeller
Comment: --

  • DataParallel/Regular

    v25 v26  
    356356This implementation suffers from the same problem a corresponding C implementation would: since we access one
    357357array in row-major order and the other in column-major order, locality is poor. Therefore, first transposing `arr2` and adjusting the
    358 access will actually improve the performance by approximately 40%:
     358access will actually improve the performance significantly:
    359359{{{
    360360mmMult1::
     
    371371
    372372
    373 {{{
    374 mmMult:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) =>
     373An alternative way to define matrix-matrix multiplication is in terms of the collective library functions provided. First, we
     374expand both arrays and, in the case of `arr2`, transpose it so that the elements that have to be multiplied line up. Then
     375we calculate the products using `zipWith` and use `fold` to compute the sums:
     376{{{
     377mmMult2:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) =>
    375378    DArray (dim :*: Int :*: Int)  Double -> DArray (dim :*: Int :*: Int)  Double -> DArray (dim :*: Int :*: Int)  Double 
    376 mmMult arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) =
     379mmMult2 arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) =
    377380   fold (+) 0 (arr1Ext * arr2Ext)
    378381 where
    379     arr2T   = forceDArray $ transpose arr2  -- forces evaluation of 'transpose'
     382    arr2T   = forceDArray $ transpose arr2 
    380383    arr1Ext = replicate arr1 (Array.IndexAll (Array.IndexFixed m2 (Array.IndexAll Array.IndexNil)))
    381384    arr2Ext = replicate arr2T
     
    383386
    384387}}}
     388In this implementation, `transpose` is necessary to place the elements at the right position for `zipWith`, and we call `forceDArray` for
     389the same reason as in the previous implementation, to improve locality.
     390
    385391
    386392
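
The expand, multiply, and fold steps described for `mmMult2` can be sketched on plain Haskell lists. This is only an illustration of the idea, not the `DArray` code from the page: the names `Matrix` and `mmMultList` are hypothetical, lists stand in for `DArray`, and `Data.List.transpose` plays the role of the page's `transpose`. There is no counterpart to `forceDArray` here, so the sketch says nothing about locality.

```haskell
import Data.List (transpose)

-- Hypothetical stand-in for a two-dimensional DArray: a list of rows.
type Matrix = [[Double]]

-- Multiply an m x n matrix by an n x p matrix in three steps:
-- expand both operands to m x p x n, multiply elementwise, then sum
-- over the innermost dimension.
mmMultList :: Matrix -> Matrix -> Matrix
mmMultList a b = map (map (foldr (+) 0)) products
  where
    -- Columns of b become rows, so matching elements share an index.
    bT       = transpose b
    -- Row i of a is repeated once for every column of b.
    aExt     = [ replicate (length bT) row | row <- a ]
    -- The transposed b is repeated once for every row of a.
    bExt     = replicate (length a) bT
    -- products !! i !! j !! k == a !! i !! k * b !! k !! j
    products = zipWith (zipWith (zipWith (*))) aExt bExt
```

For example, `mmMultList [[1,2],[3,4]] [[5,6],[7,8]]` evaluates to `[[19.0,22.0],[43.0,50.0]]`. Without the transpose, the elementwise product would pair `a !! i !! k` with `b !! j !! k` and so compute the product with the transpose of `b` instead.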