Changes between Version 25 and Version 26 of DataParallel/Regular


Timestamp:
Jan 20, 2010 2:32:58 AM (4 years ago)
Author:
gckeller
Comment:

--

  • DataParallel/Regular

    v25 v26  
    356356This implementation suffers from the same problem a corresponding C implementation would - since we access one 
    357357array row-major, the other column-major, the locality is poor. Therefore, first transposing `arr2` and adjusting the 
    358 access will actually improve the performance by approximately 40%:  
     358access will actually improve the performance significantly:  
    359359{{{ 
    360360mmMult1::  
     
    371371 
    372372 
    373 {{{ 
    374 mmMult:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) =>  
     373An alternative way to define matrix-matrix multiplication is in terms of the collective library functions provided. First, we 
     374expand both arrays and, in the case of `arr2`, transpose it so that the elements which have to be multiplied match up. Then, 
     375we calculate the products using `zipWith` and use `fold` to compute the sums: 
     376{{{ 
     377mmMult2:: (Array.RepFun dim, Array.InitShape dim, Array.Shape dim) =>  
    375378    DArray (dim :*: Int :*: Int)  Double -> DArray (dim :*: Int :*: Int)  Double -> DArray (dim :*: Int :*: Int)  Double   
    376 mmMult arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) =  
     379mmMult2 arr1@(DArray (sh :*: m1 :*: n1) fn1) arr2@(DArray (sh' :*: m2 :*: n2) fn2) =  
    377380   fold (+) 0 (arr1Ext * arr2Ext) 
    378381 where 
    379     arr2T   = forceDArray $ transpose arr2  -- forces evaluation of 'transpose' 
     382    arr2T   = forceDArray $ transpose arr2   
    380383    arr1Ext = replicate arr1 (Array.IndexAll (Array.IndexFixed m2 (Array.IndexAll Array.IndexNil))) 
    381384    arr2Ext = replicate arr2T 
     
    383386 
    384387}}} 
     388In this implementation, `transpose` is necessary to place the elements at the right position for `zipWith`, and we call `forceDArray` for 
     389the same reason as in the previous implementation, to improve locality.  
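
The expand / multiply / fold structure described above can be tried out without the `DArray` library. The following is only a minimal sketch on plain nested lists, assuming rectangular, dimension-compatible inputs; `mmMultSketch`, `aExt` and `bExt` are hypothetical names chosen for illustration and are not part of the library, and list comprehensions stand in for `replicate`. It says nothing about locality or performance, it only shows how the elements line up.

```haskell
import Data.List (transpose)

-- A matrix as a list of rows (assumption: all rows have equal length).
type Matrix = [[Double]]

-- Matrix multiplication in three steps, mirroring the description above:
-- expand both operands, multiply element-wise, then sum the innermost axis.
mmMultSketch :: Matrix -> Matrix -> Matrix
mmMultSketch a b = map (map sum) products
  where
    -- Rows of bT are the columns of b.
    bT       = transpose b
    -- Expand a: every row of a is repeated once per column of b.
    aExt     = [ [ row | _ <- bT ] | row <- a ]
    -- Expand b: the full list of columns is repeated once per row of a.
    bExt     = [ bT | _ <- a ]
    -- Element-wise products of the two expanded three-dimensional structures.
    products = zipWith (zipWith (zipWith (*))) aExt bExt
```

Loaded into GHCi, `mmMultSketch [[1,2],[3,4]] [[5,6],[7,8]]` evaluates to `[[19.0,22.0],[43.0,50.0]]`. Entry (i, j) of the result is the sum over the innermost list at position (i, j) of `products`, which is the role `fold (+) 0` plays in `mmMult2`.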
     390 
    385391 
    386392