Changes between Version 43 and Version 44 of DataParallel/Regular


Timestamp:
Jan 20, 2010 1:25:08 PM (4 years ago)
Author:
gckeller
Comment:

--

  • DataParallel/Regular

    v43 v44  
    419 419 === Performance of Matrix-Matrix Multiplication ===
    420 420
    421     The following table contains the running times of `mmMult1` and `mmMult2`, applied to two matrices with `size * size` elements. As mentioned before, `mmMult2` is faster than `mmMult1`, as `replicate` can be implemented more efficiently than the general permutation that results from the element-wise index computation in `mmMult1`. This is the case for most problems: if it is possible to use collection-oriented operations, then doing so will lead to more efficient code. We can also see that using `forceDArray` for improved locality has a big impact on performance (we have O(size*size*size) memory accesses, and creating the transposed matrix has only a memory overhead of O(size*size)). `mmMult1` without the
        421 The following table contains the running times in milliseconds of `mmMult1` and `mmMult2`, applied to two matrices with `size * size` elements. As mentioned before, `mmMult2` is faster than `mmMult1`, as `replicate` can be implemented more efficiently than the general permutation that results from the element-wise index computation in `mmMult1`. This is the case for most problems: if it is possible to use collection-oriented operations, then doing so will lead to more efficient code. We can also see that using `forceDArray` for improved locality has a big impact on performance (we have O(size*size*size) memory accesses, and creating the transposed matrix has only a memory overhead of O(size*size)). `mmMult1` without the
    422 422 transposed matrix is about as fast as `mmMult2` without `forceDArray` (times omitted). We can also see that the speedup on two processors is close to the optimal speedup of 2.
    423 423