Changes between Version 37 and Version 38 of DataParallel/Regular


Timestamp:
Jan 20, 2010 12:36:31 PM (5 years ago)
Author:
gckeller
Comment:

--

  • DataParallel/Regular

    v37 v38  
     1=DArrays - Haskell Support for Array Computations = 
    12 
    23The library provides a layer on top of DPH unlifted arrays to support multi-dimensional arrays, and shape polymorphic  
     
    421422 
    422423=== Performance of Matrix-Matrix Multiplication === 
    423    
    424 We measured the performance of the two matrix multiplication implementations and compared their 
    425 performance to C. Both matrices contain (size * size) elements. As we can see, the first version is significantly slower. 
     424 
     425The following table contains the running times of `mmMult1` and `mmMult2`, applied to two matrices with `size * size` elements each. As mentioned before, `mmMult2` is faster than `mmMult1`, as `replicate` can be implemented more efficiently than the general permutation which results from the element-wise index computation in `mmMult1`. This is the case for most problems: if it is possible to use collection-oriented operations, then doing so will lead to more efficient code. We can also see that using `forceDArray` for improved locality has a big impact on performance (we have O(size*size*size) memory accesses, and creating the transposed matrix has a memory overhead of only O(size*size)). `mmMult1` without the 
     426transposed matrix is about as fast as `mmMult2` without `forceDArray` (times omitted). We can also see that the speedup on two processors is close to the optimal speedup of 2. 
     427 
     428To get an idea of the absolute performance of DArrays, we compared it to two C implementations. The first (handwritten) is a straightforward C implementation with three nested loops, with the iterations re-arranged to get better performance, which has a similar effect on performance to the `forceDArray`/`transpose` step. The second implementation uses the matrix-matrix multiplication operation provided by the MacOS Accelerate library. We can see that, for reasonably large arrays, DArrays is about a factor of 3 slower than the C implementation if run sequentially. 
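
The effect of re-arranging the loop iterations can be illustrated with a minimal C sketch. This is not the benchmark code behind the measurements. The function names `mm_ijk` and `mm_ikj`, the row-major layout and the choice of the i-k-j order are all assumptions made for illustration, since the text does not say which re-arrangement was used. In the naive order the inner loop walks down a column of `b`, whereas in the re-arranged order it walks along rows of `b` and `c`, which is the same locality gain that the `forceDArray`/`transpose` step provides.

```c
#include <stddef.h>

/* Hypothetical baseline: naive i-j-k order on row-major n x n matrices.
 * The inner loop reads b with stride n (one element per row), which is
 * cache-unfriendly for large n. */
void mm_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical re-arranged version: i-k-j order (an assumed choice).
 * The inner loop now reads b and updates c with stride 1. */
void mm_ikj(size_t n, const double *a, const double *b, double *c)
{
    /* c is accumulated into, so it must start at zero. */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Both functions compute the same product and perform the same O(size*size*size) multiply-adds; only the order of the memory accesses differs.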
     429 
    426430{{{ 
    427431  ----------------------------------------------------------------------