Changes between Version 1 and Version 2 of DataParallel/Benchmarks


Timestamp:
Mar 19, 2007 10:10:21 AM
Author:
chak
Comment:

--

  • DataParallel/Benchmarks

    v1 v2  
    3 3 === Sparse matrix vector multiplication === 
    4 4 
    5   This benchmark is explained in much detail in [http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html Data Parallel Haskell: a status report].   
      5 This benchmark is explained in much detail in [http://www.cse.unsw.edu.au/~chak/papers/CLPKM07.html Data Parallel Haskell: a status report].  Runtimes compared to sequential C code on Intel Xeon (x86) and Sun SunFire9600 (Sparc) are in [attachment:time-colour.png].  The parallel Haskell code is more efficient than the sequential C code from 2 PEs on the !SunFire and from 4 PEs on the Xeon processors.  We blame the low sequential performance on the Xeon on the lack of effort that has been put into generating good straight-line code (both in the NCG and when compiling via C); this includes inadequate register allocation and a lack of low-level optimisations. 
     6 
      7 The speedups for the Xeon box and the !SunFire are in [attachment:speedup-colour.png], and the speedup for our 8x dualcore Opteron NUMA box is in [attachment:serenity-all-speedup-colour.png].  The speedup on the NUMA machine is limited by memory bandwidth for smvm.  When we use only one core per CPU, the benchmark scales much better.  Moreover, the memory traffic/compute ratio is slightly more favourable when processing arrays of `Float`s than when processing arrays of `Double`s.
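
For readers without the paper to hand, the computation being benchmarked (smvm) can be sketched with ordinary Haskell lists.  This is a sequential illustration only, not the Data Parallel Haskell code that was measured.  The names `SparseRow`, `SparseMatrix` and `smvm`, and the representation of each row as (column index, value) pairs, are assumptions made for this sketch.

```haskell
-- Sequential, list-based sketch of sparse matrix vector multiplication.
-- Assumption: each row stores only its non-zero entries as
-- (column index, value) pairs, with 0-based column indices.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

-- For every row, sum the products of each non-zero entry with the
-- matching element of the dense vector.  List indexing with (!!) is
-- linear time, so this is for illustration, not for benchmarking.
smvm :: SparseMatrix -> [Double] -> [Double]
smvm m v = [ sum [ x * (v !! col) | (col, x) <- row ] | row <- m ]
```

Each output element reads one value and one index per non-zero entry plus one vector element, and performs a single multiply-add, which is why the benchmark is so sensitive to memory bandwidth.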