Sparse matrix vector multiplication
This benchmark is explained in much detail in Data Parallel Haskell: a status report. Runtimes compared to sequential C code on an Intel Xeon (x86) and a Sun SunFire9600 (Sparc) are in time-colour.png. The parallel Haskell code is more efficient than the C code from 2 PEs onwards on the SunFire and from 4 PEs onwards on the Xeon. We attribute the low sequential performance on the Xeon to the lack of effort that has so far gone into generating good straight-line code (both in the NCG and when compiling via C); this includes inadequate register allocation and a lack of low-level optimisations.
The speedups for the Xeon box and the SunFire are in speedup-colour.png, and the speedup for our 8x dualcore Opteron NUMA box is in serenity-all-speedup-colour.png. The speedup on the NUMA machine is limited by the memory bandwidth for smvm. When we use only one core per CPU, the benchmark scales much better. Moreover, the memory traffic/compute ratio is slightly more favourable when processing arrays of Floats than when processing arrays of Doubles.
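To make the memory traffic/compute ratio concrete, here is a minimal sequential Haskell sketch of smvm over a row-wise sparse representation (each row stored as index/value pairs for its non-zeros). The names `SparseMatrix`, `SparseRow`, and `smvm` are illustrative only; the benchmarked code is written with Data Parallel Haskell's parallel arrays, not plain lists.

```haskell
-- Each row keeps only its non-zero elements as (column index, value) pairs.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

-- Sparse matrix * dense vector: for every row, sum the products of the
-- stored non-zeros with the matching elements of the dense vector.
-- Note the low arithmetic intensity: one multiply-add per element loaded,
-- which is why smvm is memory-bandwidth bound.
smvm :: SparseMatrix -> [Double] -> [Double]
smvm m v = [ sum [ x * (v !! i) | (i, x) <- row ] | row <- m ]

main :: IO ()
main = print (smvm [[(0, 1), (2, 2)], [(1, 3)]] [10, 20, 30])
-- row 0: 1*10 + 2*30 = 70;  row 1: 3*20 = 60
```

The DPH version expresses the same computation with nested parallel arrays, which vectorisation flattens into segmented operations.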
speedup-colour.png (6.0 KB) - added 10 years ago.
Speedup of smvm on 2x dualcore Xeon and SunFire9600
time-colour.png (6.9 KB) - added 10 years ago.
Runtime of smvm on 2x dualcore Xeon and SunFire9600, including sequential C reference implementation
serenity-all-speedup-colour.png (7.7 KB) - added 10 years ago.
Speedup of smvm on 8x dualcore Opteron for Float and Double, including the use of one core/CPU and two cores/CPU