Version 5 (modified by chak, 5 years ago)
Status of DPH Benchmarks
This page gives an overview of how well the benchmarks in the examples/ directory of package dph are currently working.
Overview of the benchmark programs
- DotP: computes the dot product of two vectors of Doubles. There are two variants of this program: (1) "primitives" is coded directly against the array primitives from package dph, and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.
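To make the difference between the two DotP variants concrete, here is a rough plain-Haskell analogy. It is not the benchmark's code and does not use DPH: `Data.Array.Unboxed` from the standard `array` package stands in for the dph array primitives, and `toUArray`, `dotpPrim` and `dotpHigh` are hypothetical names chosen for illustration.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Hypothetical helper: build a zero-indexed unboxed array from a list.
toUArray :: [Double] -> UArray Int Double
toUArray xs = listArray (0, length xs - 1) xs

-- Analogy for the "primitives" variant: an explicit, strict loop written
-- directly against low-level array operations.
-- Assumes both arrays have identical bounds.
dotpPrim :: UArray Int Double -> UArray Int Double -> Double
dotpPrim xs ys = go lo 0
  where
    (lo, hi) = bounds xs
    go :: Int -> Double -> Double
    go i acc
      | i > hi    = acc
      | otherwise =
          let acc' = acc + xs ! i * ys ! i
          in acc' `seq` go (i + 1) acc'

-- Analogy for the source of the "vectorised" variant: a high-level
-- comprehension. In DPH the same idea is written over parallel arrays,
-- roughly  sumP [: x * y | x <- xs | y <- ys :]  and the vectoriser
-- compiles it down to code over the array primitives.
dotpHigh :: [Double] -> [Double] -> Double
dotpHigh xs ys = sum [x * y | (x, y) <- zip xs ys]
```

Loaded into GHCi, `dotpPrim (toUArray [1, 2, 3]) (toUArray [4, 5, 6])` and `dotpHigh [1, 2, 3] [4, 5, 6]` should agree. The point of the vectoriser is that programmers write the second form and get code resembling the first.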
Execution on LimitingFactor (2x Quad-Core Xeon)
Hardware spec: 2x 3.0GHz Quad-Core Intel Xeon 5400; 12MB (2x6MB) on-die L2 cache per processor; independent 1.6GHz frontside bus per processor; 800MHz DDR2; 256-bit-wide memory architecture; Mac OS X Server 10.5.6
|Program||Problem size||sequential||1 core||2 cores||4 cores||8 cores|
|DotP, primitives||10M elements||823/823/824|| || || || |
|DotP, vectorised||10M elements||823/824/824|| || || || |
All results are in milliseconds; each triple reports the best/average/worst execution time (wall clock) over three runs. The column marked "sequential" reports times when linked against dph-seq, and the columns marked "N cores" report times when linked against dph-par and run in parallel on the specified number of processor cores.
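The reporting convention above can be sketched as a small harness. This is a hedged illustration, not the scripts actually used for these measurements: `timeMs`, `summarise`, `formatTriple` and `speedup` are hypothetical names, and the monotonic clock comes from `GHC.Clock` in recent versions of base. The speedup helper simply divides a reference time by a parallel time; which baseline the table's speedups use (the sequential column or the 1-core column) is not stated here.

```haskell
import Data.List (intercalate)
import GHC.Clock (getMonotonicTimeNSec)

-- Wall-clock time of one run of an IO action, in milliseconds. The action
-- must force its own result (e.g. with Control.Exception.evaluate) and must
-- not reuse an already-evaluated thunk, or later runs will measure nothing.
timeMs :: IO a -> IO Double
timeMs act = do
  t0 <- getMonotonicTimeNSec
  _ <- act
  t1 <- getMonotonicTimeNSec
  pure (fromIntegral (t1 - t0) / 1e6)

-- Best / average / worst of a non-empty list of run times.
summarise :: [Double] -> (Double, Double, Double)
summarise ts = (minimum ts, sum ts / fromIntegral (length ts), maximum ts)

-- Render a triple as "best/average/worst", rounded to whole milliseconds.
formatTriple :: (Double, Double, Double) -> String
formatTriple (b, a, w) =
  intercalate "/" [show (round t :: Integer) | t <- [b, a, w]]

-- Speedup of a parallel run relative to a chosen reference time.
speedup :: Double -> Double -> Double
speedup tRef tPar = tRef / tPar
```

For example, three runs taking 824, 823 and 823 ms summarise to "823/823/824", the format used in the table. A full measurement would combine these as `formatTriple . summarise <$> replicateM 3 (timeMs run)` with `replicateM` from `Control.Monad`.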
|Program||Sequential (manually vectorised)||Vectorised||Parallel|
|DotP||Order of magnitude faster than list implementation||Same performance as sequential||Speedup of 2 on 2 CPUs with 4 threads|
|QuickSort||Slower than list version (lack of fusion)||Slower than sequential (why?)||Speedup of 1.4 on 2 CPUs|
|SparseVector||Similar to DotP|
|Primes (NESL)||15x faster than list version||Not yet implemented||20x slower than sequential (fusion?)|
|BarnesHut||Small bug in algorithm||Working||See sequential|
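The Primes (NESL) row refers to a nested data-parallel prime sieve. Assuming it follows the well-known NESL primes example (recursively sieve up to the square root, build each prime's multiples as a nested sequence, flatten, and clear flags), the algorithm looks roughly like this in plain Haskell. Ordinary lists and `Data.Array` stand in for DPH's nested parallel arrays; `isqrt` and `primesBelow` are hypothetical names, and this is not the benchmark source.

```haskell
import Data.Array (accumArray, assocs)

-- Integer square root via Double; adequate for the moderate n of a sketch.
isqrt :: Int -> Int
isqrt m = floor (sqrt (fromIntegral m :: Double))

-- Primes strictly below n, in the NESL style.
primesBelow :: Int -> [Int]
primesBelow n
  | n <= 2    = []
  | otherwise = [i | (i, True) <- assocs flags, i >= 2]
  where
    -- Recursive step: all primes <= isqrt n (enough to sieve below n).
    sqrPrimes  = primesBelow (isqrt n + 1)
    -- Nested collection: one sequence of multiples per small prime.
    composites = [[2 * p, 3 * p .. n - 1] | p <- sqrPrimes]
    -- Flattening turns the nested sequences into one flat sequence.
    flatComps  = concat composites
    -- Start with every flag True and write False at each composite index.
    flags      = accumArray (\_ new -> new) True (0, n - 1)
                   [(c, False) | c <- flatComps]
```

In GHCi, `primesBelow 30` should give the ten primes below 30. In a DPH version, the multiples of all small primes would be generated in parallel and the flag update would be a single bulk parallel write, which is where flattening and fusion matter for performance.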
- I only ran a first set of benchmarks while checking what is there. I'll run the benchmarks properly as the next step.
- Fusion doesn't work well on parallel programs yet, so for all but simple examples, the parallel program performs worse than the sequential one.
- The compiler doesn't exploit all fusion opportunities for QuickSort and BarnesHut. Once this is fixed, they should run considerably faster.
- Interestingly, the automatically vectorised version of QuickSort is quite a bit faster than the hand-flattened one. Need to find out why.
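Several of the notes above attribute slowdowns to missing fusion. As a hedged illustration of what fusion removes, here are two list-based dot products; `dotpUnfused` and `dotpFused` are hypothetical names. GHC's own list fusion may already optimise the first form for lists; the point is the shape of code that array fusion aims to derive automatically from a pipeline of bulk operations.

```haskell
import Data.List (foldl')

-- Unfused pipeline: zipWith produces an intermediate collection of products
-- that foldl' then consumes. On arrays, each such stage would allocate and
-- fill a temporary array of length n.
dotpUnfused :: [Double] -> [Double] -> Double
dotpUnfused xs ys = foldl' (+) 0 (zipWith (*) xs ys)

-- Hand-fused version: producer (pairwise multiply) and consumer (sum) merged
-- into one loop with a strict accumulator and no intermediate collection.
-- Stops at the shorter input, just like zipWith.
dotpFused :: [Double] -> [Double] -> Double
dotpFused = go 0
  where
    go acc (x : xs) (y : ys) = let acc' = acc + x * y in acc' `seq` go acc' xs ys
    go acc _ _               = acc
```

Both versions add the products left to right starting from 0, so they return identical results; only the intermediate allocation differs.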