
Status of DPH Benchmarks
This page gives an overview of how well the benchmarks in the dphexamples/ directory of package dph are currently working.
Overview of the benchmark programs
 SumSq
 Computes the sum of the squares from 1 to N using Int. There are two variants of this program: (1) "primitives" is coded directly against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser. As a reference implementation, we have a sequential C program denoted by "ref C".
 DotP
 Computes the dot product of two vectors of Doubles. There are two variants of this program: (1) "primitives" is coded directly against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser. In addition to these two DPH variants of the dot product, we also have two non-DPH reference implementations: (a) "ref Haskell" is a Haskell program using imperative, unboxed arrays and (b) "ref C" is a C implementation using pthreads.
 SMVM
 Multiplies a dense vector with a sparse matrix represented in the compressed sparse row format (CSR). There are two variants of this program: (1) "primitives" is coded directly against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser. As a reference implementation, we have a sequential C program denoted by "ref C".
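
 To make the three kernels above (SumSq, DotP, SMVM) concrete, here is a minimal sequential sketch using plain Haskell lists. It is not the code from package dph: the names sumSq, dotp, smvm, SparseRow and SparseMatrix are chosen here for illustration, and the nested list of (column, value) pairs is only a simple stand-in for a real CSR layout with flat arrays.

```haskell
-- Sequential, list-based sketches of the three kernels.
-- Illustrative only; none of these names come from package dph.
import Data.List (foldl')

-- SumSq: sum of the squares from 1 to n, using Int.
sumSq :: Int -> Int
sumSq n = foldl' (+) 0 [x * x | x <- [1 .. n]]

-- DotP: dot product of two vectors of Doubles.
-- zipWith stops at the end of the shorter vector.
dotp :: [Double] -> [Double] -> Double
dotp xs ys = foldl' (+) 0 (zipWith (*) xs ys)

-- Assumed sparse representation: one list per matrix row, holding
-- (column index, value) pairs for the non-zero entries of that row.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

-- SMVM: for each row, sum value * vec[column] over the non-zero entries.
-- List indexing with (!!) is linear time; acceptable for a sketch only.
smvm :: SparseMatrix -> [Double] -> [Double]
smvm rows vec =
  [ foldl' (+) 0 [v * (vec !! c) | (c, v) <- row] | row <- rows ]
```

 For example, sumSq 3 adds 1 + 4 + 9, and smvm applied to a matrix with an empty row yields 0 for that row. Each output element of smvm depends only on its own row, which is what makes the kernel a natural fit for nested data parallelism.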
 Quickhull
 Given a set of points in a plane, computes the convex hull, that is, the sequence of points that encloses all points in the set. This benchmark is interesting as it is the simplest code that exploits the ability to implement divide-and-conquer algorithms with nested data parallelism. We have only a "vectorised" version of this benchmark and a sequential Haskell reference implementation, "ref Haskell", using vanilla lists.
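
 The divide-and-conquer structure of Quickhull can be sketched on vanilla lists as follows. This is an illustrative sketch, not the benchmark's "ref Haskell" source: the names Point, Line, distance, hsplit and quickhull are assumptions made here, and the code assumes the input contains at least two points with distinct x coordinates.

```haskell
-- List-based Quickhull sketch; all names are illustrative.
import Data.List (maximumBy, minimumBy)
import Data.Ord (comparing)

type Point = (Double, Double)
type Line  = (Point, Point)

-- Cross product of (p1 - p) and (p2 - p): positive when p lies to the
-- left of the directed line p1 -> p2, and larger the farther away it is.
distance :: Point -> Line -> Double
distance (xo, yo) ((x1, y1), (x2, y2)) =
  (x1 - xo) * (y2 - yo) - (y1 - yo) * (x2 - xo)

-- Hull points strictly to the left of the line, starting with p1.
-- The two recursive calls are independent of each other, which is the
-- nested parallelism the benchmark is meant to exercise.
hsplit :: [Point] -> Line -> [Point]
hsplit points line@(p1, p2)
  | length packed < 2 = p1 : packed
  | otherwise         = hsplit packed (p1, pm) ++ hsplit packed (pm, p2)
  where
    cross  = [distance p line | p <- points]
    packed = [p | (p, c) <- zip points cross, c > 0]
    -- The point farthest from the line is guaranteed to be on the hull.
    pm     = snd (maximumBy (comparing fst) (zip cross points))

-- Split at the leftmost and rightmost points, then handle both sides.
quickhull :: [Point] -> [Point]
quickhull []     = []
quickhull points = hsplit points (minx, maxx) ++ hsplit points (maxx, minx)
  where
    minx = minimumBy (comparing fst) points
    maxx = maximumBy (comparing fst) points
```

 Applied to the points (0,0), (2,0), (1,1), (1,-1) and (1,0.25), the sketch returns the four outer points and drops the interior point (1,0.25).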
 Primes

The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of
Bool
s. We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute. The problem is that we need to make the representation of parallel arrays ofBool
dependent on whether the hardware supports atomic writes of bytes. Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (akaWord8
).  Quicksort
 FIXME
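
 For the Primes entry above, the idea of bulk writes into a Bool sieve can be sketched sequentially as follows. This is only an illustration under stated assumptions: it uses accumArray from Data.Array as a sequential stand-in for the parallel bulk write, a boxed Array Int Bool rather than the byte-level representation discussed above, and the names primesUpTo and isqrt are made up here.

```haskell
-- Sequential sieve sketch; accumArray stands in for a bulk parallel write.
import Data.Array (Array, accumArray, assocs)

-- Primes up to and including n. The primes up to sqrt n are computed
-- recursively; all of their multiples are then struck out in one update.
primesUpTo :: Int -> [Int]
primesUpTo n
  | n < 2     = []
  | otherwise = [i | (i, True) <- assocs sieve]
  where
    smallPrimes = primesUpTo (isqrt n)
    -- Every composite m <= n has a prime factor p with p * p <= m.
    composites  = [m | p <- smallPrimes, m <- [p * p, p * p + p .. n]]
    -- Start with every entry True, then write False at each composite.
    sieve :: Array Int Bool
    sieve = accumArray (\_ flag -> flag) True (2, n)
                       [(m, False) | m <- composites]

-- Integer square root via Double; adequate for small n in a sketch.
isqrt :: Int -> Int
isqrt = floor . sqrt . (fromIntegral :: Int -> Double)
```

 Several writes may target the same index (for example 12 is a multiple of both 2 and 3), but they all store False, so the order of the writes does not affect the result. That property is why the writes can in principle run in parallel, provided single-byte stores are atomic.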
 ConComp
 Implementation of the Awerbuch-Shiloach and Hybrid algorithms for finding connected components in undirected graphs. There is only a version coded directly against the array primitives. It needs to be adapted to the new benchmark framework.
 BarnesHut
 This benchmark implements the Barnes-Hut algorithm to solve the n-body problem in two dimensions. It currently won't compile with vectorisation due to excessive inlining of dictionaries.
Execution on LimitingFactor (2x Quad-Core Xeon)
Hardware spec: 2x 3.0GHz Quad-Core Intel Xeon 5400; 12MB (2x6MB) on-die L2 cache per processor; independent 1.6GHz front-side bus per processor; 800MHz DDR2 FB-DIMM; 256-bit-wide memory architecture; Mac OS X Server 10.5.6