Changes between Version 48 and Version 49 of DataParallel/BenchmarkStatus
 Timestamp:
 Dec 2, 2010 2:37:53 AM (3 years ago)
Legend:
 Unmodified
 Added
 Removed
 Modified

DataParallel/BenchmarkStatus
v48 v49 5 5 This page gives an overview of how well the benchmarks in the [http://darcs.haskell.org/packages/dph/dphexamples dphexamples/] directory of package dph are currently working. 6 6 7 === Overview over the benchmark programs === 7 The benchmarks are run each night by [http://darcs.haskell.org/packages/dph/dphbuildbot DPH BuildBot]. The results are posted to cvsghc and uploaded to [http://log.ouroborus.net/limitingfactor/dph/]. Check there for the latest numbers. 8 8 9 [http://darcs.haskell.org/packages/dph/examples/sumsq/ SumSq]:: 10 Computes the sum of the squares from 1 to N using `Int`. There are two variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a highlevel DPH program transformed by GHC's vectoriser. As a reference implementation, we have a sequential C program denoted by "ref C". 9 == Key == 10 <project>.<benchmarkname>.<version>.<parallelism>.<threads> 11 12 Project 13 * Either ''dph'' or ''repa''. Repa programs use the same parallel array library as DPH, but do not go through the vectorising transform. 14 15 Version 16 * ''vectorised'' means it's been through the DPH vectorising transform. 17 * ''vector'' is a hand written version using immutable Data.Vectors 18 * ''vectormutable'' is a hand written version using mutable Data.Vectors. 19 * ''vectorimmutable'' means the same as ''vector'' and is used when there is also an mutable version. 20 21 Parallelism 22 * Whether a benchmark is natively parallel or sequential. 23 * Parallel versions are also run single threaded (with N1) and sequential versions are also run with (N4) so we get the parallel GC. 24 * Parallel versions with N1 will tend to be slower than natively sequential versions due to overheads for supporting parallelism. 25 26 [[br]] 27 == Statically Nested == 28 Statically nested parallelism is where the parallelism has a fixed, finite depth. For example ``mapP f (filterP g xs)``. Statically nested programs are easier to vectorize than dynamically nested programs. At present, single threaded statically nested programs should run as fast as equivalent Data.Vector programs. Parallel versions should display a good speedup. 29 30 [http://darcs.haskell.org/packages/dph/dphexamples/imaginary/SumSquares/ SumSquares]:: 31 Computes the sum of the squares from 1 to N using `Int`. N = 10M. 32 33  '''name'''  '''runtime'''  '''speedup'''  '''notes''' 34  dph.sumsq.vector.seq.N4  404ms  1   35  dph.sumsq.vectorised.seq.N4  434ms  0.93   36  dph.sumsq.vectorised.par.N1  443ms  0.91   37  dph.sumsq.vectorised.par.N2  222ms  1.82   38  dph.sumsq.vectorised.par.N4  111ms  3.63   39 40 '''Summary''': fine, though we should run a sequential C version as well. 41 42 11 43 [http://darcs.haskell.org/packages/dph/examples/dotp/ DotP]:: 12 44 Computes the dot product of two vectors of `Double`s. There are two variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a highlevel DPH program transformed by GHC's vectoriser. In addition to these two DPH variants of the dot product, we also have two nonDPH reference implementations: (a) "ref Haskell" is a Haskell program using imperative, unboxed arrays and and (b) "ref C" is a C implementation using pthreads. 45 13 46 [http://darcs.haskell.org/packages/dph/examples/smvm/ SMVM]:: 14 47 Multiplies a dense vector with a sparse matrix represented in the ''compressed sparse row format (CSR).'' There are three variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a highlevel DPH program transformed by GHC's vectoriser. As a reference implementation, we have a sequential C program denoted by "ref C". 48 49 [[br]] 50 === Dynamically Nested === 51 Dynamically nested programs have a recursive structure where each level of the recursion invokes more parallel computations. This is common for benchmarks that use divideandconquer style algorithms. 52 53 [http://darcs.haskell.org/packages/dph/examples/primes/ Primes]:: 54 The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of `Bool`s. We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute. The problem is that we need to make the representation of parallel arrays of `Bool` dependent on whether the hardware supports atomic writes of bytes. '''Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (aka `Word8`).''' 55 15 56 [http://darcs.haskell.org/packages/dph/examples/quickhull/ Quickhull]:: 16 57 Given a set of points (in a plane), compute the sequence of points that encloses all points in the set. This benchmark is interesting as it is the simplest code that exploits the ability to implement divideandconquer algorithms with nested data parallelism. We have only a "vectorised" version of this benchmark and a sequential Haskell reference implementation, "ref Haskell", using vanilla lists. 17 [http://darcs.haskell.org/packages/dph/examples/primes/ Primes]:: 18 The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of `Bool`s. We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute. The problem is that we need to make the representation of parallel arrays of `Bool` dependent on whether the hardware supports atomic writes of bytes. '''Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (aka `Word8`).''' 58 19 59 [http://darcs.haskell.org/packages/dph/examples/qsort/ Quicksort]:: 20 60 FIXME 21 [http://darcs.haskell.org/packages/dph/examples/concomp/ ConComp]:: 22 Implementation of the AwerbuchShiloach and Hybrid algorithms for finding connected components in undirected graphs. There is only a version directly coded against the array primitives. '''Needs to be adapted to new benchmark framework.''' 23 [http://darcs.haskell.org/packages/dph/examples/barnesHut/ BarnesHut]:: 61 62 [[br]] 63 === Dynamically Nested with Algebraic Data Types === 64 These programs also use user defined algebraic data types. Vectorization of these programs is still a work in progress. 65 66 [http://darcs.haskell.org/packages/dph/dphexamples/barnesHut/ BarnesHut]:: 24 67 This benchmark implements the BarnesHut algorithm to solve the ''n''body problem in two dimensions. '''Currently won't compile with vectorisation due to excessive inlining of dictionaries.''' 25 68