Changes between Version 48 and Version 49 of DataParallel/BenchmarkStatus


Ignore:
Timestamp:
Dec 2, 2010 2:37:53 AM (3 years ago)
Author:
benl
Comment:

Add key and start on new numbers

Legend:

Unmodified
Added
Removed
Modified
  • DataParallel/BenchmarkStatus

    v48 v49  
    55This page gives an overview of how well the benchmarks in the [http://darcs.haskell.org/packages/dph/dph-examples dph-examples/] directory of package dph are currently working. 
    66 
    7 === Overview over the benchmark programs === 
     7The benchmarks are run each night by [http://darcs.haskell.org/packages/dph/dph-buildbot DPH BuildBot]. The results are posted to cvs-ghc and uploaded to [http://log.ouroborus.net/limitingfactor/dph/]. Check there for the latest numbers. 
    88 
    9  [http://darcs.haskell.org/packages/dph/examples/sumsq/ SumSq]:: 
    10   Computes the sum of the squares from 1 to N using `Int`.  There are two variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.  As a reference implementation, we have a sequential C program denoted by "ref C". 
     9== Key == 
     10<project>.<benchmark-name>.<version>.<parallelism>.<threads> 
     11 
     12Project 
     13 * Either ''dph'' or ''repa''. Repa programs use the same parallel array library as DPH, but do not go through the vectorising transform. 
     14 
     15Version 
     16 * ''vectorised'' means it's been through the DPH vectorising transform.  
     17 * ''vector'' is a hand written version using immutable Data.Vectors 
     18 * ''vector-mutable'' is a hand written version using mutable Data.Vectors. 
     19 * ''vector-immutable'' means the same as ''vector'' and is used when there is also an mutable version. 
     20 
     21Parallelism  
     22 * Whether a benchmark is natively parallel or sequential.  
     23 * Parallel versions are also run single threaded (with -N1) and sequential versions are also run with (-N4) so we get the parallel GC. 
     24 * Parallel versions with -N1 will tend to be slower than natively sequential versions due to overheads for supporting parallelism. 
     25 
     26[[br]] 
     27== Statically Nested == 
     28Statically nested parallelism is where the parallelism has a fixed, finite depth. For example ``mapP f (filterP g xs)``. Statically nested programs are easier to vectorize than dynamically nested programs. At present, single threaded statically nested programs should run as fast as equivalent Data.Vector programs. Parallel versions should display a good speedup. 
     29 
     30 [http://darcs.haskell.org/packages/dph/dph-examples/imaginary/SumSquares/ SumSquares]:: 
     31  Computes the sum of the squares from 1 to N using `Int`.  N = 10M. 
     32 
     33  || '''name''' || '''runtime''' || '''speedup''' || '''notes''' 
     34  || dph.sumsq.vector.seq.N4 ||  404ms || 1 || || 
     35  || dph.sumsq.vectorised.seq.N4 || 434ms || 0.93 || || 
     36  || dph.sumsq.vectorised.par.N1 || 443ms || 0.91 || || 
     37  || dph.sumsq.vectorised.par.N2 || 222ms || 1.82 || || 
     38  || dph.sumsq.vectorised.par.N4 || 111ms || 3.63 || || 
     39 
     40  '''Summary''': fine, though we should run a sequential C version as well. 
     41 
     42 
    1143 [http://darcs.haskell.org/packages/dph/examples/dotp/ DotP]:: 
    1244  Computes the dot product of two vectors of `Double`s.  There are two variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.  In addition to these two DPH variants of the dot product, we also have two non-DPH reference implementations: (a) "ref Haskell" is a Haskell program using imperative, unboxed arrays and and (b) "ref C" is a C implementation using pthreads. 
     45 
    1346 [http://darcs.haskell.org/packages/dph/examples/smvm/ SMVM]:: 
    1447  Multiplies a dense vector with a sparse matrix represented in the ''compressed sparse row format (CSR).''  There are three variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.  As a reference implementation, we have a sequential C program denoted by "ref C". 
     48 
     49[[br]] 
     50=== Dynamically Nested === 
     51Dynamically nested programs have a recursive structure where each level of the recursion invokes more parallel computations. This is common for benchmarks that use divide-and-conquer style algorithms. 
     52 
     53 [http://darcs.haskell.org/packages/dph/examples/primes/ Primes]:: 
     54  The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of `Bool`s.  We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute.  The problem is that we need to make the representation of parallel arrays of `Bool` dependent on whether the hardware supports atomic writes of bytes.  '''Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (aka `Word8`).''' 
     55 
    1556 [http://darcs.haskell.org/packages/dph/examples/quickhull/ Quickhull]:: 
    1657  Given a set of points (in a plane), compute the sequence of points that encloses all points in the set. This benchmark is interesting as it is the simplest code that exploits the ability to implement divide-and-conquer algorithms with nested data parallelism.  We have only a "vectorised" version of this benchmark and a sequential Haskell reference implementation, "ref Haskell", using vanilla lists. 
    17  [http://darcs.haskell.org/packages/dph/examples/primes/ Primes]:: 
    18   The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of `Bool`s.  We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute.  The problem is that we need to make the representation of parallel arrays of `Bool` dependent on whether the hardware supports atomic writes of bytes.  '''Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (aka `Word8`).''' 
     58 
    1959 [http://darcs.haskell.org/packages/dph/examples/qsort/ Quicksort]:: 
    2060  FIXME 
    21  [http://darcs.haskell.org/packages/dph/examples/concomp/ ConComp]:: 
    22   Implementation of the Awerbuch-Shiloach and Hybrid algorithms for finding connected components in undirected graphs.  There is only a version directly coded against the array primitives.  '''Needs to be adapted to new benchmark framework.''' 
    23  [http://darcs.haskell.org/packages/dph/examples/barnesHut/ BarnesHut]:: 
     61 
     62[[br]] 
     63=== Dynamically Nested with Algebraic Data Types === 
     64These programs also use user defined algebraic data types. Vectorization of these programs is still a work in progress. 
     65 
     66 [http://darcs.haskell.org/packages/dph/dph-examples/barnesHut/ BarnesHut]:: 
    2467  This benchmark implements the Barnes-Hut algorithm to solve the ''n''-body problem in two dimensions.  '''Currently won't compile with vectorisation due to excessive inlining of dictionaries.''' 
    2568