Changes between Version 48 and Version 49 of DataParallel/BenchmarkStatus


Ignore:
Timestamp:
Dec 2, 2010 2:37:53 AM (5 years ago)
Author:
benl
Comment:

Add key and start on new numbers

Legend:

Unmodified
Added
Removed
Modified
  • DataParallel/BenchmarkStatus

    v48 v49  
    55This page gives an overview of how well the benchmarks in the [http://darcs.haskell.org/packages/dph/dph-examples dph-examples/] directory of package dph are currently working.
    66
    7 === Overview over the benchmark programs ===
     7The benchmarks are run each night by [http://darcs.haskell.org/packages/dph/dph-buildbot DPH BuildBot]. The results are posted to cvs-ghc and uploaded to [http://log.ouroborus.net/limitingfactor/dph/]. Check there for the latest numbers.
    88
    9  [http://darcs.haskell.org/packages/dph/examples/sumsq/ SumSq]::
    10   Computes the sum of the squares from 1 to N using `Int`.  There are two variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.  As a reference implementation, we have a sequential C program denoted by "ref C".
     9== Key ==
     10<project>.<benchmark-name>.<version>.<parallelism>.<threads>
     11
     12Project
     13 * Either ''dph'' or ''repa''. Repa programs use the same parallel array library as DPH, but do not go through the vectorising transform.
     14
     15Version
     16 * ''vectorised'' means it's been through the DPH vectorising transform.
     17 * ''vector'' is a hand written version using immutable Data.Vectors
     18 * ''vector-mutable'' is a hand written version using mutable Data.Vectors.
     19 * ''vector-immutable'' means the same as ''vector'' and is used when there is also an mutable version.
     20
     21Parallelism
     22 * Whether a benchmark is natively parallel or sequential.
     23 * Parallel versions are also run single threaded (with -N1) and sequential versions are also run with (-N4) so we get the parallel GC.
     24 * Parallel versions with -N1 will tend to be slower than natively sequential versions due to overheads for supporting parallelism.
     25
     26[[br]]
     27== Statically Nested ==
     28Statically nested parallelism is where the parallelism has a fixed, finite depth. For example ``mapP f (filterP g xs)``. Statically nested programs are easier to vectorize than dynamically nested programs. At present, single threaded statically nested programs should run as fast as equivalent Data.Vector programs. Parallel versions should display a good speedup.
     29
     30 [http://darcs.haskell.org/packages/dph/dph-examples/imaginary/SumSquares/ SumSquares]::
     31  Computes the sum of the squares from 1 to N using `Int`.  N = 10M.
     32
     33  || '''name''' || '''runtime''' || '''speedup''' || '''notes'''
     34  || dph.sumsq.vector.seq.N4 ||  404ms || 1 || ||
     35  || dph.sumsq.vectorised.seq.N4 || 434ms || 0.93 || ||
     36  || dph.sumsq.vectorised.par.N1 || 443ms || 0.91 || ||
     37  || dph.sumsq.vectorised.par.N2 || 222ms || 1.82 || ||
     38  || dph.sumsq.vectorised.par.N4 || 111ms || 3.63 || ||
     39
     40  '''Summary''': fine, though we should run a sequential C version as well.
     41
     42
    1143 [http://darcs.haskell.org/packages/dph/examples/dotp/ DotP]::
    1244  Computes the dot product of two vectors of `Double`s.  There are two variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.  In addition to these two DPH variants of the dot product, we also have two non-DPH reference implementations: (a) "ref Haskell" is a Haskell program using imperative, unboxed arrays and and (b) "ref C" is a C implementation using pthreads.
     45
    1346 [http://darcs.haskell.org/packages/dph/examples/smvm/ SMVM]::
    1447  Multiplies a dense vector with a sparse matrix represented in the ''compressed sparse row format (CSR).''  There are three variants of this program: (1) "primitives" is directly coded against the array primitives from package dph and (2) "vectorised" is a high-level DPH program transformed by GHC's vectoriser.  As a reference implementation, we have a sequential C program denoted by "ref C".
     48
     49[[br]]
     50=== Dynamically Nested ===
     51Dynamically nested programs have a recursive structure where each level of the recursion invokes more parallel computations. This is common for benchmarks that use divide-and-conquer style algorithms.
     52
     53 [http://darcs.haskell.org/packages/dph/examples/primes/ Primes]::
     54  The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of `Bool`s.  We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute.  The problem is that we need to make the representation of parallel arrays of `Bool` dependent on whether the hardware supports atomic writes of bytes.  '''Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (aka `Word8`).'''
     55
    1556 [http://darcs.haskell.org/packages/dph/examples/quickhull/ Quickhull]::
    1657  Given a set of points (in a plane), compute the sequence of points that encloses all points in the set. This benchmark is interesting as it is the simplest code that exploits the ability to implement divide-and-conquer algorithms with nested data parallelism.  We have only a "vectorised" version of this benchmark and a sequential Haskell reference implementation, "ref Haskell", using vanilla lists.
    17  [http://darcs.haskell.org/packages/dph/examples/primes/ Primes]::
    18   The Sieve of Eratosthenes using parallel writes into a sieve structure represented as an array of `Bool`s.  We currently don't have a proper parallel implementation of this benchmark, as we are missing a parallel version of default backpermute.  The problem is that we need to make the representation of parallel arrays of `Bool` dependent on whether the hardware supports atomic writes of bytes.  '''Investigate whether any of the architectures relevant for DPH actually do have trouble with atomic writes of bytes (aka `Word8`).'''
     58
    1959 [http://darcs.haskell.org/packages/dph/examples/qsort/ Quicksort]::
    2060  FIXME
    21  [http://darcs.haskell.org/packages/dph/examples/concomp/ ConComp]::
    22   Implementation of the Awerbuch-Shiloach and Hybrid algorithms for finding connected components in undirected graphs.  There is only a version directly coded against the array primitives.  '''Needs to be adapted to new benchmark framework.'''
    23  [http://darcs.haskell.org/packages/dph/examples/barnesHut/ BarnesHut]::
     61
     62[[br]]
     63=== Dynamically Nested with Algebraic Data Types ===
     64These programs also use user defined algebraic data types. Vectorization of these programs is still a work in progress.
     65
     66 [http://darcs.haskell.org/packages/dph/dph-examples/barnesHut/ BarnesHut]::
    2467  This benchmark implements the Barnes-Hut algorithm to solve the ''n''-body problem in two dimensions.  '''Currently won't compile with vectorisation due to excessive inlining of dictionaries.'''
    2568