Changes between Version 56 and Version 57 of DataParallel/BenchmarkStatus


Timestamp:
Dec 2, 2010 5:09:44 AM
Author:
benl

Legend:

  (unmarked) Unmodified
  + Added
  - Removed
  (a -/+ pair) Modified
  • DataParallel/BenchmarkStatus

  [[br]]
  ----
+ ----
  = Flat Parallelism =
- 
- Todo: add Repa benchmarks.
- 
- 
+ Flat parallel programs are ones in which parallel computations do not invoke further parallel computations. For Repa programs this means that the value of each element in a given array can be computed independently of the others. These should run as fast as equivalent programs using immutable Data.Vector. We'd also hope to get close to the performance of C programs using equivalent algorithms, though this is a harder comparison due to differences in the back-end code generator.
+ 
+ [[br]]
+  [http://code.haskell.org/repa/repa-head/repa-examples/MMult/ MMult]::
+   Matrix-Matrix multiplication.
+ 
+  || '''name''' || '''runtime''' || '''speedup''' || '''notes''' ||
+  || repa.mmult.c.seq || 3.792s || 1 || A ||
+  || repa.mmult.par.N4 || 2.147s || 1.77 || ||
+  A: Straightforward C program using triple nested loops. A cache-friendly block-based version would be faster.
+ 
+  '''Status:''' Ok, but about 20% slower than in 6.13.[[br]]
+  '''ToDo:''' Run with LLVM and without bounds checking.
+ 
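For reference, the triple-nested-loop algorithm behind the C baseline in note A can be sketched sequentially in Haskell. This is a hypothetical illustration over plain lists, not the Repa benchmark code; the name `mmult` is ours.

```haskell
import Data.List (transpose)

-- Hypothetical sequential sketch of matrix-matrix multiplication,
-- mirroring the triple-nested-loop C baseline: one dot product per
-- element of the result.  A matrix is represented as a list of rows.
mmult :: Num a => [[a]] -> [[a]] -> [[a]]
mmult a b =
  [ [ sum (zipWith (*) row col)   -- dot product of a row of a and a column of b
    | col <- transpose b ]
  | row <- a ]
```

For example, `mmult [[1,2],[3,4]] [[5,6],[7,8]]` evaluates to `[[19,22],[43,50]]`. As note A says for the C version, a cache-friendly blocked variant of this loop structure would be faster.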
+ [[br]]
+  [http://code.haskell.org/repa/repa-head/repa-examples/Laplace/ Laplace] '''(SLOWLORIS)'''::
+   Solves the Laplace equation in the 2D plane.
+ 
+  || '''name''' || '''runtime''' || '''speedup''' || '''notes''' ||
+  || repa.laplace.c.seq || 1.299s || 1 || A ||
+  || repa.laplace.par.N4 || 2.521s || 0.51 || ||
+  A: Straightforward C program using triple nested loops. A cache-friendly block-based version would be faster.
+ 
+  '''Status:''' Ok, but about 25% slower than in 6.13.[[br]]
+  '''ToDo:''' Run with LLVM and without bounds checking. Run with more threads to see if we can get to the C version's run time.
+ 
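The kernel of a Laplace solver like this can be sketched as repeated Jacobi relaxation: each interior element becomes the average of its four neighbours while the boundary is held fixed. A hypothetical sequential sketch over lists (the names `relaxStep`, `get` and `interior` are ours, not from the benchmark):

```haskell
-- Hypothetical sketch of one relaxation step for the 2D Laplace
-- equation: every interior element is replaced by the average of its
-- four neighbours; boundary elements are left unchanged.
relaxStep :: [[Double]] -> [[Double]]
relaxStep m =
  [ [ if interior i j
        then (get (i-1) j + get (i+1) j + get i (j-1) + get i (j+1)) / 4
        else get i j
    | j <- [0 .. cols - 1] ]
  | i <- [0 .. rows - 1] ]
  where
    rows = length m
    cols = length (head m)
    get i j = m !! i !! j
    interior i j = i > 0 && j > 0 && i < rows - 1 && j < cols - 1
```

A full solver iterates `relaxStep` until the grid converges; the benchmark runs a fixed number of such sweeps in parallel over the array.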
+ 
+ [[br]]
+  [http://code.haskell.org/repa/repa-head/repa-examples/Blur/ Blur]::
+   Applies a Gaussian blur filter to a 2D image.
+ 
+  '''ToDo:''' Runs ok, but need to add other versions for comparison.
+ 
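A Gaussian blur is a convolution of the image with a Gaussian kernel, and since the kernel is separable a 2D blur can be done as a pass over the rows followed by a pass over the columns. A hypothetical 1D sketch with clamped borders (the name `blur1D` and the example kernel are ours):

```haskell
-- Hypothetical sketch: convolve a signal with a small normalised
-- kernel, clamping indices at the borders.  A separable 2D Gaussian
-- blur applies the same pass over rows and then over columns.
blur1D :: [Double] -> [Double] -> [Double]
blur1D kernel xs =
  [ sum [ k * get (i + o - half) | (k, o) <- zip kernel [0 :: Int ..] ]
  | i <- [0 .. n - 1] ]
  where
    n     = length xs
    half  = length kernel `div` 2
    get j = xs !! min (n - 1) (max 0 j)   -- clamp at the borders
```

With the binomial kernel `[0.25, 0.5, 0.25]`, `blur1D [0.25, 0.5, 0.25] [0, 4, 0]` gives `[1.0, 2.0, 1.0]`.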
+ 
+ [[br]]
+  [http://code.haskell.org/repa/repa-head/repa-examples/EdgeDetect/ EdgeDetect]::
+   Applies Canny edge detection to a 2D image.
+ 
+  '''ToDo:''' Runs ok, but need to add other versions for comparison.
+ 
+ [[br]]
+  [http://code.haskell.org/repa/repa-head/repa-examples/FFT/ FFT]::
+   Performs high-pass filtering using 2D and 3D FFTs. These are naive benchmarks used for regression testing only. They divide right down to two-point vectors and construct the result using copying append. Using an in-place algorithm (as FFTW does) would be significantly faster.
+ 
+  '''ToDo:''' Runs ok, but need to add other versions for comparison.
+ 
+ 
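The naive scheme described above (divide down to two-point vectors, rebuild with copying append) is essentially textbook radix-2 Cooley-Tukey. A hypothetical sequential sketch for power-of-two input lengths (the name `fft` is ours, not the benchmark's):

```haskell
import Data.Complex

-- Hypothetical sketch of the naive radix-2 Cooley-Tukey FFT:
-- split into even- and odd-indexed halves, recurse down to the
-- one-point base case, and rebuild with copying append (++).
-- The input length must be a power of two.
fft :: [Complex Double] -> [Complex Double]
fft []  = []
fft [x] = [x]
fft xs  = zipWith (+) es ts ++ zipWith (-) es ts   -- copying append
  where
    n  = length xs
    es = fft [ x | (x, i) <- zip xs [0 :: Int ..], even i ]
    os = fft [ x | (x, i) <- zip xs [0 :: Int ..], odd  i ]
    ts = [ cis (-2 * pi * fromIntegral k / fromIntegral n) * o  -- twiddle factors
         | (k, o) <- zip [0 :: Int ..] os ]
```

Every `++` and intermediate list here is a copy; an in-place formulation (as in FFTW) avoids exactly that overhead, which is why the description above calls these naive regression benchmarks.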
  [[br]]
  = Statically Nested Parallelism =
- Statically nested parallelism is where the parallelism has a fixed, finite depth. For example ``mapP f (filterP g xs)``. Statically nested programs are easier to vectorize than dynamically nested programs. At present, single threaded statically nested programs should run as fast as equivalent Data.Vector programs. Parallel versions should display a good speedup.
+ Statically nested parallelism is where the parallelism has a fixed, finite depth, for example ``mapP f (filterP g xs)``. Statically nested programs are easier to vectorise than dynamically nested programs. At present, single-threaded statically nested programs should run as fast as equivalent Data.Vector programs. Parallel versions should display a good speedup.
  
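As a hypothetical illustration, the sequential analogue of ``mapP f (filterP g xs)`` is just ``map f (filter g xs)`` on lists: the nesting depth is fixed at two no matter how large the input is. For instance, a SumSquares-style computation (the function name is ours):

```haskell
-- Hypothetical sequential analogue of a statically nested DPH
-- program: the parallel combinators mapP and filterP correspond to
-- map and filter, and the nesting depth is fixed regardless of the
-- size of the input.
sumSquaresOfEvens :: [Int] -> Int
sumSquaresOfEvens xs = sum (map (^ 2) (filter even xs))
```

Because the depth is known at compile time, the vectoriser can flatten such programs without the machinery needed for dynamically nested (recursive) parallelism.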
  [http://darcs.haskell.org/packages/dph/dph-examples/imaginary/SumSquares/ SumSquares]::
     
  [[br]]
  [http://darcs.haskell.org/libraries/dph/dph-examples/spectral/QuickSort/ QuickSort] '''(BROKEN)'''::
-   Sort a vector of doubles by recursively splitting the vector and sorting the two halves. This is a "fake" benchmark because we divide right down to two-point vectors and construct the result using copying append. A production algorithm would switch to an in-place sort once the size of the vector reaches a few thousand elements.
+   Sort a vector of doubles by recursively splitting the vector and sorting the two halves. This is a naive benchmark used for regression testing only. We divide right down to two-point vectors and construct the result using copying append. A production algorithm would switch to an in-place sort once the size of the vector reaches a few thousand elements.
  
  || '''name''' || '''runtime''' || '''speedup''' || '''notes''' ||
     
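The copying-append behaviour the QuickSort entry describes can be sketched with the textbook list quicksort. This is a hypothetical sequential sketch (the name `qsort` is ours), not the vectorised benchmark code:

```haskell
-- Hypothetical sequential sketch: recursively partition and rebuild
-- the sorted result with copying append (++), dividing all the way
-- down to trivial vectors.  As noted above, a production algorithm
-- would switch to an in-place sort for small inputs.
qsort :: Ord a => [a] -> [a]
qsort []       = []
qsort (p : xs) =
  qsort [ x | x <- xs, x < p ]
    ++ [p]
    ++ qsort [ x | x <- xs, x >= p ]
```

Each level of recursion allocates fresh lists for both halves and again for the appended result, which is the copying cost the benchmark deliberately keeps.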
  * '''BROKEN''': Benchmark doesn't compile, or crashes when run.
  * '''SLOWDOWN''': Benchmark gets slower as the number of threads increases.
- * '''SLOWLORIS''': Benchmark scales as the number of threads increases, but the absolute performance is not acceptable compared with equivalent versions using immutable Data.Vector.
+ * '''SLOWLORIS''': Benchmark scales as the number of threads increases, but the absolute performance is not acceptable compared with equivalent versions using immutable Data.Vector. We do not have a setup in which the parallel version runs faster than the sequential reference version. Increasing the number of threads makes it faster, but not fast enough.
  
  [[br]]