Changes between Version 43 and Version 44 of DataParallel/BenchmarkStatus


Timestamp:
Mar 10, 2009 6:43:11 AM (5 years ago)
Author:
chak
Comment:

--

  • DataParallel/BenchmarkStatus

    v43 v44  
    5454However, we found a number of general problems when working on this example: 
    5555 * We need an extra -funfolding-use-threshold.  We don't really want users having to worry about that. 
     56 * `enumFromTo` doesn't fuse due to excessive dictionaries in the unfolding of `zipWithUP`. 
    5657 * `mapP (\x -> x * x) xs` essentially turns into `zipWithU (*) xs xs`, which doesn't fuse with `enumFromTo` anymore.  We have a rewrite rule in the library to fix that, but that's not general enough.  We really would rather not vectorise the lambda abstraction at all. 
    57  * `enumFromTo` doesn't fuse due to excessive dictionaries in the unfolding of `zipWithUP`. 
    5858 * Finally, to achieve the current result, we needed an analysis that avoids vectorising subcomputations that don't need to be vectorised, and worse, that fusion has to turn back into their original form.  In this case, the lambda abstraction `\x -> x * x`.  This is currently implemented in a rather limited and ad-hoc way.  We should implement this on the basis of a more general analysis. 
    5959 
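To illustrate the `mapP` point above, here is a minimal sketch using ordinary Haskell lists as stand-ins for parallel arrays. The names `sumSqMap` and `sumSqZip` are hypothetical and are not part of the DPH library; the list functions `map` and `zipWith` only mimic the shape of `mapP` and `zipWithU`. Both definitions compute the same sum of squares, but the second mentions `xs` twice, which is roughly why a producer such as `enumFromTo` can no longer be fused into a single consumer.

```haskell
module Main where

-- Assumption: plain lists stand in for the unboxed parallel arrays of the library.

-- Source form: the lambda is applied element-wise, so the producer
-- (enumFromTo) has exactly one consumer.
sumSqMap :: Int -> Int
sumSqMap n = sum (map (\x -> x * x) (enumFromTo 1 n))

-- Shape after vectorising the lambda: the multiplication is lifted to
-- whole arrays, and the argument array is consumed twice.
sumSqZip :: Int -> Int
sumSqZip n =
  let xs = enumFromTo 1 n
  in  sum (zipWith (*) xs xs)

main :: IO ()
main = print (sumSqMap 10, sumSqZip 10)  -- prints (385,385)
```

The two functions agree on every input; the difference lies only in how many times the intermediate array is referenced, which is what the fusion framework is sensitive to.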
     
    107107=== Summary === 
    108108 
    109 The speedup relative to a sequential C program for !SumSq, DotP, and SMVM on both architectures is illustrated by [http://justtesting.org/post/85103645/these-graphs-summarise-the-performance-of-data two summary graphs.]  In all cases, the data parallel Haskell program outperforms the sequential C program by a large margin on 8 cores.  The gray graph is a parallel C program computing the dot product using pthreads.  It clearly shows that the two Quad-Core Xeon with 8x1 threads are memory-limited for this benchmark, and the C code is barely any faster on 8 cores than the Haskell code. 
     109The speedup relative to a sequential C program for !SumSq, DotP, and SMVM on both architectures is illustrated by [http://justtesting.org/post/85103645/these-graphs-summarise-the-performance-of-data two summary graphs.]  In all cases, the data parallel Haskell program outperforms the sequential C program by a large margin on 8 cores.  The gray curve is a parallel C program computing the dot product using pthreads.  It clearly shows that the two Quad-Core Xeon with 8x1 threads are memory-limited for this benchmark, and the C code is barely any faster on 8 cores than the Haskell code.