Changes between Version 36 and Version 37 of DataParallel/BenchmarkStatus


Timestamp:
Mar 9, 2009 12:09:52 PM (6 years ago)
Author:
chak
Comment:

--

  • DataParallel/BenchmarkStatus

    v36 v37  
     83  83  || DotP, ref C || 100M elements || – || 554 || 277 || 142 || 72 || 37 || 22 || 20 ||
     84  84  || SMVM, primitives || 10kx10k @ density 0.1 || 1102/1102 || 1112/1112 || 561/561 || 285/285 || 150/150 || 82/82 || 63/70 || 54/100 ||
     85      || SMVM, vectorised || 10kx10k @ density 0.1 || 2312/2312 || 15960/15960 || 8192/8192 || 4188/4188 || 2362/2362 || 1538/1538 || 1047/1047 || 950/950 ||
         85  || SMVM, vectorised || 10kx10k @ density 0.1 || 1784/1784 || 1810/1810 || 910/910 || 466/466 || 237/237 || 131/131 || 96/96 || 87/87 ||
     86  86  || SMVM, ref C || 10kx10k @ density 0.1 || 580 || – || – || – || – || – || – || – ||
     87  87  || SMVM, primitives || 100kx100k @ density 0.001 || 1112/1112 || 1299/1299 || 684/684 || 653/653 || 368/368 || 294/294 || 197/197 || 160/160 ||
     88      || SMVM, vectorised || 100kx100k @ density 0.001 || 2345/2345 || 16110/16110 || 8553/8553 || 4400/4400 || 2572/2572 || 1645/1645 || 1224/1224 || 1005/1005 ||
         88  || SMVM, vectorised || 100kx100k @ density 0.001 || 1824/1824 || 2008/2008 || 1048/1048 || 1010/1010 || 545/545 || 426/426 || 269/269 || 258/258 ||
     89  89  || SMVM, ref C || 100kx100k @ density 0.001 || 600 || – || – || – || – || – || – || – ||
     90  90
     
    101 101  ==== Comments regarding smvm ====
    102 102
    103      As on !LimitingFactor, but it scales much more nicely and improves until using four threads per core.  This suggests that memory bandwidth is again a critical factor in this benchmark (this fits well with earlier observations on other architectures).  Despite the fusion problem with `dph-par`, the parallel Haskell program, using all 8 cores, still ends up three times faster than the sequential C program.
        103  As on !LimitingFactor, but it scales much more nicely and improves until using four threads per core.  This suggests that memory bandwidth is again a critical factor in this benchmark (this fits well with earlier observations on other architectures).
    104 104
    105 105  On this machine, "SMVM primitives" also has a quirk from 2 to 4 threads.  This reinforces the suspicion that this is a scheduling problem.
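
The 2-to-4-thread quirk mentioned above can be read off the table by comparing adjacent columns. The following is only a rough sketch: it takes the v37 "SMVM, primitives" row for 100kx100k @ density 0.001, drops the first column, and assumes the remaining columns correspond to 1, 2, 4, ... threads. The column headers are not part of this diff, so that mapping is an assumption, and the names `runtimes`, `threads` and `step_speedups` are made up for illustration.

```python
# Runtimes copied from the v37 "SMVM, primitives" row (100kx100k @ density 0.001),
# first value of each "a/b" pair, skipping the first column.
# ASSUMPTION: these columns are thread counts 1, 2, 4, ... (headers are not shown in the diff).
runtimes = [1299, 684, 653, 368, 294, 197, 160]
threads = [2 ** i for i in range(len(runtimes))]


def step_speedups(times):
    """Speedup gained by each doubling step: previous runtime / current runtime."""
    return [prev / cur for prev, cur in zip(times, times[1:])]


steps = step_speedups(runtimes)

# Print the gain for every doubling of the (assumed) thread count.
for lo, hi, gain in zip(threads, threads[1:], steps):
    print(f"{lo:>2} -> {hi:>2} threads: x{gain:.2f}")

# Index of the doubling step that gained the least.
weakest = min(range(len(steps)), key=lambda i: steps[i])
print(f"weakest step: {threads[weakest]} -> {threads[weakest + 1]} threads")
```

Under that column assumption, the weakest step is the one from 2 to 4 threads, where the runtime barely changes (684 to 653), while the neighbouring steps each gain considerably more.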