Changes between Version 33 and Version 34 of DataParallel/BenchmarkStatus


Timestamp:
Mar 8, 2009 2:04:42 PM (5 years ago)
Author:
chak

  • DataParallel/BenchmarkStatus

    v33 v34  
    66 66 There seems to be a fusion problem in DotP with `dph-par` (even if the version of `zipWithSUP` that uses `splitSD/joinSD` is used); hence the much lower runtime for "N=1" than for "sequential".  The vectorised version runs out of memory, possibly because we have not yet solved the `bpermute` problem.
    67 67
       68 The vectorised version still needs to be improved; its poor performance is due to an unexploited fusion opportunity.  Moreover, "SMVM, primitives" exhibits strange behaviour from 2 to 4 threads with the matrix of density 0.001.  This might be a scheduling problem.
       69
    68 70 === Execution on greyarea (1x UltraSPARC T2) ===
    69 71
     
    101 103 As on !LimitingFactor, but it scales much better and keeps improving up to four threads per core.  This suggests that memory bandwidth is again a critical factor in this benchmark (which fits well with earlier observations on other architectures).  Despite the fusion problem with `dph-par`, the parallel Haskell program, using all 8 cores, still ends up three times faster than the sequential C program.
    102 104
    103
    104
        105 On this machine, "SMVM, primitives" also has a quirk from 2 to 4 threads.  This reinforces the suspicion that this is a scheduling problem.