Changes between Version 23 and Version 24 of DataParallel/BenchmarkStatus


Timestamp:
Mar 5, 2009 11:17:54 AM
Author:
chak
Comment:

--

  • DataParallel/BenchmarkStatus

    v23 v24  
    3131All results are in milliseconds, and the triples report best/average/worst execution time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "P=n" report times when linked against `dph-par` and run in parallel using the specified number of parallel OS threads. 
    3232 
     33==== Comments regarding !SumSq ==== 
     34 
     35The "primitives" version works nicely, but the vectorised one exposes some problems: 
     36 * We need an extra -funfolding-use-threshold.  We don't really want users having to worry about that. 
     37 * `mapP (\x -> x * x) xs` essentially turns into `zipWithU (*) xs xs`, which doesn't fuse with `enumFromTo` anymore.  We have a rewrite rule in the library to fix that, but that's not general enough.  We really would rather not vectorise the lambda abstraction at all. 
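The second comment above can be illustrated with ordinary Haskell lists. This is only a sketch: `map` and `zipWith` stand in for `mapP` and `zipWithU`, and the names `sumSqMap` and `sumSqZip` are hypothetical, not part of the DPH library. The point is that both forms compute the same result, but the second consumes `xs` twice, which is one plausible reason a single `enumFromTo` producer no longer fuses with its consumer.

```haskell
-- Plain-list stand-ins for the DPH combinators; this is not the dph library.

-- Shape of the source program: one producer, one consumer chain.
sumSqMap :: Int -> Int
sumSqMap n = sum (map (\x -> x * x) [1 .. n])

-- Shape after vectorisation as described above: the lambda becomes a
-- lifted (*) applied to the same array twice, so xs has two consumers.
sumSqZip :: Int -> Int
sumSqZip n =
  let xs = [1 .. n]
  in sum (zipWith (*) xs xs)
```

Both functions agree on every input; what differs is only how many times the enumeration is consumed, which matters to a fusion framework even though it does not change the result.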
     38 
    3339==== Comments regarding DotP ==== 
    3440 
     
    4652 
    4753|| '''Program''' || '''Problem size''' || '''sequential''' || '''P=1''' || '''P=2''' || '''P=4''' || '''P=8''' || '''P=16''' || '''P=32''' || '''P=64''' || 
    48 || !SumSq, primitives || 10M || 212/212 || 255/255 || 128/128 || 64/64 || 36/36 || 28/28 || 17/17 || 10/10 || 
    49 || !SumSq, vectorised || 10M || 1161/1161 || 1884/1884 ||950/950 ||499/499 || 288/288 || 254/254 || 193/193 || 337/377 || 
    50 || !SumSq, ref C ||10M || 130 || – || – || – || – || – || – || – || 
     54|| !SumSq, primitives || 10M || 212/212 || 254/254 || 127/127 || 64/64 || 36/36 || 25/25 || 17/17 || 10/10 || 
     55|| !SumSq, vectorised || 10M || 212/212 || 1884/1884 ||950/950 ||499/499 || 288/288 || 254/254 || 193/193 || 337/377 || 
     56|| !SumSq, ref C ||10M || 120 || – || – || – || – || – || – || – || 
    5157|| DotP, primitives || 100M elements || 937/937 || 934/934 || 474/474 || 238/238 || 120/120 || 65/65 || 38/38 || 28/28 || 
    5258|| DotP, vectorised || 100M elements || 937/937 || 942/942 || 471/471 || 240/240 || 118/118 || 65/65 || 43/43 || 29/29 ||  
     
    5965All results are in milliseconds, and the pairs report best/worst execution time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "P=n" report times when linked against `dph-par` and run in parallel using the specified number of parallel OS threads. 
    6066 
    61 ==== Comments regarding SumSq ==== 
     67==== Comments regarding !SumSq ==== 
    6268 
    6369The primitives scale nicely, but something is deeply wrong (lack of fusion, perhaps) with the vectorised version.
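
To put a number on "scale nicely", the sketch below computes speedup relative to P=1 from the best-case times in the v24 !SumSq rows of the table above. The helper names (`threads`, `primitivesMs`, `vectorisedMs`, `speedups`, `efficiencies`) are hypothetical and exist only for this illustration; the figures are copied from the table, not re-measured.

```haskell
-- Thread counts for the P=n columns.
threads :: [Int]
threads = [1, 2, 4, 8, 16, 32, 64]

-- Best-case times in milliseconds, P=1 .. P=64, from the v24 table rows.
primitivesMs, vectorisedMs :: [Double]
primitivesMs = [254, 127, 64, 36, 25, 17, 10]
vectorisedMs = [1884, 950, 499, 288, 254, 193, 337]

-- Speedup of each column relative to the P=1 time.
speedups :: [Double] -> [Double]
speedups ts = map (head ts /) ts

-- Parallel efficiency: speedup divided by the number of threads.
efficiencies :: [Double] -> [Double]
efficiencies ts = zipWith (\p s -> s / fromIntegral p) threads (speedups ts)
```

On these figures the primitives version reaches roughly a 25x speedup at P=64, while the vectorised version peaks at just under 10x at P=32 and then gets slower at P=64 (193 ms to 337 ms).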