Changes between Version 23 and Version 24 of DataParallel/BenchmarkStatus


Timestamp:
Mar 5, 2009 11:17:54 AM (6 years ago)
Author:
chak
Comment:

--

  • DataParallel/BenchmarkStatus

    v23 v24  
    31 31  All results are in milliseconds, and the triples report best/average/worst execution time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "P=n" report times when linked against `dph-par` and run in parallel using the specified number of parallel OS threads.
    32 32
       33  ==== Comments regarding !SumSq ====
       34
       35  The "primitives" version works nicely, but the vectorised one exposes some problems:
       36   * We need an extra -funfolding-use-threshold.  We don't really want users having to worry about that.
       37   * `mapP (\x -> x * x) xs` essentially turns into `zipWithU (*) xs xs`, which doesn't fuse with `enumFromTo` anymore.  We have a rewrite rule in the library to fix that, but that's not general enough.  We really would rather not vectorise the lambda abstraction at all.
       38
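The second point above can be illustrated with a minimal sketch. It assumes plain Haskell lists as stand-ins for parallel arrays, so `map` and `zipWith` are the Prelude functions rather than the DPH `mapP` and `zipWithU`, and the names `sumSqMap` and `sumSqZip` are hypothetical. It only shows that the two forms compute the same value; it says nothing about how fusion behaves in the library.

```haskell
-- Sum of squares written with a lambda, as in the source program.
-- Plain lists stand in for parallel arrays (an assumption for illustration).
sumSqMap :: Int -> Int
sumSqMap n = sum (map (\x -> x * x) [1 .. n])

-- The shape the vectorised code roughly takes: the lambda becomes a
-- zipWith over two references to the same array.
sumSqZip :: Int -> Int
sumSqZip n = let xs = [1 .. n] in sum (zipWith (*) xs xs)

main :: IO ()
main = print (sumSqMap 10 == sumSqZip 10)
```

In the second form `xs` is consumed twice, which is one plausible reading of why the producer no longer fuses into the consumer.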
    33 39  ==== Comments regarding DotP ====
    34 40
     
    46 52
    47 53  || '''Program''' || '''Problem size''' || '''sequential''' || '''P=1''' || '''P=2''' || '''P=4''' || '''P=8''' || '''P=16''' || '''P=32''' || '''P=64''' ||
    48     || !SumSq, primitives || 10M || 212/212 || 255/255 || 128/128 || 64/64 || 36/36 || 28/28 || 17/17 || 10/10 ||
    49     || !SumSq, vectorised || 10M || 1161/1161 || 1884/1884 || 950/950 || 499/499 || 288/288 || 254/254 || 193/193 || 337/377 ||
    50     || !SumSq, ref C || 10M || 130 || – || – || – || – || – || – || – ||
       54  || !SumSq, primitives || 10M || 212/212 || 254/254 || 127/127 || 64/64 || 36/36 || 25/25 || 17/17 || 10/10 ||
       55  || !SumSq, vectorised || 10M || 212/212 || 1884/1884 || 950/950 || 499/499 || 288/288 || 254/254 || 193/193 || 337/377 ||
       56  || !SumSq, ref C || 10M || 120 || – || – || – || – || – || – || – ||
    51 57  || DotP, primitives || 100M elements || 937/937 || 934/934 || 474/474 || 238/238 || 120/120 || 65/65 || 38/38 || 28/28 ||
    52 58  || DotP, vectorised || 100M elements || 937/937 || 942/942 || 471/471 || 240/240 || 118/118 || 65/65 || 43/43 || 29/29 ||
     
    59 65  All results are in milliseconds, and the triples report best/worst execution time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "P=n" report times when linked against `dph-par` and run in parallel using the specified number of parallel OS threads.
    60 66
    61     ==== Comments regarding SumSq ====
       67  ==== Comments regarding !SumSq ====
    62 68
    63 69  The primitives scale nicely, but something is deeply wrong (lack of fusion, perhaps) with the vectorised version.
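The scaling remark can be checked against the table with a small sketch. The figures are the best-case times from the v24 "!SumSq, primitives" row; the names `primitivesMs` and `speedups` are made up for this illustration, and speedup is taken relative to the P=1 time.

```haskell
-- Best-case times in milliseconds for "!SumSq, primitives" (v24 row),
-- paired with the number of OS threads.
primitivesMs :: [(Int, Double)]
primitivesMs =
  [(1, 254), (2, 127), (4, 64), (8, 36), (16, 25), (32, 17), (64, 10)]

-- Speedup of each entry relative to the first (P=1) entry.
speedups :: [(Int, Double)] -> [(Int, Double)]
speedups [] = []
speedups rows@((_, base) : _) = [(p, base / t) | (p, t) <- rows]

main :: IO ()
main = mapM_ print (speedups primitivesMs)
```

Applying the same function to the vectorised row would show whether its poor absolute times also come with poor relative scaling.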