Changes between Version 14 and Version 15 of DataParallel/BenchmarkStatus


Timestamp:
Mar 5, 2009 1:03:33 AM
Author:
chak
Comment:

--

  • DataParallel/BenchmarkStatus

    v14 v15  
    1414=== Execution on !LimitingFactor (2x Quad-Core Xeon) ===
    1515
    16 Hardware spec: 2x 3.0GHz Quad-Core Intel Xeon 5400; 12MB (2x6MB) on-die L2 cache per processor; independent 1.6GHz frontside bus per processor; 800MHz DDR2; 256-bit-wide memory architecture; Mac OS X Server 10.5.6
     16Hardware spec: 2x 3.0GHz Quad-Core Intel Xeon 5400; 12MB (2x6MB) on-die L2 cache per processor; independent 1.6GHz frontside bus per processor; 800MHz DDR2 FB-DIMM; 256-bit-wide memory architecture; Mac OS X Server 10.5.6
    1717
    1818Software spec: GHC 6.11 (from end of Feb 09); gcc 4.0.1
    1919
    20 || '''Program''' || '''Problem size''' || '''sequential''' || '''1 core''' || '''2 cores''' || '''4 cores''' || '''8 cores''' ||
     20|| '''Program''' || '''Problem size''' || '''sequential''' || '''P=1''' || '''P=2''' || '''P=4''' || '''P=8''' ||
    2121|| DotP, primitives || 100M elements || 823/823/824 || 812/813/815 || 408/408/409 || 220/223/227 || 210/214/221 ||
    2222|| DotP, vectorised || 100M elements || 823/824/824 || 814/816/818 || 412/417/421 || 222/225/227 || 227/232/238 ||
     
    2626|| SMVM, vectorised || ?? elems, density ?? ||  ||  ||  ||  ||  ||
    2727
    28 All results are in milliseconds, and the triples report best/average/worst execution case time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "N cores" report times when linked against `dph-par` and run in parallel on the specified number of processor cores.
     28All results are in milliseconds, and the triples report best/average/worst execution time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "P=n" report times when linked against `dph-par` and run in parallel using the specified number of parallel OS threads.
    2929
    3030==== Observations regarding DotP ====
    3131
    3232Performance is memory bound, and hence, the benchmark stops scaling once the memory bus is saturated.  As a consequence, the wall-clock execution time of the Haskell programs and the C reference implementation are the same when all available parallelism is exploited.  The parallel DPH library delivers the same single-core performance as the sequential one in this benchmark.
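
One way to make the memory-bound observation concrete is to convert the best-case "DotP, primitives" times into an effective read bandwidth. The sketch below does that for the P=1 to P=8 columns. The element size of 8 bytes is an assumption (the page does not state the element type), and the function names are hypothetical helpers rather than part of the benchmark suite.

```python
# Best-case "DotP, primitives" wall-clock times in ms from the table above,
# keyed by the number of parallel OS threads P.
BEST_MS = {1: 812, 2: 408, 4: 220, 8: 210}

N_ELEMENTS = 100_000_000   # problem size from the table
BYTES_PER_ELEMENT = 8      # ASSUMPTION: 64-bit floating-point elements
N_VECTORS = 2              # a dot product reads two input vectors


def effective_bandwidth_gb_s(time_ms: float) -> float:
    """Bytes read per second, in GB/s (1 GB = 1e9 bytes), for one run."""
    total_bytes = N_VECTORS * N_ELEMENTS * BYTES_PER_ELEMENT
    return total_bytes / (time_ms / 1000.0) / 1e9


def bandwidth_table() -> dict:
    """Effective bandwidth for every measured thread count."""
    return {p: effective_bandwidth_gb_s(t) for p, t in BEST_MS.items()}


if __name__ == "__main__":
    for p, bw in bandwidth_table().items():
        print(f"P={p}: {bw:.2f} GB/s")
```

Under that assumption the effective bandwidth roughly doubles from P=1 to P=2 and again to P=4, but gains under 5% from P=4 to P=8, which is consistent with a saturated memory bus.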
     33
     34=== Execution on greyarea (1x UltraSPARC T2) ===
     35
     36Hardware spec: 1x 1.4GHz UltraSPARC T2; 8 cores/processor with 8 hardware threads/core; 4MB on-die L2 cache per processor; FB-DIMM; Solaris 5.10
     37
     38Software spec: GHC 6.11 (from end of Feb 09); gccfss 4.0.4 (gcc front-end with Sun compiler backend)
     39
     40|| '''Program''' || '''Problem size''' || '''sequential''' || '''P=1''' || '''P=2''' || '''P=4''' || '''P=8''' || '''P=16''' || '''P=32''' || '''P=64''' ||
     41|| DotP, primitives || 100M elements || 937/937 || 934/934 || 474/474 || 238/238 || 120/120 || 65/65 || 38/38 || 28/28 ||
     42|| DotP, vectorised || 100M elements || || || || || || || || ||
     43|| DotP, ref Haskell || 100M elements || – || || || || || || || ||
     44|| DotP, ref C || 100M elements || – || || || || || || || ||
     45|| SMVM, primitives || ?? elems, density ?? ||  || || || || || || || ||
     46|| SMVM, vectorised || ?? elems, density ?? ||  || || || || || || || ||
     47
     48All results are in milliseconds, and the pairs report best/worst execution time (wall clock) of three runs.  The column marked "sequential" reports times when linked against `dph-seq` and the columns marked "P=n" report times when linked against `dph-par` and run in parallel using the specified number of parallel OS threads.
     49
     50==== Observations regarding DotP ====
     51
     52The benchmark scales nicely up to the maximum number of hardware threads.  Memory latency is largely covered by excess parallelism.
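
The scaling claim can be checked against the "DotP, primitives" row with a short calculation of speedup and parallel efficiency relative to the P=1 run. This is a minimal sketch: the best-case times are copied from the table, and `speedup` and `efficiency` are hypothetical helper names.

```python
# Best-case "DotP, primitives" times in ms on greyarea, from the table above,
# keyed by the number of parallel OS threads P.
T2_BEST_MS = {1: 934, 2: 474, 4: 238, 8: 120, 16: 65, 32: 38, 64: 28}


def speedup(times_ms: dict, p: int) -> float:
    """Speedup of the P=p run relative to the P=1 run."""
    return times_ms[1] / times_ms[p]


def efficiency(times_ms: dict, p: int) -> float:
    """Parallel efficiency: speedup divided by the thread count."""
    return speedup(times_ms, p) / p


if __name__ == "__main__":
    for p in sorted(T2_BEST_MS):
        s = speedup(T2_BEST_MS, p)
        e = efficiency(T2_BEST_MS, p)
        print(f"P={p}: speedup {s:.2f}, efficiency {e:.2f}")
```

With these figures, efficiency stays above 0.9 up to P=8, the number of physical cores, and falls to about 0.5 at P=64, where eight hardware threads share each core. Execution time still decreases at every step.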
    3353
    3454----