Changes between Version 14 and Version 15 of DataParallel/WorkPlan


Timestamp: Mar 10, 2009 5:47:55 AM
Author: chak

Legend:

Unmodified (no prefix)
Added (prefix `+`)
Removed (prefix `-`)
Modified (shown as a `-`/`+` pair)
  • DataParallel/WorkPlan

== Work plan for implementing Data Parallel Haskell ==

+ === Issues that need discussion and planning ===

+  * '''Unlifted functions:''' specify the exact behaviour of this optimisation and how the unliftedness of a named function is propagated across module boundaries.

+  * '''Code blow up:''' what do we do about dictionary instances, their inlining, floating, and sharing?

+  * '''More expressive rewrite rules:''' Roman has some concrete ideas, which we need to discuss.

=== Milestones ===

-  0. '''DUE 9 March.''' Poster for ''Microsoft External Research Symposium''.
+  0. The system should be usable for small applications for the GHC 6.12 release.

=== Task assignments ===

     
''Manuel''::
-    '''Benchmark status''' & '''Desugaring comprehensions'''
-    – status: in the middle of cleaning up and measuring the benchmarks
+    '''Desugaring comprehensions''' & '''Benchmark status'''
+    – status: not started on the comprehensions; waiting for the code blow up problem to be improved before continuing with the benchmarks

=== Open tasks ===
     
Category: ''Efficiency'' (improve scalability and/or baseline performance of generated code):

-  * '''Replicate:''' Implement an extended array representation that uses an optimised representation for arrays that are the result of common forms of replication (i.e., due to free variables in lifted expressions).  The optimised representation stores the data to be replicated and the replication count(s) instead of actually replicating the data.  This also requires all functions consuming arrays to be adapted.
+  * '''Replicate:''' Implement an extended array representation that uses an optimised representation for arrays that are the result of common forms of replication (i.e., due to free variables in lifted expressions).  The optimised representation stores the data to be replicated and the replication count(s) instead of actually replicating the data.  This also requires all functions consuming arrays to be adapted.  We may have a new take on this; see '''More expressive rewrite rules'''.
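As a rough sketch of what such a representation could look like (illustrative names and types, not the actual package dph ones):

```haskell
-- Hypothetical array type: the result of a replication is stored
-- virtually, as the element plus the replication count, instead of
-- being expanded into memory.
data Arr a
  = Flat [a]            -- ordinary, fully materialised array
  | Replicated Int a    -- 'Replicated n x' stands for n copies of x

lengthA :: Arr a -> Int
lengthA (Flat xs)        = length xs
lengthA (Replicated n _) = n

indexA :: Arr a -> Int -> a
indexA (Flat xs)        i = xs !! i
indexA (Replicated _ x) _ = x

-- Every consumer must handle both forms; some, like a sum, can
-- exploit the replicated form directly instead of expanding it:
sumA :: Num a => Arr a -> a
sumA (Flat xs)        = sum xs
sumA (Replicated n x) = fromIntegral n * x
```

This is also why the task notes that all array-consuming functions need to be adapted.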

+  * '''More expressive rewrite rules:''' More expressive rewrite rules may enable us to handle many of the problems with replicate and friends using rules alone.  (At least if the proper inlining happens.)
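GHC's `RULES` pragma is the natural mechanism here; the following is only a sketch of the kind of rule that might apply, written over the dph combinators `mapP`, `replicateP`, and `zipWithP`:

```haskell
-- Sketch only: the second rule assumes the replication count matches
-- lengthP ys (which vectorisation arranges), and both rules only fire
-- if inlining exposes the combinators at the right simplifier phase.
{-# RULES
"mapP/replicateP"     forall f n x.    mapP f (replicateP n x) = replicateP n (f x)
"zipWithP/replicateP" forall f n x ys. zipWithP f (replicateP n x) ys = mapP (f x) ys
  #-}
```

Whether rules of this shape are expressive enough is exactly the point that needs discussing.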

 * '''Recycling:''' Use Roman's recycling optimisation (PADL'09) to avoid copying in `joinD`.

-  * '''Scaling:''' Investigate the scaling problems that we are seeing with vectorised code at the moment.  ('''Replicate''' and '''Recycling''' play a role here, but it is unclear whether that's all.  Some benchmarks are simply memory-bandwidth limited.)

 * '''Test new inliner:''' Retest package dph with the new inliner and the simplifier changes, and try to simplify the library on the basis of these new phases.
     
 * '''Desugaring comprehensions:''' The current desugaring of array comprehensions produces very inefficient code.  This needs to be improved.  In particular, fix the `map/crossMap` base case in `dotP` for `x<-xs`, and use `zipWith` instead of `map/zip`.
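The intended improvement, shown with ordinary list combinators (the parallel-array versions `sumP`/`zipP`/`zipWithP` behave analogously):

```haskell
-- What the current desugaring roughly produces for the dotP base
-- case: an intermediate array of pairs that fusion then has to
-- eliminate again.
dotViaZip :: Num a => [a] -> [a] -> a
dotViaZip xs ys = sum (map (\(x, y) -> x * y) (zip xs ys))

-- What the desugarer should emit instead: zipWith combines both
-- traversals and never builds the pair array.
dotViaZipWith :: Num a => [a] -> [a] -> a
dotViaZipWith xs ys = sum (zipWith (*) xs ys)
```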

+  * '''Unlifted functions:''' For some scalar functions (especially numeric functions and functions operating on enumerations), it is easier not to lift them at all than to lift them and then attempt to recover the original form with fusion and other optimisations.  An example is the `SumSq` benchmark, where we have `sumP (mapP (\x -> x * x) [:1..n:])`.  Here, we would rather not lift `\x -> x * x` at all.  Roman implemented a very simple form of this idea (which works for `SumSq`).  However, we would like this in a more general form, where named functions that remain unlifted are marked suitably, as a function can only remain unlifted if all functions it calls are unlifted, too.
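Concretely, for `SumSq` (a sketch, with lists standing in for parallel arrays `[:.:]`):

```haskell
-- The scalar function we would prefer to leave unlifted:
square :: Int -> Int
square x = x * x

-- Full vectorisation instead replaces it by a lifted version that
-- works on whole arrays, which fusion must then boil back down into
-- a tight loop:
squareL :: [Int] -> [Int]
squareL = map square

-- The benchmark itself; keeping 'square' scalar and applying it
-- elementwise inside the traversal avoids that round trip.
sumSq :: Int -> Int
sumSq n = sum (map square [1 .. n])   -- dph: sumP (mapP (\x -> x * x) [:1..n:])
```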

Category: ''Compile time'' (improve compile times):

-  * '''Code blow up:''' GHC generates a lot of intermediate code when vectorisation is enabled, leading to excessive compilation times.  Find out whether the new inliner helped here and what else can be done to improve this situation.
+  * '''Code blow up:''' GHC generates a lot of intermediate code when vectorisation is enabled, leading to excessive compilation times.  It all appears to come down to the treatment of dictionary instances.  We need a plan for how to make progress here.

Category: ''Ease of use'' (make the system easier or more convenient to use for end users):
     
 * '''Hierarchical matrix representation:''' Sparse matrices can be space-efficiently represented by recursively decomposing them into four quadrants.  Decomposition stops if a quadrant is smaller than a threshold or contains only zeros.  Multiplication of such matrices is straightforward using Strassen's divide-and-conquer scheme, which is popular for parallel implementations.  Other operations, such as transposing a matrix, can also be implemented efficiently.  The plan is to experiment with the implementation of some BLAS routines using this representation.
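A minimal sketch of the representation and of divide-and-conquer multiplication (our own illustrative types; a real implementation would use parallel arrays, a larger dense leaf block, and Strassen's 7-multiplication refinement of the plain block recursion shown here):

```haskell
-- A matrix quadrant is all zeros, a dense leaf (threshold 1x1 here),
-- or four sub-quadrants: north-west, north-east, south-west, south-east.
data QMat a
  = Zero
  | Leaf a
  | Quad (QMat a) (QMat a) (QMat a) (QMat a)
  deriving Show

addQ :: Num a => QMat a -> QMat a -> QMat a
addQ Zero m = m
addQ m Zero = m
addQ (Leaf x) (Leaf y) = Leaf (x + y)
addQ (Quad a b c d) (Quad e f g h) =
  Quad (addQ a e) (addQ b f) (addQ c g) (addQ d h)
addQ _ _ = error "addQ: mismatched shapes"

-- Zero quadrants cut off entire sub-computations, which is where the
-- sparsity pays off.
mulQ :: Num a => QMat a -> QMat a -> QMat a
mulQ Zero _ = Zero
mulQ _ Zero = Zero
mulQ (Leaf x) (Leaf y) = Leaf (x * y)
mulQ (Quad a b c d) (Quad e f g h) =
  Quad (addQ (mulQ a e) (mulQ b g)) (addQ (mulQ a f) (mulQ b h))
       (addQ (mulQ c e) (mulQ d g)) (addQ (mulQ c f) (mulQ d h))
mulQ _ _ = error "mulQ: mismatched shapes"
```

The four recursive multiplications per quadrant are independent, which is what makes the scheme attractive for a parallel implementation.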

-  * '''Benchmark status:''' Update and complete [wiki:DataParallel/BenchmarkStatus]; at the same time clean up the benchmark portion of the repo.
+  * '''Benchmark status:''' Once the code blow up problem is under control, update and complete those benchmarks on [wiki:DataParallel/BenchmarkStatus] that didn't work earlier.

 * '''N-body:''' Get a fully vectorised n-body code to run and scale well on !LimitingFactor.
     
=== Done ===

+  * '''Scaling:''' Investigate the scaling problems that we are seeing with vectorised code at the moment.  ('''Replicate''' and '''Recycling''' play a role here, but it is unclear whether that's all.  Some benchmarks are simply memory-bandwidth limited.)  [So far, we only found scaling problems due to memory bandwidth of the tested architecture.  Scaling on the Sun T2 is excellent.]

+  * '''DUE 9 March.''' Poster for ''Microsoft External Research Symposium''.  [Submitted to MER.]

+  * '''Benchmark status:''' Update and complete [wiki:DataParallel/BenchmarkStatus]; at the same time clean up the benchmark portion of the repo.  [Completed a first sweep through this with updated benchmark results for !SumSq, DotP, and SMVM, cleaned up code, and a much better idea of what the most important work items from now on are.]

 * '''New build system:''' Evaluate whether the preview of the new build system looks like it is what we want.  [The new build system seems fine and we should have no problem building package dph in stage2 either.]