Changes between Initial Version and Version 1 of Parallel/SparkProfiles


Ignore:
Timestamp:
Nov 9, 2006 12:48:40 PM (7 years ago)
Author:
simonpj
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Parallel/SparkProfiles

    v1 v1  
     1= Working notes on spark profiling = 
     2 
     3This page is just working notes, not really intended for public consumption (though you are welcome to peek). It records work on spark profiling, led by Tim Harris. 
     4 
     5 * Add ticky calls, expressed as macros in HC files, which either 
     6   generates instrumentation calls, or calls to spark things. 
     7   The Haskell program isn't recompiled. 
     8 
     9 * The instrumentation calls record, in an in-memory log, the following 
     10   events, each time-stamped (allocation bytes or processor time) 
     11   * thunk allocation( address of thunk, alloc site-id ) 
     12     [hack to ensure site-id is unique across compilation units] 
     13   * thunk entry( address ) 
     14   * thunk update( address ) 
     15   * start running thread( thread id, time ) 
     16   * stop running thread( thread id ) 
     17 
     18 * At GC time, process the in-memory log, generate unique ids for 
     19   each thunk (stable across the whole run), output text version 
     20   into log file 
     21 
     22 * Off-line processing consumes log file and decides which allocation  
     23   site should be a spark.  Heuristics: 
     24   * decent delay between allocation and entry 
     25   * decent granularity 
     26   * most instances are entered (e.g. 7/8) 
     27 
     28 * Take log, use thread-switching info to generate a log 
     29   for each individual thread, with time stamps virtualised to 
     30   that thread's life only. Now we have a per-thread log. 
     31 
     32 * Also use log to generate parallelism profiles under wildly 
     33   optimistic assuptions of cost 
     34 
     35 * Consider 
     36{{{ 
     37let x = z+1 
     38    y = z+1 
     39in p + x + y 
     40}}} 
     41   If (+) evaluates left-right, the log will show that  
     42   * x is expensive but y is cheap (because z is evaluated by x) 
     43   * x isn't need until some time after its allocation 
     44     (because of the cost of evaluating p) 
     45  Hence we'll decide to spark x but not y... which is precisel 
     46  what we want!! 
     47 
     48  But if (+) evaluates the right-left, we won't spark either x or y! 
     49  (You can use seq to control this, of course.) 
     50 
     51 * Note [Tension between thunks and parallelism] 
     52{{{ 
     53      let t2 = fib 40 
     54      in case fib 40 of r1 -> case t2 of r2 -> r1+r2 
     55  vs 
     56      case fib 40 of r1 -> case fib 40 of r2 -> r1+r2 
     57}}} 
     58   GHC will convert the former to the latter, which will 
     59   lose sparking opportunities 
     60 
     61 * Idea 
     62{{{ 
     63        case f x of (,) -> let y = e in body 
     64  ==>    
     65        let y = e in case f x of (,) -> body 
     66}}} 
     67   This affects (up or down) the number of variables 
     68   saved across the case.  But it's GOOD for parallelism 
     69   because y is allocated sooner. 
     70 
     71 * Idea: display info about thunks that are 
     72   * always entered 
     73   * never entered 
     74  Feedback to programmer; give more clues to strictness 
     75  analyser. 
     76 
     77== Challenges == 
     78 
     79 * Making the profile information robust, that it can be used 
     80   on the *source* program, perhaps after small modifications 
     81 
     82 * Address Note [Tension between thunks and parallelism]