Changes between Initial Version and Version 1 of Parallel/SparkProfiles


Ignore:
Timestamp:
Nov 9, 2006 12:48:40 PM (9 years ago)
Author:
simonpj
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Parallel/SparkProfiles

    v1 v1  
     1= Working notes on spark profiling =
     2
     3This page is just working notes, not really intended for public consumption (though you are welcome to peek). It records work on spark profiling, led by Tim Harris.
     4
     5 * Add ticky calls, expressed as macros in HC files, which either
     6   generates instrumentation calls, or calls to spark things.
     7   The Haskell program isn't recompiled.
     8
     9 * The instrumentation calls record, in an in-memory log, the following
     10   events, each time-stamped (allocation bytes or processor time)
     11   * thunk allocation( address of thunk, alloc site-id )
     12     [hack to ensure site-id is unique across compilation units]
     13   * thunk entry( address )
     14   * thunk update( address )
     15   * start running thread( thread id, time )
     16   * stop running thread( thread id )
     17
     18 * At GC time, process the in-memory log, generate unique ids for
     19   each thunk (stable across the whole run), output text version
     20   into log file
     21
     22 * Off-line processing consumes log file and decides which allocation
     23   site should be a spark.  Heuristics:
     24   * decent delay between allocation and entry
     25   * decent granularity
     26   * most instances are entered (e.g. 7/8)
     27
     28 * Take log, use thread-switching info to generate a log
     29   for each individual thread, with time stamps virtualised to
     30   that thread's life only. Now we have a per-thread log.
     31
     32 * Also use log to generate parallelism profiles under wildly
     33   optimistic assuptions of cost
     34
     35 * Consider
     36{{{
     37let x = z+1
     38    y = z+1
     39in p + x + y
     40}}}
     41   If (+) evaluates left-right, the log will show that
     42   * x is expensive but y is cheap (because z is evaluated by x)
     43   * x isn't need until some time after its allocation
     44     (because of the cost of evaluating p)
     45  Hence we'll decide to spark x but not y... which is precisel
     46  what we want!!
     47
     48  But if (+) evaluates the right-left, we won't spark either x or y!
     49  (You can use seq to control this, of course.)
     50
     51 * Note [Tension between thunks and parallelism]
     52{{{
     53      let t2 = fib 40
     54      in case fib 40 of r1 -> case t2 of r2 -> r1+r2
     55  vs
     56      case fib 40 of r1 -> case fib 40 of r2 -> r1+r2
     57}}}
     58   GHC will convert the former to the latter, which will
     59   lose sparking opportunities
     60
     61 * Idea
     62{{{
     63        case f x of (,) -> let y = e in body
     64  ==>   
     65        let y = e in case f x of (,) -> body
     66}}}
     67   This affects (up or down) the number of variables
     68   saved across the case.  But it's GOOD for parallelism
     69   because y is allocated sooner.
     70
     71 * Idea: display info about thunks that are
     72   * always entered
     73   * never entered
     74  Feedback to programmer; give more clues to strictness
     75  analyser.
     76
     77== Challenges ==
     78
     79 * Making the profile information robust, that it can be used
     80   on the *source* program, perhaps after small modifications
     81
     82 * Address Note [Tension between thunks and parallelism]