Changes between Version 22 and Version 23 of Frisby2013Q1


Timestamp:
Feb 24, 2013 9:02:25 PM (3 years ago)
Author:
nfrisby
Comment:

--

  • Frisby2013Q1

    v22 v23  
    4040 * KNOWN_CALL_TOO_FEW_ARGS_ctr - "PAP": creates a Partial APplication closure
    4141 * KNOWN_CALL_EXTRA_ARGS_ctr   - like a fast call, but first push a continuation onto the stack that effectively uses stg_ap_<PAT>_fast for the extra args
    42 
    4342 * ALLOC_HEAP_tot is in words, and so are the allocation numbers for each id
    4443
     
    4645
    4746TODO and _what you can learn by looking at each one_
     47
     48=== The !NoFib Experimental Method ===
     49
     50 * start with -O1, both for the libraries and for the individual test programs
     51   * proceed to -O2 once you've identified the primary interesting scenarios
     52   * (also saves some compile time)
     53 * always compile with -ticky
     54 * to manage compilation of the libraries, use build.mk's !GhcLibOpts
     55 * to manage compilation of the nofib tests, use
     56    * nofib's EXTRA_HC_OPTS="..." command line parameter
     57    * or build.mk's !NofibHcOpts
     58    * I recommend EXTRA_HC_OPTS because it's more flexible and more explicit
     59 * there are four basic combinations
     60   * (1) libs w/ baseline    , (nofib) tests w/ baseline
     61   * (2) libs w/ baseline    , (nofib) tests w/ your change
     62   * (3) libs w/ your change , (nofib) tests w/ baseline
     63   * (4) libs w/ your change , (nofib) tests w/ your change
     64 * ultimately the comparison that matters is (1) versus (4)
     65 * though, (1) versus (2) and (1) versus (3) isolate the changes
     66   * for allocation changes, ticky usually makes isolation a moot point
     67   * for runtime, though, there are so many factors and so much noise that isolating the changes can benefit your sanity
     68 * get a reproducible measurement before you start inspecting code
     69   * alloc is *almost* always consistent
     70   * runtime is rarely :/ so I iteratively crank up the iterations
     71     * use nofib's !NoFibRuns command line parameter
     72     * alternatively, use nofib's mode=slow, mode=norm, mode=fast command line flags
     73 * inspect the ticky per-closure allocation, per-closure entry, and the general counters to hopefully isolate the change
     74    * run the test (just once) with `+RTS -rFILENAME -RTS`
     75      * use nofib's EXTRA_RUNTEST_OPTS="+RTS -rFILENAME -RTS"
     76    * for allocation, the only subtlety is that allocation "inside" an LNE is assigned to the (most recent non-LNE) caller
     77       * (also, it seems that we're not tracking allocation by the array cloning primops, but I don't know how prevalent that is)
     78    * for runtime, entry counts and the general counters tracking the variety of closure entries will hopefully help
     79      * cf [#TickyCounters]
     80  * inspect the compilation outputs for differences
     81    * cf [#CoreDiving] [#Core->STG->CMM]
     82  * allocation changes probably won't require more work (unless it's delicate GC stuff, I suppose)
     83  * for runtime, isolate the changes
     84    * for a given change, write a simpler test that hammers just that code in order to estimate its effect
     85    * if the change seems to be in a library, slice out the relevant code into its own module so you can mutate it to experiment
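As a rough illustration of how the nofib parameters named in the list above fit together on one command line, here is a minimal sketch. It only prints the `make` invocation rather than running it. The `nofib_cmd` helper and the `-fmy-change` flag are hypothetical stand-ins (the flag represents whatever enables "your change"); only EXTRA_HC_OPTS, !NoFibRuns, -O1 and -ticky come from the notes.

```shell
#!/bin/sh
# Print (not run) a nofib make invocation built only from parameters
# named in the notes: EXTRA_HC_OPTS, NoFibRuns, -O1 and -ticky.
# nofib_cmd is a hypothetical helper, not part of nofib.
nofib_cmd() {
    change_flags=$1   # hypothetical flag(s) enabling your change, or "" for baseline
    runs=$2           # value to pass as NoFibRuns
    opts="-O1 -ticky" # start with -O1 and always compile with -ticky
    if [ -n "$change_flags" ]; then
        opts="$opts $change_flags"
    fi
    echo "make EXTRA_HC_OPTS=\"$opts\" NoFibRuns=$runs"
}

# Baseline tests, then tests with the (hypothetical) change enabled.
nofib_cmd "" 5
nofib_cmd "-fmy-change" 5
```

Printing the command first makes it easy to confirm that the baseline and changed variants differ only in the flag under study before spending time on a full run.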
     86
     87Running the full nofib suite with one set of flags and then again with another is fine for allocation, but pretty bad for runtime. The two measurements for a particular test are separated by a large amount of time, so the machine load can differ between them. My workaround has been a hacky shell script that transposes the two for-loops: outer for tests, inner for compilation method.
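The transposed loop structure could look roughly like the sketch below. This is not the script mentioned above: the test names, the variant names, and the `run_variant` helper are all hypothetical, and the helper only prints what it would measure so that the loop order can be checked without a nofib tree.

```shell
#!/bin/sh
# Sketch of the "transposed" loops: tests on the outside, compilation
# methods (variants) on the inside, so that all measurements of a given
# test are taken close together in time.

# Hypothetical placeholder: a real script would build and run the test
# with the given variant here; this one only reports the pairing.
run_variant() {
    test_name=$1
    variant=$2
    echo "measure $test_name with $variant"
}

# Outer loop over tests, inner loop over variants.
transposed_runs() {
    tests=$1      # space-separated test names (hypothetical)
    variants=$2   # space-separated variant names (hypothetical)
    for t in $tests; do
        for v in $variants; do
            run_variant "$t" "$v"
        done
    done
}

transposed_runs "testA testB" "baseline idea1"
```

With this ordering, each test's baseline and changed measurements are adjacent, instead of being separated by a pass over the whole suite.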
     88
     89Moreover, I usually include the baseline variant twice. For example, I'll compare "baseline" "idea #1" "baseline again" "idea #2"; this has two benefits.
     90  * if the two baseline runtimes are significantly different, then there's too much noise
     91  * (I'm unsure about this) it leaves the machine in a comparable state before executing the two ideas
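The first benefit can be turned into a simple check, sketched below. The `noisy` helper and the 5% threshold are assumptions made for illustration; the notes do not say how large a difference counts as "significant".

```shell
#!/bin/sh
# noisy BASE1 BASE2 THRESHOLD_PERCENT
# Prints "noisy" when the two baseline runtimes differ by more than the
# threshold (relative to the first baseline), otherwise prints "ok".
# Hypothetical helper; the threshold is an assumption, not from the notes.
noisy() {
    awk -v a="$1" -v b="$2" -v pct="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d                 # absolute difference
        if (d * 100 > pct * a) print "noisy"
        else print "ok"
    }'
}

noisy 1.00 1.02 5   # baselines 2% apart, 5% threshold
noisy 1.00 1.20 5   # baselines 20% apart, 5% threshold
```

If the check reports noise, the comparisons between the ideas and the baseline from that run are probably not trustworthy, and increasing the number of runs is the natural next step.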
    4892
    4993== Late Lambda Float ==