Changes between Version 22 and Version 23 of Frisby2013Q1


Timestamp:
Feb 24, 2013 9:02:25 PM (14 months ago)
Author:
nfrisby
Comment:

--

  • Frisby2013Q1

    v22 v23  
    4040 * KNOWN_CALL_TOO_FEW_ARGS_ctr - "PAP": creates a Partial APplication closure 
    4141 * KNOWN_CALL_EXTRA_ARGS_ctr   - like a fast call, but first push a continuation onto the stack that effectively uses stg_ap_<PAT>_fast for the extra args 
    42  
    4342 * ALLOC_HEAP_tot is in words, and so are the allocation numbers for each id 
    4443 
     
    4645 
    4746TODO and _what you can learn by looking at each one_ 
     47 
     48=== The !NoFib Experimental Method === 
     49 
     50 * start with -O1, both for the libraries and for the individual test programs 
     51   * proceed to -O2 once you've identified the primary interesting scenarios 
     52   * (also saves some compile time) 
     53 * always compile with -ticky 
     54 * to manage compilation of the libraries, use build.mk's !GhcLibOpts 
     55 * to manage compilation of the nofib tests, use 
     56    * nofib's EXTRA_HC_OPTS="..." command line parameter 
     57    * or build.mk's !NofibHcOpts 
     58    * I recommend EXTRA_HC_OPTS because it's more flexible and more explicit 
     59 * there are four basic combinations 
     60   * (1) libs w/ baseline    , (nofib) tests w/ baseline 
     61   * (2) libs w/ baseline    , (nofib) tests w/ your change 
     62   * (3) libs w/ your change , (nofib) tests w/ baseline 
     63   * (4) libs w/ your change , (nofib) tests w/ your change 
     64 * ultimately the comparison that matters is (1) versus (4) 
     65 * however, (1) versus (2) and (1) versus (3) isolate the changes: (2) changes only the tests, (3) changes only the libraries 
     66   * for allocation changes, ticky usually makes isolation a moot point 
     67   * for runtime, though, there are so many factors and so much noise that isolating the changes can benefit your sanity 
     68 * get a reproducible measurement before you start inspecting code 
     69   * alloc is *almost* always consistent 
     70   * runtime rarely is :/ so I iteratively crank up the number of runs 
     71     * use nofib's !NoFibRuns command line parameter 
     72     * alternatively, use nofib's mode=slow, mode=norm, mode=fast command line flags 
     73 * inspect the ticky per-closure allocation, per-closure entry, and the general counters to hopefully isolate the change 
     74    * run the test (just once) with `+RTS -rFILENAME -RTS` 
     75      * use nofib's EXTRA_RUNTEST_OPTS="+RTS -rFILENAME -RTS" 
     76    * for allocation, the only subtlety is that allocation "inside" an LNE is assigned to the (most recent non-LNE) caller 
     77       * (also, it seems that we're not tracking allocation by the array cloning primops, but I don't know how prevalent that is) 
     78    * for runtime, entry counts and the general counters tracking the variety of closure entries will hopefully help 
     79      * cf [#TickyCounters] 
     80  * inspect the compilation outputs for differences 
     81    * cf [#CoreDiving] [#Core->STG->CMM] 
     82  * allocation changes probably won't require more work (unless it's delicate GC stuff, I suppose) 
     83  * for runtime, isolate the changes 
     84    * for a given change, write a simpler test that hammers just that code in order to estimate its effect 
     85    * if the change seems to be in a library, slice out the relevant code into its own module so you can mutate it to experiment 
     86 
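As a rough illustration of how the command line parameters above combine, here is a small shell sketch. It only prints the make command lines and does not run nofib. The helper name `nofib_cmd`, the output file names, and the `-fmy-change` flag are hypothetical placeholders, and the exact make invocation is an assumption about your setup.

```shell
#!/bin/sh
# Sketch only: prints a nofib command line instead of running it.
# nofib_cmd HC_OPTS RUNS TICKY_FILE
#   HC_OPTS    - compiler flags passed via EXTRA_HC_OPTS
#   RUNS       - number of runs passed via NoFibRuns
#   TICKY_FILE - file name handed to the RTS -r flag (ticky output)
nofib_cmd() {
    printf 'make EXTRA_HC_OPTS="%s" NoFibRuns=%s EXTRA_RUNTEST_OPTS="+RTS -r%s -RTS"\n' \
        "$1" "$2" "$3"
}

# Tests w/ baseline: a single run, ticky output in a hypothetical file.
nofib_cmd "-O1 -ticky" 1 baseline.ticky

# Tests w/ your change: -fmy-change is a made-up flag standing in for it.
nofib_cmd "-O1 -ticky -fmy-change" 1 change.ticky
```

Printing the command first makes it easy to check the quoting of `EXTRA_HC_OPTS` and `EXTRA_RUNTEST_OPTS` before committing to a long run.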
     87Running the full nofib suite with one set of flags and then again with another is fine for allocation, but pretty bad for runtime. The two measurements for a particular test are separated by a large amount of time, so the machine load may differ between them. My workaround has been a hacky shell script that transposes the two for-loops: outer for tests, inner for compilation method. 
     88 
     89Moreover, I usually include the baseline variant twice. For example, I'll compare "baseline" "idea #1" "baseline again" "idea #2"; this has two benefits. 
     90  * if the two baseline runtimes are significantly different, then there's too much noise 
     91  * (I'm unsure about this) it leaves the machine in a comparable state before executing the two ideas 
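The transposed loops described above might look roughly like the following. This is not the actual script, just a minimal sketch that prints the schedule (outer loop over tests, inner loop over compilation methods, baseline included twice) rather than building or running anything. The function name `plan_runs`, the variant labels, and the choice of test directories are placeholders for illustration.

```shell
#!/bin/sh
# Sketch only: prints the interleaved schedule; it builds and runs nothing.
# plan_runs VARIANTS TEST...
#   VARIANTS - space-separated labels of compilation methods
#   TEST...  - test directories (placeholders here)
plan_runs() {
    variants=$1
    shift
    # Outer loop: tests. Inner loop: compilation methods.
    # All variants of one test therefore run back to back.
    for test_dir in "$@"; do
        for variant in $variants; do
            printf '%s %s\n' "$test_dir" "$variant"
        done
    done
}

# Baseline appears twice so the two baseline runtimes can serve as a
# noise check.
plan_runs "baseline idea1 baseline-again idea2" spectral/simple real/fulsom
```

In a real script each printed line would be replaced by whatever builds and runs that test with that variant's flags.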
    4892 
    4993== Late Lambda Float ==