Changes between Version 12 and Version 13 of Frisby2013Q1


Timestamp:
Feb 12, 2013 6:31:06 PM (14 months ago)
Author:
nfrisby
Comment:

--

  • Frisby2013Q1

    v12 v13  
    99== general core knowledge == 
    1010 
    11   * Max's [wiki:Commentary/Compiler/GeneratedCode page about code generation]: really good! 
    12   * document ticky profiling 
    13   * Core -> STG -> CMM and _what you can learn by looking at each one_ 
    14   * let-no-escape (LNE) 
     11[wiki:Commentary/Compiler/GeneratedCode Max's page about code generation] is really helpful! 
     12 
     13=== Ticky Counters === 
     14 
     15TODO 
     16 
     17 * UNKNOWN_CALL_ctr            - put arguments on stack and call the RTS's stg_ap_<PAT>_fast routine for that argument pattern 
     18 * KNOWN_CALL_ctr              - "fast call": put arguments in registers and call the function's fast entry point 
     19 * KNOWN_CALL_TOO_FEW_ARGS_ctr - "PAP": creates a Partial APplication closure 
     20 * KNOWN_CALL_EXTRA_ARGS_ctr   - like a fast call, but first push a continuation onto the stack that effectively uses stg_ap_<PAT>_fast for the extra args 
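
As a rough source-level illustration of the four call kinds listed above, here is a minimal sketch. All function names are hypothetical, and which counter a given call actually bumps depends on the arities GHC infers and on optimisation (for instance, `pick` may be eta-expanded to arity 2, which would turn the "extra args" call into an ordinary known call), so treat the comments as the intended reading rather than a guarantee.

```haskell
-- Arity-2 top-level function; NOINLINE keeps the call sites below as calls.
add2 :: Int -> Int -> Int
add2 x y = x + y
{-# NOINLINE add2 #-}

-- Written with one argument and returning a function. ASSUMPTION: GHC keeps
-- arity 1 here; it is free to eta-expand this definition instead.
pick :: Bool -> Int -> Int
pick b = if b then (+ 1) else subtract 1
{-# NOINLINE pick #-}

-- Callee and arity are statically visible and the call is saturated:
-- the kind of call KNOWN_CALL_ctr describes.
knownCall :: Int -> Int
knownCall n = add2 n 1

-- add2 is applied to one of its two arguments, so a partial application
-- closure is built: the kind of call KNOWN_CALL_TOO_FEW_ARGS_ctr describes.
tooFewArgs :: Int -> [Int] -> [Int]
tooFewArgs n xs = map (add2 n) xs

-- If pick has arity 1, it is called with one argument more than its arity:
-- the kind of call KNOWN_CALL_EXTRA_ARGS_ctr describes.
extraArgs :: Bool -> Int -> Int
extraArgs b n = pick b n

-- f is lambda-bound, so nothing is known about it at the call site:
-- the kind of call UNKNOWN_CALL_ctr describes.
unknownCall :: (Int -> Int -> Int) -> Int -> Int
unknownCall f n = f n 1
{-# NOINLINE unknownCall #-}
```

Compiling a module like this with ticky enabled and comparing the counters per function is one way to check these readings against a particular GHC build.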
     21 
     22=== Core -> STG -> CMM === 
     23 
     24TODO and _what you can learn by looking at each one_ 
    1525 
    1626== Late Lambda Float == 
     
    7181 
    7282  * join points 
    73   * let-no-escape 
     83  * let-no-escape (LNE) 
    7484  * Note [join point abstraction] 
    7585 
     
    114124==== Preserving Fast Entries ==== 
    115125 
    116 TODO 
    117  
    118 Using -flate-float-in-thunk-limit=10, -fprotect-last-arg, and -O1, I tested the libraries+NoFib for four variants. 
     126The first idea here was simply: do not float a binding if its RHS applies a free variable. 
     127 
     128But since the idea was to avoid losing fast entries, this only applies to saturated and oversaturated calls. As a sanity check, however, I added two flags. 
     129 
     130  * `-f(no-)late-float-abstract-undersat-var` don't allow undersaturated applications 
     131  * `-f(no-)late-float-abstract-sat-var`      don't allow saturated or oversaturated applications 
     132 
     133Ever since, I've been doing parameter sweeps over these as we make other refinements to the system. 
    119134 
    120135  * nn - do not float a binding that applies one of its free variables. 
    121136  * yn - do not float a binding that applies one of its free variables saturated or oversaturated. 
    122137  * ny - do not float a binding that applies one of its free variables undersaturated. 
    123   * yy - do not restrict application of bindings free variables 
    124  
    125 Roughly, we expect that more floating means (barely) less allocation but worse runtime (by how much?) because some known calls become unknown calls. 
     138  * yy - do not restrict application of the binding's free variables 
     139 
     140I have yet to see clear results; no variant has bested the others on most programs' runtime. 
     141 
     142This is even after I developed some bash script (I'm so sorry) to transpose the NoFib loops; instead of running the entire NoFib suite for one set of switches and then running it again for the next set of switches, and so on, I build all the variants, and then run each variant's version of each program sequentially. I intend for this to reduce noise by improving the time locality of the measurements of the same test. Even so, the noise was bad. 
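
The loop transposition described above amounts to swapping the nesting order of two loops. The following is not the bash script itself, only a small Haskell restatement of the two run orders; the type and function names are hypothetical.

```haskell
type Variant = String
type Program = String

-- Original order: run the whole suite for one variant, then for the next.
suiteMajor :: [Variant] -> [Program] -> [(Variant, Program)]
suiteMajor variants programs = [ (v, p) | v <- variants, p <- programs ]

-- Transposed order: for each program, run every variant's build back to back,
-- so measurements of the same test are close together in time.
programMajor :: [Variant] -> [Program] -> [(Variant, Program)]
programMajor variants programs = [ (v, p) | p <- programs, v <- variants ]
```

For example, `programMajor ["nn", "yn"] ["fish", "hpg"]` runs both variants of fish before moving on to hpg.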
     143 
     144So I turned my attention to allocation instead, for now. Roughly, we expect that more floating means (barely) less allocation but worse runtime (by how much?) because some known calls become unknown calls. But, eg, going from nn -> yn --- ie floating functions that undersaturate free variables instead of not floating them --- caused worse allocation! This investigation led to [#MitigatingLNEAbstraction]. 
     145 
     146Based on that example, it occurred to me that we should only restrict the binding's saturation of its *known* free variables. For example, I was not floating a binding because its RHS applied a free variable, even though that free variable was lambda bound. That decision has no benefit, and indeed was causing knock-on effects that increase allocation (eg [#MitigatingLNEAbstraction]). 
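
To make the known versus lambda-bound distinction concrete, here is a small source-level sketch. The names are hypothetical and the code is not taken from NoFib or the libraries; GHC's optimiser may also transform these definitions before any late float would see them, so the comments describe the intended reading only.

```haskell
scale :: Int -> Int -> Int
scale k x = k * x
{-# NOINLINE scale #-}

-- `go` applies `f`, which is free in `go` but lambda-bound in `applyAll`.
-- The call `f x` is already an unknown call, so abstracting `go` over `f`
-- cannot lose a fast entry.
applyAll :: (Int -> Int) -> [Int] -> [Int]
applyAll f = go
  where
    go []       = []
    go (x : xs) = f x : go xs

-- Here `go` applies `h`, a let-bound function whose arity is visible, so
-- `h x` is a known, saturated call.
scaleAll :: Int -> [Int] -> [Int]
scaleAll k = go
  where
    h x = scale k x + 1
    go []       = []
    go (x : xs) = h x : go xs

-- Hand-written approximation of floating `go` out of `scaleAll`: `h` becomes
-- a parameter, so the call `h x` is now an unknown call.
goFloated :: (Int -> Int) -> [Int] -> [Int]
goFloated _ []       = []
goFloated h (x : xs) = h x : goFloated h xs
```

Under this reading, restricting the float makes sense only for the `scaleAll` case, where a known call would be lost.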
     147 
     148I have yet to determine that the preservation of fast entries is worth the trouble --- I certainly hope so... the parameter sweeps have taken a lot of time! 
     149 
     150To enable further measurements, I have identified the semantics of some ticky counters, cf [#TickyCounters]. 
    126151 
    127152==== Mitigating LNE Abstraction ==== 
     
    131156NB I think this will be mitigated "for free", since I'm predicting that we will never abstract variables that occur exactly saturated and an LNE binder can only be exactly saturated. If we do end up abstracting over saturated functions, we may want to consider mitigating this separately. 
    132157 
    133 In fish (1.6%), hpg (~4.5%), and sphere (10.4%), allocation gets worse for ny and yy compared to nn and yn. The nn and ny do not change the allocation compared to the baseline library (ie no LLF). 
     158Using -flate-float-in-thunk-limit=10, -fprotect-last-arg, and -O1, I tested the libraries+NoFib for the four variants from [#PreservingFastEntries]. In fish (1.6%), hpg (~4.5%), and sphere (10.4%), allocation gets worse for ny and yy compared to nn and yn. The nn and yn variants do not change the allocation compared to the baseline library (ie no LLF). 
    134159 
    135160The nn -> ny comparison is counter to our rough idea: floating more bindings (those that saturate/oversaturate some free variables) worsens allocation. Thus, I investigate. 
     
    186211We discovered that the worker-wrapper was removing the void argument from join points (eg knights and mandel2). This ultimately resulted in LLF *increasing* allocation. A thunk was let-no-escape before LLF but not after, since it occurred free in the right-hand side of a floated binding and hence now occurred (escapingly) as an argument. 
    187212 
    188 SPJ was expecting no such non-lambda join points to exist. We identified where it was happening (WwLib.mkWorkerArgs) and switched it off. Here are the programs that with affected allocation. 
     213SPJ was expecting no such non-lambda join points to exist. We identified where it was happening (`WwLib.mkWorkerArgs`) and switched it off. Here are the programs with affected allocation. 
    189214 
    190215{{{