| 86 | This is the simplest to mitigate: we do not float `f` if it ever occurs unapplied. |
| 87 | |
| 88 | ==== Mitigating Thunk Growth ==== |
| 89 | |
| 90 | In nucleic2, we floated a binding with 11 free variables. But this binder occurred in about 60 thunks, so many closures grew by ~11 pointers, giving a +2.2% allocation change (as opposed to -0.9%). |
| 91 | |
| 92 | We've considered three heuristics for avoiding this. In ascending complexity: |
| 93 | |
| 94 | 1. (easy) Limit the number of free variables the binding is allowed. |
| 95 | 1. in-thunk: If `f` occurs inside of a thunk in `BODY`, then limit its free variables. |
| 96 | 1. thunk-growth: Approximate the maximum number of free variables that floating `f `would add to a thunk in `BODY`, and limit that. |
| 97 | |
| 98 | We did not implement the first one, since in-thunk is not very complicated. thunk-growth is significantly more complicated. |
| 99 | |
| 100 | * The question of whether `f` occurs in a thunk is not simple. |
| 101 | * We count non-trivial arguments as thunks; but not all non-trivial arguments end up as thunks. |
| 102 | * We do not count lambda-forms as thunks, since the lambda will hopefully be floated. |
| 103 | * Estimating the effect of floating `f` on such a thunk's (call it `t`) closure size is more complicated. |
| 104 | * Another floated function (say `g`) may also add some of `f`'s free variables to `t`; we shouldn't penal both `f` and `g` for that. |
| 105 | * If `f` itself has a free variable, say `h`, which is a binder that gets floated, then floating `f` will also add `h`'s free variables to `t`. |
| 106 | |
| 107 | Therefore, these are rough approximations. Being more accurate would require changing the setLevels pass instead of just the simpler first pass (the one that only analyzes the original term). |
| 108 | |
| 109 | We tried limits of 32, 16, 8, and 4 to differentiate between the last two. At a limit of 8, the allocation increase in ida and puzzle were 1.3 and 2.9 percentage points better with thunk-growth than with in-thunk. But there were no differences around 10 --- which is the lowest we can go while improving nucleic2, anyway --- so we're adopting in-thunk for now. |
| 110 | |
| 111 | There might have been some potential benefits to run-time from thunk-growth versus in-thunk (with limit=8, 35 percentage points better on constraint, eg), but we're not confident in those measurements. |
| 112 | |
| 113 | ==== Preserving Fast Entries ==== |
| 114 | |