Full laziness destroys opportunities for join points
Even if we already know a binding is a join point, we still float it to the top level and turn it into a function.
In the simple example below, the binding becomes a join point after the first simplifier run. Then the float-out pass immediately undoes this by making it a top-level binding.
It then stays at the top level until we are done, resulting in the Core I've put in the comments.
```haskell
data T = A | B | C | D | E | F | G

{-# NOINLINE n #-}
n :: T -> T
n A = B
n B = C
n _ = A

f :: Int -> T -> T -> T
f sel x y =
  -- function large enough to avoid being simply inlined
  let j z = n . n . n . n . n . n $ z
  in case sel of
       -- j is always tail-called
       0 -> j x
       _ -> j y

-- j is floated to top level instead of ending up as a join point:
--
-- T.f_j
--   = \ (eta_B1 [OS=OneShot] :: T) -> n (n (n (n (n (n eta_B1)))))
--
-- -- RHS size: {terms: 14, types: 6, coercions: 0, joins: 0/0}
-- f :: Int -> T -> T -> T
-- f = \ (sel_aYP :: Int) (x_aYQ :: T) (y_aYR :: T) ->
--     case sel_aYP of { GHC.Types.I# ds_d2fL ->
--     case ds_d2fL of {
--       __DEFAULT -> T.f_j y_aYR;
--       0# -> T.f_j x_aYQ
--     }
--     }
```
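For contrast, here is a hand-written sketch (not actual GHC output) of the Core one would hope for if `j` stayed a join point, compiling to a local jump rather than a call to a top-level function:

```
-- Desired Core (sketch): j remains a join point
f = \ (sel :: Int) (x :: T) (y :: T) ->
      join j (z :: T) = n (n (n (n (n (n z)))))
      in case sel of { GHC.Types.I# ds ->
         case ds of {
           __DEFAULT -> jump j y;
           0#        -> jump j x
         }
         }
```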
## Trac metadata

| Trac field | Value |
|---|---|
| Version | 8.4.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler (CodeGen) |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | #14287 |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
## Current plan (as of Dec. 19th, 2023)

### A brief summary
"destroys" is an apt description, but the mechanism is this:
- Some join points get lifted to the top level.
- Because they are lifted, they are no longer join points; instead they become top-level functions, with all the consequences of top-level functions, as this comment points out.
- But these former join points have some nice properties:
  - They are never exported, because they were floated out.
  - All their call sites are known and saturated, again because they began life as join points.
### The Conceptual Plan
So the plan is not to focus on join points, but instead to optimize all top-level functions that are local (i.e., not exported) and whose call sites are all known. Optimizing these functions will, in effect, also optimize the previously-join-point-now-top-level functions.
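As a minimal illustration of the class of functions this targets (a hypothetical example, not code from the ticket):

```haskell
-- Hypothetical example: imagine a module whose export list contains only 'f'.
-- Then 'go' is local (not exported) and every call site of 'go' is known and
-- saturated, so under the plan its info table and closure could be elided.
go :: Int -> Int -> Int
go acc 0 = acc
go acc n = go (acc + n) (n - 1)

f :: Int -> Int
f n = go 0 n
```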
### The Optimizations
SPJ provides a nice overview in this comment:

1. Elide the info-table and closure generation for top-level functions that are local and whose call sites are all known and saturated. This should reduce the code-size increase that occurs from the join point -> top-level function conversion.
2. Elide the stack-overflow check for top-level functions that have the properties of (1) and are tail recursive, call other top-level functions with the same properties, or are not recursive.
3. Eliminate heap checks by absorbing the checks into the caller of the function. This is an optimization orthogonal to this ticket and is currently done neither for join points nor for top-level functions. Please see Simon's comment linked above.
### The Implementation Plan
Implement the optimizations in order:
For (1):
- Add a phase to Stg called `CgPrep` for code-gen prep.
- Absorb the `InferTags` pass into `CgPrep`. `InferTags` already defines a `CgPrep` pass and a `CgInfo` record that is passed to the code generator for each backend, so this item is just a refactoring.
- Add a pass to `CgPrep` that detects and records all `Id`s that are functions and whose call sites are all known and saturated. Pass this set of `Id`s to `CgInfo`.
- For a backend `b`, use the new field in `CgInfo` to elide the code generation for info tables and closures for top-level local functions whose `Id`s are also elements of the new field.

The strategy here is to keep inspection of call sites at `Stg` instead of `StgToCmm`, so that different backends (for example the JS backend) can implement (1) in their own code generators.
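To make the detection step concrete, here is a toy sketch of the kind of call-site analysis such a pass would perform. This is not GHC's actual data types or API; the AST and every name below are hypothetical, purely for illustration:

```haskell
-- Toy sketch (hypothetical, NOT GHC internals): find bindings that are
-- local and whose every use is a saturated direct call.
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- A tiny stand-in for an Stg-like expression language.
data Expr
  = Var String          -- bare occurrence of an id (an unknown/escaping use)
  | App String [Expr]   -- direct call of an id with its arguments
  | Lam [String] Expr

-- A tiny stand-in for a top-level binding and its metadata.
data TopBinding = TopBinding
  { tbName     :: String
  , tbArity    :: Int
  , tbExported :: Bool
  , tbRhs      :: Expr
  }

-- Collect (callee, argument count) for every use in an expression.
-- A bare occurrence is recorded with 0 args, so it only counts as
-- saturated for a nullary binding.
calls :: Expr -> [(String, Int)]
calls (Var v)    = [(v, 0)]
calls (App f as) = (f, length as) : concatMap calls as
calls (Lam _ b)  = calls b

-- Ids that are local (not exported), are used at least once, and whose
-- every use is a saturated direct call: candidates for eliding the
-- info table and closure.
elidableClosures :: [TopBinding] -> S.Set String
elidableClosures binds = S.fromList
  [ tbName b
  | b <- binds
  , not (tbExported b)
  , let uses = M.findWithDefault [] (tbName b) usesOf
  , not (null uses)                 -- conservatively skip unused bindings
  , all (== tbArity b) uses
  ]
  where
    usesOf = M.fromListWith (++)
      [ (callee, [n]) | b <- binds, (callee, n) <- calls (tbRhs b) ]
```

For example, given an exported `f` whose body calls a local `j` with exactly one argument, `elidableClosures` would report `j` but not `f`.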
For (2):
- Todo
For (3):
- Should be tracked in another ticket (Todo)