Opened 6 months ago

Last modified 7 weeks ago

#15560 new bug

Full laziness destroys opportunities for join points

Reported by: AndreasK
Owned by: chessai
Priority: normal
Milestone: 8.10.1
Component: Compiler (CodeGen)
Version: 8.4.3
Keywords: JoinPoints
Cc: sgraf
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #14287 #13286
Differential Rev(s):
Wiki Page:

Description

Even if we already know a binding is a join point, we STILL float it to the top and turn it into a function.

The simple example below results in a join point after the first simplifier run. The float-out pass then runs and immediately undoes this by making it a top-level binding.

It then stays at the top level until we are done, resulting in the Core I've put in the comments below.

data T = A | B | C | D | E | F | G

{-# NOINLINE n #-}
n :: T -> T
n A = B
n B = C
n _ = A

f :: Int -> T -> T -> T
f sel x y =
    -- function large enough to avoid being simply inlined
    let j z = n . n . n . n . n . n $ z
    in case sel of
        -- j is always tailcalled
        0   -> j x
        _   -> j y

-- j is floated to the top level instead of ending up as a join point.
-- T.f_j
--   = \ (eta_B1 [OS=OneShot] :: T) -> n (n (n (n (n (n eta_B1)))))

-- -- RHS size: {terms: 14, types: 6, coercions: 0, joins: 0/0}
-- f :: Int -> T -> T -> T
-- f = \ (sel_aYP :: Int) (x_aYQ :: T) (y_aYR :: T) ->
--       case sel_aYP of { GHC.Types.I# ds_d2fL ->
--       case ds_d2fL of {
--         __DEFAULT -> T.f_j y_aYR;
--         0# -> T.f_j x_aYQ
--       }
--       }

Change History (24)

comment:1 Changed 6 months ago by simonpj

Generally, I think we should not float join points -- in general floating them will stop them being join points. It'd be easy to change SetLevels thus -- but intuition is a poor guide and it'd be really worth trying it and measuring the effect.

However, if the binding can go all the way to the top level, then there seems no downside to floating it: we end up with a smaller (and hence perhaps more inlinable) function; and the jump to the join point is still a nice, efficient jump. What's not to like? (Your Description suggests that you think that doing so is Bad. Why?)

All that said, there is a Huge Mess in SetLevels around join points with stuff about the "join ceiling". I want to expunge all that, but have lacked the time. The trouble is that we do want some limited, local floating -- I have a whole WIP tree pending on that, but it's in limbo.

If anyone would like to work on this, I'll commit my WIP to a branch and offer advice/support on taking it forward.

comment:2 Changed 6 months ago by AndreasK

However, if the binding can go all the way to the top level, then there seems no downside to floating it: we end up with a smaller (and hence perhaps more inlinable) function; and the jump to the join point is still a nice, efficient jump. What's not to like?

There are no top-level join points as far as I know. (I might be mistaken though?) Turning them into a top-level binding means the calling convention changes to the one used for regular functions. So we have the overhead of register saving, stack checks, a layout penalty, the works.

Limiting them to the top of the RHS instead seems like an advantage. Maybe that should already happen and I just hit a bug there? I haven't looked too far into the SetLevels machinery.

Not sure how this would affect inlining. Maybe we should instead try to push join-point-compatible bindings back into the RHS if they are not exported and are called from a single function.
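
For illustration, a sketch of what that could look like on the example from the description (f_j and f' are invented names, and the float-in pass is hypothetical, not something GHC does today):

-- After full laziness: j has become a top-level function, compiled
-- with the general calling convention.
f_j :: T -> T
f_j z = n (n (n (n (n (n z)))))

f :: Int -> T -> T -> T
f sel x y = case sel of
    0 -> f_j x
    _ -> f_j y

-- After a hypothetical late float-in pass: f_j is not exported and is
-- only called from f, so it is pushed back into f's RHS, where the
-- simplifier can rediscover it as a join point.
f' :: Int -> T -> T -> T
f' sel x y =
    let j z = n (n (n (n (n (n z)))))
    in case sel of
        0 -> j x
        _ -> j y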

Last edited 6 months ago by AndreasK

comment:3 Changed 6 months ago by simonpj

Turning them into a top-level binding means the calling convention changes to the one used for regular functions. So we have the overhead of register saving, stack checks, a layout penalty, the works

Can you be more specific? I think that the jumps to the join points will turn into jumps to the top-level function code, no arity checks, nothing. You could look at the code and see, but I think it'll be no less efficient.

comment:4 Changed 6 months ago by sgraf

Cc: sgraf added

comment:5 Changed 6 months ago by AndreasK

If j becomes a top-level binding we use the general calling convention, which at the assembly level is still a jump, as you said. However, there are subtle differences between jumping to a top-level binding and jumping into a basic block, and these can have a major performance impact.

Things I can immediately think of are:

  • If we jump to a top-level symbol we can't place the jump target immediately after the caller. This means we:
    • Can't eliminate one of the jump instructions, so they take up branch-prediction resources and need to be executed by the CPU.
    • The code won't be placed sequentially in memory, leading to worse cache utilization.
  • Top-level bindings require an additional info table compared to a regular jump target. This means more code size, which is never a good thing.
  • Being a top-level function that uses the stack, j now performs a stack check. For very small functions this can be a lot of overhead.

It's quite possible that in the general case more inlining offsets this cost, but in some cases this makes a major difference.

For example, the program below runs ~7% faster when full laziness is disabled (780 ms vs 730 ms).

--Simpler core to read without worker/wrapper
{-# OPTIONS_GHC -fno-full-laziness -fno-worker-wrapper #-}
{-# LANGUAGE MagicHash, BangPatterns #-}

module Main where

import System.Environment
import GHC.Prim

data T = A | B | C

-- If we inline the functions, case-of-known-constructor kicks in.
-- Which is good! But it means j becomes small enough to be inlined
-- and won't become a join point. So for this example we don't
-- want that.
{-# NOINLINE n #-}
{-# NOINLINE f #-}
n :: T -> T
n A = B
n B = C
n _ = A

toInt :: T -> Int
toInt A = 1
toInt B = 2
toInt C = 3

f :: Int -> T -> T -> T
f sel x y =
    -- function large enough to avoid being simply inlined
    let j z = n . n . n . n . n . n $ z
    in case sel of
        -- j is always tailcalled
        0   -> j x
        _   -> j y

main = do
    print $ sum . map toInt . map (\n -> f n A B) $ [0..50000000]

comment:6 Changed 6 months ago by simonpj

Wow, that's an impressively large effect. Thanks -- I had no idea. If you stare at the assembly code, can you guess which of your bullets is causing the effect here? E.g. do we in fact end up eliminating one of the jump instructions?

Your point about the stack check is a good one.

Info tables. Suppose a function is:

  • top level
  • not exported
  • always called with known, saturated calls

Then it does not need either slow entry code or an info table. So rather than avoid creating such functions (by not floating join points) maybe we should apply the optimisation uniformly to all top-level functions?
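
For concreteness, a minimal sketch of a binding meeting all three conditions (module and names invented):

module M (g) where  -- helper is not exported

-- helper is top level, not exported, and both calls below are known
-- and saturated, so under this idea it would need neither slow-entry
-- code nor an info table.
helper :: Int -> Int -> Int
helper a b = a * b + 1
{-# NOINLINE helper #-}

g :: Int -> Int
g x = helper x 2 + helper 3 x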

Hmm. What about heap checks? If a join point j does heap allocations, where do we do the heap-overflow check? Maybe we should absorb the heap allocation into the jump site (as if the code was inlined)? That could avoid doing two heap checks where only one is needed. (Would only work for non-recursive join points.)

Also I'm unclear about how we save live variables around a GC call at the start of a join point. (On function entry we use the function's info table; on a case return point we use its info table.)

comment:7 in reply to:  6 Changed 6 months ago by sgraf

Replying to simonpj:

Hmm. What about heap checks? If a join point j does heap allocations, where do we do the heap-overflow check? Maybe we should absorb the heap allocation into the jump site (as if the code was inlined)? That could avoid doing two heap checks where only one is needed. (Would only work for non-recursive join points.)

Also I'm unclear about how we save live variables around a GC call at the start of a join point. (On function entry we use the function's info table; on a case return point we use its info table.)

I was under the impression that join points never allocate and that they rather re-use the closure of their enclosing scope. Also, full laziness currently never floats join points (or any other bindings) that close over local variables to the top level, so we can probably disregard heap overflow checks.

comment:8 Changed 6 months ago by simonpj

I was under the impression that join points never allocate and that they rather re-use the closure of their enclosing scope

Correct. But consider

f x = let j y = Just (y,y)
      in case x of
           A -> j (True, False)
           B -> j (False, True)
           C -> j (True, True)

Suppose branch A is taken. Then GHC must allocate the object (True, False) and jump to j. Then j must allocate (y,y) and return it.

I suspect that there's a heap check at the beginning of the A branch; and then a second heap check at the start of the body of j. But instead we could make j not do a heap check (ever) and instead put on the caller (of j) the responsibility for making sure that there's enough heap space for j to do its allocation.

That way we'd get one heap check, at the beginning of the A branch, which would ensure there was enough space for both pairs.

(As you say, no closure is allocated for j itself, but that's an entirely different matter.)

comment:9 in reply to:  6 Changed 6 months ago by AndreasK

Replying to simonpj:

Wow, that's an impressively large effect. Thanks -- I had no idea. If you stare at the assembly code, can you guess which of your bullets is causing the effect here? E.g. do we in fact end up eliminating one of the jump instructions?

It's hard to tell, really. I've looked at VTune, and it seems the top-level variant has more cache misses and decoder stalls, so it seems to be primarily a matter of code size and worse code layout. We also execute about 2.5% more instructions, which will certainly come at a cost, but given the size of the difference I would expect layout/code size to be the larger factors.

Your point about the stack check is a good one.

Info tables. Suppose a function is:

  • top level
  • not exported
  • always called with known, saturated calls

Then it does not need either slow entry code or an info table. So rather than avoid creating such functions (by not floating join points) maybe we should apply the optimisation uniformly to all top-level functions?

I assume you are talking about removing the stack check here with the optimisation? Then we would still get the cache fragmentation/layout penalty. So ideally we would do both.

  • Remove the stack check if it's a floated join point called from many places to avoid code bloat.
  • Push them back into the RHS if there is only a single call site.

I suspect that there's a heap check at the beginning of the A branch; and then a second heap check at the start of the body of j. But instead we could make j not do a heap check (ever) and instead put on the caller (of j) the responsibility for making sure that there's enough heap space for j to do its allocation.

Ideally we would combine the checks for all branches into a single one to begin with. But that seems like a different issue to me, as this would be beneficial with or without join points.

comment:10 Changed 6 months ago by AndreasK

It seems this was changed deliberately in #13286.

The ticket does mention examples where join points becoming top-level functions improved things, but it doesn't contain any performance metrics to judge the impact.

I can see how it might be beneficial for exported functions, but I'm not yet convinced that this is better in general.

I've also looked at the Core output of the two testsuite files, and at least at -O2 there seems to be the same amount of floating happening with 8.0.2 and 8.4.

I also don't think we will get much benefit out of the original example. Looking at the floated code:

$wg_s6x5 [InlPrag=[0], Occ=LoopBreaker]
  :: Int# -> Int# -> Int# -> Int#
$wg_s6x5 (ww5_s6wV :: Int#) (ww6_s6wZ :: Int#) (ww7_s6x3 :: Int#) =
  case remInt# ww6_s6wZ 2# of {
    __DEFAULT ->
      case ww6_s6wZ of wild5_Xen {
        __DEFAULT ->
          jump $wg_s6x5
            (*# ww5_s6wV ww5_s6wV)
            (quotInt# (-# wild5_Xen 1#) 2#)
            (*# ww5_s6wV ww7_s6x3);
        1# -> *# ww5_s6wV ww7_s6x3
      };
    0# ->
      $wg_s6x5 (*# ww5_s6wV ww5_s6wV) (quotInt# ww6_s6wZ 2#) ww7_s6x3
  }

$wf_s6xg [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
$wf_s6xg (ww3_X6Hi :: Int#) (ww4_s6xe :: Int#) =
  case remInt# ww4_s6xe 2# of {
    __DEFAULT ->
      case ww4_s6xe of wild3_Xe3 {
        __DEFAULT ->
          $wg_s6x5
            (*# ww3_X6Hi ww3_X6Hi) (quotInt# (-# wild3_Xe3 1#) 2#) ww3_X6Hi;
        1# -> ww3_X6Hi
      };
    0# -> $wf_s6xg (*# ww3_X6Hi ww3_X6Hi) (quotInt# ww4_s6xe 2#)
  }

GHC.Real.$w$s^1 [InlPrag=[0]] :: Int -> Int# -> Int#
GHC.Real.$w$s^1 =
  \ (w_s6xh :: Int) (ww_s6xl :: Int#) ->
    case tagToEnum# @ Bool (<# ww_s6xl 0#) of {
      False ->
        case ww_s6xl of wild1_XdK {
          __DEFAULT ->
            case w_s6xh of { I# ww2_s6xa -> $wf_s6xg ww2_s6xa wild1_XdK  };
          0# -> 1#
        };
      True -> case GHC.Real.^2 of wild1_00 { }
    }
  • For exponent < 0 we throw an exception, so we can probably ignore that case when it comes to performance.
  • For exponent == 0 there is an advantage IF GHC.Real.$w$s^1 gets inlined. But an exponent of zero seems like an unlikely case to me.
  • For exponents > 0 it depends heavily on how much gets inlined.
    • If everything gets inlined (essentially undoing the floating) we save nothing, as the unfloated variant would have been inlined as well.
    • If we inline GHC.Real.$w$s^1 and $wf_s6xg we save at best one call overhead if we don't jump into $wg_s6x5; otherwise we save nothing.
    • If nothing gets inlined we are worse off, as we now have at least as much overhead, if not more, should we jump into the floated functions.

On the bright side, the floated bindings work only on unboxed ints, so they might not cause an additional stack check.

comment:11 Changed 6 months ago by simonpj

Ideally we would combine the checks for all branches into a single one to begin with.

Do you mean that in

   case x of
     True -> e1
     False -> e2

instead of a heap-check at the start of e1 and another at the start of e2, we could have a single one before the case?

No, we can't do this: evaluating x might force a thunk, and hence allocate an arbitrary amount of stuff.

If the case is scrutinising an unlifted type (which does not require evaluation) then yes, it's different, and indeed in that case we sometimes do move the heap check up. See the long Note [Compiling case expressions] in StgCmmExpr.hs.
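
A minimal sketch (invented example) of the unlifted case: scrutinising an Int# forces nothing, so the branch's heap check can sometimes be hoisted above the case:

{-# LANGUAGE MagicHash #-}
import GHC.Exts

f :: Int# -> Maybe Int
f x = case x of   -- no evaluation needed: x is already an unboxed value
        0# -> Nothing        -- does not allocate
        _  -> Just (I# x)    -- allocates a boxed Int and a Just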

(Another possibility that looks unattractive, and that I have not explored: put the heap check after returning from evaluating x but before doing the case-analysis to decide which branch to take. That might reduce code size, but would never eliminate a heap check altogether; indeed it might put one in the code path that was not there before, for a branch that did not allocate at all.)

comment:12 Changed 6 months ago by simonpj

I assume you are talking about removing the stack check here with the optimisation?

I wasn't very clear. My idea is to ignore whether a top-level binding started life as a join point, and instead optimise all top-level bindings the same way.

There are three things in play:

  1. Eliminating the closure and info-table for some top-level functions. Consider a top-level binding
    f = \x y. e
    
    When can we do without a closure and info table for f? Answer: if
    • All calls to f are known, saturated calls (no partial applications).
    • f is not exported

NB 1: it's fine if the call to f is in a lazy context, e.g.

g y = let t = f (y+1) in Just t

NB 2: for a nested, let-bound function, we do need a function closure, even if all the calls are known, saturated calls. Why? Because we need a heap-allocated container for its free variables. So we need the info table. But if all the calls are known, saturated, we do not need any slow-entry code, so the code pointer for the info table could be "crash" (will never happen).

  2. Eliminating stack checks. Join points already don't have a stack check; regular let-bindings do (of course). Idea: extend the stack-check elimination to more functions; that is, absorb any stack use for that function into its call site. When can we do this? Answer:
    • All calls to f are known, saturated calls (no partial applications).
    • f is not exported
    • f is not recursive; or f is top-level and tail-calls itself (or is part of a top-level tail-recursive nest).

NB 1: this works for non-top-level functions too. Example:

g x = let f y = blah
          t = f 4 + f 5
      in Just t

Currently the thunk for t will do a stack check, and f will do its own stack check too. We could absorb f's check into t.

NB 2: why not recursive functions? Consider

   let f x = if x==0 then 0 else 1 + f (x-1)

Clearly the stack grows, so we can't eliminate the stack check. But if f is a join point there is no problem, and similarly at top level (i.e. not a join point) if it tail-calls itself or some other top-level function with the same property... let's call these "top-level join points".
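
A minimal sketch (invented names) of such a top-level tail-recursive nest:

-- isEven and isOdd only ever tail-call each other, so the stack does
-- not grow; under this proposal neither would need its own stack check.
isEven, isOdd :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)
isOdd 0 = False
isOdd n = isEven (n - 1)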

  3. Eliminating heap checks. Idea: absorb the heap check for a function into its caller. When can we do this?
    • All calls to f are known, saturated calls (no partial applications).
    • f is not exported
    • f is not recursive. Reason: if it's recursive, one of its call sites is in its own RHS, so we can't statically bound how much heap it uses.

NB 1: absorbing the heap check is not currently done even for join points. (Reason: it only works for non-recursive functions.)

NB: None of these optimisations rely on f being a join point.

All of this is STG-level/code-gen level optimisation, not Core.

These would be interesting ideas to try out. If you feel motivated, I could advise.

comment:13 in reply to:  11 Changed 6 months ago by AndreasK

Replying to simonpj:

Ideally we would combine the checks for all branches into a single one to begin with.

(Another possibility that looks unattractive, and that I have not explored: put the heap check after returning from evaluating x but before doing the case-analysis to decide which branch to take. That might reduce code size, but would never eliminate a heap check altogether; indeed it might put one in the code path that was not there before, for a branch that did not allocate at all.)

The obvious solution, it seems to me, is to simply limit this to cases where all branches allocate (see the sketch after the list below). This would reduce code size while incurring few penalties. It would probably make sense to special-case two other scenarios:

  • If the difference in allocation is huge.
  • If the non-allocating branches are bottoming (e.g. pattern match failures).
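
A minimal sketch (invented example) of the basic shape: every branch allocates, so one shared check after forcing the scrutinee never burdens an allocation-free path:

g :: Int -> Bool -> Maybe (Int, Int)
g x b = case b of   -- b may be a thunk, so the combined check would go
                    -- after evaluation, just before selecting a branch
    True  -> Just (x, x + 1)  -- allocates a pair and a Just
    False -> Just (x + 2, x)  -- also allocates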

But I'm not sure how easy it is to check these conditions during StgToCmm code generation. I'm not really familiar with that part of the codegen yet.

I wasn't very clear. My idea is to ignore whether a top-level binding started life as a join point, and instead optimise all top-level bindings the same way.

That sounds like something we should do! Making them join points would still have additional advantages for code layout and possibly register allocation. So a late "float-in" pass of sorts, after we are done inlining, might still make sense.

But your suggested changes should still reduce the overhead of these calls quite a bit compared to what we have now.

These would be interesting ideas to try out. If you feel motivated, I could advise.

I'm interested as I might need these optimizations for other things I'm working on anyway.

I guess a good way to start would be to look into 1), starting at the StgToCmm code?

comment:14 Changed 6 months ago by simonpj

(comment:11) Another possibility that looks unattractive, ... The obvious solution, it seems to me, is to simply limit this to cases where all branches allocate

Perhaps, but I think the other 1,2,3 possibilities look as if they'll have more payoff per hour invested. Once they are done, maybe come back to this.

comment:15 Changed 6 months ago by simonpj

I guess a good way to start would be to look into 1), starting at the StgToCmm code?

Yes, sounds good to me!

comment:16 in reply to:  14 Changed 6 months ago by AndreasK

Replying to simonpj:

(comment:11) Another possibility that looks unattractive, ... The obvious solution, it seems to me, is to simply limit this to cases where all branches allocate

Perhaps, but I think the other 1,2,3 possibilities look as if they'll have more payoff per hour invested. Once they are done, maybe come back to this.

Hard to judge, given that the heap check issue also arises in code not using join points at all. But it certainly seems like an orthogonal issue. I will open a separate ticket for this so we don't get too much noise on this one. Edit: There is a similar enough ticket already: #8326

Last edited 6 months ago by AndreasK

comment:17 Changed 5 months ago by chessai

Milestone: 8.6.1 → 8.8.1
Owner: set to chessai

I am looking into this. Not sure if it's going to be better, but it would at least be good to have benchmarks for the differences over various mock setups and popular libraries.

comment:18 Changed 5 months ago by simonpj

I am looking into this.

Great -- thank you!

comment:19 Changed 4 months ago by AndreasK

@chessai Are you still interested in this?

comment:20 in reply to:  19 Changed 4 months ago by chessai

Replying to AndreasK:

@chessai Are you still interested in this?

Yes - I am working on https://ghc.haskell.org/trac/ghc/ticket/13104 as a sort of preparation. It's a simpler issue related to join points, in that it's less time-consuming and has a straightforward solution, as far as I can tell.

comment:21 Changed 3 months ago by chessai

Is it OK to make a joinpoints directory in testsuite/tests/? If not, where should tests for this go?

comment:22 Changed 3 months ago by AndreasK

There is already testsuite/tests/perf/join_points which you can use.

comment:23 Changed 3 months ago by chessai

Ah, I didn't look in perf/. Thanks.

comment:24 Changed 7 weeks ago by osa1

Milestone: 8.8.1 → 8.10.1

Bumping milestones of low-priority tickets.
