Version 1 (modified by nomeata, 15 months ago) (diff) |
---|

## wave4main

Baseline: [0e2fd3/ghc], Tested: nested-cpr (without nesting inside sum-types, without join-point detection).

Found a 11% increase in allocation, around `9000000` bytes.

The most obvious change in ticky-ticky-number are:

`FUNCTION ENTRIES`and`ENTERS`increasing by ~100000`RETURNS`doubling from 140745 to 280795-
`ALLOC_FUN_ctr`and`ALLOC_FUN_gds`almost doubling, by ~18000 resp. 9000000

So we are allocating more function closures. First guess: Join point property destroyed somewhere.

The ticky output shows a `$wgo{v s60k} (main:Main)` appearing that was not there before, with `140016` enters and `23522688` allocations. This appears in `$wtabulate`, and indeed corresponds to a `go1` that is a join-point before. So what is happening? We are changing

go1 [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.State# s -> (# GHC.Prim.State# s, GHC.Arr.Array GHC.Types.Int x #)

to

$wgo [Occ=LoopBreaker] :: GHC.Prim.Int# -> GHC.Prim.State# s -> (# GHC.Prim.State# s, GHC.Prim.Int#, GHC.Prim.Int#, GHC.Prim.Int#, GHC.Prim.Array# x #)

`go1` is recursive, but tail-recursive, so the worker and wrapper indeed cancel for the recursive call. But where it is being used, we simply apply the `Array` constructor to the second component. So nothing is gained, but a join-point is lost.

My attempt below to detect join points does not help: The CPR information for `go1` is the same as for `let go1 = rhs in body`, as `body` is just `go1 ww ipv`.

The problem is that this is being passed as an argument to a function. In this case `runSTRep` (which is the function its being passed to) might help, but what is the general solution?

In this case, inlining `runSTRep` helps a bit, as it decreases allocation to `+6.6%`. (Whether that is correct is quite dubious). But in order to propagate the CPR’ing to tabulate, it seems one needs to pass nested CPR information from a case scrunitee to its components.

So I implement this, and now `tabulate` itself gets a CPR property, which comes from the second component of the `ST` action, as extracted by `runSTRep`. But we still do not retain the join-point-property, as the code that throws away the `ST` token is still there:

case $wgo ww ipv of _ [Occ=Dead] { (# ww3, ww4, ww5, ww6, ww7 #) -> (# ww4, ww5, ww6, ww7 #)

At that point I am running out of ideas.

## Summary with simple expressions

Original code:

f a x = case a of True -> case foo of b -> snd $ let go 0 = (1,(2,3)) go n = go (n-1) in go b False -> undefined

Initial CPR transformation yields, where `$go` is no longer a join-point for the argument to `snd`.

f a x = case a of True -> case foo of b -> snd $ let $wgo 0 = (# 1, 2, 3 #) $wgo n = go (n-1) in case $wgo b of (# a, b, c #) -> (a, (b,c)) False -> undefined

Inlining `snd` does not help a lot:

f a x = case a of True -> case foo of b -> let go 0 = (1,(2,3)) go n = go (n-1) in case (go b) of (x,y) -> y False -> undefined

still turns into

f a x = case a of True -> case foo of b -> snd $ let $wgo 0 = (# 1, 2, 3 #) $wgo n = go (n-1) in case $wgo b of (# a, b, c #) -> (b,c) False -> undefined

Furthermore allowing CPR information to flow from scrunitee to bound variables gives `f` a CPR property (which is good), but does not solve the problem.

$wf a x = case a of True -> case foo of b -> snd $ let $go 0 = (# 1, 2, 3 #) $go n = go (n-1) in case $go b of (# a, b, c #) -> (# b, c #) False -> undefined