wiki:NestedCPR/wave4main

Version 2 (modified by nomeata, 3 months ago) (diff)

--

wave4main

The analysis below is obsolete. In this case, go1 does in fact not have the nested CPR property, as the components are not surely terminating: DmdType <S,U><S,U>m#(,m(tm(),tm(),t,).

But if they were, the problem would manifest...

Obsolete:

Baseline: [0e2fd3/ghc], Tested: nested-cpr (without nesting inside sum-types, without join-point detection).

Found a 11% increase in allocation, around 9000000 bytes.

The most obvious change in ticky-ticky-number are:

  • FUNCTION ENTRIES and ENTERS increasing by ~100000
  • RETURNS doubling from 140745 to 280795
  • ALLOC_FUN_ctr and ALLOC_FUN_gds almost doubling, by ~18000 resp. 9000000

So we are allocating more function closures. First guess: Join point property destroyed somewhere.

The ticky output shows a $wgo{v s60k} (main:Main) appearing that was not there before, with 140016 enters and 23522688 allocations. This appears in $wtabulate, and indeed corresponds to a go1 that is a join-point before. So what is happening? We are changing

go1 [Occ=LoopBreaker]                                      
  :: GHC.Prim.Int#                                         
     -> GHC.Prim.State# s                                  
     -> (# GHC.Prim.State# s, GHC.Arr.Array GHC.Types.Int x #)

to

$wgo [Occ=LoopBreaker]          
  :: GHC.Prim.Int#
     -> GHC.Prim.State# s
     -> (# GHC.Prim.State# s,   
           GHC.Prim.Int#,       
           GHC.Prim.Int#,       
           GHC.Prim.Int#,       
           GHC.Prim.Array# x #) 

go1 is recursive, but tail-recursive, so the worker and wrapper indeed cancel for the recursive call. But where it is being used, we simply apply the Array constructor to the second component. So nothing is gained, but a join-point is lost.

My attempt below to detect join points does not help: The CPR information for go1 is the same as for let go1 = rhs in body, as body is just go1 ww ipv.

The problem is that this is being passed as an argument to a function. In this case runSTRep (which is the function its being passed to) might help, but what is the general solution?

In this case, inlining runSTRep helps a bit, as it decreases allocation to +6.6%. (Whether that is correct is quite dubious). But in order to propagate the CPR’ing to tabulate, it seems one needs to pass nested CPR information from a case scrunitee to its components.

So I implement this, and now tabulate itself gets a CPR property, which comes from the second component of the ST action, as extracted by runSTRep. But we still do not retain the join-point-property, as the code that throws away the ST token is still there:

case $wgo ww ipv of _ [Occ=Dead] { (# ww3, ww4, ww5, ww6, ww7 #) ->
(# ww4, ww5, ww6, ww7 #)

At that point I am running out of ideas.

Summary with simple expressions

Original code:

f a x = case a of
  True -> case foo of b -> snd $
    let go 0 = (1,(2,3))
        go n = go (n-1)
    in go b
  False -> undefined

Initial CPR transformation yields, where $go is no longer a join-point for the argument to snd.

f a x = case a of
  True -> case foo of b -> snd $
    let $wgo 0 = (# 1, 2, 3 #)
        $wgo n = go (n-1)
    in case $wgo b of (# a, b, c #) -> (a, (b,c))
  False -> undefined

Inlining snd does not help a lot:

f a x = case a of
  True -> case foo of b ->
    let go 0 = (1,(2,3))
        go n = go (n-1)
    in case (go b) of (x,y) -> y
  False -> undefined

still turns into

f a x = case a of
  True -> case foo of b -> snd $
    let $wgo 0 = (# 1, 2, 3 #)
        $wgo n = go (n-1)
    in case $wgo b of (# a, b, c #) -> (b,c)
  False -> undefined

Furthermore allowing CPR information to flow from scrunitee to bound variables gives f a CPR property (which is good), but does not solve the problem.

$wf a x = case a of
  True -> case foo of b -> snd $
    let $go 0 = (# 1, 2, 3 #)
        $go n = go (n-1)
    in case $go b of (# a, b, c #) -> (# b, c #)
  False -> undefined