This page tracks @akio's attempt at reviving @nomeata's work on NestedCPR.

Latest code can be found at https://github.com/takano-akio/ghc/compare/master...nested-cpr .

## Design

### Changes to `CPRResult`

- In the product case (
`RedProd`

), keep one`DmdResult`

for each component. This is the main change in the branch.

- Add
`NeverReturns`

contructor. It means that no value is returned from this expression. It's different from`ThrowsExn`

in that it might perform some side effect. Indeed it might be an IO-performing infinite loop. The addition of this constructor is necessary to infer a nested CPR property for a tail- recursive function that does some I/O.

In summary, `CPRResult`

is changed from:

NoCPR / \ RetProd RetSum ConTag

to:

NoCPR / \ RetProd [DmdResult] RetSum ConTag \ / NeverReturns

### Changes to `DmdResult`

- The new
`Converges`

contructor is added to track definite convergence.

Dunno CPRResult / \ ThrowsExn Converges CPRResult (new) / Diverges

This information is used to tell when it is safe to perform a nested CPR worker-wrapper transformation. Unpacking a nested component in the return value is safe only when that component definitely converges.

### Changes to the demand analyzer

- Infer nested CPR property when possible.

- Slightly strenghten the strictness analysis so that in the following code,
both
`foo`

and`bar`

get the strictness`S(SS)`

on their first argument. Previously only`foo`

got`S(SS)`

, where`bar`

got`S`

.

{-# LANGUAGE BangPatterns #-} data Complex a = !a :+ !a foo :: Complex Double -> Int -> Complex Double foo !x 0 = x foo (a :+ b) _ = b :+ a bar :: Complex Double -> Int -> Complex Double bar x 1 = x bar (a :+ b) _ = b :+ a

### Changes to the worker-wrapper transformer

- Apply nested CPR transformation. For example, if a function that returns
`(# State# RealWorld, (Int, Int, Int) #)`

has the CPR information`m(t, tm(tm(t), m(t), t))`

, then the worker function would return the type`(# State# RealWorld, Int#, Int, Int #)`

, unboxing the triple and one of the`Int`

s.

## Examples

### simple.hs

This is a simple recursive function with an easy-to-spot nested CPR property.

Status: ok

module Foo where f :: Int -> (Int, Int) f 0 = (1, 2) f n | even n = f (div n 2) | otherwise = case f (n - 1) of (a, b) -> (a - 1, b)

### strictness.hs

This one is trickier in that the analysis has to use the nested strictness of `foo`

on `p`

to give `p`

a nested CPR property. Inspired by `nofib/imaginary/x2n1`

.

Status: ok

{-# LANGUAGE BangPatterns #-} module Foo where foo :: Int -> (Int, Int) -> (Int, Int) foo n p | even (n + uncurry (+) p), n /= 0 = foo (n - 1) p | n == 0 = (1, 2) | otherwise = p

### strict_field.hs

This one needs a correct handling of strict constructor fields. Inspired by `nofib/imaginary/x2n1`

.

Status: ok

{-# LANGUAGE BangPatterns #-} module Foo where data C a = C !a !a pow :: C Double -> Int -> C Double pow !_ 0 = C 0 1 pow !c 1 = c pow c n | even n = let d = pow c (div n 2) in mul d d | otherwise = mul c (pow c (n - 1)) mul :: C Double -> C Double -> C Double mul (C a b) (C d e) = C (a*d-b*e) (a*e+b*d)

### strict_field1.hs

This is similar to `strict_field.hs`

, but needs a more aggressive worker-wrapper.

Status: ok

Changing CPR analysis alone wouldn't help here. We need to give the function a better strictness as well.

module Foo (pow) where data C a = C !a !a pow :: C Double -> Int -> C Double pow x y | even y = pow (x `mul` x) (y `quot` 2) | y == 1 = x | otherwise = pow (x `mul` x) ((y - 1) `quot` 2) `mul` x mul :: C Double -> C Double -> C Double mul (C a b) (C d e) = C (a*d-b*e) (a*e+b*d)

### peek.hs

This example involves reading a tuple of Ints from memory.

Status: needs -fcpr-depth=4 or higher.

{-# LANGUAGE BangPatterns #-} module Foo (peek4) where import Foreign.Storable import Foreign.Ptr peek4 :: Ptr Int -> Ptr Int -> (Ptr Int -> IO (Ptr Int, Ptr Int)) -> IO (Ptr Int, Ptr Int, (Int, Int, Int, Int)) peek4 ptr end req | end `minusPtr` ptr >= 32 = do val <- (,,,) <$> peekByteOff ptr 0 <*> peekByteOff ptr 8 <*> peekByteOff ptr 16 <*> peekByteOff ptr 24 let !ptr' = ptr `plusPtr` 32 return (ptr', end, val) | otherwise = do (ptr', end') <- req ptr peek4 ptr' end' req

## beam-word-poke.hs

This is a real-world example taken from https://github.com/tsurucapital/beamable/blob/master/src/Data/Beamable/Internal.hs#L159.
It serializes a `Word`

using a variable-length encoding.

Status: ok

module Foo(beamWordPoke) where import Data.Bits import Data.Monoid import Data.Word import Foreign.Ptr import Foreign.Storable newtype Poke = Poke (Ptr Word8 -> IO (Ptr Word8)) instance Monoid Poke where mempty = Poke return mappend (Poke a) (Poke b) = Poke $ \ptr -> a ptr >>= b beamWordPoke :: Word -> Poke beamWordPoke n | next == 0 = pokeWord8 firstSeptet | otherwise = pokeWord8 (firstSeptet .|. 0x80) <> beamWordPoke next where firstSeptet :: Word8 firstSeptet = fromIntegral $ n .&. 0x7F next = n `shiftR` 7 pokeWord8 :: Word8 -> Poke pokeWord8 w = Poke $ \p -> do poke p w; return $! p `plusPtr` 1