Opened 3 years ago

Last modified 3 months ago

#5075 new feature request

CPR optimisation for sum types if only one constructor is used

Reported by: batterseapower Owned by: simonpj
Priority: normal Milestone: 7.6.2
Component: Compiler Version: 7.0.3
Keywords: Cc: tkn.akio@…, danielv@…, nfrisby@…, mail@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Inspired by #3138, it might be useful for StrAnal? to detect functions such as the following where only one of the data constructors for a sum type are CPRed:

loop x = case x < 10 of True -> Left x; False -> loop (x*2)

We can usefully transform to:

$wloop x = case (case x < 10 of True -> Left x; False -> loop (x*2)) of Left y -> (# y #)
loop x = case loop x of (# y #) -> Left y

Attached patch implements this behaviour. Most of the complication in the new code occurs because adding a DataCon? field to the Demand data type means that we have to define a separate IfaceDemand? type for storage in interface files.

The patch validates but I haven't done any tests on nofib. I have confirmed that the new optimisation hits on some examples, though.

Attachments (2)

5075.dpatch (35.4 KB) - added by batterseapower 3 years ago.
5075-testsuite.dpatch (846 bytes) - added by batterseapower 3 years ago.

Download all attachments as: .zip

Change History (22)

Changed 3 years ago by batterseapower

Changed 3 years ago by batterseapower

comment:1 Changed 3 years ago by batterseapower

  • Status changed from new to patch

comment:2 Changed 3 years ago by simonpj

  • Owner set to simonpj

I'm working on this

comment:3 Changed 3 years ago by igloo

  • Milestone set to 7.4.1

comment:4 Changed 2 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1

comment:5 Changed 2 years ago by simonpj

  • Difficulty set to Unknown

All this is on the cpr-sum-types branch. Awaiting attention from Simon!

comment:6 Changed 19 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:7 Changed 16 months ago by akio

  • Cc tkn.akio@… added

comment:8 Changed 15 months ago by simonpj@…

commit d3b8991be3875302ca6d1a4ef6e72891e9567dd5

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Thu Jan 24 14:50:50 2013 +0000

    Introduce CPR for sum types (Trac #5075)
    
    The main payload of this patch is to extend CPR so that it
    detects when a function always returns a result constructed
    with the *same* constructor, even if the constructor comes from
    a sum type.  This doesn't matter very often, but it does improve
    some things (results below).
    
    Binary sizes increase a little bit, I think because there are more
    wrappers.  This with -split-objs.  Without split-ojbs binary sizes
    increased by 6% even for HelloWorld.hs.  It's hard to see exactly why,
    but I think it was because System.Posix.Types.o got included in the
    linked binary, whereas it didn't before.
    
            Program           Size    Allocs   Runtime   Elapsed  TotalMem
              fluid          +1.8%     -0.3%      0.01      0.01     +0.0%
                tak          +2.2%     -0.2%      0.02      0.02     +0.0%
               ansi          +1.7%     -0.3%      0.00      0.00     +0.0%
          cacheprof          +1.6%     -0.3%     +0.6%     +0.5%     +1.4%
            parstof          +1.4%     -4.4%      0.00      0.00     +0.0%
            reptile          +2.0%     +0.3%      0.02      0.02     +0.0%
    ----------------------------------------------------------------------
                Min          +1.1%     -4.4%     -4.7%     -4.7%    -15.0%
                Max          +2.3%     +0.3%     +8.3%     +9.4%    +50.0%
     Geometric Mean          +1.9%     -0.1%     +0.6%     +0.7%     +0.3%
    
    Other things in this commit
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    * Got rid of the Lattice class in Demand
    
    * Refactored the way that products and newtypes are
      decomposed (no change in functionality)

 compiler/basicTypes/BasicTypes.lhs  |   17 ++
 compiler/basicTypes/DataCon.lhs     |   60 -----
 compiler/basicTypes/Demand.lhs      |  429 ++++++++++++++++++-----------------
 compiler/basicTypes/MkId.lhs        |   11 +-
 compiler/cmm/CLabel.hs              |    1 -
 compiler/coreSyn/CoreLint.lhs       |   12 +-
 compiler/coreSyn/CoreSyn.lhs        |    7 +-
 compiler/deSugar/DsCCall.lhs        |   45 ++++-
 compiler/deSugar/DsForeign.lhs      |    2 +-
 compiler/prelude/PrelRules.lhs      |    2 +-
 compiler/simplCore/Simplify.lhs     |    5 +-
 compiler/specialise/SpecConstr.lhs  |    2 +-
 compiler/stranal/DmdAnal.lhs        |   49 +++-
 compiler/stranal/WwLib.lhs          |  210 ++++++++----------
 compiler/types/Coercion.lhs         |   62 ++++--
 compiler/types/FamInstEnv.lhs       |   26 +--
 compiler/types/TyCon.lhs            |   25 ++-
 compiler/types/Type.lhs             |   26 +--
 compiler/vectorise/Vectorise/Exp.hs |   10 +-
 19 files changed, 515 insertions(+), 486 deletions(-)

comment:9 Changed 15 months ago by simonpj

Still to come: the sum-CPR stuff is switched off for nested functions because it turns some let-no-escape functions into non-let-no-escape ones, which increases allocation. I'm hopeful that Nick F's work on late-lambda-lifting may solve this, in which case we can switch it on more vigorously. Hence leaving open for now.

comment:10 Changed 14 months ago by danielv

  • Cc danielv@… added

comment:11 Changed 14 months ago by parcs

I noticed that sum CPR does not trigger for this function, when it seems like it should:

loop :: Int -> Maybe Int -> Maybe Int
loop n x = case n of
    0 -> x
    _ -> loop (n-1) (fmap (+1) x)

The specializations that the SpecConstr pass creates for this function seem to be perfect candidates for sum CPR:

Rec {
loop_$s$wloop
loop_$s$wloop =
  \ sc_sop sc1_sor ->
    case sc_sop of ds_Xmj {
      __DEFAULT ->
        loop_$s$wloop
          (-# ds_Xmj 1) (case sc1_sor of _ { I# x_amE -> I# (+# x_amE 1) });
      0 -> Just sc1_sor
    }
end Rec }

Rec {
loop_$s$wloop1
loop_$s$wloop1 =
  \ sc_soq ->
    case sc_soq of ds_Xmj {
      __DEFAULT -> loop_$s$wloop1 (-# ds_Xmj 1);
      0 -> Nothing
    }
end Rec }

Or am I mistaken?

comment:12 Changed 14 months ago by simonpj

  • Cc nfrisby@… added

Indeed. But the opportunity only arises after SpecConstr, and we don't currently run the demand/CPR analyser after SpecConstr, we run it before.

Several other tickets (#6087, #5302, #6070) identify other opportunities that could be exploited by running the demand analyser again, late in the optimisation pipeline.

Nick Frisby is planning to experiment with this. Thanks for a new example.

Simon

comment:13 Changed 13 months ago by nfrisby

Confirmed: the Just gets ww'd-away with this core2core pipeline: SpecConstr, simpl, stranal, wwsplit, simpl.

Rec {
$w$s$wloop
$w$s$wloop =
  \ w_sqX w1_sqY ->
    case w_sqX of ds_Xng {
      __DEFAULT ->
        $w$s$wloop
          (-# ds_Xng 1) (case w1_sqY of _ { I# x_anM -> I# (+# x_anM 1) });
      0 -> (# w1_sqY #)
    }
end Rec }

loop_$s$wloop [InlPrag=INLINE[0]]
  :: ...
[...
 Str=DmdType <S,U><L,U(U)>m2,
 Unf=Unf{Src=Worker=$w$s$wloop,...}]
loop_$s$wloop =
  \ w_sqX w1_sqY ->
    case $w$s$wloop w_sqX w1_sqY of _ { (# ww1_sr4 #) -> Just ww1_sr4 }

Rec {
loop_$s$wloop1
loop_$s$wloop1 =
  \ sc_sqe ->
    case sc_sqe of ds_Xng {
      __DEFAULT -> loop_$s$wloop1 (-# ds_Xng 1);
      0 -> Nothing
    }
end Rec }

That's an impressive prefix, "$w$s$w".

In case any reader is wondering, the second argument to the loop gets unboxed if a strict variant of Maybe is used.

Thanks again for the compact example.

comment:14 Changed 12 months ago by igloo

  • Owner simonpj deleted
  • Status changed from patch to new

It looks like the patch has been applied to the cpr-sum-types branch.

comment:15 Changed 12 months ago by igloo

  • Owner set to simonpj

comment:16 Changed 5 months ago by nomeata

  • Cc mail@… added

comment:17 Changed 3 months ago by nomeata

Just to be clear: This ticket is left open because of comment:9, right?

Also, in case someone looks here: The branch cpr-sum-types is considered dead; Its last commit ([9b0d70e/ghc]) is from October 2011, while CPR for sum types has hit master in January 2013 ([d3b8991/ghc]). (Maybe we should rename dead branches to dead/...?)

comment:18 Changed 3 months ago by simonpj

Yes, I think it's just open because of comment 9

comment:19 Changed 3 months ago by nomeata

In that case, let me add some numbers related to sum type CPR in local functions.

With current master [4d70840d/ghc], enabling CPR for sum types has this effect:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           ansi          +8.6%     +0.5%      0.00      0.00     +0.0%
         awards          +8.6%     +0.2%      0.00      0.00     +0.0%
      cacheprof          +8.0%     +0.2%     -3.1%     -3.1%     +0.0%
       calendar          +8.6%     +0.2%      0.00      0.00     +0.0%
        circsim          +8.5%     +1.1%     -8.7%     -8.7%     -2.0%
   cryptarithm1          +8.7%     -3.8%     +0.5%     +0.5%     +0.0%
            cse          +8.6%     +0.2%      0.00      0.00     +0.0%
       fibheaps          +8.6%     -0.3%      0.02      0.02     +0.0%
          fluid          +7.7%     +3.4%      0.01      0.01     +0.0%
         fulsom          +8.3%     +0.1%      0.19      0.19    -31.2%
         hidden          +8.3%     +1.8%     +3.4%     +6.2%     +0.0%
          kahan          +8.1%     +0.1%      0.17      0.17     +0.0%
      listcompr          +8.6%     +0.4%      0.06      0.05     +0.0%
       listcopy          +8.6%     +0.4%      0.07      0.06     +0.0%
         mandel          +7.5%     +0.3%      0.05      0.05     +0.0%
        mandel2          +8.7%    +33.8%      0.00      0.00     +0.0%
         parser          +8.1%     +1.7%      0.02      0.02     +0.0%
      primetest          +8.7%     -0.2%      0.07      0.07     +0.0%
        reptile          +8.5%     +0.2%      0.01      0.01     +0.0%
        rewrite          +8.5%     +0.1%      0.02      0.02     +0.0%
           rfib          +8.7%     +0.6%      0.02      0.02     +0.0%
  spectral-norm          +8.5%     +0.7%     -0.6%     -0.6%     +0.0%
         symalg          +8.3%     +0.2%      0.01      0.01     +0.0%
            tak          +8.6%     +1.3%      0.01      0.01     +0.0%
      transform          +8.3%     +0.1%     +0.0%     +0.0%     +0.0%
--------------------------------------------------------------------------------
            Min          +6.8%     -3.8%    -11.3%    -11.3%    -31.2%
            Max          +8.7%    +33.8%     +3.9%     +6.2%     +6.1%
 Geometric Mean          +8.4%     +0.4%     -1.8%     -1.8%     -0.4%

So one quite extreme degradation, and otherwise minor losses.

If I make sure that join points do not get the CPR information unless the expression they are a join point for does, we get this change (relative to the same baseline):

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
      cacheprof          +8.0%     +0.3%     -4.7%     -4.7%     +1.8%
       calendar          +8.6%     +0.1%      0.00      0.00     +0.0%
        circsim          +8.5%     +0.2%     -8.4%     -8.4%     -2.0%
   cryptarithm1          +8.6%     -3.8%     +1.5%     +1.5%     +0.0%
            cse          +8.6%     +0.2%      0.00      0.00     +0.0%
       fibheaps          +8.5%     -0.3%      0.03      0.03     +0.0%
         fulsom          +8.2%     -3.7%      0.19      0.19    -31.2%
          kahan          +8.1%     -0.2%      0.17      0.17     +0.0%
      listcompr          +8.6%     +0.4%      0.07      0.06     +0.0%
       listcopy          +8.6%     +0.4%      0.07      0.06     +0.0%
       maillist          +8.6%     +0.0%      0.04      0.04    +16.0%
         mandel          +7.4%     +0.2%      0.05      0.05     +0.0%
         parser          +8.1%     +1.7%      0.02      0.02     +0.0%
      primetest          +8.6%     -0.2%      0.07      0.07     +0.0%
        rewrite          +8.5%     +0.1%      0.01      0.01     +0.0%
--------------------------------------------------------------------------------
            Min          +6.8%     -3.8%     -8.4%     -8.4%    -31.2%
            Max          +8.7%     +1.7%     +3.2%     +3.4%    +16.0%
 Geometric Mean          +8.3%     -0.0%     -2.0%     -1.9%     -0.2%

This looks quite different now: A few losses, but much smaller than before. A mean of zero, and more gains than before. Unfortunately, the code size increase is still there...

If it were not for the code size increase, I’d suggest to merge these changes: Fewer special cases in transformations is a good thing. But it is probably not worth the code size increase, is it?

(Just for the record: In theory it is possible that CPR’ing a non-sum-type can destroy the join-point-property of a let. But it does not seem to happen: If I enable the cpr-join-point-fix, but keep CPR for sum types disabled, nofib does not record any change whatsoever.)

comment:20 Changed 3 months ago by simonpj

An 8% increase in binary size is pretty dramatic. I wonder why that is happening. (Not every easy to find out.)

Note: See TracTickets for help on using tickets.