Opened 9 years ago

Last modified 7 months ago

#2940 new bug

Do CSE after CorePrep

Reported by: simonpj Owned by: simonpj
Priority: lowest Milestone:
Component: Compiler Version: 6.10.1
Keywords: CSE Cc: felix.lequen@…, slyfox, osa1
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Common sub-expression analysis is deliberately conservative, but it's really too conservative: we are missing obvious opportunities. Consider

{-# OPTIONS_GHC -XBangPatterns -XMagicHash #-}

module Foo where

import GHC.Base

-- CorePrep evaluates (reverse xs) twice
f xs = let !v1 = reverse (reverse xs)
       	   !v2 = filter id (reverse xs)
       in (v1, v2)

-- Even CSE inside CorePrep would not get this right;
-- the strict evaluation of (reverse xs) doesn't scope
-- over the non-strict version
g xs = reverse (reverse xs) ++ filter id (reverse xs)


-- Duplicate evaluation of (x +# 1#)
h :: Int# -> ( Int, Int )
h x = ( I# (x +# 1#), I# ((x +# 1#) *# 2#) )

If you compile this you'll see that there are obvious missed CSE opportunities in f, g and h; but they only show up after CorePrep.

I guess the right thing is to CSE after CorePrep, but then CSE would have to maintain the CorePrep invariants, which isn't trivial. Something to think about.

Simon

Change History (25)

comment:1 Changed 9 years ago by igloo

Milestone: 6.12 branch

comment:2 Changed 8 years ago by simonmar

Type of failure: Runtime performance bug

comment:3 Changed 7 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:4 Changed 7 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:5 Changed 7 years ago by igloo

Milestone: 7.0.17.0.2

comment:6 Changed 7 years ago by igloo

Milestone: 7.0.27.2.1

comment:7 Changed 6 years ago by igloo

Milestone: 7.2.17.4.1

comment:8 Changed 6 years ago by igloo

Milestone: 7.4.17.6.1
Priority: lowlowest

comment:9 Changed 5 years ago by felixl

Cc: felix.lequen@… added

I think it would indeed be a good idea to do CSE after CorePrep (although I'm not at all a GHC expert), but after some Core dumping, there's a problem with this solution: the inliner seems to do his work before CorePrep, so that with -O6 for example, some potential opportunities for CSE disappear… Well, that would still probably give better results than the current CSE, but for even more simplifications, it might be necessary to do inlining after CorePrep, I think.

comment:10 Changed 5 years ago by igloo

Milestone: 7.6.17.6.2

comment:11 Changed 3 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:12 Changed 3 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:13 Changed 3 years ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:14 Changed 2 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:15 Changed 2 years ago by slyfox

Cc: slyfox added

comment:16 Changed 2 years ago by osa1

Cc: osa1 added

comment:17 Changed 2 years ago by osa1

Simon, I was experimenting with CSE and unless I'm missing something I think current CSE is basically useless and we can just remove it.

As far as I understand from the implementation, current CSE implementation never introduces a let binding. So for example if I have something like:

g x = (x + 1, x + 1)

Current implementation never does CSE here. What's worse is that even if I have a let binding like this:

g x = let x_1 = x + 1
       in (x_1, x + 1)

Most of the time CSE can't do anything, because some other pass inlines/unfolds x_1 here(even with -O0). g above is compiled to this:

g :: Int -> (Int, Int)
g =
  \ (x_atS :: Int) ->
    (case x_atS of _ { I# x_a1IL -> I# (+# x_a1IL 1#) },
     case x_atS of _ { I# x_a1IL -> I# (+# x_a1IL 1#) })

See how let binding is removed. CSE can't do anything here. Funny thing is even if I mark this let binding as NOINLINE CSE can't work, because of Note [CSE for INLINE and NOINLINE]. This is what happens when I mark x_1 as NOINLINE:

g' :: Int -> (Int, Int)
g' =
  \ (x_av2 :: Int) ->
    let {
      x_1_s1Jd :: Int
      x_1_s1Jd = case x_av2 of _ { I# x_a1IL -> I# (+# x_a1IL 1#) } } in
    (case x_av2 of _ { I# x_a1IL -> I# (+# x_a1IL 1#) }, x_1_s1Jd)

There's a clear CSE opportunity missed because of Note [CSE for INLINE and NOINLINE].

So unless we actually run this after CorePrep it just can't do anything.

I tried on a couple of examples and it seems like current CSE implementation actually maintains invariants at least some of the time. For example, it works find on the code in the code above. Do we have something like CoreLint but fore CorePrep invariants?


In the examples I tried CSE never worked before the CorePrep step. After CorePrep it worked some of the time. For example, g above is compiled to this:

-- RHS size: {terms: 17, types: 8, coercions: 0}
g :: Int -> (Int, Int)
g =
  \ (x_s2nV :: Int) ->
    let {
      sat_s2o3 :: Int
      sat_s2o3 =
        case x_s2nV of wild_s2o0 { I# x1_s2o1 ->
        case +# x1_s2o1 1# of sat_s2o2 { __DEFAULT -> I# sat_s2o2 }
        } } in
    let {
      sat_s2nZ :: Int
      sat_s2nZ = sat_s2o3 } in
    (sat_s2o3, sat_s2o3)

h also CSEs nicely if we run it after CorePrep:

-- RHS size: {terms: 23, types: 8, coercions: 0}
h :: Int# -> (Int, Int)
h =
  \ (x_s2nq :: Int#) ->
    case +# x_s2nq 1# of sat_s2nt { __DEFAULT ->
    case *# sat_s2nt 2# of sat_s2nu { __DEFAULT ->
    let {
      sat_s2nv :: Int
      sat_s2nv = I# sat_s2nu } in
    case sat_s2nt of sat_s2nr { __DEFAULT ->
    let {
      sat_s2ns :: Int
      sat_s2ns = I# sat_s2nt } in
    (sat_s2ns, sat_s2nv)
    }
    }
    }

It doesn't work on rev, because of how CorePrep generates a program without any CSE opportunity:

-- RHS size: {terms: 23, types: 19, coercions: 0}
Main.rev :: [GHC.Types.Bool] -> [GHC.Types.Bool]
Main.rev =
  \ (xs_s2nl :: [GHC.Types.Bool]) ->
    let {
      sat_s2np :: [GHC.Types.Bool]
      sat_s2np =
        case GHC.List.reverse1
               @ GHC.Types.Bool xs_s2nl (GHC.Types.[] @ GHC.Types.Bool)
        of sat_s2no { __DEFAULT ->
        GHC.List.filter
          @ GHC.Types.Bool (GHC.Base.id @ GHC.Types.Bool) sat_s2no
        } } in
    case GHC.List.reverse1
           @ GHC.Types.Bool xs_s2nl (GHC.Types.[] @ GHC.Types.Bool)
    of sat_s2nm { __DEFAULT ->
    case GHC.List.reverse1
           @ GHC.Types.Bool sat_s2nm (GHC.Types.[] @ GHC.Types.Bool)
    of sat_s2nn { __DEFAULT ->
    GHC.Base.++ @ GHC.Types.Bool sat_s2nn sat_s2np
    }
    }

Ideally we'd like to generate a binding for GHC.List.reverse1 @ GHC.Types.Bool xs_s2nl (GHC.Types.[] @ GHC.Types.Bool) but since this expression is not bound to a let-binder we can't CSE.

(This would work if had the invariant of having only variables(and literals) in scrutinee position)

For the record, this is what's generated by CSE after CorePrep:

rev :: [Bool] -> [Bool]
rev =
  \ (xs_s2nl :: [Bool]) ->
    let {
      sat_s2np :: [Bool]
      sat_s2np =
        case reverse1 @ Bool xs_s2nl ([] @ Bool) of sat_s2no { __DEFAULT ->
        filter @ Bool (id @ Bool) sat_s2no
        } } in
    case reverse1 @ Bool xs_s2nl ([] @ Bool) of sat_s2nm { __DEFAULT ->
    case reverse1 @ Bool sat_s2nm ([] @ Bool)
    of sat_s2nn { __DEFAULT ->
    ++ @ Bool sat_s2nn sat_s2np
    }
    }

I want to work on running CSE after CorePrep, but before that I'm planning implementing a CoreLint-like pass for checking CorePrep invariants. Simon, does that make sense? Any ideas? Am I missing anything in my observations?

comment:18 Changed 23 months ago by simonpj

See MoreCSE.

I'm inclined to think that it'd be better to do CSE earlier not later.

The main idea (on the MoreCSE page) is this:

  • Make float-out more aggressive by (a) let-binding every sub-expression, and (b) floating let-bindings out even if they don't escape a lambda. So in your first example
    g x = (x + 1, x + 1)
    
    we'd get
    g x = let v1 = x+1 in
          let v2 = x+2 in
          (v1,v2)
    
  • Now do CSE as now. The extra let-bindings should give a lot more more opportunities.
  • Float inwards to reverse the effect of the first pass if CSE did not bite.

Also talk to Joachim who wrote MoreCSE.

comment:19 Changed 23 months ago by nomeata

I did? Oh dear. It must have been before I started to dislike CSE very much, because it gets in the way of rules, and can creasing sharing (e.g. by cse’ing [0..1000]). Or maybe it was precisely when I startd to dislike CSE... :-)

Speaking about the phases CSE should happen: I very much believe we need a CSE phase as an Stg-to-Stg-transformation, to avoid reallocating Lefts in the code for a Either’s fmap, for example. (There is a ticket for this, cannot find it right now.)

comment:20 Changed 23 months ago by osa1

Why do we need to do CSE in STG level to avoid allocation in Eithers fmap? What does STG give us extra over Core? Do you mean not having type terms help? If so, how?

-- RHS size: {terms: 14, types: 17, coercions: 0}
Main.$fFunctorEither1_$cfmap
  :: forall a_aOH a1_avU b_avV.
     (a1_avU -> b_avV) -> Either1 a_aOH a1_avU -> Either1 a_aOH b_avV
Main.$fFunctorEither1_$cfmap =
  \ (@ a_aOH)
    (@ a1_aOL)
    (@ b_aOM)
    (ds_dR3 :: a1_aOL -> b_aOM)
    (ds1_dR4 :: Either1 a_aOH a1_aOL) ->
    case ds1_dR4 of _ {
      Left1 x_avi -> Main.Left1 @ a_aOH @ b_aOM x_avi;
      Right1 y_avk -> Main.Right1 @ a_aOH @ b_aOM (ds_dR3 y_avk)
    }

EDIT: Oh right, I guess that makes sense. This is STG for the same code:

Main.$fFunctorEither1_$cfmap
  :: forall a_aOH a1_avU b_avV.
     (a1_avU -> b_avV)
     -> Main.Either1 a_aOH a1_avU -> Main.Either1 a_aOH b_avV =
    \r srt:SRT:[] [ds_s2pC ds1_s2pD]
        case ds1_s2pD of _ {
          Main.Left1 x_s2pF -> Main.Left1 [x_s2pF];
          Main.Right1 y_s2pG ->
              let { sat_s2pH :: b_aOM = \u srt:SRT:[] [] ds_s2pC y_s2pG;
              } in  Main.Right1 [sat_s2pH];
        };

In the first case we have same expression in LHS and RHS, unlike the same expression in the Core level. Hm..

Last edited 23 months ago by osa1 (previous) (diff)

comment:21 Changed 23 months ago by nomeata

Quite right. If one of us can now find the corresponding ticket, then we can continue there, and not hijack the discussion here :-)... there it is: #9291.

comment:22 Changed 22 months ago by osa1

Simon, may I ask why you're inclined to think that it'd be better to do CSE earlier not later? Is it because of type safety reasons?

comment:23 Changed 22 months ago by osa1

I'm currently implementing CSE in STG level. I'm almost done implementing the infrastructure (TrieMap etc.). I'll update the thread once I get some results.

comment:24 Changed 21 months ago by thomie

Milestone: 8.0.1

comment:25 Changed 7 months ago by ezyang

Keywords: CSE added
Note: See TracTickets for help on using tickets.