Opened 20 months ago

Last modified 4 months ago

#13390 new bug

String literal float-out during desugaring regresses T1969 at -O0

Reported by: bgamari Owned by:
Priority: normal Milestone: 8.8.1
Component: Compiler Version: 8.0.1
Keywords: strings Cc: dfeuer
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

Phab:D1259, which teaches the compiler to aggressively float-out string literals during desugaring (namely DsMonad.mkStringExprFSAtTopLevel), regresses compiler allocations on T1969 by 15% or so at -O0.

The problem

In the case of T1969 (compiled with -O0) the difference is quite stark: with floating the non-optimizing simplifier pass produces {terms: 16,893, types: 7,552, coercions: 0, joins: 0/0}, without it produces {terms: 12,693, types: 4,552, coercions: 0, joins: 0/0}.

The (minimized) test looks like,

module T1969 where

class C a where
    c :: a -> String
    d :: a -> String
    d x = c x
    e :: a -> String
    e x = c x

data A1 = A1
instance C A1 where
    c A1 = "A1"

This reduced program simplifies to 261 terms and 127 types with float-out and 219 terms, 97 types without.

Post-desugar

The reason for the regression is in part due to the fact that we float out the unpackCString# "An" expression. That is, after desugaring without floating we get (looking at just the A1 bindings),

T1969.$dme :: forall a. C a => a -> String
T1969.$dme = \ (@ a_aM1) ($dC_a1h9 :: C a_aM1) (x_aM3 :: a_aM1) ->
      c @ a_aM1 $dC_a1h9 x_aM3

-- same as $dme
T1969.$dmd :: forall a. C a => a -> String
T1969.$dmd = \ (@ a_aM1) ($dC_a1h9 :: C a_aM1) (x_aM2 :: a_aM1) ->
      c @ a_aM1 $dC_a1h9 x_aM2

$cc_a1i7 :: A1 -> String
$cc_a1i7= \ (ds_d1jJ :: A1) -> case ds_d1jJ of { A1 -> GHC.CString.unpackCString# "A1"# }

Rec {
T1969.$fCA3 :: C A3
T1969.$fCA3 = T1969.C:C @ A3 $cc_a1hl $cd_a1hp $ce_a1hy

$ce_a1hy :: A3 -> String
$ce_a1hy = T1969.$dme @ A3 T1969.$fCA3

$cd_a1hp :: A3 -> String
$cd_a1hp = T1969.$dmd @ A3 T1969.$fCA3
end Rec }

Whereas with floating we get,

-- same as above
T1969.$dme :: forall a. C a => a -> String
T1969.$dmd :: forall a. C a => a -> String

ds_d1k4 :: [Char]
ds_d1k4 = GHC.CString.unpackCString# "A1"#

$cc_a1i7 :: A1 -> String
$cc_a1i7 = \ (ds_d1k3 :: A1) -> case ds_d1k3 of { A1 -> ds_d1k4 }

Rec {
T1969.$fCA1 :: C A1
T1969.$fCA1 = T1969.C:C @ A1 $cc_a1i7 $cd_a1ib $ce_a1ik

$ce_a1ik :: A1 -> String
$ce_a1ik = T1969.$dme @ A1 T1969.$fCA1

$cd_a1ib :: A1 -> String
$cd_a1ib = T1969.$dmd @ A1 T1969.$fCA1
end Rec }

So far things aren't so bad: the only interesting difference is the floated [Char], which we would expect. However, let's then see what happens during simplification.

Post-simplification

Without floating we see,

T1969.$fCA1 :: C A1
T1969.$fCA1 = T1969.C:C @ A1 $cc_a1i7 $cc_a1i7 $cc_a1i7

$cc_a1i7 :: A1 -> String
$cc_a1i7
  = \ (ds_d1jJ :: A1) ->
      case ds_d1jJ of { A1 -> GHC.CString.unpackCString# "A1"# }

Whereas with floating we have,

ds_d1k4 :: [Char]
ds_d1k4 = GHC.CString.unpackCString# "A1"#

$cc_a1i7 :: A1 -> String
$cc_a1i7 = \ (ds_d1k3 :: A1) -> case ds_d1k3 of { A1 -> ds_d1k4 }

$cd_a1ib :: A1 -> String
$cd_a1ib = \ (x_aM2 :: A1) -> case x_aM2 of { A1 -> ds_d1k4 }

$ce_a1ik :: A1 -> String
$ce_a1ik = \ (x_aM3 :: A1) -> case x_aM3 of { A1 -> ds_d1k4 }

T1969.$fCA1 :: C A1
T1969.$fCA1 = T1969.C:C @ A1 $cc_a1i7 $cd_a1ib $ce_a1ik

This is quite interesting: without floating we are somehow able to collapse each of the A1 -> String bindings into a single binding (despite CSE being disabled due to -O0!).

Change History (17)

comment:1 Changed 20 months ago by bgamari

Description: modified (diff)

comment:2 Changed 19 months ago by bgamari

Milestone: 8.2.18.4.1

Given that 8.2.1-rc1 is imminent, I'm bumping these off to the 8.4

comment:3 Changed 18 months ago by dfeuer

Commentary/Compiler/Core2CorePipeline indicates that in desugaring we only inline "non-recursive bindings that are used only once or where the RHS is trivial". I don't see how to understand what Ben showed based on that. But if a little more inlining is happening for some reason, here's one potential story.

Without floating, imagine that the (default-derived) method definitions are somehow inlined into the dictionary, and then $dme and $dmd are inlined. Then everything squashes down nicely.

With floating, something else inexplicable seems to be happening: the default definitions inline into $ce_a1ik and $cd_a1ib, and somehow $cc_a1i7 (which is now really small, but surprisingly non-trivial) inlines into them.

Glancing at the notes and code for the gentle simplification, I do not understand why any of these inlinings are happening, but maybe someone else does. Or maybe I'm completely misinterpreting. I need to build both commits and dump inlinings.

comment:4 Changed 18 months ago by dfeuer

Oh, I got mixed up. There is inlining in the "Non-opt simplification". So I'm pretty confident the inliner is the main player here.

comment:5 Changed 18 months ago by dfeuer

OK, so I guess the reason for this is pretty clear: with floating, $cc_a1i7 looks small enough to inline (even at -O0), and it inlines, and so we then lose the "phantom CSE" that we got from inlining the defaults instead. It's not clear to me that there's much we could do that would obviously improve matters in general.

comment:6 Changed 18 months ago by dfeuer

Cc: dfeuer added

comment:7 Changed 18 months ago by bgamari

It wasn't entirely clear to me what "phantom CSE" meant in comment:5. dfeuer clarifies:

there's no actual collapse. Those bindings are just *inlined*, and so they aren't needed anymore.

comment:8 Changed 17 months ago by simonpj

I tried the example in the Description to try to get to the bottom of this. With HEAD and no -O I get

-------------- Class-op selectors for c,d,e ----------
c :: forall a. C a => a -> String
[GblId[ClassOp],
 Arity=1,
 Caf=NoCafRefs,
 Str=<S(SLL),U(U,A,A)>,
 RULES: Built in rule for c: "Class op c"]
c = \ (@ a_aop) (v_B1 :: C a_aop) ->
      case v_B1 of v_B1 { T1969.C:C v_B2 v_B3 v_B4 -> v_B2 }

-- .....and similarly for 'd', 'e'


-------------- Default methods for d,e ----------
T1969.$dmd :: forall a. C a => a -> String
[GblId, Arity=2, Caf=NoCafRefs]
T1969.$dmd
  = \ (@ a_aop) ($dC_aT1 :: C a_aop) (x_aoq :: a_aop) ->
      c @ a_aop $dC_aT1 x_aoq

-- .....and similary for 'e'


-------------- Dictionary for (C A1) -----------
$cc_rUv :: A1 -> String
[GblId, Arity=1]
$cc_rUv
  = \ (ds_dUq :: A1) ->
      case ds_dUq of { A1 -> GHC.CString.unpackCString# "A1"# }

T1969.$fCA1 [InlPrag=CONLIKE] :: C A1
[GblId[DFunId]]
T1969.$fCA1 = T1969.C:C @ A1 $cc_rUv $cc_rUv $cc_rUv

This looks absolutely fine to me.

What is the problem we are trying to solve here? Maybe it's solved already? (In which case can we just make sure that T1969 at -O0 is a regression test?)

comment:9 Changed 17 months ago by dfeuer

Simon, HEAD is fine. The trouble shows up with the string literal floating patch, which has not been merged.

comment:10 Changed 17 months ago by dfeuer

All right. I took another look at what's going on in HEAD (without the string literal patch). It looks like I missed one piece: rules. So here's something more like the real story. We start out with

$cc_aT4 :: A1 -> String
$cc_aT4
  = \ (ds_dUh :: A1) ->
      case ds_dUh of { A1 -> GHC.CString.unpackCString# "A1"# }

T1969.$dme :: forall a. C a => a -> String
T1969.$dme
  = \ (@ a_aog) ($dC_aSS :: C a_aog) (x_aoi :: a_aog) ->
      c @ a_aog $dC_aSS x_aoi

T1969.$dmd :: forall a. C a => a -> String
T1969.$dmd
  = \ (@ a_aog) ($dC_aSS :: C a_aog) (x_aoh :: a_aog) ->
      c @ a_aog $dC_aSS x_aoh

Rec {
T1969.$fCA1 :: C A1
T1969.$fCA1 = T1969.C:C @ A1 $cc_aT4 $cd_aT8 $ce_aTf

$ce_aTf :: A1 -> String
$ce_aTf = T1969.$dme @ A1 T1969.$fCA1

$cd_aT8 :: A1 -> String
$cd_aT8 = T1969.$dmd @ A1 T1969.$fCA1
end Rec }

Then $dme inlines, producing

$ce_aTf :: A1 -> String
$ce_aTf = (\ (@ a_aog) ($dC_aSS :: C a_aog) (x_aoi :: a_aog) ->
      c @ a_aog $dC_aSS x_aoi) @ A1 T1969.$fCA1

which reduces to

$ce_aTf :: A1 -> String
$ce_aTf = \ (x_aoi :: A1) ->
      c @ A1 T1969.$fCA1 x_aoi

Then the class op rule for c @A1 fires, turning this into

$ce_aTf :: A1 -> String
$ce_aTf = \ (x_aoi :: A1) -> $cc_aT4 x_aoi

The same thing happens to $dmd and $cd_aT8. At some point, I believe both ce_aTf and cd_aT8 must both get eta-reduced to $cc_aT4. -ddump-inlinings on its own doesn't show any further inlining, but -dverbose-core2core (if I'm reading it right) shows that the eta-reduced versions are indeed inlined into the constructor. These are trivial inlinings, replacing one binding with another, which should get around your statement that we don't (and don't want to) inline into constructor arguments.

Last edited 17 months ago by dfeuer (previous) (diff)

comment:11 Changed 17 months ago by dfeuer

I finally built the patched version. As best I can tell, the story starts the same. That is, we get the same inlining of $dme and $dmd, and the same class op rules firing, but then $cc_aT4 is getting inlined into $ce_aTf and $cd_aT8 instead of allowing those functions to eta-reduce away. So I guess maybe we actually can make a useful change: perhaps we want to check for eta-reduction opportunities before considering inlining. There's not much point inlining a function application into the body of a lambda when we can instead eliminate the lambda (and the application) altogether.

comment:12 Changed 17 months ago by dfeuer

bgamari asked me to summarize my current best guess very briefly, so here goes. I believe the basic problem when the string floating patch is applied is that we're seeing something like

\x -> c x

and deciding to inline c into the body of the lambda. In this case, at least, it would be better to eta reduce instead. I'm not sure that it would ever be advantageous to inline a function into such a removable lambda. I can't think of such a situation myself, but that doesn't mean much.

comment:13 Changed 17 months ago by bgamari

Summary: Strict literal float-out during desugaring regresses T1969 at -O0String literal float-out during desugaring regresses T1969 at -O0

comment:14 Changed 14 months ago by bgamari

Keywords: strings added

comment:15 Changed 9 months ago by bgamari

Milestone: 8.4.18.6.1

This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

comment:16 Changed 4 months ago by bgamari

This will not be addressed in GHC 8.6.

comment:17 Changed 4 months ago by bgamari

Milestone: 8.6.18.8.1

These will not be addressed in GHC 8.6.

Note: See TracTickets for help on using tickets.