Opened 2 years ago
Last modified 8 weeks ago
#13390 new bug
String literal float-out during desugaring regresses T1969 at -O0
Reported by: | bgamari | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 8.10.1 |
Component: | Compiler | Version: | 8.0.1 |
Keywords: | strings | Cc: | dfeuer |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | None/Unknown | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description (last modified by )
Phab:D1259, which teaches the compiler to aggressively float-out string literals during desugaring (namely DsMonad.mkStringExprFSAtTopLevel
), regresses compiler allocations on T1969
by 15% or so at -O0
.
The problem
In the case of T1969
(compiled with -O0
) the difference is quite stark: with floating the non-optimizing simplifier pass produces {terms: 16,893, types: 7,552, coercions: 0, joins: 0/0}
, without it produces {terms: 12,693, types: 4,552, coercions: 0, joins: 0/0}
.
The (minimized) test looks like,
module T1969 where class C a where c :: a -> String d :: a -> String d x = c x e :: a -> String e x = c x data A1 = A1 instance C A1 where c A1 = "A1"
This reduced program simplifies to 261 terms and 127 types with float-out and 219 terms, 97 types without.
Post-desugar
The reason for the regression is in part due to the fact that we float out the unpackCString# "An"
expression. That is, after desugaring without floating we get (looking at just the A1
bindings),
T1969.$dme :: forall a. C a => a -> String T1969.$dme = \ (@ a_aM1) ($dC_a1h9 :: C a_aM1) (x_aM3 :: a_aM1) -> c @ a_aM1 $dC_a1h9 x_aM3 -- same as $dme T1969.$dmd :: forall a. C a => a -> String T1969.$dmd = \ (@ a_aM1) ($dC_a1h9 :: C a_aM1) (x_aM2 :: a_aM1) -> c @ a_aM1 $dC_a1h9 x_aM2 $cc_a1i7 :: A1 -> String $cc_a1i7= \ (ds_d1jJ :: A1) -> case ds_d1jJ of { A1 -> GHC.CString.unpackCString# "A1"# } Rec { T1969.$fCA3 :: C A3 T1969.$fCA3 = T1969.C:C @ A3 $cc_a1hl $cd_a1hp $ce_a1hy $ce_a1hy :: A3 -> String $ce_a1hy = T1969.$dme @ A3 T1969.$fCA3 $cd_a1hp :: A3 -> String $cd_a1hp = T1969.$dmd @ A3 T1969.$fCA3 end Rec }
Whereas with floating we get,
-- same as above T1969.$dme :: forall a. C a => a -> String T1969.$dmd :: forall a. C a => a -> String ds_d1k4 :: [Char] ds_d1k4 = GHC.CString.unpackCString# "A1"# $cc_a1i7 :: A1 -> String $cc_a1i7 = \ (ds_d1k3 :: A1) -> case ds_d1k3 of { A1 -> ds_d1k4 } Rec { T1969.$fCA1 :: C A1 T1969.$fCA1 = T1969.C:C @ A1 $cc_a1i7 $cd_a1ib $ce_a1ik $ce_a1ik :: A1 -> String $ce_a1ik = T1969.$dme @ A1 T1969.$fCA1 $cd_a1ib :: A1 -> String $cd_a1ib = T1969.$dmd @ A1 T1969.$fCA1 end Rec }
So far things aren't so bad: the only interesting difference is the floated [Char]
, which we would expect. However, let's then see what happens during simplification.
Post-simplification
Without floating we see,
T1969.$fCA1 :: C A1 T1969.$fCA1 = T1969.C:C @ A1 $cc_a1i7 $cc_a1i7 $cc_a1i7 $cc_a1i7 :: A1 -> String $cc_a1i7 = \ (ds_d1jJ :: A1) -> case ds_d1jJ of { A1 -> GHC.CString.unpackCString# "A1"# }
Whereas with floating we have,
ds_d1k4 :: [Char] ds_d1k4 = GHC.CString.unpackCString# "A1"# $cc_a1i7 :: A1 -> String $cc_a1i7 = \ (ds_d1k3 :: A1) -> case ds_d1k3 of { A1 -> ds_d1k4 } $cd_a1ib :: A1 -> String $cd_a1ib = \ (x_aM2 :: A1) -> case x_aM2 of { A1 -> ds_d1k4 } $ce_a1ik :: A1 -> String $ce_a1ik = \ (x_aM3 :: A1) -> case x_aM3 of { A1 -> ds_d1k4 } T1969.$fCA1 :: C A1 T1969.$fCA1 = T1969.C:C @ A1 $cc_a1i7 $cd_a1ib $ce_a1ik
This is quite interesting: without floating we are somehow able to collapse each of the A1 -> String
bindings into a single binding (despite CSE being disabled due to -O0
!).
Change History (18)
comment:1 Changed 2 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 23 months ago by
Milestone: | 8.2.1 → 8.4.1 |
---|
comment:3 Changed 22 months ago by
Commentary/Compiler/Core2CorePipeline indicates that in desugaring we only inline "non-recursive bindings that are used only once or where the RHS is trivial". I don't see how to understand what Ben showed based on that. But if a little more inlining is happening for some reason, here's one potential story.
Without floating, imagine that the (default-derived) method definitions are somehow inlined into the dictionary, and then $dme
and $dmd
are inlined. Then everything squashes down nicely.
With floating, something else inexplicable seems to be happening: the default definitions inline into $ce_a1ik
and $cd_a1ib
, and somehow $cc_a1i7
(which is now really small, but surprisingly non-trivial) inlines into them.
Glancing at the notes and code for the gentle simplification, I do not understand why any of these inlinings are happening, but maybe someone else does. Or maybe I'm completely misinterpreting. I need to build both commits and dump inlinings.
comment:4 Changed 22 months ago by
Oh, I got mixed up. There is inlining in the "Non-opt simplification". So I'm pretty confident the inliner is the main player here.
comment:5 Changed 22 months ago by
OK, so I guess the reason for this is pretty clear: with floating, $cc_a1i7
looks small enough to inline (even at -O0
), and it inlines, and so we then lose the "phantom CSE" that we got from inlining the defaults instead. It's not clear to me that there's much we could do that would obviously improve matters in general.
comment:6 Changed 22 months ago by
Cc: | dfeuer added |
---|
comment:7 Changed 22 months ago by
It wasn't entirely clear to me what "phantom CSE" meant in comment:5. dfeuer clarifies:
there's no actual collapse. Those bindings are just *inlined*, and so they aren't needed anymore.
comment:8 Changed 22 months ago by
I tried the example in the Description to try to get to the bottom of this. With HEAD and no -O I get
-------------- Class-op selectors for c,d,e ---------- c :: forall a. C a => a -> String [GblId[ClassOp], Arity=1, Caf=NoCafRefs, Str=<S(SLL),U(U,A,A)>, RULES: Built in rule for c: "Class op c"] c = \ (@ a_aop) (v_B1 :: C a_aop) -> case v_B1 of v_B1 { T1969.C:C v_B2 v_B3 v_B4 -> v_B2 } -- .....and similarly for 'd', 'e' -------------- Default methods for d,e ---------- T1969.$dmd :: forall a. C a => a -> String [GblId, Arity=2, Caf=NoCafRefs] T1969.$dmd = \ (@ a_aop) ($dC_aT1 :: C a_aop) (x_aoq :: a_aop) -> c @ a_aop $dC_aT1 x_aoq -- .....and similary for 'e' -------------- Dictionary for (C A1) ----------- $cc_rUv :: A1 -> String [GblId, Arity=1] $cc_rUv = \ (ds_dUq :: A1) -> case ds_dUq of { A1 -> GHC.CString.unpackCString# "A1"# } T1969.$fCA1 [InlPrag=CONLIKE] :: C A1 [GblId[DFunId]] T1969.$fCA1 = T1969.C:C @ A1 $cc_rUv $cc_rUv $cc_rUv
This looks absolutely fine to me.
What is the problem we are trying to solve here? Maybe it's solved already? (In which case can we just make sure that T1969 at -O0
is a regression test?)
comment:9 Changed 22 months ago by
Simon, HEAD is fine. The trouble shows up with the string literal floating patch, which has not been merged.
comment:10 Changed 22 months ago by
All right. I took another look at what's going on in HEAD (without the string literal patch). It looks like I missed one piece: rules. So here's something more like the real story. We start out with
$cc_aT4 :: A1 -> String $cc_aT4 = \ (ds_dUh :: A1) -> case ds_dUh of { A1 -> GHC.CString.unpackCString# "A1"# } T1969.$dme :: forall a. C a => a -> String T1969.$dme = \ (@ a_aog) ($dC_aSS :: C a_aog) (x_aoi :: a_aog) -> c @ a_aog $dC_aSS x_aoi T1969.$dmd :: forall a. C a => a -> String T1969.$dmd = \ (@ a_aog) ($dC_aSS :: C a_aog) (x_aoh :: a_aog) -> c @ a_aog $dC_aSS x_aoh Rec { T1969.$fCA1 :: C A1 T1969.$fCA1 = T1969.C:C @ A1 $cc_aT4 $cd_aT8 $ce_aTf $ce_aTf :: A1 -> String $ce_aTf = T1969.$dme @ A1 T1969.$fCA1 $cd_aT8 :: A1 -> String $cd_aT8 = T1969.$dmd @ A1 T1969.$fCA1 end Rec }
Then $dme
inlines, producing
$ce_aTf :: A1 -> String $ce_aTf = (\ (@ a_aog) ($dC_aSS :: C a_aog) (x_aoi :: a_aog) -> c @ a_aog $dC_aSS x_aoi) @ A1 T1969.$fCA1
which reduces to
$ce_aTf :: A1 -> String $ce_aTf = \ (x_aoi :: A1) -> c @ A1 T1969.$fCA1 x_aoi
Then the class op rule for c @A1
fires, turning this into
$ce_aTf :: A1 -> String $ce_aTf = \ (x_aoi :: A1) -> $cc_aT4 x_aoi
The same thing happens to $dmd
and $cd_aT8
. At some point, I believe both ce_aTf
and cd_aT8
must both get eta-reduced to $cc_aT4
. -ddump-inlinings
on its own doesn't show any further inlining, but -dverbose-core2core
(if I'm reading it right) shows that the eta-reduced versions are indeed inlined into the constructor. These are trivial inlinings, replacing one binding with another, which should get around your statement that we don't (and don't want to) inline into constructor arguments.
comment:11 Changed 22 months ago by
I finally built the patched version. As best I can tell, the story starts the same. That is, we get the same inlining of $dme
and $dmd
, and the same class op rules firing, but then $cc_aT4
is getting inlined into $ce_aTf
and $cd_aT8
instead of allowing those functions to eta-reduce away. So I guess maybe we actually can make a useful change: perhaps we want to check for eta-reduction opportunities before considering inlining. There's not much point inlining a function application into the body of a lambda when we can instead eliminate the lambda (and the application) altogether.
comment:12 Changed 22 months ago by
bgamari asked me to summarize my current best guess very briefly, so here goes. I believe the basic problem when the string floating patch is applied is that we're seeing something like
\x -> c x
and deciding to inline c
into the body of the lambda. In this case, at least, it would be better to eta reduce instead. I'm not sure that it would ever be advantageous to inline a function into such a removable lambda. I can't think of such a situation myself, but that doesn't mean much.
comment:13 Changed 21 months ago by
Summary: | Strict literal float-out during desugaring regresses T1969 at -O0 → String literal float-out during desugaring regresses T1969 at -O0 |
---|
comment:14 Changed 18 months ago by
Keywords: | strings added |
---|
comment:15 Changed 13 months ago by
Milestone: | 8.4.1 → 8.6.1 |
---|
This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.
comment:18 Changed 8 weeks ago by
Milestone: | 8.8.1 → 8.10.1 |
---|
Bumping milestones of low-priority tickets.
Given that 8.2.1-rc1 is imminent, I'm bumping these off to the 8.4