Opened 4 years ago

Last modified 3 months ago

#9020 new bug

Massive blowup of code size on trivial program

Reported by: simonpj Owned by: nomeata
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: Inlining Cc: mail@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: perf/compiler/T9020
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D851
Wiki Page:

Description

The test simplCore/should_compile/simpl015 was leading to massive compile times. This is weird: it looks like:

main = do { return ()
          ; return ()
          ... hundreds more times ...
          ; return () }

It turns out that the cause was over-eager eta-expansion. Each return () gives a top-level PAP thus

lvl17 :: IO ()
lvl17 = returnIO ()

But these definitions are then eta-expanded, thus:

lvl17 = ((\eta. returnIO () |> sym g) eta) |> g

    where   g :: State# RealWorld -> (# State# RealWorld, () #)
                 ~
                 IO ()

Now in general it makes sense to eta-expand through newtypes (as is done here), because it exposes more lambadas. But it really doesn't make sense to eta-expand a PAP like this.

Fortunately the fix is easy: use exprArity rather than manifestArity in SimplUtils.tryEtaExpandRhs.

The effect on simpl015 is dramatic: compiler allocation drops from 6.6G to 812M; and residency drops from 1.8G to 45M.

Change History (15)

comment:1 Changed 3 years ago by nomeata

Cc: mail@… added

Did you also measure

comment:2 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 79e46aea1643b4dfdc7c846bbefe06b83b535efd/ghc:

Don't eta-expand PAPs (fixes Trac #9020)

See Note [Do not eta-expand PAPs] in SimplUtils.  This has a tremendously
good effect on compile times for some simple benchmarks.

The test is now where it belongs, in perf/compiler/T9020 (instead of simpl015).

I did a nofib run and got essentially zero change except for cacheprof which
gets 4% more allocation.  I investigated.  Turns out that we have

    instance PP Reg where
       pp ppm ST_0 = "%st"
       pp ppm ST_1 = "%st(1)"
       pp ppm ST_2 = "%st(2)"
       pp ppm ST_3 = "%st(3)"
       pp ppm ST_4 = "%st(4)"
       pp ppm ST_5 = "%st(5)"
       pp ppm ST_6 = "%st(6)"
       pp ppm ST_7 = "%st(7)"
       pp ppm r    = "%" ++ map toLower (show r)

That (map toLower (show r) does a lot of map/toLowers.  But if we inline show
we get something like

       pp ppm ST_0 = "%st"
       pp ppm ST_1 = "%st(1)"
       pp ppm ST_2 = "%st(2)"
       pp ppm ST_3 = "%st(3)"
       pp ppm ST_4 = "%st(4)"
       pp ppm ST_5 = "%st(5)"
       pp ppm ST_6 = "%st(6)"
       pp ppm ST_7 = "%st(7)"
       pp ppm EAX  = map toLower (show EAX)
       pp ppm EBX  = map toLower (show EBX)
       ...etc...

and all those map/toLower calls can now be floated to top level.
This gives a 4% decrease in allocation.  But it depends on inlining
a pretty big 'show' function.

With this new patch we get slightly better eta-expansion, which makes
a function look slightly bigger, which just stops it being inlined.
The previous behaviour was luck, so I'm not going to worry about
losing it.

I've added some notes to nofib/Simon-nofib-notes

comment:3 Changed 3 years ago by simonpj

Resolution: fixed
Status: newclosed
Test Case: perf/compiler/T9020

comment:4 Changed 3 years ago by nomeata

I might have a different fix for this, that would also avoid implementing the strange heuristics in #10319:

In T9020, the problem is not really that return () is eta-expanded, but that it is eta-expaned in a phase where no inlining happens, namely in the gentle phase. If we do not eta-expand in this phase (by changing simpl_gently in SimplCore), we can revert to old_arity = manifestArity and this program still compiles quickly.

Doesn’t that look like the cleaner solution?

comment:5 in reply to:  4 Changed 3 years ago by simonpj

In T9020, the problem is not really that return () is eta-expanded, but that it is eta-expanded in a phase where no inlining happens, namely in the gentle phase. If we do not eta-expand in this phase (by changing simpl_gently in SimplCore), we can revert to old_arity = manifestArity and this program still compiles quickly.

Aha. That sounds cool. I buy.

  • NB that I am (separately) wanting to make some inlining happen in the gentle phase too...So the no-eta thing should depend on sm_inline rather than on the phase. That makes the reasoning clearer too.
  • Rather than muttering about PAPs, perhaps e can simply switch off eta-expansion altogether under these conditions?

comment:6 Changed 3 years ago by nomeata

Rather than muttering about PAPs, perhaps we can simply switch off eta-expansion altogether under these conditions?

With "these conditions", do you mean "gentle phase" a.k.a. "sm_inline = False"? If so, then that is precisely what I am proposing:

        -- initial simplify: mk specialiser happy: minimum effort please
    simpl_gently = CoreDoSimplify max_iter
                       (base_mode { sm_phase = InitialPhase
                                  , sm_names = ["Gentle"]
                                  , sm_rules = rules_on   -- Note [RULEs enabled in SimplGently]
                                  , sm_inline = False
                                  , sm_case_case = False
                                  , sm_eta_expand = False}) -- This line was added
                          -- Don't do case-of-case transformations.
                          -- This makes full laziness work better

We still need to distinguish between sm_inline and sm_eta_expand, as the former is always on in the base_mode, while the second one depends on Opt_DoLambdaEtaExpansion.

(Which means that this flag isn’t correctly named, as it affects all eta-expansion, not just lambda-eta-expansion.)

comment:7 Changed 3 years ago by nomeata

Here is some concrete code to talk about: Phab:D851

comment:8 Changed 3 years ago by simonpj

Resolution: fixed
Status: closednew

Re-opening.

comment:9 Changed 3 years ago by simonpj

Differential Rev(s): Phab:D851
Milestone: 7.12.1
Owner: set to nomeata

Thanks for working on this, Joachim.

comment:10 Changed 2 years ago by thomie

Type of failure: None/UnknownCompile-time performance bug

comment:11 Changed 2 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:12 Changed 21 months ago by thomie

Milestone: 8.0.1

comment:13 Changed 15 months ago by mpickering

Keywords: Inlining added

comment:14 Changed 3 months ago by dfeuer

What's it going to take to put this to bed? It sounds like comment:6 suggests there may still be something to be done despite the early inline patch? It's not clear to me just what. For the record, this test case no longer seems to be in the test suite, but compiling something similar dropped from 5.5s in 7.8 to 1s in 7.10 and 8.0 to 0.8s in HEAD.

comment:15 Changed 3 months ago by simonpj

If it's good in HEAD, let's add a regression test and declare victory.

Note: See TracTickets for help on using tickets.