Opened 17 months ago

Last modified 5 weeks ago

#9020 new bug

Massive blowup of code size on trivial program

Reported by: simonpj Owned by: nomeata
Priority: normal Milestone: 7.12.1
Component: Compiler Version: 7.8.2
Keywords: Cc: mail@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: perf/compiler/T9020
Blocked By: Blocking:
Related Tickets: Differential Revisions: Phab:D851

Description

The test simplCore/should_compile/simpl015 was leading to massive compile times. This is weird: it looks like:

main = do { return ()
          ; return ()
          ... hundreds more times ...
          ; return () }

It turns out that the cause was over-eager eta-expansion. Each return () gives a top-level PAP thus

lvl17 :: IO ()
lvl17 = returnIO ()

But these definitions are then eta-expanded, thus:

lvl17 = ((\eta. returnIO () |> sym g) eta) |> g

    where   g :: State# RealWorld -> (# State# RealWorld, () #)
                 ~
                 IO ()

Now in general it makes sense to eta-expand through newtypes (as is done here), because it exposes more lambadas. But it really doesn't make sense to eta-expand a PAP like this.

Fortunately the fix is easy: use exprArity rather than manifestArity in SimplUtils.tryEtaExpandRhs.

The effect on simpl015 is dramatic: compiler allocation drops from 6.6G to 812M; and residency drops from 1.8G to 45M.

Change History (10)

comment:1 Changed 17 months ago by nomeata

  • Cc mail@… added

Did you also measure

comment:2 Changed 17 months ago by Simon Peyton Jones <simonpj@…>

In 79e46aea1643b4dfdc7c846bbefe06b83b535efd/ghc:

Don't eta-expand PAPs (fixes Trac #9020)

See Note [Do not eta-expand PAPs] in SimplUtils.  This has a tremendously
good effect on compile times for some simple benchmarks.

The test is now where it belongs, in perf/compiler/T9020 (instead of simpl015).

I did a nofib run and got essentially zero change except for cacheprof which
gets 4% more allocation.  I investigated.  Turns out that we have

    instance PP Reg where
       pp ppm ST_0 = "%st"
       pp ppm ST_1 = "%st(1)"
       pp ppm ST_2 = "%st(2)"
       pp ppm ST_3 = "%st(3)"
       pp ppm ST_4 = "%st(4)"
       pp ppm ST_5 = "%st(5)"
       pp ppm ST_6 = "%st(6)"
       pp ppm ST_7 = "%st(7)"
       pp ppm r    = "%" ++ map toLower (show r)

That (map toLower (show r) does a lot of map/toLowers.  But if we inline show
we get something like

       pp ppm ST_0 = "%st"
       pp ppm ST_1 = "%st(1)"
       pp ppm ST_2 = "%st(2)"
       pp ppm ST_3 = "%st(3)"
       pp ppm ST_4 = "%st(4)"
       pp ppm ST_5 = "%st(5)"
       pp ppm ST_6 = "%st(6)"
       pp ppm ST_7 = "%st(7)"
       pp ppm EAX  = map toLower (show EAX)
       pp ppm EBX  = map toLower (show EBX)
       ...etc...

and all those map/toLower calls can now be floated to top level.
This gives a 4% decrease in allocation.  But it depends on inlining
a pretty big 'show' function.

With this new patch we get slightly better eta-expansion, which makes
a function look slightly bigger, which just stops it being inlined.
The previous behaviour was luck, so I'm not going to worry about
losing it.

I've added some notes to nofib/Simon-nofib-notes

comment:3 Changed 17 months ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed
  • Test Case set to perf/compiler/T9020

comment:4 follow-up: Changed 5 months ago by nomeata

I might have a different fix for this, that would also avoid implementing the strange heuristics in #10319:

In T9020, the problem is not really that return () is eta-expanded, but that it is eta-expaned in a phase where no inlining happens, namely in the gentle phase. If we do not eta-expand in this phase (by changing simpl_gently in SimplCore), we can revert to old_arity = manifestArity and this program still compiles quickly.

Doesn’t that look like the cleaner solution?

comment:5 in reply to: ↑ 4 Changed 5 months ago by simonpj

In T9020, the problem is not really that return () is eta-expanded, but that it is eta-expanded in a phase where no inlining happens, namely in the gentle phase. If we do not eta-expand in this phase (by changing simpl_gently in SimplCore), we can revert to old_arity = manifestArity and this program still compiles quickly.

Aha. That sounds cool. I buy.

  • NB that I am (separately) wanting to make some inlining happen in the gentle phase too...So the no-eta thing should depend on sm_inline rather than on the phase. That makes the reasoning clearer too.
  • Rather than muttering about PAPs, perhaps e can simply switch off eta-expansion altogether under these conditions?

comment:6 Changed 5 months ago by nomeata

Rather than muttering about PAPs, perhaps we can simply switch off eta-expansion altogether under these conditions?

With "these conditions", do you mean "gentle phase" a.k.a. "sm_inline = False"? If so, then that is precisely what I am proposing:

        -- initial simplify: mk specialiser happy: minimum effort please
    simpl_gently = CoreDoSimplify max_iter
                       (base_mode { sm_phase = InitialPhase
                                  , sm_names = ["Gentle"]
                                  , sm_rules = rules_on   -- Note [RULEs enabled in SimplGently]
                                  , sm_inline = False
                                  , sm_case_case = False
                                  , sm_eta_expand = False}) -- This line was added
                          -- Don't do case-of-case transformations.
                          -- This makes full laziness work better

We still need to distinguish between sm_inline and sm_eta_expand, as the former is always on in the base_mode, while the second one depends on Opt_DoLambdaEtaExpansion.

(Which means that this flag isn’t correctly named, as it affects all eta-expansion, not just lambda-eta-expansion.)

comment:7 Changed 5 months ago by nomeata

Here is some concrete code to talk about: Phab:D851

comment:8 Changed 4 months ago by simonpj

  • Resolution fixed deleted
  • Status changed from closed to new

Re-opening.

comment:9 Changed 4 months ago by simonpj

  • Differential Revisions set to Phab:D851
  • Milestone set to 7.12.1
  • Owner set to nomeata

Thanks for working on this, Joachim.

comment:10 Changed 5 weeks ago by thomie

  • Type of failure changed from None/Unknown to Compile-time performance bug
Note: See TracTickets for help on using tickets.