#9020 closed bug (fixed)

Massive blowup of code size on trivial program

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: Cc: mail@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: perf/compiler/T9020
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

The test simplCore/should_compile/simpl015 was leading to massive compile times. This is weird: it looks like:

main = do { return ()
          ; return ()
          ... hundreds more times ...
          ; return () }

It turns out that the cause was over-eager eta-expansion. Each return () gives a top-level PAP thus

lvl17 :: IO ()
lvl17 = returnIO ()

But these definitions are then eta-expanded, thus:

lvl17 = ((\eta. returnIO () |> sym g) eta) |> g

    where   g :: State# RealWorld -> (# State# RealWorld, () #)
                 ~
                 IO ()

Now in general it makes sense to eta-expand through newtypes (as is done here), because it exposes more lambadas. But it really doesn't make sense to eta-expand a PAP like this.

Fortunately the fix is easy: use exprArity rather than manifestArity in SimplUtils.tryEtaExpandRhs.

The effect on simpl015 is dramatic: compiler allocation drops from 6.6G to 812M; and residency drops from 1.8G to 45M.

Change History (3)

comment:1 Changed 11 months ago by nomeata

  • Cc mail@… added

Did you also measure

comment:2 Changed 11 months ago by Simon Peyton Jones <simonpj@…>

In 79e46aea1643b4dfdc7c846bbefe06b83b535efd/ghc:

Don't eta-expand PAPs (fixes Trac #9020)

See Note [Do not eta-expand PAPs] in SimplUtils.  This has a tremendously
good effect on compile times for some simple benchmarks.

The test is now where it belongs, in perf/compiler/T9020 (instead of simpl015).

I did a nofib run and got essentially zero change except for cacheprof which
gets 4% more allocation.  I investigated.  Turns out that we have

    instance PP Reg where
       pp ppm ST_0 = "%st"
       pp ppm ST_1 = "%st(1)"
       pp ppm ST_2 = "%st(2)"
       pp ppm ST_3 = "%st(3)"
       pp ppm ST_4 = "%st(4)"
       pp ppm ST_5 = "%st(5)"
       pp ppm ST_6 = "%st(6)"
       pp ppm ST_7 = "%st(7)"
       pp ppm r    = "%" ++ map toLower (show r)

That (map toLower (show r) does a lot of map/toLowers.  But if we inline show
we get something like

       pp ppm ST_0 = "%st"
       pp ppm ST_1 = "%st(1)"
       pp ppm ST_2 = "%st(2)"
       pp ppm ST_3 = "%st(3)"
       pp ppm ST_4 = "%st(4)"
       pp ppm ST_5 = "%st(5)"
       pp ppm ST_6 = "%st(6)"
       pp ppm ST_7 = "%st(7)"
       pp ppm EAX  = map toLower (show EAX)
       pp ppm EBX  = map toLower (show EBX)
       ...etc...

and all those map/toLower calls can now be floated to top level.
This gives a 4% decrease in allocation.  But it depends on inlining
a pretty big 'show' function.

With this new patch we get slightly better eta-expansion, which makes
a function look slightly bigger, which just stops it being inlined.
The previous behaviour was luck, so I'm not going to worry about
losing it.

I've added some notes to nofib/Simon-nofib-notes

comment:3 Changed 11 months ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed
  • Test Case set to perf/compiler/T9020
Note: See TracTickets for help on using tickets.