Opened 10 months ago

Last modified 9 months ago

#9233 new bug

Compiler performance regression

Reported by: augustss Owned by:
Priority: low Milestone:
Component: Compiler Version: 7.8.2
Keywords: Cc:
Operating System: Windows Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions: Phab:D73

Description (last modified by augustss)

Compiling all of our Haskell modules (about 1300 modules, 310 kLOC) takes about 30% longer with ghc 7.8 than it did with ghc 7.6. (Using 32-bit Windows.)

Sorry, I can't really provide any simple test case.

Attachments (3)

slow.tar.gz (9.6 KB) - added by augustss 10 months ago.
Sample slow compilation
th-desugar.tar.gz (12.7 KB) - added by goldfire 10 months ago.
Richard's example of slow compilation
minimized.tar.gz (724 bytes) - added by rwbarton 10 months ago.
extracted from Lennart's example

Download all attachments as: .zip

Change History (27)

comment:1 Changed 10 months ago by simonpj

I would love it if someone could do some profiling on GHC to see what is going on.

Generally, we are weak on profiling the compiler. For example, I suspect there are easily-fixed space leaks.

But someone has to roll up their sleeves and look.

Simon

comment:2 Changed 10 months ago by augustss

  • Description modified (diff)

If I have some time I'll try to find a single module where the compilation has slowed down, and see if I can profile that. Is there a way to get hold of a profiling ghc binary? Compiling ghc on Windows always takes a few days to get working.

Last edited 10 months ago by augustss (previous) (diff)

comment:3 Changed 10 months ago by simonpj

I guess that Austin might be able to put up a Windows version of GHC 7.8 ( plus I suppose 7.6) that has profiling enabled. You would not want to run that regularly, but it would mean you could compile both modules with a compiler that dumps its profile.

Are you using -O2? The SpecConstr thing can make a pretty big difference. You could try switching it off globally and see what happens. Using -dshow-passes shows you the size of each intermediate can also be illuminating, because it shows size blow-ups clearly. Probably worth a try before the profiled-compiler stuff.

Simon

comment:4 Changed 10 months ago by augustss

We are using -O2. I'll try switching off SpecConstr and see what happens.

comment:5 Changed 10 months ago by simonpj

Also see #8852, which I still have to investigate. Do you use attoparsec?

Simon

comment:6 Changed 10 months ago by augustss

We have attoparsec installed, but none of the 1300 modules directly imports anything from it.

comment:7 Changed 10 months ago by augustss

I turned off SpecConstr and the compile times remained pretty much the same. Total compile time with ghc-7.6 is 5041s and with ghc-7.8 it is 6963s, so 38% longer. This is compiling 853 modules, out of which 5 modules compiled faster with the new ghc and the rest are slower, some taking almost 3 times as long.

I will see if I can isolate one of these module and include in this bug report.

Changed 10 months ago by augustss

Sample slow compilation

comment:8 Changed 10 months ago by augustss

I've attached an example. Run the command

ghc --make -O2 -fno-spec-constr Options.hs

On my machine it takes 17s with ghc-7.6 and 38s with ghc-7.8.

comment:9 Changed 10 months ago by goldfire

I, too, have noticed that GHC has gotten slower to compile between 7.6.3 and 7.8.2. And, I have a nice file that demonstrates the slowness.

First, 7.6.3:

rae:10:40:52 ~/temp/th-desugar> ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
rae:10:43:47 ~/temp/th-desugar> time ghc Language/Haskell/TH/Desugar/Core.hs
[1 of 2] Compiling Language.Haskell.TH.Desugar.Util ( Language/Haskell/TH/Desugar/Util.hs, Language/Haskell/TH/Desugar/Util.o )
[2 of 2] Compiling Language.Haskell.TH.Desugar.Core ( Language/Haskell/TH/Desugar/Core.hs, Language/Haskell/TH/Desugar/Core.o )

real	0m2.674s
user	0m2.512s
sys	0m0.131s
rae:10:43:53 ~/temp/th-desugar> rm -f `find . -name '*.o' -or -name '*.hi' -or -name '*.dyn*'`
rae:10:43:58 ~/temp/th-desugar> time ghc Language/Haskell/TH/Desugar/Core.hs -O2
[1 of 2] Compiling Language.Haskell.TH.Desugar.Util ( Language/Haskell/TH/Desugar/Util.hs, Language/Haskell/TH/Desugar/Util.o )
[2 of 2] Compiling Language.Haskell.TH.Desugar.Core ( Language/Haskell/TH/Desugar/Core.hs, Language/Haskell/TH/Desugar/Core.o )

real	0m9.089s
user	0m8.648s
sys	0m0.241s
rae:10:44:13 ~/temp/th-desugar> rm -f `find . -name '*.o' -or -name '*.hi' -or -name '*.dyn*'`
rae:10:44:17 ~/temp/th-desugar> time ghc Language/Haskell/TH/Desugar/Core.hs -O2 -fno-spec-constr
[1 of 2] Compiling Language.Haskell.TH.Desugar.Util ( Language/Haskell/TH/Desugar/Util.hs, Language/Haskell/TH/Desugar/Util.o )
[2 of 2] Compiling Language.Haskell.TH.Desugar.Core ( Language/Haskell/TH/Desugar/Core.hs, Language/Haskell/TH/Desugar/Core.o )

real	0m9.351s
user	0m8.180s
sys	0m0.242s

Now, 7.8.2:

rae:10:44:41 ~/temp/th-desugar> ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.2
rae:10:44:45 ~/temp/th-desugar> rm -f `find . -name '*.o' -or -name '*.hi' -or -name '*.dyn*'`
rae:10:44:49 ~/temp/th-desugar> time ghc Language/Haskell/TH/Desugar/Core.hs
[1 of 2] Compiling Language.Haskell.TH.Desugar.Util ( Language/Haskell/TH/Desugar/Util.hs, Language/Haskell/TH/Desugar/Util.o )
[2 of 2] Compiling Language.Haskell.TH.Desugar.Core ( Language/Haskell/TH/Desugar/Core.hs, Language/Haskell/TH/Desugar/Core.o )

real	1m5.906s
user	0m7.972s
sys	0m0.449s
rae:10:45:58 ~/temp/th-desugar> rm -f `find . -name '*.o' -or -name '*.hi' -or -name '*.dyn*'`
rae:10:46:05 ~/temp/th-desugar> time ghc Language/Haskell/TH/Desugar/Core.hs -O2
[1 of 2] Compiling Language.Haskell.TH.Desugar.Util ( Language/Haskell/TH/Desugar/Util.hs, Language/Haskell/TH/Desugar/Util.o )
[2 of 2] Compiling Language.Haskell.TH.Desugar.Core ( Language/Haskell/TH/Desugar/Core.hs, Language/Haskell/TH/Desugar/Core.o )

real	0m58.498s
user	0m18.374s
sys	0m0.566s
rae:10:47:06 ~/temp/th-desugar> rm -f `find . -name '*.o' -or -name '*.hi' -or -name '*.dyn*'`
rae:10:47:14 ~/temp/th-desugar> time ghc Language/Haskell/TH/Desugar/Core.hs -O2 -fno-spec-constr
[1 of 2] Compiling Language.Haskell.TH.Desugar.Util ( Language/Haskell/TH/Desugar/Util.hs, Language/Haskell/TH/Desugar/Util.o )
[2 of 2] Compiling Language.Haskell.TH.Desugar.Core ( Language/Haskell/TH/Desugar/Core.hs, Language/Haskell/TH/Desugar/Core.o )

real	1m32.710s
user	0m17.552s
sys	0m0.637s

The relevant files are attached. I don't have data for it, but I've noticed that compiling these files in GHCi is relatively snappy.

Changed 10 months ago by goldfire

Richard's example of slow compilation

comment:10 Changed 10 months ago by rwbarton

I profiled GHC and ran it on Lennart's sample code. It spent quite a lot of time, 23.4% in coercionKind.go. Further investigation seemed to indicate that most of that time was spent in the TyConAppCo case, when called from coercionRole. I made the Functor and Applicative (and other) instances of Pair lazy in the pair, trying to avoid traversing the entire coercion when we only need the outermost type constructor. It did speed up the compilation significantly, from 25 seconds to 18 seconds:

# current HEAD
<<ghc: 81626230952 bytes, 3494 GCs, 71837424/232400704 avg/max bytes residency (28 samples), 591M in use, 0.00 INIT (0.00 elapsed), 17.98 MUT (18.23 elapsed), 6.38 GC (6.38 elapsed) :ghc>>
# with lazy Pair instances
<<ghc: 42510020272 bytes, 2493 GCs, 71633789/232686656 avg/max bytes residency (28 samples), 591M in use, 0.00 INIT (0.00 elapsed), 12.00 MUT (12.26 elapsed), 6.06 GC (6.06 elapsed) :ghc>>

However, it had almost no effect on compile allocations for any program in nofib. Perhaps the bad behavior is triggered by large list or record literals.

There must be other regressions, since 7.6.3 takes only 9 seconds on my system. Also, I note that the object files produced by 7.8 or HEAD for Flags and Options are about 25% larger than those produced by 7.6.3.

comment:11 Changed 10 months ago by goldfire

Hrm -- I'm responsible for the offending code.

If coercionRole calling coercionKind is the hotspot that's taking the trouble, that only seems to happen from the NthCo case in coercionRole. It would be straightforward to add a Role field to NthCo to make coercionRole on an NthCo run in constant time. Would that solve this problem? I'm 95% sure that this change would be very straightforward. However, it seems that the change you made already mitigates the problem significantly.

Let me know if I can help further!

Changed 10 months ago by rwbarton

extracted from Lennart's example

comment:12 Changed 10 months ago by rwbarton

Lennart's program seems to be hitting a strange behavior in the simplifier. I managed to extract a relatively simple test case on which ghc 7.8 is much slower than 7.6. With -v3 I see output like this (may be for a slightly different version):

...
*** Demand analysis:
Result size of Demand analysis
  = {terms: 274, types: 282, coercions: 3}
*** Worker Wrapper binds:
Result size of Worker Wrapper binds
  = {terms: 642, types: 692, coercions: 3}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 519, types: 669, coercions: 556}     <---- !!!
Result size of Simplifier iteration=2
  = {terms: 285, types: 329, coercions: 3}
Result size of Simplifier = {terms: 285, types: 329, coercions: 3}
...

The program is sensitive to changes in various ways. Doubling the number of fields in X doubles the number of coercions on that line, and increases ghc's runtime by a factor of about 16. Reducing the number of fields in Options to ten or fewer eliminates this spike in coercions completely! And changing " " to "" eliminates the excess coercions, too.

My laziness change to Pair seems to only help in this pathological case, and it slightly hurts on any other program I have tried. So better to track down and avoid this bad behavior in the simplifier, and leave coercionKind and friends alone, I think.

Last edited 10 months ago by rwbarton (previous) (diff)

comment:13 Changed 10 months ago by rwbarton

The following patch reduces compile time for Lennart's program to 15s (compare 9s 7.6.3, 25s 7.8.1, 18s with my previous patch):

diff --git a/compiler/types/Coercion.lhs b/compiler/types/Coercion.lhs
index b33eae9..1417ad7 100644
--- a/compiler/types/Coercion.lhs
+++ b/compiler/types/Coercion.lhs
@@ -1033,6 +1033,7 @@ mkNthCo n (Refl r ty) = ASSERT( ok_tc_app ty n )
                         Refl r' (tyConAppArgN n ty)
   where tc = tyConAppTyCon ty
         r' = nthRole r tc n
+mkNthCo n (TyConAppCo _ _ cos) = cos !! n
 mkNthCo n co        = ASSERT( ok_tc_app _ty1 n && ok_tc_app _ty2 n )
                       NthCo n co
                     where

Richard, does this look okay to you (modulo error handling—is it possible that n >= length cos?)? I am emboldened by this equation from OptCoercion

opt_co' env sym mrole (NthCo n (TyConAppCo _ _ cos))
  = opt_co env sym mrole (getNth cos n)

but cautious due to the comment above it.

The simplifier was somehow building up a hugely nested coercion (Nth:1 (Nth:1 (Nth:1 (... (<String>_R -> <Bool>_R -> <Bool>_R -> ... -> <Bool>_R -> Sym (Identity.NTCo:Identity[0] <Options>_R))...)))).

After this patch, the slowdown from 7.6 to 7.8 on Lennart's program is in line with what I see on other modules. There is still a signification regression overall, but this program was particularly bad.

comment:14 Changed 9 months ago by goldfire

I'm looking into this now.

I don't really like the optimization in mkNthCo. It's almost certainly correct and harmless, but it's quite incompatible with my work on a branch of GHC that has changed coercions quite a bit. (The branch is integrating dependent types into GHC.) And, the problem shouldn't be there in the first place, so that optimization is just masking something deeper.

In looking at the minimized test case (thanks!) I'm flummoxed by the dependency on the magical number "10" for the Options type. Does that have to do with inliner thresholds? This is an area I'm not familiar with.

But, I'm almost positive that the 10-field cutoff is unrelated to the underlying problem of coercion blowup and quartic (!) slowdown. I'm not convinced that the coercion blowup, itself, is so terrible, but quartic behavior certainly is.

Will post if I have any more news.

comment:15 Changed 9 months ago by goldfire

  • Differential Revisions set to Phab:D73

Have a first attempt up at Phab:D73. More to come.

comment:16 Changed 9 months ago by Richard Eisenberg <eir@…>

In 34ec0bd942b732b127b1a955cd3508da0a588b6f/ghc:

Rewrite coercionRole. (#9233)

Summary:
coercionRole is now much more efficient, computing both the coercion's
kind and role together. The previous version calculated them separately,
leading to quite possibly exponential behavior.

This is still too slow, but it's a big improvement.

Test Plan: Evaluate by running the "minimized" test from the Trac ticket.

Reviewers: simonpj, austin

Subscribers: simonmar, relrod, carter

Differential Revision: https://phabricator.haskell.org/D73

comment:17 Changed 9 months ago by Richard Eisenberg <eir@…>

In 5e7406d9f5857e4ff30aed348f731d16dbd8e64c/ghc:

Optimise optCoercion. (#9233)

The old optCoercion (and helper functions) used coercionKind and
coercionRole internally. This was terrible when these had to be
called at *every* point in the coercion tree during the recursive
descent. This is rewritten to avoid such calls.

comment:18 Changed 9 months ago by goldfire

Success. The two commits above apply different optimizations to different coercion functions, vastly speeding up the test case. If I increase the size of X to include 200 fields, profiling tells me that coercion-oriented functions take about 6% of the ~3 second (MUT) running time. Before these optimizations, the coercion functions were taking up ~98% of the running time, so this is an improvement. :)

Before closing the ticket, would someone else like to test the work and confirm that the speedup is enough to close the issue? Lennart, how does this affect your compilation times? I wouldn't be surprised if these commits make a sizable difference: each commit removed potentially-exponential behavior.

In any case, I'm stopping work on this for now. Let me know if I need to resume somewhere.

comment:19 Changed 9 months ago by Joachim Breitner <mail@…>

In da7cfa99def372cde32af62801a7a7e0163efad8/ghc:

Richards optCoercion improvement made test cases fail the nice way

This was likely caused by 5e7406d9, which fixed #9233.

comment:20 follow-up: Changed 9 months ago by simonpj

Some questions for Richard:

  • In the first commit "Rewrite coercionRole", there is a small change in OptCoercion, whicg does entirely un-commented. What's it about? Could you comment it? A Note?
  • In the new OptCoercion (after both commits) I see that opt_co1 is called only with Nothing as its third arg. So can we get rid of that arg?
  • Ditto opt_co2. (In contrast opt_co3 does have non-Nothing calls. It's the only one that does.
  • Do those changes change something in Note [Optimising coercion optimisation]?

Great work BTW.

Simon

Last edited 9 months ago by simonpj (previous) (diff)

comment:21 in reply to: ↑ 20 Changed 9 months ago by goldfire

Replying to simonpj:

Some questions for Richard:

  • In the first commit "Rewrite coercionRole", there is a small change in OptCoercion, whicg does entirely un-commented. What's it about? Could you comment it? A Note?

Those changes are just to take advantage of the new efficiency available by calling coercionKindRole instead of coercionKind and coercionRole separately. I don't think further explanation is needed -- there's no change in behavior or algorithm really.

  • In the new OptCoercion (after both commits) I see that opt_co1 is called only with Nothing as its third arg. So can we get rid of that arg?

Yes. I was unsure when originally structuring the new opt_co variants if we'd need those parameters, and forgot to go back and double-check. I wil do this.

  • Ditto opt_co2. (In contrast opt_co3 does have non-Nothing calls. It's the only one that does.

Ditto here.

  • Do those changes change something in Note [Optimising coercion optimisation]?

No.

Great work BTW.

Simon

Thanks for checking things over!

comment:22 Changed 9 months ago by Richard Eisenberg <eir@…>

In 612d948b159c209020c12479a846af5b42e9601e/ghc:

Remove unused parameters in OptCoercion (#9233)

comment:23 Changed 9 months ago by simonpj

Re first bullet, yes I see now. Silly me.

comment:24 Changed 9 months ago by simonmar

Is there a test case we can add to the perf tests?

Note: See TracTickets for help on using tickets.