Opened 4 years ago

Last modified 6 months ago

#7428 new bug

GHC compile times are seriously non-linear in program size

Reported by: nudded Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.4.2
Keywords: Cc: gidyn, RyanGlScott
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When compiling the attached code with -O2 GHC runs out of memory. Experimenting with different monad stacks shows exponential memory usage.

Attachments (10)

cont-state8.hs (2.4 KB) - added by nudded 4 years ago.
cont-state.hs (2.1 KB) - added by nudded 4 years ago.
cont-state2.hs (2.1 KB) - added by nudded 4 years ago.
ghc-stage2.prof (60.7 KB) - added by nudded 4 years ago.
output.log (3.2 KB) - added by parcs 4 years ago.
output of ghc-stage2 -O2 cont-state8.hs -fforce-recomp -Wall -v +RTS -M1G -RTS
cont-state8-cleaned-up.hs (1.9 KB) - added by parcs 4 years ago.
Cont8.hs (2.4 KB) - added by thoughtpolice 22 months ago.
AMP compatible test case
Cont8.2.hs (2.4 KB) - added by thoughtpolice 22 months ago.
AMP compatible test case
Cont8-smaller.hs (2.4 KB) - added by thoughtpolice 22 months ago.
cont8-hc.png (94.2 KB) - added by thoughtpolice 22 months ago.

Download all attachments as: .zip

Change History (35)

Changed 4 years ago by nudded

Attachment: cont-state8.hs added

comment:1 Changed 4 years ago by nudded

Type of failure: Compile-time performance bugCompile-time crash

comment:2 Changed 4 years ago by simonpj

difficulty: Unknown

Thank you. We know there's a performance bug in the occurrence analyser (see #7258), but it's very helpful to have a small case that tickles it. Two questions

  • How do you know it's the occurrence analyser at fault?
  • Can you be specific about the changes you made to the test program that showed you the memory behaviour was exponential (ugh).

Thanks

Simon

comment:3 Changed 4 years ago by nudded

I used several debug flags. The output of dump-occur-anal seems to increase in size when adjusting the size of the LargeState transformer stack. So, in the attached code LargeState consists of 10 StateT transformers.

When compiling with 7 it will still work. When compiling with 8 it will try to allocate > 4 gigs of memory.

Those are some of the observations I ran into when trying to figure out what is going on.

Sidenote: when compiling without optimisation there is no problem at all.

Changed 4 years ago by nudded

Attachment: cont-state.hs added

Changed 4 years ago by nudded

Attachment: cont-state2.hs added

comment:4 Changed 4 years ago by nudded

I added 2 files with stack size 1 and stack size 2.

Changed 4 years ago by nudded

Attachment: ghc-stage2.prof added

comment:5 Changed 4 years ago by nudded

Profiling output when compiling with 7 StateT transformers.

comment:6 Changed 4 years ago by nudded

Resolution: invalid
Status: newclosed

Since after more analysis this doesn't seem to be related to the occurrence analyzer I'm closing this as invalid.

comment:7 Changed 4 years ago by simonpj

Resolution: invalid
Status: closednew
Summary: Occurrence analyzer out of memoryGHC compile times are seriously non-linear in program size

I'm re-opening becuase I believe nudded is telling us that compilation times are seriously non-linear in program size. That should not happen. Moreover you were kind enough to profile the compiler and in the profile you gave it seems that 64% of all allocation is in MkIface.addFingerprints which is not supposed to be an expensive function.

Could you help us some more?

  • Give a table of compile times against program size (or number of state transformers)
  • Explain exactly how to reproduce (what flags, what changes to make to the code to make N=1, N=2, N=4, etc
  • Use -dshow-passes to see if the size of the simplified program is blowing up
  • If you feel up to it, drill down into addFingerprints to see where the time is going.

Thanks!

Simon

Changed 4 years ago by parcs

Attachment: output.log added

output of ghc-stage2 -O2 cont-state8.hs -fforce-recomp -Wall -v +RTS -M1G -RTS

Changed 4 years ago by parcs

Attachment: cont-state8-cleaned-up.hs added

comment:8 Changed 4 years ago by parcs

I've attached a cleaned-up version of cont-state8.hs and an output log of ghc-stage2 -O2 cont-state8.hs -fforce-recomp -Wall -v +RTS -M1G -RTS to help move this ticket forward.

comment:9 Changed 4 years ago by igloo

Milestone: 7.8.1

comment:10 Changed 3 years ago by simonmar

Priority: normalhigh

Simon - this looks like a real issue. The types blow up during successive simplification stages, look at this:

*** Float inwards:
Result size of Float inwards
  = {terms: 985, types: 6,189, coercions: 267}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 4,502, types: 100,514, coercions: 35,229}
Result size of Simplifier iteration=2
  = {terms: 3,022, types: 335,649, coercions: 137,994}
Result size of Simplifier iteration=3
  = {terms: 2,659, types: 766,643, coercions: 324,940}
Result size of Simplifier iteration=4
  = {terms: 2,461, types: 2,556,807, coercions: 1,206,055}
Result size of Simplifier
  = {terms: 2,461, types: 2,556,807, coercions: 1,206,055}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 2,396, types: 9,160,050, coercions: 4,470,133}
Result size of Simplifier iteration=2
  = {terms: 2,341, types: 30,948,777, coercions: 15,796,211}
ghc-stage2: out of memory (requested 1048576 bytes)

comment:11 Changed 3 years ago by carter

I just tested this issue with a copy of HEAD I built today, ghc -O2 still blows up on the cleaned up file.

comment:12 Changed 3 years ago by thoughtpolice

Milestone: 7.8.37.10.1

Bumping priority down (these tickets haven't been closely followed or fixed in 7.4), and moving out to 7.10 and out of 7.8.3.

comment:13 Changed 3 years ago by thoughtpolice

Priority: highnormal

Actually dropping priority. :)

comment:14 Changed 2 years ago by thomie

Architecture: x86_64 (amd64)Unknown/Multiple
Operating System: MacOS XUnknown/Multiple

The file cont-state8-cleaned-up.hs from comment:8 is not AMP compatible.

Last edited 2 years ago by thomie (previous) (diff)

comment:15 Changed 2 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

Changed 22 months ago by thoughtpolice

Attachment: Cont8.hs added

AMP compatible test case

Changed 22 months ago by thoughtpolice

Attachment: Cont8.2.hs added

AMP compatible test case

comment:16 Changed 22 months ago by thoughtpolice

I've updated the test case. By default, even on my monster build machine, it looks like the compilation of this test is literally never going to finish, so it needs to be toned down to look at it.

Changed 22 months ago by thoughtpolice

Attachment: Cont8-smaller.hs added

Changed 22 months ago by thoughtpolice

Attachment: cont8-hc.png added

comment:17 Changed 22 months ago by thoughtpolice

Here's a smaller version (that actually completes in a reasonable amount of time), along with a profile. Something is definitely quadratic or something around CorePrep/TidyCore; when run with -v3, I estimate CorePrep alone takes about 20 seconds.

comment:18 Changed 22 months ago by thoughtpolice

(For a build that's about 1 minute, FWIW)

comment:19 Changed 22 months ago by gidyn

Cc: gidyn added

comment:20 Changed 22 months ago by bgamari

I may be missing something obvious here but I'm not convinced that the problem is in CorePrep or TidyCore. If one looks at the hi file or -ddump-simpl output from even a relatively shallow four-layer transformer stack one finds quite large types that appear to be growing quadratically in the depth. This is supported by the fact that Type and IfaceType show up strongly in the compiler's heap profiles as well as the fact that, in my runs, core2core shows up about as strongly in the profiles as addFingerprints.

Looking at the types in the Core, it seems that the inliner may be over-zealous in its work. Due to the CPS style of the example, the types snowball extremely quickly. Most of the damage comes in the first two simplifier iterations after FloatOut. The larger of these types are coercions and their signatures.

Compare, for instance, the Core produced by the,

I'm not sure I see what is failing here. The question then is what measure is supposed to prevent this snow-balling and why is it not working? I'll have another look tomorrow.

Last edited 22 months ago by bgamari (previous) (diff)

comment:21 Changed 22 months ago by simonpj

See Scrap your type applications Section 2.3, for why types can (legitimately I fear) blow up. I'm not say that is what is happening here, but it might be similar.

Simon

comment:22 Changed 19 months ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:23 Changed 19 months ago by thomie

Type of failure: Compile-time crashCompile-time performance bug

comment:24 Changed 14 months ago by thomie

Milestone: 8.0.1

comment:25 Changed 6 months ago by RyanGlScott

Cc: RyanGlScott added
Note: See TracTickets for help on using tickets.