Opened 6 years ago
Last modified 2 years ago
#7428 new bug
GHC compile times are seriously non-linear in program size
| Reported by: | nudded | Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Compiler | Version: | 7.4.2 |
| Keywords: | | Cc: | gidyn, RyanGlScott |
| Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
| Type of failure: | Compile-time performance bug | Test Case: | |
| Blocked By: | | Blocking: | |
| Related Tickets: | | Differential Rev(s): | |
| Wiki Page: | | | |
Description
When compiling the attached code with -O2, GHC runs out of memory. Experimenting with monad stacks of different depths shows memory usage that grows exponentially with the stack depth.
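To make the setup concrete, here is a minimal sketch of the kind of program involved; it is a reconstruction, not the actual attachment, and the state types and stack depth are assumptions:

```haskell
module Main where

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont  (ContT, evalContT)
import Control.Monad.Trans.State (StateT, evalStateT, modify)

-- A ContT computation over a stack of StateT transformers; deepening
-- the stack makes compile time and memory under -O2 grow sharply.
type Stack = StateT Int (StateT Int (StateT Int (StateT Int IO)))

tick :: ContT () Stack ()
tick = lift (modify (+ 1))

main :: IO ()
main =
  evalStateT (evalStateT (evalStateT (evalStateT (evalContT tick) 0) 0) 0) 0
```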
Attachments (10)
Change History (35)
Changed 6 years ago by
Attachment: cont-state8.hs added
comment:1 Changed 6 years ago by
Type of failure: Compile-time performance bug → Compile-time crash
comment:2 Changed 6 years ago by
difficulty: → Unknown
comment:3 Changed 6 years ago by
I used several debug flags. The output of -ddump-occur-anal seems to grow in size as the LargeState transformer stack grows; in the attached code, LargeState consists of 10 StateT transformers.
With 7 transformers compilation still works; with 8 it tries to allocate more than 4 GB of memory.
These are some of the observations I made while trying to figure out what is going on.
Side note: when compiling without optimisation there is no problem at all.
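For reference, a sketch of the stack's shape (the Int state types are assumptions; the real definitions are in the attachment):

```haskell
import Control.Monad.Trans.State (StateT)

-- Ten StateT layers, as described above. Truncating this to 7 layers
-- reportedly compiles under -O2; 8 or more exhausts memory.
type LargeState =
  StateT Int (StateT Int (StateT Int (StateT Int (StateT Int
    (StateT Int (StateT Int (StateT Int (StateT Int (StateT Int
      IO)))))))))
```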
Changed 6 years ago by
Attachment: cont-state.hs added
Changed 6 years ago by
Attachment: cont-state2.hs added
Changed 6 years ago by
Attachment: ghc-stage2.prof added
comment:6 Changed 6 years ago by
Resolution: → invalid
Status: new → closed
After more analysis this doesn't seem to be related to the occurrence analyzer, so I'm closing this as invalid.
comment:7 Changed 6 years ago by
Resolution: invalid
Status: closed → new
Summary: Occurrence analyzer out of memory → GHC compile times are seriously non-linear in program size
I'm re-opening because I believe nudded is telling us that compilation times are seriously non-linear in program size. That should not happen. Moreover, you were kind enough to profile the compiler, and in the profile you gave it seems that 64% of all allocation is in MkIface.addFingerprints, which is not supposed to be an expensive function.
Could you help us some more?
- Give a table of compile times against program size (or number of state transformers).
- Explain exactly how to reproduce (what flags, and what changes to make to the code for N=1, N=2, N=4, etc.); see the sketch after this comment.
- Use -dshow-passes to see if the size of the simplified program is blowing up.
- If you feel up to it, drill down into addFingerprints to see where the time is going.
Thanks!
Simon
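For concreteness, here is a sketch of what varying N could look like; the alias names are illustrative and not taken from the attachments:

```haskell
import Control.Monad.Trans.State (StateT)

-- Each alias adds one more StateT layer (one more step of N).
type Stack1 = StateT Int IO      -- N = 1
type Stack2 = StateT Int Stack1  -- N = 2
type Stack3 = StateT Int Stack2  -- N = 3
type Stack4 = StateT Int Stack3  -- N = 4

-- Point the test case's stack at StackN, recompile each variant with
--   ghc -O2 -fforce-recomp -dshow-passes cont-state8.hs
-- and record the "Result size" lines against N.
```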
Changed 6 years ago by
Attachment: output.log added
Output of ghc-stage2 -O2 cont-state8.hs -fforce-recomp -Wall -v +RTS -M1G -RTS
Changed 6 years ago by
Attachment: cont-state8-cleaned-up.hs added
comment:8 Changed 6 years ago by
I've attached a cleaned-up version of cont-state8.hs, and an output log of ghc-stage2 -O2 cont-state8.hs -fforce-recomp -Wall -v +RTS -M1G -RTS, to help move this ticket forward.
comment:9 Changed 6 years ago by
Milestone: → 7.8.1
comment:10 Changed 5 years ago by
Priority: normal → high
Simon - this looks like a real issue. The types blow up during successive simplification stages; look at this:
```
*** Float inwards:
Result size of Float inwards = {terms: 985, types: 6,189, coercions: 267}
*** Simplifier:
Result size of Simplifier iteration=1 = {terms: 4,502, types: 100,514, coercions: 35,229}
Result size of Simplifier iteration=2 = {terms: 3,022, types: 335,649, coercions: 137,994}
Result size of Simplifier iteration=3 = {terms: 2,659, types: 766,643, coercions: 324,940}
Result size of Simplifier iteration=4 = {terms: 2,461, types: 2,556,807, coercions: 1,206,055}
Result size of Simplifier = {terms: 2,461, types: 2,556,807, coercions: 1,206,055}
*** Simplifier:
Result size of Simplifier iteration=1 = {terms: 2,396, types: 9,160,050, coercions: 4,470,133}
Result size of Simplifier iteration=2 = {terms: 2,341, types: 30,948,777, coercions: 15,796,211}
ghc-stage2: out of memory (requested 1048576 bytes)
```
comment:11 Changed 5 years ago by
I just tested this issue with a copy of HEAD I built today; ghc -O2 still blows up on the cleaned-up file.
comment:12 Changed 5 years ago by
Milestone: 7.8.3 → 7.10.1
Bumping priority down (these tickets haven't been closely followed or fixed in 7.4), and moving out to 7.10 and out of 7.8.3.
comment:14 Changed 4 years ago by
Architecture: x86_64 (amd64) → Unknown/Multiple
Operating System: MacOS X → Unknown/Multiple
The file cont-state8-cleaned-up.hs from comment:8 is not AMP compatible.
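For anyone updating the file, a sketch of the boilerplate the AMP (Applicative-Monad Proposal, mandatory since GHC 7.10) requires, shown for a hypothetical state-like newtype rather than the attachment's actual types:

```haskell
import Control.Monad (ap, liftM)

newtype M a = M { runM :: Int -> (a, Int) }

-- Since the AMP, every Monad needs Functor and Applicative instances.
instance Functor M where
  fmap = liftM

instance Applicative M where
  pure x = M (\s -> (x, s))
  (<*>)  = ap

instance Monad M where
  return = pure
  M m >>= k = M (\s -> let (a, s') = m s in runM (k a) s')
```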
comment:15 Changed 4 years ago by
Milestone: 7.10.1 → 7.12.1
Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.
comment:16 Changed 4 years ago by
I've updated the test case. Even on my monster build machine, it looks like compilation of the original test is literally never going to finish, so it needs to be toned down before it can be examined.
Changed 4 years ago by
Attachment: Cont8-smaller.hs added
Changed 4 years ago by
Attachment: cont8-hc.png added
comment:17 Changed 4 years ago by
Here's a smaller version (that actually completes in a reasonable amount of time), along with a profile. Something around CorePrep/TidyCore is definitely quadratic or worse; when run with -v3, I estimate CorePrep alone takes about 20 seconds.
comment:19 Changed 4 years ago by
Cc: gidyn added
comment:20 Changed 4 years ago by
I may be missing something obvious here, but I'm not convinced that the problem is in CorePrep or TidyCore. If one looks at the .hi file or -ddump-simpl output from even a relatively shallow four-layer transformer stack, one finds quite large types that appear to be growing quadratically in the depth. This is supported by the fact that Type and IfaceType show up strongly in the compiler's heap profiles, as well as the fact that, in my runs, core2core shows up about as strongly in the profiles as addFingerprints.
Looking at the types in the Core, it seems that the inliner may be over-zealous in its work. Due to the CPS style of the example, the types snowball extremely quickly. Most of the damage comes in the first two simplifier iterations after FloatOut. The largest of these types are coercions and their signatures.
Compare, for instance, the Core produced by the:
- StateT Int IO case: https://gist.github.com/bgamari/026e3670cd6a50a07790
- StateT Int (StateT Int IO) case: https://gist.github.com/bgamari/9e31d7a5f40d99751b0d
- three-StateT case: https://gist.github.com/bgamari/baf33b1f31aa5960c9c7
- four-StateT case: https://gist.github.com/bgamari/856d4954b488c03bf5e9
I'm not sure I see what is failing here. The question, then, is what measure is supposed to prevent this snowballing, and why is it not working? I'll have another look tomorrow.
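To make the snowballing concrete, here is a hand-computed unfolding (a sketch, not taken from the dumps) of the two-layer stack's type once the ContT and StateT newtypes are expanded; each extra StateT layer threads one more Int through every arrow, and the coercions relating the newtype to its expansion grow with it:

```haskell
import Control.Monad.Trans.Cont  (ContT)
import Control.Monad.Trans.State (StateT)

-- The two-layer stack from the gists above:
type Stack2 a = ContT () (StateT Int (StateT Int IO)) a

-- The same type with the ContT and StateT newtypes unfolded by hand:
type Stack2Unfolded a =
  (a -> Int -> Int -> IO (((), Int), Int))
    -> Int -> Int -> IO (((), Int), Int)
```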
comment:21 Changed 4 years ago by
See Scrap your type applications, Section 2.3, for why types can (legitimately, I fear) blow up. I'm not saying that is what is happening here, but it might be similar.
Simon
comment:23 Changed 3 years ago by
Type of failure: Compile-time crash → Compile-time performance bug
comment:24 Changed 3 years ago by
Milestone: 8.0.1
comment:25 Changed 2 years ago by
Cc: RyanGlScott added
Thank you. We know there's a performance bug in the occurrence analyser (see #7258), but it's very helpful to have a small case that tickles it. Two questions
Thanks
Simon