Opened 8 years ago

Closed 7 years ago

#3016 closed bug (fixed)

Long compile times, large memory use with static data in 6.10

Reported by: dons Owned by: igloo
Priority: normal Milestone: 6.12.3
Component: Compiler Version: 6.10.1
Keywords: static data Cc: awick
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: simplCore/should_compile/T3016
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


A collection of modules like this, when compiled together, are taking 2 1/2 hrs to compile with ghc 6.10, up from 30 mins with 6.8.x

This is slowing down a work project somewhat.

Attached is a sample module, that should illustrate the problem, when compiled ghc uses 2G and takes a few minutes.

Attachments (2)

Large2.hs (170.7 KB) - added by dons 8 years ago.
large numeric constants
h.png (76.4 KB) - added by igloo 8 years ago.

Download all attachments as: .zip

Change History (19)

Changed 8 years ago by dons

Attachment: Large2.hs added

large numeric constants

comment:1 Changed 8 years ago by awick

Cc: awick added

comment:2 Changed 8 years ago by simonpj

difficulty: Unknown

I've investigated a bit. The root cause is that GHC is much too keen to inline. The big integers are broken into lots of small ones (since we don't have big-integer literals), thus

blah = plusInteger (mkInteger 3434433) 
                   (timesInteger (mkInteger 9343) (mkInteger 33344))

Although plusInteger has quite a big unfolding, it also has large "argument discounts" because if you know what constructor the argument is built with, the unfolding would get a lot smaller. The trouble is that in GHC 6.10 the discount applies when GHC knows anything at all about the argument.

In particular, here we know that the argument is a call to timesInteger. So each literal gives rise to dozens of plusInteger and timesInteger calls, and they are all (fruitlessly) inlined.

That's the diagnosis. I'm working on a fix, by making inlining a bit less eager.


comment:3 Changed 8 years ago by simonpj

Owner: set to igloo
Test Case: simplCore/should_compile/T3016

OK I fixed this bug as a part of this patch:

Fri Apr  3 01:46:34 PDT 2009
  * Adjust inlining heursitics
  Ignore-this: a2d6849726f6705e06941c74137bfe7e

  This patch is the result of a long series of nofib-based experiments
  to improve GHC's inlining heuristics.

Now compilation of the file you attached is much, much faster:

$ time $vpj2 Large2.hs -c -O -fforce-recomp

real	0m29.653s
user	0m27.935s
sys	0m0.631s

Each of the 1000-odd big constants in Large2.hs expands to the product of a dozen or so small constants. A file of 12,000 constants combined with calls to plusInteger etc still shouldn't take as long as 30s, but still an improvement of two orders of magnitude should help you.

Ian: I'll add the example as a test case (simplCore/should_compile/T3116.hs), with the idea that it'll time out if Bad Things happen again in the future. Can you just check that it doesn't time out on some architectures already; if so, make the test a bit smaller? I'll reassign the bug to you for that sole purpose. After that you can close it.

Oh, the test runs slowly, so you might want not to have it in the validation run.


Changed 8 years ago by igloo

Attachment: h.png added

comment:4 Changed 8 years ago by igloo

There might be more to look at here. I've attached h.png, a heap profile with only the first 56 lines of the largeConsts list. Nothing jumps out from the profiling data:

unSM                           SimplMonad             7.3    0.0
thenSmpl                       SimplMonad             5.7    1.0
thenFC                         CgMonad                4.7    3.2
completeCall                   Simplify               2.7    6.8
>>=_aUF                        RegAlloc.Linear.State   2.2    0.5
getCommonNodeUFMData           UniqFM                 2.1    2.5
insert_ele                     UniqFM                 1.4    2.6
thenNat                        NCGMonad               1.2    0.1
simplVar                       Simplify               1.2    0.0
==_a3sk                        DynFlags               1.2    0.0

comment:5 Changed 8 years ago by simonpj

OK so the focus moves from time to space. Ian think residency may be growing non-linearly. ToDo:

  • Confirm no-linearity
  • Try to simplify code while maintaining non-linear behaviour.


comment:6 Changed 8 years ago by igloo

It actually looks not far off linear:

# constants      peak heap usage      bytes allocated
160               63874448             3193731536
320               86003152             6609305256
640              186134592            14211355072
1280             <stack overflow>

and the heap usage is lower here than the heap profile showed.

So I think that all we need to do here is make sure the testsuite test is OK.

comment:7 Changed 8 years ago by igloo

Milestone: 6.12.1

comment:8 Changed 8 years ago by awick

These patches seem to help our code a little, but we're still seeing >1hr compile times that hit swap. I'll try to generate and upload some test vectors in the next couple days.

comment:9 Changed 8 years ago by igloo

What flags are you using?

When compiling via C, gcc seems to take a lot of time and RAM.

comment:10 Changed 8 years ago by awick

We're compiling via Cabal, with this set:

    ghc-options: -O2 -Wall -fno-warn-orphans -funbox-strict-fields

I accidentally destroyed my HEAD installation, but based out of 6.10.1, cabal build -v2 reports these as the flags:

--make -hide-all-packages -i -idist/build -i. -idist/build/autogen -Idist/build/autogen -Idist/build -optP-include -optPdist/build/autogen/cabal_macros.h -odir dist/build -hidir dist/build -stubdir dist/build -package QuickCheck- -package SHA-1.4.0 -package array- -package base- -package binary- -package bytestring- -package monadLib-3.5.2 -package parallel- -package pureMD5-0.2.4 -package random- -package readline- -package syb- -package time- -O -O2 -Wall -fno-warn-orphans -funbox-strict-fields -XCPP -XFlexibleContexts -XFlexibleInstances -XDeriveDataTypeable -XMultiParamTypeClasses

So I don't think it's going through gcc.

comment:11 Changed 8 years ago by awick

Alright, I have a tarball that exhibits the nasty behavior we're seeing, but unfortunately Trac's file size limit is too small for me to attach it. You can grab it here:

On our machines with 6.10.1, this goes over 2GB in 3-4 minutes, which hits swap. I usually kill it at that point.

comment:12 in reply to:  11 Changed 8 years ago by igloo

Replying to awick:

On our machines with 6.10.1, this goes over 2GB in 3-4 minutes, which hits swap. I usually kill it at that point.

What's the output up to that point? i.e. what has it compiled, and what is it currently compiling?

Here, my HEAD ghc (as compiled by validate) finished with I think around 1G RAM used:

166.70s user 2.10s system 101% cpu 2:46.41 total

I'll do more testing, e.g. with 6.10, later.

comment:13 Changed 8 years ago by awick

Well, it's here when it hits swap (again, that's at a bit over 2GB, and takes around 3 minutes):

ghc -O2 -funbox-strict-fields --make Foo.hs
[ 1 of 25] Compiling Data.Structs     ( Data/Structs.hs, Data/Structs.o )
[ 2 of 25] Compiling Data.Data31      ( Data/Data31.hs, Data/Data31.o )

This is a 32-bit machine, as an aside. It completes after ~30 minutes, which is actually better than I remembered, although still pretty ridiculous when it's just compiling constants. The memory use tends to peak around 2.5GB, but spends most of its time at 2.0GB. I'll try to replicate your HEAD build shortly.

comment:14 Changed 8 years ago by awick


It appears that some/most of this has to do with -fsplit-objs being set. I didn't notice this before because it's hidden in a Cabal file on our machines.

comment:15 Changed 7 years ago by igloo

Milestone: branch
Type of failure: None/Unknown

comment:16 Changed 7 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:17 Changed 7 years ago by igloo

Resolution: fixed
Status: newclosed

In HEAD this finishes in 1G of RAM in 2.5mins.

6.12.2 ran out of memory on the 24th file, but rerunning the command succeeded, in a total of about 3mins.

Memory use is still high, but memory use of --make is known to be high.

I therefore think this is fixed.

Note: See TracTickets for help on using tickets.