Long compile time/high memory usage for modules with many deriving clauses
I've seen many different people complain about this in various different contexts. One of the most common I personally see is when using Persistent with a large number of entities, which results in multi-minute build times for that single module. Usually, these reports have come from private repositories, which makes for a difficult repro. Also, the usage of Template Haskell in those modules tends to confuse the issue.
So I'd like to report this issue from a completely separate project, with the guess that this represents an issue that is affecting many other users. Steps to reproduce:
cabal unpack fpco-api-1.2.0.1
cd fpco-api-1.2.0.1
cabal install
ghc --make -isrc/library src/library/FP/API/Types.hs
touch src/library/FP/API/Types.hs
Then, to see the time and memory usage of compiling just the one module, run:
time ghc -O2 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
On my system (16GB RAM, quadcore i7), the results are:
$ time ghc -O2 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
[3 of 3] Compiling FP.API.Types ( src/library/FP/API/Types.hs, src/library/FP/API/Types.o )
51,846,533,568 bytes allocated in the heap
6,799,246,288 bytes copied during GC
389,297,584 bytes maximum residency (22 sample(s))
11,330,472 bytes maximum slop
1041 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2041 colls, 0 par 9.19s 9.19s 0.0045s 0.0998s
Gen 1 22 colls, 0 par 4.33s 4.33s 0.1969s 0.5436s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 34.21s ( 37.74s elapsed)
GC time 13.52s ( 13.53s elapsed)
EXIT time 0.07s ( 0.07s elapsed)
Total time 47.81s ( 51.34s elapsed)
Alloc rate 1,515,662,937 bytes per MUT second
Productivity 71.7% of total user, 66.8% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 0m51.375s
user 0m50.864s
sys 0m0.456s
By contrast, with optimizations turned off:
$ time ghc -O0 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
[3 of 3] Compiling FP.API.Types ( src/library/FP/API/Types.hs, src/library/FP/API/Types.o )
12,767,593,936 bytes allocated in the heap
1,078,202,664 bytes copied during GC
179,551,768 bytes maximum residency (13 sample(s))
6,684,544 bytes maximum slop
439 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 623 colls, 0 par 1.24s 1.24s 0.0020s 0.0651s
Gen 1 13 colls, 0 par 0.76s 0.76s 0.0586s 0.2397s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 8.28s ( 9.23s elapsed)
GC time 2.00s ( 2.00s elapsed)
EXIT time 0.05s ( 0.05s elapsed)
Total time 10.34s ( 11.28s elapsed)
Alloc rate 1,542,443,897 bytes per MUT second
Productivity 80.6% of total user, 73.9% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 0m11.304s
user 0m11.108s
sys 0m0.172s
and -O1:
$ time ghc -O1 --make -isrc/library src/library/FP/API/Types.hs +RTS -s
[3 of 3] Compiling FP.API.Types ( src/library/FP/API/Types.hs, src/library/FP/API/Types.o )
45,550,443,664 bytes allocated in the heap
5,721,700,512 bytes copied during GC
358,036,456 bytes maximum residency (21 sample(s))
9,167,176 bytes maximum slop
906 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1642 colls, 0 par 7.70s 7.70s 0.0047s 0.1031s
Gen 1 21 colls, 0 par 3.68s 3.69s 0.1756s 0.4968s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 30.21s ( 33.09s elapsed)
GC time 11.38s ( 11.39s elapsed)
EXIT time 0.07s ( 0.07s elapsed)
Total time 41.68s ( 44.55s elapsed)
Alloc rate 1,507,571,106 bytes per MUT second
Productivity 72.7% of total user, 68.0% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 0m44.577s
user 0m44.164s
sys 0m0.368s
Note that this module is essentially just 900 lines of data declarations, with a large number of derived classes, and a few manually written helper functions and instances.
This high memory usage hasn't just resulted in user complaints: automated Jenkins and Travis jobs will often times fail without optimizations disabled, which can be problematic for proper testing and production code. In the case of fpco-api, I've worked around this by adding -O0
to the cabal file, but it would be much nicer to not have to do that.
For some other examples of complaints along these lines:
- https://groups.google.com/d/msg/yesodweb/MX1bnOFu8Hc/VLQXhnvpIkMJ
- https://groups.google.com/d/msg/yesodweb/XPWixNjuOnM/FN26bmkudgwJ
I believe there are a few other threads discussing this, if it would be helpful. I did my testing on GHC 7.8.3 64-bit, Ubuntu 12.04.
Trac metadata
Trac field | Value |
---|---|
Version | 7.8.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |