compile-time performance regression compiling genprimcode
compiling utils/genprimopcode
(where compiling the Alex-generated Lexer takes up the most amount of time) with GHC 7.8.4:
$ time /opt/ghc/7.8.4/bin/ghc -Rghc-timing --make -O Main.hs
[1 of 5] Compiling ParserM ( ParserM.hs, ParserM.o )
[2 of 5] Compiling Lexer ( Lexer.hs, Lexer.o )
[3 of 5] Compiling Syntax ( Syntax.hs, Syntax.o )
[4 of 5] Compiling Parser ( Parser.hs, Parser.o )
[5 of 5] Compiling Main ( Main.hs, Main.o )
Linking Main ...
<<ghc: 20742311240 bytes, 2092 GCs, 60174814/140258264 avg/max bytes residency (33 samples), 332M in use, 0.00 INIT (0.00 elapsed), 7.84 MUT (8.35 elapsed), 6.93 GC (6.93 elapsed) :ghc>>
real 0m15.331s
user 0m15.173s
sys 0m0.172s
(btw, is it ok that GC takes up so much?)
the same with a GHC 7.10 snapshot takes *alot* longer:
$ time /opt/ghc/7.10.1/bin/ghc -Rghc-timing --make -O Main.hs
[1 of 5] Compiling ParserM ( ParserM.hs, ParserM.o )
[2 of 5] Compiling Lexer ( Lexer.hs, Lexer.o )
[3 of 5] Compiling Syntax ( Syntax.hs, Syntax.o )
[4 of 5] Compiling Parser ( Parser.hs, Parser.o )
[5 of 5] Compiling Main ( Main.hs, Main.o )
Linking Main ...
<<ghc: 147681534568 bytes, 37824 GCs, 203338614/375314176 avg/max bytes residency (538 samples), 876M in use, 0.001 INIT (0.001 elapsed), 81.374 MUT (81.888 elapsed), 235.240 GC (235.136 elapsed) :ghc>>
real 5m17.061s
user 5m16.702s
sys 0m0.499s
- ..and here the GC/MUT ratio is even more off.
- Show closed items
Relates to
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Herbert Valerio Riedel changed milestone to %7.10.1
changed milestone to %7.10.1
- Herbert Valerio Riedel changed weight to 5
changed weight to 5
- Herbert Valerio Riedel added Tbug Trac import labels
added Tbug Trac import labels
- Herbert Valerio Riedel changed the description
changed the description
- Author Maintainer
- Developer
Can anyone build a profiled GHC and see where the time is going?
- Developer
I can't reproduce this with HEAD. I used this command line, from the root directory, which I took from a build log
ghc -hisuf hi -osuf o -hcsuf hc -static -H32m -O -Werror -Wall -H64m -i -iutils/genprimopcode/. -iutils/genprimopcode/dist/build -iutils/genprimopcode/dist/build/autogen -Iutils/genprimopcode/dist/build -Iutils/genprimopcode/dist/build/autogen -package array -no-user-package-db -rtsopts -odir utils/genprimopcode/dist/build -hidir utils/genprimopcode/dist/build -stubdir utils/genprimopcode/dist/build --make utils/genprimopcode/Main.hs -fforce-recomp +RTS -s
With HEAD I get
4,189,727,416 bytes allocated in the heap 525,871,120 bytes copied during GC 43,253,816 bytes maximum residency (12 sample(s)) 1,758,368 bytes maximum slop 116 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 600 colls, 0 par 1.824s 1.823s 0.0030s 0.0271s Gen 1 12 colls, 0 par 0.811s 0.811s 0.0676s 0.1351s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.002s ( 0.001s elapsed) MUT time 6.997s ( 7.946s elapsed) GC time 2.635s ( 2.635s elapsed) EXIT time 0.038s ( 0.037s elapsed) Total time 9.711s ( 10.619s elapsed)
With ghc 7.8 I get
3,874,936,096 bytes allocated in the heap 416,943,272 bytes copied during GC 44,518,152 bytes maximum residency (10 sample(s)) 1,549,168 bytes maximum slop 115 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 396 colls, 0 par 1.21s 1.25s 0.0032s 0.0207s Gen 1 10 colls, 0 par 0.57s 0.65s 0.0654s 0.1421s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.00s ( 0.00s elapsed) MUT time 5.63s ( 6.59s elapsed) GC time 1.78s ( 1.91s elapsed) EXIT time 0.03s ( 0.04s elapsed) Total time 7.46s ( 8.54s elapsed)
They may not be quite comparable (e.g. my HEAD libraries are built with -O, but I think my installed GHC probably has libraries built with -O2). But HEAD is not way out of line.
I suppose I can now try building the 7.10 branch.
Meanwhile, would someone try with
-dshow-passes
(which displays intermediate sizes)?Simon
- Developer
I tried with the 7.10 branch, and got
4,187,629,104 bytes allocated in the heap 476,423,296 bytes copied during GC 42,531,768 bytes maximum residency (11 sample(s)) 1,561,840 bytes maximum slop 96 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 619 colls, 0 par 1.937s 1.936s 0.0031s 0.0290s Gen 1 11 colls, 0 par 0.710s 0.710s 0.0645s 0.1338s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.002s ( 0.001s elapsed) MUT time 6.782s ( 7.641s elapsed) GC time 2.647s ( 2.646s elapsed) EXIT time 0.035s ( 0.034s elapsed) Total time 9.497s ( 10.323s elapsed)
So I can't reproduce the slow compile time. Alas.
- Author Maintainer
That's weird... I just tried with today's GHC HEAD (at 696f2cfd), and a
BuildFlavour="quick"
build. The exact steps I took now were:cd utils/genprimopcode/ rm *.o *.hi alex Lexer.x ../inplace/bin/ghc-stage2 -Rghc-timing -O -c ParserM.hs # <<ghc: 906859816 bytes, 319 GCs, 4611760/14141808 avg/max bytes residency (8 samples), 31M in use, 0.001 INIT (0.001 elapsed), 0.822 MUT (0.833 elapsed), 0.225 GC (0.225 elapsed) :ghc>> ../inplace/bin/ghc-stage2 -Rghc-timing -O -c Lexer.hs # ...runs for over 15 minutes...
So I can actually reproduce this with GHC HEAD
- Author Maintainer
Here's the output of
time /opt/ghc/7.10.1/bin/ghc -O -dshow-passes -c Lexer.hs |& ts -s "%.s"
Each output line is prepended by a timestamp expressed in seconds:
0.000012 Glasgow Haskell Compiler, Version 7.10.0.20141227, stage 2 booted by GHC version 7.8.4 0.000080 Using binary package database: /opt/ghc/7.10.1/lib/ghc-7.10.0.20141227/package.conf.d/package.cache 0.001052 Using binary package database: /home/hvr/.ghc/x86_64-linux-7.10.0.20141227/package.conf.d/package.cache 0.020922 wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-000208e3187fe823e04e0c88352da107 0.020997 wired-in package integer-gmp mapped to integer-gmp-1.0.0.0-bb665344376cc98555dad190d849a337 0.021545 wired-in package base mapped to base-4.8.0.0-7f94b08ebb75c6cfab2660fb04137780 0.021578 wired-in package rts mapped to builtin_rts 0.021650 wired-in package template-haskell mapped to template-haskell-2.10.0.0-ab6ca4770f90f6f44262138da6877395 0.021676 wired-in package ghc mapped to ghc-7.10.0.20141227-217cbe8ba9c57a128065806ea73c39b4 0.021689 wired-in package dph-seq not found. 0.021758 wired-in package dph-par not found. 0.022632 Created temporary directory: /tmp/ghc16466_0 0.022703 *** C pre-processor: 0.028039 *** Checking old interface for Lexer: 0.028103 *** Parser: 0.214401 *** Renamer/typechecker: 1.362793 *** Desugar: 1.841327 Result size of Desugar (after optimization) 1.841402 = {terms: 97,708, types: 6,414, coercions: 7} 1.841418 *** Simplifier: 2.113916 Result size of Simplifier iteration=1 2.113977 = {terms: 92,764, types: 1,443, coercions: 16} 2.369659 Result size of Simplifier 2.369721 = {terms: 92,764, types: 1,443, coercions: 16} 2.369732 *** Specialise: 2.432081 Result size of Specialise 2.432143 = {terms: 92,928, types: 1,649, coercions: 34} 2.432154 *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}): 2.432163 Result size of Float out(FOS {Lam = Just 0, 2.432174 Consts = True, 2.761993 OverSatApps = False}) 2.762059 = {terms: 161,836, types: 37,472, coercions: 34} 2.762070 *** Simplifier: 7.394822 Result size of Simplifier iteration=1 7.394887 = {terms: 233,709, types: 119,483, coercions: 965} 8.202722 Result size of Simplifier iteration=2 8.202785 = {terms: 233,373, types: 119,022, coercions: 774} 8.873386 Result size of Simplifier iteration=3 8.873447 = {terms: 233,369, types: 119,019, coercions: 771} 9.562065 Result size of Simplifier 9.562127 = {terms: 233,369, types: 119,019, coercions: 771} 9.562142 *** Simplifier: 10.462517 Result size of Simplifier iteration=1 10.462582 = {terms: 233,348, types: 118,990, coercions: 771} 11.290064 Result size of Simplifier 11.290130 = {terms: 233,340, types: 118,978, coercions: 771} 11.290146 *** Simplifier: 12.081775 Result size of Simplifier iteration=1 12.081839 = {terms: 233,381, types: 119,015, coercions: 771} 13.042842 Result size of Simplifier iteration=2 13.042906 = {terms: 233,356, types: 118,973, coercions: 771} 13.916546 Result size of Simplifier 13.916613 = {terms: 233,356, types: 118,973, coercions: 771} 13.916629 *** Float inwards: 14.237678 Result size of Float inwards 14.237740 = {terms: 233,356, types: 118,973, coercions: 771} 14.237755 *** Called arity analysis: 14.602005 Result size of Called arity analysis 14.602069 = {terms: 233,356, types: 118,973, coercions: 771} 14.602085 *** Simplifier: 15.475958 Result size of Simplifier iteration=1 15.476022 = {terms: 233,350, types: 118,967, coercions: 771} 16.353738 Result size of Simplifier 16.353807 = {terms: 233,350, types: 118,967, coercions: 771} 16.353824 *** Demand analysis: 17.733968 Result size of Demand analysis 17.734033 = {terms: 233,350, types: 118,967, coercions: 771} 17.734050 *** Worker Wrapper binds: 17.862780 Result size of Worker Wrapper binds 17.862858 = {terms: 233,515, types: 119,216, coercions: 771} 17.862873 *** Simplifier: 18.758890 Result size of Simplifier iteration=1 18.758955 = {terms: 233,463, types: 119,172, coercions: 771} 19.651759 Result size of Simplifier iteration=2 19.651822 = {terms: 233,394, types: 119,034, coercions: 771} 20.558257 Result size of Simplifier 20.558318 = {terms: 233,394, types: 119,034, coercions: 771} 20.558329 *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}): 20.558337 Result size of Float out(FOS {Lam = Just 0, 20.558347 Consts = True, 21.488077 OverSatApps = True}) 21.488144 = {terms: 292,556, types: 150,261, coercions: 771} 21.488160 *** Common sub-expression: 314.182413 Result size of Common sub-expression 314.182475 = {terms: 263,668, types: 149,641, coercions: 219} 314.182488 *** Float inwards: 314.655439 Result size of Float inwards 314.655503 = {terms: 263,668, types: 149,641, coercions: 219} 314.655518 *** Simplifier: 315.557057 Result size of Simplifier iteration=1 315.557125 = {terms: 206,235, types: 120,272, coercions: 219} 316.231673 Result size of Simplifier iteration=2 316.231735 = {terms: 206,236, types: 120,271, coercions: 219} 317.062088 Result size of Simplifier 317.062153 = {terms: 206,236, types: 120,271, coercions: 219} 317.062169 *** Tidy Core: 317.468495 Result size of Tidy Core 317.468565 = {terms: 206,104, types: 120,130, coercions: 219} 317.473975 *** CorePrep: 318.294956 Result size of CorePrep 318.295026 = {terms: 206,396, types: 120,491, coercions: 219} 318.295038 *** Stg2Stg: 318.295050 *** CodeGen: 323.746799 *** Assembler: 323.934329 *** Deleting temp files: 323.934396 Warning: deleting non-existent /tmp/ghc16466_0/ghc16466_3.c 323.934873 *** Deleting temp dirs: real 5m24.015s user 5m23.707s sys 0m0.489s
- Developer
great. Do you have the equivalent output with 7.8.
(I'm still puzzled why my compilation is fast and yours is slow.)
- Author Maintainer
Have you tried my exact same steps?
Btw, tomorrow I can throw a
prof
-compiled GHC atLexer.hs
, if you tell me exactly what kind of output you're interested in...Here's the equivalent output to ticket:9961#comment:94984 for GHC 7.8.4:
0.003879 Glasgow Haskell Compiler, Version 7.8.4, stage 2 booted by GHC version 7.6.3 0.003990 Using binary package database: /opt/ghc/7.8.4/lib/ghc-7.8.4/package.conf.d/package.cache 0.013057 Using binary package database: /home/hvr/.ghc/x86_64-linux-7.8.4/package.conf.d/package.cache 0.022682 hiding package shelly-1.5.6 to avoid conflict with later version shelly-1.6.0 0.022748 wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-a24f9c14c632d75b683d0f93283aea37 0.022836 wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-26579559b3647acf4f01d5edd9491a46 0.022874 wired-in package base mapped to base-4.7.0.2-bfd89587617e381ae01b8dd7b6c7f1c1 0.022938 wired-in package rts mapped to builtin_rts 0.022964 wired-in package template-haskell mapped to template-haskell-2.9.0.0-6d27c2b362b15abb1822f2f34b9ae7f9 0.022982 wired-in package dph-seq not found. 0.023020 wired-in package dph-par not found. 0.023869 Created temporary directory: /tmp/ghc12579_0 0.023918 *** C pre-processor: 0.029089 *** Checking old interface for main:Lexer: 0.029165 *** Parser: 0.103668 *** Renamer/typechecker: 0.901522 *** Desugar: 1.507213 Result size of Desugar (after optimization) 1.507273 = {terms: 97,707, types: 6,407, coercions: 4} 1.507283 *** Simplifier: 1.722230 Result size of Simplifier iteration=1 1.722290 = {terms: 92,782, types: 1,457, coercions: 31} 2.014619 Result size of Simplifier iteration=2 2.014680 = {terms: 92,784, types: 1,469, coercions: 40} 2.251739 Result size of Simplifier 2.251799 = {terms: 92,784, types: 1,469, coercions: 40} 2.251808 *** Specialise: 2.308621 Result size of Specialise 2.308682 = {terms: 92,784, types: 1,469, coercions: 40} 2.308692 *** Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}): 2.308699 Result size of Float out(FOS {Lam = Just 0, 2.308715 Consts = True, 2.721925 PAPs = False}) 2.721984 = {terms: 161,685, types: 37,280, coercions: 40} 2.721993 *** Float inwards: 2.806740 Result size of Float inwards 2.806800 = {terms: 161,685, types: 37,280, coercions: 40} 2.806810 *** Simplifier: 4.854193 Result size of Simplifier iteration=1 4.854254 = {terms: 89,091, types: 2,930, coercions: 1,043} 5.037276 Result size of Simplifier iteration=2 5.037336 = {terms: 88,743, types: 2,545, coercions: 770} 5.299382 Result size of Simplifier iteration=3 5.299442 = {terms: 88,739, types: 2,542, coercions: 767} 5.557920 Result size of Simplifier 5.557980 = {terms: 88,739, types: 2,542, coercions: 767} 5.557989 *** Simplifier: 5.828540 Result size of Simplifier iteration=1 5.828599 = {terms: 88,694, types: 32,129, coercions: 767} 6.148168 Result size of Simplifier 6.148227 = {terms: 88,686, types: 32,117, coercions: 767} 6.148237 *** Simplifier: 6.428055 Result size of Simplifier iteration=1 6.428115 = {terms: 88,707, types: 32,132, coercions: 781} 6.708043 Result size of Simplifier iteration=2 6.708103 = {terms: 88,707, types: 32,132, coercions: 767} 6.996411 Result size of Simplifier 6.996470 = {terms: 88,707, types: 32,132, coercions: 767} 6.996480 *** Demand analysis: 7.259522 Result size of Demand analysis 7.259587 = {terms: 88,707, types: 32,132, coercions: 767} 7.259599 *** Worker Wrapper binds: 7.309554 Result size of Worker Wrapper binds 7.309618 = {terms: 88,872, types: 32,381, coercions: 767} 7.309631 *** Simplifier: 7.583439 Result size of Simplifier iteration=1 7.583499 = {terms: 88,820, types: 32,337, coercions: 767} 7.867272 Result size of Simplifier iteration=2 7.867333 = {terms: 88,751, types: 32,199, coercions: 767} 8.146394 Result size of Simplifier 8.146453 = {terms: 88,751, types: 32,199, coercions: 767} 8.146462 *** Float out(FOS {Lam = Just 0, Consts = True, PAPs = True}): 8.146469 Result size of Float out(FOS {Lam = Just 0, 8.146486 Consts = True, 8.732691 PAPs = True}) 8.732752 = {terms: 205,891, types: 122,523, coercions: 767} 8.732761 *** Common sub-expression: 9.101395 Result size of Common sub-expression 9.101457 = {terms: 177,032, types: 121,927, coercions: 217} 9.101466 *** Float inwards: 9.207068 Result size of Float inwards 9.207128 = {terms: 177,032, types: 121,927, coercions: 217} 9.207137 *** Simplifier: 9.811384 Result size of Simplifier iteration=1 9.811444 = {terms: 119,608, types: 92,568, coercions: 217} 10.273151 Result size of Simplifier iteration=2 10.273205 = {terms: 119,609, types: 92,567, coercions: 217} 10.829669 Result size of Simplifier 10.829729 = {terms: 119,609, types: 92,567, coercions: 217} 10.829739 *** Tidy Core: 11.177167 Result size of Tidy Core 11.177231 = {terms: 119,673, types: 92,627, coercions: 217} 11.207078 *** CorePrep: 11.549034 Result size of CorePrep 11.549100 = {terms: 119,974, types: 92,991, coercions: 217} 11.549129 *** Stg2Stg: 11.549139 *** CodeOutput: 11.549306 *** New CodeGen: 11.549336 *** CPSZ: 11.549480 *** CPSZ: 11.550487 *** CPSZ: (...almost 30000 lines of uninteresting "*** CPSZ:" removed...) 13.468961 *** CPSZ: 13.469193 *** CPSZ: 13.469423 *** CPSZ: 13.473052 *** Assembler: 13.611920 *** Deleting temp files: 13.612006 Warning: deleting non-existent /tmp/ghc12579_0/ghc12579_3.c 13.612314 *** Deleting temp dirs:
- Author Maintainer
Attached file
ghc-stage2.prof
(download).+RTS -p
output - Author Maintainer
I just realised I had already build a
prof
GHC earlier, so I just ran it with$ time ../inplace/bin/ghc-stage2 +RTS -p -RTS -O -fforce-recomp -c Lexer.hs real 6m23.129s user 6m22.575s sys 0m0.723s
and attached the resulting attachment:ghc-stage2.prof
- Developer
Peter, could you look at this please?
lintAnnots
(whatever that is, but it's yours) is taking more than 80% of GHC's allocation!!!One thought might be to make it flag-enabled. (And not
-dcore-lint
, because we have that on a lot.) But another question is: does it have to be so expensive?Simon
Sorry for late reaction, somehow missed the mail.
lintAnnots
should only do something with-dannot-lint
enabled. It will then be *very* expensive, because it runs every Core pass twice, and diff'ing Core can be almost arbitrarily costly.That being said, without
-dannot-lint
we should really havelintAnnots pname pass guts = pass guts
. Not quite sure though how we get that much individual allocation on it, I'll investigate.lintAnnots
is definitely not the villain here: It doesn't change performance if I simply short-cut it out. All cost then lands inCore2Core
. Reason is thatFloatOut
floats out the bodies ofdoCorePass
past the cost-centre annotations. This is because the bodies are partially applied functions (soexprIsHNF
is set), which makes floating treat it like a constructor application. I'm pretty sure that's wrong, and ezyang's tick semantics seem to agree.At this point I'm still trying to understand how this got introduced (most likely my fault in some way). If my diagnosis is right, we'll probably need a new version of
exprIsHNF
ormkTick
. In any case, while this definitely needs to be fixed, it's not the cause for this slow-down.- thoughtpolice changed weight to 10
changed weight to 10
Trac metadata
Trac field Value Priority normal → highest Attached file
fix-9961.patch
(download).Proposed fix