#10800 closed bug (fixed)
vector-0.11 compile time increased substantially with 7.10.1
Reported by: | bgamari | Owned by: | bgamari |
---|---|---|---|
Priority: | high | Milestone: | 8.0.1 |
Component: | Compiler | Version: | 7.10.2 |
Keywords: | Cc: | dolio | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Compile-time performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #13535 | Differential Rev(s): | |
Wiki Page: |
Description
vector-0.11 takes substantially longer to compile with 7.10 than 7.8
Unfortunately the issue appears to be in generated code.
Lacking time right now so I'm just going to drop the IRC log here,
Compile log: https://travis-ci.org/haskell/vector/builds/56179052
* hvr thought some compile-time improvements landed in 7.10.2 <dolio> I don't think it uses sufficiently less memory to prevent it from using all memory on travis and failing. <dolio> Which is why it suddenly went from 8 minutes to 35. <dolio> Although it's also significantly worse even when you have enough memory. <dolio> hvr: 7.8.4 takes 4m40s, 7.10.2 takes 16m40s. <dolio> i7 4770, 32GB RAM. <dolio> vector 0.11 branch. <dolio> It's not as bad in the 0.10 branch. <dolio> It uses like 60% more memory, too. <dolio> Maybe more. <dolio> It's really just one or two files in the test suite that take all the time/memory. <hvr> dolio: does vector do some aggressive INLINEing? <dolio> Well, yes. <hvr> more in 0.11 than in 0.10? <dolio> It does more stuff in 0.11, yes. <bgamari> vector 0.11 has this new chunked streaming abstraction IIRC <hvr> there's a generic one though iirc <dolio> 0.11 introduced the generalized stream fusion, so there's like three different things going on that specialization has to select one of. <dolio> Instead of just one thing like in 0.10. <dolio> Anyhow, the problem file(s) generate a bunch of tests using template Haskell, so they're pretty opaque. <dolio> I'm pretty sure it's the generated code that's the problem, though. <dolio> Not the template haskell that generates it.
Change History (23)
comment:1 Changed 3 years ago by
Summary: | vector-0.11 → vector-0.11 compile time increased substantially with 7.10.1 |
---|
comment:2 Changed 3 years ago by
Priority: | normal → highest |
---|
comment:4 Changed 3 years ago by
The culprit appears to be tests/Tests/Vector.hs
, which seems to blow up in the simplifier with GHC 7.10.1,
7.10.1
[5 of 6] Compiling Tests.Vector ( tests/Tests/Vector.hs, dist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o ) *** Parser: *** Renamer/typechecker: *** Simplify: *** CorePrep: *** ByteCodeGen: ... *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: tests/Tests/Vector.hs:20:1: Warning: The import of ‘Data.Monoid’ is redundant except perhaps to import instances from ‘Data.Monoid’ To import instances alone, use: import Data.Monoid() tests/Tests/Vector.hs:405:5: Warning: Defined but not used: ‘limitUnfolds’ tests/Tests/Vector.hs:408:5: Warning: Defined but not used: ‘prop_unfoldr’ tests/Tests/Vector.hs:615:1: Warning: Defined but not used: ‘testBoolUnboxedVector’ *** Desugar: Result size of Desugar (after optimization) = {terms: 8,929, types: 29,113, coercions: 12,626} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 7,718, types: 22,842, coercions: 12,683} Result size of Simplifier iteration=2 = {terms: 7,645, types: 22,550, coercions: 12,627} Result size of Simplifier = {terms: 7,645, types: 22,550, coercions: 12,627} *** Specialise: Result size of Specialise = {terms: 158,601, types: 473,559, coercions: 266,978} *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}): Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 191,838, types: 610,090, coercions: 266,978} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 341,716, types: 752,822, coercions: 950,308} Result size of Simplifier iteration=2 = {terms: 341,082, types: 678,899, coercions: 700,199} Result size of Simplifier iteration=3 = {terms: 324,911, types: 649,679, coercions: 477,703} Result size of Simplifier iteration=4 = {terms: 323,711, types: 646,845, coercions: 470,304} Result size of Simplifier = {terms: 323,711, types: 646,845, coercions: 470,304} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 2,421,941, types: 4,434,113, coercions: 2,600,098} Result size of Simplifier iteration=2 = {terms: 1,532,846, types: 2,868,287, coercions: 1,167,104} Result size of Simplifier iteration=3 = {terms: 1,333,796, types: 2,347,850, coercions: 910,168} Result size of Simplifier iteration=4 = {terms: 1,326,682, types: 2,333,558, coercions: 893,473} Result size of Simplifier = {terms: 1,326,682, types: 2,333,558, coercions: 893,473} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 2,517,661, types: 3,700,180, coercions: 1,244,341} Result size of Simplifier iteration=2 = {terms: 1,648,651, types: 2,362,735, coercions: 1,118,836} Result size of Simplifier iteration=3 = {terms: 1,533,366, types: 2,160,536, coercions: 1,018,335} Result size of Simplifier iteration=4 = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} Result size of Simplifier = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} *** Float inwards: Result size of Float inwards = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} *** Called arity analysis: Result size of Called arity analysis = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 1,525,383, types: 2,148,094, coercions: 1,012,837} Result size of Simplifier iteration=2 = {terms: 1,519,162, types: 2,137,087, coercions: 1,009,451} Result size of Simplifier iteration=3 = {terms: 1,518,511, types: 2,135,070, coercions: 1,008,419} Result size of Simplifier iteration=4 = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349} Result size of Simplifier = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349} *** Demand analysis: Result size of Demand analysis = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349} *** Worker Wrapper binds: Result size of Worker Wrapper binds = {terms: 1,638,656, types: 2,267,189, coercions: 1,016,585} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 1,581,976, types: 2,235,934, coercions: 1,015,794} Result size of Simplifier iteration=2 = {terms: 1,498,911, types: 2,130,083, coercions: 1,007,028} Result size of Simplifier iteration=3 = {terms: 1,498,641, types: 2,129,656, coercions: 1,007,547} Result size of Simplifier iteration=4 = {terms: 1,498,663, types: 2,129,564, coercions: 1,008,931} Result size of Simplifier = {terms: 1,498,663, types: 2,129,564, coercions: 1,008,931} *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}): Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}) = {terms: 1,574,915, types: 2,289,456, coercions: 1,008,931} *** Common sub-expression: Result size of Common sub-expression = {terms: 1,437,681, types: 2,110,200, coercions: 999,668} *** Float inwards: Result size of Float inwards = {terms: 1,437,681, types: 2,110,200, coercions: 999,668} *** Liberate case: Result size of Liberate case = {terms: 1,460,235, types: 2,122,207, coercions: 1,002,808} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 1,409,789, types: 2,018,248, coercions: 1,006,095} Result size of Simplifier iteration=2 = {terms: 1,409,904, types: 2,020,763, coercions: 1,004,750} Result size of Simplifier iteration=3 = {terms: 1,407,598, types: 2,016,738, coercions: 1,005,425} Result size of Simplifier iteration=4 = {terms: 1,407,068, types: 2,015,811, coercions: 1,003,311} Result size of Simplifier = {terms: 1,407,068, types: 2,015,811, coercions: 1,003,311} *** SpecConstr: Result size of SpecConstr = {terms: 2,875,513, types: 4,153,092, coercions: 1,868,890} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 2,811,029, types: 3,774,293, coercions: 1,781,476} Result size of Simplifier iteration=2 = {terms: 1,657,609, types: 2,285,271, coercions: 1,158,260} Result size of Simplifier iteration=3 = {terms: 1,634,476, types: 2,261,260, coercions: 1,111,721} Result size of Simplifier iteration=4 = {terms: 1,623,734, types: 2,229,084, coercions: 1,097,616} Result size of Simplifier = {terms: 1,623,734, types: 2,229,084, coercions: 1,097,616} *** Tidy Core: Result size of Tidy Core = {terms: 1,374,780, types: 1,868,771, coercions: 949,928} writeBinIface: 12038 Names writeBinIface: 14118 dict entries writeBinIface: 12038 Names writeBinIface: 14118 dict entries *** CorePrep: Result size of CorePrep = {terms: 1,742,444, types: 2,159,089, coercions: 949,848} *** Stg2Stg: *** CodeGen: *** Assembler: /usr/bin/gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -Itests/Tests -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -Idist/dist-sandbox-7565eaee/build/autogen -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -x assembler -c /tmp/ghc32343_0/ghc32343_13.s -o dist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o *** CorePrep: Result size of CorePrep = {terms: 1,742,444, types: 2,159,089, coercions: 949,848}
This is a substantial change from 7.8.3, which behaves as follows,
7.8.3
[5 of 6] Compiling Tests.Vector ( tests/Tests/Vector.hs, dist/dist-sandbox-8a24a56a/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o ) *** Parser: *** Renamer/typechecker: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: *** Simplify: *** CorePrep: *** ByteCodeGen: tests/Tests/Vector.hs:405:5: Warning: Defined but not used: ‘limitUnfolds’ tests/Tests/Vector.hs:408:5: Warning: Defined but not used: ‘prop_unfoldr’ tests/Tests/Vector.hs:615:1: Warning: Defined but not used: ‘testBoolUnboxedVector’ *** Desugar: Result size of Desugar (after optimization) = {terms: 11,244, types: 57,883, coercions: 7,302} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 9,193, types: 44,105, coercions: 7,337} Result size of Simplifier iteration=2 = {terms: 9,116, types: 43,718, coercions: 7,159} Result size of Simplifier = {terms: 9,116, types: 43,718, coercions: 7,159} *** Specialise: Result size of Specialise = {terms: 82,833, types: 519,765, coercions: 78,989} *** Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}): Result size of Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}) = {terms: 100,958, types: 568,910, coercions: 78,989} *** Float inwards: Result size of Float inwards = {terms: 100,958, types: 568,910, coercions: 78,989} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 179,170, types: 343,254, coercions: 645,024} Result size of Simplifier iteration=2 = {terms: 123,469, types: 170,930, coercions: 259,665} Result size of Simplifier iteration=3 = {terms: 113,030, types: 155,318, coercions: 230,494} Result size of Simplifier iteration=4 = {terms: 112,745, types: 154,554, coercions: 221,711} Result size of Simplifier = {terms: 112,745, types: 154,554, coercions: 221,711} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 515,401, types: 809,314, coercions: 598,712} Result size of Simplifier iteration=2 = {terms: 345,742, types: 520,083, coercions: 376,036} Result size of Simplifier iteration=3 = {terms: 294,322, types: 416,385, coercions: 312,220} Result size of Simplifier iteration=4 = {terms: 287,445, types: 402,244, coercions: 297,516} Result size of Simplifier = {terms: 287,445, types: 402,244, coercions: 297,516} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 453,076, types: 567,575, coercions: 280,117} Result size of Simplifier iteration=2 = {terms: 321,815, types: 378,160, coercions: 222,556} Result size of Simplifier iteration=3 = {terms: 309,758, types: 365,364, coercions: 219,987} Result size of Simplifier iteration=4 = {terms: 308,628, types: 364,331, coercions: 219,583} Result size of Simplifier = {terms: 308,628, types: 364,331, coercions: 219,583} *** Demand analysis: Result size of Demand analysis = {terms: 308,628, types: 364,331, coercions: 219,583} *** Worker Wrapper binds: Result size of Worker Wrapper binds = {terms: 352,621, types: 419,343, coercions: 223,102} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 334,327, types: 396,352, coercions: 217,632} Result size of Simplifier iteration=2 = {terms: 297,776, types: 344,782, coercions: 211,762} Result size of Simplifier iteration=3 = {terms: 297,134, types: 343,498, coercions: 211,068} Result size of Simplifier iteration=4 = {terms: 296,791, types: 342,860, coercions: 211,033} Result size of Simplifier = {terms: 296,791, types: 342,860, coercions: 211,033} *** Float out(FOS {Lam = Just 0, Consts = True, PAPs = True}): Result size of Float out(FOS {Lam = Just 0, Consts = True, PAPs = True}) = {terms: 311,717, types: 380,250, coercions: 211,033} *** Common sub-expression: Result size of Common sub-expression = {terms: 281,729, types: 346,991, coercions: 196,315} *** Float inwards: Result size of Float inwards = {terms: 281,729, types: 346,991, coercions: 196,315} *** Liberate case: Result size of Liberate case = {terms: 301,329, types: 362,935, coercions: 200,458} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 281,187, types: 322,484, coercions: 200,278} Result size of Simplifier iteration=2 = {terms: 279,511, types: 320,001, coercions: 199,549} Result size of Simplifier = {terms: 279,511, types: 320,001, coercions: 199,549} *** SpecConstr: Result size of SpecConstr = {terms: 398,168, types: 420,420, coercions: 230,461} *** Simplifier: Result size of Simplifier iteration=1 = {terms: 390,005, types: 411,465, coercions: 230,270} Result size of Simplifier iteration=2 = {terms: 272,588, types: 315,504, coercions: 201,634} Result size of Simplifier iteration=3 = {terms: 269,759, types: 310,458, coercions: 200,366} Result size of Simplifier = {terms: 269,759, types: 310,458, coercions: 200,366} *** Tidy Core: Result size of Tidy Core = {terms: 269,759, types: 310,458, coercions: 200,366} writeBinIface: 518 Names writeBinIface: 694 dict entries writeBinIface: 518 Names writeBinIface: 694 dict entries *** CorePrep: Result size of CorePrep = {terms: 341,307, types: 360,564, coercions: 200,286}
comment:6 Changed 3 years ago by
I have cut down the tests/Tests/Vector.hs
file from vector
commit e000d6c50e68f539a6f6cfa35fe91350a0691499
to something a bit more minimal that may still exhibit a similar slow-down. The result is the following,
module Tests.Vector (tests) where import Boilerplater import Utilities as Util import qualified Data.Vector.Generic as V import qualified Data.Vector import Test.QuickCheck import Test.Framework import Test.Framework.Providers.QuickCheck2 import Data.List #define COMMON_CONTEXT(a, v) \ VANILLA_CONTEXT(a, v), VECTOR_CONTEXT(a, v) #define VANILLA_CONTEXT(a, v) \ Eq a, Show a, Arbitrary a, CoArbitrary a, TestData a, Model a ~ a, EqTest a ~ Property #define VECTOR_CONTEXT(a, v) \ Eq (v a), Show (v a), Arbitrary (v a), CoArbitrary (v a), TestData (v a), Model (v a) ~ [a], EqTest (v a) ~ Property, V.Vector v a testTuplyFunctions:: forall a v. (COMMON_CONTEXT(a, v), VECTOR_CONTEXT((a, a), v), VECTOR_CONTEXT((a, a, a), v)) => v a -> [Test] testTuplyFunctions _ = [testProperty "unzip3" prop_unzip3] where prop_unzip3 :: P (v (a, a, a) -> (v a, v a, v a)) = V.unzip3 `eq` unzip3 tests = [ testGroup "Data.Vector.Vector (Bool)" (testTuplyFunctions (undefined :: Data.Vector.Vector Bool)) ]
GHC 7.10.1 takes over 30% longer than 7.8.3 with this simple example, with the program size peaking at 10000 terms during specialization, as opposed to 7.8 which never goes above 400 terms.
comment:7 Changed 3 years ago by
Well this potentially is interesting...
vector
's test suite defines a type class TestData
,
class (Testable (EqTest a), Conclusion (EqTest a)) => TestData a where type Model a model :: a -> Model a unmodel :: Model a -> a type EqTest a equal :: a -> a -> EqTest a
The -ddump-simpl
output produced by 7.10 contains a variety of bindings referring to types containing Model
, e.g.,
lvl18 :: Model (Vector (Bool, Bool, Bool) -> (Vector Bool, Vector Bool, Vector Bool)) -> Vector (Bool, Bool, Bool) -> (Vector Bool, Vector Bool, Vector Bool)
whereas the Core produced by 7.8 contains no such references. I'm not sure whether this is problematic or not
The simplified testcase above produces 6500 lines of Core in 7.10 and around 800 with 7.8. Despite the failure to resolve the Model
type family, 7.10 seems to generally be doing more simplification. I believe much of the difference is due to the fact that 7.10 inlines the $fTestData(,,)
dictionary whereas 7.8 does not.
comment:8 Changed 3 years ago by
-ddump-inlinings
reveals that there are indeed 7.10 is doing a great deal more inlining than 7.8. Right off the bat 7.10 inlines a variety of methods from the Applicative
et al. which never appear to be inlined under 7.8,
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4 Inlining done: GHC.Show.$fShow(,,)_$cshowList Inlining done: GHC.Show.$fShow(,,)_$cshow Inlining done: Test.QuickCheck.Arbitrary.$fArbitrary(,,)_$cshrink Inlining done: Test.QuickCheck.Arbitrary.$fArbitrary(,,)_$carbitrary Inlining done: Utilities.$fTestDataVector2_$cequal
comment:9 Changed 3 years ago by
Compiling vector
master with HEAD using a devel2 build (which means a stage2 compiler built with -DDEBUG
), results in a lots of warnings such as the following:
*** Core Lint warnings : in result of Simplifier *** libraries/vector/Data/Vector/Generic.hs:221:1: warning: [RHS of length :: forall (v_a3hG :: * -> *) a_a3hH. Vector v_a3hG a_a3hH => v_a3hG a_a3hH -> Int] INLINE binder is (non-rule) loop breaker: length libraries/vector/Data/Vector/Generic.hs:1891:1: warning: [RHS of unsafeCopy :: forall (v_a38Q :: * -> *) (m_a38R :: * -> *) a_a38S. (PrimMonad m_a38R, Vector v_a38Q a_a38S) => Mutable v_a38Q (PrimState m_a38R) a_a38S -> v_a38Q a_a38S -> m_a38R ()] INLINE binder is (non-rule) loop breaker: unsafeCopy
Maybe this gives a clue to what the problem is.
comment:10 Changed 3 years ago by
Cc: | dolio added |
---|
comment:11 Changed 3 years ago by
I have noticed that there is a qualitative difference in the verbose-core2core output between 7.8.4 and 7.10.3: namely 7.8 lacks an initial gentle simplifier pass. In the case of 7.10.3 the compiler inlines Data.Vector.Generic.zip
in the SimplMode {Phase = 2 [main], inline, rules, eta-expand, case-of-case}
pass, producing a significant amount of unresolved stream
/unstream
pairs relatively early in simplification. The last appearance of unstream
doesn't disappear until SimplMode {Phase = 0 [main], inline, rules, eta-expand, case-of-case}
.
By contrast, 7.8.4 doesn't perform this inlining until much later (right before the second FloatOut; it's a bit hard to tell precisely since verbose-core2core
doesn't output simplifier passes). As far as I can tell, it manages to collapse all of the stream fusion apparatus in a single simplifier pass.
I believe this may be the reason for the difference although it is quite late so take this with a grain of salt.
comment:12 follow-up: 16 Changed 2 years ago by
I have ghc 7.8.4, 7.10.3 and 8.0.0.20160316 installed on my machine. I cloned the vector git repo, set up a cabal sandbox and then for each ghc, cabal install --dependencies-only
. Finally for each ghc I did cabal clean && time cabal build
.
I've timed this with each compiler three times and this result is pretty representative:
Compiler real user sys ---------------------------------------------------- 7.8.4 3m26.106s 4m42.508s 1m40.868s 7.10.3 4m32.690s 6m18.956s 3m6.416s 8.0.0.20160316 1m19.510s 1m52.796s 0m47.992s
It seems this partially fixed between 7.10.1 and 7.10.3 anyway, and 8.0 is far better than either of the old releases.
comment:14 Changed 2 years ago by
From IRC:
<bgamari> erikd, IIRC I have a more minimal testcase as well <bgamari> the problem centers around zip <bgamari> when things are inlined has changed <bgamari> which results in a large number of stream fusion primitives that can't be resolved until late in compilation <bgamari> which produces a large amount of code <bgamari> erikd, that should at least be a hint <bgamari> unfortunately I ran out of time before identifying the root cause <bgamari> erikd, https://gist.github.com/bgamari/2ca0f7d2a4e0b6447aee <bgamari> erikd, that's my minimized testcase <erikd> thanks, i'll check it out <bgamari> It helps to look at the -dverbose-core2core output <bgamari> comparing between the relevant compiler versions <bgamari> erikd, this utility may come in handy for turning the verbose-core2core output into something less unwieldy <bgamari> https://github.com/bgamari/ghc-utils/blob/master/split-core2core.py
comment:15 Changed 2 years ago by
Replying to simonpj:
Yay!
Certainly great that the compile times have improved, but @Bgamari points out that the original problem was about compilng the test suite. More investigation still needed (tomorrow).
comment:16 Changed 2 years ago by
Much like for comment 12, set myself up so I could build the vector test suite with ghc versions 7.8.4, 7.10.3 and 8.0.0.20160316 and then with each compiler ran cabal clean && time cabal test
a number of times with each.
This is the result of a single run, but is representative of this test:
Compiler real user sys 7.8.4 1m38.167s 2m03.336s 0m43.800s 7.10.3 8m59.597s 11m40.756s 5m37.828s 8.0.0 0m45.165s 1m0.148s 0m23.744s
The 8.0 compiler has not only fixed the compile time regression from 7.8 to 7.10, its actually about twice as fast as 7.8.
Just as validation I also timed the runtime of the test suites compiled by each of the three compilers and there didn't seem to be any measurable difference in run time performance.
comment:17 Changed 2 years ago by
Hmm, very interesting. Indeed it has been quite a while since I've checked this issue with 8.0. Judging by comment:5 it seems this was still reproducible on master
in September or so. Ideally it would be nice to know what fixed this, but at least we don't need to worry about this for 8.0.
Just to make sure we weren't getting fooled I did my own measurements (timing cabal build
of vector-tests
, also collecting RTS stats with +RTS -s
in one set of measurements), which largely agree with your observations. Yay!
Compiler | Real | user | sys | allocations | max residency |
---|---|---|---|---|---|
8.0.0.20160311 | 1m7.445s | 1m1.257s | 5.938s | ||
1m2.751s | 55.979s | 6.774s | |||
59.469s | 55.507s | 3.959s | |||
58.90s | 56.98s | 1.41s | 39.77e9 bytes | 248.65e6 bytes | |
7.8.4 | 2m2.411s | 1m58.572s | 3.429s | ||
2m3.170s | 2m0.194s | 3.027s | |||
2m2.185s | 2m0.342s | 1.834s | |||
2m1.44s | 1m59.81s | 1.61s | 106.07e9 bytes | 369.51e6 bytes | |
7.10.3 | 11m40.961s | 11m38.834s | 6.592s | ||
11m29.73s | 11m28.62s | 5.47s | 454.467e9 bytes | 70.714e6 bytes |
Thanks for looking at this Erik!
comment:18 Changed 2 years ago by
Milestone: | 8.0.1 → 8.0.2 |
---|---|
Priority: | highest → high |
comment:19 Changed 2 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:20 Changed 2 years ago by
Milestone: | 8.0.2 → 8.0.1 |
---|
comment:21 Changed 2 years ago by
Interesting. Do you by chance have 7.6.3 to compare with? 7.8 was already significantly worse than it.
bgamari, is your max residency for 7.10 off? The figures for both 7.8 and 7.10 seem low to me, really. When I build the test suite on 7.10 I get like 11 GB max residency, and I'm not sure it would ever complete if I limited it to ~400 MB. Unless your test is a smaller case derived from the real test suite.
comment:22 Changed 2 years ago by
I do have 7.6.3 installed, but it seems to have become broken since last time I tried. I think its actually a cabal issue.
comment:23 Changed 16 months ago by
Related Tickets: | → #13535 |
---|
While it's not clear whether it's related in cause, #13535 also reports a compile-time regression in Tests.Vector
, this time in GHC 8.2.
Let's be sure to look at this; increasing priority.