Opened 2 years ago

Closed 20 months ago

Last modified 7 months ago

#10800 closed bug (fixed)

vector-0.11 compile time increased substantially with 7.10.1

Reported by: bgamari Owned by: bgamari
Priority: high Milestone: 8.0.1
Component: Compiler Version: 7.10.2
Keywords: Cc: dolio
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #13535 Differential Rev(s):
Wiki Page:

Description

vector-0.11 takes substantially longer to compile with 7.10 than 7.8

Unfortunately the issue appears to be in generated code.

Lacking time right now so I'm just going to drop the IRC log here,

Compile log: https://travis-ci.org/haskell/vector/builds/56179052

* hvr thought some compile-time improvements landed in 7.10.2
<dolio> I don't think it uses sufficiently less memory to prevent it from using all memory on travis and failing.
<dolio> Which is why it suddenly went from 8 minutes to 35.
<dolio> Although it's also significantly worse even when you have enough memory.
<dolio> hvr: 7.8.4 takes 4m40s, 7.10.2 takes 16m40s.
<dolio> i7 4770, 32GB RAM.
<dolio> vector 0.11 branch.
<dolio> It's not as bad in the 0.10 branch.
<dolio> It uses like 60% more memory, too.
<dolio> Maybe more.
<dolio> It's really just one or two files in the test suite that take all the time/memory.
<hvr> dolio: does vector do some aggressive INLINEing?
<dolio> Well, yes.
<hvr> more in 0.11 than in 0.10?
<dolio> It does more stuff in 0.11, yes.
<bgamari> vector 0.11 has this new chunked streaming abstraction IIRC
<hvr> there's a generic one though iirc
<dolio> 0.11 introduced the generalized stream fusion, so there's like three different things going on that specialization has to select one of.
<dolio> Instead of just one thing like in 0.10.
<dolio> Anyhow, the problem file(s) generate a bunch of tests using template Haskell, so they're pretty opaque.
<dolio> I'm pretty sure it's the generated code that's the problem, though.
<dolio> Not the template haskell that generates it.

Change History (23)

comment:1 Changed 2 years ago by bgamari

Summary: vector-0.11vector-0.11 compile time increased substantially with 7.10.1

comment:2 Changed 2 years ago by simonpj

Priority: normalhighest

Let's be sure to look at this; increasing priority.

comment:3 Changed 2 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:4 Changed 2 years ago by bgamari

The culprit appears to be tests/Tests/Vector.hs, which seems to blow up in the simplifier with GHC 7.10.1,

7.10.1

[5 of 6] Compiling Tests.Vector     ( tests/Tests/Vector.hs, dist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o )
*** Parser:
*** Renamer/typechecker:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:

...

*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:

tests/Tests/Vector.hs:20:1: Warning:
    The import of ‘Data.Monoid’ is redundant
      except perhaps to import instances from ‘Data.Monoid’
    To import instances alone, use: import Data.Monoid()

tests/Tests/Vector.hs:405:5: Warning:
    Defined but not used: ‘limitUnfolds’

tests/Tests/Vector.hs:408:5: Warning:
    Defined but not used: ‘prop_unfoldr’

tests/Tests/Vector.hs:615:1: Warning:
    Defined but not used: ‘testBoolUnboxedVector’
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 8,929, types: 29,113, coercions: 12,626}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 7,718, types: 22,842, coercions: 12,683}
Result size of Simplifier iteration=2
  = {terms: 7,645, types: 22,550, coercions: 12,627}
Result size of Simplifier
  = {terms: 7,645, types: 22,550, coercions: 12,627}
*** Specialise:
Result size of Specialise
  = {terms: 158,601, types: 473,559, coercions: 266,978}
*** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}):
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 191,838, types: 610,090, coercions: 266,978}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 341,716, types: 752,822, coercions: 950,308}
Result size of Simplifier iteration=2
  = {terms: 341,082, types: 678,899, coercions: 700,199}
Result size of Simplifier iteration=3
  = {terms: 324,911, types: 649,679, coercions: 477,703}
Result size of Simplifier iteration=4
  = {terms: 323,711, types: 646,845, coercions: 470,304}
Result size of Simplifier
  = {terms: 323,711, types: 646,845, coercions: 470,304}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 2,421,941, types: 4,434,113, coercions: 2,600,098}
Result size of Simplifier iteration=2
  = {terms: 1,532,846, types: 2,868,287, coercions: 1,167,104}
Result size of Simplifier iteration=3
  = {terms: 1,333,796, types: 2,347,850, coercions: 910,168}
Result size of Simplifier iteration=4
  = {terms: 1,326,682, types: 2,333,558, coercions: 893,473}
Result size of Simplifier
  = {terms: 1,326,682, types: 2,333,558, coercions: 893,473}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 2,517,661, types: 3,700,180, coercions: 1,244,341}
Result size of Simplifier iteration=2
  = {terms: 1,648,651, types: 2,362,735, coercions: 1,118,836}
Result size of Simplifier iteration=3
  = {terms: 1,533,366, types: 2,160,536, coercions: 1,018,335}
Result size of Simplifier iteration=4
  = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409}
Result size of Simplifier
  = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409}
*** Float inwards:
Result size of Float inwards
  = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409}
*** Called arity analysis:
Result size of Called arity analysis
  = {terms: 1,531,057, types: 2,158,142, coercions: 1,017,409}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 1,525,383, types: 2,148,094, coercions: 1,012,837}
Result size of Simplifier iteration=2
  = {terms: 1,519,162, types: 2,137,087, coercions: 1,009,451}
Result size of Simplifier iteration=3
  = {terms: 1,518,511, types: 2,135,070, coercions: 1,008,419}
Result size of Simplifier iteration=4
  = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349}
Result size of Simplifier
  = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349}
*** Demand analysis:
Result size of Demand analysis
  = {terms: 1,517,741, types: 2,133,682, coercions: 1,008,349}
*** Worker Wrapper binds:
Result size of Worker Wrapper binds
  = {terms: 1,638,656, types: 2,267,189, coercions: 1,016,585}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 1,581,976, types: 2,235,934, coercions: 1,015,794}
Result size of Simplifier iteration=2
  = {terms: 1,498,911, types: 2,130,083, coercions: 1,007,028}
Result size of Simplifier iteration=3
  = {terms: 1,498,641, types: 2,129,656, coercions: 1,007,547}
Result size of Simplifier iteration=4
  = {terms: 1,498,663, types: 2,129,564, coercions: 1,008,931}
Result size of Simplifier
  = {terms: 1,498,663, types: 2,129,564, coercions: 1,008,931}
*** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = True}):
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = True})
  = {terms: 1,574,915, types: 2,289,456, coercions: 1,008,931}
*** Common sub-expression:
Result size of Common sub-expression
  = {terms: 1,437,681, types: 2,110,200, coercions: 999,668}
*** Float inwards:
Result size of Float inwards
  = {terms: 1,437,681, types: 2,110,200, coercions: 999,668}
*** Liberate case:
Result size of Liberate case
  = {terms: 1,460,235, types: 2,122,207, coercions: 1,002,808}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 1,409,789, types: 2,018,248, coercions: 1,006,095}
Result size of Simplifier iteration=2
  = {terms: 1,409,904, types: 2,020,763, coercions: 1,004,750}
Result size of Simplifier iteration=3
  = {terms: 1,407,598, types: 2,016,738, coercions: 1,005,425}
Result size of Simplifier iteration=4
  = {terms: 1,407,068, types: 2,015,811, coercions: 1,003,311}
Result size of Simplifier
  = {terms: 1,407,068, types: 2,015,811, coercions: 1,003,311}
*** SpecConstr:
Result size of SpecConstr
  = {terms: 2,875,513, types: 4,153,092, coercions: 1,868,890}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 2,811,029, types: 3,774,293, coercions: 1,781,476}
Result size of Simplifier iteration=2
  = {terms: 1,657,609, types: 2,285,271, coercions: 1,158,260}
Result size of Simplifier iteration=3
  = {terms: 1,634,476, types: 2,261,260, coercions: 1,111,721}
Result size of Simplifier iteration=4
  = {terms: 1,623,734, types: 2,229,084, coercions: 1,097,616}
Result size of Simplifier
  = {terms: 1,623,734, types: 2,229,084, coercions: 1,097,616}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 1,374,780, types: 1,868,771, coercions: 949,928}
writeBinIface: 12038 Names
writeBinIface: 14118 dict entries
writeBinIface: 12038 Names
writeBinIface: 14118 dict entries
*** CorePrep:
Result size of CorePrep
  = {terms: 1,742,444, types: 2,159,089, coercions: 949,848}
*** Stg2Stg:
*** CodeGen:
*** Assembler:
/usr/bin/gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -Itests/Tests -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -Idist/dist-sandbox-7565eaee/build/autogen -Idist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp -x assembler -c /tmp/ghc32343_0/ghc32343_13.s -o dist/dist-sandbox-7565eaee/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o
*** CorePrep:
Result size of CorePrep
  = {terms: 1,742,444, types: 2,159,089, coercions: 949,848}

This is a substantial change from 7.8.3, which behaves as follows,

7.8.3

[5 of 6] Compiling Tests.Vector     ( tests/Tests/Vector.hs, dist/dist-sandbox-8a24a56a/build/vector-tests-O2/vector-tests-O2-tmp/Tests/Vector.o )
*** Parser:
*** Renamer/typechecker:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:
*** Simplify:
*** CorePrep:
*** ByteCodeGen:

tests/Tests/Vector.hs:405:5: Warning:
    Defined but not used: ‘limitUnfolds’

tests/Tests/Vector.hs:408:5: Warning:
    Defined but not used: ‘prop_unfoldr’

tests/Tests/Vector.hs:615:1: Warning:
    Defined but not used: ‘testBoolUnboxedVector’
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 11,244, types: 57,883, coercions: 7,302}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 9,193, types: 44,105, coercions: 7,337}
Result size of Simplifier iteration=2
  = {terms: 9,116, types: 43,718, coercions: 7,159}
Result size of Simplifier
  = {terms: 9,116, types: 43,718, coercions: 7,159}
*** Specialise:
Result size of Specialise
  = {terms: 82,833, types: 519,765, coercions: 78,989}
*** Float out(FOS {Lam = Just 0, Consts = True, PAPs = False}):
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              PAPs = False})
  = {terms: 100,958, types: 568,910, coercions: 78,989}
*** Float inwards:
Result size of Float inwards
  = {terms: 100,958, types: 568,910, coercions: 78,989}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 179,170, types: 343,254, coercions: 645,024}
Result size of Simplifier iteration=2
  = {terms: 123,469, types: 170,930, coercions: 259,665}
Result size of Simplifier iteration=3
  = {terms: 113,030, types: 155,318, coercions: 230,494}
Result size of Simplifier iteration=4
  = {terms: 112,745, types: 154,554, coercions: 221,711}
Result size of Simplifier
  = {terms: 112,745, types: 154,554, coercions: 221,711}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 515,401, types: 809,314, coercions: 598,712}
Result size of Simplifier iteration=2
  = {terms: 345,742, types: 520,083, coercions: 376,036}
Result size of Simplifier iteration=3
  = {terms: 294,322, types: 416,385, coercions: 312,220}
Result size of Simplifier iteration=4
  = {terms: 287,445, types: 402,244, coercions: 297,516}
Result size of Simplifier
  = {terms: 287,445, types: 402,244, coercions: 297,516}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 453,076, types: 567,575, coercions: 280,117}
Result size of Simplifier iteration=2
  = {terms: 321,815, types: 378,160, coercions: 222,556}
Result size of Simplifier iteration=3
  = {terms: 309,758, types: 365,364, coercions: 219,987}
Result size of Simplifier iteration=4
  = {terms: 308,628, types: 364,331, coercions: 219,583}
Result size of Simplifier
  = {terms: 308,628, types: 364,331, coercions: 219,583}
*** Demand analysis:
Result size of Demand analysis
  = {terms: 308,628, types: 364,331, coercions: 219,583}
*** Worker Wrapper binds:
Result size of Worker Wrapper binds
  = {terms: 352,621, types: 419,343, coercions: 223,102}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 334,327, types: 396,352, coercions: 217,632}
Result size of Simplifier iteration=2
  = {terms: 297,776, types: 344,782, coercions: 211,762}
Result size of Simplifier iteration=3
  = {terms: 297,134, types: 343,498, coercions: 211,068}
Result size of Simplifier iteration=4
  = {terms: 296,791, types: 342,860, coercions: 211,033}
Result size of Simplifier
  = {terms: 296,791, types: 342,860, coercions: 211,033}
*** Float out(FOS {Lam = Just 0, Consts = True, PAPs = True}):
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              PAPs = True})
  = {terms: 311,717, types: 380,250, coercions: 211,033}
*** Common sub-expression:
Result size of Common sub-expression
  = {terms: 281,729, types: 346,991, coercions: 196,315}
*** Float inwards:
Result size of Float inwards
  = {terms: 281,729, types: 346,991, coercions: 196,315}
*** Liberate case:
Result size of Liberate case
  = {terms: 301,329, types: 362,935, coercions: 200,458}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 281,187, types: 322,484, coercions: 200,278}
Result size of Simplifier iteration=2
  = {terms: 279,511, types: 320,001, coercions: 199,549}
Result size of Simplifier
  = {terms: 279,511, types: 320,001, coercions: 199,549}
*** SpecConstr:
Result size of SpecConstr
  = {terms: 398,168, types: 420,420, coercions: 230,461}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 390,005, types: 411,465, coercions: 230,270}
Result size of Simplifier iteration=2
  = {terms: 272,588, types: 315,504, coercions: 201,634}
Result size of Simplifier iteration=3
  = {terms: 269,759, types: 310,458, coercions: 200,366}
Result size of Simplifier
  = {terms: 269,759, types: 310,458, coercions: 200,366}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 269,759, types: 310,458, coercions: 200,366}
writeBinIface: 518 Names
writeBinIface: 694 dict entries
writeBinIface: 518 Names
writeBinIface: 694 dict entries
*** CorePrep:
Result size of CorePrep
  = {terms: 341,307, types: 360,564, coercions: 200,286}

comment:5 Changed 2 years ago by bgamari

This is also reproducible with 7.11.

comment:6 Changed 2 years ago by bgamari

I have cut down the tests/Tests/Vector.hs file from vector commit e000d6c50e68f539a6f6cfa35fe91350a0691499 to something a bit more minimal that may still exhibit a similar slow-down. The result is the following,

module Tests.Vector (tests) where

import Boilerplater
import Utilities as Util

import qualified Data.Vector.Generic as V
import qualified Data.Vector

import Test.QuickCheck

import Test.Framework
import Test.Framework.Providers.QuickCheck2

import Data.List

#define COMMON_CONTEXT(a, v) \
 VANILLA_CONTEXT(a, v), VECTOR_CONTEXT(a, v)

#define VANILLA_CONTEXT(a, v) \
  Eq a,     Show a,     Arbitrary a,     CoArbitrary a,     TestData a,     Model a ~ a,        EqTest a ~ Property

#define VECTOR_CONTEXT(a, v) \
  Eq (v a), Show (v a), Arbitrary (v a), CoArbitrary (v a), TestData (v a), Model (v a) ~ [a],  EqTest (v a) ~ Property, V.Vector v a


testTuplyFunctions:: forall a v. (COMMON_CONTEXT(a, v), VECTOR_CONTEXT((a, a), v), VECTOR_CONTEXT((a, a, a), v)) => v a -> [Test]
testTuplyFunctions _ = [testProperty "unzip3" prop_unzip3]
  where
    prop_unzip3 :: P (v (a, a, a) -> (v a, v a, v a))   = V.unzip3 `eq` unzip3

tests = [
        testGroup "Data.Vector.Vector (Bool)"           (testTuplyFunctions      (undefined :: Data.Vector.Vector Bool))
    ]

GHC 7.10.1 takes over 30% longer than 7.8.3 with this simple example, with the program size peaking at 10000 terms during specialization, as opposed to 7.8 which never goes above 400 terms.

Last edited 2 years ago by bgamari (previous) (diff)

comment:7 Changed 2 years ago by bgamari

Well this potentially is interesting...

vector's test suite defines a type class TestData,

class (Testable (EqTest a), Conclusion (EqTest a)) => TestData a where
  type Model a
  model :: a -> Model a
  unmodel :: Model a -> a

  type EqTest a
  equal :: a -> a -> EqTest a

The -ddump-simpl output produced by 7.10 contains a variety of bindings referring to types containing Model, e.g.,

lvl18
  :: Model
       (Vector (Bool, Bool, Bool)
        -> (Vector Bool, Vector Bool, Vector Bool))
     -> Vector (Bool, Bool, Bool)
     -> (Vector Bool, Vector Bool, Vector Bool)

whereas the Core produced by 7.8 contains no such references. I'm not sure whether this is problematic or not

The simplified testcase above produces 6500 lines of Core in 7.10 and around 800 with 7.8. Despite the failure to resolve the Model type family, 7.10 seems to generally be doing more simplification. I believe much of the difference is due to the fact that 7.10 inlines the $fTestData(,,) dictionary whereas 7.8 does not.

comment:8 Changed 2 years ago by bgamari

-ddump-inlinings reveals that there are indeed 7.10 is doing a great deal more inlining than 7.8. Right off the bat 7.10 inlines a variety of methods from the Applicative et al. which never appear to be inlined under 7.8,

Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId_$cpure
Inlining done: Data.Vector.Fusion.Util.$fApplicativeId4
Inlining done: GHC.Show.$fShow(,,)_$cshowList
Inlining done: GHC.Show.$fShow(,,)_$cshow
Inlining done: Test.QuickCheck.Arbitrary.$fArbitrary(,,)_$cshrink
Inlining done:
    Test.QuickCheck.Arbitrary.$fArbitrary(,,)_$carbitrary
Inlining done: Utilities.$fTestDataVector2_$cequal

comment:9 Changed 2 years ago by thomie

Compiling vector master with HEAD using a devel2 build (which means a stage2 compiler built with -DDEBUG), results in a lots of warnings such as the following:

*** Core Lint warnings : in result of Simplifier ***
libraries/vector/Data/Vector/Generic.hs:221:1: warning:
    [RHS of length :: forall (v_a3hG :: * -> *) a_a3hH.
                      Vector v_a3hG a_a3hH =>
                      v_a3hG a_a3hH -> Int]
    INLINE binder is (non-rule) loop breaker: length

libraries/vector/Data/Vector/Generic.hs:1891:1: warning:
    [RHS of unsafeCopy :: forall (v_a38Q :: * -> *)
                                 (m_a38R :: * -> *)
                                 a_a38S.
                          (PrimMonad m_a38R, Vector v_a38Q a_a38S) =>
                          Mutable v_a38Q (PrimState m_a38R) a_a38S
                          -> v_a38Q a_a38S -> m_a38R ()]
    INLINE binder is (non-rule) loop breaker: unsafeCopy

Maybe this gives a clue to what the problem is.

Last edited 2 years ago by thomie (previous) (diff)

comment:10 Changed 2 years ago by thomie

Cc: dolio added

comment:11 Changed 22 months ago by bgamari

I have noticed that there is a qualitative difference in the verbose-core2core output between 7.8.4 and 7.10.3: namely 7.8 lacks an initial gentle simplifier pass. In the case of 7.10.3 the compiler inlines Data.Vector.Generic.zip in the SimplMode {Phase = 2 [main], inline, rules, eta-expand, case-of-case} pass, producing a significant amount of unresolved stream/unstream pairs relatively early in simplification. The last appearance of unstream doesn't disappear until SimplMode {Phase = 0 [main], inline, rules, eta-expand, case-of-case}.

By contrast, 7.8.4 doesn't perform this inlining until much later (right before the second FloatOut; it's a bit hard to tell precisely since verbose-core2core doesn't output simplifier passes). As far as I can tell, it manages to collapse all of the stream fusion apparatus in a single simplifier pass.

I believe this may be the reason for the difference although it is quite late so take this with a grain of salt.

Last edited 22 months ago by bgamari (previous) (diff)

comment:12 Changed 20 months ago by erikd

I have ghc 7.8.4, 7.10.3 and 8.0.0.20160316 installed on my machine. I cloned the vector git repo, set up a cabal sandbox and then for each ghc, cabal install --dependencies-only. Finally for each ghc I did cabal clean && time cabal build.

I've timed this with each compiler three times and this result is pretty representative:

Compiler             real         user        sys
----------------------------------------------------
7.8.4            3m26.106s    4m42.508s    1m40.868s
7.10.3           4m32.690s    6m18.956s    3m6.416s
8.0.0.20160316   1m19.510s    1m52.796s    0m47.992s

It seems this partially fixed between 7.10.1 and 7.10.3 anyway, and 8.0 is far better than either of the old releases.

Last edited 20 months ago by erikd (previous) (diff)

comment:13 Changed 20 months ago by simonpj

Yay!

comment:14 Changed 20 months ago by erikd

From IRC:

<bgamari> erikd, IIRC I have a more minimal testcase as well
<bgamari> the problem centers around zip
<bgamari> when things are inlined has changed
<bgamari> which results in a large number of stream fusion primitives that
          can't be resolved until late in compilation
<bgamari> which produces a large amount of code
<bgamari> erikd, that should at least be a hint
<bgamari> unfortunately I ran out of time before identifying the root cause
<bgamari> erikd, https://gist.github.com/bgamari/2ca0f7d2a4e0b6447aee
<bgamari> erikd, that's my minimized testcase
<erikd>   thanks, i'll check it out
<bgamari> It helps to look at the -dverbose-core2core output
<bgamari> comparing between the relevant compiler versions
<bgamari> erikd, this utility may come in handy for turning the
          verbose-core2core output into something less unwieldy
<bgamari> https://github.com/bgamari/ghc-utils/blob/master/split-core2core.py

comment:15 in reply to:  13 Changed 20 months ago by erikd

Replying to simonpj:

Yay!

Certainly great that the compile times have improved, but @Bgamari points out that the original problem was about compilng the test suite. More investigation still needed (tomorrow).

comment:16 in reply to:  12 Changed 20 months ago by erikd

Much like for comment 12, set myself up so I could build the vector test suite with ghc versions 7.8.4, 7.10.3 and 8.0.0.20160316 and then with each compiler ran cabal clean && time cabal test a number of times with each.

This is the result of a single run, but is representative of this test:

Compiler      real          user           sys
7.8.4       1m38.167s     2m03.336s      0m43.800s
7.10.3      8m59.597s    11m40.756s      5m37.828s
8.0.0       0m45.165s     1m0.148s       0m23.744s

The 8.0 compiler has not only fixed the compile time regression from 7.8 to 7.10, its actually about twice as fast as 7.8.

Just as validation I also timed the runtime of the test suites compiled by each of the three compilers and there didn't seem to be any measurable difference in run time performance.

Last edited 20 months ago by erikd (previous) (diff)

comment:17 Changed 20 months ago by bgamari

Hmm, very interesting. Indeed it has been quite a while since I've checked this issue with 8.0. Judging by comment:5 it seems this was still reproducible on master in September or so. Ideally it would be nice to know what fixed this, but at least we don't need to worry about this for 8.0.

Just to make sure we weren't getting fooled I did my own measurements (timing cabal build of vector-tests, also collecting RTS stats with +RTS -s in one set of measurements), which largely agree with your observations. Yay!

Compiler Real user sys allocations max residency
8.0.0.20160311 1m7.445s 1m1.257s 5.938s
1m2.751s 55.979s 6.774s
59.469s 55.507s 3.959s
58.90s 56.98s 1.41s 39.77e9 bytes 248.65e6 bytes
7.8.4 2m2.411s 1m58.572s 3.429s
2m3.170s 2m0.194s 3.027s
2m2.185s 2m0.342s 1.834s
2m1.44s 1m59.81s 1.61s 106.07e9 bytes 369.51e6 bytes
7.10.3 11m40.961s 11m38.834s 6.592s
11m29.73s 11m28.62s 5.47s 454.467e9 bytes 70.714e6 bytes

Thanks for looking at this Erik!

Last edited 20 months ago by bgamari (previous) (diff)

comment:18 Changed 20 months ago by bgamari

Milestone: 8.0.18.0.2
Priority: highesthigh

comment:19 Changed 20 months ago by bgamari

Resolution: fixed
Status: newclosed

comment:20 Changed 20 months ago by bgamari

Milestone: 8.0.28.0.1

comment:21 Changed 20 months ago by dolio

Interesting. Do you by chance have 7.6.3 to compare with? 7.8 was already significantly worse than it.

bgamari, is your max residency for 7.10 off? The figures for both 7.8 and 7.10 seem low to me, really. When I build the test suite on 7.10 I get like 11 GB max residency, and I'm not sure it would ever complete if I limited it to ~400 MB. Unless your test is a smaller case derived from the real test suite.

comment:22 Changed 20 months ago by erikd

I do have 7.6.3 installed, but it seems to have become broken since last time I tried. I think its actually a cabal issue.

comment:23 Changed 7 months ago by bgamari

While it's not clear whether it's related in cause, #13535 also reports a compile-time regression in Tests.Vector, this time in GHC 8.2.

Note: See TracTickets for help on using tickets.