Opened 6 years ago

Closed 3 years ago

Last modified 2 months ago

#3064 closed bug (fixed)

Very long compile times with type functions

Reported by: simonpj Owned by:
Priority: low Milestone: 7.12.1
Component: Compiler (Type checker) Version: 6.10.1
Keywords: Cc: ben.franksen@…, pho@…, coreyoconnor@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: perf/compiler/T3064
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

the attached module is a much reduced version of some type-level assurance
stuff (inspired by the Lightweight Monadic Regions paper) I am trying to
do. I am almost certain that it could be reduced further but it is late and
I want to get this off my desk.

Note the 4 test functions, test11 .. test14. The following are timings for
compiling the module only with all test functions commented out, except
respectively, test11, test12, test13, and test14:

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  1,79s user 0,04s system 99% cpu 1,836 total

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  5,87s user 0,14s system 99% cpu 6,028 total

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  23,52s user 0,36s system 99% cpu 23,899 total

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  102,20s user 1,32s system 97% cpu 1:45,89 total

It seems something is scaling very badly. You really don't want to wait for
a version with 20 levels of nesting to compile...

If anyone has a good explanation for this, I'd be grateful.

BTW, I am not at all certain that this is ghc's fault, it may well be my
program, i.e. the constraints are too complex, whatever. I have no idea how
hard it is for the compiler to do all the unification. Also, the problem is
not of much practical relevance, as no sensible program will use more than
a handful levels of nesting.

SLPJ says: let's investigate this once the new solver is done.

Attachments (2)

Bug2.hs (1.6 KB) - added by simonpj 6 years ago.
Bug file showing long compile times
ghc-stage2.prof.gz (74.1 KB) - added by CoreyOConnor 4 years ago.
Profile of stage2 compiling Bug2 with test11 and test12 functions.

Download all attachments as: .zip

Change History (24)

Changed 6 years ago by simonpj

Bug file showing long compile times

comment:1 Changed 6 years ago by igloo

  • Milestone set to 6.12 branch

comment:2 Changed 5 years ago by simonmar

  • Type of failure set to Compile-time performance bug

comment:3 Changed 5 years ago by igloo

  • Milestone changed from 6.12 branch to 6.12.3

comment:4 Changed 5 years ago by PHO

  • Cc pho@… added

comment:5 Changed 5 years ago by igloo

  • Milestone changed from 6.12.3 to 6.14.1
  • Priority changed from normal to low

comment:6 Changed 5 years ago by CoreyOConnor

  • Cc coreyoconnor@… added

comment:8 Changed 4 years ago by CoreyOConnor

Using ghc-stage2 version 6.13.20100913. Which, as I understand it, should be using the new typechecker.

[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    0m2.296s
user    0m2.245s
sys     0m0.049s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    0m11.539s
user    0m11.391s
sys     0m0.143s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    0m53.307s
user    0m52.859s
sys     0m0.438s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    4m0.904s
user    3m59.552s
sys     0m1.308s

Memory usage appeared to remain constant over time.

comment:9 Changed 4 years ago by simonpj

Thanks for making the measurements. I assume the four runs are with different numbers of functions commented out?

I have so far paid zero attention to the performance of the constraint solver, since "most" programs have pretty simple constraints. Your program is a nice small example of something that needs better performance, all ready for me to profile. Thanks.

But other things to do first.

Simon

Changed 4 years ago by CoreyOConnor

Profile of stage2 compiling Bug2 with test11 and test12 functions.

comment:10 Changed 4 years ago by CoreyOConnor

Yes, the successive runs are with increasing number of functions un-commented out.

I understand, and agree, that other things should come before performance. Still, as I can't help out much with correctness I am going to start looking into optimizations. Perhaps there are some easy and safe optimizations I can contribute.

comment:11 Changed 4 years ago by CoreyOConnor

The following patch looked like it resolves this issue:
Performance bug fixes

Testing with the attached example:

[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m0.716s
user    0m0.700s
sys     0m0.015s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m0.945s
user    0m0.931s
sys     0m0.014s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m1.192s
user    0m1.169s
sys     0m0.021s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m1.515s
user    0m1.492s
sys     0m0.022s

I think this is resolved! Awesome work! I have a data marshaling library that will greatly benefit from such a speedup.

comment:12 Changed 4 years ago by simonpj

  • Owner changed from chak to igloo

Great, thanks for testing.

Ian: might you take a moment to turn Bug2.hs into a performance test, and then close? Thanks.

comment:13 Changed 4 years ago by igloo

  • Resolution set to fixed
  • Status changed from new to closed

Test added.

comment:14 Changed 3 years ago by simonpj

  • Owner igloo deleted
  • Resolution fixed deleted
  • Status changed from closed to new

I'm re-opening this for HEAD, following D's recent patches. The definition at the bottom of T3064.hs looks like

test14 :: IO ()
test14 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(
               newRgn(newRgn(newRgn(newRgn(return())))))))))))))

(Despite its name, that looks like 12 nested applications of newRgn to me.) I tried doubling the number of nested calls, and re-doubling, and compile with +RTS -s:

# nested calls     Bytes alloc'd by HEAD       Bytes alloc'd by 7.4.1
----------------------------------------------------------------------
      12                  86M                        68M
      24                 205M                       105M
      48                 920M                       241M

Something is amiss here.

comment:15 Changed 3 years ago by simonmar

  • Milestone changed from 7.0.1 to 7.6.1

comment:16 Changed 3 years ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed
  • Test Case set to perf/compiler/T3064

So, after Dimitrios's recent changes we get

# nested calls     Bytes alloc'd by HEAD       Bytes alloc'd by 7.4.1
----------------------------------------------------------------------
      12                  75M                        68M
      24                 114M                       105M
      48                 251M                       241M

which looks much more plausible. Residency is 50% higher than with 7.4.1, but we can fight that battle another day.

I'll adjust the test to use the bigger example, so that if it goes bad on us we'll see.

comment:17 Changed 4 months ago by Simon Peyton Jones <simonpj@…>

In 096b7e664351d102cc9e15f3aa226a976af5dae2/ghc:

Switch off lazy flattening (fix Trac #3064)

See Note [Lazy flattening] in TcFlatten.

Lazy flattening was an apparently good idea which actually made
the type inference engine go a LOTS slower in T3064.  So I switched
it off again.

comment:18 Changed 2 months ago by Simon Peyton Jones <simonpj@…>

In a6f0f5ab45b2643b561e0a0a54a4f14745ab2152/ghc:

Eliminate so-called "silent superclass parameters"

The purpose of silent superclass parameters was to solve the
awkward problem of superclass dictinaries being bound to bottom.
See THE PROBLEM in Note [Recursive superclasses] in TcInstDcls

Although the silent-superclass idea worked,

  * It had non-local consequences, and had effects even in Haddock,
    where we had to discard silent parameters before displaying
    instance declarations

  * It had unexpected peformance costs, shown up by Trac #3064 and its
    test case.  In monad-transformer code, when constructing a Monad
    dictionary you had to pass an Applicative dictionary; and to
    construct that you neede a Functor dictionary. Yet these extra
    dictionaries were often never used.  (All this got much worse when
    we added Applicative as a superclass of Monad.) Test T3064
    compiled *far* faster after silent superclasses were eliminated.

  * It introduced new bugs.  For example SilentParametersOverlapping,
    T5051, and T7862, all failed to compile because of instance overlap
    directly because of the silent-superclass trick.

So this patch takes a new approach, which I worked out with Dimitrios
in the closing hours before Christmas.  It is described in detail
in THE PROBLEM in Note [Recursive superclasses] in TcInstDcls.

Seems to work great!

Quite a bit of knock-on effect

 * The main implementation work is in tcSuperClasses in TcInstDcls
   Everything else is fall-out

 * IdInfo.DFunId no longer needs its n-silent argument
   * Ditto IDFunId in IfaceSyn
   * Hence interface file format changes

 * Now that DFunIds do not have silent superclass parameters, printing
   out instance declarations is simpler. There is tiny knock-on effect
   in Haddock, so that submodule is updated

 * I realised that when computing the "size of a dictionary type"
   in TcValidity.sizePred, we should be rather conservative about
   type functions, which can arbitrarily increase the size of a type.
   Hence the new datatype TypeSize, which has a TSBig constructor for
   "arbitrarily big".

 * instDFunType moves from TcSMonad to Inst

 * Interestingly, CmmNode and CmmExpr both now need a non-silent
   (Ord r) in a couple of instance declarations. These were previously
   silent but must now be explicit.

 * Quite a bit of wibbling in error messages

comment:19 Changed 2 months ago by carter

should this be merged into the 7.10 branch or remilestoned for 7.12 only?

comment:20 Changed 2 months ago by goldfire

  • Milestone changed from 7.6.1 to 7.12.1

From the second-to-last bullet point in Simon's commit message, it looks like this is a breaking change and should not be merged into 7.10 since RC1 is already out.

I have to say the breakage is a little disappointing here. Here is the real-life example from the Note:

        class Ord r => UserOfRegs r a where ...
(i1)    instance UserOfRegs r a => UserOfRegs r (Maybe a) where ...
(i2)    instance (Ord r, UserOfRegs r CmmReg) => UserOfRegs r CmmExpr where ...

Note the Ord r constraint in (i2), which is newly-required by this change. What's disappointing here is that Ord r is "obviously" derivable from UserOfRegs r CmmReg! While I understand the reasoning in the Note (that UserOfRegs r CmmReg is not strictly smaller than the instance head, and therefore looking in its superclasses might, in perverse but realistic scenarios, might cause the superclass dictionary to loop), it's far from obvious from a user standpoint.

Is it possible to alert the user to this problem in an error message? It seems, from the diff, that this is not yet done.

And, should this perhaps be the first 7.12 release note? As it's a breaking change, I do think it should be called out in the user manual.

comment:21 Changed 2 months ago by simonpj

It may be "a little disappointing" but nevertheless I think it's a notable step forward:

  • It eliminates a tricky and hard-to-explain complication in the implementation.
  • It fixes a substantial performance regression that affects programs with deeply-nested superclasses, affecting both compile time and run time.
  • It eliminates a collection of regressions, whereby silent superclasses actually cause some programs to fail unexpectedly, for very hard-to-explain reasons. They are listed in the commit message.
  • Moreover, there is no good workaround if you actually hit those regressions. In contrast, there is an immediate and easy workaround if you hit the new regression.
  • None of this bites you unless you are sailing close to the wind (UndecidableInstances).

As to whether this belongs in 7.10 I am agnostic. The performance improvements will help everyone slightly (especially in monad-transformer-heavy code). The behaviour change is there all right, but will affect very few people.

I don't feel strongly either way. I am not arguing to push it into 7.10, but I will not argue against doing so either. Does anyone else have a view? The default decision is "no" because it is a change wrt RC1.

Incidentally, you say "in perverse but realistic scenarios, might cause the superclass dictionary to loop". Indeed it is realistic. It would be all too easy to add by accident

    instance (UserOfRegs r CmmExpr) => UserOfRegs r CmmReg where ...

perhaps even in another module. Guarding against such subtle loops is a very good thing, in my view.

Simon

comment:22 Changed 2 months ago by goldfire

Good points in comment:21. I'm especially swayed by the ease with which this regression can be avoided, whereas previous problems were not.

I do think, once a decision is made regarding 7.10 vs 7.12, a release note should be added. And, if possible, an error message that includes a note whenever the strictly-decreasing-superclass criterion bites would also be great.

Note: See TracTickets for help on using tickets.