Opened 5 years ago

Closed 2 years ago

#3064 closed bug (fixed)

Very long compile times with type functions

Reported by: simonpj Owned by:
Priority: low Milestone: 7.6.1
Component: Compiler (Type checker) Version: 6.10.1
Keywords: Cc: ben.franksen@…, pho@…, coreyoconnor@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Difficulty: Unknown
Test Case: perf/compiler/T3064 Blocked By:
Blocking: Related Tickets:

Description

the attached module is a much reduced version of some type-level assurance
stuff (inspired by the Lightweight Monadic Regions paper) I am trying to
do. I am almost certain that it could be reduced further but it is late and
I want to get this off my desk.

Note the 4 test functions, test11 .. test14. The following are timings for
compiling the module only with all test functions commented out, except
respectively, test11, test12, test13, and test14:

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  1,79s user 0,04s system 99% cpu 1,836 total

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  5,87s user 0,14s system 99% cpu 6,028 total

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  23,52s user 0,36s system 99% cpu 23,899 total

ben@sarun[1] > time ghc -c Bug2.hs
ghc -c Bug2.hs  102,20s user 1,32s system 97% cpu 1:45,89 total

It seems something is scaling very badly. You really don't want to wait for
a version with 20 levels of nesting to compile...

If anyone has a good explanation for this, I'd be grateful.

BTW, I am not at all certain that this is ghc's fault, it may well be my
program, i.e. the constraints are too complex, whatever. I have no idea how
hard it is for the compiler to do all the unification. Also, the problem is
not of much practical relevance, as no sensible program will use more than
a handful levels of nesting.

SLPJ says: let's investigate this once the new solver is done.

Attachments (2)

Bug2.hs (1.6 KB) - added by simonpj 5 years ago.
Bug file showing long compile times
ghc-stage2.prof.gz (74.1 KB) - added by CoreyOConnor 4 years ago.
Profile of stage2 compiling Bug2 with test11 and test12 functions.

Download all attachments as: .zip

Change History (18)

Changed 5 years ago by simonpj

Bug file showing long compile times

comment:1 Changed 5 years ago by igloo

  • Milestone set to 6.12 branch

comment:2 Changed 4 years ago by simonmar

  • Type of failure set to Compile-time performance bug

comment:3 Changed 4 years ago by igloo

  • Milestone changed from 6.12 branch to 6.12.3

comment:4 Changed 4 years ago by PHO

  • Cc pho@… added

comment:5 Changed 4 years ago by igloo

  • Milestone changed from 6.12.3 to 6.14.1
  • Priority changed from normal to low

comment:6 Changed 4 years ago by CoreyOConnor

  • Cc coreyoconnor@… added

comment:8 Changed 4 years ago by CoreyOConnor

Using ghc-stage2 version 6.13.20100913. Which, as I understand it, should be using the new typechecker.

[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    0m2.296s
user    0m2.245s
sys     0m0.049s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    0m11.539s
user    0m11.391s
sys     0m0.143s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    0m53.307s
user    0m52.859s
sys     0m0.438s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs

real    4m0.904s
user    3m59.552s
sys     0m1.308s

Memory usage appeared to remain constant over time.

comment:9 Changed 4 years ago by simonpj

Thanks for making the measurements. I assume the four runs are with different numbers of functions commented out?

I have so far paid zero attention to the performance of the constraint solver, since "most" programs have pretty simple constraints. Your program is a nice small example of something that needs better performance, all ready for me to profile. Thanks.

But other things to do first.

Simon

Changed 4 years ago by CoreyOConnor

Profile of stage2 compiling Bug2 with test11 and test12 functions.

comment:10 Changed 4 years ago by CoreyOConnor

Yes, the successive runs are with increasing number of functions un-commented out.

I understand, and agree, that other things should come before performance. Still, as I can't help out much with correctness I am going to start looking into optimizations. Perhaps there are some easy and safe optimizations I can contribute.

comment:11 Changed 4 years ago by CoreyOConnor

The following patch looked like it resolves this issue:
Performance bug fixes

Testing with the attached example:

[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m0.716s
user    0m0.700s
sys     0m0.015s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m0.945s
user    0m0.931s
sys     0m0.014s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m1.192s
user    0m1.169s
sys     0m0.021s
[coconnor@toast scratch]$ time ../inplace/bin/ghc-stage2 -c Bug2.hs 

real    0m1.515s
user    0m1.492s
sys     0m0.022s

I think this is resolved! Awesome work! I have a data marshaling library that will greatly benefit from such a speedup.

comment:12 Changed 4 years ago by simonpj

  • Owner changed from chak to igloo

Great, thanks for testing.

Ian: might you take a moment to turn Bug2.hs into a performance test, and then close? Thanks.

comment:13 Changed 3 years ago by igloo

  • Resolution set to fixed
  • Status changed from new to closed

Test added.

comment:14 Changed 2 years ago by simonpj

  • Owner igloo deleted
  • Resolution fixed deleted
  • Status changed from closed to new

I'm re-opening this for HEAD, following D's recent patches. The definition at the bottom of T3064.hs looks like

test14 :: IO ()
test14 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(
               newRgn(newRgn(newRgn(newRgn(return())))))))))))))

(Despite its name, that looks like 12 nested applications of newRgn to me.) I tried doubling the number of nested calls, and re-doubling, and compile with +RTS -s:

# nested calls     Bytes alloc'd by HEAD       Bytes alloc'd by 7.4.1
----------------------------------------------------------------------
      12                  86M                        68M
      24                 205M                       105M
      48                 920M                       241M

Something is amiss here.

comment:15 Changed 2 years ago by simonmar

  • Milestone changed from 7.0.1 to 7.6.1

comment:16 Changed 2 years ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed
  • Test Case set to perf/compiler/T3064

So, after Dimitrios's recent changes we get

# nested calls     Bytes alloc'd by HEAD       Bytes alloc'd by 7.4.1
----------------------------------------------------------------------
      12                  75M                        68M
      24                 114M                       105M
      48                 251M                       241M

which looks much more plausible. Residency is 50% higher than with 7.4.1, but we can fight that battle another day.

I'll adjust the test to use the bigger example, so that if it goes bad on us we'll see.

Note: See TracTickets for help on using tickets.