Opened 3 years ago

Last modified 3 weeks ago

#7258 new bug

Compiling DynFlags is jolly slow

Reported by: simonpj Owned by: simonpj
Priority: normal Milestone: 7.12.1
Component: Compiler Version: 7.6.1
Keywords: Cc: iustin@…, pho@…, dterei, bgamari, asr, gidyn
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

Compiling DynFlags really takes a long time these days.

Ian thinks that it's due to the Read and Show instances he has added (see attached W2.hs.

Simon M suggests: instead of using Read/Show, you could generate some code in mkDerivedConstants to use ReadP and Outputable, which should be much smaller and faster.

This ticket is

  • To see if we can speed up compilation of DynFlags
  • To check WHY it is so slow. Are there any lessons we can learn or ways to make it compile faster? Is it tickling some asymptotically-bad corner of the compiler?

Simon

Attachments (3)

W2.hs (6.4 KB) - added by simonpj 3 years ago.
Exhibts long compilation times
overview_heap_profile.png (85.6 KB) - added by igloo 3 years ago.
overview_optimised_heap_profile.png (70.6 KB) - added by igloo 3 years ago.

Download all attachments as: .zip

Change History (25)

Changed 3 years ago by simonpj

Exhibts long compilation times

comment:1 Changed 3 years ago by simonpj

  • Owner set to igloo

comment:2 Changed 3 years ago by igloo

  • Milestone set to 7.8.1

Changed 3 years ago by igloo

comment:3 Changed 3 years ago by igloo

    Fri Oct 19 20:23 2012 Time and Allocation Profiling Report  (Final)

       ghc-stage2 +RTS -p -h -RTS -B/home/ian/ghc/git/ghc/inplace/lib -fforce-recomp -c W2.hs

    total time  =       26.08 secs   (26080 ticks @ 1000 us, 1 processor)
    total alloc = 14,601,152,424 bytes  (excludes profiling overheads)

COST CENTRE       MODULE         %time %alloc

OccAnal           SimplCore       27.0   24.5
RegAlloc          AsmCodeGen      19.2   22.4
pprNativeCode     AsmCodeGen      16.8   19.9
regLiveness       AsmCodeGen       8.0    6.3
genMachCode       AsmCodeGen       5.0    5.4
StgCmm            HscMain          4.3    3.5
occAnalBind.assoc OccurAnal        3.3    3.3
tops              CmmPipeline      3.3    3.4
SimplTopBinds     SimplCore        1.7    1.6
do_block          Hoopl.Dataflow   1.1    0.8

Changed 3 years ago by igloo

comment:4 Changed 3 years ago by igloo

    Fri Oct 19 20:32 2012 Time and Allocation Profiling Report  (Final)

       ghc-stage2 +RTS -p -h -RTS -B/home/ian/ghc/git/ghc/inplace/lib -fforce-recomp -O -c W2.hs

    total time  =       76.26 secs   (76258 ticks @ 1000 us, 1 processor)
    total alloc = 45,073,993,600 bytes  (excludes profiling overheads)

COST CENTRE       MODULE      %time %alloc

OccAnal           SimplCore    49.7   45.3
CoreTidy          HscMain      22.3   31.8
SimplTopBinds     SimplCore    12.9    9.9
pprNativeCode     AsmCodeGen    2.0    1.8
RegAlloc          AsmCodeGen    1.9    2.0
sink2             CmmPipeline   1.5    1.8
StgCmm            HscMain       1.0    0.8
occAnalBind.assoc OccurAnal     1.0    1.0

comment:5 Changed 3 years ago by igloo

                                                     individual     inherited   
COST CENTRE               MODULE      no.  entries  %time %alloc   %time %alloc

getProxies.fwd_pe         OccurAnal  1776     9226    6.7    0.0    25.5   23.7 
 getProxies.fwd_pe.add1   OccurAnal  1779  9733280    5.8    1.6    18.8   23.7 
  getProxies.fwd_pe.add2  OccurAnal  1781  9733280   13.1   22.1    13.1   22.1 

(without -O)

comment:6 Changed 3 years ago by igloo

  • Owner changed from igloo to simonpj

Simon, I think the conclusion was that you'd look into this. Please bounce it back to me if you'd like me to investigate further.

comment:7 Changed 3 years ago by igloo

See also #7450

comment:8 Changed 3 years ago by iustin

  • Cc iustin@… added

comment:9 Changed 3 years ago by PHO

  • Cc pho@… added

comment:10 Changed 3 years ago by simonpj@…

commit 52e43004f63276c1342933e40a673ad25cf2113a

Author: Simon Peyton Jones <[email protected]>
Date:   Fri Dec 21 17:39:33 2012 +0000

    Use expectP in deriving( Read )
    
    Note [Use expectP]   in TcGenDeriv
    ~~~~~~~~~~~~~~~~~~
    Note that we use
       expectP (Ident "T1")
    rather than
       Ident "T1" <- lexP
    The latter desugares to inline code for matching the Ident and the
    string, and this can be very voluminous. The former is much more
    compact.  Cf Trac #7258, although that also concerned non-linearity in
    the occurrence analyser, a separate issue.

 compiler/prelude/PrelNames.lhs    |    3 +-
 compiler/typecheck/TcGenDeriv.lhs |   43 ++++++++++++++++++++++--------------
 2 files changed, 28 insertions(+), 18 deletions(-)

comment:11 Changed 3 years ago by simonpj@…

commit 9ea2b6666cb6684279a120c688e8557bcef3dc73

Author: Simon Peyton Jones <[email protected]>
Date:   Mon Dec 24 11:34:51 2012 +0000

    Simplify the binder-swap transformation
    
    The occurrence analyser implements the "binder-swap" transformation,
    described in Note [Binder swap] in OccAnal. For some reason I had
    implemeted an extremely complicated version, I believe intended to get
    as much as possible done in single simplifier pass.  But it turned
    out (Trac #7258) that the 'getProxies' bit of this complicated code
    scaled rather non-linearly, and all by itself could consume half of
    the entire compile time.
    
    The patch dramatically simplifies the transformation, so that
    we simply swizzle
         case x of y { I# v -> e }
    to
         case x of y { I# v -> let x = y in e }
    
    I can't see any reason not to do this
    
      * Compiler allocation for #7258 with 200 fields goes down by 25%
        and compile time by 20%
    
      * The nofib figures do not budge
    
      * Quite a bit of complicated code goes away

 compiler/simplCore/OccurAnal.lhs |  255 +++++++-------------------------------
 1 files changed, 46 insertions(+), 209 deletions(-)

comment:12 Changed 3 years ago by simonpj

OK I have verified that the changes to the occurrence analyser (above) make essentially zero different to nofib numbers. It's a very worthwhile simplification, because it completely gets rid of the getProxies stuff that was eating all the time before.

Alas, compiling W2 is still non-linear. Here's the allocation by the stage-2 compiler

  • 50 fields: 1.01 Gbyte
  • 100 fields: 2.98 Gbyte
  • 200 fields: 9.64 Gbyte

This is after including the improvements to the derived Read code in #7450.

So something is still wrong. Need to do some profilling to find out.

There are some very deeply nested lambdas, which lead to SRTs of ever-increasing size, so there is definitely a quadratic effect there. But I'm not sure that is all.

comment:13 Changed 2 years ago by simonmar

I can't compile DynFlags on my 256MB Raspberry Pi. Could we do something about having less auto-generated code in this module please?

comment:14 Changed 2 years ago by dterei

  • Cc dterei added

comment:15 Changed 15 months ago by thoughtpolice

  • Milestone changed from 7.8.3 to 7.10.1

Moving to 7.10.1

comment:16 Changed 7 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:17 Changed 4 months ago by thomie

  • Type of failure changed from None/Unknown to Compile-time performance bug

comment:18 Changed 4 months ago by bgamari

  • Cc bgamari added

comment:19 Changed 4 months ago by asr

  • Cc asr added

comment:20 Changed 7 weeks ago by gidyn

  • Cc gidyn added

comment:21 Changed 4 weeks ago by bgamari

I strongly suspect that some of the fixes that I've made while working on #7450 will help this although I have yet to test this.

comment:22 Changed 3 weeks ago by bgamari

ticket:9669#comment:13 contains a summary of the performance impact of Phab:D1012 and Phab:D1040 on this and potentially-related ticket #7450.

Note: See TracTickets for help on using tickets.