Opened 16 months ago

Closed 16 months ago

Last modified 15 months ago

#13701 closed task (fixed)

GHCi 2x slower without -keep-tmp-files

Reported by: niteria
Owned by: duog
Priority: normal
Milestone: 8.4.1
Component: GHCi
Version: 8.3
Cc: dfeuer, duog
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug
Test Case: T13701
Differential Rev(s): Phab:D3620

Description

In D3562, I've observed that -keep-tmp-files makes :load 3x faster on my test case. I can't share my test case, but I've found a way to approximate it with MultiLayerModules I just added in D3575.

Here are the steps:

# in ghc top dir
$ mkdir tmp
$ cd tmp
$ cp ../testsuite/tests/perf/compiler/genMultiLayerModules .
# edit genMultiLayerModules to say DEPTH=0, WIDTH=5000
$ ./genMultiLayerModules 
$ echo ':load MultiLayerModules' | ../inplace/bin/ghc-stage2 --interactive +RTS -s
  11,132,224,952 bytes allocated in the heap
   1,004,238,408 bytes copied during GC
     185,091,216 bytes maximum residency (14 sample(s))
       2,813,504 bytes maximum slop
             365 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       706 colls,     0 par    0.907s   0.906s     0.0013s    0.0125s
  Gen  1        14 colls,     0 par    0.607s   0.606s     0.0433s    0.2244s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time   20.219s  ( 20.493s elapsed)
  GC      time    1.514s  (  1.513s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time   21.733s  ( 22.010s elapsed)

  Alloc rate    550,585,275 bytes per MUT second

  Productivity  93.0% of total user, 93.1% of total elapsed
$ echo ':load MultiLayerModules' | ../inplace/bin/ghc-stage2 --interactive -keep-tmp-files +RTS -s
   4,603,831,672 bytes allocated in the heap
     971,623,904 bytes copied during GC
     184,019,808 bytes maximum residency (14 sample(s))
       2,262,680 bytes maximum slop
             365 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       448 colls,     0 par    0.724s   0.723s     0.0016s    0.0321s
  Gen  1        14 colls,     0 par    0.621s   0.620s     0.0443s    0.2242s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    7.966s  (  8.202s elapsed)
  GC      time    1.345s  (  1.344s elapsed)
  EXIT    time    0.000s  (  0.004s elapsed)
  Total   time    9.312s  (  9.550s elapsed)

  Alloc rate    577,938,762 bytes per MUT second

  Productivity  85.5% of total user, 85.9% of total elapsed

So without -keep-tmp-files it's about 2.3x slower in total time (21.7s vs 9.3s) and allocates about 2.4x more (11.1 GB vs 4.6 GB).
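The ratios can be recomputed directly from the two `+RTS -s` summaries above. This is only a small arithmetic sketch; the variable names are made up for illustration, and the numbers are copied from the output shown.

```haskell
module Main where

import Text.Printf (printf)

-- Figures copied from the two +RTS -s summaries in this ticket.
totalSecondsDefault, totalSecondsKeepTmp :: Double
totalSecondsDefault = 21.733   -- without -keep-tmp-files
totalSecondsKeepTmp = 9.312    -- with -keep-tmp-files

bytesAllocatedDefault, bytesAllocatedKeepTmp :: Double
bytesAllocatedDefault = 11132224952
bytesAllocatedKeepTmp = 4603831672

-- How many times larger the first value is than the second.
ratio :: Double -> Double -> Double
ratio slow fast = slow / fast

main :: IO ()
main = do
  printf "total time ratio: %.2f\n" (ratio totalSecondsDefault totalSecondsKeepTmp)
  printf "allocation ratio: %.2f\n" (ratio bytesAllocatedDefault bytesAllocatedKeepTmp)
```

Both ratios come out between 2 and 2.5, which matches the rough summary given in the description.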

Profiling pointed to https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/main/SysTools.hs;8bf50d5026f92eb5a6768eb2ac38479802da1411$1074 where we're building dont_delete_set over and over.

Looks like this was improved in D3111 recently.
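To illustrate why rebuilding a set like dont_delete_set on every cleanup can dominate the run time with thousands of modules, here is a toy model. It is not GHC's SysTools code: the file names, the one-temp-file-per-module assumption, and the functions `cleanedRebuild` and `cleanedIncremental` are all hypothetical. It only contrasts rebuilding a keep-set from a growing list at every step with maintaining the set incrementally.

```haskell
module Main where

import Data.List (foldl')
import qualified Data.Set as Set

-- Hypothetical model: after compiling module i, the object files of modules
-- 1..i must be kept, and one fresh temp file for module i may be deleted.
objFile, tmpFile :: Int -> FilePath
objFile i = "M" ++ show i ++ ".o"
tmpFile i = "ghc_tmp_" ++ show i ++ ".s"

-- Slow shape: rebuild the keep-set from the whole list at every step.
-- Step i costs O(i log i), so n steps cost O(n^2 log n) overall.
cleanedRebuild :: Int -> [FilePath]
cleanedRebuild n = concatMap step [1 .. n]
  where
    step i =
      let keepSet = Set.fromList (map objFile [1 .. i])
      in filter (`Set.notMember` keepSet) [objFile i, tmpFile i]

-- Faster shape: thread one set through the loop and insert incrementally.
-- Step i costs O(log i), so n steps cost O(n log n) overall.
cleanedIncremental :: Int -> [FilePath]
cleanedIncremental n = reverse (snd (foldl' step (Set.empty, []) [1 .. n]))
  where
    step (keepSet, acc) i =
      let keepSet' = Set.insert (objFile i) keepSet
          dead = filter (`Set.notMember` keepSet') [objFile i, tmpFile i]
      in (keepSet', reverse dead ++ acc)

main :: IO ()
main = do
  let n = 2000
  -- Both strategies select the same files for deletion.
  print (cleanedRebuild n == cleanedIncremental n)
  print (take 3 (cleanedIncremental n))
```

Both functions return the same list of deletable files; only the amount of work and allocation differs, which is the kind of gap the `+RTS -s` numbers above show.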

Change History (8)

comment:1 Changed 16 months ago by niteria

Douglas Wilson (new contributor) was interested in looking at this. I've created this to coordinate.

comment:2 Changed 16 months ago by duog

Cc: duog added
Owner: set to duog

Thanks niteria, I'll look at this soon.

comment:3 Changed 16 months ago by duog

Differential Rev(s): D3620
Status: new → patch

comment:4 Changed 16 months ago by RyanGlScott

Differential Rev(s): D3620 → Phab:D3620

comment:5 Changed 16 months ago by Ben Gamari <ben@…>

In 3ee3822c/ghc:

Refactor temp files cleanup

Remove filesToNotIntermediateClean from DynFlags, create a data type
FilesToClean, and change filesToClean in DynFlags to be a FilesToClean.

Modify SysTools.newTempName and the Temporary constructor of
PipelineMonad.PipelineOutput to take a TempFileLifetime, which specifies
whether a temp file should live until the end of GhcMonad.withSession,
or until the next time cleanIntermediateTempFiles is called.

These changes allow the cleaning of intermediate files in GhcMake to be
much more efficient.

HscTypes.hptObjs is removed as it is no longer used.

A new performance test T13701 is added, which passes both with and
without -keep-tmp-files.  The test fails by 25% without the patch, and
passes when -keep-tmp-files is added.

Note that there are still at least two hotspots caused by
algorithms quadratic in the number of modules; however, neither of them
allocates. They are:

* DriverPipeline.compileOne'.needsLinker
* GhcMake.getModLoop

DriverPipeline.compileOne'.needsLinker is changed slightly to improve
the situation.

I don't like adding these types to DynFlags, but they need to be seen by
DynFlags, SysTools and PipelineMonad. The alternative seems to be to
create a new module.

Reviewers: austin, hvr, bgamari, dfeuer, niteria, simonmar, erikd

Reviewed By: simonmar

Subscribers: rwbarton, thomie

GHC Trac Issues: #13701

Differential Revision: https://phabricator.haskell.org/D3620
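The lifetime-based design described in the commit message above could look roughly like the following. This is a minimal sketch, not the code from D3620: only the type names FilesToClean and TempFileLifetime come from the commit message, while the constructors, field names, helper functions, and example file names are hypothetical.

```haskell
module Main where

import Data.Set (Set)
import qualified Data.Set as Set

-- Constructor names are hypothetical; the commit message only says a
-- lifetime distinguishes "until the end of the session" from "until the
-- next intermediate cleanup".
data TempFileLifetime
  = UntilNextClean    -- removed by the next intermediate cleanup
  | UntilSessionEnd   -- kept until the session finishes
  deriving (Eq, Show)

-- Field names are hypothetical. Files are stored per lifetime.
data FilesToClean = FilesToClean
  { sessionFiles      :: Set FilePath
  , intermediateFiles :: Set FilePath
  } deriving (Eq, Show)

emptyFilesToClean :: FilesToClean
emptyFilesToClean = FilesToClean Set.empty Set.empty

-- Register a temp file under the lifetime chosen when it is created.
addTempFile :: TempFileLifetime -> FilePath -> FilesToClean -> FilesToClean
addTempFile UntilNextClean f ftc =
  ftc { intermediateFiles = Set.insert f (intermediateFiles ftc) }
addTempFile UntilSessionEnd f ftc =
  ftc { sessionFiles = Set.insert f (sessionFiles ftc) }

-- Intermediate cleanup: hand back the short-lived files and forget them.
-- No "don't delete" set has to be computed, because long-lived files are
-- already stored separately.
takeIntermediate :: FilesToClean -> ([FilePath], FilesToClean)
takeIntermediate ftc =
  (Set.toList (intermediateFiles ftc), ftc { intermediateFiles = Set.empty })

main :: IO ()
main = do
  let ftc0 = addTempFile UntilSessionEnd "keep_me.o"
           . addTempFile UntilNextClean "scratch.s"
           $ emptyFilesToClean
      (dead, ftc1) = takeIntermediate ftc0
  print dead                              -- files selected for deletion now
  print (Set.toList (sessionFiles ftc1))  -- files that survive the cleanup
```

Tagging each file with its lifetime at creation time means the intermediate cleanup only touches the short-lived set, so its cost no longer depends on how many long-lived files have accumulated.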

comment:6 Changed 16 months ago by bgamari

Milestone: 8.4.1
Resolution: fixed
Status: patch → closed

comment:7 Changed 16 months ago by duog

Test Case: T13701

comment:8 Changed 15 months ago by nomeata

At least on perf.haskell.org, this test case is very flaky, and varies a lot between runs. So if you come here confused because your unrelated change looks like a regression, then that might just be the reason.
