Opened 8 months ago

Last modified 5 weeks ago

#13426 new bug

compile-time memory-usage regression for DynFlags between GHC 8.0 and GHC 8.2

Reported by: hvr Owned by: dfeuer
Priority: high Milestone: 8.4.1
Component: Compiler Version: 8.1
Keywords: Cc: niteria, asr
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #7258 Differential Rev(s): Phab:D3399, Phab:D3400, Phab:D3421
Wiki Page:

Description

I notice that buildbots were having a hard-time building GHC 8.2 snapshots... so I investigated...

One thing that jumped at me was the memory usage while compiling DynFlags which is the worst memory-offender with 4777MiB of resident size:

"inplace/bin/ghc-stage1" -hisuf hi -osuf  o -hcsuf hc -static  -H32m -O -Wall   -Iincludes -Iincludes/dist -Iincludes/dist-derivedconstants/header -Iincludes/dist-ghcconstants/header   -this-unit-id ghc-8.2.0.201
70313 -hide-all-packages -i -icompiler/backpack -icompiler/basicTypes -icompiler/cmm -icompiler/codeGen -icompiler/coreSyn -icompiler/deSugar -icompiler/ghci -icompiler/hsSyn -icompiler/iface -icompiler/llvmGen -
icompiler/main -icompiler/nativeGen -icompiler/parser -icompiler/prelude -icompiler/profiling -icompiler/rename -icompiler/simplCore -icompiler/simplStg -icompiler/specialise -icompiler/stgSyn -icompiler/stranal 
-icompiler/typecheck -icompiler/types -icompiler/utils -icompiler/vectorise -icompiler/stage2/build -Icompiler/stage2/build -icompiler/stage2/build/./autogen -Icompiler/stage2/build/./autogen -Icompiler/. -Icompi
ler/parser -Icompiler/utils -Icompiler/../rts/dist/build -Icompiler/stage2   -optP-DGHCI -optP-include -optPcompiler/stage2/build/./autogen/cabal_macros.h -package-id base-4.10.0.0 -package-id deepseq-1.4.3.0 -pa
ckage-id directory-1.3.0.2 -package-id process-1.4.3.0 -package-id bytestring-0.10.8.2 -package-id binary-0.8.4.1 -package-id time-1.8.0.1 -package-id containers-0.5.10.2 -package-id array-0.5.1.2 -package-id fil
epath-1.4.1.1 -package-id template-haskell-2.12.0.0 -package-id hpc-0.6.0.3 -package-id transformers-0.5.2.0 -package-id ghc-boot-8.2.0.20170313 -package-id ghc-boot-th-8.2.0.20170313 -package-id ghci-8.2.0.20170
313 -package-id hoopl-3.10.2.2 -package-id unix-2.7.2.1 -package-id terminfo-0.4.0.2 -Wall -fno-warn-name-shadowing -this-unit-id ghc -XHaskell2010 -optc-DTHREADED_RTS -DGHCI_TABLES_NEXT_TO_CODE -DSTAGE=2 -Rghc-t
iming -O2  -no-user-package-db -rtsopts      -Wnoncanonical-monad-instances  -odir compiler/stage2/build -hidir compiler/stage2/build -stubdir compiler/stage2/build   -dynamic-too -c compiler/main/DynFlags.hs -o 
compiler/stage2/build/DynFlags.o -dyno compiler/stage2/build/DynFlags.dyn_o
<<ghc: 63014533832 bytes, 4032 GCs, 395131448/1973969928 avg/max bytes residency (24 samples), 4777M in use, 0.000 INIT (0.000 elapsed), 32.207 MUT (33.221 elapsed), 26.682 GC (26.685 elapsed) :ghc>>

Then I did two clean full builds of the ghc-8.0 and ghc-8.2 branch, and collected the -Rghc-timings output into simple 3-columns .csv files;

Here's are the respective top-12 entries for GHC 8.0 and GHC 8.2 respectively:

$ sort -t, -k2n build.80.csv | tail -n12
"inplace/bin/ghc-stage1",560,compiler/stage2/build/TcRnDriver.o
"inplace/bin/ghc-stage1",572,compiler/stage2/build/TcRnDriver.p_o
"inplace/bin/ghc-stage1",572,compiler/stage2/build/TcSplice.o
"inplace/bin/ghc-stage1",847,compiler/stage2/build/OptCoercion.p_o
"/opt/ghc/bin/ghc",979,compiler/stage1/build/OptCoercion.o
"/opt/ghc/bin/ghc",1036,compiler/stage1/build/DynFlags.o
"/opt/ghc/bin/ghc",1040,compiler/stage1/build/X86/CodeGen.o
"inplace/bin/ghc-stage1",1119,compiler/stage2/build/DynFlags.p_o
"inplace/bin/ghc-stage1",1123,compiler/stage2/build/DynFlags.o
"inplace/bin/ghc-stage1",1178,compiler/stage2/build/OptCoercion.o
"inplace/bin/ghc-stage1",1233,compiler/stage2/build/X86/CodeGen.o
"inplace/bin/ghc-stage1",1282,compiler/stage2/build/X86/CodeGen.p_o
$ sort -t, -k2n build.82.csv | tail -n12
"inplace/bin/ghc-stage1",891,compiler/stage2/build/CmmParse.p_o
"/opt/ghc/bin/ghc",1053,compiler/stage1/build/DynFlags.o
"/opt/ghc/bin/ghc",1058,compiler/stage1/build/OptCoercion.o
"inplace/bin/ghc-stage1",1284,compiler/stage2/build/HsExpr.p_o
"inplace/bin/ghc-stage1",1313,compiler/stage2/build/HsExpr.o
"/opt/ghc/bin/ghc",1394,compiler/stage1/build/X86/CodeGen.o
"inplace/bin/ghc-stage1",1495,compiler/stage2/build/X86/CodeGen.o
"inplace/bin/ghc-stage1",1605,compiler/stage2/build/X86/CodeGen.p_o
"inplace/bin/ghc-stage1",1620,compiler/stage2/build/OptCoercion.o
"inplace/bin/ghc-stage1",1733,compiler/stage2/build/OptCoercion.p_o
"inplace/bin/ghc-stage1",3989,compiler/stage2/build/DynFlags.p_o
"inplace/bin/ghc-stage1",4777,compiler/stage2/build/DynFlags.o

What's apparent from this data:

  • GHC stage0 (which was GHC 8.0.2) is able to build DynFlags.o for GHC 8.0 and GHC 8.2 w/ roughtly the same amount of max resident memory, i.e. 1 GiB
  • GHC 8.2 stage1 however needs 4 times the amount of memory more than GHC 8.0 to build the same DynFlags.hs module**

Attachments (8)

ghc82-stage0-verbose-output.log (9.5 KB) - added by hvr 8 months ago.
Compiling GHC 8.2's stage1 DynFlags.hs w/ stage0 (GHC 8.0.2)
ghc82-stage1-verbose-output.log (14.3 KB) - added by hvr 8 months ago.
Compiling GHC 8.2's stage2 DynFlags.hs w/ stage1 (GHC 8.2)
ghc82-stage1-verbose-output.revert.log (14.5 KB) - added by hvr 8 months ago.
Compiling GHC 8.2's stage2 DynFlags.hs w/ stage1 (GHC 8.2) & 2effe18ab reverted
ghc82-stage1-verbose-output.O1.log (13.2 KB) - added by hvr 8 months ago.
Compiling GHC 8.2's stage2 DynFlags.hs w/ stage1 (GHC 8.2) (at -O1 instead of -O2; 2effe is NOT reverted here)
ghc-8.0.svg (122.3 KB) - added by bgamari 8 months ago.
Building DynFlags with GHC 8.0
ghc-8.2.svg (131.8 KB) - added by bgamari 8 months ago.
Building DynFlags with GHC 8.2
ghc-8.0.log (9.6 KB) - added by bgamari 8 months ago.
Log output from GHC 8.0
ghc-8.2.log (13.2 KB) - added by bgamari 8 months ago.
Log output from GHC 8.2

Download all attachments as: .zip

Change History (56)

Changed 8 months ago by hvr

Compiling GHC 8.2's stage1 DynFlags.hs w/ stage0 (GHC 8.0.2)

Changed 8 months ago by hvr

Compiling GHC 8.2's stage2 DynFlags.hs w/ stage1 (GHC 8.2)

comment:1 Changed 8 months ago by simonpj

That is bad! Let's find out why. Something that egregious MUST have low-handing fruit.

comment:2 Changed 8 months ago by rwbarton

I have similar numbers from after the join points commit (Feb 1) and they do not show this large regression. So my educated guess is that the new Typeable stuff is responsible.

comment:3 Changed 8 months ago by hvr

Ben suggested that reverting 2effe18ab51d66474724d38b20e49cc1b8738f60 could have an interesting effect... however it appears to have things worse :-)

<<ghc: 60080847424 bytes, 4089 GCs, 436124979/1969454128 avg/max bytes residency (24 samples), 5367M in use, 0.000 INIT (0.000 elapsed), 33.164 MUT (34.275 elapsed), 28.562 GC (28.606 elapsed) :ghc>>

I'll attach --verbose output as well

Changed 8 months ago by hvr

Compiling GHC 8.2's stage2 DynFlags.hs w/ stage1 (GHC 8.2) & 2effe18ab reverted

Changed 8 months ago by hvr

Compiling GHC 8.2's stage2 DynFlags.hs w/ stage1 (GHC 8.2) (at -O1 instead of -O2; 2effe is NOT reverted here)

comment:4 Changed 8 months ago by niteria

Cc: niteria added

I've found myself unable to build GHC inside vagrant running on VirtualBox with 7GB of memory. It always gets killed by OOM killer when compiling DynFlags.

Changed 8 months ago by bgamari

Attachment: ghc-8.0.svg added

Building DynFlags with GHC 8.0

Changed 8 months ago by bgamari

Attachment: ghc-8.2.svg added

Building DynFlags with GHC 8.2

comment:5 Changed 8 months ago by bgamari

Indeed something catastrophic is happening here; this can be plainly seen in the +RTS -h profiles I attached above. These were produced by building two GHC trees, one bootstrapping with 8.2 and another with 8.0.2, taking the DynFlags command line from the stage1 build, and adding -fforce-recomp -v +RTS -h >log 2>&1 to it. Clearly we have a significant amount of leakage. However, otherwise the Core program sizes are comparable on average (GHC 8.2 gets bigger earlier than 8.0, but reduces to be a bit smaller by the end of simplification).

Last edited 8 months ago by bgamari (previous) (diff)

Changed 8 months ago by bgamari

Attachment: ghc-8.0.log added

Log output from GHC 8.0

Changed 8 months ago by bgamari

Attachment: ghc-8.2.log added

Log output from GHC 8.2

comment:6 Changed 8 months ago by rwbarton

I bisected and found out that the offending commit was none other than 8d5cf8bf584fd4849917c29d82dcf46ee75dd035 "Join points". I didn't see this in my previous data because, ironically, -dcore-lint causes the large space usage to go away. So it seems likely to be a garden-variety thunk buildup in some field which is examined by core lint, but otherwise not examined until late in the compilation pipeline.

comment:7 Changed 8 months ago by simonpj

Great work. There is definitely a space leak here and it should not be to hard to find. Some ideas

  • In CoreLint.lintSingleBinding you'll find a series of tests. If you comment then out one by one, you'll find when the space leak comes back. That test must be forcing the offending bit of IdInfo.
  • Does the leak happen when you use -dshow-passes? That shows the size of eac intermediate, and exprSize is deliberately strict (see CoreStats.bndrSize).

You could try making CoreSeq.megaSeqidInfo actually force the unfolding info (currently commented out) and see if that helps. (I'm not sure WHY it is commented out, incidentally.)

comment:8 in reply to:  7 Changed 8 months ago by rwbarton

Replying to simonpj:

Great work. There is definitely a space leak here and it should not be to hard to find. Some ideas

  • In CoreLint.lintSingleBinding you'll find a series of tests. If you comment then out one by one, you'll find when the space leak comes back. That test must be forcing the offending bit of IdInfo.

I commented out lintIdUnfolding and then the leak came back with -dcore-lint enabled.

  • Does the leak happen when you use -dshow-passes? That shows the size of eac intermediate,

and exprSize is deliberately strict (see CoreStats.bndrSize).

I've been using -v, which I think shows the same information, and it doesn't eliminate the leak.

You could try making CoreSeq.megaSeqidInfo actually force the unfolding info (currently commented out) and see if that helps. (I'm not sure WHY it is commented out, incidentally.)

Instead I defined a new version of seqUnfolding, based on lintIdUnfolding, and added it to megaSeqIdInfo:

miniSeqUnfolding :: Unfolding -> ()
miniSeqUnfolding (CoreUnfolding { uf_tmpl = e, uf_src = src })
  | isStableSource src
  = seqExpr e
miniSeqUnfolding (DFunUnfolding { df_con = con, df_bndrs = bndrs
                                            , df_args = args })
  = seqExprs args
miniSeqUnfolding _ = ()

This was enough to make the leak go away, even without using -dcore-lint. Both the CoreUnfolding and DFunUnfolding cases are needed to make the leak go away; or perhaps there are two leaks. (Without the isStableSource guard, the runtime exploded.)

So there are thunks in the expressions inside unfoldings which, I believe, point to old versions of the Core program. I'm going to continue invesigating but I'm out of time for today.

comment:9 Changed 8 months ago by rwbarton

I now suspect this leak doesn't really have anything to do with unfoldings in particular. It just shows up that way as a result of the fact that the evaluation of coreBindsSize in the simplifier loop forces most parts of the Core program, but does not force unfoldings (as mentioned in comment:7). It seems that the simplifier repeatedly traverses these stable Core and DFun unfoldings but there is some part of the new unfolding that the next simplifier iteration does not look at that retains a reference to the old version of the program.

I also suspect that the evaluation of coreBindsSize in the simplifier loop should be unnecessary if there weren't space leaks in the simplifier, and if we got rid of it we would see this space leak not just in unfoldings but everywhere, and possibly in more programs, which might make it easier to track down.

I spent a lot of time using some RTS functions to dump heap representations and in particular to look for thunks and (using DWARF information) find where they were coming from. But it turned out not to be a good approach because of how the simplifier is structured. Basically, the output of each simplifier iteration contains a large number of thunks, of which most (but apparently not all) will be forced by the next simplifier iteration, and replaced by a large number of new thunks. I don't know how to find only the thunks that aren't getting forced by this process.

comment:10 Changed 8 months ago by simonpj

Very useful to know that the stable unfoldings alone are enough.

I have an idea. Consider Simplify line 1005

simplExprF1 env (Type ty)      cont = ASSERT( contIsRhsOrArg cont )
                                      rebuild env (Type (substTy env ty)) cont

Yikes! That (substTy env ty) is a thunk that the simplifier may not force; and that'll hold onto env which is disater.

Replace the RHS by

  = do { ty' <- simplType env ty
       ; retbuild env (Type ty') cont }

You'll see that simplType is careful to seq on substituted type; and TyCoRep.substTy has the property (I think, though it is sadly not documented) that seqing on the result is enough to push the substitution right through the type (see the $! calls in subst_ty.

It is very unsatisfactory that this kind of leak is SO hard to find.

But finding it will help ALL programs!

comment:11 in reply to:  10 Changed 8 months ago by rwbarton

Replying to simonpj:

I have an idea. Consider Simplify line 1005

simplExprF1 env (Type ty)      cont = ASSERT( contIsRhsOrArg cont )
                                      rebuild env (Type (substTy env ty)) cont

This wasn't enough unfortunately.

Some more RTS/profiling hackery led me to... the next case of the same function:

simplExprF1 env (App fun arg) cont
  = simplExprF env fun $
    case arg of
      Type ty -> ApplyToTy  { sc_arg_ty  = substTy env ty
                            , sc_hole_ty = substTy env (exprType fun)
                            , sc_cont    = cont }
      _       -> ApplyToVal { sc_arg = arg, sc_env = env
                            , sc_dup = NoDup, sc_cont = cont }

Here sc_arg_ty and sc_hole_ty apparently need a similar treatment. And with that the space leak seems to be gone! I'll confirm with a clean rebuild.

What I still don't understand is why this wasn't always a problem, since those lines have not changed in a long time. I guess there is some new code that manipulates ApplyToTy, so maybe the old version of that code did enough evaluation to avoid the space leak.

comment:12 Changed 8 months ago by rwbarton

Owner: set to rwbarton

comment:13 Changed 8 months ago by simonpj

Well done! Yes, use simplType! Can you add a Note to simplType explaining why it is so important to use it rather than naked substTy calls; and pointing to this ticket?

I suspect that it has been a problem for a long time; just that no one has noticed.

It's very uncomfortable that this is so delicate; and even when recognised is so hard to find.

comment:14 Changed 8 months ago by simonpj

PS: you may then be able to get rid of the call to coreBindsSize in SimplCore. If so, that would be good: faster compilation! Worth trying... it might I suppose show up another leak currently patched by coreBindsSize.

comment:15 Changed 8 months ago by rwbarton

There was actually a third required change which I had guessed at earlier, namely changing

simplExprF1 env (Case scrut bndr _ alts) cont
  = simplExprF env scrut (Select { sc_dup = NoDup, sc_bndr = bndr

to

simplExprF1 env (Case scrut bndr ty alts) cont
  = seqType ty `seq`
    simplExprF env scrut (Select { sc_dup = NoDup, sc_bndr = bndr

Alternatively, one can insert a seqType into reallyRebuildCase:

@@ -2222,7 +2223,7 @@ reallyRebuildCase env scrut case_bndr alts cont
 
         ; dflags <- getDynFlags
         ; let alts_ty' = contResultType dup_cont
-        ; case_expr <- seqType alts_ty' `seq` mkCase dflags scrut' case_bndr' alts_ty' alts'
+        ; case_expr <- mkCase dflags scrut' case_bndr' alts_ty' alts'

I don't understand specifically why these changes make any difference, especially the first one, where we were throwing away the ty field anyways. It must share structure with some other part of the program, but I'm not sure what.

Replying to simonpj:

PS: you may then be able to get rid of the call to coreBindsSize in SimplCore. If so, that would be good: faster compilation! Worth trying... it might I suppose show up another leak currently patched by coreBindsSize.

Yes, I'll try that next as I have a good setup for finding leaks now.

comment:16 Changed 8 months ago by simonpj

Wow, I have no idea how you found that! Well done.

Alternatively, one can insert a seqType into reallyRebuildCase:

That would be much better. That kills the leak when we build a Case, whereas the former alternative waits until the subsequent run of the simplifier before squashing it.

Please add a Note to explain the seq (point to this ticket too).

Does it need to be a seqType or will seq do? I expect that seq would be enough, since the OutType returned by contResultType is (I think) a result of calling simplType.

It must share structure with some other part of the program, but I'm not sure what.

My guess: exprType often returns that type. Still I'm surprised that it leads to a cumulative space leak (i.e. every growing). Does it?

comment:17 in reply to:  14 Changed 8 months ago by rwbarton

Differential Rev(s): Phab:D3399, Phab:D3400

Replying to simonpj:

PS: you may then be able to get rid of the call to coreBindsSize in SimplCore. If so, that would be good: faster compilation! Worth trying... it might I suppose show up another leak currently patched by coreBindsSize.

This line of investigation already paid some dividends. It turns out that that coreBindsSize is what takes care of forcing the demand and strictness info produced by the demand analysis pass. Otherwise, the demand analysis pass leaks basically a copy of the whole Core program.

But late demand analysis runs after all simplifier passes, and there is no more coreBindsSize or seqBinds after that. Presumably code generation uses some of the information added by this pass, but it doesn't force all of it; and the result is that we hang on to two copies of the Core program during code generation. And code generation is the phase that uses the most memory anyways, so this makes a big difference in peak memory usage.

As a crude measure I added a seqBinds to demand analysis (Phab:D3400) and it reduced peak memory usage on DynFlags by around 30% and even made the running time a few percent faster (I think because of reduced GC work).

comment:18 in reply to:  14 Changed 8 months ago by rwbarton

Replying to simonpj:

PS: you may then be able to get rid of the call to coreBindsSize in SimplCore. If so, that would be good: faster compilation! Worth trying... it might I suppose show up another leak currently patched by coreBindsSize.

The input core program size is used to bound the number of simplifier ticks so we can't remove the call to coreBindsSize entirely without affecting the behavior of the simplifier. I'm not sure whether we could make do with a more crude estimate of program size.

However, coreBindsSize currently goes out of its way to force a lot of structure (like seqBinds does) that is not needed to compute the Core program size. So an easy intermediate option is to just remove all the extra seqs from coreBindsSize. This still leaves a lot less work than coreBindsSize does currently and forcing the main program body may be useful.

I almost have this working but in order to fix another space leak I had to make another similar change of

simplExpr env expr = simplExprC env expr (mkBoringStop expr_out_ty)
  where
    expr_out_ty :: OutType
    expr_out_ty = substTy env (exprType expr)

to

simplExpr env expr
  = do { expr_out_ty <- simplType env (exprType expr)
       ; simplExprC env expr (mkBoringStop expr_out_ty) }

However, this also produces a lot of warnings like

exprType TYPE: ModuleName

which seem to be from calling simplExpr on a Type t. Is it a bug that that happens in the first place? What is the best way to handle this? I get another smal improvement in space and time from this change so I'm hoping this is easy to finish off. Please advise!

comment:19 Changed 8 months ago by simonpj

About forcing the type before passing to mkBoringStop. Often the type passed into mkBoringStop is discarded unused, so I'd prefer not to force it when building a SimplCont. I'd prefer to force it when getting it out of the SimplCont.

Notably, contResultType does that. So, possible alternative plan: when extracting the contResultType can we just force it then? E.g.

        ; let alts_ty' = contResultType dup_cont
        ; case_expr <- mkCase dflags scrut' case_bndr' alts_ty' alts'

Aha -- you have already added a seq there! Here's another (line 1622 of Simplify)

missingAlt env case_bndr _ cont
  = WARN( True, text "missingAlt" <+> ppr case_bndr )
    return (env, mkImpossibleExpr (contResultType cont))

Here want to force that type. Ditto here (line 1656)

  | not (contIsTrivial cont)     -- Only do this if there is a non-trivial
  = return (env, castBottomExpr res cont_ty)  -- continuation to discard, else we do it
  where                                       -- again and again!
    res     = argInfoExpr fun rev_args
    cont_ty = contResultType cont

But not here (line 1591)

    trim_cont 0 cont
      = mkBoringStop (contResultType cont)
}}
because we are just building another `SimplCont`.

comment:20 Changed 8 months ago by Simon Peyton Jones <simonpj@…>

In 2964527/ghc:

Refactor simplExpr (Type ty)

This small refactoring, provoked by comment:18 on Trac #13426,
makes it so that simplExprF never gets a (Type ty) expression to
simplify, which in turn means that calls to exprType on its argument
will always succeed.

No change in behaviour.

comment:21 Changed 8 months ago by simonpj

Please advise!

I've pushed a fix (to HEAD) that should resolve this problem.

comment:22 Changed 8 months ago by Ben Gamari <ben@…>

In e13419c5/ghc:

Fix space leaks in simplifier (#13426)

The Join points commit (8d5cf8bf) introduced a space leak
somewhere in the simplifier. The extra strictness added in this commit
fixes the leak. Unfortunately I don't really understand the details.

Unfortunately, the extra strictness appears to result in more overall
allocations in some cases, even while the peak heap size decreases in others.

Test Plan: harbormaster

Reviewers: austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D3399

comment:23 Changed 8 months ago by bgamari

Resolution: fixed
Status: newclosed

comment:24 Changed 8 months ago by simonpj

Owner: rwbarton deleted
Resolution: fixed
Status: closednew

Ben, I'm puzzled here. This commit seems to implement comment:15 (in rebuildCase) but not comment:11 and its antecedents. How does your commit relate to Reid's work on this ticket?

And did this one change cure the leak? Reid thinks not?

I'd like to work through comment:19 too.

Moreover, presumably the commit you did make (in rebuildCase) presumably does help something substantially, or you would not have done it (esp given its equivocal effect on other perf tests). But neither the commit message, nor (more importantly) the comment in the code, says what! DynFlags perhaps? Please say, so that someone wondering if they can now take it out knows what they have to test against.

I'm re-opening pending resolving some of these mysteries.

comment:25 Changed 8 months ago by bgamari

I'll have to defer to Reid on this. I merely merged the patch he provided in response to my request for a fix.

comment:26 Changed 8 months ago by rwbarton

Differential Rev(s): Phab:D3399, Phab:D3400Phab:D3399, Phab:D3400, Phab:D3421

Yes, I'm not sure what happened here but one of the parts of the patch seems to have been lost in a rebase. Thanks for catching this Simon!

Moreover, presumably the commit you did make (in rebuildCase) presumably does help something substantially, or you would not have done it (esp given its equivocal effect on other perf tests). But neither the commit message, nor (more importantly) the comment in the code, says what! DynFlags perhaps? Please say, so that someone wondering if they can now take it out knows what they have to test against.

Yes, good point. Together with the missing part of the patch, it reduces the space usage while compiling DynFlags by at least a factor of 2. Unfortunately I was unable to construct a simpler test case that shows a similarly dramatic improvement, or diagnose exactly what is so special about DynFlags... maybe I'll try again to come up with a perf test case, even if the effect is not as dramatic as for DynFlags.

comment:27 Changed 8 months ago by rwbarton

Regarding comment:19: this is only needed for the subsequent change to make coreBindsSize less strict without introducing new space leaks. It's not needed to get the space usage under control in the first place, for which Phab:D3421 should be sufficient.

That said I think it is still worth pursuing that line, I'm just not sure about the timing for 8.2.1.

comment:28 Changed 8 months ago by rwbarton

Now I see that Phab:D3421 is also about forcing types that go into a SimplCont, so is the situation similar to that of comment:19 where it would be better to force the types when we take them out of the SimplCont and put them into the actual program?

comment:29 Changed 8 months ago by simonpj

would be better to force the types when we take them out of the SimplCont and put them into the actual program?

Yes for the type in the Stop continuation; but I think no for the types in ApplyToTy continuation, which will certainly be used. I'd force the latter on the way in; but the former only on the way out in contResultType. Does that make sense?

comment:30 Changed 8 months ago by rwbarton

After writing comment:28 I got the validate results from Phab:D3421 which show a ~75% increase in allocations when compiling T5631 which involves many very large types; the only change in that commit being to the App fun arg case of simplExprF1.

I see that the sc_arg_ty is (almost) always used because it occurs as the type a function is applied to in the output program. But sc_hole_ty is only ever used via contHoleType and put in a mkBoringStop. There is even a Note [The hole type in ApplyToTy] which suggests that evaluating sc_hole_ty eagerly would be bad.

I tried going back to using substTy for the sc_hole_ty field, while using simplType for the sc_arg_ty field. This still avoids the space leak when compiling DynFlags and also avoids the regression in T5631. So it seems to be the right thing to do.

comment:31 Changed 8 months ago by simonpj

But sc_hole_ty is only ever used via contHoleType and put in a mkBoringStop

Ah, very good! Yes yes.

Let's make sure we document all this carefully when we are done :-).

Simon

comment:32 Changed 8 months ago by Ben Gamari <ben@…>

In 59c925e8/ghc:

More changes to fix a space leak in the simplifier (#13426)

Part of e13419c55 was accidentally lost during a rebase. This commit
adds the missing change, along with some more improvements
regarding where we do and don't use `seqType`.

Also include a comment about where the space leak showed up
and a Note explaining the strategy being used here.

Test Plan: harbormaster, plus local testing on DynFlags

Reviewers: austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D3421

comment:33 Changed 8 months ago by bgamari

Resolution: fixed
Status: newclosed

comment:34 Changed 8 months ago by simonpj

Resolution: fixed
Status: closednew

I'm re-opening because we have not closed the discussion I opened in comment:19. We discussed it on Tuesday and Reid was going to try forcing the type on the way out of SimplCont rather than on the way in.

OK, Reid?

comment:35 Changed 8 months ago by rwbarton

I think that is what I've done in the final version of Phab:D3421.

comment:36 Changed 8 months ago by simonpj

Resolution: fixed
Status: newclosed

Ah yes, I wasn't looking hard enough. Thanks.

comment:37 Changed 7 months ago by simonpj

Resolution: fixed
Status: closednew

Reopening again, because we still have the Big Hamme

      | let sz = coreBindsSize binds
      , () <- sz `seq` ()     -- Force it

in SimplCore. Is this still necessary? Could we just remove it altogether?

comment:38 Changed 7 months ago by asr

Cc: asr added

comment:39 Changed 6 months ago by dfeuer

I made the big hammer smaller (see Phab:D3606). Things don't go utterly disastrous, but there are still a few modules over a gigabyte (max somewhere around 1700M). I'm not sure exactly which ones; is there a way to find out without doing a single-threaded build? I'll try to do some profiling and see if I can figure out what's going wrong, exactly.

comment:40 Changed 6 months ago by bgamari

I really wish that the build system could record RTS statistics on a per-file basis (e.g. perhaps just emitting a log file per GHC process). That would make collecting this sort of thing much easier.

comment:41 in reply to:  40 Changed 6 months ago by hvr

Replying to bgamari:

I really wish that the build system could record RTS statistics on a per-file basis (e.g. perhaps just emitting a log file per GHC process). That would make collecting this sort of thing much easier.

We already emit some extra output (mostly for debugging purposes) to -dumpdir dir, so there'd be precedent to use output files for logging data, rather than dump it to stdout only. So one way would be to extend -Rghc-timing to optionally take a filename, e.g. -Rghc-timing=rts-stats.log; similarly for other statistics David pointed out (e.g. optimiser stats).

comment:42 Changed 5 months ago by Ben Gamari <ben@…>

In 2088d0be/ghc:

Stop forcing everything in coreBindsSize

`coreBindsSize` forced a ton of structure to stop space leaks.
Reid Barton has done some work recently to try to stop the leaks
at their source instead. Memory residency remains well below the
numbers Herbert posted on #13426 originally, but in some cases
a ways above the ones from 8.0. I need to figure out how to get
the numbers matched up to individual modules and do some
profiling.

Relates to #13426

Reviewers: austin, bgamari

Reviewed By: bgamari

Subscribers: rwbarton, thomie

Differential Revision: https://phabricator.haskell.org/D3606

comment:43 Changed 5 months ago by simonpj

Hang on

in some cases a ways above the ones from 8.0.

But I see no changes in the tessuite in this patch. So what regressed?

Or, were all the regressions relative to 8.0, and had already been accepted for 8.2?

Which cases in particular regressed wrt 8.0? "Some cases" isn't very explicit!

Did removing the coreBindsSize improve compiler speed (slightly) becuase of doing less work?

comment:44 Changed 5 months ago by bgamari

dfeuer will need to comment on what specifically "some cases" refers to, but the patch in comment:42 removes some seqs which don't appear to be necessary to prevent thunk leaks. It doesn't completely remove the coreBindsSize. I would expect that this did have a positive impact on compiler speed, but in practice this effect appears to be too small to measure.

David, perhaps you could have a look into eliminating the coreBindsSize entirely, replacing the current simplifier ticks check with a high absolute upper bound on ticks as we discussed a few weeks ago?

Last edited 5 months ago by bgamari (previous) (diff)

comment:45 Changed 5 months ago by bgamari

Milestone: 8.2.18.2.2
Owner: set to dfeuer

dfeuer is going to look into eliminating the need for coreBindsSize on every simplifier pass.

comment:46 Changed 5 months ago by bgamari

Milestone: 8.2.28.4.1

comment:47 Changed 5 months ago by dfeuer

If I remove the size calculation altogether, allowing unlimited simplifier ticks, then we get performance regressions. This suggests that some of the leaks are still with us, although apparently not nearly as severe as they once were. I'm working on making the size calculation a bit more efficient, but we're going to leave the rest alone for now.

comment:48 Changed 5 weeks ago by dfeuer

Note: See TracTickets for help on using tickets.