Opened 2 years ago

Last modified 4 days ago

#9370 new bug

unfolding info as seen when building a module depends on flags in a previously-compiled module

Reported by: carter Owned by: richardfung
Priority: high Milestone: 8.2.1
Component: Compiler Version: 7.8.3
Keywords: newcomer, Inlining Cc: ekmett
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #8635 Differential Rev(s):
Wiki Page:

Description

I've observed a blowup in ghc memory usage when invoked with parallel build flags.

to reproduce

cabal get xmlhtml-0.2.3.2
cabal install xmlhtml-0.2.3.2 --only-dependencies
cd xmlhtml-0.2.3.2 

then

 cabal clean ; cabal configure --ghc-options="-j4" ; time cabal build -j4 -v3

will take quite a while and use > 1gb of ram

whereas

 cabal clean ; cabal configure --ghc-options="-j1 ; time cabal build -j1  -v3

will use < 150mb of ram.

Based upon the output of cabal build -j4 -v3, it looks like the parallel build is spending a lottt more time in the simplifier passes. I'm not sure what changes between the parallel and sequential builds to change this. I'll try to dig into this more in a few days, but recording this problem now before i forget. (though of course any insights would be appreciated)

Change History (34)

comment:1 Changed 2 years ago by ekmett

  • Cc ekmett added

comment:2 Changed 2 years ago by hvr

  • Keywords parmake added
  • Milestone set to 7.10.1
  • Type of failure changed from None/Unknown to Compile-time performance bug

comment:3 Changed 2 years ago by simonpj

Adding -dshow-passes will show you the size of the intermediates of each pass.

Simon

comment:4 Changed 2 years ago by rwbarton

Indeed something seems to be going wrong in the simplifier when invoking ghc with -j4.

*** Checking old interface for xmlhtml-0.2.3.2:Text.XmlHtml.HTML.Meta:
[ 1 of 10] Compiling Text.XmlHtml.HTML.Meta ( src/Text/XmlHtml/HTML/Meta.hs, dist/build/Text/XmlHtml/HTML/Meta.o )
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 26,260, types: 20,021, coercions: 0}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 1,446,658, types: 953,432, coercions: 314,352}
...

With -ddump-inlinings I see this repeated over and over, which is not present with -j1:

Inlining done: Data.String.fromString
Inlining done: Data.Text.$fIsStringText
Inlining done: Data.Text.pack
Inlining done: Data.Text.Internal.Fusion.unstream
Inlining done: Data.Text.Internal.Fusion.Common.map
Inlining done: Data.Text.Internal.Fusion.Common.streamList
Inlining done: Data.Text.Internal.safe
Inlining done: Data.Bits.$fBitsInt_$c.&.
Inlining done: Data.Text.Internal.Fusion.Types.$WYield
[ repeats ~4000 times ]

According to -ddump-inlinings -dverbose-core2core, GHC never even considered inlining fromString with -j1.

comment:5 Changed 2 years ago by simonpj

That's bizarre. I can't think why the the simplifier would do anything different with -j1 vs -j4!

I'm hard pressed even to see where to start. Is it possible to determine when the log files start to differ? They should be identical except for the uniques (which you can suppress with -dsuppress-uniques).

Simon

comment:6 Changed 2 years ago by rwbarton

  • Summary changed from large blowup in memory usage and time when doing parallel build of xmlhtml package to OPTIONS_GHC (or its absence) leaks between modules compiled in parallel

Oh, the cause is hiding in plain sight, on the first line of Text.XmlHtml.HTML.Meta:

{-# OPTIONS_GHC -O0 -fno-case-merge -fno-strictness -fno-cse #-}

If I add the same line to the start of the nine other modules in the package, then Text.XmlHtml.HTML.Meta compiles as quickly with -j4 as with -j1. So somehow flags set by OPTIONS_GHC pragmas (or in this case, their absence) are leaking between different modules being built in parallel.

These are all dynamic flags at least, right?

comment:7 Changed 2 years ago by simonpj

Brilliant work!

I have no idea about how the leakage is taking place... that will need someone who knows how the -j stuff works in GHC.

But I am puzzled by the OPTIONS_GHC flags in Text.XmlHtml.HTML.Meta. That pretty much switches off all optimisation in GHC, and you should need a jolly good reason do to that. Indeed, the necessity of doing so signals either a bug in GHC, or else some bad rewrite rules that go into a loop. It would be good to know which.

Is the OPTIONS_GHC stuff accompanied with a prominent comment to explain such a bizarre options line?

If not, would someone care to investigate? (All this is separate from the flag leakage question, of course.)

Simon

comment:8 Changed 2 years ago by hvr

  • Priority changed from normal to highest

This sounds kinda serious

comment:9 Changed 2 years ago by rwbarton

  • Keywords parmake removed
  • Priority changed from highest to high
  • Summary changed from OPTIONS_GHC (or its absence) leaks between modules compiled in parallel to unfolding info as seen when building a module depends on flags in a previously-compiled module

OK I take it all back. The parallel build is not really responsible at all. You get the same blowup when building Text.XmlHtml.HTML.Meta with -j1 if you just add an import Text.XmlHtml.Common so that Text.XmlHtml.Common is built before Text.XmlHtml.HTML.Meta.

As far as I can tell, the reason is that when we read an interface file from an external package, we may or may not attach unfolding info to its Ids according to the dynamic flags that are in effect for the module we are building. But when we later build another module, we might reuse that external package information even though different dynamic flags are now in effect. I haven't tried to track down the code that loads interface files, but does this sound plausible?

comment:10 Changed 2 years ago by simonpj

Hmm. Is this with ghc --make? Then yes, that sounds plausible.

  • When compiling a module M with -O, and then reading an interface file to support that compilation, GHC makes the unfolding of Ids in that interface file.
  • With ghc --make when GHC goes on to compile a new module L, it doesn't re-read interface files it has already read (that is part of why --make is faster). So the already-read Ids still have those unfoldings in them.
  • As a result, if L is compiled with -O0, it will still see the unfoldings.

I can see that is perplexing. If it's important, the solution would be to disable inlining for imported functions when -O0 is on. (Actually, more precisely, it's controlled by -fignore-interface-pragmas.) I don't think it would be terribly hard to do that, although it would be an extra test on every inlining for an imported function.

I still wish someone could explain why it's so crucial for this module to be compiled with -O0. That must be a bug, either in the RULES for some package, or in GHC.

Simon

comment:11 follow-up: Changed 2 years ago by rwbarton

I would guess it's to avoid the very code size explosion (inlining 4000 copies of Text's fromString) that Carter encountered that originated this ticket. But that doesn't explain the -fno-case-merge -fno-strictness -fno-cse part. Carter has asked about this on the xmlhtml bug tracker.

The same scenario could apply in reverse, right? In the actual xmlhtml package, Text.XmlHtml.HTML.Meta is the first module built and it is built with -O0 so any interface files that are read while compiling that module will not have unfoldings attached. Then those modules will not have unfoldings during the compilation of any subsequent module either, even though those are built with -O.

comment:12 in reply to: ↑ 11 ; follow-ups: Changed 2 years ago by simonpj

Replying to rwbarton:

I would guess it's to avoid the very code size explosion (inlining 4000 copies of Text's fromString)

I would guess so too. But WHY is 4000 copies of fromString getting inlined. I doubt GHC is doing that unaided. I bet it's an INLINE pragma or RULE in Text. And if so, it's a landmine waiting to kill new victims.

The same scenario could apply in reverse, right? In the actual xmlhtml package, Text.XmlHtml.HTML.Meta is the first module built and it is built with -O0 so any interface files that are read while compiling that module will not have unfoldings attached. Then those modules will not have unfoldings during the compilation of any subsequent module either, even though those are built with -O.

No, that part at least is not so. The interface file built for Text.XmlHtml.HTML.Meta will not have unfoldings in it, but that only affects functions actually defined in Text.XmlHtml.HTML.Meta. Later modules, built with -O will certainly get interface files with unfoldings in them.

Simon

comment:13 in reply to: ↑ 12 Changed 2 years ago by rwbarton

Replying to simonpj:

Replying to rwbarton:

I would guess it's to avoid the very code size explosion (inlining 4000 copies of Text's fromString)

I would guess so too. But WHY is 4000 copies of fromString getting inlined. I doubt GHC is doing that unaided. I bet it's an INLINE pragma or RULE in Text. And if so, it's a landmine waiting to kill new victims.

I created #9400 for this issue; let's continue there.

comment:14 in reply to: ↑ 12 Changed 2 years ago by rwbarton

Replying to simonpj:

The same scenario could apply in reverse, right? In the actual xmlhtml package, Text.XmlHtml.HTML.Meta is the first module built and it is built with -O0 so any interface files that are read while compiling that module will not have unfoldings attached. Then those modules will not have unfoldings during the compilation of any subsequent module either, even though those are built with -O.

No, that part at least is not so. The interface file built for Text.XmlHtml.HTML.Meta will not have unfoldings in it, but that only affects functions actually defined in Text.XmlHtml.HTML.Meta. Later modules, built with -O will certainly get interface files with unfoldings in them.

Sorry, I should have been more clear. I mean that when GHC builds Text.XmlHtml.HTML.Meta, causing it to read the interface file for Data.String, optimization level -O0 is in effect, so GHC will not read the unfolding for Data.String.fromString. Then, when GHC builds the next module Text.XmlHtml.Common, even though it is now at optimization level -O1, it reuses its in-memory interface file information for Data.String and therefore has no unfolding available for fromString, so it cannot inline it.

If we force GHC to build Text.XmlHtml.Common before Text.XmlHtml.HTML.Meta by adding an import of the former to the latter, then GHC does inline fromString in Text.XmlHtml.Common. You can observe this either with -ddump-inlinings -dverbose-core2core, or by examining the symbol table of the resulting object file: there is no reference to an undefined symbol base_DataziString_fromString_info.

This doesn't just apply to the module Data.String, of course; we might be missing many other opportunities to inline when we build the rest of the modules in this package.

comment:15 follow-up: Changed 2 years ago by rwbarton

(For anyone who is paying careful attention to #9400 and is wondering why we have ~4000 identical inlinings here while in that ticket we have only ~2000 copies of unstream, the reason is that here we are building with -O0, so we do not process the rule which rewrites Data.Text.pack . unpackCString# to Data.Text.unpackCString#! The text library does not expect its users to follow its INLINE directives without also following its RULES.)

comment:16 in reply to: ↑ 15 Changed 2 years ago by simonpj

Replying to rwbarton:

The text library does not expect its users to follow its INLINE directives without also following its RULES.)

And THAT is the GHC bug here: we are compiling a client with -O0 but

  • we are obeying INLINE pragmas from Text,
  • we are not obeying RULES from Text.

Right?

Simon

comment:17 Changed 2 years ago by rwbarton

In the case that originated this ticket, yes. In general, the bug is that whether we obey INLINE pragmas in an imported module M depends not just on the optimization flags in effect for the module we are currently compiling, but also on the optimization flags that were in effect for the first module in this ghc --make invocation that imported M. (Or that caused GHC to load the interface file for M for any reason, if that ever happens other than because M was imported.)

comment:18 Changed 2 years ago by simonpj

Right. I'm utterly swamped, but happy to advise. I think it would not be hard to fix this.

Simon

comment:19 Changed 19 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Looks like this one isn't changing in time for 7.10, so I'm moving it out.

comment:20 Changed 19 months ago by simonpj

Just to add: I think Reid's comment:14 is spot on.

The solution, I believe, is this:

  • Always read in unfoldings, etc, including with -O. (Possible exception: one-shot mode.)
  • Check the -fignore-interface-pragmas flag when considering inlining a function.

So an Id would always contain its inlining; but at the usage sites we'd decide whether or not to use it. An extra test at every inlining, but there are already a lot of tests at inlining sites!

This would be a good, small, contained project for someone. Volunteers?

If it's mission-critical for anyone, please yell.

Simon

comment:21 Changed 13 months ago by rwbarton

This is pretty much the same issue as #8635.

comment:22 Changed 12 months ago by thoughtpolice

  • Milestone changed from 7.12.1 to 8.0.1

Milestone renamed

comment:23 Changed 6 months ago by bgamari

  • Keywords newcomers added
  • Milestone changed from 8.0.1 to 8.2.1

Seems like a reasonable bite-sized project for a newcomer.

Sadly, it won't be happening for 8.0.1, however.

comment:24 Changed 4 months ago by richardfung

  • Owner set to richardfung

comment:25 Changed 3 weeks ago by mpickering

  • Keywords newcomer Inlining added; newcomers removed

Any progress on this ticket Richard?

comment:26 Changed 3 weeks ago by richardfung

Not really sadly.. I've been just trying to understand all the code.

I started looking from where --make starts, trying to go down to where the ModInfo is actually generated.

I see that mkIface_ generates the [IfaceDecl] that gets passed to addFingerprints which eventually returns the ModIface which we save and I believe reuse, but that's the farthest I've gotten; I still don't know where IfaceIdInfo is set to NoInfo.

At least.. I think that's what's happening?

Given the fact I haven't even started looking at the second part of this problem, checking when to inline, I'm beginning to wonder if I'm in over my head!

Any advice would certainly be appreciated.

comment:27 Changed 13 days ago by mpickering

t is certainly hard work on the first ticket. I spent about an hour this morning trying to find out where the right part to change was and then read Ben's email earlier in this thread which made everything much clearer for me!

The essential problem we are trying to solve is:

When a module is compiled with -O0, it does not read unfoldings from modules it imports. Because we reuse the unfolding information across modules (to avoid reloading interface files very often), this unfolding information is then not available in the rest of the compilation pipeline.

Simon's suggested fix is

  1. Always read in unfoldings, etc, including with -O. (Possible

exception: one-shot mode.)

  1. Check the -fignore-interface-pragmas flag when considering inlining

a function.

So for part 1, we need to find the part which deals with *reading in* interface files and seeing where the unfolding information is lost.

For part 2, we need to find the part of the optimiser which makes decisions about whether to inline a function.

So now I'm re-reading Ben's comment and (as usual) his analysis is spot-on. If you trace the origin of the ignore_prags flag then you'll find it is set by checking whether the option Opt_IgnoreInterfacePragmas is set or not. You can find that this flag is automatically set when the opt-level is 0 in DynFlags, I'll leave you to find the exact location!

It seems like Simon's suggested fix is removing this guard in loadDecls. It isn't clear to me whether the other information (other than the unfoldings) should still be read in or not. Simon will be able to comment.

Now for part 2: I think you should start by looking at the callSiteInline function in coreSyn/CoreUnfold. That will at least get you started looking in the right place.

Does that help?

comment:28 Changed 12 days ago by simonpj

Matthew is right.

  • Part 1: the easiest thing is to set Opt_IgnoreInterfacePragmas to true only if you specify the flag -fignore-interface-pragmas explicitly, and not with -O0. Do that in DynFlags. That way an explicit -fignore-interface-pragmas flag will still work.
  • Part 2: with O0 we want to switch off all inlining (except "compulsory" inlinings; see InlineCompulsory in CoreSyn). I think this can conveniently be done by setting sm_inline = False in the SimplMode in SimplCore. Set it in base_mode.

So actually it looks fairly simple.

Does that help?

comment:29 Changed 11 days ago by richardfung

Yes that helps tremendously! This change was a lot simpler than I expected..

I only wish I asked for help sooner. Thanks!

comment:30 Changed 9 days ago by richardfung

Sorry even though you basically did the ticket for me already I still have more questions..

I'm trying to think of a good test case to add for this but it seems like the tests are limited to comparing stdout and stderr and checking if it compiled or ran successfully, at least if I understand correctly.

That being the case, I was thinking I could make a test that runs with -ddump-inlinings and compare the output. Would that work?

Ideally I would like to run --make twice (with the scenario that someone else mentioned where we force the module with optimizations to run first and where it runs second) and compare the inlining output to make sure they're the same. This would be nice in case the -ddump-inlinings format ever changes but I'm not sure this kind of test is supported and I also don't know how likely it is that the format would change.

comment:31 Changed 9 days ago by ezyang

There are few cases where we have just directly checked in output to the test suite, grepping for -ddump I see -ddump-simpl (be sure to add -dsuppress-uniques), I also recently added a -ddump-stg.

If you want to go your original approach, you'll need to write a Makefile rule for it. If you look at, for example, testsuite/tests/driver/Makefile you can see examples. Note that you'll have to edit testsuite/driver/extra_files.py to make sure all your test files get added

comment:32 Changed 9 days ago by richardfung

Unless someone has a strong preference otherwise I'm inclined to go with the easier solution of comparing against a hard coded -ddump-inlinings output. I don't think I'd be able to manage the more complicated test without help and I'm not convinced it's worth the extra effort especially if we're already doing things the simpler way elsewhere.

Where should these tests go though?

comment:33 Changed 9 days ago by simonpj

Tests should got in testsuite/tests/simplCore

comment:34 Changed 4 days ago by richardfung

Am I correct in thinking the right way to test this is to write a function which would only be inlined if the INLINE pragma is there? Otherwise, the function would be inlined even in source files that import it with OPTIONS_GHC -O0 right?

I was completely unfamiliar with inlining before this ticket so please bear with me. It seems like imported modules are still inlined even if OPTIONS_GHC -O0 is set if you run ghc --make -O. The documentation also says that -O enables cross module inlining so I'm guessing even if the source file has OPTIONS_GHC -O0, inlinings from other modules are still supposed to happen?

Note: See TracTickets for help on using tickets.