Opened 20 months ago

Closed 18 months ago

Last modified 18 months ago

#13059 closed bug (fixed)

High memory usage during compilation

Reported by: domenkozar Owned by:
Priority: highest Milestone: 8.2.1
Component: Compiler Version: 8.0.2-rc2
Keywords: Generics Cc: domenkozar, MikolajKonarski, sjakobi, simonpj
Operating System: Unknown/Multiple Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #5642, #9630 Differential Rev(s):
Wiki Page:

Description

I'm building stack using GHC 8.0.2-rc2 and noticed store uses 5.17GB of (RSS) memory during https://github.com/fpco/store/blob/master/src/Data/Store/Internal.hs compilation step. Using GHC 8.0.1 memory goes up to only 1.6GB.

I haven't dug deeper, but something suspicious is going on.

You should be able to reproduce using installed Nix and executing nix-build -A haskell.packages.ghc802.store '<nixpkgs>' using recent channel.

Building haddock for store also goes up to 5.8GB.

It'd be great if someone is able to reproduce this using plan GHC 8.0.2-rc2

See also https://github.com/fpco/store/issues/91 for upstream bug report.

Attachments (17)

stats-801.txt (2.2 KB) - added by dfeuer 19 months ago.
stats-802rc2.txt (2.2 KB) - added by dfeuer 19 months ago.
passes-a0472f8 (32.7 KB) - added by dfeuer 19 months ago.
passes-54b887b (32.9 KB) - added by dfeuer 19 months ago.
StoreImpl.hs (13.4 KB) - added by RyanGlScott 19 months ago.
Store.hs (6.4 KB) - added by RyanGlScott 19 months ago.
Store-Tuples.hs (211 bytes) - added by dfeuer 19 months ago.
Store-Random.hs (4.2 KB) - added by dfeuer 19 months ago.
prof-Store-Random-old (1022 bytes) - added by dfeuer 19 months ago.
prof-Store-Random-new (1023 bytes) - added by dfeuer 19 months ago.
passes-Store-Random-old (11.3 KB) - added by dfeuer 19 months ago.
passes-Store-Random-new (11.5 KB) - added by dfeuer 19 months ago.
passes-Store-Tuples-old (8.6 KB) - added by dfeuer 19 months ago.
passes-Store-Tuples-new (8.8 KB) - added by dfeuer 19 months ago.
prof-Store-Tuples-old (1021 bytes) - added by dfeuer 19 months ago.
passes-Store-Tuples-new.2 (8.8 KB) - added by dfeuer 19 months ago.
prof-Store-Tuples-new (1021 bytes) - added by dfeuer 19 months ago.

Download all attachments as: .zip

Change History (41)

comment:1 Changed 20 months ago by RyanGlScott

It's difficult to investigate this issue at the moment, given that it relies on a package with a high number of dependencies and a module with a lot of code. It would be tremendously helpful if you could isolate a smaller test case (preferably with one module with no dependencies other than what ships with GHC) that exhibits the issue.

As a start, you can try commenting out code from Data.Store.Internal and see if there's a definition or group of definitions that triggers the memory spike.

comment:2 Changed 20 months ago by domenkozar

Thanks Ryan. I agree it's a hard starting point, I won't have time until late-Jan to further research the issue, but wanted to report the bug since store is a dependency of Stack.

Changed 19 months ago by dfeuer

Attachment: stats-801.txt added

Changed 19 months ago by dfeuer

Attachment: stats-802rc2.txt added

comment:3 Changed 19 months ago by dfeuer

I haven't finished bisecting, but I have a bit of information to report. The allocations rose 10GB between 8.0.1 and aa6da11. They rose 68GB between aa6da11 and 5662cea. They then dropped about 15GB between there and 8.0.2rc2.

comment:4 Changed 19 months ago by dfeuer

One last datapoint before I go to sleep: there wasn't any notable change between aa6da11 and 61be194, so the big rise was between 61be194 and 5662cea.

comment:5 Changed 19 months ago by MikolajKonarski

Cc: MikolajKonarski added

comment:6 Changed 19 months ago by dfeuer

I've bisected some more. It looks almost certain that the change causing this problem was 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060, "Improve typechecking of instance defaults", whose commit message mentions that it slows down a test that involves a very large number of instance declarations. That message claims that real programs don't have nearly so many, but perhaps store proves that wrong.

comment:7 Changed 19 months ago by simonpj

Quadrupling memory, adding 4Gb (that's a LOT of memory), is ridiculous! Yes, there are a dozen or two instance declarations, but only some use the default methods, and even if they do typechecking them should not take tons of memory.

Something is mysterious here.

I'd try -dshow-passes before and after, to see if the code size changes. It shouldn't change as a result of the patch, but with numbers as above I bet it does! Then it'd be a question of seeing why.

comment:8 Changed 19 months ago by dfeuer

It's also conceivable (but unlikely) that I identified the wrong commit. I will verify. The module uses an awful lot of TH. Could it be generating lots of instance declarations?

Changed 19 months ago by dfeuer

Attachment: passes-a0472f8 added

Changed 19 months ago by dfeuer

Attachment: passes-54b887b added

comment:9 Changed 19 months ago by dfeuer

There indeed seems to be a code size change (and you can ignore the minor 10GB wibble I mentioned; I think that was me measuring wrong). Everything is similar (but not identical) for a while, until float out. a0472f8 produced

*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Data.Store.Internal]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 90,610, types: 838,922, coercions: 461,514}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Data.Store.Internal]: finished in 446.16 milliseconds, allocated 469.370 megabytes
*** Simplifier [Data.Store.Internal]:
Result size of Simplifier iteration=1
  = {terms: 260,919, types: 1,555,565, coercions: 747,071}

But 54b887b produced

*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Data.Store.Internal]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 90,693, types: 839,131, coercions: 461,514}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Data.Store.Internal]: finished in 942.78 milliseconds, allocated 470.779 megabytes
*** Simplifier [Data.Store.Internal]:
Result size of Simplifier iteration=1
  = {terms: 502,024, types: 2,614,803, coercions: 1,192,064}

That's quite a jump indeed.

comment:10 Changed 19 months ago by dfeuer

Per Ryan Scott's advice, I checked -ddump-spices. The splices define something like 48 instances without bodies. He suggests I copy/paste the splices in and strip down the package until only the interesting bits remain. The only other thing I can think of is going for -dverbose-core2core and trying to find the right spots. Figure it out tomorrow.

Changed 19 months ago by RyanGlScott

Attachment: StoreImpl.hs added

Changed 19 months ago by RyanGlScott

Attachment: Store.hs added

comment:11 Changed 19 months ago by RyanGlScott

Well, I strongly suspected that default methods were the culprit here, so I trimmed down store (and store-core) into two modules and compiled each with +RTS -s. But to my amazement, both GHC 8.0.1 and 8.0.2 allocation almost the same number of bytes on the heap with this program! So it's likely something other than default methods that are causing the issue.

In any case, I hope the attached programs serve as a useful springboard for further attempts to dissect this issue.

comment:12 Changed 19 months ago by dfeuer

Ryan pointed out on IRC that 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060 doesn't just change type checking for methods with default signatures—it also changes how all default methods are desugared. Previously, their code was inlined immediately; now they desugar to references to top-level bindings. I know that early inlining can reduce code size in some cases; perhaps this is one of them?

comment:13 Changed 19 months ago by dfeuer

Building on Ryan's StoreImpl attachment, I've reduced his Store to Store-Tuples and Store-Random. Store-Tuples is very simple, and demonstrates a roughly 50% increase in allocations, but not an increase in residency. Store-Random is somewhat more involved, and demonstrates increases in both (although it's clearly not the biggest culprit). I hope these will at least help point toward the problem.

Last edited 19 months ago by dfeuer (previous) (diff)

Changed 19 months ago by dfeuer

Attachment: Store-Tuples.hs added

Changed 19 months ago by dfeuer

Attachment: Store-Random.hs added

Changed 19 months ago by dfeuer

Attachment: prof-Store-Random-old added

Changed 19 months ago by dfeuer

Attachment: prof-Store-Random-new added

Changed 19 months ago by dfeuer

Attachment: passes-Store-Random-old added

Changed 19 months ago by dfeuer

Attachment: passes-Store-Random-new added

Changed 19 months ago by dfeuer

Attachment: passes-Store-Tuples-old added

Changed 19 months ago by dfeuer

Attachment: passes-Store-Tuples-new added

Changed 19 months ago by dfeuer

Attachment: prof-Store-Tuples-old added

Changed 19 months ago by dfeuer

Attachment: passes-Store-Tuples-new.2 added

Changed 19 months ago by dfeuer

Attachment: prof-Store-Tuples-new added

comment:14 Changed 19 months ago by dfeuer

I've attached -dshow-passes and -s profiles for both the tuples and "random" files. It looks like there's practically no difference in passes between old and new tuples, but there's a good bit more allocation. Differences are much more substantial in the random one, with code size going up quite a lot right after float out and the profile showing considerably higher residency.

comment:15 Changed 18 months ago by sjakobi

Cc: sjakobi added

comment:16 Changed 18 months ago by RyanGlScott

Milestone: 8.2.1
Priority: normalhighest

I'd like to make one last push on this before the 8.2 release, as using 4 GB of extra memory is really unfortunate. Feel free to change the milestone if you disagree.

comment:17 Changed 18 months ago by RyanGlScott

I've independently verified that 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060 caused this regression (at least on Store-Random.hs, which I believe is representative of store's compilation regression as a whole).

Furthermore, I discovered something interesting. I tweaked StoreImpl.hs and Store-Random.hs a little such that:

  1. StoreImpl.hs no longer uses DefaultSignatures
  2. Store-Random.hs now explicitly defines all of its instance methods. That is, Store-Random.hs contains these lines:
instance Store ModName
instance Store Name
instance Store NameFlavour
instance Store Type
instance Store TyVarBndr
instance Store NameSpace
instance Store PkgName
instance Store OccName
instance Store TyLit

I tweaked it to be this instead:

instance Store ModName where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store Name where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store NameFlavour where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store Type where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store TyVarBndr where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store NameSpace where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store PkgName where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store OccName where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

instance Store TyLit where
  size = genericSize
  peek = genericPeek
  poke = genericPoke

  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}

I then ran -dshow-passes on this program using a pre-54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060 GHC and a post-54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060 GHC.

On the original Store-Random.hs program:

*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 12,859, types: 119,332, coercions: 72,859}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store]: finished in 236.00 milliseconds, allocated 252.916 megabytes
*** Simplifier [Store]:
Result size of Simplifier iteration=1
  = {terms: 40,398, types: 205,815, coercions: 109,233}
*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 12,859, types: 119,332, coercions: 72,859}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store]: finished in 160.00 milliseconds, allocated 252.680 megabytes
*** Simplifier [Store]:
Result size of Simplifier iteration=1
  = {terms: 59,073, types: 286,228, coercions: 152,155}

On the tweaked Store-Random.hs program:

*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store2]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 12,796, types: 119,305, coercions: 72,832}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store2]: finished in 156.00 milliseconds, allocated 252.064 megabytes
*** Simplifier [Store2]:
Result size of Simplifier iteration=1
  = {terms: 58,888, types: 283,015, coercions: 149,870}
*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store2]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 12,796, types: 119,305, coercions: 72,832}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Store2]: finished in 176.00 milliseconds, al    located 252.076 megabytes
*** Simplifier [Store2]:
Result size of Simplifier iteration=1
  = {terms: 58,888, types: 283,015, coercions: 149,870}

Notice that in the tweaked Store-Random.hs, there is no difference before and after 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060. While this consistency is nice, the resulting program size is the same as the original program post-54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060, so this certainly didn't fix the bug—it just made it easier to see in the source program.

comment:18 Changed 18 months ago by RyanGlScott

Cc: simonpj added

The question now is: how was GHC generating default instance methods prior to 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060? Perhaps there's a way we could go back to the way the code was generated while preserving the typechecking benefits of 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060.

Simon, do you recall how this worked prior to 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060? Unfortunately, -ddump-deriv didn't show the generated code for default instance methods prior to 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060, so it's hard for me to figure out what happened before.

comment:19 Changed 18 months ago by simonpj

Good data -- but

  • The terms seem to get bigger by 25% and the types/coercions by up to 50%. That does not account for a quadrupling of memory usage, as reported in the Description. I wonder if there is a space leak, which is beoming more obvious with this program? Worth taking a space profile?
  • I doubt that messinng around with default-method generation is going to help much. After all, the user might write the source code as in the tweaked version. What we should focus on is what causes the program size to increase so dramatically. Is it just the usual nested-tuple thing (which I still do now know how to fix), or is there more afoot? Maybe take a small example (one small type, one default method) and compare before and after.

comment:20 Changed 18 months ago by RyanGlScott

Keywords: Generics added

After staring at this some more, I think I have a more solid grasp of what is going on here.

I stared at the code pre-54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060 (which I will henceforth refer to as "8.0.1" for brevity), and noticed something interesting about the code which generates default method implementations for things that use DefaultSignatures (GenericDM):

tc_default sel_id (Just (dm_name, GenericDM {}))
  = do { meth_bind <- mkGenericDefMethBind clas inst_tys sel_id dm_name
       ; tcMethodBody clas tyvars dfun_ev_vars inst_tys
                              dfun_ev_binds is_derived hs_sig_fn prags
                              sel_id meth_bind inst_loc }

Compare this to the treatment for non-DefaultSignatures-using default method implementations (VanillaDM):

tc_default sel_id (Just (dm_name, VanillaDM)) -- A polymorphic default method
  = do { 
         
         ...

       ; let dm_inline_prag = idInlinePragma dm_id
             rhs = HsWrap (mkWpEvVarApps [self_dict] <.> mkWpTyApps inst_tys) $
                   HsVar (noLoc dm_id)

             meth_bind = mkVarBind local_meth_id (L inst_loc rhs)
             meth_id1 = meth_id `setInlinePragma` dm_inline_prag
                    -- Copy the inline pragma (if any) from the default
                    -- method to this version. Note [INLINE and default methods]

             export = ABE { abe_wrap = idHsWrapper
                          , abe_poly = meth_id1
                          , abe_mono = local_meth_id
                          , abe_prags = mk_meth_spec_prags meth_id1 spec_inst_prags [] }
             bind = AbsBinds { abs_tvs = tyvars, abs_ev_vars = dfun_ev_vars
                             , abs_exports = [export]
                             , abs_ev_binds = [EvBinds (unitBag self_ev_bind)]
                             , abs_binds    = unitBag meth_bind }

       ; return (meth_id1, L inst_loc bind, Nothing) }

One crucial difference between the two is that in the latter VanillaDM case, it explicitly copies any user-supplied INLINE pragmas, whereas in the former GenericDM case, it does not! (I've perused the definition of tcMethodBody to make sure of this.)

Now after 54b887b5abf6ee723c6ac6aaa2d2f4c14cf74060 (which I will refer to as "8.0.2" from here on out), the treatment of generic and non-generic default method implementations were unified, and split out into the mkDefMethBind function (source here). It is quite similar to the treatment of VanillaDM before, and in particular, it also copies any user-supplied INLINE pragmas.

What does this have to do with store? The code in store uses a design pattern that tends to result in huge amounts of memory consumption: it uses lots of GHC.Generics instances and marks pretty much everything with INLINE. Overly aggressive use of GHC.Generics has been known to lead to memory blowup before (see #5642 and #11068). The common patterns between those tickets are:

  1. Defining "generic classes" with more than one method
  2. Putting INLINE methods on all instances for GHC.Generics datatypes

store does not exhibit the first issue, so I can state with reasonable certainty that this problem falls into the second camp.

So why did this problem only arise in 8.0.2? In 8.0.1 and earlier, GHC would desugar DefaultSignatures method implementations like:

instance Store Foo where
  size = $gdm_size Foo $dStoreFoo

This lacks an INLINE pragma. But in 8.0.2, it's desugared as:

instance Store Foo where
  size = $gdm_size @Foo
  {-# INLINE size #-}

But $gdm_size = genericSize, and genericSize is the composition of tons of INLINEd GHC.Generics functionality. We know that GHC.Generics has a propensity for high memory usage, and adding this extra INLINE pragma is enough to make the difference between 1.6 GB and 5.17 GB of memory when compiling the insanely high number of GHC.Generics-based instances in store. This claim is backed up by comment:17, where explicitly implementing the instances and adding INLINE pragmas resulted in the same high memory usage in both 8.0.1 and 8.0.2.

So in the end, this is really just another case of GHC.Generics of being (ab)used in a way that it doesn't handle well (at least, not at the moment). I didn't spend much time digging into the space profiles for these programs in 8.0.1 and 8.0.2, as both just stated that SimplTopBinds took the bulk of the time without going into more detail, which wasn't terribly helpful.

But all is not lost for store, as this analysis reveals a workaround. If you remove just the right INLINE pragmas from store:

  • src/Data/Store/Impl.hs

    diff --git a/src/Data/Store/Impl.hs b/src/Data/Store/Impl.hs
    index 26ea82e..bbf62a5 100644
    a b class Store a where 
    6464
    6565    default size :: (Generic a, GStoreSize (Rep a)) => Size a
    6666    size = genericSize
    67     {-# INLINE size #-}
     67    -- {-# INLINE size #-}
    6868
    6969    default poke :: (Generic a, GStorePoke (Rep a)) => a -> Poke ()
    7070    poke = genericPoke
    71     {-# INLINE poke #-}
     71    -- {-# INLINE poke #-}
    7272
    7373    default peek :: (Generic a , GStorePeek (Rep a)) => Peek a
    7474    peek = genericPeek
    75     {-# INLINE peek #-}
     75    -- {-# INLINE peek #-}
    7676
    7777------------------------------------------------------------------------
    7878-- Utilities for encoding / decoding strict ByteStrings

Then I can confirm that compiling store with GHC 8.0.1 and 8.0.2 both result in about 1.7 GB of maximum memory residency. And given the benchmark results for a similar patch that I made for aeson, I don't think this would result in much of a difference in performance, if any.

comment:21 Changed 18 months ago by RyanGlScott

Here is a PR for store which implements the suggested change in comment:20.

comment:22 Changed 18 months ago by simonpj

Interesting! It looks to me as if the new behaviour is more faithfully following what the user said than the old behaviour. In this case, it is inlining $gdm_size Foo $dStoreFoo at every call of size at type Foo; and that's just what you'd expect from an INLINE pragma on the generic default method of Store.

So maybe that was a bug in the old version, now fixed. And indeed, refraining from inlining the entire body of size at every call site seems like a good thing.

Case closed?

Or would there merit in some user-manual guidance about INLINE pragmas on default methods?

Last edited 18 months ago by simonpj (previous) (diff)

comment:23 Changed 18 months ago by RyanGlScott

Resolution: fixed
Status: newclosed

I honestly don't think there's a consistent rule that you shouldn't inline any default method implementations. Rather, this is a very particular case where inlining a chain of GHC.Generics-related instance methods together leads to high memory usage, which feels more like a bug than anything.

My vote is to close this and pursue the discussion in #5642 and #9630, where this behavior has been observed before. Feel free to reopen if you disagree.

Last edited 18 months ago by RyanGlScott (previous) (diff)

comment:24 Changed 18 months ago by RyanGlScott

Note: See TracTickets for help on using tickets.