With -O1 and above causes ghc to use all available memory before being killed by OOM killer

changed milestone to %8.0.2

changed weight to 5

@carter suggested that adding the Safe language pragma might prevent this, but it didn't help.

Maybe not too surprisingly, removing all the INLINE pragmas (51 in 500 lines of code) allows the code to compile even with -O2. (suggested by @mpickering).

changed weight to 7

Reproducible with HEAD. Here is a testcase that doesn't depend on any packages that aren't in the GHC tree.

module T12425 where

import Control.Applicative
import Control.Monad
import Control.Monad.Trans.State.Lazy (StateT(..))

data Result a m b = RecurseOnly (Maybe (CondT a m b))
                  | KeepAndRecurse b (Maybe (CondT a m b))

instance Monad m => Functor (Result a m) where
    fmap f (RecurseOnly l)      = RecurseOnly (liftM (fmap f) l)
    fmap f (KeepAndRecurse a l) = KeepAndRecurse (f a) (liftM (fmap f) l)
    {-# INLINE fmap #-}

newtype CondT a m b = CondT { getCondT :: StateT a m (Result a m b) }

instance Monad m => Functor (CondT a m) where
    fmap f (CondT g) = CondT (liftM (fmap f) g)
    {-# INLINE fmap #-}

instance Monad m => Applicative (CondT a m) where
    pure  = undefined
    (<*>) = undefined

instance Monad m => Monad (CondT a m) where
    return = undefined
    (>>=) = undefined

@erikd: the following change fixes the problem.

instance Monad m => Functor (CondT a m) where
-    fmap f (CondT g) = CondT (liftM (fmap f) g)
+    fmap f (CondT g) = CondT (liftA (fmap f) g)
    {-# INLINE fmap #-}

Tested with GHC 8 and HEAD. To compile conduit-find with HEAD, you need to make the following other changes:

add Cabal < 1.25 to the .cabal file, to workaround https://github.com/ekmett/distributive/issues/17
use conduit with this patch: https://github.com/snoyberg/conduit/pull/274
use tagged >= 0.8.5, which fixes https://github.com/ekmett/semigroupoids/issues/48

Trac metadata

Trac field	Value
Priority	normal → high

@thomie Oh wow, that change you suggested (switching from liftM to liftA) prevents this problem in conduit-find even with stock 8.0.1.

Edit: And a bit of CPP lets me use liftA for ghc >= 7.10 and liftM for earlier.

A guess would be that in 8.0’s library version, liftM is implemented in a way that mentions fmap, and liftM is inlined to expose that. Then all your unconditinally marked inline functions become effectively recursive, but still keep inlining. But didn’t investigate deeper.

Thats seems to be a possible explanation. Is there no loop checker for the inliner?

This looks bad. Inlining should not go on for ever ... there's a tick counter as a last-ditch hard stop. It could just possibly be too many INLINE pragmas, but it's quite difficult to eat 16G.

Does anyone feel able to investigate further?

Simon

Trac metadata

Trac field	Value
TypeOfFailure	OtherFailure → CompileTimePerformance

Indeed there does appear to be some sort of loop here. -ddump-inlinings shows long repeated runs of,

...
Inlining done: Control.Monad.Trans.State.Lazy.$fMonadStateT1
Inlining done: Control.Monad.Trans.State.Lazy.$fMonadStateT_$c>>=
Inlining done: GHC.Base.$
Inlining done: Control.Monad.Trans.State.Lazy.runStateT
Inlining done: Control.Monad.Trans.State.Lazy.runStateT1
Inlining done: Control.Monad.Trans.State.Lazy.runStateT
Inlining done: Control.Monad.Trans.State.Lazy.runStateT1
Inlining done: GHC.Base.liftM
Inlining done:
    Control.Monad.Trans.State.Lazy.$fMonadStateT_$creturn
...

These runs are punctuated by runs of,

...
Inlining done: GHC.Base.$fApplicativeMaybe_$sliftM
...

The runs appear to get longer as the simplifier progresses. It indeed seems plausible that all of these bindings have INLINE pragmas attached.

changed milestone to %8.0.3

It doesn't look like this will get fixed for 8.0.2.

I believe this is not fixed by my fix to #12776 (closed), sadly

Thomie, THANK YOU for the small repro case in ticket:12425#comment:122699. With its help I rapidly nailed the bug. Here's what is happening.

Suppose we have

let rec { f = ...g...g...
        ; g = ...f...f... }
in
case x of
  True  -> ...f..
  False -> ..f...

In each iteration of the simplifier the occurrence analyser OccAnal chooses a loop breaker. Suppose in iteration 1 it choose g as the loop breaker. That means it is free to inline f.

Suppose that GHC decides to inline f in the branches of the case, but (for some reason; eg it is not satureated) in the rhs of g. So we get

let rec { f = ...g...g...
        ; g = ...f...f... }
in
case x of
  True  -> ...g...g.....
  False -> ..g..g....

Now suppose that, for some reason, in the next iteraion the occurrence analyser chooses f as the loop breaker, so it can freely inling g. And again for some reason the simplifer inlines g at its calls in the case branches, but not in the RHS of f. Then we get

let rec { f = ...g...g...
        ; g = ...f...f... }
in
case x of
  True  -> ...(...f...f...)...(...f..f..).....
  False -> ..(...f...f...)...(..f..f...)....

You can see where this is going! Each iteration of the simplifier doubles the number of calls to f or g. No wonder GHC is slow!

(In the particular example in ticket:12425#comment:122699, f and g are the two mutually recursive fmap instances for CondT and Result. They are both marked INLINE which, oddly, is why they don't inline in each other's RHS, because the call there is not saturated.)

The root cause is that we flip-flop on our choice of loop breaker. I always thought it didn't matter, and indeed for any single iteration to terminate, it doesn't matter. But when we iterate, it matters a lot!!

I think this was not happening before because, by a fluke, we tended to always choose the same one. But, perhaps as a resul of Bartosz's work on making GHC deterministic, the occurrence analyser now reliably flip-flops in each iteration, as above.

Trac #12234 (closed) turns out to be exactly the same issue

What to do? We want the occurrence analyser to choose a loop breaker based on stable criteria, not arbitrary things. Simple idea: if two or more potential loop breakers look equally plausible, choose the one that was a loop breaker in the previous iteration. That should make it stable.

Stay tuned. But I thought I'd explain what is happening in case I get distracted.

mentioned in commit 517d03e4

I think I nailed this one, in HEAD at least. Can you see if it fixes your original library?

Probably worth merging this to 8.0.3 if we produce such a thing.

Trac metadata

Trac field	Value
Test case	- → perf/compiler/T12425

added 1 deleted label

I built a compiler from git HEAD, but that wouldn't build the dependencies for my library (numerous problems with the primitve and vector libraries).

I tried checking out the ghc-8.0 branch and then cherry-picking commit d03e41b4f5 but that failed due to merge conflicts in compiler/simplCore/OccurAnal.hs that seem rather tricky to resolve.

Thanks for trying.

I think this would be retro-fittable to 8.0.3 with a bit of work -- if it really matters.

Sorry erikd, several of those dependencies not building on GHC HEAD were probably my fault. After adjusting some template-haskell upper version bounds on dependent packages on Hackage, I was able to successfully install conduit-find with GHC HEAD, and I did not observe any noticeable slowness.

Thanks @RyanGlScott . As long it compiled with -O1 or above, then its fixed. The original problem was GHC chewing up all memory when *compiling* conduit-find.

Just for the record, this bug can be reproduced by compiling Hackage's find-conduit 0.4.4 with GHC 8.0.1. find-conduit is the old version of conduit-find prior to a maintainer change, and prior to the workarounds to avoid this bug that have been incorporated into conduit-find.

changed milestone to %8.2.1

At this point it is rather unlikely that there will be an 8.0.3. Re-milestoning.

closed

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedFixed

removed 1 deleted label

added Phigh label

With -O1 and above causes ghc to use all available memory before being killed by OOM killer

Child items 0

Activity

Trac field	Value
Version	8.0.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture