Opened 6 months ago

Last modified 3 months ago

#13080 new bug

Memory leak caused by nested monadic loops

Reported by: Feuerbach Owned by:
Priority: normal Milestone: 8.4.1
Component: Compiler Version: 8.0.1
Keywords: Cc: RyanGlScott, nomeata, michalt
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

See this stackoverflow post: http://stackoverflow.com/a/41521876/110081

Basically, the following program leaks memory on ghc 7.10.3 and ghc 8.0.1.

import Control.Monad (forever)

main :: IO ()
main = worker

{-# NOINLINE worker #-}
worker :: (Monad m) => m ()
worker = forever $ poll ()

poll :: (Monad m) => () -> m a
poll action = do
    return ()
    poll action

Could ghc do better there?

Change History (22)

comment:1 Changed 6 months ago by rwbarton

Floating out return () won't help much. Even if you do that, the program looks like

worker = let l = poll ()
         in forever l

r = return ()
poll a = r >> poll a

so executing l will cause l to be evaluated as r >> r >> r >> r >> ... and we still have a live reference to l from within forever.

If you instead define

poll a = r >>= \_ -> poll a

and compile without optimization then l now looks like r >>= (\_ -> ...) and the space leak is gone. But turning on optimizations seems to cause GHC to rewrite the above to

poll a = r >>= (let s = poll a in \_ -> s)

which now has the original problem again.

(BTW I am always compiling with -fno-state-hack, since that just adds another layer of confusion to an already confusing situation.)

comment:2 Changed 6 months ago by Feuerbach

I also wanted to see what difference the state hack makes, but couldn't find -fno-stack-hack in ghc 8 user's guide. I thought that perhaps it was replaced by a less hacky optimization in ghc 8.

comment:3 Changed 6 months ago by RyanGlScott

Cc: RyanGlScott added

comment:4 Changed 6 months ago by dfeuer

Milestone: 8.2.1
Type of failure: None/UnknownRuntime performance bug

comment:5 Changed 6 months ago by Feuerbach

Reid: you're probably right and the return () is a red herring. I guess I still don't completely understand how this works.

comment:6 Changed 6 months ago by dfeuer

The original program leaks memory even without optimization. In this case, the trouble is that

poll () = return () >> return () >> ...

forever always holds on to its argument, which prevents any part of poll () from being collected as it's run. You can fix this (with or without optimization) by defining

poll _ = fix (return () >>)

which makes poll () finite in size.

While you and I both know that forever doesn't need to hold on to poll () in this case, I don't see how GHC can be expected to see that.

comment:7 Changed 6 months ago by dfeuer

I missed something. It seems that {-# NOINLINE #-} here is effectively preventing poll from being specialized within its own RHS. While the constant argument ends up going away, the constant Monad dictionary does not. Writing

poll :: (Monad m) => x -> m a
poll = go where
  go action = do
    return ()
    go action

fixes the leak. So maybe something needs to be changed to allow internal specialization in the face of NOINLINE? Of course, that could have some other downside.

comment:8 Changed 6 months ago by michaelt

By the way, forever isn't needed, you can replace it with (\x -> x >> x) in Feuerbach's program. In the SO question the user noticed that the problem is solved by defining poll a b without explicit recursive use of poll a b.

Last edited 6 months ago by michaelt (previous) (diff)

comment:9 Changed 5 months ago by Feuerbach

Summary: Constant values are not floated out of the loopMemory leak caused by nested monadic loops

I think I now understand what's going on here. (Basically, what Reid said.) I summarized my thoughts in this blog post: https://ro-che.info/articles/2017-01-10-nested-loop-space-leak

I don't know what ghc can do here, but hopefully there is something.

comment:10 Changed 5 months ago by nomeata

Cc: nomeata added

comment:11 Changed 5 months ago by nomeata

In the blog post you write

Let’s change (“eta-expand”) the poll code as if then had arity 4, without actually changing then or thenIO or their runtime arities:

and that this fixes the space leak. Isn’t this eta-expansion exactly what the state hack is about? So why does it not not work here? Ah, becuase poll is not at type IO a but rather for an arbitrary Monad… yeah, then it’s harder.

comment:12 Changed 5 months ago by Feuerbach

I thought that the state hack doesn't change the arity of functions, just marks them single-entry.

Also, I don't think ghc needs the state hack to figure out the correct arity (in the non-polymorphic case), since it can see through the newtypes.

comment:13 Changed 5 months ago by nomeata

I thought that the state hack doesn't change the arity of functions, just marks them single-entry.

That’s how it works, but the ultimate goal of the state hack (the way I see it) is to eta-expand them. Usually this works by treating a State#-typed lambda as one-shot:

foo = bar x >> baz y

will, after inlining >> (if that happens, but it usually does for IO) look like this:

foo =
  let a = bar x
      b = baz y
  in \s0 -> case a s0 of (_,s1) -> b s1

which is usually bad because of the allocations of a and b. (But I say “usually” because it is good if bar x is expensive and foo is used many times – these are the instances when people complain about the state hack removing this sharing).

Now because s0 is treated as one-shot, two optimizations are enabled, both of which have ultimately the same effect:

  • The inliner feels empowered to inline a and b into the lambda, because it is one shot. Then we obtain
    foo = \s0 -> case bar x s0 of (_,s1) -> baz y s1
    

which we want.

  • The arity analysis uses the one-shot information on the lambda to determine that foo has arity 1 and eta-expands it:
    foo = \s ->
        (let a = bar x
            b = baz y
        in \s0 -> case a s0 of (_,s1) -> b s1) s
    

which then gets simplified to

foo = \s -> case bar x s of (_,s1) -> baz y s1

again.

I once had an idea for a less intrusive state hack variant which would do this eta-expansion directly (but otherwise do not meddle with the one-shotness of State#-typed lambdas, which can be wrong), but did not follow through with it. See 9388#comment:6 for some background.

But this is all State#-specific, while your bug is about abstractly monad code, where the compiler may now use knowledge about IO’s bind, right?

comment:14 Changed 5 months ago by simonpj

Suppose it had been

worker :: (Monad m) => m ()
worker = forever $ poll 1

poll :: (Monad m) => Int -> m a
poll n
 | n>10000    = return ()
 | otherwise = do print (expensive n)
                  poll (n+1)

where expensive is expensive to compute. After printing out the results of 10000 calls to expensive, the forever will do it all again.

Question: would you expect all the calls to expensive to be recomputed? Presumably not. poll builds a big action

print (expensive 0) >> print (expensive 1) >> print (expensive 2) >> ...

and forever just repeatedly executes that action. But to remember all those results clearly takes O(n) space.

Now in this case there is no real work to be shared, but that's clearly harder for GHC to spot. Especially when (as in this case) the monad is unspecified, so perhaps the call return () does a tremendous amount of work.

I'm not saying things can't be improved here, but you are in delicate territory.

comment:15 Changed 5 months ago by Feuerbach

Simon: as a matter of fact, I would expect all the calls to expensive to be recomputed. I think most programmers would interpret the above code as

while (1) {
  for (n = 1; 1 <= 10000; n++) {
    print(expensive(n));
  }
}

If I wanted the calls to expensive to be shared, I would probably put them into a list. (Of course, a similar piece of code could be intended for m = []; but based on print, it looks like something intended for an IO-like monad.)

I do understand the tradeoffs involved, and I'm not saying that this should be obvious to the compiler. I'm just saying that this might not be the most convincing example where the current behavior is the one a programmer would expect.

What if a programmer could annotate variables (like worker or poll) as "function-like" vs "value-like"? "Function-like" means "I don't care about this value, feel free to recompute it" and "value-like" means "please cache it if possible".

comment:16 Changed 5 months ago by dfeuer

If we write

foo = ...
  where
    bar = ...
      where
        baz = ...

do we intend for bar to be recomputed on each call to foo, or shared globally? Do we intend for baz to be recomputed on each call to bar, or on each call to foo, or shared globally? The compiler doesn't know. Worse, experienced Haskell developers have gotten used to the way GHC tends to float things around, so making things simpler and more predictable is likely to turn a lot of currently-good code bad. Maybe Joachim's oneShot can help in some places?

comment:17 Changed 5 months ago by Feuerbach

David:

If foo is marked as function-like by the programmer, bar and baz are recomputed on each invocation of foo. If foo is marked as value-like and is shared, bar and baz are shared, too. Furthermore, bar and baz could have their own annotations.

Examples:

{-# FUNCTION_LIKE fib #-}
fib = (fibs !!) where
  {- VALUE_LIKE fibs #-}
  fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
{-# VALUE_LIKE val #-}
val = sum $ map f [1..10^6] where
  {-# FUNCTION_LIKE f #-}
  f = (+5) . (^2)

Now, what would FUNCTION_LIKE mean, exactly? In the fib example above, there can be two interpretations:

  1. Don't bother "remembering" the PAP (!!) fibs, but do remember something like fibs !! n.
  2. Don't even remember fibs !! n.

The second interpretation is useful in cases like

mk_action :: Monad m => Int -> m ()
mk_action = flip replicateM_ (return ())

where we may want to say, "don't bother to remember either the expansion or the result of mk_action".

So, simple FUNCTION_LIKE can be ambiguous. What if we annotated functions with arities? Assigning an explicit arity to a function means that:

  1. ghc attempts to perform eta reductions or expansions to match the declared arity;
  2. ghc doesn't float things out of lambdas with declared arity >= 1.

E.g. following Reid's example (https://ghc.haskell.org/trac/ghc/timeline?from=2017-01-07T14%3A42%3A24Z&precision=second), he could write

poll a = r >>= f
  where
    {-# ARITY 1 f #-}
    f _ = poll a

and ghc would not transform f to (let s = poll a in \_ -> s).

And, of course, the original problem could be easily solved by annotating poll with the correct arity, as I point out in the blog post.

It is also possible that these two things — forcing eta-expansion and not floating out local bindings — should be two different and orthogonal pragmass. What do you all think?

Last edited 5 months ago by Feuerbach (previous) (diff)

comment:18 Changed 5 months ago by rwbarton

So there are three issues here:

  1. The programs under discussion leak space.
  1. The space leak is hard to see and unintuitive. (I didn't understand what was going on until I saw Roman's answer on SO.)
  1. You can't even eliminate the space leak in a reliable way.

Fixing 1 directly seems infeasible without breaking other programs, because of examples like Simon's in comment:14. Plus, the space leak does exist if you evaluate the program naively using lazy evaluation, despite 2. (And there are parallel examples that don't involve IO.) But we should be able to do something about 3, and it would be useful in many settings besides this one (such as benchmarking).

I like the idea that if the user wants FUNCTION_LIKE behavior, then they should write a function. It aligns well with the basic rule for sharing (an expression inside a lambda is shared for the lifetime of that call to the lambda) and we already know how to implement it (just compile with -O0). We just have to not stuff it up during optimization. This is basically the full laziness problem yet again.

The problem is of course that writing a function isn't sufficient because GHC will probably just float the body out. Besides that there's a second danger: we could inline and then beta-reduce. I don't want to write it out, but if you imagine inlining >>= and f in Roman's latest example, and then if you allow beta-reducing f, the lambda in f would disappear and then presumably nothing would stop GHC from floating poll a out of the lambda in >>=. It's hard to see how the inlining could be to blame here, so I blame the beta-reduction. (Then perhaps if we are never going to be allowed to beta-reduce, we can also not bother inlining. Not sure.)

So, it seems to me we need a new kind of Core lambda, or a flag on lambda, that means

  • don't float out of this,
  • don't beta reduce this.

How to give the user access to this is another question. I haven't thought about this ARITY suggestion much yet. Another possibility would be a magic pattern synonym _#, which matches anything but turns the lambda into a beta blocker. So then we'd write Roman's example as something like

import GHC.Exts (pattern _#)
poll a = r >>= f
  where
    f _# = poll a

See also #9520, #8457. #12656 is a bit different, but it is the ticket I was prompted to remember by Roman's SO post. #12620 has a proposed solution but I think tying the "no floating" annotation to a lambda rather than an expression within the lambda might be more robust and easier to understand.

comment:19 Changed 5 months ago by simonpj

So, it seems to me we need a new kind of Core lambda, or a flag on lambda, that means (a) don't float out of this, (b) don't beta reduce this.

We already have (a), namely one-shot lambdas. I don't understand why (b) could possibly be good.

Perhaps someone can give a from-scratch articulation of the problem and possible solutions, informed by this thread?

comment:20 Changed 5 months ago by nomeata

do we intend for bar to be recomputed on each call to foo, or shared globally?

The answer to this question depends heavily on what it means for “foo” to be called! Consider this:

foo :: Int -> IO ()
foo x = 
  where bar = expensive x

If we now run mapM (foo 1) [1.1000], then, in one sense, the function foo is called once (when passed x). This returns a value of type IO (), which is then executed 1000 times. This is the sense that the compiler understands, and without further hacks, bar would be evaluated only once here. Some users know and expect this.

But there is another sense where one thinks of a call to foo as the execution of the IO action produced by foo 1. This is probably how most users in most cases think about functions returning an IO something. . With the current implementation of IO (), this is when the state token is passed to the function wrapped in IO ().

The state hack is about eta-expanding foo so that these notions coincide. Unfortunately, and as far as I can tell, there is no way of writing foo to get this result directly (without breaking the IO abstraction barrier).

The same distinction works for other monads, of course: foo 1 might return a Parser (), and we have the distinction between calculating the parser, and applying it to some input. And, in extension, with an arbitrary Monad the distinction is even more evident.

So in this thread, we should be very precise which form of “calling” is the right one.

comment:21 Changed 5 months ago by michalt

Cc: michalt added

comment:22 Changed 3 months ago by bgamari

Milestone: 8.2.18.4.1

Given that 8.2.1-rc1 is imminent, I'm bumping these off to the 8.4

Note: See TracTickets for help on using tickets.