Opened 3 years ago

Closed 9 months ago

Last modified 9 months ago

#9520 closed bug (invalid)

Running an action twice uses much more memory than running it once

Reported by: snoyberg Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.3
Keywords: Cc: meteficha, edsko
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #8457, #12620 Differential Rev(s):
Wiki Page:

Description (last modified by edsko)

EDIT: A detailed analysis of the problems discussed in this ticket can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . There is no ghc bug here, as such, except perhaps #8457 "-ffull-laziness does more harm than good". See also #12620 "Allow the user to prevent floating and CSE".

This started as a Haskell-cafe discussion about conduit. This may be related to #7206, but I can't be certain. It's possible that GHC is not doing anything wrong here, but I can't see how the code in question could be misbehaving in a way that triggers this memory usage.

Consider the following code, which depends on conduit-1.1.7 and conduit-extra:

import Data.Conduit ( Sink, (=$), ($$), await )
import qualified Data.Conduit.Binary as CB
import System.IO (withBinaryFile, IOMode (ReadMode))

main :: IO ()
main = do
    action "random.gz"
    --action "random.gz"

action :: FilePath -> IO ()
action filePath = withBinaryFile filePath ReadMode $ \h -> do
    _ <- CB.sourceHandle h
      $$ CB.lines
      =$ sink2 1
    return ()

sink2 :: (Monad m) => Int -> Sink a m Int
sink2 state = do
  maybeToken <- await
  case maybeToken of
    Nothing     -> return state
    Just _      -> sink2 $! state + 1

The code should open the file "random.gz" (I simply gzipped about 10MB of data from /dev/urandom), break it into chunks at each newline character, and then count the number of lines. When I run it as-is, it uses 53KB of memory, which seems reasonable.

However, if I uncomment the second call to action in main, maximum residency shoots up to 45MB (this seems to be linear in the size of the input file). I additionally tried copying random.gz into two files, random1.gz and random2.gz, and changing the two calls to action to use different file names. It still resulted in large memory usage.

I'm going to keep working on reducing this to a smaller test case, but I wanted to start with what I have so far. I'll also attach the core generated by both the low-memory and high-memory versions.

Attachments (3)

good.core (24.7 KB) - added by snoyberg 3 years ago.
Low memory
bad.core (25.8 KB) - added by snoyberg 3 years ago.
High memory
mem.ps (2.1 KB) - added by snoyberg 3 years ago.
base-only heap profile


Change History (20)

Changed 3 years ago by snoyberg

Attachment: good.core added

Low memory

Changed 3 years ago by snoyberg

Attachment: bad.core added

High memory

comment:1 Changed 3 years ago by simonpj

Well I don't understand conduit. But looking at bad.core I see:

  • main3 is called twice (in the RHS of main1). This corresponds to the two calls of action.
  • So I wonder if there are any values shared between the two calls of main3. These will be top-level CAFs.
  • Aha yes! main5 is shared. But it's fine: it is simply Done ().
  • Aha again! We see
    main6 :: Data.Conduit.Internal.Pipe
               Data.ByteString.Internal.ByteString
               Data.ByteString.Internal.ByteString
               Data.Void.Void
               ()
               IO
               Int
    main6 = main9 main8 (main7 `cast` ...)
    
    So if main6 generates a big data structure, it will be retained across both calls.
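
For readers less familiar with GHC core: the retention described here is ordinary CAF behaviour. A minimal standalone illustration (mine, not from the ticket; compiled without optimization, so GHC cannot simply fuse the list away):

-- bigList is a top-level constant applicative form (CAF): once the first
-- length forces it, the whole evaluated list stays reachable for the second
-- use instead of being rebuilt and garbage collected in between, so the
-- program's residency is roughly the size of the list.
bigList :: [Int]
bigList = [1 .. 1000000]

main :: IO ()
main = do
    print (length bigList)
    print (length bigList)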

Back to you

comment:2 Changed 3 years ago by snoyberg

OK, I've got a version of this that only relies on base now:

import System.IO (withBinaryFile, IOMode (ReadMode), Handle, hIsEOF, hGetChar)

main :: IO ()
main = do
    action
    --action

action :: IO ()
action = do
    _ <- withBinaryFile "1mb" ReadMode
       $ \h -> connect (sourceHandle h) sinkCount
    return ()

data Conduit i o m r
    = Pure r
    | M (m (Conduit i o m r))
    | Await (i -> Conduit i o m r) (Conduit i o m r)
    | Yield (Conduit i o m r) o

sourceHandle :: Handle -> Conduit i Char IO ()
sourceHandle h =
    loop
  where
    loop = M $ do
        isEof <- hIsEOF h
        if isEof
            then return $ Pure ()
            else do
                c <- hGetChar h
                return $ Yield loop c

sinkCount :: Monad m => Conduit i o m Int
sinkCount =
    loop 0
  where
    loop cnt = Await
        (\_ -> loop $! cnt + 1)
        (Pure cnt)

connect :: Monad m => Conduit a b m r' -> Conduit b c m r -> m r
connect _ (Pure r) = return r
connect (Yield left b) (Await right _) = connect left (right b)
connect (Pure x) (Await _ right) = connect (Pure x) right
connect (M mleft) right = mleft >>= flip connect right

Same behaviour as before with respect to the second call to action. I'll attach a heap profile showing the large memory usage.

Changed 3 years ago by snoyberg

Attachment: mem.ps added

base-only heap profile

comment:3 Changed 3 years ago by snoyberg

So if main6 generates a big data structure, it will be retained across both calls.

Well, that's sort of the idea: conduit is essentially a free monad, and it's evaluated by interpreting steps like "wait for next input" or "provide next value." What *should* be happening is that it creates a value indicating the next step, and that value is immediately consumed and garbage collected. Instead, for some reason it's maintaining this structure between multiple calls, even though the two data structures will not match (note that the Handle used by each loop will be different).
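
To make the intended stepping concrete, here is a hand-evaluated sketch of one read step, using the base-only Conduit and connect above (the continuation names onInput and onEOF are mine, purely for readability):

connect (M readStep) (Await onInput onEOF)
  -- by: connect (M mleft) right = mleft >>= flip connect right
  = readStep >>= \next -> connect next (Await onInput onEOF)
  -- readStep returns either (Pure ()) at EOF or (Yield loop c)
connect (Yield loop c) (Await onInput onEOF)
  -- by: connect (Yield left b) (Await right _) = connect left (right b)
  = connect loop (onInput c)

After each such step the consumed M, Yield and Await nodes should become garbage; a leak appears if something, such as a floated top-level value, keeps the chain of sink states reachable instead.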

Hopefully the base-only version of the code will demonstrate the issue more clearly.

comment:4 Changed 3 years ago by snoyberg

Apologies if this is becoming repetitive, but here's a slightly simpler version demonstrating the same issue:

import System.IO

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

feed :: Handle -> Sink Char r -> IO r
feed h =
    loop
  where
    loop (Sink f g) = do
        eof <- hIsEOF h
        if eof
            then return g
            else do
                c <- hGetChar h
                loop $! f c

action :: IO ()
action = withBinaryFile "1mb" ReadMode $ \h -> do
    feed h sinkCount
    return ()

main :: IO ()
main = do
    action
    action

The following code, however, does *not* demonstrate the problem:

import System.IO

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

feed :: Sink Char r -> IO r
feed =
    loop 10000000
  where
    loop 0 (Sink _ g) = return g
    loop i (Sink f _) = loop (i - 1) (f 'A')

action :: IO ()
action = do
    feed sinkCount
    return ()

main :: IO ()
main = do
    action
    action

comment:5 Changed 3 years ago by meteficha

Cc: meteficha added

comment:6 Changed 3 years ago by snoyberg

As pointed out by Bryan Vicknair in the cafe thread, my first example in my previous comment does not always leak memory. In particular, I had to turn on optimizations (either -O or -O2) to get it to happen.

comment:7 in reply to:  4 Changed 3 years ago by int-e

Replying to snoyberg: Without looking at the core, sinkCount has the potential to become a large shared data structure, if the loop $! cnt + 1 part is floated out like so:

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = let sink' = loop $! cnt + 1 in Sink (\_ -> sink') cnt

Then the run-time representation of (\_ -> sink') is a closure that points to the next sink, sink'. The first time sinkCount is used it'll produce many sinks (Sink (\_ -> sink') cnt) for increasing counts, each pointing to the next.

Ideally, sinkCount and feed should be fused, but that requires inlining parts of sinkCount, which, given its recursive definition, is tricky.
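
For reference, here is a sketch (my illustration, not something GHC produces for this program) of what the fused loop would look like if that inlining did happen: the counter is threaded directly through the IO loop, so there is no Sink structure left to float or share.

import System.IO

feedCount :: Handle -> IO Int
feedCount h =
    loop 0
  where
    -- the strict accumulator replaces the chain of Sink closures entirely
    loop cnt = do
        eof <- hIsEOF h
        if eof
            then return cnt
            else do
                _ <- hGetChar h
                loop $! cnt + 1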

Last edited 3 years ago by int-e (previous) (diff)

comment:8 Changed 3 years ago by edsko

Cc: edsko added

comment:9 Changed 10 months ago by edsko

I have no answers here, just more questions. I ran into this problem again with a large project that uses conduit. My program suffered from a large memory leak, and in the -hy profile the types were reported as ->Pipe and Sink; moreover, the -hc profile told me memory was being retained by a CAF. All of this pointed to the exact problem discussed in this ticket, and indeed adding

{-# OPTIONS_GHC -fno-full-laziness #-}

to the top of my module got rid of the problem. However, I can't say I fully understand what is going on. Experimenting with @snoyberg's examples above, I noticed that the memory behaviour of these modules interacts in odd ways with profiling options, which doesn't make this any easier! For @snoyberg's first example (https://ghc.haskell.org/trac/ghc/ticket/9520#comment:4):

    | No profiling | -prof   | -prof -fprof-auto
----+--------------+---------+------------------    
-O0 | OK           | OK      | OK
-O1 | OK           | OK      | LEAK(1)
-O2 | OK           | OK      | LEAK(1)

where OK means "runs in constant space" and LEAK(1) indicates a memory leak consisting of Int, ->Sink and Sink, according to +RTS -hy. In other words, this example leaks only when both optimization and -fprof-auto are specified (-prof by itself is not enough).

Bizarrely, for the second example the behaviour is reversed (perhaps this is why Michael concluded that this example "however, does not demonstrate the problem"?):

    | No profiling | -prof   | -prof -fprof-auto
----+--------------+---------+------------------    
-O0 | OK           | OK      | OK
-O1 | LEAK         | LEAK(1) | OK
-O2 | LEAK         | LEAK(1) | OK

Unlike for the first example, here we also get a LEAK without any profiling enabled (as indicated by a very high maximum residency reported by +RTS -s).

I added a third example:

foreign import ccall "doNothing" doNothing :: IO ()

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

feed :: Sink Char r -> IO r
feed =
    loop 10000000
  where
    loop 0 (Sink _ g) = return g
    loop i (Sink f _) = doNothing >> loop (i - 1) (f 'A')

action :: IO ()
action = do
    feed sinkCount
    return ()

main :: IO ()
main = do
    action
    action

This differs from @snoyberg's second example only in the additional call to doNothing in feed.loop; doNothing is defined in an external .c file:

void doNothing() {}

(I used an externally defined C function because I wanted something that the optimizer couldn't get rid of but without getting all kinds of crud about Handles etc in the core/STG output, which is what would happen with a print statement, say.) I have no idea why, but this program's memory behaviour is quite different from version 2:

    | No profiling | -prof   | -prof -fprof-auto
----+--------------+---------+------------------    
-O0 | LEAK         | LEAK(2) | LEAK(2)
-O1 | LEAK         | LEAK(1) | LEAK(1)
-O2 | LEAK         | LEAK(1) | LEAK(1)

Now this program leaks no matter what we do, although the LEAK(2) reported here, according to +RTS -hy, consists of a different type (a single type, in fact: PAP).

Getting to the bottom of this would require more time than I currently have; I guess the take-away for me for now is: full laziness is dangerous when using free monads such as conduit.
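
Concretely, the per-module mitigation used above looks like this (the module name and exports are illustrative; note that -fno-full-laziness is a blunt per-module switch and may cost useful sharing elsewhere):

{-# OPTIONS_GHC -fno-full-laziness #-}
module Pipeline (Sink (..), sinkCount) where

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    -- with full laziness disabled, GHC does not float (loop $! cnt + 1)
    -- out of the lambda, so the chain of successive sinks is not kept
    -- alive behind a top-level CAF
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt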

comment:10 Changed 10 months ago by edsko

As Andres points out, my version 3 is not particularly enlightening; without any optimization, the loop in feed is not tail recursive (making it tail recursive relies on unfolding of >>); with optimization, we are bitten by the full laziness problem again. Bit of a red herring. (I had been trying to simulate the hIsEOF call in the original example.)

So I guess the only take away from my experimentation is: CAFs/full laziness and profiling modes interact in ways that make memory behaviour very unpredictable. Proceed with caution.

comment:11 Changed 9 months ago by edsko

comment:12 Changed 9 months ago by edsko

comment:13 Changed 9 months ago by edsko

Resolution: invalid
Status: new → closed

Ok, I believe there is no ghc bug here, although admittedly these issues are incredibly subtle. The memory leak comes from the full laziness optimization, as @int-e points out. The interaction with -fprof-auto that I observed comes from the interaction of cost centres with the state hack. And the difference between Michael's two examples turns out to be unimportant; if you split example one into two separate modules, so that the optimizer gets less chance to optimize the heck out of it, the memory behaviour of both examples is identical.
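
For anyone wanting to reproduce that module split, here is a sketch (module names are mine) of splitting example one into two files, so that sinkCount is opaque to the optimizer at its use site:

-- Sink.hs
module Sink (Sink (..), sinkCount) where

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

-- Main.hs
module Main (main) where

import System.IO
import Sink

feed :: Handle -> Sink Char r -> IO r
feed h =
    loop
  where
    loop (Sink f g) = do
        eof <- hIsEOF h
        if eof
            then return g
            else do
                c <- hGetChar h
                loop $! f c

main :: IO ()
main = do
    action
    action
  where
    action = withBinaryFile "1mb" ReadMode $ \h -> feed h sinkCount >> return ()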

I've written a long blog post that explores these issues in great detail; will publish soon.

comment:14 Changed 9 months ago by simonpj

Edsko: when you've written the post, do link it from this ticket so that people following the trail can find it. (Both as a comment and in the main Description, I suggest.)

Thanks

Simon

comment:15 Changed 9 months ago by edsko

Sure. The article is now published at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . It discusses all the issues mentioned in this ticket (including, as a bonus, why -fprof-auto has the effect it has :). As requested, I'll also add a link in the ticket description.

comment:16 Changed 9 months ago by edsko

Description: modified (diff)

comment:17 Changed 9 months ago by simonpj

Terrific. I've added it to the Haskell Performance Resource.

Simon
