Opened 3 years ago

Closed 9 months ago

Last modified 9 months ago

#9520 closed bug (invalid)

Running an action twice uses much more memory than running it once

Reported by: snoyberg Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.3
Keywords: Cc: meteficha, edsko
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #8457, #12620 Differential Rev(s):
Wiki Page:

Description (last modified by edsko)

EDIT: A detailed analysis of the problems discussed in this ticket can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . There is no ghc bug here, as such, except perhaps #8457 "-ffull-laziness does more harm than good". See also #12620 "Allow the user to prevent floating and CSE".

This started as a Haskell-cafe discussion about conduit. This may be related to #7206, but I can't be certain. It's possible that GHC is not doing anything wrong here, but I can't see how the code in question could be misbehaving in a way that triggers this memory usage.

Consider the following code, which depends on conduit-1.1.7 and conduit-extra:

import Data.Conduit ( Sink, (=$), ($$), await )
import qualified Data.Conduit.Binary as CB
import System.IO (withBinaryFile, IOMode (ReadMode))

main :: IO ()
main = do
    action "random.gz"
    --action "random.gz"

action :: FilePath -> IO ()
action filePath = withBinaryFile filePath ReadMode $ \h -> do
    _ <- CB.sourceHandle h
      $$ CB.lines
      =$ sink2 1
    return ()

sink2 :: (Monad m) => Int -> Sink a m Int
sink2 state = do
  maybeToken <- await
  case maybeToken of
    Nothing     -> return state
    Just _      -> sink2 $! state + 1

The code should open the file "random.gz" (I simply gzipped about 10MB of data from /dev/urandom), break it into chunks at each newline character, and then count the number of lines. When I run it as-is, it uses 53KB of memory, which seems reasonable.

However, if I uncomment the second call to action in main, maximum residency shoots up to 45MB (this seems to be linear in the size of the input file). I additionally tried copying random.gz into two files, random1.gz and random2.gz, and changing the two calls to action to use different file names. It still resulted in large memory usage.

I'm going to keep working on reducing this to a smaller test case, but I wanted to start with what I have so far. I'll also attach the core generated by both the low-memory and high-memory versions.

Attachments (3)

good.core (24.7 KB) - added by snoyberg 3 years ago.
Low memory
bad.core (25.8 KB) - added by snoyberg 3 years ago.
High memory
mem.ps (2.1 KB) - added by snoyberg 3 years ago.
base-only heap profile


Change History (20)

Changed 3 years ago by snoyberg

Attachment: good.core added

Low memory

Changed 3 years ago by snoyberg

Attachment: bad.core added

High memory

comment:1 Changed 3 years ago by simonpj

Well I don't understand conduit. But looking at bad.core I see:

  • main3 is called twice (in the RHS of main1). This corresponds to the two calls of action.
  • So I wonder if there are any values shared between the two calls of main3. These will be top-level CAFs.
  • Aha yes! main5 is shared. But it's fine: it is simply Done ().
  • Aha again! We see
    main6 :: Data.Conduit.Internal.Pipe
               Data.ByteString.Internal.ByteString
               Data.ByteString.Internal.ByteString
               Data.Void.Void
               ()
               IO
               Int
    main6 = main9 main8 (main7 `cast` ...)
    
    So if main6 generates a big data structure, it will be retained across both calls.
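
For readers less familiar with GHC core: the retention described here is ordinary CAF behaviour. A minimal standalone illustration (mine, not from the ticket; compiled without optimization, so GHC cannot simply fuse the list away):

-- bigList is a top-level constant applicative form (CAF): once the first
-- length forces it, the whole evaluated list stays reachable for the second
-- use instead of being rebuilt and garbage collected in between, so the
-- program's residency is roughly the size of the list.
bigList :: [Int]
bigList = [1 .. 1000000]

main :: IO ()
main = do
    print (length bigList)
    print (length bigList)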

Back to you

comment:2 Changed 3 years ago by snoyberg

OK, I've got a version of this that only relies on base now:

import System.IO (withBinaryFile, IOMode (ReadMode), Handle, hIsEOF, hGetChar)

main :: IO ()
main = do
    action
    --action

action :: IO ()
action = do
    _ <- withBinaryFile "1mb" ReadMode
       $ \h -> connect (sourceHandle h) sinkCount
    return ()

data Conduit i o m r
    = Pure r
    | M (m (Conduit i o m r))
    | Await (i -> Conduit i o m r) (Conduit i o m r)
    | Yield (Conduit i o m r) o

sourceHandle :: Handle -> Conduit i Char IO ()
sourceHandle h =
    loop
  where
    loop = M $ do
        isEof <- hIsEOF h
        if isEof
            then return $ Pure ()
            else do
                c <- hGetChar h
                return $ Yield loop c

sinkCount :: Monad m => Conduit i o m Int
sinkCount =
    loop 0
  where
    loop cnt = Await
        (\_ -> loop $! cnt + 1)
        (Pure cnt)

connect :: Monad m => Conduit a b m r' -> Conduit b c m r -> m r
connect _ (Pure r) = return r
connect (Yield left b) (Await right _) = connect left (right b)
connect (Pure x) (Await _ right) = connect (Pure x) right
connect (M mleft) right = mleft >>= flip connect right

Same behaviour as before with respect to the second call to action. I'll attach a heap profile showing the large memory usage.

Changed 3 years ago by snoyberg

Attachment: mem.ps added

base-only heap profile

comment:3 Changed 3 years ago by snoyberg

So if main6 generates a big data structure, it will be retained across both calls.

Well, that's sort of the idea: conduit is essentially a free monad, and it's evaluated by interpreting steps like "wait for next input" or "provide next value." What *should* be happening is that it creates a value indicating the next step, and that value is immediately consumed and garbage collected. Instead, for some reason it's maintaining this structure between multiple calls, even though the two data structures will not match (note that the Handle used by each loop will be different).
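
To make the intended stepping concrete, here is a hand-evaluated sketch of one read step, using the base-only Conduit and connect above (the continuation names onInput and onEOF are mine, purely for readability):

connect (M readStep) (Await onInput onEOF)
  -- by: connect (M mleft) right = mleft >>= flip connect right
  = readStep >>= \next -> connect next (Await onInput onEOF)
  -- readStep returns either (Pure ()) at EOF or (Yield loop c)
connect (Yield loop c) (Await onInput onEOF)
  -- by: connect (Yield left b) (Await right _) = connect left (right b)
  = connect loop (onInput c)

After each such step the consumed M, Yield and Await nodes should become garbage; a leak appears if something, such as a floated top-level value, keeps the chain of sink states reachable instead.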

Hopefully the base-only version of the code will demonstrate the issue more clearly.

comment:4 Changed 3 years ago by snoyberg

Apologies if this is becoming repetitive, but here's a slightly simpler version demonstrating the same issue:

import System.IO

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

feed :: Handle -> Sink Char r -> IO r
feed h =
    loop
  where
    loop (Sink f g) = do
        eof <- hIsEOF h
        if eof
            then return g
            else do
                c <- hGetChar h
                loop $! f c

action :: IO ()
action = withBinaryFile "1mb" ReadMode $ \h -> do
    feed h sinkCount
    return ()

main :: IO ()
main = do
    action
    action

The following code, however, does *not* demonstrate the problem:

import System.IO

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

feed :: Sink Char r -> IO r
feed =
    loop 10000000
  where
    loop 0 (Sink _ g) = return g
    loop i (Sink f _) = loop (i - 1) (f 'A')

action :: IO ()
action = do
    feed sinkCount
    return ()

main :: IO ()
main = do
    action
    action

comment:5 Changed 3 years ago by meteficha

Cc: meteficha added

comment:6 Changed 3 years ago by snoyberg

As pointed out by Bryan Vicknair in the cafe thread, my first example in my previous comment does not always leak memory. In particular, I had to turn on optimizations (either -O or -O2) to get it to happen.

comment:7 in reply to:  4 Changed 3 years ago by int-e

Replying to snoyberg: Without looking at the core, sinkCount has the potential to become a large shared data structure, if the loop $! cnt + 1 part is floated out like so:

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = let sink' = loop $! cnt + 1 in Sink (\_ -> sink') cnt

Then the run-time representation of (\_ -> sink') is a closure that points to the next sink, sink'. The first time sinkCount is used it'll produce many sinks (Sink (\_ -> sink') cnt) for increasing counts, each pointing to the next.

Ideally, sinkCount and feed should be fused, but that requires inlining parts of sinkCount, which, given its recursive definition, is tricky.
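
For reference, here is a sketch (my illustration, not something GHC produces for this program) of what the fused loop would look like if that inlining did happen: the counter is threaded directly through the IO loop, so there is no Sink structure left to float or share.

import System.IO

feedCount :: Handle -> IO Int
feedCount h =
    loop 0
  where
    -- the strict accumulator replaces the chain of Sink closures entirely
    loop cnt = do
        eof <- hIsEOF h
        if eof
            then return cnt
            else do
                _ <- hGetChar h
                loop $! cnt + 1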

Last edited 3 years ago by int-e (previous) (diff)

comment:8 Changed 3 years ago by edsko

Cc: edsko added

comment:9 Changed 10 months ago by edsko

I have no answers here, just more questions. I ran into this problem again with a large project that uses conduit. My program suffered from a large memory leak, and in the -hy profile the types were reported as ->Pipe and Sink; moreover, the -hc profile told me memory was being retained by a CAF. All of this pointed to the exact problem discussed in this ticket, and indeed adding

{-# OPTIONS_GHC -fno-full-laziness #-}

to the top of my module got rid of the problem. However, I can't say I fully understand what is going on. Experimenting with @snoyberg's examples above, I noticed that the memory behaviour of these modules interacts in odd ways with profiling options, which doesn't make this any easier! For @snoyberg's first example (https://ghc.haskell.org/trac/ghc/ticket/9520#comment:4):

    | No profiling | -prof   | -prof -fprof-auto
----+--------------+---------+------------------    
-O0 | OK           | OK      | OK
-O1 | OK           | OK      | LEAK(1)
-O2 | OK           | OK      | LEAK(1)

where OK means "runs in constant space" and LEAK(1) indicates a memory leak consisting of Int, ->Sink and Sink, according to +RTS -hy. In other words, this example leaks only when both optimization and -fprof-auto are specified (-prof by itself is not enough).

Bizarrely, for the second example the behaviour is reversed (perhaps this is why Michael concluded that this example "however, does not demonstrate the problem"?):

    | No profiling | -prof   | -prof -fprof-auto
----+--------------+---------+------------------    
-O0 | OK           | OK      | OK
-O1 | LEAK         | LEAK(1) | OK
-O2 | LEAK         | LEAK(1) | OK

Unlike for the first example, here we also get a LEAK without any profiling enabled (as indicated by a very high maximum residency reported by +RTS -s).

I added a third example:

foreign import ccall "doNothing" doNothing :: IO ()

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

feed :: Sink Char r -> IO r
feed =
    loop 10000000
  where
    loop 0 (Sink _ g) = return g
    loop i (Sink f _) = doNothing >> loop (i - 1) (f 'A')

action :: IO ()
action = do
    feed sinkCount
    return ()

main :: IO ()
main = do
    action
    action

This differs from @snoyberg's second example only in the additional call to doNothing in feed.loop; doNothing is defined in an external .c file:

void doNothing() {}

(I used an externally defined C function because I wanted something that the optimizer couldn't get rid of but without getting all kinds of crud about Handles etc in the core/STG output, which is what would happen with a print statement, say.) I have no idea why, but this program's memory behaviour is quite different from version 2:

    | No profiling | -prof   | -prof -fprof-auto
----+--------------+---------+------------------    
-O0 | LEAK         | LEAK(2) | LEAK(2)
-O1 | LEAK         | LEAK(1) | LEAK(1)
-O2 | LEAK         | LEAK(1) | LEAK(1)

Now this program leaks no matter what we do, although the LEAK(2) reported here, according to +RTS -hy, consists of a different type (a single type, in fact: PAP).

Getting to the bottom of this would require more time than I currently have; I guess the take-away for me for now is: full laziness is dangerous when using free monads such as conduit.
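
Concretely, the per-module mitigation used above looks like this (the module name and exports are illustrative; note that -fno-full-laziness is a blunt per-module switch and may cost useful sharing elsewhere):

{-# OPTIONS_GHC -fno-full-laziness #-}
module Pipeline (Sink (..), sinkCount) where

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    -- with full laziness disabled, GHC does not float (loop $! cnt + 1)
    -- out of the lambda, so the chain of successive sinks is not kept
    -- alive behind a top-level CAF
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt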

comment:10 Changed 10 months ago by edsko

As Andres points out, my version 3 is not particularly enlightening; without any optimization, the loop in feed is not tail recursive (making it tail recursive relies on unfolding of >>); with optimization, we are bitten by the full laziness problem again. Bit of a red herring. (I had been trying to simulate the hIsEOF call in the original example.)

So I guess the only take away from my experimentation is: CAFs/full laziness and profiling modes interact in ways that make memory behaviour very unpredictable. Proceed with caution.

comment:11 Changed 9 months ago by edsko

comment:12 Changed 9 months ago by edsko

comment:13 Changed 9 months ago by edsko

Resolution: invalid
Status: new → closed

Ok, I believe there is no ghc bug here, although admittedly these issues are incredibly subtle. The memory leak comes from the full laziness optimization, as @int-e points out. The interaction with -fprof-auto that I observed comes from the interaction of cost centres with the state hack. And the difference between Michael's two examples turns out to be unimportant; if you split example one into two separate modules, so that the optimizer gets less chance to optimize the heck out of it, the memory behaviour of both examples is identical.
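
For anyone wanting to reproduce that module split, here is a sketch (module names are mine) of splitting example one into two files, so that sinkCount is opaque to the optimizer at its use site:

-- Sink.hs
module Sink (Sink (..), sinkCount) where

data Sink i r = Sink (i -> Sink i r) r

sinkCount :: Sink i Int
sinkCount =
    loop 0
  where
    loop cnt = Sink (\_ -> loop $! cnt + 1) cnt

-- Main.hs
module Main (main) where

import System.IO
import Sink

feed :: Handle -> Sink Char r -> IO r
feed h =
    loop
  where
    loop (Sink f g) = do
        eof <- hIsEOF h
        if eof
            then return g
            else do
                c <- hGetChar h
                loop $! f c

main :: IO ()
main = do
    action
    action
  where
    action = withBinaryFile "1mb" ReadMode $ \h -> feed h sinkCount >> return ()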

I've written a long blog post that explores these issues in great detail; will publish soon.

comment:14 Changed 9 months ago by simonpj

Edsko: when you've written the post, do link it from this ticket so that people following the trail can find it. (Both as a comment and in the main Description, I suggest.)

Thanks

Simon

comment:15 Changed 9 months ago by edsko

Sure. The article is now published at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . It discusses all the issues mentioned in this ticket (including, as a bonus, why -fprof-auto has the effect it has :). As requested, I'll also add a link in the ticket description.

comment:16 Changed 9 months ago by edsko

Description: modified (diff)

comment:17 Changed 9 months ago by simonpj

Terrific. I've added it to the Haskell Performance Resource.

Simon
