Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#7556 closed bug (fixed)

build/fold causes with ByteString unpack causes huge memory leak

Reported by: glguy Owned by: duncan
Priority: normal Milestone: 7.8.1
Component: libraries (other) Version: 7.6.1
Keywords: Cc: shachaf@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

module Main where

import qualified Data.ByteString as B
import Data.Word (Word8)

-- works fast without optimizations
-- with optimizations this has a space leak
-- seems related to fold/build fusion in foldr/unpack

main :: IO ()
main = do

let b = B.replicate 100000000 1
print $ B.length b
print $ example1 b -- fast
print $ example2 b -- slow

search :: [Word8] -> Bool
search [] = False

search (x:xs) = x == 1
search xs

example1, example2 :: B.ByteString -> Bool
example1 = search . B.unpack

example2 = foldr (\x xs -> x == 1
xs) False . B.unpack

Change History (6)

comment:1 Changed 2 years ago by glguy

module Main where

import qualified Data.ByteString as B
import Data.Word (Word8)

-- works fast without optimizations
-- with optimizations this has a space leak
-- seems related to fold/build fusion in foldr/unpack

main :: IO ()
main = do
  let b = B.replicate 100000000 1
  print $ B.length b
  print $ example1 b -- fast
  print $ example2 b -- slow

search :: [Word8] -> Bool
search [] = False
search (x:xs) = x == 1 || search xs

example1, example2 :: B.ByteString -> Bool
example1 = search . B.unpack
example2 = foldr (\x xs -> x == 1 || xs) False . B.unpack

comment:2 Changed 2 years ago by shachaf

  • Cc shachaf@… added

The issue is here:

unpack ps = build (unpackFoldr ps)
{-# INLINE unpack #-}

--
-- Have unpack fuse with good list consumers
--
-- critical this isn't strict in the acc
-- as it will break in the presence of list fusion. this is a known
-- issue with seq and build/foldr rewrite rules, which rely on lazy
-- demanding to avoid bottoms in the list.
--
unpackFoldr :: ByteString -> (Word8 -> a -> a) -> a -> a
unpackFoldr (PS fp off len) f ch = withPtr fp $ \p -> do
    let loop q n    _   | q `seq` n `seq` False = undefined -- n.b.
        loop _ (-1) acc = return acc
        loop q n    acc = do
           a <- peekByteOff q n
           loop q (n-1) (a `f` acc)
    loop (p `plusPtr` off) (len-1) ch
{-# INLINE [0] unpackFoldr #-}

{-# RULES
"ByteString unpack-list" [1]  forall p  .
    unpackFoldr p (:) [] = unpackBytes p

When we use foldr, foldr/build fusion turns the whole expression into an application of unpackFoldr, which is tail recursive and therefore not sufficiently lazy -- but also not strict in the accumulator, so it builds up a big thunk. In example1, fusion doesn't happen, so the fold happens over unpackBytes instead, which generates list in small chunks that can be processed lazily.

This looks like a bytestring bug to me.

comment:3 Changed 2 years ago by igloo

  • Component changed from libraries/base to libraries (other)
  • difficulty set to Unknown
  • Milestone set to 7.8.1
  • Owner set to duncan

Thanks for the diagnosis.

Duncan, could you take a look please?

comment:4 Changed 2 years ago by duncan

Crikey that's some old code. It's also totally wrong, as are the implementations of foldl and foldr. Patch in the works.

comment:5 Changed 2 years ago by duncan

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in bytestring head, will be in next point releases.

Tue Jan  8 16:43:21 GMT 2013  Duncan Coutts <[email protected]>
  * Re-implement the foldr and foldl functions and fix unpack fusion
  They were just wrong. The old foldr and foldl were doing strict
  accumulation when they should be lazy.
    
  Also, the fusion for (List.foldr f z . BS.unpack) was using a
  tail-recursive variant on foldr (though not strictly accumulating)
  which meant it would build up a huge chain of thunks when it should
  be lazy and run in linear space. See ghc ticket 7556

comment:6 Changed 2 years ago by ian@…

commit 77e78416e97d96b55c299c6ccdd1741d38372052

Author: Ian Lynagh <[email protected]>
Date:   Fri Jan 11 14:39:46 2013 +0000

    Update bytestring and terminfo repos
    
    bytestring fixes #7556.
    terminfo fixes #7281.

 libraries/bytestring |    2 +-
 libraries/terminfo   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Note: See TracTickets for help on using tickets.