Opened 5 years ago

Closed 2 years ago

#7170 closed bug (fixed)

Foreign.Concurrent finalizer called twice in some cases

Reported by: joeyadams Owned by: niteria
Priority: high Milestone: 7.6.1
Component: Core Libraries Version: 7.8.3
Keywords: Cc: hvr, ekmett, core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime crash Test Case: ffi/should_run/7170
Blocked By: Blocking:
Related Tickets: Differential Rev(s): D921
Wiki Page:

Description

When Foreign.Concurrent.newForeignPtr is used, here's a case where the finalizer is called twice:

{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent
import Control.Exception    (bracket)
import Foreign.Ptr          (Ptr, intPtrToPtr)
import Foreign.ForeignPtr   (ForeignPtr)
import qualified Foreign.Concurrent as FC
import qualified Foreign.ForeignPtr as FP

testForeignPtr_Concurrent :: Ptr a -> IO (ForeignPtr a)
testForeignPtr_Concurrent ptr = FC.newForeignPtr ptr (fin ptr)

fin :: Ptr a -> IO ()
fin ptr = putStrLn $ "finalizing " ++ show ptr

main :: IO ()
main = do
    mv <- newEmptyMVar
    bracket (testForeignPtr_Concurrent $ intPtrToPtr 1)
            FP.finalizeForeignPtr $ \_ ->
        -- hang, so the thread and foreign pointer get GCed
        takeMVar mv

This produces the following output:

finalizing 0x0000000000000001
finalizing 0x0000000000000001
foreignptr: thread blocked indefinitely in an MVar operation

This happens on GHC 7.4.2 and 7.6.0.20120810, with and without -threaded.

This can easily lead to segfaults when you have an FFI library that does this:

create :: IO Object
create = do
    ptr <- c_create
    CObject <$> newForeignPtr ptr (finalize ptr)

finalize :: Ptr CObject -> IO ()
finalize ptr = do
    ...

destroy :: Object -> IO ()
destroy (Object fptr) =
    finalizeForeignPtr fptr

And application code does this:

bracket create destroy $ \obj -> do
    ...

Attachments (2)

custom_finalizer.diff (1006 bytes) - added by arybczak 3 years ago.
add verbose finalizer
valgrind-output.txt (136.3 KB) - added by arybczak 3 years ago.
output of running the testcase with valgrind

Download all attachments as: .zip

Change History (13)

comment:1 Changed 5 years ago by joeyadams

I'm guessing that the problem is in this function from GHC.ForeignPtr:

foreignPtrFinalizer :: IORef (Finalizers, [IO ()]) -> IO ()
foreignPtrFinalizer r = do (_, fs) <- readIORef r; sequence_ fs

This invokes the finalizers, but does not empty out the list. When the application calls finalizeForeignPtr later, the same list of finalizers is run again.

While we're at it, this really bothers me:

finalizeForeignPtr (ForeignPtr _ foreignPtr) = do
        (ftype, finalizers) <- readIORef refFinalizers
        sequence_ finalizers
        writeIORef refFinalizers (ftype, [])

This isn't thread safe or exception safe. Finalizers can be repeated if another thread comes along and calls finalizeForeignPtr on the same pointer, or if a finalizer throws an exception.

A reasonable solution might be to wrap incoming finalizers with this:

once :: IO () -> IO (IO ())
once io = do
    mv <- newMVar io
    return $ tryTakeMVar mv >>= maybe (return ()) id

This way, we don't have to worry about finalizers being called twice.

comment:2 Changed 5 years ago by simonmar

difficulty: Unknown
Milestone: 7.6.1
Owner: set to simonmar
Priority: normalhigh

comment:3 Changed 5 years ago by simonmar

Component: Runtime Systemlibraries/base
Status: newmerge
Test Case: ffi/should_run/7170

Fixed:

commit 895dd47937c6c9340bf4f289f9f43d5f9be9ffcc

Author: Simon Marlow <marlowsd@gmail.com>
Date:   Tue Aug 21 15:42:50 2012 +0100

    Remove finalizers from a ForeignPtr atomically (#7170)

comment:4 Changed 5 years ago by pcapriotti

Resolution: fixed
Status: mergeclosed

Changed 3 years ago by arybczak

Attachment: custom_finalizer.diff added

add verbose finalizer

Changed 3 years ago by arybczak

Attachment: valgrind-output.txt added

output of running the testcase with valgrind

comment:5 Changed 3 years ago by arybczak

Cc: hvr ekmett added
Owner: simonmar deleted
Resolution: fixed
Status: closednew
Version: 7.4.27.8.3

This is still happening, albeit is much harder to reproduce (I wasn't able to write simple testcase for that).

Steps to reproduce (hopefully):

  1. git clone https://github.com/scrive/hpqtypes.git
  2. git checkout 8b4d57876dc573759c2935064306806b793844c5 (this is the last commit before the workaround I applied to solve the issue)
  3. git apply custom_finalizer.diff (to be sure it's haskell that tries to free the same memory twice)
  4. Change tests/Main.hs to the following code:
    module Main where
    
    import Control.Applicative
    import Control.Concurrent
    import Control.Exception
    import Control.Monad
    import Debug.Trace
    import System.Environment
    import qualified Data.ByteString.Char8 as BS
    
    import Database.PostgreSQL.PQTypes
    
    mkConnSource :: String -> ConnectionSource
    mkConnSource ci = defaultSource $ defaultSettings {
        csConnInfo = BS.pack ci
      }
    
    main :: IO ()
    main = do
      cs <- mkConnSource . head <$> getArgs
      let dts = defaultTransactionSettings
    
      flip finally (runDBT cs dts $ return ()) $ do
        void . forkIO $ do
          forM_ [1..100000::Int] $ \n -> trace (show n) $ return ()
          return ()
        threadDelay 1000000
    
  5. cabal sandbox init
  6. cabal install --dependencies-only
  7. cabal configure -ftests --ghc-options="-O0"
  8. cabal build
  9. cd dist/build/tests
  10. valgrind ./tests user=postgres

The last step must be repeated a couple of times to get the error we are interested in. I attach the output of running it on my machine (valgrind-output.txt).

If you look at the output file, at line 1112 I pressed Ctrl-C to interrupt the program, then the finalizer was called for the first time at line 1880 and then for the second time at line 2255, which results in tons of output from valgrind about invalid reads of already freed memory.

Relevant file: https://github.com/scrive/hpqtypes/blob/8b4d57876dc573759c2935064306806b793844c5/src/Database/PostgreSQL/PQTypes/Internal/Connection.hs

runDBT calls withConnection of the defaultSource (line 89), which is a simple wrapper for bracket connect disconnect. In disconnect (line 152) the destruction of the database connection is forced by calling finalizeForeignPtr. When finalizeForeignPtr is removed, the bug disappears.

comment:6 Changed 3 years ago by thomie

Cc: core-libraries-committee@… added
Component: libraries/baseCore Libraries
Owner: set to ekmett

comment:7 Changed 3 years ago by ekmett

Owner: ekmett deleted

Relinquishing control over this ticket: It requires someone with a better grasp of the runtime issues than me, and having it assigned to me, may cause someone who could solve it to pass it by.

comment:8 Changed 2 years ago by niteria

Owner: set to niteria

comment:9 Changed 2 years ago by niteria

Differential Rev(s): D921
Status: newpatch

This turned out to be a race condition that happened when a finalizer of a last generation was called manually and no major gc before shutdown.

comment:10 Changed 2 years ago by Simon Marlow <marlowsd@…>

In dfdc50d666498c5a1118557d67209fe067c61cc1/ghc:

Don't call DEAD_WEAK finalizer again on shutdown (#7170)

Summary:
There's a race condition like this:

  # A foreign pointer gets promoted to the last generation
  # It has its finalizer called manually
  # We start shutting down the runtime in `hs_exit_` from the main
    thread
  # A minor GC starts running (`scheduleDoGC`) on one of the threads
  # The minor GC notices that we're in `SCHED_INTERRUPTING` state and
    advances to `SCHED_SHUTTING_DOWN`
  # The main thread tries to do major GC (with `scheduleDoGC`), but it
    exits early because we're in `SCHED_SHUTTING_DOWN` state
  # We end up with a `DEAD_WEAK` left on the list of weak pointers of
    the last generation, because it relied on major GC removing it from
    that list

This change:
  * Ignores DEAD_WEAK finalizers when shutting down
  * Makes the major GC on shutdown more likely
  * Fixes a bogus assert

Test Plan:
before this diff https://ghc.haskell.org/trac/ghc/ticket/7170#comment:5
reproduced and after it doesn't

Reviewers: ezyang, austin, simonmar

Reviewed By: simonmar

Subscribers: bgamari, thomie

Differential Revision: https://phabricator.haskell.org/D921

GHC Trac Issues: #7170

comment:11 Changed 2 years ago by niteria

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.