Opened 3 years ago

Closed 4 months ago

#10241 closed bug (wontfix)

BlockedIndefinitelyOnMVar thrown to the thread which is not blocked indefinitely

Reported by: asukamirai Owned by: simonmar
Priority: normal Milestone:
Component: Runtime System Version: 7.8.3
Keywords: Cc: simonmar, osa1, ndmitchell@…
Operating System: Unknown/Multiple Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: #10793 Differential Rev(s): Phab:D4644
Wiki Page:

Description

BlockedIndefinatelyOnMVar exception is thrown to the main thread in below source code although the thread is not blocked indefinitely.

module Main where

import qualified Control.Concurrent.MVar as MV
import qualified Control.Concurrent      as CC
import qualified Control.Exception       as E

main :: IO ()
main = do
    -- call this thread "threadA"
    mvar1 <- MV.newEmptyMVar :: IO (MV.MVar ())
    mvar2 <- MV.newEmptyMVar :: IO (MV.MVar ())
    _ <- CC.forkIO $ do
        -- call this thread "threadB"
        MV.takeMVar mvar1 `E.catch` errorHandler1
        putStrLn "after error catch"
        CC.threadDelay 1000000
        MV.putMVar mvar2 ()
        putStrLn "after putMVar"
    MV.readMVar mvar2 `E.catch` errorHandler2
    putStrLn "after readMVar"
    CC.threadDelay 5000000
    where
        errorHandler1 :: E.BlockedIndefinitelyOnMVar -> IO ()
        errorHandler1 e = putStrLn $ "errorHandler1 : " ++ show e
        errorHandler2 :: E.BlockedIndefinitelyOnMVar -> IO ()
        errorHandler2 e = putStrLn $ "errorHandler2 : " ++ show e

Save above as "mvar.hs" and run by ghc as below.

> runhaskell mvar.hs
errorHandler1 : thread blocked indefinitely in an MVar operation
errorHandler2 : thread blocked indefinitely in an MVar operation
after error catch
after readMVar
after putMVar

BlockedIndefinitelyOnMVar thrown for mvar1 is correct. It will be caught by errorHandler1 and "threadB" can continue to put the value to mvar2. It means that "threadA" can wait for the value of mvar2 and it is not blocked indefinately. However, BlockedIndefinitelyOnMVar is thrown for mvar2 on "threadA" before "threadB" puts value to the mvar2. I think it is incorrect.


I tested another case that adding "CC.threadDelay 10000000" before "readMVar" as below.

module Main where

import qualified Control.Concurrent.MVar as MV
import qualified Control.Concurrent      as CC
import qualified Control.Exception       as E

main :: IO ()
main = do
    -- call this thread "threadA"
    mvar1 <- MV.newEmptyMVar :: IO (MV.MVar ())
    mvar2 <- MV.newEmptyMVar :: IO (MV.MVar ())
    _ <- CC.forkIO $ do
        -- call this thread "threadB"
        MV.takeMVar mvar1 `E.catch` errorHandler1
        putStrLn "after error catch"
        CC.threadDelay 1000000
        MV.putMVar mvar2 ()
        putStrLn "after putMVar"
    CC.threadDelay 10000000 -- <-- this line is added
    MV.readMVar mvar2 `E.catch` errorHandler2
    putStrLn "after readMVar"
    CC.threadDelay 5000000
    where
        errorHandler1 :: E.BlockedIndefinitelyOnMVar -> IO ()
        errorHandler1 e = putStrLn $ "errorHandler1 : " ++ show e
        errorHandler2 :: E.BlockedIndefinitelyOnMVar -> IO ()
        errorHandler2 e = putStrLn $ "errorHandler2 : " ++ show e

And it will run correctly (BlockedIndefinitelyOnMVar is not thrown for mvar2).

> runhaskell mvar.hs
errorHandler1 : thread blocked indefinitely in an MVar operation
after error catch
after putMVar
after readMVar

I found this behavior is same on STM / BlockedIndefinitelyOnSTM.

module Main where

import qualified Control.Concurrent.STM as STM
import qualified Control.Concurrent     as CC
import qualified Control.Exception      as E

main :: IO ()
main = do
    tmv1 <- STM.newEmptyTMVarIO :: IO (STM.TMVar ())
    tmv2 <- STM.newEmptyTMVarIO :: IO (STM.TMVar ())
    _ <- CC.forkIO $ do
        STM.atomically (STM.takeTMVar tmv1) `E.catch` errorHandler1
        putStrLn "after error catch"
        CC.threadDelay 1000000
        STM.atomically $ STM.putTMVar tmv2 ()
        putStrLn "after putTMVar"
    STM.atomically (STM.readTMVar tmv2) `E.catch` errorHandler2
    putStrLn "after readTMVar"
    CC.threadDelay 5000000
    where
        errorHandler1 :: E.BlockedIndefinitelyOnSTM -> IO ()
        errorHandler1 e = putStrLn $ "errorHandler1 : " ++ show e
        errorHandler2 :: E.BlockedIndefinitelyOnSTM -> IO ()
        errorHandler2 e = putStrLn $ "errorHandler2 : " ++ show e
> runhaskell stm.hs
errorHandler1 : thread blocked indefinitely in an STM transaction
errorHandler2 : thread blocked indefinitely in an STM transaction
after error catch
after readTMVar
after putTMVar

I tested this in below versions/OSs and got same result (exception thrown for mvar2/tmv2).

ghc7.8.3 on Windows7 ghc7.8.3 on lubuntu14.04 on VirtualBox on Windows7 ghc7.8.4 on lubuntu14.04 on VirtualBox on Windows7 ghc7.10.1 on lubuntu14.04 on VirtualBox on Windows7

Similar report https://ghc.haskell.org/trac/ghc/ticket/8804 found but not the same. (In this case, the reference to the MVar is not weak)

Change History (7)

comment:1 Changed 5 months ago by osa1

Cc: osa1 added

I've been recently reading the relevant code, and I think this is indeed a bug.

Basically we raise an async BlockedIndefinitelyOnMVar when a thread is not reachable and blocked in an MVar operation.

Reachability in this context means:

  • If a thread has work to do it's reachable (because it's in run queue of one of the capabilities)
  • If a thread is blocked in an MVar it's reachable iff the MVar is reachable

Now suppose that thread_1 is blocked on mvar_1 and has a reference to mvar_2, and thread_2 is blocked on mvar_2, and both MVars become unreachable after a GC. This makes both thread_1 and thread_2 unreachable and we add them to the "thrads to resurrect" list. By the time we resurrect thread_1, mvar_2 becomes reachable, which makes thread_2 reachable as well. But we already added thread_2 to the "threads to resurrect" list so we raise an exception in thread_2 as well.

I think this is basically what's happening in this ticket.

Why adding a delay fixes this? Because by the time the main thread is paused with the delay it's in the run queue of the capability because it has more work to do. So the delay makes the main thread reachable for a few more GC cycles, and BlockedIndefinitelyOnMVar is only raised on resurrected (unreachable) threads.

I think fixing this may be easy; instead of resurrecting all unreachable threads at once, we resurrect one from the list, then continue with the scavenge loop. I'll try this once my tree builds.

comment:2 Changed 5 months ago by osa1

#10793 is basically the same problem.

comment:3 Changed 5 months ago by NeilMitchell

Cc: ndmitchell@… added

comment:4 Changed 5 months ago by osa1

I think fixing this may be easy; instead of resurrecting all unreachable threads at once, we resurrect one from the list, then continue with the scavenge loop. I'll try this once my tree builds.

This fix works: Phab:D4644. I'll now try to validate to see if it breaks anything.

comment:5 Changed 5 months ago by osa1

Differential Rev(s): Phab:D4644
Status: newpatch

Phab:D4644 passes validate.

comment:6 Changed 5 months ago by simonmar

This comes up from time to time. We can't pick an order to resurrect threads in because we don't know which one to resurrect first (as I commented on Phab:D4644). e.g. if we happened to resurrect the main thread first in this example, the result would be exactly the same.

We could do multiple traversals of the heap to figure out the relationship, but that's not practical. The only deterministic thing to do is to treat all unreachable threads as deadlocked and send them all an exception at the same time.

comment:7 Changed 4 months ago by osa1

Resolution: wontfix
Status: patchclosed
Note: See TracTickets for help on using tickets.