Calling hs_try_putmvar from an unsafe foreign call can cause the RTS to hang
An unsafe foreign call which calls hs_try_putmvar can cause the RTS to hang, preventing any Haskell threads from making progress. When the program is compiled with -debug, it instead fails an assertion in the scheduler:
internal error: ASSERTION FAILED: file rts/Schedule.c, line 510
(GHC version 8.4.3 for x86_64_apple_darwin)
Here is a minimal test case which reproduces the assertion. It needs to be built with -debug -threaded and run with +RTS -N2 or higher.
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Monad (forever)
import Foreign.C.Types (CInt(..))
import Foreign.StablePtr (StablePtr)
import GHC.Conc (PrimMVar, newStablePtrPrimMVar)

-- The "unsafe" import is what triggers the problem.
foreign import ccall unsafe hs_try_putmvar :: CInt -> StablePtr PrimMVar -> IO ()

main = do
  mvar <- newEmptyMVar
  -- Consumer: keeps emptying the MVar
  forkIO $ forever $ do
    takeMVar mvar
  -- Producer: fills the MVar through the C API; (-1) requests no particular capability
  forkIO $ forever $ do
    sp <- newStablePtrPrimMVar mvar
    hs_try_putmvar (-1) sp
    threadDelay 1
  -- Let it spin a few times to trigger the bug
  threadDelay 500
I checked out GHC, added this as a test case, and did some debugging. The specific assertion that fails is ASSERT(task->cap == cap). This seems to happen because of this code in hs_try_putmvar:
Task *task = getTask();
// ...
ACQUIRE_LOCK(&cap->lock);
// If the capability is free, we can perform the tryPutMVar immediately
if (cap->running_task == NULL) {
    cap->running_task = task;
    task->cap = cap;
    RELEASE_LOCK(&cap->lock);
    // ...
    releaseCapability(cap);
} else {
    // ...
}
Basically, it assumes that the current thread's task isn't currently running a capability, so it takes a new one and then releases it without restoring the previous value of task->cap.
Modifying the code to restore the value of task->cap after releasing the capability fixes the assertion. But I don't know enough about the RTS to be sure I'm not missing something here. In particular, is there a problem with the task basically holding two capabilities for a short time?
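To make the change concrete, here is a toy model of the fast path quoted above. It is only a sketch and not the actual RTS code: the two struct definitions are stripped-down stand-ins, the function names and the old_cap local are made up for illustration, locking is left out, and clearing running_task merely stands in for releaseCapability(cap).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the RTS types (assumption: the real Task and
   Capability structs have many more fields). */
typedef struct Capability Capability;
typedef struct Task { Capability *cap; } Task;
struct Capability { Task *running_task; };

/* Mirrors the quoted fast path: borrows `cap` and leaves task->cap
   pointing at it after the capability has been released. */
void try_put_without_restore(Task *task, Capability *cap)
{
    if (cap->running_task == NULL) {
        cap->running_task = task;
        task->cap = cap;
        /* ... the tryPutMVar work would happen here ... */
        cap->running_task = NULL;  /* stands in for releaseCapability(cap) */
    }
}

/* Same path, but remembers the capability the task already had and
   puts it back once the borrowed one has been released. */
void try_put_with_restore(Task *task, Capability *cap)
{
    if (cap->running_task == NULL) {
        Capability *old_cap = task->cap;  /* hypothetical local */
        cap->running_task = task;
        task->cap = cap;
        /* ... the tryPutMVar work would happen here ... */
        cap->running_task = NULL;  /* stands in for releaseCapability(cap) */
        task->cap = old_cap;
    }
}
```

In this model, a task that owns one capability and calls try_put_without_restore on a second, free capability ends up with task->cap pointing at the second one, which is the state that ASSERT(task->cap == cap) would later reject. With try_put_with_restore the task comes back with its original capability. The model says nothing about whether briefly holding two capabilities is safe in the real RTS.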
My other thought is that maybe it should check if its task is currently running a capability, and in that case do something else. But I'm not sure what.
Trac metadata
Trac field | Value |
---|---|
Version | 8.4.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Runtime System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |