Opened 5 years ago

Last modified 3 months ago

#5553 new bug

sendWakeup error in simple test program with MVars and killThread

Reported by: bit
Owned by:
Priority: high
Milestone: 7.4.2
Component: Runtime System
Version: 8.0.1
Keywords:
Cc: johan.tibell@…, roma@…, basvandijk, simonmar
Operating System: Linux
Architecture: x86
Type of failure: Incorrect result at runtime
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

The following test program causes a sendWakeup error to be printed. It happens rarely, not on every run of the program.

I'm running GHC 7.2.1 on a fairly old Linux 2.6.27 system.

Running it from the shell in a loop should eventually make it display the error message. I found that generating CPU activity (such as running "yes" in another terminal) while the shell loop below is running triggers the error.

$ ghc --make -Wall -O -threaded -rtsopts ghc_sendWakeup_bug.hs
$ while [ 1 ]; do ./ghc_sendWakeup_bug 40; done
ghc_sendWakeup_bug: sendWakeup: invalid argument (Bad file descriptor)

ghc_sendWakeup_bug.hs

module Main
    ( startTest
    , main
    ) where

import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Control.Exception (finally, catch, SomeException, mask_)
import Control.Monad (when, replicateM_, forever)
import Prelude hiding (catch)
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

startClient :: IO ()
startClient = threadDelay (1000 * 10)

startTest :: Int -> IO ()
startTest numClients = do
    -- Code adapted from:
    -- http://hackage.haskell.org/packages/archive/base/4.4.0.0/doc/html/Control-Concurrent.html#g:12
    children <- newMVar [] :: IO (MVar [MVar ()])

    let forkChild :: IO () -> IO ThreadId
        forkChild io = do
            mvar <- newEmptyMVar
            mask_ $ do
                modifyMVar_ children (return . (mvar:))
                forkIO (io `finally` putMVar mvar ())
        waitForChildren :: IO ()
        waitForChildren = do
            cs <- takeMVar children
            case cs of
                [] -> return ()
                m:ms -> do
                    putMVar children ms
                    takeMVar m
                    waitForChildren

    serverThread <- forkIO $ forever (threadDelay 1000000)

    replicateM_ numClients (forkChild startClient)
    catch waitForChildren (printException "waitForChildren")
    catch (killThread serverThread) (printException "killThread")

printException :: String -> SomeException -> IO ()
printException place ex =
    hPutStrLn stderr $ "Error in " ++ place ++ ": " ++ show ex

main :: IO ()
main = do
    args <- getArgs
    when (length args /= 1) $ do
        prog <- getProgName
        hPutStrLn stderr $ "Usage: " ++ prog ++ " <numClients>"
        exitFailure
    let numClients = read (args !! 0)
    startTest numClients

Change History (10)

comment:1 Changed 5 years ago by tibbe

Cc: johan.tibell@… added

comment:2 Changed 5 years ago by tibbe

Owner: set to tibbe

I've assigned the ticket to myself, but I'm pretty swamped right now, so if someone else has time, feel free to take a look.

sendWakeup is defined in GHC/Event/Control.hs and is used to wake up the I/O manager every time a new file descriptor or timeout (i.e. threadDelay) is added. Here's the relevant code:

sendWakeup :: Control -> IO ()
#if defined(HAVE_EVENTFD)
sendWakeup c = alloca $ \p -> do
  poke p (1 :: Word64)
  throwErrnoIfMinus1_ "sendWakeup" $
    c_write (fromIntegral (controlEventFd c)) (castPtr p) 8
#else
sendWakeup c = do
  n <- sendMessage (wakeupWriteFd c) CMsgWakeup
  case n of
    _ | n /= -1   -> return ()
      | otherwise -> do
                   errno <- getErrno
                   when (errno /= eAGAIN && errno /= eWOULDBLOCK) $
                     throwErrno "sendWakeup"
#endif

Since you're on Linux the first #if case applies.
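
The "Bad file descriptor" part of the reported message corresponds to errno EBADF, which write(2) returns when the descriptor is no longer open. The following is a minimal sketch of that failure mode, not the I/O manager's code: it assumes the unix package and uses an ordinary pipe in place of the eventfd held in Control, and writeAfterClose is a hypothetical helper introduced only for illustration.

```haskell
import Control.Exception (IOException, try)
import System.Posix.IO (closeFd, createPipe, fdWrite)
import System.Posix.Types (ByteCount)

-- Hypothetical helper for illustration only: write one byte to a
-- descriptor that has already been closed and return the outcome.
writeAfterClose :: IO (Either IOException ByteCount)
writeAfterClose = do
  (readEnd, writeEnd) <- createPipe
  closeFd writeEnd
  -- The descriptor number is now stale, so the write should fail with EBADF.
  result <- try (fdWrite writeEnd "x")
  closeFd readEnd
  return result

main :: IO ()
main = do
  result <- writeAfterClose
  case result of
    Left err -> putStrLn ("write failed: " ++ show err)
    Right n  -> putStrLn ("unexpectedly wrote " ++ show n ++ " byte(s)")
```

If the error in this ticket has the same cause, it would mean that sendWakeup ran after the control descriptor had been closed. That is an inference from the error text, not something the sketch establishes.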

comment:3 Changed 5 years ago by Feuerbach

Couldn't reproduce here, even after loading all cores to 100% and setting numClients to 10000. GHC 7.2.1, Linux 2.6.32.

comment:4 Changed 5 years ago by Feuerbach

Cc: roma@… added

comment:5 Changed 5 years ago by igloo

Milestone: 7.4.1
Priority: normal → high

comment:6 Changed 5 years ago by michalt

I can't reproduce it either. Tried GHC 7.2.2 and HEAD with gcc 4.6.2, Linux 3.1.1.

comment:7 Changed 5 years ago by igloo

Milestone: 7.4.1 → 7.4.2

comment:8 Changed 5 years ago by bit

I am the original reporter of this bug.

I would just like to report that ghc 7.4.1 seems to have resolved this bug, and I am no longer getting the error from the test program.

comment:9 in reply to:  8 Changed 5 years ago by simonmar

difficulty: Unknown
Resolution: worksforme
Status: new → closed

Replying to bit:

> I am the original reporter of this bug.
>
> I would just like to report that ghc 7.4.1 seems to have resolved this bug, and I am no longer getting the error from the test program.

Thanks!

comment:10 Changed 3 months ago by basvandijk

Cc: basvandijk simonmar added
Owner: tibbe deleted
Resolution: worksforme deleted
Status: closed → new
Version: 7.2.1 → 8.0.1

In a program at work I get exactly the same error.

I'm on GHC-8.0.1.

I'm using Don Stewart's ghc-gc-tune to find the optimal GC settings for a server program. ghc-gc-tune expects the program to terminate with a success exit code. Since it's a server, this doesn't happen. To fix this, I wrap the program in timeout --preserve-status 5 to terminate the server after 5 seconds. ghc-gc-tune will run the program many times with different -H and -A settings.

Note that I'm running the program with -N2. I couldn't reproduce the problem with -N1.

Among other things, the program forks a thread which basically does the same thing as the serverThread of the OP:

tid <- forkIO $ forever $ threadDelay 5000000

The program ends by installing a signal handler for SIGTERM, which unlocks a lock that the program is waiting on. Finally, the forked thread is killed:

lock <- newEmptyMVar
installHandler sigTERM (Catch $ putMVar lock ()) (Just fullSignalSet)
takeMVar lock

killThread tid
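
For anyone who wants to try this, the fragments above can be assembled into a standalone program roughly as follows. This is only a sketch under assumptions: it needs the unix package for System.Posix.Signals, the name waitForTermThenKill is hypothetical, and the actual server work is omitted, so it is not known whether this reduced form reproduces the error.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever, void)
import System.Posix.Signals
  (Handler (Catch), fullSignalSet, installHandler, sigTERM)

-- Hypothetical name; the body is assembled from the fragments above.
waitForTermThenKill :: IO ()
waitForTermThenKill = do
  -- Background thread that only sleeps, like serverThread in the original report.
  tid <- forkIO $ forever $ threadDelay 5000000

  -- Block until SIGTERM arrives; the handler releases the lock.
  lock <- newEmptyMVar
  void $ installHandler sigTERM (Catch $ putMVar lock ()) (Just fullSignalSet)
  takeMVar lock

  -- Kill the sleeping thread once the signal has been received.
  killThread tid

main :: IO ()
main = waitForTermThenKill
```

To match the setup described above, it would be built with -threaded -rtsopts and run with +RTS -N2 under timeout --preserve-status 5.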

I believe the sendWakeup exception is thrown in threadDelay.

I'm trying to reduce the program so it doesn't contain any proprietary code, but so far I'm not succeeding.

It appears that closeControl is called before wakeManager. Could it be that the state IORef is finalized before we run wakeManager? If so, how can we ensure the garbage collector treats the IORef as reachable until after wakeManager? Maybe writing to it, as in:

wakeManager :: TimerManager -> IO ()
wakeManager mgr = do
  sendWakeup (emControl mgr)
  atomicModifyIORef (emState mgr) $ \x -> (x, ())

Or maybe we need a touchIORef similar to touchForeignPtr that ensures the IORef is kept alive at the given place in the sequence of IO actions.
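
A touchIORef along those lines could perhaps be sketched with the touch# primitive, in the same way touchForeignPtr is built on it. Note that touchIORef is hypothetical and does not exist in base, and whether placing it after sendWakeup would actually prevent the error is an open question here, not a tested fix.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Data.IORef (IORef, newIORef, readIORef)
import GHC.Exts (touch#)
import GHC.IO (IO (..))

-- Hypothetical helper, not part of base: keeps the IORef reachable
-- up to this point in the IO sequence, by analogy with touchForeignPtr.
touchIORef :: IORef a -> IO ()
touchIORef ref = IO $ \s -> case touch# ref s of
  s' -> (# s', () #)

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- Actions that must run while ref is still considered alive go here.
  readIORef ref >>= print
  -- The garbage collector must treat ref as live at least until this call.
  touchIORef ref
```

In wakeManager the call would go after sendWakeup (emControl mgr), in place of the write to emState.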

Last edited 3 months ago by basvandijk