Opened 6 years ago

Closed 7 weeks ago

Last modified 7 weeks ago

#5553 closed bug (fixed)

sendWakeup error in simple test program with MVars and killThread

Reported by: bit
Owned by:
Priority: high
Milestone: 8.2.2
Component: Runtime System
Version: 8.0.1
Keywords:
Cc: johan.tibell@…, roma@…, basvandijk, simonmar
Operating System: Linux
Architecture: x86
Type of failure: Incorrect result at runtime
Test Case:
Blocked By:
Blocking:
Related Tickets: #12038
Differential Rev(s):
Wiki Page:


The following test program causes a sendWakeup error to be printed. It happens rarely, not on every run of the program.

I'm running GHC 7.2.1 on a fairly old Linux 2.6.27 system.

Running it from the shell in a loop should eventually produce the error message. I found that generating CPU activity (such as running "yes" in another terminal) while the shell loop below is running helps trigger the error.

$ ghc --make -Wall -O -threaded -rtsopts ghc_sendWakeup_bug.hs
$ while [ 1 ]; do ./ghc_sendWakeup_bug 40; done
ghc_sendWakeup_bug: sendWakeup: invalid argument (Bad file descriptor)


module Main
    ( startTest
    , main
    ) where

import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Control.Exception (finally, catch, SomeException, mask_)
import Control.Monad (when, replicateM_, forever)
import Prelude hiding (catch)
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

startClient :: IO ()
startClient = threadDelay (1000 * 10)

startTest :: Int -> IO ()
startTest numClients = do
    -- Code adapted from:
    children <- newMVar [] :: IO (MVar [MVar ()])

    let forkChild :: IO () -> IO ThreadId
        forkChild io = do
            mvar <- newEmptyMVar
            mask_ $ do
                modifyMVar_ children (return . (mvar:))
                forkIO (io `finally` putMVar mvar ())
        waitForChildren :: IO ()
        waitForChildren = do
            cs <- takeMVar children
            case cs of
                [] -> return ()
                m:ms -> do
                    putMVar children ms
                    takeMVar m

    serverThread <- forkIO $ forever (threadDelay 1000000)

    replicateM_ numClients (forkChild startClient)
    catch waitForChildren (printException "waitForChildren")
    catch (killThread serverThread) (printException "killThread")

printException :: String -> SomeException -> IO ()
printException place ex =
    hPutStrLn stderr $ "Error in " ++ place ++ ": " ++ show ex

main :: IO ()
main = do
    args <- getArgs
    when (length args /= 1) $ do
        prog <- getProgName
        hPutStrLn stderr $ "Usage: " ++ prog ++ " <numClients>"
        -- Stop here; otherwise (args !! 0) below fails on a bad argument count.
        exitFailure
    let numClients = read (args !! 0)
    startTest numClients

Change History (15)

comment:1 Changed 6 years ago by tibbe

Cc: johan.tibell@… added

comment:2 Changed 6 years ago by tibbe

Owner: set to tibbe

I've assigned the ticket to myself but I'm pretty swamped right now so if someone else has time feel free to take a look.

sendWakeup is defined in GHC/Event/Control.hs and is used to wake up the I/O manager every time a new file descriptor or timeout (i.e. threadDelay) is added. Here's the relevant code:

sendWakeup :: Control -> IO ()
#if defined(HAVE_EVENTFD)
sendWakeup c = alloca $ \p -> do
  poke p (1 :: Word64)
  throwErrnoIfMinus1_ "sendWakeup" $
    c_write (fromIntegral (controlEventFd c)) (castPtr p) 8
#else
sendWakeup c = do
  n <- sendMessage (wakeupWriteFd c) CMsgWakeup
  case n of
    _ | n /= -1   -> return ()
      | otherwise -> do
                   errno <- getErrno
                   when (errno /= eAGAIN && errno /= eWOULDBLOCK) $
                     throwErrno "sendWakeup"
#endif

Since you're on Linux the first #if case applies.
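
The wording of the reported message fits this code path. As a minimal sketch (not taken from the GHC sources; badFdError is a name made up for illustration), the following uses only base to build the IOError that the throwErrno family constructs for errno EBADF. It assumes, as current versions of base do, that EBADF is classified as InvalidArgument; the text in parentheses comes from the platform's strerror, so it may differ off Linux.

```haskell
import Foreign.C.Error (eBADF, errnoToIOError)
import GHC.IO.Exception (IOErrorType (InvalidArgument))
import System.IO.Error (ioeGetErrorType)

-- | Hypothetical name: the IOError base builds for errno EBADF, labelled
-- "sendWakeup" in the same way the throwErrnoIfMinus1_ call labels it.
badFdError :: IOError
badFdError = errnoToIOError "sendWakeup" eBADF Nothing Nothing

main :: IO ()
main = do
  -- EBADF maps to the InvalidArgument error type, hence "invalid argument".
  print (ioeGetErrorType badFdError == InvalidArgument)
  -- Shows location, error type, and the strerror text for EBADF.
  print badFdError
```

If that assumption holds, the error in the report means the write in sendWakeup was issued on a file descriptor that had already been closed.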

comment:3 Changed 6 years ago by Feuerbach

Couldn't reproduce here, even with all cores loaded to 100% and numClients set to 10000. GHC 7.2.1, Linux 2.6.32.

comment:4 Changed 6 years ago by Feuerbach

Cc: roma@… added

comment:5 Changed 6 years ago by igloo

Milestone: 7.4.1
Priority: normal → high

comment:6 Changed 6 years ago by michalt

I can't reproduce it either. Tried GHC 7.2.2 and HEAD with gcc 4.6.2, Linux 3.1.1.

comment:7 Changed 6 years ago by igloo


comment:8 Changed 5 years ago by bit

I am the original reporter of this bug.

I would just like to report that ghc 7.4.1 seems to have resolved this bug, and I am no longer getting the error from the test program.

comment:9 in reply to:  8 Changed 5 years ago by simonmar

difficulty: Unknown
Resolution: worksforme
Status: new → closed

Replying to bit:

I am the original reporter of this bug.

I would just like to report that ghc 7.4.1 seems to have resolved this bug, and I am no longer getting the error from the test program.


comment:10 Changed 11 months ago by basvandijk

Cc: basvandijk simonmar added
Owner: tibbe deleted
Resolution: worksforme
Status: closed → new

In a program at work I get exactly the same error.

I'm on GHC-8.0.1.

I'm using Don Stewart's ghc-gc-tune to find the optimal GC settings for a server program. ghc-gc-tune expects the program to terminate with a success exit code. Since it's a server, this doesn't happen. To fix this I wrap the program in timeout --preserve-status 5 to terminate the server after 5 seconds. ghc-gc-tune will run the program many times with different -H and -A settings.

Note that I'm running the program with -N2. I couldn't reproduce the problem with -N1.

Among other things, the program forks a thread which basically does the same thing as the serverThread of the OP:

tid <- forkIO $ forever $ threadDelay 5000000

The program ends with installing a signal handler for SIGTERM which will unlock a lock that the program is waiting on. Finally the forked thread is killed:

lock <- newEmptyMVar
installHandler sigTERM (Catch $ putMVar lock ()) (Just fullSignalSet)
takeMVar lock

killThread tid

I believe the sendWakeup exception is thrown in threadDelay.

I'm trying to reduce the program so it doesn't contain any proprietary code, but so far I'm not succeeding.

It appears that closeControl is called before wakeManager. Could it be that the state IORef is finalized before we run wakeManager? If so, how can we ensure the garbage collector treats the IORef as reachable until after wakeManager? Maybe writing to it, as in:

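wakeManager :: TimerManager -> IO ()
wakeManager mgr = do
  sendWakeup (emControl mgr)
  -- No-op modification, only meant to keep the state IORef reachable
  -- until after the wakeup has been sent.
  atomicModifyIORef (emState mgr) $ \x -> (x, ())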

Or maybe we need a touchIORef similar to touchForeignPtr that ensures the IORef is kept alive at the given place in the sequence of IO actions.

Last edited 11 months ago by basvandijk

comment:11 Changed 6 months ago by rrnewton

I'm getting this error too, in a different program (on some runs only), with GHC 8.0.2.

comment:12 Changed 6 months ago by Feuerbach

A few of us at nstack have been getting this error lately with ghc 8.0.2.

comment:13 Changed 6 months ago by bgamari


We should fix this.

comment:14 Changed 7 weeks ago by dfeuer

Resolution: fixed
Status: new → closed

I'm unable to reproduce this in 8.2.1. I think it may have been fixed by d5cd505bc484edee3dbd5d41fb7a27c2e18d528d

Author: Ben Gamari <>
Date:   Tue Jan 17 15:52:37 2017 -0500

    event manager: Don't worry if attempt to wake dead manager fails

    This fixes #12038, where the TimerManager would attempt to wake up a
    manager that was already dead, resulting in setnumcapabilities001
    occasionally failing during shutdown with unexpected output on stderr.

    I'm frankly still not entirely confident in this solution but perhaps it
    will help to get a few more eyes on this.

    My hypothesis is that the TimerManager is racing:

      thread                   TimerManager worker
      -------                  --------------------
      requests that thread
      manager shuts down
                               begins to clean up,
                               closing eventfd
      calls wakeManager,
      which tries to write
      to closed eventfd

    To prevent this `wakeManager` will need to synchronize with the
    TimerManager worker to ensure that the worker doesn't clean up the
    `Control` while another thread is trying to send a wakeup. However, this
    would add a bit of overhead on every timer interaction, which feels
    rather costly for what is really a problem only at shutdown.  Moreover,
    it seems that the event manager (e.g.  `GHC.Event.Manager`) is also
    afflicted by a similar race.

    This patch instead simply tries to catch the write failure after it has
    happened and silence it in the case that the fd has vanished. It feels
    rather hacky but it seems to work.

    Test Plan: Run `setnumcapabilities001` repeatedly

    Reviewers: hvr, austin, simonmar

    Subscribers: thomie

    Differential Revision:

    GHC Trac Issues: #12038
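
The approach described in the commit message (let the write fail, then silence the failure only when the fd has vanished) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code from the patch: ignoreBadFd is a hypothetical helper, and the failing write is simulated with errnoToIOError instead of a real eventfd.

```haskell
import Control.Exception (catch, throwIO, try)
import Foreign.C.Error (Errno (..), eAGAIN, eBADF, errnoToIOError)
import GHC.IO.Exception (IOException (..))

-- | Hypothetical helper (not GHC's code): run an action and swallow only
-- an IOError that carries errno EBADF; rethrow everything else.
ignoreBadFd :: IO () -> IO ()
ignoreBadFd act = act `catch` handler
  where
    handler :: IOException -> IO ()
    handler e = case ioe_errno e of
      Just n | Errno n == eBADF -> return ()
      _                         -> throwIO e

main :: IO ()
main = do
  -- Simulated write to an already-closed fd: the error is silenced.
  ignoreBadFd (ioError (errnoToIOError "sendWakeup" eBADF Nothing Nothing))
  -- Any other errno still propagates to the caller.
  r <- try (ignoreBadFd (ioError (errnoToIOError "sendWakeup" eAGAIN Nothing Nothing)))
  putStrLn (either (\e -> "rethrown: " ++ show (e :: IOException)) (const "silenced") r)
```

Matching on the errno keeps unrelated failures visible, at the cost of also hiding a genuinely wrong fd outside shutdown, which is the trade-off the commit message calls hacky.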

comment:15 Changed 7 weeks ago by dfeuer
