Opened 23 months ago

Last modified 17 months ago

#13497 new bug

GHC does not use select()/poll() correctly on non-Linux platforms

Reported by: nh2 Owned by:
Priority: normal Milestone:
Component: Runtime System Version: 8.0.1
Keywords: Cc: nh2, simonmar, slyfox
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking: #8684
Related Tickets: #8684, #12912 Differential Rev(s):
Wiki Page:

Description (last modified by nh2)

From my discovery at

Why does the existing code work on platforms that are not Linux? In my select man page it says:

On Linux, select() modifies timeout to reflect the amount of time not
slept; most other implementations do not do this. (POSIX.1-2001 permits
either behavior.) This causes problems both when Linux code which reads
timeout is ported to other operating systems, and when code is ported to
Linux that reuses a struct timeval for multiple select()s in a loop
without reinitializing it. Consider timeout to be undefined after
select() returns.

The existing select loop seems to rely on the fact that &tv is updated as described here.

Same for man 2 poll.

E.g. man 2 select on FreeBSD 11 says explicitly:

     Version 2 of the Single UNIX Specification (``SUSv2'') allows systems to
     modify the original timeout in place.  Thus, it is unwise to assume that
     the timeout value will be unmodified by the select() system call.
     FreeBSD does not modify the return value, which can cause problems for
     applications ported from other systems.

I have tested this now on FreeBSD, and indeed it doesn't work as expected.

With GHC 7.10.2:

import System.IO
main = hWaitForInput stdin (1 * 1000)

ghc --make test.hs -rtsopts

[root@ ~]# time ./test           

real	0m1.386s
user	0m0.004s
sys	0m0.000s
[root@ ~]# time ./test +RTS -V0.01

real	0m1.386s
user	0m0.001s
sys	0m0.000s
[root@ ~]# time ./test +RTS -V0.001

real	0m1.678s
user	0m0.003s
sys	0m0.002s
[root@ ~]# time ./test +RTS -V0.0001

real	0m11.311s
user	0m0.032s
sys	0m0.139s

See how the sleep suddenly takes 10x longer than it should once the timer signal frequency is increased.

That's because it triggers the case where EINTR is received, making the loop reuse the same unmodified 1-second struct timeval *timeout again and again.

This demo of the bug works on GHC 7.10 and 8.0.1; in 8.0.2 hWaitForInput is broken, so the demo doesn't work there.


Convenience: Here is the call chain of hWaitForInput

Change History (28)

comment:1 Changed 23 months ago by nh2

Using git grep "select(", I've found uses of select() in:

  • libraries/base/cbits/inputReady.c
  • rts/posix/Select.c

The first one I already knew about, the other one has 3 uses. 2 of those are with a 0-timeout (in which case there is no problem), and the one in awaitEvent() aims to block as long as possible, so that's not a problem either.

The only occurrence of poll() is in the new code that carries the problem I mentioned; this probably has to be fixed both to not crash and to handle the fact that poll() does not update the timeout.

comment:2 Changed 23 months ago by nh2

I have a first patch for this ready (based on 8.0.1) at

Still need to forward-port it to 8.0.2 / HEAD after fixing the poll() failure there.

comment:3 Changed 23 months ago by nh2

Actually before that, I should probably fix the Windows version too (for the case that the FD is not a socket).

while (1) // keep trying until we find a real key event
  rc = WaitForSingleObject( hFile, msecs );

Looking at this, it is not clear to me how this could ever have worked (how it could provide the semantics that hWaitForInput returns after the maximum given time), since it would simply loop forever if interrupted by a signal.

I just wrote the above sentence and then realised that exactly the same is true for any non-Linux select(), so I got suspicious about why it worked at all on my FreeBSD VM before my fix: any EINTR should return out of fdReady(), only to have it restarted with the original full duration.

[1 hour later]

I now know what's going on. It only worked on non-Linux because of the idle GC!

Looking at truss, it keeps calling select() until suddenly a ktimer_settime appears, and a quick printf reveals that it's issued by stopTicker() called by stopTimer() caused by idle GC. When the idle GC is set to a value larger than the msecs passed to fdReady(), e.g. when msecs = 500 and +RTS -I1, then fdReady will never return and the program will be stuck.

BSD and Mac OS users (and probably Windows) were just lucky so far that idle GC defaults to 0.3 seconds.

comment:4 Changed 23 months ago by nh2

Uh, some things are still puzzling me here. If I run with +RTS -I0.3, the infinite loop does not occur on FreeBSD, even though 0.3 is supposed to be the default, so I'm not sure why giving no +RTS and +RTS -I0.3 have different behaviour.

Also, passing +RTS -I0 does not trigger the bug, which surprises me.

Note all these observations are for the non-threaded runtime.

comment:5 Changed 23 months ago by nh2

I don't understand how changing -I... can even do anything given that I don't use the threaded runtime (confirmed by +RTS --info), as -I is supposed to be only available for the threaded runtime.

comment:6 Changed 23 months ago by nh2

I found (using getRTSFlags) that setting +RTS -I10 will set

idleGCDelayTime = 10000000000
doIdleGC = True

even when the threaded RTS is not enabled. I'm not sure if that's a bug, but commit 33a84b8c79f that introduced the idle GC says

The idle GC only happens in the threaded RTS

so maybe this is a bug?

The line that checks doIdleGC is clearly outside the #ifdef THREADED_RTS.

comment:7 Changed 23 months ago by nh2

Cc: simonmar added

comment:8 Changed 23 months ago by nh2

The stopTimer() occurring for me (and terminating fdReady()) is the one here.

comment:9 Changed 23 months ago by nh2

I have confirmed by printf-debugging that the line

      ticks_to_gc = RtsFlags.GcFlags.idleGCDelayTime /

is used even if the non-threaded runtime is used, and doIdleGC is False.

comment:10 Changed 23 months ago by nh2

Hmm, this seems to happen on purpose:

Commit Another overhaul of the recent_activity / idle GC handling (#5991) mentions

 - we now turn off the timer signal after idleGCDelay even if the
   idle GC is disabled with +RTS -I0.

I personally find that very confusing, using a setting called idleGCDelay even when the user explicitly turned that off.

Further, doing it this way, there's no way of actually setting what the value of idleGCDelay is, because if you use -I0, well, then it's the default (0.3 seconds), and if you use -I with any other value, then that actually turns idle GC on (as I found above). So there's no way to change away from the default 0.3 seconds when idle GC is set to off.

comment:11 in reply to:  1 ; Changed 23 months ago by nh2

I was wrong when I said:

Replying to nh2:

and the one in awaitEvent() aims to block as long as possible, so that's not a problem either.

The select() occurrence in awaitEvent() waits for sleeping_queue-> - now (code) so that also needs to be vetted on whether it has the &tv updating problem on non-Linux.

Last edited 23 months ago by nh2 (previous) (diff)

comment:12 Changed 23 months ago by nh2

Reading the aforementioned commit in detail, it seems that the code may not follow the intent expressed in the comments:

+ * The timer interrupt transitions ACTIVITY_YES into
+ * ACTIVITY_MAYBE_NO, waits for RtsFlags.GcFlags.idleGCDelayTime,
+ * and then:
+ *   - if idle GC is no, set ACTIVITY_INACTIVE and wakeUpRts()
+ *   - if idle GC is off, set ACTIVITY_DONE_GC and stopTimer()


          if (RtsFlags.GcFlags.doIdleGC) {
              recent_activity = ACTIVITY_INACTIVE;

It depends on what "if idle GC is no" (read: "on") means. In many places I have read that idle GC can really only be "on" in the threaded RTS. In that case, the code does not implement the comment, as it sets ACTIVITY_INACTIVE even when we're not in the threaded RTS.

So I'm wondering whether this line should have a #ifdef THREADED_RTS around it.

Last edited 23 months ago by nh2 (previous) (diff)

comment:13 Changed 23 months ago by nh2

Some comments I've found in the code base:

        if (force_major) {
            // We are doing a GC because the system has been idle for a
            // timeslice and we need to check for deadlock.  Record the
            // fact that we've done a GC and turn off the timer signal;
            // it will get re-enabled if we run any threads after the GC.
            recent_activity = ACTIVITY_DONE_GC;

    switch (recent_activity) {
    case ACTIVITY_DONE_GC: {
        // ACTIVITY_DONE_GC means we turned off the timer signal to
        // conserve power (see #1623).  Re-enable it here.

So the disabling of the timer signal is a power saving feature.

I don't understand though why this is only done after a certain time (idleGCDelayTime / tickInterval many ticks) instead of immediately. Is there something we know only after that many ticks? Or is it a performance optimisation to do it after that many ticks instead of immediately?

comment:14 Changed 23 months ago by simonmar

@nh2 the timer signal gets turned off after the idle GC. We can't turn it off before the idle GC, because then we would never do the idle GC.

comment:15 in reply to:  11 Changed 23 months ago by nh2

Replying to nh2:

The select() occurrence in awaitEvent() waits for sleeping_queue-> - now (code) so that also needs to be vetted on whether it has the &tv updating problem on non-Linux.

OK, I've looked into that in detail now and added some printfs, and I think this select() needs some update as well if we want it to wake up precisely at sleeping_queue->

The reasoning is the following:

Inside the while ((numFound = select(maxfd+1, &rfd, &wfd, NULL, ptv)) < 0) { ... } there's this code:

          /* check for threads that need waking up */

          /* If new runnable threads have arrived, stop waiting for
           * I/O and run them.
           */
          if (!emptyRunQueue(&MainCapability)) {
              return; /* still hold the lock */
          }

After an EINTR has interrupted select(), wakeUpSleepingThreads(getLowResTimeOfDay()) checks whether there is a Haskell thread that wants to run (whether we are past its sleeping_queue-> time) -- and that includes the thread for which we're currently selecting -- and if so, we return out of the C code. So we're looking at the current time in each loop iteration. Consequently, we don't have the same problem as in fdReady, as this scheme does not rely on select() updating the passed in struct timeval *ptv pointer.

However, there is still a problem: The EINTRs interrupting the select() come at fixed intervals (the timer signal). That can result in us waiting slightly too long.

For example, assume *ptv is set to 15 ms (sleeping_queue-> is 15 ms from now), and assume the timer signal is every 10 ms.

Then we would enter select(15ms), get interrupted with EINTR after 10ms, and then call select(15ms) again in the same while loop. With the next timer EINTR 10ms later, we would return out of the while loop, so we're not at risk of running forever due to EINTR. But we have now waited 20ms in total instead of the desired 15ms.

That's why I currently believe that the timeout argument to this select() should be recalculated based on the current time in every iteration, similar to how my fix for fdReady() does it.

comment:16 Changed 19 months ago by nh2

OK I have now also checked the situation on Windows:

Windows select() does not set the timeout parameter to the remaining time.

So currently fdReady() on Windows is as wrong as it is for the BSDs, and may wait forever even if given a max timeout.


  _In_    const struct timeval *timeout

I'll try to fix it.

comment:17 Changed 19 months ago by nh2

Hmm looks like I also discovered that fdReady() doesn't work at all on Windows for the case FILE_TYPE_PIPE.

This example file

import System.IO

main = hWaitForInput stdin (5 * 1000)

when compiled with ghc --make -O -fforce-recomp ../example-windows-pipe.hs && time ../example-windows-pipe terminates immediately instead of waiting for 5 seconds.

The reason seems to be that the PeekNamedPipe( hFile, NULL, 0, NULL, &avail, NULL ), which is called only once in fdReady(), doesn't have any blocking feature.

From the docs

The function always returns immediately in a single-threaded application, even if there is no data in the pipe

(Not sure why it says "in a single-threaded application"; the same problem happens with -threaded and +RTS -N2.)

comment:18 Changed 19 months ago by nh2

An update for GHC 8.2:

GHC 8.2 has improved on this problem for POSIX platforms, two-fold:

  1. GHC 8.2 has fixed the crashing problem of fdReady() that was introduced in GHC 8.0.2 (but not present in 8.0.1), that I mentioned in Comment 2 further above.
  2. That fix (commit ae69eaed - "base: Fix hWaitForInput with timeout on POSIX") works by repeatedly inspecting the current time (because select() is gone and was replaced by poll() in commit f46369b8, which doesn't have the functionality of advancing a timeout pointer even on Linux), so the problem that was described in the original issue description is gone for FreeBSD / OSX / any platform that's not Windows.

However, there are remaining problems with the current implementation:

  • It currently uses gettimeofday(), which doesn't use a monotonic clock, so any time adjustment can make fdReady() wait for significantly more or less than it should.
  • It keeps track of the total time waited by adding up time differences between calls to gettimeofday():
    while ((res = poll(fds, 1, msecs)) < 0) {
        if (errno == EINTR) {
            if (msecs > 0) {
                struct timeval tv;
                if (gettimeofday(&tv, NULL) != 0) {
                    fprintf(stderr, "fdReady: gettimeofday failed: %s\n",
                            strerror(errno));
                    abort();
                }

                int elapsed = 1000 * (tv.tv_sec - tv0.tv_sec)
                            + (tv.tv_usec - tv0.tv_usec) / 1000;
                msecs -= elapsed;
                if (msecs <= 0) return 0;
                tv0 = tv;
            }
        } else {
            return (-1);
        }
    }

This is inaccurate, because the code between gettimeofday() and tv0 = tv; is not tracked. If the process is descheduled by the OS in these lines, then that time is "lost" and fdReady() will wait way too long.

This inaccuracy can easily be observed and magnified by increasing the frequency of signals arriving at the program. Consider this simple program ghc-bug-13497-accuracy-test.hs that waits 5 seconds for input on stdin:

import System.IO

main = hWaitForInput stdin (5 * 1000)

When run normally on GHC 8.2.1 (release commit 0cee25253), this program terminates within 5 seconds when run with the command line

inplace/bin/ghc-stage2 --make -fforce-recomp -rtsopts ghc-bug-13497-accuracy-test.hs
/usr/bin/time ./ghc-bug-13497-accuracy-test

But it starts taking much longer when +RTS -V... is added for increasingly frequent values of -V; the effect is even stronger when setting the idle GC timer to something large (e.g. -I10 for every 10 seconds):

no `-V` passed  0.00user 0.00system 0:05.02elapsed 0%CPU
-V0.1           0.00user 0.00system 0:05.01elapsed 0%CPU
-V0.01          0.00user 0.00system 0:05.01elapsed 0%CPU
-V0.001         0.00user 0.00system 0:05.13elapsed 0%CPU
-V0.0001        0.00user 0.00system 0:05.30elapsed 0%CPU
-V0.00001       0.06user 0.00system 0:05.31elapsed 1%CPU
-V0.000001      0.37user 0.20system 0:05.73elapsed 10%CPU
-V0.0000001     2.67user 3.30system 0:12.47elapsed 47%CPU
-V0.00000001   50.44user 7.32system 1:17.50elapsed 74%CPU

-I10 -V0.1      0.00user 0.00system 0:05.10elapsed 0%CPU
-I10 -V0.01     0.00user 0.00system 0:05.25elapsed 0%CPU
-I10 -V0.001    0.00user 0.10system 0:08.47elapsed 1%CPU
-I10 -V0.0001   the program did not terminate within 2 minutes of waiting

The reason it's worse with -I10 is that, as described above, without -I GHC stops the timer signal after 0.3 seconds (so no EINTRs occur beyond that time), while with -I given it doesn't.

Note: all of the above is for the non-threaded runtime. Doing the same with -threaded on Linux reliably gives 0.00user 0.00system 0:05.01elapsed 0%CPU no matter what -V or -I is passed, and strace shows that no EINTRs happen in that case. I suspect this is because hWaitForInput calls fdReady() as a safe foreign call, which gives it its own OS thread in the threaded runtime.

There is also an unsafe call to fdReady() but that one is only used with timeouts of 0 so that's not a problem.

On FreeBSD 11, non-threaded, the situation is worse:

no `-V` passed 5.14 real         0.00 user         0.00 sys
-V0.1          5.10 real         0.00 user         0.00 sys
-V0.01         5.16 real         0.00 user         0.00 sys
-V0.001        5.17 real         0.00 user         0.02 sys
-V0.0001       5.24 real         0.00 user         0.01 sys
-V0.00001      5.89 real         0.01 user         0.08 sys
-V0.000001     5.81 real         0.00 user         0.11 sys
-V0.0000001    6.05 real         0.00 user         0.09 sys
-V0.00000001   5.77 real         0.00 user         0.09 sys

-I10 -V0.1     5.13 real         0.00 user         0.01 sys
-I10 -V0.01    5.24 real         0.00 user         0.01 sys
-I10 -V0.001   5.90 real         0.00 user         0.10 sys
-I10 -V0.0001  5.82 real         0.00 user         0.09 sys

And with -threaded on FreeBSD 11:

no `-V` passed 5.15 real         0.00 user         0.01 sys
-V0.1          5.30 real         0.00 user         0.00 sys
-V0.01         5.31 real         0.00 user         0.01 sys
-V0.001        5.45 real         0.00 user         0.13 sys
-V0.0001       5.98 real         0.00 user         0.13 sys
-V0.00001      5.93 real         0.00 user         0.15 sys
-V0.000001     5.79 real         0.00 user         0.15 sys
-V0.0000001    5.83 real         0.00 user         0.13 sys
-V0.00000001   5.80 real         0.00 user         0.18 sys

-I10 -V0.1     5.13 real         0.00 user         0.01 sys
-I10 -V0.01    5.27 real         0.00 user         0.03 sys
-I10 -V0.001   5.77 real         0.00 user         0.12 sys
-I10 -V0.0001  5.90 real         0.00 user         0.18 sys

As you can see, on FreeBSD 11 -threaded doesn't fix the issue as it does on Linux, and truss suggests that this is because EINTRs arrive (while they didn't on Linux).

I'm not sure why with threaded on FreeBSD there's EINTRs happening but not on Linux, but I observed that on Linux we have instead:

[pid 30502] timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 7
[pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8

So I suspect that the difference is that Linux has timerfd and FreeBSD doesn't.

OK, so far the problem description. In summary:

  • In the non-threaded runtime, a high-frequency -V destroys accuracy.
  • On non-Linux systems (those without timerfd), this happens even with -threaded.
  • In any runtime, accuracy can be compromised by clock adjustments because a non-monotonic clock is used.

The fix is simple:

Use the monotonic clock, and instead of tracking waited time as a sum of wait intervals, always compare the current time with the _end_ time (time at entry of fdReady() + msec).

I have implemented this in commit 12f9d66b of my branch ghc-8.2.1-improve-fdRready-precision (the first link is stable, the latter may change as I write more fixes for Windows).

comment:19 Changed 19 months ago by nh2

With my fix, I reliably get, independent of -threaded or not, on Linux:


And on FreeBSD 11:

5.00 real

comment:20 Changed 19 months ago by nh2

Blocking: 8684 added

comment:21 Changed 19 months ago by bgamari

Indeed you were right, nh2; the current state of affairs looks quite bad. Thanks for looking at this!

comment:22 Changed 19 months ago by slyfox

Cc: slyfox added

comment:23 Changed 19 months ago by nh2

When testing my change on Windows, I found that the if (isSock){ select(...) } part seems completely broken on Windows.

That is because on Windows FD_SETSIZE defaults to 64, but pretty much all GHC programs seem to have > 64 FDs open, so you can't actually create a socket on which you can select().

It errors with fdReady: fd is too big even with an example as simple as this (in this case, on my machine the fd is 284):

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (forever)
import Network.Socket
import System.IO

-- Simple echo server: Reads up to 10 chars from network, echoes them back.
-- Uses the Handle API so that `hWaitForInput` can be used.
main :: IO ()
main = do
  sock <- socket AF_INET Stream 0
  setSocketOption sock ReuseAddr 1
  bind sock (SockAddrInet 1234 0x0100007f) -- 0x0100007f == localhost
  listen sock 2
  forever $ do
    (connSock, _connAddr) <- accept sock
    putStrLn "Got connection"

    h <- socketToHandle connSock ReadWriteMode
    hSetBuffering h NoBuffering

    ready <- hWaitForInput h (5 * 1000) -- 5 seconds
    putStrLn $ "Ready: " ++ show ready

    line <- hGetLine h
    putStrLn "Got line"
    hPutStrLn h ("Got: " ++ line)
    hClose h

I'm not sure how this was not discovered earlier; in #13525 (where fdReady() breaking completely was also discovered late) it at least failed only when the timeout was non-zero, which within GHC is used only by hWaitForInput, but in this Windows socket case it breaks even with a 0-timeout.

Maybe there is not actually anybody who uses sockets as handles on Windows?

It seems an appropriate workaround for now is to increase FD_SETSIZE (which is possible on Windows and BSD, see here) on Windows.

A real fix would be to move to IO Completion Ports, and thus get rid of the last use of select() (the other platforms already use poll() but Windows doesn't have that).

comment:24 Changed 19 months ago by nh2

Addition to above comment:

Probably the only time that worked was before commit b0316cdb (#9169) that added the FD_SETSIZE check -- in which case it probably only worked due to writing to (and likely corrupting) out-of-bounds memory beyond the size of the fd_set.

comment:25 Changed 19 months ago by nh2

I have pushed a workaround to my branch ghc-8.2.1-improve-fdRready-precision, namely commit 1d037a16.

This completes my testing of my branch on Windows.

I have tested 4 of the 5 possible branches that fdReady() can take on Windows:

  1. Testing the FILE_TYPE_PIPE case succeeded with ghc-bug-13497-accuracy-test.hs (from comment 18), by running it from the MSYS2 Mingw-w64 console (NOT cmd.exe!).
  2. Testing the FILE_TYPE_DISK case succeeded with the following file ghc-bug-13497-windows-file-test.hs:
import System.IO

main = do
    writeFile "testfile" "hello"
    handle <- openFile "testfile" ReadMode
    hWaitForInput handle (5 * 1000)
  3. Testing the FILE_TYPE_CHAR case succeeded by running ghc-bug-13497-accuracy-test.hs (see above), but this time from cmd.exe. Here, focusing and unfocusing the window triggers EventType == FOCUS_EVENT so that our loop is executed (I confirmed this with a print in the loop).
  4. Testing the if (isSock) case succeeded with the echo server example given in the previous comment.
  5. I did not test the default: case (this one), because I don't know how to trigger it. There are 3 cases in which this branch could be taken according to the GetFileType() documentation:
    1. FILE_TYPE_REMOTE, which seems impossible to create on modern Windows
    2. FILE_TYPE_UNKNOWN, for which it is literally unknown how it can be created
    3. The very bottom of the FILE_TYPE_PIPE case, when /* PeekNamedPipe didn't work - fall through to the general case */ happens. I don't know how to trigger this. Simon Marlow introduced this logic 9 years ago in commit 1b0906e0, so maybe he knows.

In any case, with this testing done I'm reasonably confident that my branch doesn't regress Windows.

comment:26 Changed 19 months ago by nh2

Another thing I found:

The comment on top of fdReady() says:

"Input is available" is defined as 'can I safely read at least a
*character* from this file object without blocking?'

But this does not seem to be true in the general case:

If the FD is for a socket, then it is only true when that socket is non-blocking.

man 2 poll (see here) describes this:


See the discussion of spurious readiness notifications under the BUGS section of select(2).

man 2 select (see here) says:

Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

I have not investigated whether anywhere in GHC fdReady() is potentially called with a not-nonblocking socket.

comment:27 Changed 17 months ago by Ben Gamari <ben@…>

In c2a1fa7a/ghc:

base: Fix fdReady() potentially running forever on Windows.

This fixes #13497 for Windows -- at least for the `if (isSock)` part; I
haven't investigated the case where it's not a socket yet.

Solved by copying the new current-time based waiting logic from the
non-Windows implementation above.

Reviewers: bgamari, austin, hvr

Reviewed By: bgamari

Subscribers: rwbarton, thomie

Differential Revision:

comment:28 Changed 17 months ago by nh2

Description: modified (diff)