Opened 3 years ago

Last modified 3 years ago

#4934 new bug

threadWaitRead works incorrectly on nonthreaded RTS

Reported by: slyfox Owned by:
Priority: normal Milestone:
Component: Runtime System Version: 7.0.1
Keywords: Cc: tibbe, bos
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

I found out it in xmobar on ghc-7.0.1.
When I ran it I got the following message:

xmobar: file descriptor 8954336 out of range for select (0--1024).
Recompile with -threaded to work around this.

-threaded option works, but the message is absolutely misleading.

I've decided to track where so large descriptor could come,
but there was no such place.

All xmobar does is:

    do x11connection_fd <- connectionNumber d
       threadWaitRead (Fd fd) -- here we die

x11connection_fd has value 3 at the time of call.
This value comes from libX11 (via FFI, so I suspect it's a problem).

Another example (might be unrelated):

-- a.hs 
import Control.Concurrent

-- 12 is picked at random as guaranteed to be invalid FD
main = threadWaitRead 12 >> print "yet input!"

Building:

$ ghc --make -fforce-recomp a.hs -o a
[1 of 1] Compiling Main             ( a.hs, a.o )
Linking a ...

$ ghc --make -fforce-recomp a.hs -threaded -o a.threaded
[1 of 1] Compiling Main             ( a.hs, a.o )
Linking a.threaded ...

Running:

$ ./a.threaded 
a.threaded: epollControl: invalid argument (Bad file descriptor)

Looks good.

And nonthreaded:

$ ./a
"yet input!"

According to strace there was an error but it was not reported.

sf tmp # strace -etrace='select,write' ./a >/dev/null
select(13, [12], [], NULL, {134, 217727}) = -1 EBADF (Bad file descriptor) # this one

select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, "\"yet input!\"\n", 13)        = 13

Side note: Haskeline seems to workaround this problem on it's own way.

libraries/haskeline/System/Console/Haskeline/Backend/Posix.hsc

-- Different versions of ghc work better using different functions.
blockUntilInput :: Handle -> IO ()
#if __GLASGOW_HASKELL__ >= 611
-- threadWaitRead doesn't work with the new ghc IO library,
-- because it keeps a buffer even when NoBuffering is set.
blockUntilInput h = hWaitForInput h (-1) >> return ()
#else
-- hWaitForInput doesn't work with -threaded on ghc < 6.10
-- (#2363 in ghc's trac)
blockUntilInput h = unsafeHandleToFd h >>= threadWaitRead
#endif

Attachments (2)

test_it.tar.gz (4.3 KB) - added by slyfox 3 years ago.
test_it.tar.gz - selfcontained example with weird fault
bad_fd.hs (141 bytes) - added by slyfox 3 years ago.
bad_fd.hs - should issue an error with nonthreaded RTS, but does not

Download all attachments as: .zip

Change History (17)

comment:1 Changed 3 years ago by slyfox

  • Architecture changed from Unknown/Multiple to x86_64 (amd64)
  • Operating System changed from Unknown/Multiple to Linux
  • Type of failure changed from None/Unknown to Incorrect result at runtime

Changed 3 years ago by slyfox

test_it.tar.gz - selfcontained example with weird fault

comment:2 Changed 3 years ago by slyfox

The output of sample:

$ ./test_me.sh 

[1 of 3] Compiling Commands         ( Commands.hs, obj/Commands.o )
[2 of 3] Compiling Xmobar           ( Xmobar.hs, obj/Xmobar.o )

Xmobar.hs:13:1:
    Warning: In the use of `block'
             (imported from Control.Exception, but defined in GHC.IO):
             Deprecated: "use Control.Exception.mask instead"

Xmobar.hs:13:1:
    Warning: In the use of `unblock'
             (imported from Control.Exception, but defined in GHC.IO):
             Deprecated: "use Control.Exception.mask instead"
[3 of 3] Compiling Main             ( Main.hs, obj/Main.o )
Linking test_it ...
Entering event loop
"eventLoop IN"
"MAIN: unblocking"
"forkIO: startCommand: runCom \"date\" [] \"\" 10""forkIO: startCommand: runStdinReader""forkIO: checker spawned"


"WAIT IO: 1"
"SIR: got line""CHECKER: IN"

"CHECKER: throw WakeUp -> me"
"CHECKER: thrown"
"CHECKER: IN"
"MAIN: woke up!"
"MAIN: unblocking"
"WAIT IO: 1"
test_it: file descriptor 7264944 out of range for select (0--1024).
Recompile with -threaded to work around this.

comment:3 Changed 3 years ago by simonmar

  • Milestone set to 7.0.3
  • Owner set to simonmar
  • Priority changed from normal to high

comment:4 follow-up: Changed 3 years ago by simonmar

I've tried this program on a couple of different machines (32 and 64-bit Linux) with several versions of GHC, and haven't managed to reproduce the error. Does it fail right away for you?

Can you tell me more about your setup? (exact versions of tools etc.) Does it fail on a different machine for you?

comment:5 in reply to: ↑ 4 Changed 3 years ago by slyfox

Replying to simonmar:

I've tried this program on a couple of different machines (32 and 64-bit Linux) with several versions of GHC, and haven't managed to reproduce the error. Does it fail right away for you?

Does the first test (main = threadWaitRead 12 >> print "yet input!") work for you and shows an error in nonthreaded case?

For me it works incorrectly (does not show any errors) in:

  • ghc-6.12.3 nonthreaded
  • ghc-6.12.3 threaded
  • ghc-7.0.1 nonthreaded

The second test (in tarball) fails only on ghc-7.0.1 nonthreaded.

The test is factored out from xmobar sources. The failure is 100% reproducible on two machines.

Can you tell me more about your setup? (exact versions of tools etc.) Does it fail on a different machine for you?

It's my amd64 gentoo desktop (notebook) box with recent software.
Linux kernel 2.6.37, glibc-2.12.2, ghc-7.0.1 (bootstrapped by 7.0.0.20101028)

comment:6 Changed 3 years ago by slyfox

The same behaviour is observable on stock ubuntu 10.10 (amd64) with ghc-7.0.1

(built as user: ./configure --prefix=$HOME/somewhere && make && make install).

comment:7 follow-up: Changed 3 years ago by simonmar

  • Status changed from new to infoneeded

My mistake, I don't think I actually tested with the stock 7.0.1, only the current 7.0 branch. It does indeed fail with 7.0.1, but not the current 7.0 branch.

I'm fairly certain this is #4813 in another guise. I'll close when you've had a chance to test against a recent snapshot.

Changed 3 years ago by slyfox

bad_fd.hs - should issue an error with nonthreaded RTS, but does not

comment:8 in reply to: ↑ 7 Changed 3 years ago by slyfox

Replying to simonmar:

My mistake, I don't think I actually tested with the stock 7.0.1, only the current 7.0 branch. It does indeed fail with 7.0.1, but not the current 7.0 branch.

I'm fairly certain this is #4813 in another guise. I'll close when you've had a chance to test against a recent snapshot.

Tried , ("Project version","7.0.1.20110211"). Results are the following:
}}}

  • test_it.tar.gz - works correctly now
  • bad_fd.hs - still eats an error with nonthreaded RTS:
ghc-7.0/ghc-install:bin/ghc --make -fforce-recomp bad_fd.hs -o bad_fd.nt
[1 of 1] Compiling Main             ( bad_fd.hs, bad_fd.o )
Linking bad_fd.nt ...
ghc-7.0/ghc-install:bin/ghc --make -fforce-recomp bad_fd.hs -threaded -o bad_fd.t
[1 of 1] Compiling Main             ( bad_fd.hs, bad_fd.o )
Linking bad_fd.t ...

ghc-7.0/ghc-install:./bad_fd.nt
"yet input!" # how come?
ghc-7.0/ghc-install:./bad_fd.t
bad_fd.t: epollControl: invalid argument (Bad file descriptor)

comment:9 follow-up: Changed 3 years ago by simonmar

  • Cc tibbe bos added

This is the way it has always worked. The problem is that when select() finds that one of the file descriptors has been closed, it returns an error, but doesn't tell you which one it was, so in that case we inform all the blocked threads that they should wake up, and retry whatever operation they were doing, which will result in the appropriate exception being thrown. I presume that epoll is able to determine which file descriptor is erroneous and generate the exception from threadWaitRead instead.

So typical uses of threadWaitRead won't notice the difference. We could make the epoll backend ignore the exception, but that doesn't seem right. Personally I'm happy to leave things as they are, but document that threadWaitRead may or may not raise an exception if the FD is closed (actually I'd like to remove threadWaitRead from Control.Concurrent anyway, it's really an internal API).

tibbe/bos, any thoughts?

comment:10 in reply to: ↑ 9 Changed 3 years ago by tibbe

Replying to simonmar:

So typical uses of threadWaitRead won't notice the difference. We could make the epoll backend ignore the exception, but that doesn't seem right. Personally I'm happy to leave things as they are, but document that threadWaitRead may or may not raise an exception if the FD is closed (actually I'd like to remove threadWaitRead from Control.Concurrent anyway, it's really an internal API).

tibbe/bos, any thoughts?

I agree. I think threadWaitRead should raise an exception for closed FDs. Otherwise it might mask programmer bugs.

comment:11 follow-up: Changed 3 years ago by simonmar

The problem is that we can't easily make threadWaitRead raise an exception for closed FDs in the non-threaded RTS where we use select, because we have no information about which FD has the error. Presumably the same applies in the new IO manager when using poll?

comment:12 in reply to: ↑ 11 ; follow-up: Changed 3 years ago by tibbe

Replying to simonmar:

The problem is that we can't easily make threadWaitRead raise an exception for closed FDs in the non-threaded RTS where we use select, because we have no information about which FD has the error. Presumably the same applies in the new IO manager when using poll?

poll sets POLLNVAL per FD so that should be possible.

comment:13 in reply to: ↑ 12 Changed 3 years ago by simonmar

  • Milestone changed from 7.0.3 to _|_
  • Priority changed from high to normal

Replying to tibbe:

poll sets POLLNVAL per FD so that should be possible.

Ok. In that case we could declare it to be a bug in the non-threaded RTS that could be fixed by using poll() instead of select() and generating correct error returns from waitRead#. Fixing it won't be a high priority, though.

comment:14 Changed 3 years ago by simonmar

  • Status changed from infoneeded to new

comment:15 Changed 3 years ago by simonmar

  • Owner simonmar deleted
Note: See TracTickets for help on using tickets.