Opened 6 years ago

Closed 6 years ago

#4449 closed bug (fixed)

GHC 7 can't do IO when daemonized

Reported by: kazu-yamamoto Owned by:
Priority: normal Milestone: 7.0.2
Component: Compiler Version: 7.1
Keywords: forkProcess Cc: johan.tibell@…, dastapov@…
Operating System: MacOS X Architecture: x86
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

If a daemon is compiled with GHC 7.1 with the -threaded option, the daemon cannot do IO. The following code is a simple echo server that demonstrates this bug:

import Control.Concurrent
import Network
import System.IO
import System.Posix.Daemonize

main :: IO ()
main = daemonize $ do
    s <- listenOn (Service "7000")
    loop s
    return ()
    
loop :: Socket -> IO ThreadId
loop s = do
    (hdr,_,_) <- accept s
    forkIO $ echo hdr
    loop s
    
echo :: Handle -> IO ()
echo hdr = do
    str <- hGetLine hdr
    hPutStrLn hdr str
    hClose hdr
    return ()

If "deamonize $" is removed, this program stays in your terminal and can do IO.

Attachments (3)

full.log.gz (4.7 KB) - added by adept 6 years ago.
Full truss log (FreeBSD 8.2)
dtruss.log.gz (2.0 KB) - added by tibbe 6 years ago.
Running dtruss on OS X 10.6.4
clean-up-remnants-of-the-event-manager-after-forkprocess_-closes-_4449.dpatch (63.6 KB) - added by adept 6 years ago.


Change History (46)

comment:1 Changed 6 years ago by igloo

Did this work with an earlier version of GHC?

comment:2 Changed 6 years ago by igloo

I assume this is using http://hackage.haskell.org/package/hdaemonize, incidentally

comment:3 Changed 6 years ago by kazu-yamamoto

My GHC 7 is HEAD from two days ago.

comment:4 Changed 6 years ago by kazu-yamamoto

If I use the daemonize function of the direct-daemon library on MacOS, the program terminates.

Anyway, where can I find documentation that describes the right way to implement a daemon with GHC 7?

comment:5 Changed 6 years ago by kazu-yamamoto

I have tested this program with GHC 7.0.1 RC2 and direct-daemonize today on Snow Leopard. With the "-threaded" option, the program terminates immediately, while without the option it works as a daemon.

comment:6 Changed 6 years ago by tibbe

Cc: johan.tibell@… added

Since 6.12.1, the I/O manager is supposed to survive forking the process (see #1185). I ported this behavior to the new I/O manager. If forking no longer works correctly, it's a regression in the new I/O manager. Could you please test this by running your program on both 6.12 and 7.0?

There's a regression test for #1185 in the GHC repo and it ran successfully for me when I validated my patches.

comment:7 Changed 6 years ago by kazu-yamamoto

The program works well with GHC 6.12 and the hsdaemon library.

Since I will be going on a trip, it will take some time to test with GHC 7.0.

comment:8 Changed 6 years ago by guest

I've done some tests with 7.0.1-rc2 on Linux (Debian/sid on 32bit arch):

hdaemonize       without -threaded: works
direct-daemonize without -threaded: works
hdaemonize       with -threaded: works
direct-daemonize with -threaded: hangs

Code with direct-daemonize looks like this:

import Control.Concurrent
import Network
import System.IO
import System.Daemonize

main :: IO ()
main = do
    daemonize defaultDaemonOptions
    s <- listenOn (Service "7000")
    loop s
    return ()
    
loop :: Socket -> IO ThreadId
loop s = do
    (hdr,_,_) <- accept s
    forkIO $ echo hdr
    loop s
    
echo :: Handle -> IO ()
echo hdr = do
    str <- hGetLine hdr
    hPutStrLn hdr str
    hClose hdr
    return ()

comment:9 Changed 6 years ago by guest

Same system, GHC 6.12.3:

hdaemonize       without -threaded: works
direct-daemonize without -threaded: works
hdaemonize       with -threaded: works
direct-daemonize with -threaded: hangs

Maybe I am doing something wrong with direct-daemonize?

comment:10 Changed 6 years ago by tibbe

Unfortunately, I'm not sure what's going on. I think you have established that the new I/O manager is not to blame, since the problem exists in 6.12.3. Perhaps you could contact the maintainer and try to figure it out together with him?

comment:11 Changed 6 years ago by adept

(By the way, the two previous "guest" comments are from me. Note that I am not the reporter.)

More info: it seems that there are some wrinkles in direct-daemonize, and its maintainer is working on them.

So, from where I am sitting, there is no problem in daemonizing Haskell code (using hdaemonize) in either GHC 6.12.3 or 7.0.1-rc2, with or without -threaded.

I think we need to confirm with the reporter that he is indeed having problems with hdaemonize and GHC 7 on MacOS.

comment:12 Changed 6 years ago by adept

Cc: dastapov@… added

comment:13 Changed 6 years ago by kazu-yamamoto

Resolution: fixed
Status: new → closed

I have tested direct-daemonize 2.0 on MacOS and confirmed that this bug disappeared. So, let's close this ticket.

comment:14 Changed 6 years ago by simonmar

direct-daemonize appears to call the C function daemon(), which bypasses the RTS completely, so it's not surprising that things don't work with -threaded.
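
Presumably the library binds the C call with something like the following FFI import (a guess at the shape of the binding, not direct-daemonize's actual source); the fork() hidden inside daemon() happens entirely behind the RTS's back:

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CInt)

-- Hypothetical binding to the C library's daemon(3); the fork() it performs
-- internally is invisible to the threaded RTS, which is why -threaded breaks.
foreign import ccall unsafe "daemon"
    c_daemon :: CInt -> CInt -> IO CInt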

comment:15 Changed 6 years ago by kazu-yamamoto

Resolution: fixed
Status: closed → new

It turns out that direct-daemonize 2.0 did not solve my problem; my previous report was not correct. Here is a summary of my test:

GHC 7rc2

direct-daemonize 2.0 non-threaded OK
direct-daemonize 2.0 threaded     NG
hdaemonize     0.4.3 non-threaded OK
hdaemonize     0.4.3 threaded     NG

GHC 6.12.3

direct-daemonize 2.0 non-threaded OK
direct-daemonize 2.0 threaded     NG
hdaemonize     0.4.3 non-threaded OK
hdaemonize     0.4.3 threaded     OK

So, let's reopen this ticket.

comment:16 Changed 6 years ago by kazu-yamamoto

Simon,

What is the right way to implement daemons in GHC 7? Use hdaemonize instead of direct-daemonize?

comment:17 Changed 6 years ago by kazu-yamamoto

I should have said that my test was done on Snow Leopard.

comment:18 in reply to:  16 Changed 6 years ago by simonmar

Resolution: invalid
Status: new → closed

Replying to kazu-yamamoto:

What is the right way to implement daemons in GHC 7? Use hdaemonize instead of direct-daemonize?

Yes. The problem is that you can't call fork() directly when using -threaded; you have to use System.Posix.forkProcess instead. direct-daemonize will not work with -threaded, because it calls daemon(), which internally calls fork().

I haven't looked at hdaemonize, but I imagine it calls System.Posix.forkProcess, so it should be safe.
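
For reference, here is a minimal sketch of the fork()-free structure described above, using only System.Posix.Process; it is illustrative only and is not the hdaemonize implementation:

import System.Exit (ExitCode (ExitSuccess))
import System.Posix.Process (createSession, exitImmediately, forkProcess)

-- Double-fork via forkProcess so the RTS stays consistent, instead of
-- letting a C-level daemon()/fork() bypass it.
daemonize :: IO () -> IO ()
daemonize program = do
    _ <- forkProcess $ do            -- first fork: detach from the parent
        _ <- createSession           -- become session leader, drop the controlling tty
        _ <- forkProcess program     -- second fork: can never reacquire a tty
        exitImmediately ExitSuccess
    exitImmediately ExitSuccess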

comment:19 Changed 6 years ago by adept

Hmmm. But as far as I can see, Kazu is having problems with hdaemonize & GHC 7 RC2 & Mac OS X (which I can't recreate on Linux). Why is this being closed as "invalid"?

comment:20 Changed 6 years ago by simonmar

Sorry, I got confused by the various sets of results in this ticket. So it looks like we now have:

  • with hdaemonize, on OS X (only) and GHC 7.0RC2 (only), the test program hangs

So this could be a regression in the new IO manager that only manifests on OS X, or it could be a latent bug in forkProcess exposed by the new IO manager. Either way, we'll need someone to reproduce it and look into it.

comment:21 Changed 6 years ago by simonmar

Resolution: invalid
Status: closed → new

comment:22 Changed 6 years ago by igloo

Owner: set to tibbe

On OS X, this program:

import Control.Concurrent
import Network
import System.IO

import Prelude hiding (catch)
import System
import System.Posix

main :: IO ()
main = daemonize $ do
    writeFile "X1" "Here"
    s <- listenOn (Service "7000")
    writeFile "X2" "Here"
    loop s
    writeFile "X3" "Here"
    return ()

loop :: Socket -> IO ()
loop s = do
    writeFile "X4" "Here"
    (hdr,_,_) <- accept s
    writeFile "X5" "Here"
    _ <- forkIO $ echo hdr
    writeFile "X6" "Here"
    loop s

echo :: Handle -> IO ()
echo hdr = do
    writeFile "X7" "Here"
    str <- hGetLine hdr
    writeFile "X8" "Here"
    hPutStrLn hdr str
    writeFile "X9" "Here"
    hClose hdr
    writeFile "X0" "Here"
    return ()

daemonize :: IO () -> IO ()
daemonize program =

  do _ <- setFileCreationMask 0
     _ <- forkProcess p
     exitImmediately ExitSuccess

    where

      p  = do _ <- createSession
              _ <- forkProcess p'
              exitImmediately ExitSuccess

      p' = do -- changeWorkingDirectory "/"
              closeFileDescriptors
              blockSignal sigHUP
              program

closeFileDescriptors :: IO ()
closeFileDescriptors =
    do devNull <- openFd "/dev/null" ReadWrite Nothing defaultFileFlags
       let redirectTo fd' fd = closeFd fd >> dupTo fd' fd
       mapM_ (redirectTo devNull) [stdInput, stdOutput, stdError]

blockSignal :: Signal -> IO ()
blockSignal sig = installHandler sig Ignore Nothing >> pass

pass :: IO ()
pass = return ()

creates X4 but not X5 when I compile it with

ghc --make q -threaded

with 7.0.1 RC 2. It works when compiled with 6.12.3. Both have network-2.2.3.1.

Tibbe, you're an OS X person, aren't you? As this seems to involve an overlap of network and the new IO manager, would you be able to take a look please?

comment:23 Changed 6 years ago by tibbe

I will take a look.

comment:24 Changed 6 years ago by igloo

Milestone: 7.0.2

comment:25 Changed 6 years ago by tibbe

Might be related to #4514.

comment:26 Changed 6 years ago by kazu-yamamoto

I tested this with GHC 7.0.2 RC1, but it is not fixed yet.

comment:27 Changed 6 years ago by kazu-yamamoto

I ran the original test code (using System.Process) with GHC 7.0.2 RC1 and -threaded, and checked the running echo daemon with "lsof". There should be one kqueue file descriptor and two pairs of pipes (which are used by System.Event.Control), but there are no such file descriptors. So, I guess kqueue.new fails after "daemonize".

comment:28 Changed 6 years ago by kazu-yamamoto

To narrow down the problem, I made the following code. Please compile it on Snow Leopard with GHC 7.0.2 RC1 and -threaded. When you run it, you see the following warnings:

kqueue: kevent: invalid argument (Bad file descriptor)
kqueue: ioManagerWakeup: write: Bad file descriptor
kqueue: sendWakeup: invalid argument (Bad file descriptor)
kqueue: ioManagerDie: write: Bad file descriptor

Code:

module Main where

import Control.Concurrent
import System.Event (new)
import System.Exit
import System.IO
import System.Posix

main :: IO ()
main = daemonize $ do
    new
    threadDelay 100000000
    
daemonize :: IO () -> IO () 
daemonize program = do
    forkProcess $ do
        forkProcess program
        exitImmediately ExitSuccess
    exitImmediately ExitSuccess

comment:29 Changed 6 years ago by kazu-yamamoto

I think I found the cause of this problem. The difference between epoll() and kqueue() is inheritance: a process forked on Linux inherits the epoll() FD, while one forked on MacOS does not inherit the kqueue() FD.

On Linux: the parent process creates FD1 with epoll() and forks. The child then creates FD2 with epoll(), which is different from FD1. FD1 in the child process is eventually closed by GC. So everything goes well.

On MacOS: the parent process creates FD1 with kqueue() and forks, but the child does not inherit FD1. When the child calls kqueue(), FD1 is reused. GC will nevertheless close FD1, since the old manager does not use FD1 anymore.

I guess that forkProcess() must close the epoll()/kqueue() FD (and the two pairs of control pipes) in the child.
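
A minimal sketch of that idea follows; the closeInheritedManagerFds name and the explicit descriptor arguments are assumptions for illustration, since the real cleanup would have to live inside the I/O manager, which knows its own descriptors:

import System.Posix.IO (closeFd)
import System.Posix.Types (Fd)

-- In the child branch of forkProcess, close the event-manager descriptors
-- inherited from the parent (the epoll/kqueue FD plus the two pairs of
-- control pipes) before a fresh manager is started, so stale state can no
-- longer touch reused FD numbers.
closeInheritedManagerFds :: Fd -> [Fd] -> IO ()
closeInheritedManagerFds backendFd controlFds =
    mapM_ closeFd (backendFd : controlFds)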

comment:30 in reply to:  28 Changed 6 years ago by adept

Replying to kazu-yamamoto:

To narrow down the problem, I made the following code. Please compile it on Snow Leopard with GHC 7.0.2 RC1 and -threaded. When you run it, you see the following warnings:

kqueue: kevent: invalid argument (Bad file descriptor)
kqueue: ioManagerWakeup: write: Bad file descriptor
kqueue: sendWakeup: invalid argument (Bad file descriptor)
kqueue: ioManagerDie: write: Bad file descriptor

OK. I am able to reproduce this on FreeBSD 8.2, so I think it is something lurking in the BSD-only parts of the code.

First of all, I commented out the call to new, since forkProcess explicitly restarts the IO manager after fork. Then I ran the sample code (compiled with GHC HEAD) under truss (observing the cited error messages) and noticed that the innermost (twice-forked) process performs a weird sequence of close()s on its file descriptors:

 4308: fork()                                    = 4310 (0x10d6)
 4310: kqueue(0x0,0x0,0x0,0x0,0x0,0x0)           = 3 (0x3)
 4310: fcntl(13,F_GETFL,)                        = 2 (0x2)
 4310: fcntl(13,F_SETFL,O_NONBLOCK|0x2)          = 0 (0x0)
 4310: fcntl(12,F_SETFD,FD_CLOEXEC)              = 0 (0x0)
 4310: fcntl(13,F_SETFD,FD_CLOEXEC)              = 0 (0x0)
 4310: fcntl(15,F_GETFL,)                        = 2 (0x2)
 4310: fcntl(15,F_SETFL,O_NONBLOCK|0x2)          = 0 (0x0)
 4310: fcntl(14,F_SETFD,FD_CLOEXEC)              = 454 (0x1c6)
 4310: fcntl(15,F_SETFD,FD_CLOEXEC)              = 0 (0x0)
 4310: write(15,"\M^?",1)                        = 1 (0x1)
 4310: read(14,"\M^?",4096)                      = 1 (0x1)
 4310: exit(0xf)                                 = 417 (0x1a1)
 4310: close(3)                                  = 0 (0x0)
 4310: close(4)                                  = 0 (0x0)
 4310: close(5)                                  = 0 (0x0)
 4310: close(6)                                  = 0 (0x0)
 4310: close(7)                                  = 0 (0x0)
 4310: close(3)                                  ERR#9 'Bad file descriptor'
 4310: close(8)                                  = 0 (0x0)
 4310: close(9)                                  = 0 (0x0)
 4310: close(10)                                 = 0 (0x0)
 4310: close(11)                                 = 0 (0x0)
 4310: read(14,"\M^?",4096)                      = 1 (0x1)
 4310: kevent(3,0x0,0,0x2887c010,64,{9.680145979 }) ERR#9 'Bad file descriptor'
 4310: close(3)                                  ERR#9 'Bad file descriptor'
 4310: close(12)                                 = 0 (0x0)
 4310: close(13)                                 = 0 (0x0)
 4310: close(14)                                 = 0 (0x0)
 4310: close(15)                                 = 0 (0x0)
 4310: open("/usr/local/lib/charset.alias",O_RDONLY,0666) = 3 (0x3)
 4310: fstat(3,{ mode=-r--r--r-- ,inode=49467,size=2209,blksize=16384 }) = 0 (0x0)
 4310: read(3,"# This file contains"...,16384)   = 2209 (0x8a1)
 4310: read(3,0x28904000,16384)                  = 0 (0x0)
 4310: close(3)                                  = 0 (0x0)
 4310: ioctl(1,TIOCGETA,0xbf5f8ec8)              = 0 (0x0)
 4310: write(2,"kqueue: ",8)                     = 8 (0x8)
 4310: write(2,"kevent: invalid argument (Bad file descriptor)",46) = 46 (0x2e)
 4310: write(2,"\n",1)                           = 1 (0x1)
 4310: write(15,"\M^?",1)                        ERR#9 'Bad file descriptor'
 4310: write(2,"kqueue: ",8)                     = 8 (0x8)
 4310: write(2,"ioManagerWakeup: write",22)      = 22 (0x16)
 4310: write(2,": Bad file descriptor\n",22)     = 22 (0x16)
 4310: write(15,"\M^?",1)                        ERR#9 'Bad file descriptor'
 4310: write(2,"kqueue: ",8)                     = 8 (0x8)
 4310: write(2,"sendWakeup: invalid argument (Bad file descriptor)",50) = 50 (0x32)
 4310: write(2,"\n",1)                           = 1 (0x1)
 4310: write(13,"\M-~",1)                        ERR#9 'Bad file descriptor'
 4310: write(2,"kqueue: ",8)                     = 8 (0x8)
 4310: write(2,"ioManagerDie: write",19)         = 19 (0x13)
 4310: write(2,": Bad file descriptor\n",22)     = 22 (0x16)
 4310: process exit, rval = 1
 = 0 (0x0)
 4308: process exit, rval = 0
 = 0 (0x0)
 4307: process exit, rval = 0

Notice that the kqueue FD (number 3) gets closed twice, then there is an attempt to use kevent on it, plus attempts to write to the already-closed descriptors 13 and 15. I don't know yet what the root cause of this is, but I can tell you for sure that the Linux+epoll RTS does not do anything even remotely similar.

The full truss log is attached as well.

Changed 6 years ago by adept

Attachment: full.log.gz added

Full truss log (FreeBSD 8.2)

Changed 6 years ago by tibbe

Attachment: dtruss.log.gz added

Running dtruss on OS X 10.6.4

comment:31 Changed 6 years ago by tibbe

Attached is the log from running kazu-yamamoto's program under OS X 10.6.4.

comment:32 Changed 6 years ago by adept

Owner: changed from tibbe to adept

Ooookay. I added loads of debugIO into the "base" code, and I am getting very strange results on FreeBSD with the code from comment #28. In particular, I've modified libraries/base/System/Event/Control.hs and made closeControl print out the PID of the current process and the Control itself.

And I am seeing this (here 3738 is the PID of the daemonized/twice-forked process):

PID 3738. In closeControl W {controlReadFd = 4, controlWriteFd = 5, wakeupReadFd = 6, wakeupWriteFd = 7}
PID 3738. In closeControl W {controlReadFd = 8, controlWriteFd = 9, wakeupReadFd = 10, wakeupWriteFd = 11}
PID 3738. In closeControl W {controlReadFd = 12, controlWriteFd = 13, wakeupReadFd = 14, wakeupWriteFd = 15}

The way it is printed suggests that all the closeControls are run at once when the innermost process terminates (well before the threadDelay timeout expires, I might add).

Now, could someone please explain to me:

  1. Why are all the closeControls run inside a single process? I thought that children do not inherit the threads of the parent, so how is the Control from the parent used in the child at all?
  2. Am I right in assuming that the child process inherits controlReadFd, controlWriteFd, wakeupReadFd, and wakeupWriteFd from its parent, and that this is not desired behaviour at all?
  3. Maybe we should add a function like forked to all backends, which would be called by forkProcess and would do the sensible thing to restore the status quo after the fork?

Or maybe we should remove forkProcess altogether and say that external "daemonizing" tools should be used where needed?

comment:33 Changed 6 years ago by adept

More info: the first two calls to closeControl are made from withNew (System/Event/Manager.hs). I don't know enough about weak IORefs to understand what happens to them over a fork, but this is definitely a root cause. When the innermost process tries to initialize its event manager, the following code is executed:

newWith be =
  ...
  _ <- mkWeakIORef state $ do
               st <- atomicModifyIORef state $ \s -> (Finished, s)
               when (st /= Finished) $ do
                 I.delete be
                 closeControl ctrl
  ...

and after that something (my guess is threadDelay) forces the value inside that weak IORef, pulling the state corresponding to the "parent" IO managers through I.delete and closeControl.

Since two "outer" processes have the same kqueue FD numbers, repeated I.delete produces the error we are observing.

So far, I am at a loss how to fix this.
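
For anyone following along, here is a small standalone sketch (not GHC code) of the mkWeakIORef behaviour under discussion: the finalizer attached to an IORef runs only once GC notices the IORef is dead, so it can fire long after the code that created it has moved on, and the exact moment remains at the GC's discretion.

import Control.Concurrent (threadDelay)
import Data.IORef (mkWeakIORef, newIORef, writeIORef)
import System.IO (hPutStrLn, stderr)
import System.Mem (performGC)

main :: IO ()
main = do
    ref <- newIORef (0 :: Int)
    -- Attach a finalizer, as newWith does for the manager's state IORef.
    _ <- mkWeakIORef ref $ hPutStrLn stderr "finalizer: cleaning up the old manager"
    writeIORef ref 1
    -- ref is now garbage; imagine a new IO manager has reused its descriptors.
    performGC            -- only now does the finalizer get scheduled
    threadDelay 100000   -- give the finalizer thread a moment to run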

comment:34 Changed 6 years ago by kazu-yamamoto

The simplest code I found to reproduce this is as follows:

module Main where

import Control.Concurrent
import System.Posix

main :: IO ()
main = do
    forkProcess (threadDelay 100000000)
    threadDelay 100000000

comment:35 Changed 6 years ago by kazu-yamamoto

Please read comment #29. I'm quite sure about it. I think closeControl is the same thing as the "two pairs of control pipes".

comment:36 in reply to:  35 Changed 6 years ago by adept

Replying to kazu-yamamoto:

Please read comment #29. I'm quite sure about it. I think closeControl is the same thing as the "two pairs of control pipes".

I assure you that I've read it carefully. The pipes are closed, but in the wrong place and at the wrong time.

comment:37 Changed 6 years ago by kazu-yamamoto

What about the possibility that "cleanup" calls "closeControl"?

comment:38 in reply to:  37 Changed 6 years ago by adept

Replying to kazu-yamamoto:

What about the possibility that "cleanup" calls "closeControl"?

I don't quite understand your question, but, for example, in the code from comment 28, closeControl is called once from cleanup and twice from withNew, each time with a different value of Control. Does this answer your question?

comment:39 Changed 6 years ago by adept

OK, more info. Now that I understand mkWeakIORef properly (it just adds a finalizer to the IORef), I should rephrase the description of the root cause.

It all happens like this. The parent process creates an IO manager, and a finalizer is added to the state argument inside newWith. This IORef plus finalizer are then carried over to the child process on fork(). Even if the child process explicitly closes the inherited IO manager with ioManagerDie(), there is no guarantee that GC will collect the remains of the old IO manager before the new one is created.

Thus, the finalizer for the inherited IO manager can run _after_ the new one is created, and there is no easy way to force it to run sooner. Since the Controls of both managers refer to the same file descriptors, we get errors (double close) when the finalizers are finally run, and the finalizers of the inherited manager close the control pipes of the new manager.
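
Judging from the "when (st /= Finished)" guard in the newWith excerpt quoted in comment:33, one mitigation would be to flip the inherited manager's state to Finished in the child before a new manager starts, so the stale finalizer becomes a no-op. Here is a sketch of that idea; the State type is a stand-in for the internal System.Event.Manager one, the helper name is invented, and no claim is made that the attached patch works exactly this way:

import Data.IORef (IORef, atomicModifyIORef)

-- Stand-in for the internal manager state; only Finished matters here.
data State = Running | Dying | Finished
    deriving Eq

-- Hypothetical helper: mark the inherited manager Finished so its
-- mkWeakIORef finalizer (guarded by "when (st /= Finished)") no longer
-- deletes the backend or closes control pipes that the new manager reuses.
neutralizeInheritedManager :: IORef State -> IO ()
neutralizeInheritedManager state =
    atomicModifyIORef state $ \_ -> (Finished, ())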

comment:40 Changed 6 years ago by adept

Status: new → patch

The attached patch fixes this bug on FreeBSD 8.2.

I also checked Linux: no adverse side effects there.

On both platforms I verified by running the test suite and the code from the comments on this ticket.

Please review and apply.

comment:41 Changed 6 years ago by tibbe

Owner: adept deleted

comment:42 Changed 6 years ago by kazu-yamamoto

I have confirmed that this bug is fixed on Snow Leopard. So, let's close this. Thank you, adept, for your effort!

comment:43 Changed 6 years ago by igloo

Resolution: fixed
Status: patch → closed

Fixed in HEAD and 7.0 by:

Wed Jan 19 02:33:00 PST 2011  Dmitry Astapov <dastapov@gmail.com>
  * Clean up remnants of the Event Manager after forkProcess. Closes #4449