Opened 5 years ago

Closed 5 years ago

#2858 closed merge (fixed)

Segmentation fault due to race between IO manager and installSignals.

Reported by: dsh Owned by: igloo
Priority: normal Milestone: 6.10.2
Component: libraries/base Version: 6.10.1
Keywords: posix signal threaded IOmanager Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Hello,

consider the attached test case.

To reproduce the problem please

  1. compile the attached test case with -threaded
  2. enable core dumps in the shell,
  3. run infinite loop until core dump appears,
  4. to make core dumps actually appear run infinite pkill -TERM Test in another shell.

Regards,
Dmitry

Attachments (2)

Test.hs (365 bytes) - added by dsh 5 years ago.
Test with installHandler
patch (705 bytes) - added by dsh 5 years ago.
Quick and dirty fix

Download all attachments as: .zip

Change History (13)

Changed 5 years ago by dsh

Test with installHandler

Changed 5 years ago by dsh

Quick and dirty fix

comment:1 follow-up: Changed 5 years ago by dsh

With this fix I get assertion with sanity checks enabled about deadlock.

May be IO manager should read its pipe in non-blocking mode?

comment:2 Changed 5 years ago by simonmar

  • Difficulty set to Unknown
  • Owner set to simonmar

comment:3 Changed 5 years ago by simonmar

  • Milestone set to 6.10.2

comment:4 Changed 5 years ago by dsh

Hello,

It look like Haskell handlers are not freed after stg_sig_install in System.Posix.Signals
Also freeing StablePtr? in stg_sig_install leaves signal_handlers in inconsistent state.

comment:5 in reply to: ↑ 1 Changed 5 years ago by dsh

Replying to dsh:

With this fix I get assertion with sanity checks enabled about deadlock.

May be IO manager should read its pipe in non-blocking mode?

Small clarification. The deadlock I see happens if one of the following RTS setup is used

  1. "+RTS -N2 -g1 -DS --install-handlers=no -RTS"
  2. "+RTS -N1 -DS --install-handlers=no -RTS"

I observe the issue with slightly more complex program (using createProcess from System.Process).

comment:6 Changed 5 years ago by simonmar

Sorry, I can't reproduce the problem. I run Test.hs over and over again in one window, and run killall -TERM Test over and over (in a while loop) in another window, and so far no core dumps. Is there something I'm missing?

In any case I think a fix for this will probably have to wait until I've finished the signal overhaul (so 6.12), unless the fix is easy.

comment:7 Changed 5 years ago by dsh

I see, I got them infrequently too..

For me the problem looks like

  1. There could be 2 commands (I mean bytes) in IO manager pipe for the same signal, received when the handler was user defined;
  2. Then IO manager scheduled the handler (executed first command), which invalidated 'signal_handlers' table (replacing StablePtr? with special constant STG_SIG_DFL)
  3. It is incorrect for IO manager to schedule process the second command using 'signal_handlers' table.

Quick fix would be always to check the 'signal_handlers' table entry in IO manager.
It would not be worse than ignoring signals after space overflow in non-threaded version

Dmitry

P.S. Could you try random small delays?

comment:8 Changed 5 years ago by simonmar

I intend to fix this by bringing in my new signal-handling code, which moves more of the signal-handling into Haskell. The original plan was to also extend the user API, but we could bring in the low-level machinery without disturbing the API for now. I'd rather do this than spend time fixing bugs in the current implementation, which is rather ugly.

comment:9 Changed 5 years ago by simonmar

  • Owner changed from simonmar to igloo
  • Type changed from bug to merge

I think these patches should fix the problem, since we no longer have the RTS table of StablePtrs to signal handlers. I'd be grateful if you could test the new code and let me know if you still encounter any problems.

patch to libraries/unix:

Thu Feb 19 02:05:32 PST 2009  Simon Marlow <marlowsd@gmail.com>
  * Rewrite of signal-handling.
  Ignore-this: 1579194c10020dc34af715c225a9f207
  
  The API is the same (for now).  The new implementation has the
  capability to define signal handlers that have access to the siginfo
  of the signal (#592), but this functionality is not exposed in this
  patch.
  
  #2451 is the ticket for the new API.
  
  The main purpose of bringing this in now is to fix race conditions in
  the old signal handling code (#2858).  Later we can enable the new
  API in the HEAD.
  
  Implementation differences:
  
   - More of the signal-handling is moved into Haskell.  We store the
     table of signal handlers in an MVar, rather than having a table of
     StablePtrs in the RTS.
  
   - In the threaded RTS, the siginfo of the signal is passed down the
     pipe to the IO manager thread, which manages the business of
     starting up new signal handler threads.  In the non-threaded RTS,
     the siginfo of caught signals is stored in the RTS, and the
     scheduler starts new signal handler threads.

patch to libraries/base:

Thu Feb 19 02:22:03 PST 2009  Simon Marlow <marlowsd@gmail.com>
  * Rewrite of signal-handling (base patch; see also ghc and unix patches)

patch to ghc:

Thu Feb 19 02:31:42 PST 2009  Simon Marlow <marlowsd@gmail.com>
  * Rewrite of signal-handling (ghc patch; see also base and unix patches)

comment:10 Changed 5 years ago by simonmar

  • Operating System changed from Linux to Unknown/Multiple

comment:11 Changed 5 years ago by igloo

  • Resolution set to fixed
  • Status changed from new to closed

All 3 merged.

Note: See TracTickets for help on using tickets.