Opened 2 years ago

Last modified 4 days ago

#7651 new bug

Building GHC with parallel IO manager freezes on Mac (not on FreeBSD)

Reported by: kazu-yamamoto
Owned by:
Priority: high
Milestone: 7.12.1
Component: Build System
Version: 7.7
Keywords:
Cc: pho@…, andreas.voellmy@…, george.colpitts@…
Operating System: MacOS X
Architecture: x86_64 (amd64)
Type of failure: Building GHC failed
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Revisions:

Description (last modified by simonmar)

Building GHC with the parallel IO manager on Mac freezes when
compiling the dph libraries in phase 2.

We suspect this is due to a bug in the OS X implementation of kqueue, for the reasons given below. In the meantime, we have added an extra IO manager wakeup that appears to work around the problem; see GHC/Event/Manager.hs.

Details:

  • This happens only if we pass "-j" to "make". Note that "make" closes the stdin of sub-processes when "-j" is specified.
  • Even if we pass "-j" to "make", the problem disappears when stdout/stderr are redirected. That is, "make -jN >& LOG &" works.
  • The "-d" option of "make" has no effect.
  • Programs compiled with the built GHC (with our patches) work well. As a test, I compiled a daemon HTTP server that closes stdin/stdout/stderr, and it worked well.

The IO manager polls a kqueue fd. Another Haskell thread, running on
another native thread, registers an event through the same kqueue
fd. In most cases this works on Mac, but in certain situations
MacOS does not deliver the event to the IO manager. If the IO
manager is woken up and polls the kqueue fd again, the event is delivered.
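
The scenario and the extra-wakeup workaround can be sketched roughly as follows. This is only an illustration in Python, not GHC's code: run_demo, the "wakeup" pipe, and the "client" socket pair are hypothetical names invented for the sketch, and selectors.DefaultSelector merely picks kqueue on Mac/BSD (epoll on Linux). On a healthy kernel the event is delivered with or without the extra wakeup; the sketch only shows where the wakeup sits relative to the registration.

```python
import os
import selectors
import socket
import threading
import time


def run_demo(extra_wakeup=True, timeout=5.0):
    """Return the labels of the events the polling thread observed."""
    sel = selectors.DefaultSelector()   # kqueue on macOS/BSD, epoll on Linux
    wake_r, wake_w = os.pipe()          # hypothetical stand-in for a wakeup channel
    sel.register(wake_r, selectors.EVENT_READ, "wakeup")
    client, peer = socket.socketpair()
    seen = []
    polling = threading.Event()

    def manager():
        # Plays the role of the IO manager: blocks polling the shared fd.
        polling.set()
        while "client" not in seen:
            events = sel.select(timeout)
            if not events:              # timed out: the event was never delivered
                return
            for key, _mask in events:
                if key.data == "wakeup":
                    os.read(wake_r, 1)  # drain the wakeup byte, then poll again
                seen.append(key.data)

    worker = threading.Thread(target=manager)
    worker.start()
    polling.wait()
    time.sleep(0.1)                     # give the manager time to block in select()

    peer.sendall(b"x")                  # the client socket is now readable
    # Registration happens on a different thread from the one that is polling.
    sel.register(client, selectors.EVENT_READ, "client")
    if extra_wakeup:
        os.write(wake_w, b"!")          # workaround: force the manager to poll again

    worker.join()
    sel.close()
    for fd in (wake_r, wake_w):
        os.close(fd)
    client.close()
    peer.close()
    return seen


if __name__ == "__main__":
    print(run_demo(extra_wakeup=True))
```

With extra_wakeup=False the poller depends entirely on the kernel noticing the new registration while it is already blocked, which is the step suspected of failing here.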

This bug appears only when building GHC on Mac. I cannot find a
simple way to reproduce it. Even if we found a way to reproduce
it, I suspect we would conclude that this is
a bug in the Mac implementation of kqueue.

I have some evidence that kqueue on Mac is buggy:

Change History (19)

comment:1 Changed 2 years ago by simonmar

  • Description modified (diff)
  • difficulty set to Unknown

Added a bit of formatting and text to the description.

comment:2 Changed 2 years ago by PHO

  • Cc pho@… added

comment:3 Changed 2 years ago by kazu-yamamoto

Andreas wrote C programs to simulate the parallel IO manager:
https://github.com/AndreasVoellmy/epollbug

kqueueserver2 uses kevent64() while kqueueserver3 uses kevent(). kqueueserver2 showed that kevent64() is unstable. The details are described in:
https://discussions.apple.com/thread/4783301

We stopped using kevent64() and switched to kevent() in the parallel IO manager, but building GHC still freezes. So the workaround is still necessary on Mac.

comment:4 Changed 2 years ago by AndreasVoellmy

  • Cc andreas.voellmy@… added

comment:6 Changed 2 years ago by AndreasVoellmy

It turns out that kqueueserver2.c (mentioned in previous comments in this item) had an error in it. Therefore, it does not indicate any problem with kevent64. The problem in that code occurred because the program allocated a struct kevent64_s and failed to initialize some of that struct's fields. There must have been some garbage values in these fields that caused kevent64 to behave oddly. The fix is to use the EV_SET64 macro to initialize the struct.

comment:7 Changed 2 years ago by igloo

What's the status of this ticket? Is the problem now understood, and the fix known?

comment:8 Changed 2 years ago by kazu-yamamoto

GHC head already has a workaround for this. Please see libraries/base/GHC/Event/Manager.hs and look for "#if defined(darwin_HOST_OS)".

But we still have two problems:

  • Building GHC on Mac sometimes fails. We need to understand whether or not this is due to the new IO manager. Unfortunately, I cannot "make install" GHC head at the moment, as I described in ghc-deps.

comment:9 Changed 2 years ago by igloo

  • Milestone set to 7.8.1
  • Priority changed from normal to high

comment:10 Changed 20 months ago by kazu-yamamoto

I can build GHC head in parallel even on Mavericks. See #8497 and #8102.

comment:11 Changed 16 months ago by George

Regarding "building GHC on Mac sometimes fails": see #8620. Building in parallel on Mavericks is failing for some users.

comment:12 follow-up: Changed 16 months ago by kazu-yamamoto

What does "fail" mean here: a freeze (never stopping) or stopping with an error?

comment:13 Changed 15 months ago by George

  • Cc george.colpitts@… added

comment:14 Changed 15 months ago by thoughtpolice

  • Status changed from new to infoneeded

comment:15 Changed 13 months ago by thoughtpolice

  • Milestone changed from 7.8.3 to 7.8.4

Moving to 7.8.4.

comment:16 Changed 9 months ago by thoughtpolice

  • Milestone changed from 7.8.4 to 7.10.1

Moving (in bulk) to 7.10.1

comment:17 Changed 4 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to the 7.12.1 milestone, as these tickets won't be fixed in time for the 7.10.1 release (unless you, the reader, help write a patch :)

comment:18 in reply to: ↑ 12 Changed 4 days ago by George

Replying to kazu-yamamoto:

What does "fail" mean here: a freeze (never stopping) or stopping with an error?

I believe that, at the time, I meant stopping with an error, as described in #8620. As I've just elaborated there, I believe I encountered this without touching any files after typing 'make', which, according to the wiki, is probably a bug in the build system. In any case, I haven't seen this in a long time. I just built the 7.10.2 rc with make -j5 and had no problems.

However, I don't think dph is being compiled in 7.10.2, so the problem may still be there. I don't have any information that would help resolve this bug.

Last edited 4 days ago by George

comment:19 Changed 4 days ago by George

  • Architecture changed from Unknown/Multiple to x86_64 (amd64)
  • Status changed from infoneeded to new