ghc 7.6 (not 7.4) sometimes hangs at child process exit on s390x
Reported by: cjwatson
Type of failure: Other
On Debian's s390x architecture (64-bit S/390, Linux kernel), builds of several packages hang with GHC 7.6 where they did not hang with GHC 7.4. In particular, ghc itself hangs during its own build when bootstrapping with 7.6. This is quite easy to reproduce on affected systems, although it doesn't hang in exactly the same place every time. It appears that the runtime sometimes deadlocks when a subprocess exits; the strace looks like this:
```
7523 exit_group(0) = ?
6680 <... futex resumed> ) = ? ERESTARTSYS (To be restarted)
6680 --- SIGCHLD (Child exited) @ 0 (0) ---
6680 futex(0x84fa86ac, FUTEX_WAIT_PRIVATE, 1143, NULL) = ? ERESTARTSYS (To be restarted)
6680 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
6680 sigreturn() = ? (mask now )
6680 futex(0x84fa86ac, FUTEX_WAIT_PRIVATE, 1143, NULL) = ? ERESTARTSYS (To be restarted)
6680 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
6680 sigreturn() = ? (mask now )
6680 futex(0x84fa86ac, FUTEX_WAIT_PRIVATE, 1143, NULL) = ? ERESTARTSYS (To be restarted)
[repeats forever]
```
ghc spawns enough subprocesses (gcc etc.) that it is essentially bound to hit this sooner or later. I suspect a lack of signal safety somewhere. At an extremely wild guess, perhaps the type of an important variable written in a signal handler happens to be larger than sig_atomic_t on s390x but not on other architectures. However, I haven't yet been able to track this down in the time available to me.
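One cheap way to sanity-check the sig_atomic_t idea is to print how large `sig_atomic_t` is on a given machine next to a few common C types, then compare the output on s390x with an architecture that doesn't hang. The sketch below is an assumption-laden diagnostic, not a fix: it uses `CSigAtomic`, `CInt` and `CLong` from `Foreign.C.Types` and `sizeOf` from `Foreign.Storable`, and the helper name `report` is purely illustrative. It says nothing about which RTS variables are actually written from signal handlers; those remain unidentified here.

```haskell
-- SigSizes.hs: print the size of sig_atomic_t alongside some common C types.
-- Sketch only; which RTS variables matter is not established by this program.
module Main (main) where

import Foreign.C.Types (CInt, CLong, CSigAtomic)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

-- Hypothetical helper: print one "name = N bytes" line.
report :: String -> Int -> IO ()
report name n = putStrLn (name ++ " = " ++ show n ++ " bytes")

main :: IO ()
main = do
  -- CSigAtomic mirrors the platform's C sig_atomic_t type.
  report "sizeof(sig_atomic_t)" (sizeOf (undefined :: CSigAtomic))
  report "sizeof(int)" (sizeOf (undefined :: CInt))
  report "sizeof(long)" (sizeOf (undefined :: CLong))
  report "sizeof(void *)" (sizeOf (undefined :: Ptr ()))
  -- Haskell's Int, for comparison with the C types above.
  report "sizeOf Int" (sizeOf (undefined :: Int))
```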
If you don't immediately recognise this as something obvious, then perhaps somebody more fluent in Haskell than I am would be good enough to suggest test code that exercises this and is somewhat simpler than "build ghc"? If my analysis is at all close to the mark, then something that sits in a loop forking and reaping a trivial child process on each iteration should be enough to reproduce this. Since most people who aren't Debian developers probably don't have convenient access to S/390 machines (Debian developers can use zelenka.debian.org), I'd be happy to try things out.
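A minimal sketch of such a loop follows. It assumes the `process` library's `System.Process.rawSystem`, which spawns a command and blocks until it has exited and been reaped, plus a `true` executable on `PATH`. The file name `ForkLoop.hs` and the default of 100000 iterations are arbitrary choices for illustration. It may be worth building it both with and without `-threaded`, since it isn't established here which RTS configuration is affected.

```haskell
-- ForkLoop.hs: repeatedly spawn and reap a trivial child process.
-- Assumes a POSIX system with a `true` executable on PATH.
module Main (main) where

import Control.Monad (forM_, when)
import System.Environment (getArgs)
import System.Exit (ExitCode (..), exitFailure)
import System.IO (hFlush, hPutStrLn, stderr, stdout)
import System.Process (rawSystem)

main :: IO ()
main = do
  args <- getArgs
  -- Iteration count from the first argument; defaults to 100000.
  let n = case args of
            (s : _) -> read s
            []      -> 100000 :: Int
  forM_ [1 .. n] $ \i -> do
    -- Spawn `true` and block until it has exited and been reaped.
    ec <- rawSystem "true" []
    when (ec /= ExitSuccess) $ do
      hPutStrLn stderr ("iteration " ++ show i ++ ": child exited with " ++ show ec)
      exitFailure
    -- Print progress periodically (and flush) so that a hang is easy to spot.
    when (i `mod` 1000 == 0) $ do
      putStrLn ("completed " ++ show i ++ " iterations")
      hFlush stdout
  putStrLn "done"
```

Usage might look like `ghc -threaded ForkLoop.hs && ./ForkLoop 1000000`. If the progress output stops, attaching `strace -f -p <pid>` to the stuck process should show whether it is spinning in the same `futex`/`SIGVTALRM` pattern as above.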