Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#3758 closed bug (fixed)

Huge regression in concurrent app performance and reliability under threaded runtime

Reported by: bos Owned by: simonmar
Priority: low Milestone: 7.0.1
Component: Runtime System Version: 6.12.1
Keywords: Cc: 8mayday@…, johan.tibell@…, dons@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by igloo)

I have a trivial HTTP server that I wrote to do some performance measurements with, and it behaves quite poorly for me under 6.12.1 if I try to use multiple cores. The symptoms are very low throughput and frequent hangs.

I'm on a dual core 64-bit Linux system (Fedora 12).

I have attached the server code that you can use to reproduce the problem. It requires the network and network-bytestring packages to function, but is otherwise standalone.

Running it is easy:

ghc -fforce-recomp -O --make Netbench
./Netbench localhost 8080

I've been using the standard apachebench tool, ab, to stress the server:

ab -c 10 -n 30000 http://localhost:8080/

This hits the server with 10 concurrent requests for a total of 30,000 requests. Here are some throughput measurements:

  • 6.10.4, unthreaded RTS: 5539 req/sec
  • 6.10.4, threaded RTS, -N1: 7758 req/sec
  • 6.10.4, threaded RTS, -N2: 5856 req/sec
  • 6.12.1, unthreaded RTS: 5612 req/sec
  • 6.12.1, threaded RTS, -N1: 7437 req/sec
  • 6.12.1, threaded RTS, -N2: 1978 req/sec

With -N2 under 6.12.1, there is a high probability (> 50%) that the server will deadlock mid-run.

When a multi-CPU run completes successfully under 6.12.1, ab reports that quite often a single request will get stuck for several seconds before the server responds. This does not seem to happen in other scenarios.

If you would like any more details, please let me know. Thanks.

Attachments (2)

Netbench.hs (2.6 KB) - added by bos 8 years ago.
Netbench.hs
Screenshot-ThreadScope - Netbench.eventlog.png (62.8 KB) - added by simonmar 8 years ago.


Change History (23)

Changed 8 years ago by bos

Attachment: Netbench.hs added

Netbench.hs

comment:1 Changed 8 years ago by bos

Oops, sorry for the messy formatting of the allegedly bulleted lists. I tried! I can't figure out how to go back and tidy them up.

comment:2 Changed 8 years ago by bsdemon

Cc: 8mayday@… added

comment:3 Changed 8 years ago by igloo

Description: modified (diff)
Milestone: 6.12.2

Thanks for the report. I've fixed the formatting.

comment:4 Changed 8 years ago by tibbe

Cc: johan.tibell@… added

comment:5 Changed 8 years ago by simonmar

Owner: set to simonmar
Priority: normal → high

comment:6 Changed 8 years ago by simonmar

Summary: Huge regression in concurrent app performance and reliability under threaded runtime`

Part of the problem at least is this:

#3553: parallel gc suffers badly if one thread is descheduled

You're on a dual-core machine, running 3 threads (2 for the server and one for the client, plus whatever other processes decide to wake up from time to time). In 6.12.1 we enabled the parallel GC by default for young-generation collections. The parallel GC uses spinlocks for synchronisation, and these cope very badly when one of the GC threads gets descheduled, which happens quite a lot in this workload.

You can get back to the single-threaded performance by turning off the parallel GC, with +RTS -qg.
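For example, adding `-threaded` to the build line from the description so that the `-N` and `-qg` RTS flags apply (GHC 6.12 accepts RTS options by default; GHC 7.0 and later would also need `-rtsopts` at build time):

```shell
ghc -fforce-recomp -O -threaded --make Netbench
./Netbench localhost 8080 +RTS -N2 -qg
```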

The rule of thumb is that if you use -N2, you better be sure you have 2 CPUs available for the whole run of the program, otherwise performance will go down the tubes. The work I'm doing on the GC at the moment will eventually help here.

As for the absolute performance, clearly we could do a lot better. I've attached a ThreadScope profile showing one of the gaps you can see if you zoom in. This gap in particular I think is select() waking up, but after select has woken a couple of Haskell threads it takes a little while before we start running them. More investigation is needed.

I'm not closing the ticket yet because you also reported a deadlock. I haven't been able to reproduce that yet.

Changed 8 years ago by simonmar

Attachment: Screenshot-ThreadScope - Netbench.eventlog.png added

comment:7 Changed 8 years ago by simonmar

Summary: `Huge regression in concurrent app performance and reliability under threaded runtime

Oops, reverting the accidentally spammed ticket summary.

comment:8 Changed 8 years ago by bos

Simon, thanks for looking into this.

I tried the same benchmarks, only this time using a Mac running ab as the client over a gigabit ethernet connection.

With 6.12.1, -N2 -qg, I get good numbers: just below 15,000 requests per second.

With plain -N2 (hence using parallel GC), the numbers don't take as big a dive as when I was running client and server on a single box, but they still drop to about 7,500 requests per second, a 50% reduction in throughput.

Since the server has nothing else running on it in this case, I really do have 2 CPUs available. I think :-)

comment:9 Changed 8 years ago by simonmar

I'll investigate some more, but what I think is happening is that we're only really using one core at a time, and the other core is mostly asleep (the ThreadScope profile confirms this). Hence, when using the parallel GC we have to keep waking up the second core and waiting for it, when in fact it would be better to just do a single-threaded GC.

Now, I don't understand yet why the benchmark isn't keeping both cores busy, but we ought to be able to debug that using ThreadScope.

Are you still seeing the deadlocks, BTW? I couldn't reproduce that on my dual-core laptop running Ubuntu Karmic (32-bit).

comment:10 Changed 8 years ago by bos

I only see the deadlocks when running server and client on the same machine. Everything seems reliable when they're physically separate.

comment:11 Changed 8 years ago by bos

By the way, as far as only using one core is concerned, it might be related to the I/O manager being a single thread. Since the worker threads ought in principle to be able to do their job by running two system calls, I wouldn't be terribly surprised if the I/O manager was the choke point. Is that something that ThreadScope could show?

I'll probably need to add event logging to the I/O manager as I get deeper into reworking it anyway :-)

comment:12 Changed 8 years ago by simonmar

Thread 2 is the I/O manager, so you can see what it is up to in ThreadScope. You can generate events from Haskell code using GHC.Exts.traceEvent, although they only show up in the "Events" tab in ThreadScope currently, not in the timeline view.
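A minimal sketch of emitting custom events from Haskell code. Hedged: GHC 6.12 exported this as `GHC.Exts.traceEvent`; later GHCs renamed it `Debug.Trace.traceEventIO`, which this sketch uses, and `handleRequest` is a hypothetical stand-in for the server's real request handler:

```haskell
import Debug.Trace (traceEventIO)

-- Hypothetical request handler; the traceEventIO calls emit custom events
-- that land in the eventlog when the program is built with -eventlog and
-- run with +RTS -l.
handleRequest :: IO ()
handleRequest = do
  traceEventIO "request start"
  -- ... read the request and write the response here ...
  traceEventIO "request done"

main :: IO ()
main = do
  handleRequest
  putStrLn "handled"
```

Opening the resulting `.eventlog` file in ThreadScope then shows the strings in the Events tab, as described above.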

comment:13 Changed 8 years ago by simonmar

Cc: dons@… added

Some more on this:

I'm still unable to reproduce any deadlocks, with either 6.12.1 or HEAD. It's not unheard of for there to be bugs in Linux threading, but I haven't seen any for a few years, and we're both running the same kernel (2.6.31). So it could be a race that is tickled on your hardware but not mine, I'll try on different hardware when I'm back in the office.

I've cobbled together a patch that should make the parallel GC better behaved when only some of the cores are active: basically it doesn't bother waking up idle cores to do a parallel GC. This avoids most of the slowdown with -N2 here. I need to check that it doesn't hurt any of my other benchmarks before committing.

Something else that helps a bit: +RTS -qa uses the Linux affinity API to pin threads to CPUs.

I've noticed that if I go over 100 threads with ab, then a very few requests take >3s, but the rest are under 100ms. That is suspicious, and could indicate something strange in the scheduler.

You can also win by doing this:

  forM_ [0..numCapabilities-1] $ \n ->
    forkOnIO n (listenLoop sock)

and then run with +RTS -N2 -qm -qa, to disable automatic migration and enable affinity. This basically assigns one server thread and its children to each core, and doesn't let them move around.
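A self-contained sketch of that pinning pattern. Hedged: GHC 6.12 spells the fork call `forkOnIO`, while later GHCs call it `forkOn`, which this sketch uses; the real `listenLoop` would be the accept-and-serve loop from the attached Netbench.hs, replaced here by a stub so the sketch runs standalone:

```haskell
import Control.Concurrent (forkOn, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)
import GHC.Conc (numCapabilities)

-- Stub standing in for Netbench's accept-and-serve loop.
listenLoop :: Int -> IO ()
listenLoop _ = return ()

main :: IO ()
main = do
  let caps = [0 .. numCapabilities - 1]
  dones <- mapM (const newEmptyMVar) caps
  -- Pin one server loop to each capability; with +RTS -qm -qa the RTS
  -- will not migrate these threads between cores.
  forM_ (zip caps dones) $ \(n, done) ->
    forkOn n (listenLoop n >> putMVar done ())
  mapM_ takeMVar dones
  putStrLn "spawned one loop per capability"
```

Run with `+RTS -N2 -qm -qa` as described above; with the stub loop it simply spawns one thread per capability and exits.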

comment:14 Changed 8 years ago by simonmar

After applying the patch from #3553, on a dual-core 1.2GHz Intel Core, Ubuntu Karmic:

  • HEAD, unthreaded: ~6300 req/sec
  • HEAD, -threaded -N1: ~6600 req/sec
  • HEAD, -threaded -N2: ~6600 req/sec
  • HEAD, -threaded -N2 -qg: ~6600 req/sec

The numbers wobble around quite a bit, but as far as I can tell the problem is now gone. I'll wait for confirmation from Bryan (maybe when we put out a 6.12.2 RC?) before closing the bug.

comment:15 Changed 8 years ago by simonmar

I just fixed a deadlock. I'm not sure if it's your deadlock, but fingers crossed.

Tue Jan 26 07:00:37 PST 2010  Simon Marlow <marlowsd@gmail.com>
  * Fix a deadlock, and possibly other problems
  After a bound thread had completed, its TSO remains in the heap until
  it has been GC'd, although the associated Task is returned to the
  caller where it is freed and possibly re-used.  
  
  The bug was that GC was following the pointer to the Task and updating
  the TSO field, meanwhile the Task had already been recycled (it was
  being used by exitScheduler()). Confusion ensued, leading to a very
  occasional deadlock at shutdown, but in principle it could result in
  other crashes too.
  
  The fix is to remove the link between the TSO and the Task when the
  TSO has completed and the call to schedule() has returned; see
  comments in Schedule.c.

comment:16 Changed 8 years ago by bos

Thanks for working on this, Simon. It'll be a few days before I have a chance to look at the results, at the very least.

comment:17 Changed 8 years ago by simonmar

Bryan: if you had a chance to test 6.12.2 RC1, that would be most helpful, thanks.

comment:18 Changed 8 years ago by igloo

The above patch is in the 6.12 branch, so we believe this is fixed, and thus not a blocker for the release.

comment:19 Changed 8 years ago by igloo

Milestone: 6.12.2 → 6.12.3
Priority: high → normal
Status: new → infoneeded

comment:20 Changed 7 years ago by igloo

Milestone: 6.12.3 → 6.14.1
Priority: normal → low

comment:21 Changed 7 years ago by igloo

Resolution: fixed
Status: infoneeded → closed

Assuming this is fixed by the above patch.
