Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown)
This is a strange one.
I'm in the process of writing an efficient routine for reading large files of decimal/ASCII numbers, in order to make reasonable Haskell versions of the PBBS benchmarks.
The following reproducer reads a file and parses the numbers inside it. Simply calling "getNumCapabilities" BEFORE reading the file makes it leap from taking 4.5 seconds to 25-30 seconds:
The file is a standalone reproducer and there's a Makefile in that directory that will build and run it in the bugged and unbugged modes. When you run "make" you should see output as in the following Gist:
https://gist.github.com/rrnewton/5901965
......
time ./unbugged.exe
Using parReadNats + readFile
Time to read file sequentially: 0.311108s
SKIPPING read of num capabilities...
Now this is getting ridiculous...
Result: 1 segments of output
<segment, length 69568627>
4.59 real 3.89 user 0.66 sys
time ./bugged.exe
Using parReadNats + readFile
Time to read file sequentially: 0.342555s
Read num capabilities as 1
Now this is getting ridiculous...
Result: 1 segments of output
<segment, length 69568627>
38.79 real 37.57 user 1.18 sys
Currently I'm running it on a 550Mb file containing 69 million numbers. Any file of whitespace-separated positive numbers will do, but if you want the exact file I'm using you can download it here:
Or you can download the PBBS benchmark suite and run the generator to produce that file (http://www.cs.cmu.edu/~pbbs/). Anyway, it should have nothing to do with the file.
One thing that is interesting is that calling certain OTHER system functions appears to create the same performance effect as "getNumCapabilities". Namely, getEnvironment and setNumCapabilities cause the problem, but getEnv and writeFile don't.
I've attached the Core produced in the bugged and unbugged case to the Gist as well:
https://gist.github.com/rrnewton/5901965#file-pbbs_bugged-dump-simpl https://gist.github.com/rrnewton/5901965#file-pbbs_unbugged-dump-simpl
Finally, the above numbers are from my mac laptop, GHC 7.6.2. But the problem is exactly the same on the Linux/GHC-7.6.3 machine I tried, and I'm in the process of building a fresh GHC head to check it there...