Data.Vector.Unboxed performance regression of 7.4.1 relative to 7.0.4
Problem
Severe Data.Vector.Unboxed performance regression in GHC 7.4.1 relative to GHC 7.0.4. The ratio of run times for the sum benchmark below is (Sum GHC 7.4.1)/(Sum GHC 7.0.4) ~ 2.4.
System
GNU/Linux 3.2.0-24-generic 38-Ubuntu i386
Compilers
GHC 7.0.4
GHC 7.4.1
GCC 4.6.3 for a baseline
Main.hs
module Main where

import System.Environment (getArgs)
import qualified Data.Vector.Unboxed as U (generate, sum)

main :: IO ()
main = do args <- getArgs
          if length args == 1
            then putSum (read (head args) :: Int)
            else error "need a count operand"

putSum :: Int -> IO ()
putSum cnt = let v = U.generate cnt (\i -> fromIntegral i :: Double)
                 s = U.sum v
             in putStrLn ("Sum=" ++ show s)
GHC compilation
ghc --version
7.4.1
ghc -O2 -Wall --make -o sum Main.hs

ghc --version
7.0.4
ghc -O2 -Wall --make -o sum Main.hs
Baseline csum.c
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    unsigned long i, size;
    double tot = 0;

    if (argc != 2)
    {
        (void)fprintf(stderr, "usage: %s size\n", basename(argv[0]));
        return(1);
    }
    size = atol(argv[1]);
    for (i = 0; i < size; i++) tot += (double)i;
    (void)printf("Sum=%.15e\n", tot);
    return(0);
}
GCC baseline compilation
gcc --version
4.6.3
gcc -O2 -Wall csum.c -o csum
Data: time sum-7.0.4 n
n | seconds |
---|---|
100000000 | 0.74 |
200000000 | 1.46 |
300000000 | 2.24 |
400000000 | 2.94 |
500000000 | 3.70 |
600000000 | 4.40 |
700000000 | 5.14 |
800000000 | 5.89 |
900000000 | 6.62 |
1000000000 | 7.34 |
Data: time sum-7.4.1 n
n | seconds |
---|---|
100000000 | 1.74 |
200000000 | 3.49 |
300000000 | 5.24 |
400000000 | 6.98 |
500000000 | 8.73 |
600000000 | 10.51 |
700000000 | 12.22 |
800000000 | 13.96 |
900000000 | 15.75 |
1000000000 | 17.51 |
Data: time csum-4.6.3 n
n | seconds |
---|---|
100000000 | 1.04 |
200000000 | 2.10 |
300000000 | 3.12 |
400000000 | 4.19 |
500000000 | 5.23 |
600000000 | 6.26 |
700000000 | 7.32 |
800000000 | 8.37 |
900000000 | 9.41 |
1000000000 | 10.45 |
Linear in n
y is the run time in seconds.

Compiler | Linear fit |
---|---|
GHC 7.0.4 | y = (0.73/10^8) * n + 0.03 |
GCC 4.6.3 | y = (1.04/10^8) * n + 0.03 |
GHC 7.4.1 | y = (1.75/10^8) * n - 0.01 |

Severe performance regression: GHC 7.4.1/GHC 7.0.4 ~ 1.75/0.73 ~ 2.4
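The slopes quoted above can be checked with an ordinary least-squares fit over the timing tables. The following is a minimal sketch, not the procedure originally used to obtain the fits: the name `fitLine` and the three data lists are illustrative, and n is expressed in units of 10^8 so that the slope is directly comparable to the 0.73, 1.04 and 1.75 figures.

```haskell
module Main where

-- Ordinary least-squares fit of y = slope * x + intercept.
-- Returns (slope, intercept). The name fitLine is hypothetical.
fitLine :: [(Double, Double)] -> (Double, Double)
fitLine pts = (slope, intercept)
  where
    n         = fromIntegral (length pts)
    sx        = sum (map fst pts)
    sy        = sum (map snd pts)
    sxx       = sum [x * x | (x, _) <- pts]
    sxy       = sum [x * y | (x, y) <- pts]
    slope     = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n

-- Timings copied from the data tables; x is n in units of 10^8.
ghc704, ghc741, gcc463 :: [(Double, Double)]
ghc704 = zip [1 ..] [0.74, 1.46, 2.24, 2.94, 3.70, 4.40, 5.14, 5.89, 6.62, 7.34]
ghc741 = zip [1 ..] [1.74, 3.49, 5.24, 6.98, 8.73, 10.51, 12.22, 13.96, 15.75, 17.51]
gcc463 = zip [1 ..] [1.04, 2.10, 3.12, 4.19, 5.23, 6.26, 7.32, 8.37, 9.41, 10.45]

main :: IO ()
main = do
  let report name pts = putStrLn (name ++ ": " ++ show (fitLine pts))
  report "GHC 7.0.4" ghc704
  report "GCC 4.6.3" gcc463
  report "GHC 7.4.1" ghc741
  -- Ratio of slopes, i.e. the per-element cost of 7.4.1 relative to 7.0.4.
  putStrLn ("slope ratio 7.4.1/7.0.4: "
            ++ show (fst (fitLine ghc741) / fst (fitLine ghc704)))
```

The fitted slopes should land close to the rounded values in the ticket. The intercepts are small and sensitive to rounding in the timings, so they may differ slightly from the ones quoted.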
Notes
1/ I discovered the problem in a slightly more complicated case, when I recompiled a package that used some simple statistics. The sum of [0..(n-1)] was the simplest case I could think of that demonstrates the problem.
2/ I tried a similar experiment with Data.List, Data.Array.Unboxed, Data.Vector.Storable.MMap, and Foreign.Marshal.Alloc. In all cases, the GHC 7.4.1 version was faster than the GHC 7.0.4 version.
3/ The same Data.Vector.Unboxed source code is used in both cases, compiled and installed separately for each version of GHC. Thus the regression appears to come from the interaction between Data.Vector.Unboxed and the 7.4.1 compiler.
4/ I am impressed that the GHC 7.0.4 sum is faster than the GCC 4.6.3 sum. I expected the two to be close, not for GHC to win. Given this result, I hope the 7.0.4 performance can be recovered.
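For reference, the Data.List comparison mentioned in note 2 might look roughly like the sketch below. The code for that experiment is not included in this ticket, so this is an assumed reconstruction: `sumList` is a hypothetical name, and a strict left fold (`foldl'`) is one plausible way to write it, not necessarily the one that was measured.

```haskell
module Main where

import Data.List (foldl')
import System.Environment (getArgs)

-- Sum of 0 .. cnt-1 as Doubles, using a strict left fold over a list
-- instead of an unboxed vector. Hypothetical stand-in for the
-- Data.List variant mentioned in note 2.
sumList :: Int -> Double
sumList cnt = foldl' (+) 0 (map fromIntegral [0 .. cnt - 1])

main :: IO ()
main = do
  args <- getArgs
  case args of
    [n] -> putStrLn ("Sum=" ++ show (sumList (read n)))
    _   -> error "need a count operand"
```

Compiling this with the same `ghc -O2` flags under both compilers and timing it the same way gives a vector-free comparison for the same arithmetic.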
Trac metadata
Trac field | Value |
---|---|
Version | 7.4.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |