GHC 7.0.4 Performance Regression (Possibly Vector)
I have noticed ~100% performance degradation for my code when I switched from 6.12.3 to 7.0.4. This might be related to vector performance ticket 5623 but I noticed it was filed for performance regression of 7.2.1 relative to 7.0.4, and 6.12.1 vs 7.0.4 performance was reported as ok. So, I am filing it as new bug report.
I am attaching an edited version of my code below which reproduces the issue. It is from actual production code which is used for db driver. Relevant performance benchmarks (~95% degradation):
GHC 6.12.3 MUT Time: 0.48s
GHC 7.0.4 MUT Time: 0.95s
In actual code, performance degrades by ~100%, from ~1.3s to ~2.6s. So, I can't move from 6.12.3 to 7.0.4 or 7.4+ if I want to keep the performance :(
Code below - the comment block at the end shows how to compile, and reproduce the issue - I will be happy to provide more information to fix the issue:
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Storable as SV
import qualified Data.Vector.Storable.Mutable as MSV
import qualified Data.Vector as V
import Foreign (sizeOf)
import Foreign.C.Types (CChar)
import GHC.Int
import System.IO.Unsafe (unsafePerformIO)
import Control.Exception (evaluate)
data Elems = IV {-# UNPACK #-} !(SV.Vector Int32)
| SV {-#UNPACK #-} !Int {-# UNPACK #-} !(SV.Vector CChar) -- Int stores the number of null-terminated C Strings
| T {-# UNPACK #-} !Int {-# UNPACK #-} !(V.Vector Elems) -- Int stores total bytes needed to copy vectors to ByteString
| L {-# UNPACK #-} !(V.Vector Elems) -- General list of elements
deriving (Show)
-- | Function to return total size in bytes taken to store the data from Elems
size :: Elems -> Int
size (IV x) = 6 + (sizeOf (undefined :: Int32)) * (SV.length x)
size (SV _ x) = 6 + (sizeOf (undefined :: CChar)) * (SV.length x)
size (T n _) = n
size (L x) = V.foldl' (\x y -> x + size y) 6 x
{-# INLINE size #-}
fillS :: [[CChar]] -> Elems
fillS x = let (x',y') = createS x
in SV x' y'
{-# INLINE fillS #-}
createS :: [[CChar]] -> (Int, SV.Vector CChar)
createS cl = unsafePerformIO $ do
v <- MSV.new (Prelude.length . Prelude.concat $ cl)
fill v 0 $ Prelude.concat cl
SV.unsafeFreeze v >>= \x -> return (Prelude.length cl,x)
where
fill v _ [] = return ()
fill v i (x:xs) = MSV.unsafeWrite v i x >> fill v (i + 1) xs
{-# INLINE createS #-}
-- | Constructor for T - a db table - we must always build it using this function
fillT :: V.Vector Elems -> Elems
fillT !xs = T (V.foldl' (\x y -> x + size y) 3 xs) xs -- 2 bytes for table header + 1 additional byte for dict type header => 3 bytes additional overhead
{-# INLINE fillT #-}
main = do
let il1 = IV $ SV.enumFromN 1 50000000
il2 = IV $ SV.enumFromN 1 50000000
il3 = IV $ SV.enumFromN 1 50000000
l1 = L (V.fromList [il1,il2,il3])
sl1 = fillS [[97,0],[98,0],[99,0]]
evaluate $ fillT (V.fromList [sl1,l1])
return ()
{-- GHC 6.12.3:
$ ghc -O2 --make test.hs -fforce-recomp -rtsopts -fasm && ./test +RTS -s
[1 of 1] Compiling Main ( test.hs, test.o )
Linking test ...
./test +RTS -s
600,843,536 bytes allocated in the heap
8,336 bytes copied during GC
200,002,504 bytes maximum residency (2 sample(s))
793,936 bytes maximum slop
574 MB total memory in use (9 MB lost due to fragmentation)
Generation 0: 2 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 2 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.48s ( 0.97s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.48s ( 0.97s elapsed)
%GC time 0.2% (0.1% elapsed)
Alloc rate 1,259,822,857 bytes per MUT second
Productivity 99.6% of total user, 49.0% of total elapsed
-----
GHC 7.0.4:
$ ghc -O2 --make test.hs -fforce-recomp -rtsopts -fasm && ./test +RTS -s
[1 of 1] Compiling Main ( test.hs, test.o )
Linking test ...
./test +RTS -s
600,836,872 bytes allocated in the heap
7,664 bytes copied during GC
200,002,224 bytes maximum residency (2 sample(s))
794,216 bytes maximum slop
574 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 2 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 2 collections, 0 parallel, 0.11s, 0.11s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.94s ( 1.01s elapsed)
GC time 0.11s ( 0.11s elapsed)
EXIT time 0.00s ( 0.11s elapsed)
Total time 1.05s ( 1.12s elapsed)
%GC time 10.2% (9.7% elapsed)
Alloc rate 638,055,951 bytes per MUT second
Productivity 89.4% of total user, 83.4% of total elapsed
--}
Trac metadata
Trac field | Value |
---|---|
Version | 7.0.4 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |