Inlining regression/strangeness in 7.8
A couple days ago, it was reported to me that vector-algorithms had a significant performance regression (~20x) on GHC 7.8.2. The problem stems from a lack of inlining and specialization of some of the functions that were previously handled in 7.6 and earlier. The following is a reduced test case (the vector and primitive packages are required):
module A (test) where
import Control.Monad.ST
import Control.Monad
import Control.Monad.Primitive
import Data.Vector.Generic.Mutable as U
test :: (PrimMonad m, MVector v a, Num a) => Int -> v (PrimState m) a -> m a
-- test :: (MVector v a, Num a) => Int -> v s a -> ST s a
test 0 v = liftM (+1) $ unsafeRead v 0
test n v = do
long1 v
test (n-1) v
{-# INLINABLE test #-}
long1, long2, long3, long4 :: (PrimMonad m, MVector v a) => v (PrimState m) a -> m ()
long1 v = long2 v >> long2 v >> long2 v >> long2 v
long2 v = long3 v >> long3 v >> long3 v >> long3 v
long3 v = long4 v >> long4 v >> long4 v >> long4 v
long4 v = unsafeRead v 0 >>= unsafeWrite v 0
{-# INLINE long1 #-}
{-# INLINE long2 #-}
{-# INLINE long3 #-}
{-# INLINE long4 #-}
module Main (main) where
import Control.Monad.ST
import Data.Vector.Unboxed.Mutable as U hiding (read)
import System.Environment
import Unsafe.Coerce
import GHC.Prim
import A
test0 :: Int -> MVector s Int -> ST s Int
test0 n v = test n v
{-# NOINLINE test0 #-}
test1' :: Int -> MVector Any Int -> ST Any Int
test1' n v = test n v
{-# NOINLINE test1 #-}
test1 :: Int -> MVector a Int -> ST a Int
test1 = unsafeCoerce test1'
main = getArgs >>= \(n:b:_) ->
print $ runST $ do
v <- new 1
write v 0 0
(if read b then test0 else test1) (read n) v
Module A
exports a single function, test
. This function is engineered to be quite large, by inlining several other functions into it, and it is itself marked INLINABLE. Then the Main
module uses this function in two different ways:
-
test0
usestest
at a type that is compatible withrunST
-
test1'
usestest
at a completely monomorphic type, which is then coerced to arunST
compatible type intest1
On 7.6 I believe (though have not checked) that there will be little or no performance difference between test0
and test1
. However, on 7.8.2 (and, I have been assured, 7.8.3), there is a massive speed pentalty for test0
; about 70x on my machine. This seems to be due to no inining or specialization for its use of test
, which can be seen from -ddump-simpl
.
However, if one changes the type of test
in A
to be specific to ST s
rather than using PrimMonad
, there is no performance difference, even on 7.8.2. So, the choice to inline and specialize seems to hinge on the instantiation of all the class constraints to monomorphic types containing no variables, rather than just types that resolve all overloading. I myself did not notice this problem, because my benchmark suite uses IO
, which is a concrete instantiation of the type, and doesn't exhibit this problem.
I have temporarily 'fixed' vector-algorithms by moving back to INLINE
pragmas, but INLINABLE
is actually preferable in that case, because it generates faster code than INLINE
when the optimizations actually fire. My test case here does not illustrate that well, though.
Is it safe to assume that this was not an intentional change? It's a rather weird rule (to me) if it was.
Trac metadata
Trac field | Value |
---|---|
Version | 7.8.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |