Bad monads performance
Hi! We've been struggling with a very strange GHC behavior on IRC today. Let's consider the following code (needs mtl and criterion to be compiled):
module Main where
import Prelude
import Criterion.Main
import qualified Control.Monad.State.Strict as Strict
import qualified Control.Monad.State.Class as State
import Control.DeepSeq (NFData, rnf, force)
import GHC.IO (evaluate)
import Data.Monoid
-----------------------------
-- === Criterion utils === --
-----------------------------
eval :: NFData a => a -> IO a
eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a)
liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a)
expCodeGen f i = do
putStrLn $ "generating input code (10e" <> show i <> " chars)"
out <- eval $ liftExp f i
putStrLn "code generated sucessfully"
return out
{-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int -> Benchmark
expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) . nf p ; {-# INLINE expCodeGenBench #-}
-------------------------------
-- === (a*) list parsing === --
-------------------------------
genList_a :: Int -> [Char]
genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
'a':s -> pureListParser_a s
[] -> True
_ -> False
{-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool
mtlStateListParser_a_typed = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a_typed
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
go = Strict.get >>= \case
'a':s -> Strict.put s >> go
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do
defaultMain
[ bgroup "a*" $
[ bgroup "pure" $ expCodeGenBench genList_a pureListParser_a <$> [6..6]
, bgroup "mtl.State.Strict" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6]
, bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6]
, bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6]
]
]
The code was compiled with following options (and many other variations): -threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100 -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick-factor=1000 -flate-dmd-anal
Everything in this code has INLINE
pragma. The important part we should focus on are these two functions:
pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
'a':s -> pureListParser_a s
[] -> True
_ -> False
{-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a #-}
Which are just "parsers" accepting strings containing only 'a' characters. The former is pure one, while the later uses State
to keep the remaining input. The following list contains performance related observations:
- For the rest of the points, let's call the performance of
pureListParser_a
a "good" one and everything worse a "bad" one. - The performance of
mtlStateListParser_a
is bad, it runs 10 times slower thanpureListParser_a
. Inspecting CORE we can observe that GHC jumps between(# a,b #)
and(a,b)
representations all the time. - If we add a specialize pragma
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
, the performance ofmtlStateListParser_a
is good (exactly the same aspureListParser_a
). - If we do NOT use specialize pragma, but we use explicite, non-polymorphic type signature
mtlStateListParser_a_typed :: Strict.State [Char] Bool
, the performance is bad (!), identical to the polymorphic version without specialization. - If we use SPECIALIZE pragma together with explicite, non-polymorphic type, so we use BOTH
mtlStateListParser_a_typed :: Strict.State [Char] Bool
AND{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
we get the good performance. - If we transform
pureListParser_a
to
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
go = Strict.get >>= \case
'a':s -> Strict.put s >> go
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_let #-}
we again get the good performance without the need to use SPECIALIZE
pragmas.
- The performance of all the functions that are not optimized as good as
pureListParser_a
is a lot worse in GHC 8.2.1-rc3 than in 8.0.2. - The not-yet documented flag
-fspecialise-aggressively
does NOT affect the results (https://ghc.haskell.org/trac/ghc/ticket/12463). - If you do NOT use
INLINE
pragma on functionsmtlStateListParser_a
andmtlStateListParser_a_typed
their performance is good (soINLINE
pragma makes it bad until we provide explicit specialization). Moreover, if we useINLINABLE
pragma instead ofINLINE
on these functions (which logically makes more sense, because they are recursive), performance of the polymorphic onemtlStateListParser_a
is good, while performance of the explicitly typedmtlStateListParser_a_typed
is bad until we provide explicite specialization.
The above points raise the following questions:
- Why GHC does not optimize
mtlStateListParser_a
the same way aspureListParser_a
and where the jumping between(# a,b #)
and(a,b)
comes from? - Is there any way to tell GHC to automatically insert
SPECIALIZE
pragmas, especially in performance critical code? - Why providing very-explicite type signature
mtlStateListParser_a_typed :: Strict.State [Char] Bool
does not solve the problem unless we useSPECIALIZE
pragma that tells the same as the signature? (GHC even warns:SPECIALISE pragma for non-overloaded function ‘mtlStateListParser_a_typed’
but it affects performance.) - Why the trick to alias the body of recursive function to a local variable
go
affects the performance in any way, especially when it does NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several people reported exactly the same observations.