Inliner fails to inline a function, causing 20x slowdown
Alexey recently filed a bug against the mwc-random package, reporting a 20x to 40x slowdown in a function named uniformRange - you can see its source here.
In the original definition, there was an INLINE pragma, but Alexey noticed that it wasn't firing and so performance was predictably terrible. He added the SPECIALIZE pragmas that now follow the body of the function.
I looked at the -ddump-simpl output with the SPECIALIZE pragmas removed, and sure enough there are no inlining annotations on the function.
The whole purpose of uniformRange is to be used in instance declarations such as the following:
instance Variate Int8 where
    uniform  = uniform1 fromIntegral
    uniformR = uniformRange
    {-# INLINE uniform  #-}
    {-# INLINE uniformR #-}
I have a suspicion that what's going on is that GHC's inliner is declining to do anything because some call site or other (or perhaps several?) isn't fully saturated.
The behaviour of the new inliner can be subtle - it's not at all obvious when I need to rewrite an instance like this, just to satisfy it:
instance Variate Int8 where
    uniform  = uniform1 fromIntegral
    uniformR inliner sacrifice = uniformRange inliner sacrifice
    {-# INLINE uniform  #-}
    {-# INLINE uniformR #-}
Saturating as above turned out to be the solution to the performance problem. I've been able to remove the SPECIALIZE pragmas. However, I'm still worried.
It would be helpful if GHC had a mode that reported when (and why) inlinings do *not* take place on functions that have been annotated with INLINE, because I'm surely not the only person who gets caught by this.
Also, aesthetically I find that saturating an application as above makes for tricky-to-read "why are those arguments there?" code, sort of the inliner's version of the dreaded monomorphism restriction: a lexical tic that's tremendously important, but for reasons that most readers will not know about.
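For anyone who wants to experiment with this outside mwc-random, here is a minimal sketch of the two instance styles side by side. Everything in it is hypothetical and made up for illustration: scaleRange stands in for uniformRange, and the Scaled class stands in for Variate. It relies on the documented rule that GHC only inlines an INLINE function where it is applied to as many arguments as appear syntactically on the left-hand side of its definition. Whether the two methods actually end up with different Core depends on the GHC version and optimisation flags, so treat it as a starting point for comparison rather than a guaranteed reproduction.

```haskell
module Main (main) where

import Data.Int (Int8)

-- Hypothetical stand-in for uniformRange. It has two arguments on the
-- left-hand side, so GHC only considers its INLINE unfolding at call
-- sites that supply both of them.
scaleRange :: Integral a => (a, a) -> a -> a
scaleRange (lo, hi) x = lo + x `mod` (hi - lo + 1)
{-# INLINE scaleRange #-}

-- Hypothetical stand-in for the Variate class, with one method per style.
class Scaled a where
    scaleUnsat :: (a, a) -> a -> a
    scaleSat   :: (a, a) -> a -> a

instance Scaled Int8 where
    -- Unsaturated: scaleRange is mentioned with zero arguments here.
    scaleUnsat = scaleRange
    -- Saturated: scaleRange is applied to both of its arguments.
    scaleSat range x = scaleRange range x
    {-# INLINE scaleUnsat #-}
    {-# INLINE scaleSat   #-}

main :: IO ()
main = do
    -- Both calls compute the same value; only the generated Core may differ.
    print (scaleUnsat (0, 9) (42 :: Int8))
    print (scaleSat   (0, 9) (42 :: Int8))
```

Compiling this with something like `ghc -O2 -ddump-simpl -dsuppress-all` and comparing the Core for the two methods should show whether scaleRange was inlined in each case.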
Trac metadata
Trac field | Value |
---|---|
Version | 7.2.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |