Transitivity of Auto-Specialization
From the docs:
[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.
- ..
Moreover, given a SPECIALIZE pragma for a function f, GHC will automatically create specialisations for any type-class-overloaded functions called by f, if they are in the same module as the SPECIALIZE pragma, or if they are INLINABLE; and so on, transitively.
So GHC should automatically specialize some/most/all(?) functions marked INLINABLE
without a pragma, and if I use an explicit pragma, the specialization is transitive. My question is:
is the auto-specialization transitive? Either way, I'd like to see the docs updated to answer this question.
Specifically, the attached files demonstrate a bug if auto-specialization should be transitive.
Main.hs:
import Data.Vector.Unboxed as U
import Foo
main =
let y = Bar $ Qux $ U.replicate 11221184 0 :: Foo (Qux Int)
(Bar (Qux ans)) = iterate (plus y) y !! 100
in putStr $ show $ foldl1' (*) ans
Foo.hs:
module Foo (Qux(..), Foo(..), plus) where
import Data.Vector.Unboxed as U
newtype Qux r = Qux (Vector r)
-- GHC inlines `plus` if I remove the bangs or the Baz constructor
data Foo t = Bar !t
| Baz !t
instance (Num r, Unbox r) => Num (Qux r) where
{-# INLINABLE (+) #-}
(Qux x) + (Qux y) = Qux $ U.zipWith (+) x y
{-# INLINABLE plus #-}
plus :: (Num t) => (Foo t) -> (Foo t) -> (Foo t)
plus (Bar v1) (Bar v2) = Bar $ v1 + v2
GHC specializes the call to plus
, but does *not* specialize (+)
in the Qux
Num
instance. (In the attached core excerpt: main6
calls iterate main8
. main8
is just plus
, specialized for Int
. So far so good. However, splus
calls the *polymorphic* c+
. If auto-specialization is transitive, I expect c+
to be specialized to Int
.)
This kills performance: an explicit pragma
{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}
results in transitive specialization as the docs indicate, so (+)
is specialized and the code is 30x faster.
Is this expected behavior? Should I only expect (+)
to be specialized transitively with an explicit pragma?
Note: this question is different from #5928 for two reasons:
- I believe that no inlining is occuring, and hence I don't think inlining is interfering with specialization
- I have
INLINABLE
pragmas on all relevant functions.
Note: this question is different from #8668 because I am asking about auto-specialization.
This question was originally posted on StackOverflow. As mentioned in the comments of that question, I am intentionally not fully applying the call to plus
in Main, contrary to the suggestions in #8099. I'd love to see why I'm getting that behavior as well.