Transitivity of Auto-Specialization

From the docs:

[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.

Moreover, given a SPECIALIZE pragma for a function f, GHC will automatically create specialisations for any type-class-overloaded functions called by f, if they are in the same module as the SPECIALIZE pragma, or if they are INLINABLE; and so on, transitively.

So GHC should automatically specialize some/most/all(?) functions marked INLINABLE without a pragma, and if I use an explicit pragma, the specialization is transitive. My question is: is the auto-specialization transitive? Either way, I'd like to see the docs updated to answer this question.
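To make the terminology concrete, here is a minimal single-file sketch of what "transitive" would mean. It is not part of the attached files: the names `addAll`, `sumOfSums`, `addAllInt` and `sumOfSumsInt` are made up for illustration, and the `Int` versions are written by hand to show roughly the shape a fully transitive specialization would aim for, not actual GHC output.

```haskell
import Data.List (foldl')

-- Overloaded helper (hypothetical name). Unless GHC creates a specialised
-- copy, it is compiled as a function that takes a Num dictionary.
{-# INLINABLE addAll #-}
addAll :: Num a => [a] -> [a] -> [a]
addAll = zipWith (+)

-- Overloaded caller of the helper (hypothetical name). Specialising this
-- at Int only removes all dictionary passing if addAll is specialised at
-- Int too; that second step is the "transitive" part.
{-# INLINABLE sumOfSums #-}
sumOfSums :: Num a => [a] -> [a] -> a
sumOfSums xs ys = foldl' (+) 0 (addAll xs ys)

-- Hand-written monomorphic copies, standing in for what a transitive
-- specialisation at Int would roughly look like.
addAllInt :: [Int] -> [Int] -> [Int]
addAllInt = zipWith (+)

sumOfSumsInt :: [Int] -> [Int] -> Int
sumOfSumsInt xs ys = foldl' (+) 0 (addAllInt xs ys)
```

In these terms, my question is whether a call to `sumOfSums` at `Int` from another module is supposed to end up calling something like `addAllInt`, or whether it may keep calling the overloaded `addAll` with a dictionary.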

Specifically, if auto-specialization is supposed to be transitive, the attached files demonstrate a bug.

Main.hs:

import Data.Vector.Unboxed as U
import Foo

main =
    let y = Bar $ Qux $ U.replicate 11221184 0 :: Foo (Qux Int)
        (Bar (Qux ans)) = iterate (plus y) y !! 100
    in putStr $ show $ foldl1' (*) ans

Foo.hs:

module Foo (Qux(..), Foo(..), plus) where
    
import Data.Vector.Unboxed as U

newtype Qux r = Qux (Vector r)
-- GHC inlines `plus` if I remove the bangs or the Baz constructor
data Foo t = Bar !t
           | Baz !t

instance (Num r, Unbox r) => Num (Qux r) where
    {-# INLINABLE (+) #-}
    (Qux x) + (Qux y) = Qux $ U.zipWith (+) x y

{-# INLINABLE plus #-}
plus :: (Num t) => (Foo t) -> (Foo t) -> (Foo t)
plus (Bar v1) (Bar v2) = Bar $ v1 + v2

GHC specializes the call to plus, but does *not* specialize (+) in the Qux Num instance. (In the attached core excerpt: main6 calls iterate main8. main8 is just plus, specialized for Int. So far so good. However, $splus calls the *polymorphic* $c+. If auto-specialization is transitive, I expect $c+ to be specialized to Int.)

This kills performance: an explicit pragma

{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}

results in transitive specialization as the docs indicate, so (+) is specialized and the code is 30x faster.
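For reference, this is roughly where the pragma sits relative to the definitions. The sketch below is a base-only stand-in for Foo.hs, not the attached file: `Qux` wraps a list instead of an unboxed vector so that it needs no extra packages, the extra `Num` methods and the catch-all clause of `plus` are additions to keep it warning-free, and `unBar` is a hypothetical helper for looking at the result. I would not expect it to reproduce the 30x difference.

```haskell
-- List-backed stand-in for the report's newtype (assumption: the real
-- code wraps Data.Vector.Unboxed.Vector instead).
newtype Qux r = Qux [r]

data Foo t = Bar !t
           | Baz !t

instance Num r => Num (Qux r) where
    {-# INLINABLE (+) #-}
    Qux x + Qux y = Qux (zipWith (+) x y)
    -- The methods below are not in the original Foo.hs; they are added
    -- only so the instance is complete.
    Qux x * Qux y = Qux (zipWith (*) x y)
    abs (Qux x) = Qux (map abs x)
    signum (Qux x) = Qux (map signum x)
    negate (Qux x) = Qux (map negate x)
    fromInteger n = Qux (repeat (fromInteger n))

{-# INLINABLE plus #-}
plus :: Num t => Foo t -> Foo t -> Foo t
plus (Bar v1) (Bar v2) = Bar (v1 + v2)
-- Catch-all added for the sketch; the original leaves plus partial.
plus _ _ = error "plus: only Bar is handled"

-- The explicit pragma, placed in the same module as plus.
{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}

-- Hypothetical helper for inspecting the payload.
unBar :: Foo (Qux r) -> [r]
unBar (Bar (Qux xs)) = xs
unBar (Baz (Qux xs)) = xs
```

Compiling with optimisation and `-ddump-simpl` (optionally with `-dsuppress-all` to cut the noise) should show whether the specialized copy of plus still calls the overloaded instance method or a specialized one.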

Is this expected behavior? Should I only expect (+) to be specialized transitively with an explicit pragma?

Note: this question is different from #5928 for two reasons:

1. I believe that no inlining is occurring, and hence I don't think inlining is interfering with specialization.
2. I have INLINABLE pragmas on all relevant functions.

Note: this question is different from #8668 because I am asking about auto-specialization.

This question was originally posted on StackOverflow. As mentioned in the comments of that question, I am intentionally not fully applying the call to plus in Main, contrary to the suggestions in #8099. I'd love to see why I'm getting that behavior as well.

Trac metadata

Trac field	Value
Version	7.6.3
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related	#5928, #8099, #8668
Blocking
CC
Operating system
Architecture
