GHC never seems to inline eqTypeRep. That's no good! It produces a Maybe, which users will almost certainly want to match on immediately.
I really don't understand why it doesn't inline. It's a tiny function, none of the functions it calls are themselves marked INLINE, and the advantage to inlining seems obvious. Does the unsafeCoerce get in the way somehow? If so, how can we fix it?
Example:
{-# language GADTs, ScopedTypeVariables, TypeApplications, AllowAmbiguousTypes #-}moduleFoowhereimportType.Reflectionfoo::foralla.Typeablea=>Boolfoo=caseeqTypeRep(typeRep@a)(typeRep@Int)ofJust_->TrueNothing->False
compiles (with -O2) to
foo [InlPrag=NOINLINE] :: forall a. Typeable a => Bool[GblId, Arity=1, Str=<S,1*U>]foo = \ (@ a_a5un) ($dTypeable_a5up :: Typeable a_a5un) -> case eqTypeRep @ * @ * @ a_a5un @ Int ($dTypeable_a5up `cast` (base-4.10.1.0:Data.Typeable.Internal.N:Typeable[0] <*>_N <a_a5un>_N :: (Typeable a_a5un :: Constraint) ~R# (TypeRep a_a5un :: *))) lvl4_r69g of { Nothing -> GHC.Types.False; Just ds_d5CQ -> GHC.Types.True }
The continuation, some_benefit, and work-free lines all look good. I don't know how to read the guidance line. Has GHC allowed eqTypeRep to get too large to inline in most cases? If its decisions seem reasonable to you and other experienced folks, perhaps we should do something like
That says that the discount for being applied to known arguments i 52 and 128 for each argument respectively. The 210 is the overall size of the function. If a function is size 80 or less, it is inlined.
What I suspect is happening is that typeRepFingerprint is inlined into eqTypeRep which makes it big. This has some potential benefit as typeRepFingerprint is defined as a case but nothing further after that.
I think the correct solution is to mark typeRepFingerprint as {-# NOINLINE[0] typeRepFingerprint #-} so that it might only be inlined
in the last phase.
It seems that the fact that eqWord64 is marked as INLINE could be part of the problem as there is really no hope that we will resolve the equality of two type fingerprints computed by very complicated expressions. Another solution could be to stop deriving Eq for Fingerprint and instead hand writing the instance and adding a similar pragma as I suggested above.
mpickering, I'm happy to try it other ways, but there are a few things I don't understand:
What exactly do you think is wrong with the way I did it? I know it's not perfect (see point 3) but I don't know what you dislike about it.
How will the phased NOINLINE(s) you suggest reduce the size of the eqTypeRep unfolding?
While we don't ''generally'. want to inline the case expressions scrutinizing the arguments, we presumably do want to optimize the common case where one of those arguments is completely known at compile time (such as the typeRep @Int in my example). How does this desire fit into the rest of the program?
Without a proper analysis of why the problem occurred in the first place, it can't be evaluated why the fix works at all or whether it is correct.
Adding an INLINE pragma is a very big commitment when there is already a reason that the function is not inlined. Why is the wrapper necessary in addition to the pragma?
I think you are right thought that the NOINLINE[0] pragmas I suggested won't make a difference to the unfolding. So perhaps just adding an INLINABLE pragma is the correct solution?
I don't understand your 3rd point. There doesn't appear to be much advantage at all of knowing either argument as the quality condition will seemingly never simplify further?
Adding an INLINE pragma to a wrapper is exactly what GHC does in its worker/wrapper transformation, so I don't think it's too big a commitment. My third point is that if we know one of the arguments at compile time, we only need to inspect the other one at run time. There's no real need to follow a pointer to get the fingerprint of Int; we already know it! It's just constant folding.
Hmmm... Actually, that constant folding notion is complicated by the fact that the TypeRep builders won't inline to expose the fingerprints anyway, thanks to recursion. Should we INLINE the wrapper function as I proposed, and just NOINLINE the worker?
I still don't understand why you need the wrapper. Adding an INLINABLE pragma seems to be the least invasive way to fix this issue? Did you try that yet?
No, not yet. I'll do that later today if you like. But I also don't really understand what that will accomplish. INLINABLE tells GHC to make an unfolding. But GHC already makes an unfolding; it just doesn't use it. Does INLINABLE change what unfolding it makes? The purpose of the wrapper is to use the fact that we know more about what information will be available/useful at typical eqTypeRep call sites than the compiler does; we therefore have a better sense of which part will actually be useful to inline. If we inline the whole thing as it stands, then the call site will get two calls to typeRepFingerprint and a call to ==. That's certainly larger than a call to sameTypeRep; is it also better in enough cases to matter? I don't know. On the other hand, I know that in virtually all realistic cases we want to use inlining to eliminate the Maybe business.
mpickering, it looks like writing the wrapper and marking it INLINABLE does the trick. I wish I had a better sense of whyINLINABLE does that, but okay. I also don't really understand why this wrapper should be INLINABLE, whereas the wrappers GHC manufactures are INLINE. What difference merits that distinction? Nevertheless, I've updated the differential to use INLINABLE instead, and added some comments.
When a definition is marked INLINABLE the unfolding created is the same as the source definition and so it will be smaller.
In the worker wrapper transformation (which only applies to recursive functions) the point of inlining the wrapper is to eliminate the mutual recursion in the worker which creates a nice loop working just with unboxed values. None of that is going on here.