eqTypeRep does not inline

changed milestone to %8.6.1

changed weight to 5

In case you have any doubt that eqTypeRep performance is important, remember that it's used in every single exception handler.

changed the description

Using an explicit TypeRep instead of a Typeable constraint makes no difference.

Compile with -dverbose-core2core and -ddump-inlinings and it will tell you exactly why it doesn’t get inlined.

mpickering, I get a bunch of things that look like this:

Considering inlining: eqTypeRep
  arg infos [TrivArg, TrivArg]
  interesting continuation CaseCtxt
  some_benefit True
  is exp: True
  is work-free: True
  guidance IF_ARGS [52 128] 210 0
  discounted size = 180
  ANSWER = NO

The continuation, some_benefit, and work-free lines all look good. I don't know how to read the guidance line. Has GHC allowed eqTypeRep to get too large to inline in most cases? If its decisions seem reasonable to you and other experienced folks, perhaps we should do something like

typeRepsSame :: forall k1 k2 (a :: k1) (b :: k2).
             TypeRep a -> TypeRep b -> Bool
typeRepsSame a b = typeRepFingerprint a == typeRepFingerprint b

eqTypeRep :: forall k1 k2 (a :: k1) (b :: k2).
             TypeRep a -> TypeRep b -> Maybe (a :~~: b)
eqTypeRep a b
  | typeRepsSame a b = Just (unsafeCoerce# HRefl)
  | otherwise        = Nothing
{-# INLINE eqTypeRep #-}

inlining a very thin wrapper, but I was surprised enough by the choice that I figured it might point to a more general inlining heuristic problem.

I think the proposed patch is the wrong idea. What does the core for eqTypeRep look like? that will surely resolve any confusion.

Looking at the line,

  guidance IF_ARGS [52 128] 210 0
  discounted size = 180
  ANSWER = NO

That says that the discount for being applied to known arguments i 52 and 128 for each argument respectively. The 210 is the overall size of the function. If a function is size 80 or less, it is inlined.

What I suspect is happening is that typeRepFingerprint is inlined into eqTypeRep which makes it big. This has some potential benefit as typeRepFingerprint is defined as a case but nothing further after that.

I think the correct solution is to mark typeRepFingerprint as {-# NOINLINE[0] typeRepFingerprint #-} so that it might only be inlined in the last phase.

It seems that the fact that eqWord64 is marked as INLINE could be part of the problem as there is really no hope that we will resolve the equality of two type fingerprints computed by very complicated expressions. Another solution could be to stop deriving Eq for Fingerprint and instead hand writing the instance and adding a similar pragma as I suggested above.

Could you test how these would work?

I looked at the interface file for Data.Typeable.Internal where eqTypeRep is defined. Here is the unfolding for eqTypeRep.

855825a70951c01dfa45bc3067373660
  eqTypeRep ::
    forall k1 k2 (a :: k1) (b :: k2).
    TypeRep a -> TypeRep b -> Maybe (a :~~: b)
  {- Arity: 2, Strictness: <S,1*U><S,1*U>,
     Unfolding: (\ @ k1
                   @ k2
                   @ a :: k1
                   @ b :: k2
                   (a1 :: TypeRep a)
                   (b1 :: TypeRep b) ->
                 let {
                   $j :: Word# -> Word# -> Maybe (a :~~: b)
                     <join 2> {- Arity: 2, Strictness: <S,U><L,U> -}
                   = \ (dt :: Word#)[OneShot] (dt1 :: Word#)[OneShot] ->
                     case b1 of wild {
                       TrType co co1
                       -> case fpTYPELiftedRep of wild1 { Fingerprint dt2 dt3 ->
                          case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } } }
                       TrTyCon dt2 dt3 ds ds1 ds2
                       -> case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } }
                       TrApp k4 a2 b2 co dt2 dt3 ds ds1 ds2
                       -> case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } }
                       TrFun r1 r2 a2 b2 co co1 dt2 dt3 ds ds1
                       -> case eqWord# dt dt2 of lwild {
                            DEFAULT -> Nothing @ (a :~~: b)
                            1#
                            -> case eqWord# dt1 dt3 of lwild1 {
                                 DEFAULT -> Nothing @ (a :~~: b)
                                 1# -> eqTypeRep1 @ k1 @ a @ k2 @ b } } }
                 } in
                 case a1 of wild {
                   TrType co co1
                   -> case fpTYPELiftedRep of wild1 { Fingerprint dt dt1 ->
                      $j dt dt1 }
                   TrTyCon dt dt1 ds ds1 ds2 -> $j dt dt1
                   TrApp k4 a2 b2 co dt dt1 ds ds1 ds2 -> $j dt dt1
TrFun r1 r2 a2 b2 co co1 dt dt1 ds ds1 -> $j dt dt1 }) -}

It appears that both functions I mentioned above are inlined (to no great effect) which make the definition quite large.

mpickering, I'm happy to try it other ways, but there are a few things I don't understand:

What exactly do you think is wrong with the way I did it? I know it's not perfect (see point 3) but I don't know what you dislike about it.
How will the phased NOINLINE(s) you suggest reduce the size of the eqTypeRep unfolding?
While we don't ''generally'. want to inline the case expressions scrutinizing the arguments, we presumably do want to optimize the common case where one of those arguments is completely known at compile time (such as the typeRep @Int in my example). How does this desire fit into the rest of the program?

Without a proper analysis of why the problem occurred in the first place, it can't be evaluated why the fix works at all or whether it is correct.

Adding an INLINE pragma is a very big commitment when there is already a reason that the function is not inlined. Why is the wrapper necessary in addition to the pragma?

I think you are right thought that the NOINLINE[0] pragmas I suggested won't make a difference to the unfolding. So perhaps just adding an INLINABLE pragma is the correct solution?

I don't understand your 3rd point. There doesn't appear to be much advantage at all of knowing either argument as the quality condition will seemingly never simplify further?

Adding an INLINE pragma to a wrapper is exactly what GHC does in its worker/wrapper transformation, so I don't think it's too big a commitment. My third point is that if we know one of the arguments at compile time, we only need to inspect the other one at run time. There's no real need to follow a pointer to get the fingerprint of Int; we already know it! It's just constant folding.

Hmmm... Actually, that constant folding notion is complicated by the fact that the TypeRep builders won't inline to expose the fingerprints anyway, thanks to recursion. Should we INLINE the wrapper function as I proposed, and just NOINLINE the worker?

I still don't understand why you need the wrapper. Adding an INLINABLE pragma seems to be the least invasive way to fix this issue? Did you try that yet?

No, not yet. I'll do that later today if you like. But I also don't really understand what that will accomplish. INLINABLE tells GHC to make an unfolding. But GHC already makes an unfolding; it just doesn't use it. Does INLINABLE change what unfolding it makes? The purpose of the wrapper is to use the fact that we know more about what information will be available/useful at typical eqTypeRep call sites than the compiler does; we therefore have a better sense of which part will actually be useful to inline. If we inline the whole thing as it stands, then the call site will get two calls to typeRepFingerprint and a call to ==. That's certainly larger than a call to sameTypeRep; is it also better in enough cases to matter? I don't know. On the other hand, I know that in virtually all realistic cases we want to use inlining to eliminate the Maybe business.

mpickering, it looks like writing the wrapper and marking it INLINABLE does the trick. I wish I had a better sense of why INLINABLE does that, but okay. I also don't really understand why this wrapper should be INLINABLE, whereas the wrappers GHC manufactures are INLINE. What difference merits that distinction? Nevertheless, I've updated the differential to use INLINABLE instead, and added some comments.

Trac metadata

Trac field	Value
Differential revisions	- → D4405

When a definition is marked INLINABLE the unfolding created is the same as the source definition and so it will be smaller.

In the worker wrapper transformation (which only applies to recursive functions) the point of inlining the wrapper is to eliminate the mutual recursion in the worker which creates a nice loop working just with unboxed values. None of that is going on here.

mentioned in commit 8529fbba

closed

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedFixed

added runtime perf label

added Pnormal label

eqTypeRep does not inline

Child items 0

Activity