Opened 2 years ago

Last modified 4 months ago

#11196 new bug

TypeInType performance regressions

Reported by: goldfire Owned by:
Priority: high Milestone: 8.4.1
Component: Compiler Version: 7.11
Keywords: TypeInType Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by thomie)

This ticket is to track the handful of performance regressions seen with the addition of TypeInType. It is quite possibly a good idea to break these out into separate tickets, but until we investigate, they're all bundled here.

The regressions are (all in bytes allocated, unless otherwise noted):

  • T3064, up by 14.9%
  • T5030, up by 61.8%
  • T5837, up by 13%
  • T5321Fun, up by 11%
  • T5631, up by 39%
  • T9872d, down by 22% (see below)
  • T9872a, up by 33.6%
  • T9872c, up by 59.4%
  • T9872b, up by 49.4%
  • T9675, up by 29.7%, and peak megabytes allocated up by 28.4%
  • haddock.base, up by 12.4%
  • haddock.Cabal, up by 9.5%

I did add an optimization around type family reduction (the function zonk_eq_types in TcCanonical) that could cause such a drastic reduction.

Change History (21)

comment:1 Changed 2 years ago by goldfire

Description: modified (diff)
Milestone: 8.0.1
Priority: normalhighest
Version: 7.10.27.11

comment:2 Changed 2 years ago by goldfire

Description: modified (diff)

comment:3 Changed 2 years ago by Richard Eisenberg <eir@…>

In 67465497/ghc:

Add kind equalities to GHC.

This implements the ideas originally put forward in
"System FC with Explicit Kind Equality" (ICFP'13).

There are several noteworthy changes with this patch:
 * We now have casts in types. These change the kind
   of a type. See new constructor `CastTy`.

 * All types and all constructors can be promoted.
   This includes GADT constructors. GADT pattern matches
   take place in type family equations. In Core,
   types can now be applied to coercions via the
   `CoercionTy` constructor.

 * Coercions can now be heterogeneous, relating types
   of different kinds. A coercion proving `t1 :: k1 ~ t2 :: k2`
   proves both that `t1` and `t2` are the same and also that
   `k1` and `k2` are the same.

 * The `Coercion` type has been significantly enhanced.
   The documentation in `docs/core-spec/core-spec.pdf` reflects
   the new reality.

 * The type of `*` is now `*`. No more `BOX`.

 * Users can write explicit kind variables in their code,
   anywhere they can write type variables. For backward compatibility,
   automatic inference of kind-variable binding is still permitted.

 * The new extension `TypeInType` turns on the new user-facing
   features.

 * Type families and synonyms are now promoted to kinds. This causes
   trouble with parsing `*`, leading to the somewhat awkward new
   `HsAppsTy` constructor for `HsType`. This is dispatched with in
   the renamer, where the kind `*` can be told apart from a
   type-level multiplication operator. Without `-XTypeInType` the
   old behavior persists. With `-XTypeInType`, you need to import
   `Data.Kind` to get `*`, also known as `Type`.

 * The kind-checking algorithms in TcHsType have been significantly
   rewritten to allow for enhanced kinds.

 * The new features are still quite experimental and may be in flux.

 * TODO: Several open tickets: #11195, #11196, #11197, #11198, #11203.

 * TODO: Update user manual.

Tickets addressed: #9017, #9173, #7961, #10524, #8566, #11142.
Updates Haddock submodule.

comment:4 Changed 2 years ago by thomie

Also:

buildtime/make 	1222 	+ 13.67% 	1389 	seconds

Source: https://perf.haskell.org/ghc/#revision/b5d5d83122c93c2a25839127edfd6b2df7ed6928 (if that link doesn't include the commit "Add kind equalities to GHC.", try https://perf.haskell.org/ghc/#revision/6746549772c5cc0ac66c0fce562f297f4d4b80a2).

comment:5 Changed 23 months ago by simonpj

Keywords: TypeInType added

comment:6 Changed 22 months ago by bgamari

Owner: set to goldfire

comment:7 Changed 22 months ago by simonpj

Richard, this is marked as "highest" priority for 8.0.1

Simon

comment:8 Changed 22 months ago by goldfire

Understood. Right now, I'm mired in the substantial refactor posted in Phab:D1777 (which you believe will also fix "highest" priority #11397). I could perhaps get to this later this week.

comment:9 Changed 20 months ago by goldfire

Owner: goldfire deleted

I'm afraid I have to declare defeat against this ticket. :(

I've spent the better part of the last few days trying to understand what's going on here, but failed utterly. I simply cannot find the source of the slowdown. It just seems like everything is doing more allocation, making it rather hard to find a culprit. I tried putting SCCs on low-level functions, too, but to no avail. The one moment of glory was finding that tcMatchTys was taking up 25% of the allocations on T9872c. I identified that the cause was a check to make sure that we don't match with a local, forall-bound variable. Of course, local forall-bound variables are rare, so I optimized the check by seeing if there are any forall-bound variables. This led to a whopping improvement of 20%! But then, inexplicably, when I turn off profiling, the exact same optimization led to no improvement.

My tinkering led to a few tiny optimizations, leading to 5% reduction in two test cases. This is a small victory, but doesn't really undo the sting of defeat. (Optimizations to land in the next day or so.)

My best guess is that the extra variables due to runtime polymorphism are what's gumming up the works. Previously, every binder without a type signature (including all local ones) were given a type alpha :: OpenKind. Now, every binder is given alpha :: TYPE rho; two metavariables where there was once one. And even when the type is solved, the kind now takes up more space than it did previously. But I have no data to back this up.

I'm very happy to advise someone who wants to take this on. As usual with performance problems, the devil is in the data collecting and pinpointing -- the fix (if there is one) is usually quite easy. But I'm sad to say that I'm formally giving up on this one. I have simply timed out if I am to graduate.

I will say that I don't believe this ticket needs to be a release blocker.

  • T5631 still is worse by 30% in bytes allocated. This is a happy-generated file with lots of mutually-recursive functions with no type signatures, exactly the pathological case for representation polymorphism. I doubt we can bring this one back to 7.10 levels.
  • T5837 is off from before TypeInType. But it's still better than 7.10 was!
  • T9675 has regressed in peak memory usage, but not allocations. This one doesn't seem to be representation polymorphism. But I couldn't nail it.
  • T9872 is an utter abuse of type families and looks nothing like a typical Haskell program.

And those are the only cases that are worse than before TypeInType. Work between when this ticket was posted and now cleared up the others. I'll leave it to someone else to agree with me that this needn't be a release blocker before changing the priority.

comment:10 Changed 20 months ago by bgamari

Milestone: 8.0.18.0.2
Priority: highesthigh

Alright, the reasons cited here seem reasonable. Demoting to high.

comment:11 Changed 20 months ago by goldfire

I just thought of a way to make forward progress on here: I still have the very long and very tortuous commit history of my development of TypeInType. (It's at the nokinds branch of my github fork.) Someone (including perhaps a future me) could just bisect. This would be complicated by the fact that many commits in there do not compile. But git's bisect mechanism has a way of dealing with that (git bisect skip). So this process would be quite long, but it also seems quite likely to lead to a solution. Unlike my previous attempts.

If anyone has written a script for use with git bisect run that might aid in this process, do post!

comment:12 Changed 20 months ago by bgamari

I hacked together WorkingConventions/Bisection many months ago while trying to track down #10528 (IIRC). It worked reasonably well, although these things are always bit fiddly. In your case things are a bit trickier still as you don't quite have a nice thumbs-up or thumbs-down result; really you want to be able to look at testsuite allocations and plot them over time. I recently been working on tools for this, although it requires input in the form produced by nomeata's ghc-speed builder

Last edited 20 months ago by bgamari (previous) (diff)

comment:13 Changed 20 months ago by Richard Eisenberg <eir@…>

In 0706a103/ghc:

Add two small optimizations. (#11196)

- Optimize zonking * to avoid allocation.
- Try to avoid looking at the free variables of a type in the
  pure unifier. We need look at the variables only in the case
  of a forall.

The performance results updates included in this also include
a regression, but the regression is not due to this patch. When
validating previous patches, the test case failed, but I was
unable to reproduce outside of validation, so I let it go by, thinking
the failure was spurious. Upon closer inspection, the perf number
was right at the line, and the wibble between a valiation environment
and a regular test run was enough to make the difference.

comment:14 Changed 20 months ago by goldfire

I had a thought while falling asleep last night about what may be causing all of this.

Previously, Type had a constructor FunTy :: Type -> Type -> Type. Now, the situation is more complex, with a TyBinder:

data Type = ... | ForAllTy TyBinder Type | ...
data TyBinder = Named TyVar VisibilityFlag | Anon Type

What was FunTy arg res is now ForAllTy (Anon arg) res. But this refactoring means that we have an extra box for every function type. And there are a lot of those. Chasing all of these boxes might mean that everything goes just a bit slower... which is exactly what I've observed.

This refactoring was not forced by TypeInType, but Simon and I thought it to be a good idea. It does simplify things in a bunch of places. It could be undone -- somewhat labor-intensive, but not theoretically hard. Even better would be to have unboxed and unpacked sums, which would allow GHC to inline TyBinder into Type. It also might be possible to simulate unboxed and unpacked sums through something like this:

data Type = ...
          | ForAllTyNamed TyVar VisibilityFlag Type
          | ForAllTyAnon Type Type
          | ...

repSplitForAllTy :: Type -> Maybe (TyBinder, Type)
repSplitForAllTy (ForAllTyNamed tv vis ty) = Just (Named tv vis, ty)
repSplitForAllTy (ForAllTyAnon arg res)    = Just (Anon arg, res)
repSplitForAllTy _                         = Nothing

pattern ForAllTy :: TyBinder -> Type -> Type
pattern ForAllTy bndr ty <- (repSplitForAllTy -> Just (bndr, ty))
  where
    ForAllTy (Named tv vis) ty = ForAllTyNamed tv vis ty
    ForAllTy (Anon arg) res    = ForAllTyAnon arg res

With these definitions, it would seem you could use ForAllTy just as it is done now. And my guess is that matching against, say (ForAllTy (Anon arg) res) -> ..., as is done somewhat frequently, would induce GHC to inline all of the splitting and matching (and allocation of the very-temporary TyBinder). That would need to be tested. But perhaps this is the answer to our woes.

If it's performant, this would also be a good recipe to disseminate to people clamoring for unboxed, unpackable sums. With the forthcoming support for pattern synonyms in TH (not for 8.0! but likely for 8.2), this might even be possible to automate and implement unpacked sums in TH until GHC can support them for real.

I won't be testing this anytime soon, sadly. Perhaps someone else can in time for 8.0.

comment:15 Changed 20 months ago by bgamari

With the forthcoming support for pattern synonyms in TH (not for 8.0! but likely for 8.2)

Given how quickly osa1 seems to be moving on unboxed sums it's entirely possible that we will have proper unpacking by 8.2 (at least I have my fingers crossed).

comment:16 Changed 18 months ago by thomie

Description: modified (diff)
Type of failure: None/UnknownCompile-time performance bug

For the record: T9872d went down by 22% (the description first claimed 92%, which probably resulted from a typo), from 726679784 to 566134504 bytes allocated. Since then it has dropped another 8% to 506691240 (64-bit).

comment:17 Changed 16 months ago by bgamari

For the record, FunTy was reintroduced by 77bb09270c70455bbd547470c4e995707d19f37d.

comment:18 Changed 15 months ago by bgamari

Milestone: 8.0.28.2.1

Bumping off to 8.2.

comment:19 Changed 7 months ago by bgamari

Milestone: 8.2.18.4.1

It seems unlikely that anything will happen on this for 8.2.

comment:20 Changed 4 months ago by dfeuer

Has anyone checked whether 77bb09270c70455bbd547470c4e995707d19f37d had an impact here?

comment:21 Changed 4 months ago by bgamari

I have not.

Note: See TracTickets for help on using tickets.