Originally reported here, I distilled the example from this comment into a one file test case. Sigs.hs is exactly like NoSigs.hs, except for the fact that it has a bunch of extra type signatures that have a lot of holes. On my machine, this is what compilation times are (I gave up timing after 15 minutes):
Unfortunately, Sigs.hs and NoSigs.hs are too big to upload. Here they are on GitHub gists. Alternately, you can generate them (or rather generate very similar files, modulo some CPP) via:
The issues are AFAICT unrelated. The Happy issue is really about this - it just so happens that while attempting to reproduce that issue, I stumbled on #14683 (closed) (and incorrectly assumed it too was partial-type-signature related).
Here are some rough timings extracted from -Rghc-timing results for the files linked in #14766 (comment 148707). The table lists the mutator CPU times in seconds.
8.0.2
8.2.2
8.4.4
8.6.5
8.8.4
8.10.4
9.0.1
NoSigs.hs
17.112
16.566
16.798
1.685
1.635
1.380
1.026
Sig.hs
837.279
736.914
676.692
852.924
2.191
1.671
559.709
For NoSigs.hs there's a very remarkable improvement from GHC 8.4 to 8.6.
Even more remarkable is the development of the compile times for Sig.hs: From 8.6 to 8.8 we see a reduction of over 99%! Unfortunately, GHC 9.0.1 has introduced a massive regression. It would be great if we could fix this regression and return to the performance levels of 8.8 and 8.10.
The commit that appears to have caused the typechecker regression is 2b792fac, "Simple subsumption".
Maybe this bit from the commit message is related?!
* I fixed a serious bug in anonymous type holes. IN f :: Int -> (forall a. a -> _) -> Int that "_" should be a unification variable at the /outer/ level; it cannot be instantiated to 'a'. This was plain wrong. New fields mode_lvl and mode_holes in TcTyMode, and auxiliary data type GHC.Tc.Gen.HsType.HoleMode. This fixes #16292, but makes no progress towards the more ambitious #16082
It's mainly thanks to you for digging out these tickets, and giving better insight into what is going wrong. That is a very valuable service, thank you.
Adding a regression test as Simon suggests sounds good
The perf when the warnings are on is so egregiously terrible (from 5G to 950G allocation) that something bizarre must be happening. Could you profile a bit to find out?
commit a9129f9fdfbd358e76aa197ba00bfe75012d6b4fAuthor: Simon Peyton Jones <simonpj@microsoft.com>Date: Tue Mar 16 22:56:56 2021 +0000 Short-circuit warning generation for partial type signatures This Note says it all: Note [Skip type holes rapidly] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suppose we have module with a /lot/ of partial type signatures, and we compile it while suppressing partial-type-signature warnings. Then we don't want to spend ages constructing error messages and lists of relevant bindings that we never display! This happened in #14766, in which partial type signatures in a Happy-generated parser cause a huge increase in compile time. The function ignoreThisHole short-circuits the error/warning generation machinery, in cases where it is definitely going to be a no-op. It makes a pretty big difference on the Sigs.hs example in #14766: Compile-time allocation GHC 8.10 5.6G Before this patch 937G With this patch 4.7G Yes, that's more than two orders of magnitude!
However the fix doesn't cure the underlying problem. As I wrote above
The perf when the warnings are on is so egregiously terrible (from 5G to 950G allocation) that something bizarre must be happening. Could you profile a bit to find out?
So this ticket should stay open, hoping for someone to devote a little time to finding how the hole-fits relevant-binding code is has such abysmally terrible performance.
-- | An expression or type holedata Hole = ExprHole UnboundVar -- ^ Either an out-of-scope variable or a "true" hole in an -- expression (TypedHoles) | TypeHole OccName -- ^ A hole in a type (PartialTypeSignatures)
, so for a PartialTypeSignature, the Hole will have the hole_sort = TrueExprHole _, which doesn't match, no work is done, and we hit:
findValidHoleFits env _ _ _ = return (env, empty)
So it shouldn't be the validHoleFits code interfering here. Definitely something strange going on here, I would love to see some profiling!
I have a quick look. Simon is right the cost comes from calling relevant_bindings. In particular it's the call to zonkTidyTcType which ends up accounting for the additional time.
I also noticed relevant_bindings doesn't short-circuit properly when the fuel is 0, but will continue iterating through the whole list. Not sure if this is intentional or not as the way the code is written the loop keeps adding relevant bindings as long as the FVs are different to ones we have already seen.