Opened 3 years ago

Closed 2 years ago

Last modified 23 months ago

#9960 closed bug (fixed)

Performance problem with TrieMap

Reported by: simonpj Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.8.4
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D606
Wiki Page:

Description

Looking at #9805 reminded me. When investigating the performance of #9872 I found a major performance hole in TrieMap.

Suppose we have a handful H of entries in a TrieMap, each with a very large key, size K. If you fold over such a TrieMap you'd expect time O(H). That would certainly be true of an association list! But with TrieMap we actually have to navigate down a long singleton structure to get to the elements, so it takes time O(K*H).

This is pretty bad. In test T9872 I found that 90% of compile time was spent in TrieMap as we repeatedly built, folded, and then rebuilt TrieMaps. I fixed that by keeping smaller TrieMaps, but the underlying problem remains.

The point of a TrieMap is that you need to navigate to the point where only one key remains, and then things should be fast. So I think that TypeMap, say, should look like

data TypeMap a
  = EmptyTM
  | SingletonTM Type a
  | TM { tm_var   :: VarMap a
       , tm_app    :: TypeMap (TypeMap a)
       , tm_fun    :: TypeMap (TypeMap a)
       , tm_tc_app :: NameEnv (ListMap TypeMap a)
       , tm_forall :: TypeMap (BndrMap a)
       , tm_tylit  :: TyLitMap a
       }

The SingletonTM means that, once you are down to a single (key,value) pair, you stop and just use SingletonTM.

Tiresomely, because of the deBruijn thing, it'd need to be

...
  | SingletonTM (CMEnv, Type) a
...

and we'd need an auxiliary function for equality:

eqTypeModuloDeBruijn :: (CMEnv, Type) -> (CEnv, Type) -> Bool

But that's not hard. Very similar to the way RnEnv2 works.

Change History (18)

comment:1 Changed 3 years ago by ezyang

Note: this is basically just trie compression, although the specific scheme doesn't compress long intermediate nodes (and it would be tiresome to do so, since unlike strings we don't have a handy representation of an expression with a hole).

comment:2 Changed 3 years ago by simonpj

True. If you had just two big keys, say

  T A A A A A A A A A A A A A A B
  T A A A A A A A A A A A A A A C

then even with SingletonTM performance of a fold would be 2*10, not just 2 as it would with an association list. But I'm guessing that trees tend to diverge early in practice.

comment:3 Changed 3 years ago by ezyang

Differential Rev(s): Phab:D606

comment:4 Changed 3 years ago by ezyang

One subtlety: eqTypeModuloDeBruijn needs to respect equivalence by coreView, because that's how TrieMaps work too.

comment:5 Changed 3 years ago by ezyang

Nice improvement on T9872d

=====> T9872d(normal) 1507 of 4389 [0, 0, 0] 
cd ./perf/compiler && '/home/hs01/ezyang/ghc-tries-validate/inplace/bin/ghc-stage2' -fforce-recomp -dno-debug-output -no-user-package-db -rtsopts -fno-warn-tabs -fno-ghci-history -c T9872d.hs   +RTS -V0 -tT9872d.comp.stats --machine-readable -RTS  >T9872d.comp.stderr 2>&1
bytes allocated value is too low:
(If this is because you have improved GHC, please
update the test so that GHC doesn't regress again)
    Expected    T9872d(normal) bytes allocated: 739189056 +/-5%
    Lower bound T9872d(normal) bytes allocated: 702229603 
    Upper bound T9872d(normal) bytes allocated: 776148509 
    Actual      T9872d(normal) bytes allocated: 687562440 
    Deviation   T9872d(normal) bytes allocated:      -7.0 %

comment:6 Changed 3 years ago by ezyang

I had a maybe too clever idea.

We need equality over Types, etc; and this equality has to be modulo de Bruijn numbering. In the current proposed design, we bake CmEnv into the generic "empty, singleton, many" structure. This isn't great, because what about keys we don't need CmEnv?

This got me thinking: maybe we're looking at it wrong: the key in question should be data ClosedType = ClosedType CmEnv Type, and we define a TrieMap over *this*.

Now, when we define TrieMap instances, we don't have to synthesize an emptyCME to pass to the helper functions: we have all the information we need. To make a recursive call, we construct a new ClosedType with the updated CME and then call lookup on that. We can even define a plain old Eq instance on ClosedType respecting de Bruijn indices. Singleton k v now automatically stores the CmEnv; and we can make some helper functions which default to an emptyCME.

comment:7 Changed 3 years ago by simonpj

I think that's a cool idea. Maybe not ClosedType (it really isn't closed). Maybe DeBruijnType?

comment:8 Changed 3 years ago by Edward Z. Yang <ezyang@…>

In da64ab530512c36acd17c1dbcd3b5fcc681d128b/ghc:

Compress TypeMap TrieMap leaves with singleton constructor.

Suppose we have a handful H of entries in a TrieMap, each with a very large
key, size K. If you fold over such a TrieMap you'd expect time O(H). That would
certainly be true of an association list! But with TrieMap we actually have to
navigate down a long singleton structure to get to the elements, so it takes
time O(K*H).  The point of a TrieMap is that you need to navigate to the point
where only one key remains, and then things should be fast.

This is a starting point: we can improve the patch by generalizing the
singleton constructor so it applies to CoercionMap and CoreMap; I'll do this
in a later commit.

Summary: Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: simonpj, austin

Subscribers: carter, thomie

Differential Revision: https://phabricator.haskell.org/D606

GHC Trac Issues: #9960

comment:9 Changed 3 years ago by Edward Z. Yang <ezyang@…>

In 197f4e5aa3443c39e3ec2e53f8e595326ddaa524/ghc:

Generalize TrieMap compression to GenMap.

I still haven't applied the optimization to anything besides TypeMap.

Summary:
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Depends On: D606

Reviewers: simonpj, austin

Subscribers: carter, thomie

Differential Revision: https://phabricator.haskell.org/D607

GHC Trac Issues: #9960

comment:10 Changed 3 years ago by Edward Z. Yang <ezyang@…>

In ccef01465366e11978fdad1bf28aeac2edde36c2/ghc:

Add 'DeBruijn' constructor, which generalizes "key modulo alpha-renaming."

Summary:
We need equality over Types, etc; and this equality has to be modulo alpha
renaming. Previously, we baked CmEnv into the generic "empty, singleton, many"
structure. This isn't great, really GenMap should be more generic than that.

The insight: we've defined the key wrong: the key should be *equipped*
with the alpha-renaming information (CmEnv) and a TrieMap queried with this.
This is what the DeBruijn constructor does.

Now, when we define TrieMap instances, we don't have to synthesize an emptyCME
to pass to the helper functions: we have all the information we need. To make a
recursive call, we construct a new DeBruijn with the updated CME and then
call lookupTM on that. We can even define a plain old Eq instance on DeBruijn
respecting alpha-renaming.  There are number of other good knock-on effects.

This patch does add a bit of boxing and unboxing, but nothing the optimizer
shouldn't be able to eliminate.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: simonpj, austin

Subscribers: carter, thomie

Differential Revision: https://phabricator.haskell.org/D608

GHC Trac Issues: #9960

comment:11 Changed 3 years ago by Edward Z. Yang <ezyang@…>

In 0bef02e49fb2907989127d77ae61ed48ba87ae18/ghc:

Apply GenMap to CoreMap and CoercionMap.

Summary:
The biggest chore is the Eq (DeBrujin a) instances (all the more a chore
because we already have an implementation of them, but a CmEnv is not an
RnEnv2), but otherwise a fairly mechanical transformation.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: simonpj, austin

Subscribers: carter, thomie

Differential Revision: https://phabricator.haskell.org/D609

GHC Trac Issues: #9960

comment:12 Changed 3 years ago by Edward Z. Yang <ezyang@…>

In c4e1ccb2fe6ca7a3100653aadd83f83722669e79/ghc:

Miscellaneous improvements to TrieMap, from D608 code review.

Summary:
    - Add SPECIALIZE pragmas for the lkG/xtG/mapG/fdG family of functions

    - Rename wrapEmptyXX to just emptyXX

    - New deBruijnize function for initializing DeBruijn elements

    - Some extra documentation

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: simonpj, austin

Subscribers: carter, thomie

Differential Revision: https://phabricator.haskell.org/D611

GHC Trac Issues: #9960

comment:13 Changed 3 years ago by Edward Z. Yang <ezyang@…>

In 944329accebc86cc5ec989cd6b3c267d32fb6f26/ghc:

Newtype CoreMap and TypeMap so their keys are user-friendly.

Summary: Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: simonpj, austin

Subscribers: carter, thomie

Differential Revision: https://phabricator.haskell.org/D612

GHC Trac Issues: #9960

comment:14 Changed 2 years ago by ezyang

Resolution: fixed
Status: newclosed

comment:15 Changed 2 years ago by simonpj

Could we have a perf test-case? T9872d is only a 7% swing and I'm sure we could make a more extreme version.

comment:16 Changed 2 years ago by ezyang

I took a look, but unfortunately I don't have a very good idea for how to cause very deep TrieMaps to be generated. Some sort of very large types?

comment:17 Changed 2 years ago by simonpj

OK, well, let's just leave it then.

comment:18 Changed 23 months ago by thomie

Milestone: 8.0.1
Type of failure: None/UnknownCompile-time performance bug
Note: See TracTickets for help on using tickets.