Opened 3 years ago

Closed 3 years ago

Last modified 23 months ago

#10359 closed bug (fixed)

Tuple constraint synonym led to asymptotic performance lossage

Reported by: axch Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.6.3
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case: perf/should_run/T10359
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by axch)

I encountered a situation where using a tuple constraint synonym produced an asymptotic runtime performance degradation, from linear to quadratic time. The enclosed Bug.hs reproduces the potentially relevant features of the problem. The function test there is quadratic, but should be linear. The file also contains a definition of test2, which differs only in eschewing the constraint synonym, and is linear, as it should be.

In more detail: the program tries to count to 20,000 in a rather contrived way. It maintains a Box, which holds the current count and a function that can accept an increment and a count to produce a new count (namely, (+)). The function do_step lifts incrementation to Boxes, using their contained function to update their stored count.

The first tricky thing is that the incrementing function stored in the Box is polymorphic in both arguments separately: it can accept increments that are not the same type as the stored count (and converts them). The second tricky thing is that I don't want to expose the type variable for the increment type (as opposed to the count type) to clients of the Box, so the Box's incrementing function is stored polymorphic (with an explicit forall). Finally, I want to impose the constraint (Fractional num, Real num) on allowable increments (even though this example only needs Real num).

This constellation of desiderata conspires to produce quadratic performance: doubling the parameter to test (but not test2) multiplies its runtime by 4. Inspecting the heap profile indicates a growing accumulation of closures when running test (but not test2). Inspecting the generated stg (enclosed) locates a potential culprit: apparently when do_step (but not do_step2) reconstructs the Box, it changes the func stored there to be a fresh closure that destructures the Numerical constraint tuple, constructs a fresh one, and calls the function that was in that slot previously with it. I hypothesize that this accumulates as a chain that performs a linear number of such destructurings and restructurings at each increment, leading to the observed quadratic runtime and linear memory growth.

Incidentally, I checked whether record wildcards were the issue (no: test4) and whether updating just the obj field solves it (yes: test3). Sadly, the latter solution is not directly usable in my actual application because of "Record update for insufficiently polymorphic field".

For reference: I compiled with ghc -O2 Bug.hs -fprof-auto -rtsopts -ddump-to-file -ddump-simpl -ddump-st

Attachments (3)

Bug.hs (1.9 KB) - added by axch 3 years ago.
Bug.dump-stg (8.8 KB) - added by axch 3 years ago.
Bug2.hs (1.8 KB) - added by axch 3 years ago.

Download all attachments as: .zip

Change History (15)

Changed 3 years ago by axch

Attachment: Bug.hs added

Changed 3 years ago by axch

Attachment: Bug.dump-stg added

comment:1 Changed 3 years ago by axch

Description: modified (diff)

comment:2 Changed 3 years ago by simonpj

First of all, very nice example. Thank you for making it so small and easy to work with.

I can see what's happening. The key part is what happens here:

do_step4 :: (Numerical num) => num -> Box a -> Box a
do_step4 number Box{ func = f, obj = x}
              = Box{ func = f, obj = f number x }

After elaboration (ie making dictionaries explicit) we get this:

do_step4 dn1 number (Box {func = f, obj = x })
  = Box { func = \dn2 -> f ( case dn2 of (f,r) -> f
                           , case dn2 of (f,r) -> r)
        , obj = f dn1 number x }

That's odd! We expected this:

do_step4 dn1 number (Box {func = f, obj = x })
  = Box { func = f
        , obj = f dn1 number x }

And indeed, the allocation of all those \dn2 closures is what is causing the problem. So we are missing this optimisation:

   (case dn2 of (f,r) -> f, case dn2 of (f,r) -> r)
===>
   dn2

If we did this, then the lambda would look like \dn2 -> f dn2 which could eta-reduce to f. But there are at least three problems:

  • The tuple transformation above is hard to spot
  • The tuple transformation is not quite semantically right; if dn2 was bottom, the LHS and RHS are different
  • The eta-reduction isn't quite semantically right: if f ws bottom, the LHS and RHS are different.

You might argue that the latter two can be ignored because dictionary arguments are special; indeed we often toy with making them strict.

But perhaps a better way to avoid the tuple-transformation issue would be not to construct that strange expression in the first place. Where is it coming from? It comes from the call to f (admittedly applied to no arguments) in Box { ..., func = f }. GHC needs a dictionary for (Numerical dum) (I changed the name of the type variable in func's type in the definition of Box). Since it's just a pair GHC says "fine, I'll build a pair, out of Fractional dum and Real dum. How does it get those dictionaries? By selecting the components of the Franctional dum passed to f.

If GHC said instead "I need Numerical dum and behold I have one in hand, it'd be much better. It doesn't because tuple constraints are treated specially. But if we adopted the idea in #10362, we would (automatically) get to re-use the Numerical dum constraint. That would leave us with eta reduction, which is easier.

As to what will get you rolling, a good solution is test3, which saves instantiating and re-generalising f. The key thing is to update all the fields except the polymorphic func field. I'm surprised you say that it doesn't work. Can you give a (presumably more complicated) example to demonstrate? Maybe there's a separate bug!

Changed 3 years ago by axch

Attachment: Bug2.hs added

comment:3 Changed 3 years ago by axch

In my actual use case, test3 doesn't compile. The difference is that Box is also existentially quantified over the type of obj, because I want a heterogeneous collection of Boxes. Translated to this context, the compilation error is

Bug2.hs:51:29:
    Record update for insufficiently polymorphic field: obj :: a
    In the expression: b {obj = func number obj}
    In an equation for `do_step3':
        do_step3 number b@(Box {..}) = b {obj = func number obj}

Offending source file enclosed. I reproduce test and test4, just to confirm that they still execute but with bad asymptotics; I elide test2, which is what I ended up doing to get rolling. Is this compilation error also a bug?

comment:4 Changed 3 years ago by simonpj

Ah I see. Well that rules out using record update.

What happens if you simulate tuple predicates, thus:

class (Fractional a, Real a) => Numerical a 
instance (Fractional a, Real a) => Numerical a

I think that'll work. You need FlexibleInstances, UndecidableInstances.

comment:5 Changed 3 years ago by axch

Haven't tried. Removing the abstraction in the one problematic place was a sufficient workaround.

comment:6 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 130e93aab220bdf14d08028771f83df210da340b/ghc:

Refactor tuple constraints

Make tuple constraints be handled by a perfectly ordinary
type class, with the component constraints being the
superclasses:
    class (c1, c2) => (c2, c2)

This change was provoked by

  #10359  inability to re-use a given tuple
          constraint as a whole

  #9858   confusion between term tuples
          and constraint tuples

but it's generally a very nice simplification. We get rid of
 -  In Type, the TuplePred constructor of PredTree,
    and all the code that dealt with TuplePreds
 -  In TcEvidence, the constructors EvTupleMk, EvTupleSel

See Note [How tuples work] in TysWiredIn.

Of course, nothing is ever entirely simple. This one
proved quite fiddly.

- I did quite a bit of renaming, which makes this patch
  touch a lot of modules. In partiuclar tupleCon -> tupleDataCon.

- I made constraint tuples known-key rather than wired-in.
  This is different to boxed/unboxed tuples, but it proved
  awkward to have all the superclass selectors wired-in.
  Easier just to use the standard mechanims.

- While I was fiddling with known-key names, I split the TH Name
  definitions out of DsMeta into a new module THNames.  That meant
  that the known-key names can all be gathered in PrelInfo, without
  causing module loops.

- I found that the parser was parsing an import item like
      T( .. )
  as a *data constructor* T, and then using setRdrNameSpace to
  fix it.  Stupid!  So I changed the parser to parse a *type
  constructor* T, which means less use of setRdrNameSpace.

  I also improved setRdrNameSpace to behave better on Exact Names.
  Largely on priciple; I don't think it matters a lot.

- When compiling a data type declaration for a wired-in thing like
  tuples (,), or lists, we don't really need to look at the
  declaration.  We have the wired-in thing!  And not doing so avoids
  having to line up the uniques for data constructor workers etc.
  See Note [Declarations for wired-in things]

- I found that FunDeps.oclose wasn't taking superclasses into
  account; easily fixed.

- Some error message refactoring for invalid constraints in TcValidity

comment:7 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

comment:8 Changed 3 years ago by simonpj

Resolution: fixed
Status: newclosed
Test Case: perf/should_run/T10359

comment:9 Changed 3 years ago by Austin Seipp <austin@…>

In 3cf8ecdc70cb295a2b9606080a1c7b5fa8eb16f4/ghc:

Revert multiple commits

This reverts multiple commits from Simon:

  - 04a484eafc9eb9f8774b4bdd41a5dc6c9f640daf Test Trac #10359
  - a9ccd37add8315e061c02e5bf26c08f05fad9ac9 Test Trac #10403
  - c0aae6f699cbd222d826d0b8d78d6cb3f682079e Test Trac #10248
  - eb6ca851f553262efe0824b8dcbe64952de4963d Make the "matchable-given" check happen first
  - ca173aa30467a0b1023682d573fcd94244d85c50 Add a case to checkValidTyCon
  - 51cbad15f86fca1d1b0e777199eb1079a1b64d74 Update haddock submodule
  - 6e1174da5b8e0b296f5bfc8b39904300d04eb5b7 Separate transCloVarSet from fixVarSet
  - a8493e03b89f3b3bfcdb6005795de050501f5c29 Fix imports in HscMain (stage2)
  - a154944bf07b2e13175519bafebd5a03926bf105 Two wibbles to fix the build
  - 5910a1bc8142b4e56a19abea104263d7bb5c5d3f Change in capitalisation of error msg
  - 130e93aab220bdf14d08028771f83df210da340b Refactor tuple constraints
  - 8da785d59f5989b9a9df06386d5bd13f65435bc0 Delete commented-out line

These break the build by causing Haddock to fail mysteriously when
trying to examine GHC.Prim it seems.

comment:10 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

comment:11 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In ffc21506894c7887d3620423aaf86bc6113a1071/ghc:

Refactor tuple constraints

Make tuple constraints be handled by a perfectly ordinary
type class, with the component constraints being the
superclasses:
    class (c1, c2) => (c2, c2)

This change was provoked by

  #10359  inability to re-use a given tuple
          constraint as a whole

  #9858   confusion between term tuples
          and constraint tuples

but it's generally a very nice simplification. We get rid of
 -  In Type, the TuplePred constructor of PredTree,
    and all the code that dealt with TuplePreds
 -  In TcEvidence, the constructors EvTupleMk, EvTupleSel

See Note [How tuples work] in TysWiredIn.

Of course, nothing is ever entirely simple. This one
proved quite fiddly.

- I did quite a bit of renaming, which makes this patch
  touch a lot of modules. In partiuclar tupleCon -> tupleDataCon.

- I made constraint tuples known-key rather than wired-in.
  This is different to boxed/unboxed tuples, but it proved
  awkward to have all the superclass selectors wired-in.
  Easier just to use the standard mechanims.

- While I was fiddling with known-key names, I split the TH Name
  definitions out of DsMeta into a new module THNames.  That meant
  that the known-key names can all be gathered in PrelInfo, without
  causing module loops.

- I found that the parser was parsing an import item like
      T( .. )
  as a *data constructor* T, and then using setRdrNameSpace to
  fix it.  Stupid!  So I changed the parser to parse a *type
  constructor* T, which means less use of setRdrNameSpace.

  I also improved setRdrNameSpace to behave better on Exact Names.
  Largely on priciple; I don't think it matters a lot.

- When compiling a data type declaration for a wired-in thing like
  tuples (,), or lists, we don't really need to look at the
  declaration.  We have the wired-in thing!  And not doing so avoids
  having to line up the uniques for data constructor workers etc.
  See Note [Declarations for wired-in things]

- I found that FunDeps.oclose wasn't taking superclasses into
  account; easily fixed.

- Some error message refactoring for invalid constraints in TcValidity

- Haddock needs to absorb the change too; so there is a submodule update

comment:12 Changed 23 months ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.