It is a 185kB, 5,800-line module consisting almost entirely of data, class, and instance declarations.
It might be interesting to use this module as a test case to profile GHC's front end to see if there are any obvious inefficiencies or unnecessary non-linear algorithms.
WASH/HTML/HTMLPrelude.hs is almost as bad. Between them, the two modules push the overall compile time for WASH to several minutes with -O0 and nearly half an hour with -O1. GHC's memory use while compiling WASH also grows to over 300MB with -O0 and over 600MB with -O1 (on a 64-bit box).
All in all, WASH is an excellent stress test for GHC.
Trac metadata

| Trac field | Value |
|---|---|
| Version | 6.8.1 |
| Type | Task |
| TypeOfFailure | OtherFailure |
| Priority | low |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | Multiple |
| Architecture | Multiple |
```haskell
main :: IO ()
main = writeFile "W.hs" $ unlines $ map unlines foo

foo :: [[String]]
foo = [ "module J where",
        "class C a where",
        " c :: a -> String",
        " d :: a -> String",
        " d x = c x",
        " e :: a -> String",
        " e x = c x" ]
    : [ [ "data " ++ d ++ " = " ++ d,
          "instance C " ++ d ++ " where",
          " c " ++ d ++ " = \"" ++ d ++ "\"" ]
      | i <- [1..1000], let d = 'A' : show i ]
```
Ian: you absolutely nailed the problem, thank you. I refactored a bit, and behold the specialiser pass runs really fast on that code now. Here's the patch, which might be worth merging to the branch:
```
Mon Apr 28 16:57:11 BST 2008  simonpj@microsoft.com
  * Fix Trac #1969: perfomance bug in the specialiser
```
You might want to check that it cures the problem on the original code?
I don't quite know how to add a test; maybe not worth the trouble.
PS: I've just noticed that the original report described a problem even without -O, so that part at least can't be cured by my patch, since the specialiser only runs with -O.
So, perhaps you can see if that is the case, and re-open if so?
Ian diagnosed the problem. Basically we're compiling lots of things like:
```haskell
data A24 = A24

instance C A24 where
    c A24 = "A24"
```
where
```haskell
class C a where
    c :: a -> String
    d :: a -> String
    d x = c x
    e :: a -> String
    e x = c x
```
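Put together, and shrunk to two instances, the generated code has this shape (`A1` and `A2` stand in for the ticket's `A1`..`A1000`; the default methods `d` and `e` are what drag the whole group into one recursive binding after specialisation):

```haskell
-- Two class methods get their definitions from defaults that call c,
-- so every instance's dictionary ends up referring to the shared
-- default-method bindings.
class C a where
  c :: a -> String
  d :: a -> String
  d x = c x          -- default method, shared by every instance
  e :: a -> String
  e x = c x          -- likewise

data A1 = A1
data A2 = A2

instance C A1 where c A1 = "A1"
instance C A2 where c A2 = "A2"
```

Scaling this to 1000 instances (as the generator above does) reproduces the blow-up.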
In the ticket 1000 instances are generated, but when I tried that the compiler was using 1.5GB of RAM before I killed it. 100 instances are plenty to show the problem: the time goes in the (Rec pairs) being passed to occAnalBind, which, via its idRuleVars, ties everything into a big knot. I assume that what's going on here is that this is the "normal" c, and there are specialisation rules for uses of c at the various A* types? And to make things worse, I guess J.$dmd is not allowed to be a loop breaker, since the interaction of rules and inlining could then lead to an infinite loop?
So with 100 class instances, we end up with a 500-node SCC being given to reOrderRec/reOrderCycle. This finds a loop breaker, which presumably removes at most one instance's worth of definitions from the SCC. Then it calls stronglyConnCompFromEdgedVerticesR to recompute the SCC, and we go round again. reOrderCycle gets called 200 times, building a new SCC each time round, with the size decreasing roughly linearly from 500 down to zero, so the total work is quadratic in the size of the group.
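The one-breaker-per-round shape can be sketched as a toy simulation. This is not GHC's code; the names `knot` and `naiveBreakers` are invented for illustration, with `Data.Graph` standing in for GHC's SCC machinery:

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import Data.Maybe (fromMaybe)

type Node = (Int, Int, [Int])  -- (payload, key, out-edges)

-- Every node points at every other, so removing one breaker still
-- leaves a cycle -- a stand-in for the Rec that idRuleVars ties up.
knot :: Int -> [Node]
knot n = [ (i, i, [ j | j <- [0 .. n - 1], j /= i ]) | i <- [0 .. n - 1] ]

-- Pick one loop breaker per round, then recompute the SCCs of the
-- remainder and go round again.  On knot n this needs n - 1 rounds,
-- and each round re-walks the surviving graph: super-linear overall.
naiveBreakers :: [Node] -> [Int]
naiveBreakers nodes = concatMap go (stronglyConnComp nodes)
  where
    outs = [ (k, es) | (_, k, es) <- nodes ]
    go (AcyclicSCC _)         = []
    go (CyclicSCC [])         = []
    go (CyclicSCC (b : rest)) =
      b : naiveBreakers
            [ (v, v, [ t | t <- fromMaybe [] (lookup v outs)
                         , t /= b, t `elem` rest ])
            | v <- rest ]
```

On `knot 10` this chooses nine breakers in nine rounds of SCC recomputation, mirroring the 200 calls to reOrderCycle described above.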
I'm not sure why the space usage is so high; I haven't really looked at
that. I guess we're probably holding on to the old SCCs or something for
some reason. Biographical profiling says that the space is all VOID, but
I'm not convinced that's trustworthy.
```
Mon Mar 23 10:38:26 GMT 2009  simonpj@microsoft.com
  * Avoid quadratic complexity in occurrence analysis (fix Trac #1969)

    The occurrence analyser could go out to lunch in bad cases, because
    of its clever loop-breaking algorithm.  This patch makes it bale
    out in bad cases.  Somewhat ad-hoc: a nicer solution would be
    welcome.  See Note [Complexity of loop breaking] for the details.

    M ./compiler/simplCore/OccurAnal.lhs -22 +71
```
The fix is still not perfect, because it simply bales out in the tricky case. My feeling is that there is probably a good algorithm somewhere for "find the minimal set of nodes whose removal will make the graph acyclic" (the minimum feedback vertex set problem, which is NP-hard in general), but I don't know of it. Furthermore, the problem is made trickier by the presence of RULES etc; see extensive comments in OccurAnal.
Anyway, baling out certainly fixes the bad complexity.
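The bale-out idea can be pictured with the same kind of toy model (invented names again, not the actual OccurAnal code): below a size threshold, do the careful one-breaker-per-round search; above it, give up on cleverness and mark the whole group in a single pass.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import Data.Maybe (fromMaybe)

type Node = (Int, Int, [Int])  -- (payload, key, out-edges)

-- Every node points at every other: the "big knot" shape.
knot :: Int -> [Node]
knot n = [ (i, i, [ j | j <- [0 .. n - 1], j /= i ]) | i <- [0 .. n - 1] ]

chooseBreakers :: Int -> [Node] -> [Int]
chooseBreakers limit nodes = concatMap go (stronglyConnComp nodes)
  where
    outs = [ (k, es) | (_, k, es) <- nodes ]
    go (AcyclicSCC _) = []
    go (CyclicSCC vs)
      | length vs > limit = vs   -- bale out: take the whole group, one pass
    go (CyclicSCC (b : rest)) =  -- careful path: one breaker per round
      b : chooseBreakers limit
            [ (v, v, [ t | t <- fromMaybe [] (lookup v outs)
                         , t /= b, t `elem` rest ])
            | v <- rest ]
    go _ = []
```

The bale-out branch sacrifices breaker quality (every node in an oversized group is treated as a breaker candidate) in exchange for touching each edge only once, which is exactly the trade the patch describes.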
I think this could merge into 6.10.
Ian: what about a test-suite example? Maybe just an example with enough instances to trip the timeout?
Yes, it consistently gave me 15 on OS X. I'll relax it to allow 19 too. Do let me know if you see any more of these failures; I'm not sure how loose the bounds should be.