Opened 7 years ago

Closed 5 years ago

Last modified 4 years ago

#1136 closed bug (fixed)

High memory use when compiling many let bindings.

Reported by: igloo Owned by: simonmar
Priority: high Milestone: 6.12.1
Component: Compiler Version: 6.6
Keywords: performance Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Two programs that generate modules with lots of let bindings that GHC struggles to compile. With the first,

ghc -c J.hs

uses >100M of memory which seems a little high. For the second,

ghc -c J2.hs

got past 700M before I killed it, with both 6.6 and HEAD.

Attachments (5)

mkj.hs (446 bytes) - added by igloo 7 years ago.
mkj2.hs (471 bytes) - added by igloo 7 years ago.
J_mem_vs_time.png (7.0 KB) - added by igloo 6 years ago.
J_heap_profile.png (58.3 KB) - added by igloo 6 years ago.
J.prof (14.0 KB) - added by igloo 6 years ago.

Download all attachments as: .zip

Change History (24)

Changed 7 years ago by igloo

Changed 7 years ago by igloo

comment:1 Changed 7 years ago by igloo

comment:2 Changed 7 years ago by simonmar

  • Priority changed from normal to high

comment:3 Changed 6 years ago by simonpj

  • Milestone changed from 6.8 branch to 6.4.3

See also #1875, #783

comment:4 Changed 6 years ago by simonpj

  • Milestone changed from 6.4.3 to 6.8.3

comment:5 Changed 6 years ago by igloo

  • Keywords performance added

comment:6 Changed 6 years ago by simonmar

  • Type changed from bug to compile-time performance bug

Changed 6 years ago by igloo

Changed 6 years ago by igloo

Changed 6 years ago by igloo

comment:7 Changed 6 years ago by igloo

J_mem_vs_time.png shows the memory use against time for J.hs, with different values of numa/numb. It doesn't look like there is a complexity problem, but the constant factor seems high, using about 10k per binding. J_heap_profile.png is the heap profile for the largest case (400/20000), and peak usage works out at about 5k per binding (the difference presumably being due to the copying GC). Typecheck-Rename stands out, but there are also a few other large space users.

comment:8 Changed 6 years ago by simonpj

Interesting. I simplified it still more to a program of form:

a = let a0 = ()
    in let a1 = a0
        ...
    in let a10000 = a9999

and it still takes ages in the type checker, with a residency of 87M for 10,000 bindings. Splitting up the lets eliminates dependency analysis as the problem (in the type checker at least).

Do you think you might do a little bit more profiling to see where the typechecker/renamers costs are coming, for this simpler program?

Simon

Could

comment:9 Changed 6 years ago by igloo

For J2, consider this module:

module J where

data Foo a b c d = Foo a b c d

a = let
    a0 = a3
    a1 = a0
    a2 = a1
    a3 = a2
 in Foo a0 a1 a2 a3

We get (tidied up a bit):

==================== Desugar ====================
Rec {
J.a :: forall t0 t1 t2 t3. J.Foo t0 t1 t2 t3
J.a = \ (@ t0) (@ t1) (@ t2) (@ t3) ->
  letrec {
    res :: J.Foo t0 t1 t2 t3
    res =
      letrec {
        tup :: forall ttup. (ttup, ttup, ttup, ttup)
        tup =
          \ (@ ttup) ->
            letrec {
              tupa1 :: ttup
              tupa1 = tupa0;
              tupa2 :: ttup
              tupa2 = tupa1;
              tupa3 :: ttup
              tupa3 = tupa2;
              tupa0 :: ttup
              tupa0 = tupa3; } in
            (tupa0, tupa3, tupa2, tupa1);
        a0 :: forall ttup. ttup
        a0 = \ (@ ttup) -> case tup @ ttup of _ { (tup0, _, _, _) -> tup0 };
        a3 :: forall ttup. ttup
        a3 = \ (@ ttup) -> case tup @ ttup of _ { (_, tup1, _, _) -> tup1 };
        a2 :: forall ttup. ttup
        a2 = \ (@ ttup) -> case tup @ ttup of _ { (_, _, tup2, _) -> tup2 };
        a1 :: forall ttup. ttup
        a1 = \ (@ ttup) -> case tup @ ttup of _ { (_, _, _, tup3) -> tup3 };
      } in
      J.Foo
        @ t0
        @ t1
        @ t2
        @ t3
        (a0 @ t0)
        (a1 @ t1)
        (a2 @ t2)
        (a3 @ t3); } in
  res
end Rec }

I think that most of the space usage comes from the tuple
deconstructors. We should look into desugaring into something like
this instead:

==================== Desugar ====================
Rec {
J.a :: forall t0 t1 t2 t3. J.Foo t0 t1 t2 t3
J.a = \ (@ t0) (@ t1) (@ t2) (@ t3) ->
  letrec {
    res :: J.Foo t0 t1 t2 t3
    res =
      letrec {
        a0 :: forall ttup. ttup
        a0 = \ (@ ttup) -> a3;
        a3 :: forall ttup. ttup
        a3 = \ (@ ttup) -> a2;
        a2 :: forall ttup. ttup
        a2 = \ (@ ttup) -> a1;
        a1 :: forall ttup. ttup
        a1 = \ (@ ttup) -> a0;
      } in
      J.Foo
        @ t0
        @ t1
        @ t2
        @ t3
        (a0 @ t0)
        (a1 @ t1)
        (a2 @ t2)
        (a3 @ t3); } in
  res
end Rec }

Simon was worried about losing dictionary sharing if we did so, so we need to check into whether that causes a performance problem, and whether we can get the sharing some other way.

comment:10 Changed 6 years ago by simonpj

This patch should help

Thu Jun  5 05:44:23 PDT 2008  simonpj@microsoft.com
  * Desugar multiple polymorphic bindings more intelligently

But I suspect there is something else going on too in the typechecker, perhaps chains of mutable variables.

But first let's see if there's any improvement.

Simon

comment:11 Changed 6 years ago by igloo

  • Milestone changed from 6.8.3 to 6.10.1

comment:12 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:13 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:14 Changed 6 years ago by igloo

  • Milestone changed from 6.10.1 to 6.10.2

comment:15 Changed 5 years ago by igloo

  • Milestone changed from 6.10.2 to 6.10 branch

comment:16 Changed 5 years ago by igloo

  • Milestone changed from 6.10 branch to 6.12.1

comment:17 Changed 5 years ago by simonmar

  • Owner set to simonmar

I'll take another look at this before 6.12.1.

comment:18 Changed 5 years ago by simonmar

  • Resolution set to fixed
  • Status changed from new to closed

I think this is now fixed. With 6.11.today, on x86_64, J compiles in 6s using 106MB, J2 compiles in 16s using 84MB.

comment:19 Changed 4 years ago by simonmar

  • Type of failure set to Compile-time performance bug
Note: See TracTickets for help on using tickets.