Opened 9 years ago

Closed 7 years ago

#2762 closed bug (fixed)

Excessive heap usage

Reported by: igloo Owned by: igloo
Priority: low Milestone: 7.2.1
Component: Compiler Version: 6.11
Keywords: Cc: batterseapower@…, michal.terepeta@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: T2762
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

With Main.hs:

module Main (main) where

import InputOutput

main :: IO ()
main = do
          let content1 = concat (replicate 1000000 "1x") ++ "0"
          let i1 = fst $ input content1
          view i1

          let content2 = concat (replicate 1000001 "1y") ++ "0"
          let i2 = fst $ input content2
          view i2

view :: [Char] -> IO ()
view [] = return ()
view (i : is) = i `seq` view is

and InputOutput.hs:

module InputOutput (input) where

class InputOutput a where
    input :: String -> (a, String)

instance InputOutput Char where
    input (x : bs) = (x, bs)

instance InputOutput a => InputOutput [a] where
    input ('0':bs) = ([], bs)
    input ('1':bs) = case input bs of
                     (x, bs') ->
                         case input bs' of
                         ~(xs, bs'') -> (x : xs, bs'')

according to

ghc -O -prof -auto-all --make Main.hs -fforce-recomp
./Main +RTS -h

heap usage goes up to about 20M with the HEAD, but only about 200 bytes with 6.8.2.

This is with 6.11.20081108, but I started investigating with the HEAD when I saw similar problems with more-or-less 6.10.1.

Attachments (7)

ghc_6_8.png (23.0 KB) - added by igloo 9 years ago.
ghc_6_11.png (13.5 KB) - added by igloo 9 years ago.
ImproveDefArity.Initial.darcs_patch (58.9 KB) - added by batterseapower 9 years ago.
Arity analysis whipped together on 29th November
working-A-vs-arity-def-site-B (138.0 KB) - added by batterseapower 9 years ago.
Nofib suite results (before vs after analysis added, -O1)
Main.hs.STGsyntax-0 (21.5 KB) - added by batterseapower 9 years ago.
Atom STG output for compiler without arity analysis
Main.hs.2.STGsyntax-0 (20.5 KB) - added by batterseapower 9 years ago.
Atom STG output for compiler with arity analysis version B (aka "Initial")
test-032011.png (20.6 KB) - added by michalt 7 years ago.
The heap usage.

Download all attachments as: .zip

Change History (23)

Changed 9 years ago by igloo

Attachment: ghc_6_8.png added

Changed 9 years ago by igloo

Attachment: ghc_6_11.png added

comment:1 Changed 9 years ago by simonmar

Ok, here's what's happening. The input method for InputOutput [a] gets compiled to this (STG syntax, so we can see the closures):

InputOutput.input1 =
    \r srt:(0,*bitmap*) [$dInputOutput_smx]
        let {
          input2_smz =
              \u srt:(0,*bitmap*) [] InputOutput.input1 $dInputOutput_smx; } in
        let {
          sat_snm =
              \r srt:(1,*bitmap*) [ds_smB]
              ...
        } in  sat_snm;

so, sat_snm is the function that does the work, and it has input2_smz as a free variable.

When this function is called, it allocates closures for sat_snm and input2_smz, and proceeds to enter sat_snm. Eventually this call evaluates input2_smz, which is another call to input1, which creates more closures, and so on.

Normally all these closures would be GC'd immediately, but they don't in this example, because the Main module has this CAF:

Main.input :: GHC.Base.String -> ([GHC.Types.Char], GHC.Base.String)
Main.input =
  InputOutput.input1
    @ GHC.Types.Char
    (InputOutput.a `cast` ...)

This CAF evaluates to an unbounded chain of sat_snm closures: remember sat_snm has input2 as a free var, which itself evaluates to another sat_snm closure, and so on). The CAF is retained because it is used by the second call to input.

The solution seems simple: inline input2, then the two closures in input1 disappear. We ought to be able to tell that input2 is a value - perhaps the recursion gets in the way?

comment:2 Changed 9 years ago by simonpj

Cc: batterseapower@… added

What we need here is a decent arity analysis, which Max is thinking about. We have

  f = \d. let f1 = f d in
           \y. ...f1...

The leak comes from a shared partial application of f:

  let t = f d in (t e1, t e2)
or
  map (f d) es

This is readily fixed. f looks as if it has arity 1, but a simple fixpoint argument shows that it really has arity 2. When we see that, we see that we can eta-expand it to get

  f = \d.\x.  let f1 = f d in ...f1 d...

Now we can inline f1, and all is good.

I had not previously realised that this arity analysis stuff could fix space leaks.

Simon

comment:3 Changed 9 years ago by batterseapower

Well, I've had a look at the arity analysis idea. I'll attached a preliminary version of the analysis. It does indeed fix some "bad" arities, including the one in this example, but it's reasonably costly compile-time wise and doesn't have a non-neglible effect on allocations or runtime in Nofib, with two exceptions:

  • Bspt allocates a lot more, because the call to concat in Stdlib.seQuence is saturated by the eta-expansion step. This causes the simplifier to inline concat at that use site (even though the arguments are boring, since concat is a thin wrapper around a function poly_go). This in turn leads to RULES not spotting any instances of concat during the compilation of the Init and BSPT modules, so foldr/build fusion just fails to happen. (NB: you need to add the missing phase activation [~1] to the concat RULE in GHC.List to see this effect)
  • Atoms runtime apparently increases a lot, though there is no apparent difference in the STG generated in the two versions (also attached). I can't work out why this is.

Even if we fix these problems is it really worth introducing this analysis, given that it's effects are so small?

Changed 9 years ago by batterseapower

Arity analysis whipped together on 29th November

Changed 9 years ago by batterseapower

Nofib suite results (before vs after analysis added, -O1)

Changed 9 years ago by batterseapower

Attachment: Main.hs.STGsyntax-0 added

Atom STG output for compiler without arity analysis

Changed 9 years ago by batterseapower

Attachment: Main.hs.2.STGsyntax-0 added

Atom STG output for compiler with arity analysis version B (aka "Initial")

comment:4 in reply to:  3 ; Changed 9 years ago by igloo

Replying to batterseapower:

Well, I've had a look at the arity analysis idea. I'll attached a preliminary version of the analysis. It does indeed fix some "bad" arities, including the one in this example,

Great, thanks!

but it's reasonably costly compile-time wise

Even if we fix these problems is it really worth introducing this analysis, given that it's effects are so small?

Fixing the space leak is important to me. If the performance is an issue, then perhaps it should be an -O2-only optimisation?

comment:5 in reply to:  4 Changed 9 years ago by batterseapower

Replying to igloo:

Fixing the space leak is important to me. If the performance is an issue, then perhaps it should be an -O2-only optimisation?

Perhaps. It seems to cost about 9% of the compile time of a -O build, so it's not a /massive/ problem.

Anyway, I looked a bit more at this arity stuff tonight and I've managed to fix a few bugs in the analysis, as well as make it work across several modules - this makes it /slightly/ more effective, and has fixed the problems I mentioned above.

I'm going to continue tuning and extending it, and will talk to Simon PJ when I have something potentially viable for HEAD.

comment:6 Changed 9 years ago by igloo

I think I can work around the cases of this in my real program by pulling the instance methods out into recursive functions, e.g.

instance InputOutput a => InputOutput [a] where
    input = inputList

inputList :: InputOutput a => String -> ([a], String)
inputList ('0':bs) = ([], bs)
inputList ('1':bs) = case input bs of
                     (x, bs') ->
                         case inputList bs' of
                         ~(xs, bs'') -> (x : xs, bs'')

In some cases it gets a little uglier, but still tolerable, e.g. with:

data Seq p from to where
    Cons :: p from mid -> Seq p mid to -> Seq p from to
    Nil :: Seq p here here

I need to hide the from and to arguments to stop it leaking space:

data Hide2 t where
    Hide2 :: t a b -> Hide2 t

hide2 :: t a b -> Hide2 t
hide2 x = Hide2 x

unhide2 :: Hide2 t -> t a b
unhide2 (Hide2 x) = unsafeCoerce x

inputSeq :: InputOutput2 p => ByteString -> (Hide2 (Seq p), ByteString)
inputSeq bs = case BS.head bs of
              0 -> (hide2 Nil, BS.tail bs)
              1 -> case input2 (BS.tail bs) of
                   (x, bs') ->
                       case inputSeq bs' of
                       ~(xs, bs'') ->
                           (hide2 (x `Cons` unhide2 xs), bs'')
              _ -> error "InputOutput Seq: Bad value"

So I'd still like this to be fixed, as it would be one less thing to worry about when writing code, but I don't think it's actually blocking me currently.

comment:7 Changed 9 years ago by igloo

Milestone: 6.10.26.12 branch

comment:8 Changed 8 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:9 Changed 7 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:10 Changed 7 years ago by igloo

Milestone: 7.0.17.0.2

comment:11 Changed 7 years ago by igloo

Milestone: 7.0.27.2.1

comment:12 Changed 7 years ago by michalt

Cc: michal.terepeta@… added
Type of failure: Compile-time performance bug

I've just run the example with GHC 7.0.2 and the heap usage is just over 800 bytes. So it seems that the bug is fixed.

Changed 7 years ago by michalt

Attachment: test-032011.png added

The heap usage.

comment:13 in reply to:  12 Changed 7 years ago by simonmar

Resolution: fixed
Status: newclosed

Replying to michalt:

I've just run the example with GHC 7.0.2 and the heap usage is just over 800 bytes. So it seems that the bug is fixed.

Thanks for testing - let's assume it's fixed then. This is certainly a case that arity analysis should catch.

comment:14 Changed 7 years ago by simonpj

Resolution: fixed
Status: closednew

Ian, could you add a performance regression test based on this ticket? And then re-close?

Thanks

Simon

comment:15 Changed 7 years ago by simonpj

Owner: set to igloo

comment:16 Changed 7 years ago by igloo

Resolution: fixed
Status: newclosed
Test Case: T2762

Test added.

Note: See TracTickets for help on using tickets.