Opened 8 years ago

Last modified 8 months ago

#917 new bug

-O introduces space leak

Reported by: claus.reinke@… Owned by:
Priority: lowest Milestone:
Component: Compiler Version: 6.5
Keywords: Cc: prokhorenkov@…, pho@…, mail@…, ezyang@…, rwbarton@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

consider the following variant of a popular space problem

initlast :: (()->[a]) -> ([a], a)
initlast xs = (init (xs ()), last (xs ()))

main = print $ case initlast (\()->[0..1000000000]) of
                 (init, last) -> (length init, last)

the long list has been wrapped in abstractions to avoid
sharing, gaining constant space processing rather than
the space leak in the original code - see

http://www.haskell.org/pipermail/haskell-cafe/2006-September/018447.html

http://www.haskell.org/pipermail/haskell-cafe/2006-September/018464.html

this works nicely if compiled without -O, but unfortunately,
-O reintroduces the space leak (apparently lifting and sharing
the common and space-expensive subexpression).

  1. I didn't see a ticket for this issue, so I wanted to record the example
  1. avoiding sharing isn't always possible, so it would be nice if one could

"bypass" sharing in such cases. in the old g-machine, that might have
been as simple as introducing a pragma that tells the compiler to omit
the update after the eval in the code for some subexpression (so the
original application node would not be overwritten by the space-expensive
result, trading time for space). is there a chance for a similar trick
in GHC? so one might write code like this (handwaving:-):

initlast :: [a] -> ([a], a)
initlast xs = (init ({-# COPY #-} xs), last ({-# COPY #-} xs))

main = print $ case initlast [0..1000000000] of
                 (init, last) -> (length init, last)

Change History (16)

comment:1 Changed 8 years ago by claus.reinke@…

actually, the relevant evals/updates would have been in in init and last, whereas the problem is visible in initlast, where the pragmas should be. so this would have required either a little more thinking if one wanted to do it in the compiler, or perhaps there might still be a simple way to tag thunks for "please don't update" in the runtime system?

comment:2 Changed 8 years ago by simonpj

  • Milestone set to _|_
  • Priority changed from normal to lowest

Thanks for recording this. You are right that full laziness can introduce space leaks. GHC provides no way to selectively disable it, but you can use -fno-full-laziness.

It's a good research topic. It'd be great if someone could find a way to solve it... but I doubt we'll do anything to GHC in the short term.

comment:3 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:4 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:5 Changed 5 years ago by simonpj

See also #3273 for another example of a space leak caused by the full laziness transformation.

Simon

comment:6 Changed 3 years ago by prokhorenkov

  • Cc prokhorenkov@… added
  • Type of failure set to None/Unknown

comment:7 Changed 3 years ago by igloo

Another example: #4276.

comment:8 Changed 3 years ago by PHO

  • Cc pho@… added

comment:9 Changed 3 years ago by batterseapower

An interesting example of this occurs with something like:

foo act = forM_ [1..10000] act

Currently full-laziness floats the list out of foo. Not only does this increase space usage, it also disables the fusion that would ordinarily occur between the list and forM_.

I guess the reason this is that the rules matcher sees that applying the foldr/build rule for that shared build might destroy sharing. However, maybe the rules matcher should be allowed to duplicate work as long as it was duplicated in the original source program?

So when we float out, we say "OK, we're going to share this work for now, but if we spot a RULE opportunity at a later stage we're going to undo this sharing.". This would at least fix this case, but it does not seem at all straightforward how to represent information about this "original" lack of sharing.

comment:10 Changed 3 years ago by rl

The CONLIKE pragma is useful in such cases. We could have a cheapBuild which would be exactly like build but would be marked as CONLIKE. Then, if GHC saw:

xs = cheapBuild f
foo x = foldr g z xs

it would apply the foldr/cheapBuild rule even though this loses sharing. We'd then be able to control which functions are cheap enough for sharing not to matter by using cheapBuild instead of build. This works reasonably well in DPH.

That's not to say that your suggestion doesn't make sense. It just seems rather harder to implement.

comment:11 Changed 2 years ago by judahj

One more example that I ran into this week: #5729.

comment:12 Changed 2 years ago by nomeata

  • Cc mail@… added

comment:13 follow-up: Changed 19 months ago by ezyang

While doing this automatically seems quite hard, would some generalized form of revertCAFs help folks out? The generalization would be on two axes: one is that it would apply to any closure on the heap, not just CAFs; and two is that you would be able to trigger the reversion from user code (some other thunk to force, which you can either throw away or poke with a seq).

comment:14 Changed 19 months ago by ezyang

  • Cc ezyang@… added

comment:15 in reply to: ↑ 13 Changed 19 months ago by nomeata

Replying to ezyang:

While doing this automatically seems quite hard, would some generalized form of revertCAFs help folks out? The generalization would be on two axes: one is that it would apply to any closure on the heap, not just CAFs; and two is that you would be able to trigger the reversion from user code (some other thunk to force, which you can either throw away or poke with a seq).

Do you want a command that reverts all closures on the heap, or any (specific) closure on the heap? I assume the latter.

It sounds like a variant of dup that works in hindsight, i.e. some thunks is being evaluated and later the programmer uses revertToThunk to turn back to the thunk.

I don’t think this would work well in general: The original thunk is likely to retain stuff that you want to have thrown away when you evaluate it.

It might be possibly reasonable to state in advance: I want to be able to revert this thunk later, please keep the necessary information around. I believe you can achieve this with dup (and manually remembering the copied thunk). That is, once it runs on your machine without segfaults :-)

comment:16 Changed 8 months ago by rwbarton

  • Cc rwbarton@… added
Note: See TracTickets for help on using tickets.