Opened 4 years ago

Last modified 3 years ago

#9923 new feature request

Offer copy-on-GC sliced arrays

Reported by: dfeuer Owned by:
Priority: normal Milestone:
Component: Runtime System Version: 7.11
Keywords: Cc: simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

It's sometimes useful to split an array and send each part on to a different computation. Slicing works very well for this sort of thing, as long as both computations run to completion or become garbage at about the same time. It can work badly, however, if one computation is completed or abandoned and the other is retained—it may only hold a tiny slice, but that's enough to keep the entire array alive.

The alternative, currently, is to copy the array to form two new arrays. This gets rid of the retention problem, but introduces a performance problem.

One way to solve these problems might be to offer sliced arrays supported by the RTS. On collection, the garbage collector would copy each slice separately, turning it into an independent (and independently collectible) array.

Change History (5)

comment:1 Changed 4 years ago by svenpanne

Hmmm, I doubt that this would work: The proposal means that the garbage collector must allocate memory during collections, which is quite the opposite what it is supposed to do. Put another way: If somebody (SimonM?) thinks that this is possible, I would be very interested in the details. Even if this somehow works, there are several pathological cases like having lots of 999999-element slices of a 1000000-element array.

Short-circuiting projection functions is OK (IIRC nhc did that first) because one doesn't have to allocate, but slicing is not really projecting.

comment:2 in reply to:  1 Changed 4 years ago by dfeuer

Replying to svenpanne:

Hmmm, I doubt that this would work: The proposal means that the garbage collector must allocate memory during collections, which is quite the opposite what it is supposed to do. Put another way: If somebody (SimonM?) thinks that this is possible, I would be very interested in the details. Even if this somehow works, there are several pathological cases like having lots of 999999-element slices of a 1000000-element array.

Short-circuiting projection functions is OK (IIRC nhc did that first) because one doesn't have to allocate, but slicing is not really projecting.

Yes, it would have to allocate while collecting, but I think this happens anyway. As for pathological cases, "Doctor! It hurts when I do this! Don't do that." The suggested mechanism would not replace the usual sort of array slicing in general; it would be an alternative mechanism for situations where it was useful.

comment:3 Changed 4 years ago by simonmar

The problem occurs if you have both a slice and a reference to the original array. The GC can't know until the end of GC whether it should just copy the slice or keep the whole array, and neither can it copy both the slice and the array (GC would require more memory than the original heap).

comment:4 in reply to:  3 Changed 4 years ago by dfeuer

Replying to simonmar:

The problem occurs if you have both a slice and a reference to the original array. The GC can't know until the end of GC whether it should just copy the slice or keep the whole array, and neither can it copy both the slice and the array (GC would require more memory than the original heap).

Why is that a problem, per se? The application I was thinking about when I came up with this idea was a function for converting an Array to a Seq. Currently, there are two ways to do this:

  1. You can lazily build a sequence "view" of the array. This is very fast, and only allocates subtrees of the finger tree as they are needed. But the entire underlying array is live until the last leaf of the tree is forced.
  1. You can build the sequence monadically, copying each element out of the array (using a nice little function from Data.Primitive.Array). This is sure to release the array, at the expense, of course, of actually building the whole sequence (a good bit larger than the array it represents) before using any of it.

The copy-on-GC concept would, I believe, offer a middle ground between these options, with performance close to (1) with the leak safety of (2).

comment:5 Changed 3 years ago by thomie

Owner: simonmar deleted
Type of failure: None/UnknownRuntime performance bug
Note: See TracTickets for help on using tickets.