Better cache performance in Array#

A common use case for arrays is to loop over them, performing an operation on each element of the array. For ByteArray#s, this is guaranteed to have good cache performance: When element i is accessed, the CPU's prefetcher will ensure that the element i+1 is already sitting in cache, so the next loop will not suffer a cache miss. Currently, the Array# and SmallArray# types do not significantly benefit from the prefetcher. This feature request asks for a few new primops that would improve the situation.

My understanding is that the "raw" element in an Array# is actually a pointer to the region in memory associated with the "logical" element of the array. When looping, the subsequent pointers are guaranteed to get prefetched, but the subsequent logical element is not guaranteed to be prefetched. In particular, if we'll only get the benefits of prefetching on the logical elements if we're lucky enough that the memory manager happened to allocate these logical elements on the heap next to each other. I don't know enough about GHC internals to check the source code, but my experiments demonstrate that this is not currently happening in my use case, resulting in rather significant performance degradations. (The experiments are rather complicated, so I'm not attaching them.)

I propose adding the primops:

packArray#      :: Array# a      -> b -> b
packSmallArray# :: SmallArray# a -> b -> b

packMutableArray#      :: MutableArray# s a      -> State# s -> State# s 
packSmallMutableArray# :: SmallMutableArray# s a -> State# s -> State# s

These operations would have the semantic effect of noops, with the exception that they request that the logical elements in the arrays be arranged adjacently in memory. Thus, future loops over the arrays would benefit from CPU prefetching. There are a number of ways the memory manager could handle these requests, and I don't particular care which is chosen. For example, the memory manager could rearrange the memory immediately or wait until the next garbage collection pass and do it then.

Trac metadata

Trac field	Value
Version	7.10.1
Type	FeatureRequest
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information