Opened 3 years ago

Closed 3 years ago

#8923 closed feature request (fixed)

Add SmallArray# type

Reported by: tibbe Owned by: tibbe
Priority: normal Milestone: 7.10.1
Component: Compiler Version: 7.9
Keywords: Cc: simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


Add a SmallArray# (and SmallMutableArray#) type that doesn't have a card table. This would

  • save some memory (two words per array),
  • make updates cheaper (no writing to the card table), and
  • make GC faster (no traversing the card table).

The use case is the unordered-containers package, which uses small arrays to implement a hash array mapped trie.

A starting point would be 0417404f5d1230c9d291ea9f73e2831121c8ec99/ghc, which added the card table to the current array type.

Attachments (2)

0001-RTS-support-for-new-SmallArray-type.patch (17.0 KB) - added by tibbe 3 years ago.
0002-Add-SmallArray-type-to-front-end.patch (16.2 KB) - added by tibbe 3 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 3 years ago by hvr

Cc: simonmar added
Milestone: 7.10.1

Some older discussion about this topic:

17:41 < JaffaCake> I like the idea of a separate small array type
17:41 < hvr> JaffaCake: does that single card-table "bit" make sense for <128 elements btw?
17:42 < JaffaCake> no, because we already have a flag for the whole array that tells us whether anything was modified
17:42 < hvr> or rather: having a kind of threshold (say ~16 elements or so) below which the card-table is 0-sized
17:42 < JaffaCake> so it's no use at all
17:43 < hvr> so we could optimize the <128 element case, by omitting the card-table?
17:43 < hvr> and thus getting a 2+N footprint for <128 elements?
17:43 < JaffaCake> you'd need a different type though
17:44 < JaffaCake> otherwise the write barrier would need a conditional on the size, which is bad
17:44 < hvr> and if the 0th card-table bit would be implicit always?
17:45 < hvr> i.e. derived from the remaining the bits + the whole-array flag you mentioned?
17:45 < JaffaCake> how would you derive it from the remaining bits?
17:46 < hvr> isn't the "whole array flag" == 'and' over all card-table bits?
17:46 < tibbe> Having it conditional would mean that writes would introduce a branch, not good.
17:47 < hvr> nevermind, that wouldn't allow to deduce it in all cases
17:47 < JaffaCake> hvr: right :)
17:48 < JaffaCake> tibbe: yes, we really want to avoid branches in the write barrier
17:48  * hvr needs to read up about write-barriers in that GC-handbook I just bought...
17:48 < JaffaCake> it doesn't have to be a "small array" type as such, you could just reintroduce the old card-table-less arrays
17:49 < JaffaCake> it wouldn't be much code
17:49 < tibbe> JaffaCake: realistically, how much work would it be to add a new closure type for small arrays. I know we have talked about it 10^10 times but perhaps I'll actually do something about it.
17:50 < JaffaCake> you could do it in an afternoon, I would think, it's mostly mechanical
17:51 < JaffaCake> start from 0417404f5d1230c9d291ea9f73e2831121c8ec99
17:51 < JaffaCake> and just add back the old arrays
17:51 < JaffaCake> with a new primitive type, and primops etc.
17:52 < JaffaCake> I suggest not actually having a size requirement, you can call them small arrays if you want but they would work for all sizes
17:52  * JaffaCake must disappear
17:52 < hvr> JaffaCake: so they'd just be a different cost-model for the user to choose from...
17:53 < tibbe> JaffaCake: ok
17:53 < tibbe> JaffaCake: I guess one size field is required for the GC?
17:53 < JaffaCake> hvr: precisely
17:53 < JaffaCake> tibbe: yes

comment:2 Changed 3 years ago by tibbe

I've written a patch (untested, but compiles) that adds a StgSmallMutArrPtrs type to the RTS. I would like a preliminary, high-level review of these changes to know if there are any major parts I'm missing. If this patch looks mostly OK I will go ahead and implement the compiler changes (i.e. type system and code generator changes) and write some tests.

I'm was thinking of splitting the change into perhaps 3 commits for ease of reviewing: the RTS changes (roughly this patch), the compiler front-end changes (Type things), and finally the primops changes to code generator.

Changed 3 years ago by tibbe

comment:3 Changed 3 years ago by tibbe

simonpj, could you look at the second patch (attachment:0002-Add-SmallArray-type-to-front-end.patch​), which adds the new type to the type system.

comment:4 Changed 3 years ago by tibbe

Let me know if you prefer to do the code review on GitHub and I'll push my changes there.

comment:5 Changed 3 years ago by simonpj

I have not been following the thread, but it seems surprising to add tow new types to primpops.txt.pp with no accompanying primpops to manipulate them.

But the front-end changes (which amount only to adding stuff to PrelNames and TysPrim) look ok to me.


comment:6 in reply to:  5 Changed 3 years ago by tibbe

Replying to simonpj:

I have not been following the thread, but it seems surprising to add tow new types to primpops.txt.pp with no accompanying primpops to manipulate them.

I just wanted some early feedback on the core of the implementation. I will add all the primops in the next step.

But the front-end changes (which amount only to adding stuff to PrelNames and TysPrim) look ok to me.

Great, I will proceed with adding the primops and writing some tests.

comment:7 Changed 3 years ago by tibbe

I now got a working implementation. Both runtime and allocations are down 8.8% on my unordered-containers insert benchmark, which is already very heavily optimized. This is using the default GC settings. If I use +RTS -A6M (my L3 cache size), I get a 15% runtime speed-up.

Last edited 3 years ago by tibbe (previous) (diff)

comment:8 Changed 3 years ago by tibbe

Status: newpatch

Ready for review:

The change is relatively large, partly because there's a lot of test code. Once the change is in, I will see if I can refactor the code to have more code sharing with Array#.

comment:9 Changed 3 years ago by Johan Tibell <johan.tibell@…>

In 90329b6cc183b3cd05956ae6bdeb6ac6951549c2/ghc:

Add SmallArray# and SmallMutableArray# types

These array types are smaller than Array# and MutableArray# and are
faster when the array size is small, as they don't have the overhead
of a card table. Having no card table reduces the closure size with 2
words in the typical small array case and leads to less work when
updating or GC:ing the array.

Reduces both the runtime and memory allocation by 8.8% on my insert
benchmark for the HashMap type in the unordered-containers package,
which makes use of lots of small arrays. With tuned GC settings
(i.e. `+RTS -A6M`) the runtime reduction is 15%.

Fixes #8923.

comment:10 Changed 3 years ago by tibbe

Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.