wiki:UnpackingArrays

Version 1 (modified by simonmar, 3 years ago) (diff)

--

Unpacking Arrays

(notes written by Simon Marlow and Johan Tibell at ICFP'11)

Motivation

We want to have a data type containing Array# like this:

data T a b = C a {-# UNPACK #-} (Array# b)
           | ...

and have the Array# unpacked into the constructor C, so that the representation would be something like this:

+------------------------+
| C | a | n | x1......xn |
+---+---+----------------+

i.e. the C info table, the field of type a, a length field, and the payload of the Array#.

For example we might use this to remove a layer of indirection in our immutable Array types.

Source API

For now instead of having an implicit unpacked Array# type, lets define a new type of unpacked arrays:

UnpackedArray#

(we can change the name later)

So the user would write:

data T a b = C a (UnpackedArray# b)
           | ...

and we expose some new primitives:

indexUnpackedArray# :: UnpackedArray# a -> Int# -> (# a #)

The question is, how are we going to represent UnpackedArray#?

Suppose that the compiler desugars UnpackedArray# to a pair of values:

  • a pointer value (maybe type InteriorArrayContainer# a, but we could revisit this): this is a pointer to the containing constructor, C
  • an Int#, which is the offset of the beginning of the unpacked array relative to the info table of the C constructor, that is, the offset of the array's length field.

Now, any variable of type UnpackedArray# also gets desugared into a pair of variables with the above types. The indexUnpackedArray# primitive really takes *two* arguments, so let's give the desugared one a different name:

indexInteriorArray# :: InteriorArrayContainer# a -> Int# -> Int# -> (# a #)

(another option is to represent UnpackedArray# a by (# InteriorArrayContainer# a, Int# #) and have a general unboxed tuple desugaring that flattens unboxed tuples in argument and field positions.)

Pattern matching and Indexing

Example: given our data type T above, suppose the user wrote

f :: T Int Float -> Float
f t = case t of 
        C x arr# -> case indexUnpackedArray# arr# x of 
                      (# f #) -> F# f

We would desugar this into

f :: T Int Float -> Float
f t = case t of 
        C x icarr# off# -> 
           case indexInteriorArray# icarr# off# x of 
                      (# f #) -> F# f

Note that the representation of the constructor C does not actually contain the two fields icarr# and off#, these are brought into existence (bound) by the code generator when generating the code for the C case alternative.

Construction

ToDo?

GC things

Further extension

  • unpacking more than one Array# into a constructor
  • unpacking other types, like MutVar#, MVar# etc. (mutable types are tricky because we need write barriers etc.)

List of changes that we think are necessary

This list is likely to be non-exhaustive...

  • Add primops, new primitive types (prelude/primops.txt, prelude/TysPrim.lhs)
  • Implement primops, (codeGen/CgPrimOp.lhs, codeGen/StgCmmPrim.lhs)
  • Add desugaring of unpacked array types in argument and field positions, and indexUnpackedArray# to indexInteriorArray# (somewhere under desugar/ probably)
  • Add code generation for case alternatives on constructors containing interior arrays, they have to bind variables to the InteriorArrayContainer# and offset (codeGen/CgCase.lhs, codeGen/StgCmmCase.lhs).