Opened 3 years ago

Last modified 15 months ago

#4937 new feature request

Remove indirections caused by sum types, such as Maybe

Reported by: tibbe Owned by:
Priority: normal Milestone: 7.6.2
Component: Compiler Version: 7.0.1
Keywords: Cc: johan.tibell@…, pumpkingod@…, as@…, michal.terepeta@…, dterei, hackage.haskell.org@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

While null pointers cause correctness issues in languages that allow them, they do have one benefit over Maybe when used to represent nullable values: they allow encoding the absence of a value cheaply. Using Maybe to represent a nullable value adds an extra indirection (pointer) to get to the wrapped value.

We could use the bits set by pointer tagging to encode that the pointer points directly to the value, rather than to an intermediate constructor. I believe JHC takes this approach.

A motivating example is this implementation of a hash array mapped trie (HAMT): A HAMT stores key/value pairs and subtrees in two arrays.

data Node k v = MkNode (Array (Maybe k)) (Array (Either v (Node k v))

Iff index i in the first array is Nothing, the second array contains a Node at index i, otherwise it contains a value.

Adding an extra indirection, using Maybe or Either, adds both space (in terms of extra points and data constructor headers) and time (in terms of additional cache misses and branches).

Change History (18)

comment:1 Changed 3 years ago by tibbe

  • Cc johan.tibell@… added

comment:2 Changed 3 years ago by pumpkin

  • Cc pumpkingod@… added

comment:3 Changed 3 years ago by thoughtpolice

  • Cc as@… added

comment:4 follow-up: Changed 3 years ago by simonmar

I believe we thought about this when we did pointer tagging. The problem is that while you could shortcut the Just constructor by packing its field into the pointer, there are no bits left for the tags on the field. e.g. suppose we have a pointer P pointing to a Just value (tag bits 11), and the field of the Just is Q, tag bits 10:

     P_11 -> Just (Q_10 -> V)

if we represeted this by just

    Q_11 -> V

then we've lost the tag bits (10) on Q, and we have no knowledge of V. So then we have to fall back to untagged pointers, and that could hurt performance a lot.

comment:5 in reply to: ↑ 4 Changed 3 years ago by tibbe

Replying to simonmar:

then we've lost the tag bits (10) on Q, and we have no knowledge of V. So then we have to fall back to untagged pointers, and that could hurt performance a lot.

Does it make a difference (performance wise) if we use the field in a polymorphic manner? The main use case I had in mind is something like this (simplified):

data Collection a = C (Array (Maybe a))

member :: Eq a => a -> Collection a -> Bool
member k coll = ...

The example is somewhat contrived but I believe it represents my real use case. k is used in a polymorphic manner: each element is tested to see if it's Just/Nothing followed by an == comparison.

Note that if I didn't store a polymorphic value in the Just constructor I could create my own version of Maybe that manually unpacks the value into the constructor, like so:

data MaybeInt = NothingI | JustI {-# UNPACK #-} !Int

comment:6 follow-up: Changed 3 years ago by rl

Not to detract from the discussion, but in this particular case, couldn't you do:

data Collection a = C (Array Bool) (Array a)

where the Bools indicate which of the elements are present and (Array a) either stores undefined for the missing elements (if you want O(1) random access) or doesn't contain the missing elements at all. You could try to improve performance by using a bit vector instead of an array of Bools.

comment:7 Changed 3 years ago by tibbe

After a discussion on IRC Simon M pointed out that we cannot actually remove the indirection i.e. turning Just Int into a pointer to an Int#, as everything on the heap needs to be tagged.

I'm closing this ticket as this doesn't seem feasible at the moment.

comment:8 in reply to: ↑ 6 Changed 3 years ago by tibbe

  • Resolution set to wontfix
  • Status changed from new to closed

Replying to rl:

Not to detract from the discussion, but in this particular case, couldn't you do:

data Collection a = C (Array Bool) (Array a)

where the Bools indicate which of the elements are present and (Array a) either stores undefined for the missing elements (if you want O(1) random access) or doesn't contain the missing elements at all. You could try to improve performance by using a bit vector instead of an array of Bools.

Yes. My real example is somewhat more complicated. I'm trying to translate the following design to Haskell: A single array is used to store both key/value pairs and subtrees. In the even positions of the array we store either null or a pointer to the key. In the odd positions we store a pointer to the value, iff the preceding element is non-null, or a pointer to a subtree. This could be represented as:

data Elem k v = Null | Key k | Value v | Subtree (Tree k v)
data Tree k v = Tree (Array (Elem k v))

except we need to traverse two pointers to get to the key. We can segment the array like so

data Tree2 k v = Tree2 (Array (Maybe k)) (Array (Either v (Tree2 k v))

but the keys, values and subtrees are still two pointer away. We can the apply your encoding to get to the keys more easily, at the cost of a one word bitmap and some computational overhead. I don't know how to address the indirection to the values and subtrees though.

comment:9 Changed 3 years ago by simonmar

  • Resolution wontfix deleted
  • Status changed from closed to new

I just realised that there is something we could do here. Let's define the representation of a strict Maybe a as either

  • some distinguished pointer (StrictNothing), or
  • the field of the Just

since the field of the Just is not strict, it can never be StrictNothing. We could choose NULL for StrictNothing, but that would mean the GC would have to ignore NULL pointers, and that's an extra test in the GC inner loop.

So this would let us implement {-# UNPACK #-} !(Maybe a), and strict arguments of type Maybe a could be unboxed.

Of course this wouldn't be specific to Maybe, but would work with any type with consisting of one unary constructor and any number of nullary constructors.

comment:10 Changed 3 years ago by igloo

  • Milestone set to 7.2.1

comment:11 follow-up: Changed 3 years ago by tibbe

Couldn't we represent a sum of pointers (e.g. data Either a b = Left a | Right b) as a new heap object, a sum-pointer object, which uses tag bits to distinguish the constructors? This, together with the optimization to distinguish nullary constructors, would allow this data type to be represented with a single pointer:

data Elem k v = Null | Key k | Value v | Subtree (Tree k v)

comment:12 in reply to: ↑ 11 Changed 3 years ago by simonmar

Replying to tibbe:

Couldn't we represent a sum of pointers (e.g. data Either a b = Left a | Right b) as a new heap object, a sum-pointer object, which uses tag bits to distinguish the constructors?

But Either already is a heap object, and we have tag bits to distinguish Left from Right in the pointer itself.

This, together with the optimization to distinguish nullary constructors, would allow this data type to be represented with a single pointer:

data Elem k v = Null | Key k | Value v | Subtree (Tree k v)

So think about the representation of Elem (Elem k v) v'. You can't represent that with a single pointer, because you need two sets of tag bits.

comment:13 Changed 3 years ago by michalt

  • Cc michal.terepeta@… added

comment:14 Changed 2 years ago by dterei

  • Cc dterei added

comment:15 Changed 2 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1

comment:16 Changed 19 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:17 Changed 15 months ago by liyang

  • Cc hackage.haskell.org@… added

comment:18 Changed 15 months ago by dterei

Note: See TracTickets for help on using tickets.