Version 8 (modified by jstolarek, 5 years ago) (diff)

--

# Implementing primitive Bool#

## The problem

Consider following fragment of code:

``` case (x <# 0#) || (x >=# width) || (y <# 0#) || (y >=# height) of
True  -> E1
False -> E2
```

This kind of code is common in image processing (and array programming in general) where one needs to check whether the `(x,y)` coordinates are within the image. Primitive comparison operators `<#` and `>=#` have type `Int# -> Int# -> Bool`. Logical OR operator `(||)` is defined as:

```(||)       :: Bool -> Bool -> Bool
True  || _ =  True
False || x =  x
```

in GHC.Classes (ghc-prim library) which is equivalent of:

```(||) x y = case x of
True  -> True
False -> y
```

During the compilation process (assuming the optimizations are turned on) the definition of `(||)` gets inlined and then case-of-case transform is performed successively. This results in following Core (cleaned up for clarity):

```case <# x 0 of _ {
False ->
case >=# x width of _ {
False ->
case <# y 0 of _ {
False ->
case >=# y height of _ {
False -> E2
True  -> E1
};
True -> E1
};
True -> E1
};
True -> E1
};
```

and in following assembler code:

```.Lc1rf:
testq %r14,%r14
jl .Lc1rk
cmpq %rdi,%r14
jge .Lc1rp
testq %rsi,%rsi
jl .Lc1ru
cmpq %r8,%rsi
jge .Lc1rz
movl \$Main_g2_closure+1,%ebx
jmp *0(%rbp)
.Lc1rk:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
.Lc1rp:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
.Lc1ru:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
.Lc1rz:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
```

There are five possible branches to take, although four of them have the same result. This is caused by code duplication introduced by case-of-case transform (see this blog post for a step by step derivation). According to Ben Lippmeier, who submitted the original bug report, mis-predicted branches are bad in object code because they stall the pipeline.

## Workarounds

It is possible to work around the issue of code duplication by using GHC primops `tagToEnum#` and `dataToTag#`. These allow to distinguish between `True` and `False` by means of accessing the tag of a data type constructor. This means that `dataToTag#` can convert `True` to `1#` and `False` to `0#`, while `tagToEnum#` does the opposite (see paper Faster Laziness Using Dynamic Pointer Tagging for more details):

```ghci> import GHC.Exts
ghci> import GHC.Prim
ghci> :set -XMagicHash
ghci> I# (dataToTag# True)
1
ghci> I# (dataToTag# False)
0
ghci> (tagToEnum# 0#) :: Bool
False
ghci> (tagToEnum# 1#) :: Bool
True
ghci>
```

Having the possibility of converting `Bool` to an unboxed `Int#` allows us to compute results of logical expression by means of logical bitwise operations. The result can be converted back to a `Bool` so this is transparent on the Haskell source level, except for the fact that defined logical binary operators will be strict in both their arguments.

NOTE: Validity of this solution is based on assumption that `True` will always have a tag of `1#`, while `False` will have a tag of `0#`. Changing this invariant in the future would make these primitive logical operators invalid.

### First workaround

First workaround assumes converting each result of comparison into an unboxed `Int` and replacing `||` with `+#`:

``` case (dataToTag# (x <# 0#)) +# (dataToTag# (x >=# width)) +#
(dataToTag# (y <# 0#)) +# (dataToTag# (y >=# height)) of
0# -> E2 -- note that branch order is reversed
_  -> E1
```

This compiles to:

```case +#
(+#
(+# (dataToTag# (<# x 0)) (dataToTag# (>=# x width)))
(dataToTag# (<# y 0)))
(dataToTag# (>=# y height))
of _ {
__DEFAULT -> E1;
0 -> E2
}
```

Similarly we can convert logical && into multiplication.

### Second workaround

The above workaround is a bit clumsy: `dataToTag#`s make the code verbose and it may not be very obvious what the code is doing. Hence the second workaround, that defines an alternative logical `or` operator:

```(||#) :: Bool -> Bool -> Bool
(||#) x y = let xW = int2Word# (dataToTag# x)
yW = int2Word# (dataToTag# y)
zI = word2Int# (yW `or#` xW)
in tagToEnum# zI
```

This operator is defined in terms of primops `dataToTag#`, `tagToEnum#` and a bitwise or primop `or#`. Since the last one operates only on `Word`s we need to use `int2Word#` and `word2Int#` for conversion between these data types. Luckily, GHC does a good job of removing unnecessary conversions between data types. This means that:

``` case (x <# 0#) ||# (x >=# width) ||# (y <# 0#) ||# (y >=# height) of
True  -> E1
False -> E2
```

compiles to:

```case tagToEnum#
(word2Int#
(or#
(int2Word# (dataToTag# (>=# y height)))
(or#
(int2Word# (dataToTag# (<# y 0)))
(or#
(int2Word# (dataToTag# (>=# x width)))
(int2Word# (dataToTag# (<# x 0)))))))
of _ {
False -> E2;
True  -> E1
}
```

Primitive logical operators `&&#` and `not#` can be defined in a similar matter.

## Solutions

It seems that the best solution to the problem would be implementing second of the above workarounds as a separate primop. Alternatively we can implement primitive bitwise `or`, `and` and `xor` that work on `Int`s instead of `Word`s and define new logical operators using bitwise operators in one of the libraries.

## Places of interest in the source code

The file prelude/primops.txt.pp defines PrimOps and their type signatures.