Version 7 (modified by jstolarek, 5 years ago) (diff)

--

# Implementing primitive Bool#

## The problem

Consider following fragment of code:

``` case (x <# 0#) || (x >=# width) || (y <# 0#) || (y >=# height) of
True  -> E1
False -> E2
```

This kind of code is common in image processing (and array programming in general) where one needs to check whether the `(x,y)` coordinates are within the image. Primitive comparison operators `<#` and `>=#` have type `Int# -> Int# -> Bool`. Logical OR operator `(||)` is defined as:

```(||)       :: Bool -> Bool -> Bool
True  || _ =  True
False || x =  x
```

in GHC.Classes (ghc-prim library) which is equivalent of:

```(||) x y = case x of
True  -> True
False -> y
```

During the compilation process (assuming the optimizations are turned on) the definition of `(||)` gets inlined and then case-of-case transform is performed successively. This results in following Core (cleaned up for clarity):

```case <# x 0 of _ {
False ->
case >=# x width of _ {
False ->
case <# y 0 of _ {
False ->
case >=# y height of _ {
False -> E2
True  -> E1
};
True -> E1
};
True -> E1
};
True -> E1
};
```

and in following assembler code:

```.Lc1rf:
testq %r14,%r14
jl .Lc1rk
cmpq %rdi,%r14
jge .Lc1rp
testq %rsi,%rsi
jl .Lc1ru
cmpq %r8,%rsi
jge .Lc1rz
movl \$Main_g2_closure+1,%ebx
jmp *0(%rbp)
.Lc1rk:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
.Lc1rp:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
.Lc1ru:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
.Lc1rz:
movl \$Main_g1_closure+1,%ebx
jmp *0(%rbp)
```

There are five possible branches to take, although four of them have the same result. This is caused by code duplication introduced by case-of-case transform (see this blog post for a step by step derivation). According to Ben Lippmeier, who submitted the original bug report, mis-predicted branches are bad in object code because they stall the pipeline.

## Possible solutions and their consequences

Unnecessary branches could be eliminated by introducing primitive logical operators AND, OR and NOT (they will be denoted as `||#`, `&&#` and `not#` respectively). These operators would be strict and treat their logical parameters as unboxed integers (`1#` for `True` and `0#` for `False`). Result would be produced by using low-level bitwise operations. This means that if logical operators were chained together like in a given example only the final result would have to be inspected and thus only one branch would be necessary.

Note (Jan Stolarek): My first thought was that introducing primitve logical operators will require changing `Bool` data type to a primitive unboxed version. This might not be the case. Please treat two solution apporaches as possibly wrong.

### First approach

Treat `Bool` as a boxed version of primitive `Bool#`. `True` would be equivalent of `B# True#`, `False` of `B# False#`:

```data Bool = B# True# | B# False#

-- B# :: Bool# -> Bool
```

Not sure if this can be considered equivalent to what the Haskell Report says about Bool. We need to ensure that `Bool#` is populated only by `True#` and `False#` and that these two are translated to `1#` and `0#` in the Core. It should be impossible to write such a function at Haskell level:

```g :: Bool -> Int -> Int
g (B# b) (I# i) = I# (b + i)
```

This approach might require one additional case expression to inspect the value of `Bool` at the Core level. For example:

```f :: Int -> Int -> Int
f x y = if x > y
then x
else y
```

would compile to:

```case x of _ { I# xP ->
case y of _ { I# yP ->
case ># xP yP of _ {
B# bP -> case bP of _ { 1# -> e1; 0# -> e2 }
}
}
}
```

This would complicate Core a bit but it should be possible to compile such Core to exactly the same result as with normal `Bool`. This code assumes that `>#` has type `Int# -> Int# -> Bool`, but to truly avoid branching in the Core we need `.># :: Int# -> Int# -> Bool#` so that we get a primitive value that doesn't need to be inspected using case expression but can be directly used by primitive logical operators.

### Second approach

Second approach assumes creating type `Bool#` that is independent of type `Bool`. Boxing and unboxing would have to be done explicitly via additional functions:

```data Bool = True | False -- no changes here

bBox :: Bool# -> Bool
bBox 1# = True
bBox 0# = False

bUnbox :: Bool -> Bool#
bUnbox True  = 1#
bUnbox False = 0#
```

`Bool#` could not be implemented as an ADT because it is unlifted and unboxed, while ADT value constructors need to be boxed and lifted (see comments in compiler/types/TyCon.lhs). There would need to be some magical way of ensuring that `Bool#` is populated only by `#0` and `1#` and that these values cannot be mixed with unboxed integers. Perhaps this could be done by preventing programmer from explicitly creating values of that type (can this be done?) and allow her only to use values returned from functions.

Another problem with this approach is that it would introduce primitive logical operations
 # and &&# with type `Int# -> Int# -> Int#` - it is questionable whether anyone would want such operations available to the programmer. I think it is desirable to have primitive logical operators of type `Bool# -> Bool# -> Bool#`.

## Places of interest in the source code

The file prelude/primops.txt.pp defines PrimOps and their type signatures. An example definition looks like this:

```primop   IntGtOp  ">#"   Compare   Int# -> Int# -> Bool
with fixity = infix 4
```

Existing definitions should remain unchanged or the code using them would break and that is a Very Bad Thing. This would require creating new PrimOps:

```primop   IntGtOpB  ".>#"   Compare   Int# -> Int# -> Bool#
with fixity = infix 4
```

The tricky part here is `Compare`. This a value constructor of `PrimOpInfo` data type defined in prelude/PrimOp.lhs:

```data PrimOpInfo
= Dyadic      OccName         -- string :: T -> T -> T
Type
| Monadic     OccName         -- string :: T -> T
Type
| Compare     OccName         -- string :: T -> T -> Bool
Type
| GenPrimOp   OccName         -- string :: \/a1..an . T1 -> .. -> Tk -> T
[TyVar]
[Type]
Type
```

We would need new `PrimOpInfo` value to denote PrimOps of type `T -> T -> Bool#`. Appropriate functions like `primOpSig` and `getPrimOpResultInfo` would have to be adjusted accordingly.