Opened 7 months ago

Last modified 7 months ago

#8326 new task

Place heap checks common in case alternatives before the case

Reported by: jstolarek Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.7
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: #8317 Related Tickets: #1498

Description (last modified by jstolarek)

We would like to have functions that check whether an Int# is a valid tag to represent Bool (see Note [Optimizing isTrue#] in ghc-prim):

isTrue# :: Int# -> Bool
isTrue# 1# = True
isTrue# _  = False

isFalse# :: Int# -> Bool
isFalse# 0# = True
isFalse# _  = False

We could use them with comparison primops like this:

f :: Int# -> Int
f x | isTrue# (x ># 0#) = I# x
    | otherwise         = -(I# x)

isTrue# is optimized away at the Core level:

A.f =
  \ (x_aqM :: GHC.Prim.Int#) ->
    case GHC.Prim.># x_aqM 0 of _ {
      __DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM);
      1 -> GHC.Types.I# x_aqM

but the code genrator produces very bad Cmm code, because it pushes heap checks into case alternatives:

       cFd: // stack check
           if ((Sp + -16) < SpLim) goto cFr; else goto cFs;
       cFr: // not enough place on the stack, call GC
           R2 = R2;
           R1 = A.f_closure;
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
       cFs: // scrutinize (x ># 0#)
           _sEU::I64 = R2;
           _sEV::I64 = %MO_S_Gt_W64(R2, 0);
           if (_sEV::I64 != 1) goto cFg; else goto cFo;
       cFg: // False branch
           Hp = Hp + 16;
           if (Hp > HpLim) goto cFy; else goto cFx;
       cFy: // not enough heap, call GC
           HpAlloc = 16;
           I64[Sp - 16] = cFf;
           R1 = _sEV::I64;
           I64[Sp - 8] = _sEU::I64;
           Sp = Sp - 16;
           call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd: 8;
       cFf: // re-do the False branch
           _sEU::I64 = I64[Sp + 8];
           Sp = Sp + 16;
           _sEV::I64 = R1;
           goto cFg;
       cFx: // RHS of False branch
           I64[Hp - 8] = GHC.Types.I#_con_info;
           I64[Hp] = -_sEU::I64;
           R1 = Hp - 7;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
       cFo: // True branch
           Hp = Hp + 16;
           if (Hp > HpLim) goto cFv; else goto cFu;
       cFv: // not enough heap, call GC
           HpAlloc = 16;
           I64[Sp - 16] = cFn;
           R1 = _sEV::I64;
           I64[Sp - 8] = _sEU::I64;
           Sp = Sp - 16;
           call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd: 8;
       cFn: // re-do the True branch
           _sEU::I64 = I64[Sp + 8];
           Sp = Sp + 16;
           _sEV::I64 = R1;
           goto cFo;
       cFu: // RHS of True branch
           I64[Hp - 8] = GHC.Types.I#_con_info;
           I64[Hp] = _sEU::I64;
           R1 = Hp - 7;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

This results in average 2.5% increase in binary size. By contrast, if we use tagToEnum# instead of isTrue# heap check will be placed before case expression and the code will be significantly shorter (this is done by a special case-on-bool optimization in the code generator - see #8317). What we would like to do here is:

  1. compile case alternatives without placing heap checks inside them
  2. each compiled alternative should return amount of heap it needs to allocate
  3. code generator inspects amounts of heap needed by each alternative and either adds heap checks in alternatives or puts a single check before the case expression.

Getting this right might be a bit tricky.

  1. if all branches allocate some heap then we can just put a common heap check before the case. Note that we must allocate the higgest amount required by any of the alternatives and then alternatives that use less heap must retract the heap pointer accordingly.
  2. if we have two alternatives, one of which allocates heap and the other does not, we should place the heap check only in the alternative that allocates the stack. This will solve #1498.
  3. it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not). If we place heap check before the case expression we lose optimization of recursive functions and face the problem described in #1498. If we push heap checks into branches that allocate heap then we get code duplication, i.e. the problem that we're addressing in this ticket. I guess the only way to make correct decission here is to try different aproaches and measure their performance.

This ticket is mentioned on this wiki page and in the source code in Note [Optimizing isTrue#] in ghc-prim. Once this ticket is resolved we need to update these places accordingly.

Change History (2)

comment:1 Changed 7 months ago by Jan Stolarek <jan.stolarek@…>

In 53948f915140396acd1b80c6a7a252b2d1e12635/ghc:

Restore old names of comparison primops

In 6579a6c we removed existing comparison primops and introduced new ones
returning Int# instead of Bool. This commit (and associated commits in
array, base, dph, ghc-prim, integer-gmp, integer-simple, primitive, testsuite and
template-haskell) restores old names of primops. This allows us to keep
our API cleaner at the price of not having backwards compatibility.

This patch also temporalily disables fix for #8317 (optimization of
tagToEnum# at Core level). We need to fix #8326 first, otherwise
our primops code will be very slow.

comment:2 Changed 7 months ago by jstolarek

  • Description modified (diff)
Note: See TracTickets for help on using tickets.