Changes between Version 19 and Version 20 of Commentary/Compiler/Backends/LLVM/Alias


Ignore:
Timestamp:
Jan 23, 2012 7:26:33 PM (4 years ago)
Author:
dterei
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Commentary/Compiler/Backends/LLVM/Alias

    v19 v20  
    139139As a quick experiment, I hacked LLVM to accept "safe" annotations on loads and then manually annotated the LLVM assembly generated by GHC and that helped quite a bit. I suppose that's the way to go - we'll have to get this into LLVM in some form and then the backend will have to generate those annotations for loads which can't fail. I assume they are loads through the stack pointer and perhaps the heap pointer unless we're loading newly allocated memory (those loads can't be moved past heap checks). In any case, the stack pointer is the most important thing. I can also imagine annotating pointers (such as Sp) rather than instructions but that doesn't seem to be the LLVM way and it's also less flexible.
    140140
     141=== GHC Heap Check (case merging) ===
     142
     143'''Following is from Roman Leshchinskiy'''
     144
     145I investigated heap check a bit more and it seems to me that it's largely
     146GHC's fault. LLVM does do loop unswitching which correctly pulls out
     147loop-invariant heap checks but that happens fairly late in its pipeline
     148and heap checks interfere with optimisations before that.
     149
     150However, we really shouldn't be generating those heap checks in the first
     151place. Here is a small example loop:
     152
     153{{{
     154 foo as n = loop 0# 0.0##
     155  where
     156    loop i x
     157      | i >=# n = (# (), D# x #)
     158      | otherwise = loop (i +# 1#) (x *## indexDoubleArray# as i)
     159}}}
     160
     161This is the C-- that GHC generates:
     162
     163{{{
     164 sep_ret()
     165    ceO:
     166        Hp = Hp + 12;
     167        if (Hp > I32[BaseReg + 92]) goto ceT;
     168
     169        if (%MO_S_Ge_W32(R1, I32[Sp + 16])) goto ceX;
     170
     171        _seA::I32 = R1 + 1;
     172        F64[Sp + 0] = %MO_F_Mul_W64(F64[Sp + 0], F64[I32[Sp + 12] + ((R1 << 3) + 8)]);
     173        R1 = _seA::I32;
     174        Hp = Hp - 12;
     175        jump sep_info ();
     176    ceT:
     177        I32[BaseReg + 112] = 12;
     178        goto ceR;
     179    ceR:
     180        I32[Sp + 8] = sep_info;
     181        I32[BaseReg + 32] = 131327;
     182        jump stg_gc_ut ();
     183    ceX:
     184        I32[Hp - 8] = GHC.Types.D#_con_info;
     185        F64[Hp - 4] = F64[Sp + 0];
     186        I32[Sp + 16] = Hp - 7;
     187        R1 = GHC.Unit.()_closure+1;
     188        Sp = Sp + 16;
     189        jump (I32[Sp + 4]) ();
     190}}}
     191
     192Note how in each loop iteration, we add 12 to Hp, then do the heap check
     193and  then subtract 12 from Hp again. I really don't think we should be
     194generating that and then relying on LLVM to optimise it away.
     195
     196This happens because GHC commons up heap checks for case alternatives and
     197does just one check before evaluating the case. The relevant comment from
     198CgCase.lhs is this:
     199
     200A more interesting situation is this:
     201
     202{{{
     203          !A!;
     204          ...A...
     205          case x# of
     206            0#      -> !B!; ...B...
     207            default -> !C!; ...C...
     208}}}
     209
     210where !x! indicates a possible heap-check point. The heap checks
     211in the alternatives '''can''' be omitted, in which case the topmost
     212heapcheck will take their worst case into account.
     213
     214This certainly makes sense if A allocates. But with vector-based code at
     215least, a lot of the time neither A nor C will allocate '''and''' C will
     216tail-call A again so by pushing the heap check into !A!, we are now doing
     217it '''in''' the loop rather than at the end.
     218
     219It seems to me that we should only do this if A actually allocates and
     220leave the heap checks in the alternatives if it doesn't (perhaps we could
     221also use a common heap check if '''all''' alternatives allocate). I tried to
     222hack this and see what happens but found the code in CgCase and friends
     223largely incomprehensible. What would I have to change to implement this
     224(perhaps controlled by a command line flag) and is it a good idea at all?