Changes between Version 19 and Version 20 of Commentary/Compiler/Backends/LLVM/Alias


Ignore:
Timestamp:
Jan 23, 2012 7:26:33 PM (2 years ago)
Author:
dterei
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Commentary/Compiler/Backends/LLVM/Alias

    v19 v20  
    139139As a quick experiment, I hacked LLVM to accept "safe" annotations on loads and then manually annotated the LLVM assembly generated by GHC and that helped quite a bit. I suppose that's the way to go - we'll have to get this into LLVM in some form and then the backend will have to generate those annotations for loads which can't fail. I assume they are loads through the stack pointer and perhaps the heap pointer unless we're loading newly allocated memory (those loads can't be moved past heap checks). In any case, the stack pointer is the most important thing. I can also imagine annotating pointers (such as Sp) rather than instructions but that doesn't seem to be the LLVM way and it's also less flexible. 
    140140 
     141=== GHC Heap Check (case merging) === 
     142 
     143'''Following is from Roman Leshchinskiy''' 
     144 
     145I investigated heap check a bit more and it seems to me that it's largely 
     146GHC's fault. LLVM does do loop unswitching which correctly pulls out 
     147loop-invariant heap checks but that happens fairly late in its pipeline 
     148and heap checks interfere with optimisations before that. 
     149 
     150However, we really shouldn't be generating those heap checks in the first 
     151place. Here is a small example loop: 
     152 
     153{{{ 
     154 foo as n = loop 0# 0.0## 
     155  where 
     156    loop i x 
     157      | i >=# n = (# (), D# x #) 
     158      | otherwise = loop (i +# 1#) (x *## indexDoubleArray# as i) 
     159}}} 
     160 
     161This is the C-- that GHC generates: 
     162 
     163{{{ 
     164 sep_ret() 
     165    ceO: 
     166        Hp = Hp + 12; 
     167        if (Hp > I32[BaseReg + 92]) goto ceT; 
     168 
     169        if (%MO_S_Ge_W32(R1, I32[Sp + 16])) goto ceX; 
     170 
     171        _seA::I32 = R1 + 1; 
     172        F64[Sp + 0] = %MO_F_Mul_W64(F64[Sp + 0], F64[I32[Sp + 12] + ((R1 << 3) + 8)]); 
     173        R1 = _seA::I32; 
     174        Hp = Hp - 12; 
     175        jump sep_info (); 
     176    ceT: 
     177        I32[BaseReg + 112] = 12; 
     178        goto ceR; 
     179    ceR: 
     180        I32[Sp + 8] = sep_info; 
     181        I32[BaseReg + 32] = 131327; 
     182        jump stg_gc_ut (); 
     183    ceX: 
     184        I32[Hp - 8] = GHC.Types.D#_con_info; 
     185        F64[Hp - 4] = F64[Sp + 0]; 
     186        I32[Sp + 16] = Hp - 7; 
     187        R1 = GHC.Unit.()_closure+1; 
     188        Sp = Sp + 16; 
     189        jump (I32[Sp + 4]) (); 
     190}}} 
     191 
     192Note how in each loop iteration, we add 12 to Hp, then do the heap check 
     193and  then subtract 12 from Hp again. I really don't think we should be 
     194generating that and then relying on LLVM to optimise it away. 
     195 
     196This happens because GHC commons up heap checks for case alternatives and 
     197does just one check before evaluating the case. The relevant comment from 
     198CgCase.lhs is this: 
     199 
     200A more interesting situation is this: 
     201 
     202{{{ 
     203          !A!; 
     204          ...A... 
     205          case x# of 
     206            0#      -> !B!; ...B... 
     207            default -> !C!; ...C... 
     208}}} 
     209 
     210where !x! indicates a possible heap-check point. The heap checks 
     211in the alternatives '''can''' be omitted, in which case the topmost 
     212heapcheck will take their worst case into account. 
     213 
     214This certainly makes sense if A allocates. But with vector-based code at 
     215least, a lot of the time neither A nor C will allocate '''and''' C will 
     216tail-call A again so by pushing the heap check into !A!, we are now doing 
     217it '''in''' the loop rather than at the end. 
     218 
     219It seems to me that we should only do this if A actually allocates and 
     220leave the heap checks in the alternatives if it doesn't (perhaps we could 
     221also use a common heap check if '''all''' alternatives allocate). I tried to 
     222hack this and see what happens but found the code in CgCase and friends 
     223largely incomprehensible. What would I have to change to implement this 
     224(perhaps controlled by a command line flag) and is it a good idea at all?