Changes between Version 65 and Version 66 of Commentary/Compiler/StackAreas


Ignore:
Timestamp:
Jul 18, 2008 2:13:33 PM (7 years ago)
Author:
dias
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Commentary/Compiler/StackAreas

    v65 v66  
    9696 1. Walk over the graph, keeping track of the stack pointer, and rewrite each address of a stack slot with an offset from the stack pointer. 
    9797 
    98 One way to assign stack slots is to traverse the flow graph in a reverse post-order depth-first search, allocating stack slots to each `Area`. Subsequently, when we see an `Area` again, we reuse the previous allocation. A stack slot is allocated the first time we encounter: 
    99  * A use or definition of a `RegSlot` for a variable 
    100  * A call site, where we allocate stack slots for the `CallArea` associated with the return continuation. Note that we cannot allocate stack slots for a `CallArea` when we first see a use or definition of a slot in the `CallArea`. Why? At a call site, the `CallArea` holding the function arguments must be the youngest live area on the stack. If we were to eagerly allocate stack slots for the `CallArea`, we wouldn't know how much space to leave on the stack for variable spills that we will yet encounter on the path to the call site. By waiting until we reach the call site, we can be sure that we have already seen all the variables that are spilled before the call site. 
    101  
    102  
    103  
    104  
    105 The algorithm keeps two pieces of information: 
    106  * A many-to-one map from each allocated stack slot to its assigned location on the stack: this map is used to make sure that we use only a single stack location for each stack slot. 
    107  * The (one-to-many) inverse of the first map. This map gives the contents of each stack location. 
    108  
    109 We want to reuse stack slots whenever possible. Therefore, if a stack slot is assigned to a location {{{l}}}, we need a variation of liveness analysis to identify final references to {{{l}}}. Under this analysis, we say that a stack slot is "live out" if the stack slot may be referenced as either a use or definition in a subsequent instruction. If a location {{{l}}} is not live out, we can reallocate {{{l}}} to other stack slots. 
    110  
    111 The algorithm makes a forward pass over the code, making the following decisions for stack slots in each instruction: 
    112  * If the stack slot has already been allocated, leave it alone. 
    113  * If the (unallocated) stack slot is part of a {{{CallArea}}}, then allocate the entire {{{CallArea}}} after the youngest stack slot that is live out. We might be able to do better than this, of course, but probably not without significantly more effort. 
    114  * If the (unallocated) stack slot is a variable spill, then allocate it to any stack location that is not live out (or, more accurately, that does not contain a stack slot that is live out). If possible, this stack slot should not come from the youngest end of the stack. 
    115  * If the instruction is a reload from a stack slot {{{s}}} in a {{{CallArea}}} to a variable {{{x}}} whose stack slot is unallocated, and {{{s}}} is not live out, then allocate {{{stack<x>}}} to {{{s}}}. We avoid allocating {{{stack<x>}}} to {{{s}}} if {{{s}}} is live out because we don't have full interference information to ensure that they can share a stack slot. 
    116  
    117 We could rewrite the graph in a subsequent pass or in the same pass. The latter seems preferable. 
    118  
    119 Let's walk through an example. The following is a simple program in pseudo-C--: 
    120  
    121 {{{ 
    122    if <> then 
    123      x, y = f(a); 
    124      ... <uses y> 
    125    else 
    126      x, z = g(a); 
    127    spill<x> // some source code resulting in x getting spilled 
    128 }}} 
    129  
    130 The program may be lowered to the following C-- code: 
    131  
    132 {{{ 
    133    if <> then goto L0; else goto L2; 
    134 L0: 
    135    sp := stack<L1 + 2>; 
    136    m[stack<L1 + 1>] := L1_info_table; 
    137    m[stack<L1 + 2>] := a; 
    138    call f returns to L1; 
    139 L1:  // on entry to L1, sp == stack<L1+3> 
    140    x := m[stack<L1 + 2>]; 
    141    y := m[stack<L1 + 3>]; 
    142    ... <uses y> 
    143    goto L4; // L1 stack area dead out 
    144 L2: // stack<y> is live out 
    145    sp := stack<L3 + 2>; 
    146    m[stack<L3 + 1>] := L3_info_table; 
    147    m[stack<L3 + 2>] := a; 
    148    call f returns to L3; 
    149 L3:  // on entry to L3, sp == stack<L3+3> 
    150    x := m[stack<L3 + 2>]; 
    151    y := m[stack<L3 + 3>]; 
    152    goto L4; // L3 stack area dead out 
    153 L4: // y dead out 
    154    m[stack<x>] := x; 
    155    m[stack<z>] := x; 
    156 L5: 
    157 }}} 
    158  
    159 Let's assume we visit the blocks in lexical order, which is what a reverse post-order depth-first search would give us. Here's a map we might expect to produce at input to each label: 
    160  * L0: {} (empty input map -- should have incoming and outgoing parameters for the function) 
    161  * L1: {L1 + 0> -> 0, <L1 + 1> -> 1, <L1 + 2> -> 2, <L1 + 3> -> 3} 
    162  * L2: {<L1 + 0> -> 0, <L1 + 1> -> 1, <L1 + 2> -> 2, <L1 + 3> -> 3, stack<x> -> 2, stack<y> -> 3} 
    163  * L3: {<L1 + 0> -> 0, <L1 + 1> -> 1, <L1 + 2> -> 2, <L1 + 3> -> 3, stack<x> -> 2, stack<y> -> 3, 
    164         <L3 + 1> -> 4, <L3 + 2> -> 5, <L3 + 3> -> 6} 
    165  * L4: {<L1 + 0> -> 0, <L1 + 1> -> 1, <L1 + 2> -> 2, <L1 + 3> -> 3, stack<x> -> 2, stack<y> -> 3, 
    166         <L3 + 1> -> 4, <L3 + 2> -> 5, <L3 + 3> -> 6} 
    167  * L5: {<L1 + 0> -> 0, <L1 + 1> -> 1, <L1 + 2> -> 2, <L1 + 3> -> 3, stack<x> -> 2, stack<y> -> 3, 
    168         <L3 + 1> -> 4, <L3 + 2> -> 5, <L3 + 3> -> 6, stack<z> -> 3} 
    169  
    170  
    171 MISSING BIT: 
    172 WE NEED TO INITIALIZE THE MAP TO DEAL WITH THE INCOMING AND OUTGOING CALL AREA OF THE FUNCTION, WHOSE LOCATIONS ARE FIXED BY CONVENTION AT THE OLD END OF THE STACK FRAME. 
    173  
    174  
    175 === Notes === 
    176  
    177 Note: Call instructions should indicate not only the registers they use and kill but also the stack slots. 
    178  
    179 Note: This greedy algorithm makes the copy propagation only if the stack location is not "live"-out. Otherwise, we would have two values sharing the stack slot, with no guarantee that they could safely share the location. If we had an interference graph, we could make a more informed decision, but that's a non-greedy optimization. 
    180  
    181 Problem: I don't think this approach deals nicely with the case where the arguments returned along both branches are bound to the same variables. In particular, when allocating the first returned value, it may be the same on both branches, but you don't know if the second one will be, so you can't commit to using the same stack space. 
    182  
    183 What about a procedure's incoming and outgoing parameters, which should appear at the young end of the stack, in a predetermined location? The locations of these stack slots are determined by convention. Should we initialize the map with their locations? 
    184  
    185 Invariant: A load from a stack slot must be preceded by a spill to that stack slot. And we should visit that spill before the reload. Otherwise, something has gone horribly wrong. 
    186  
    187 Note: Obviously, we need to keep track of the actual widths of values to compute stack addresses. 
    188  
    189 == Random Thoughts == 
    190  
    191 Assignments into parameter-passing areas can't be hoisted past adjustments to the stack pointer until we know the stack layout -- otherwise we might try to write off the young end of the stack. There must be some sort of invariant here, something along the lines of: "a reference to a parameter passing area must (a) succeed the adjustment moving the stack pointer to the bottom of the area and (b) precede any other stack adjustment. 
    192 Restated: a def or use of a stack location is well-defined only if the stack pointer points "past" the location. Therefore, we can move a stack assignment/reference past an assignment to the stack pointer only if we can prove that this property is maintained. 
    193  
    194 This all feels related to the coalescing optimization performed by register allocators. 
    195  
    196 We're missing another optimization opportunity: there's no reason why different live ranges need to share the same stack slot. 
    197  
    198 == !ToDo == 
    199  
    200  * State the invariants. 
    201  * Say something about aliasing. 
     98NEED TO REWRITE THIS SECTION WITH RECENT CHANGES.