|Version 23 (modified by simonpj, 8 years ago) (diff)|
Material about the new code generator
This page summarises work that Norman Ramsey, Simon M, Simon PJ, and John Dias are doing on re-architecting GHC's back end. You may want to see the
Bug list (code-gen related bugs that we may be able to fix):
Notes about the state of play in late 2007
These notes are largely out of date, but I don't want to dump them till we're sure that we've sucked all the juice out of them.
- The Rep swamp is drained: see Commentary/Compiler/BackEndTypes
- Code generator: first draft done.
- Control-flow opt: simple ones done
- Common block elimination: done
- Block concatenation: done
- Adams optimisation: currently done in compiler/cmm/CmmProcPointZ.hs, which is incomplete because it does not insert the correct CopyOut nodes. The Adams optimization should be divorced from this module and replaced with common-block elimination, to be done after the proc-point transformation. In principle this combination may be slightly less effective than the current code, since the selection of proc-point protocols is guided by Adams's criteria, but NR thinks it will be easy to get the common, important cases nailed.
- Proc-point analysis and transformation: 'working' but largely untested. There is still no coherent plan for calling conventions, and the lack of such a plan prevents the completion of proc-point analysis, as in principle it should come up with a calling convention for each freely chosen proc point. In practice NR recommends the following procedure:
- All optional proc points to be generated with no parameters (all live variables on the stack)
- This situation to be remedied when the code generator is reorganized along the lines NR proposed in July 2007, i.e., the register allocator runs on C-- with calls (as opposed to C-- with jumps only) and therefore before proc-point analysis
- Add spill/reload: Implemented to NR's satisfaction in compiler/cmm/CmmSpillReload.hs, with the proviso that spilling is done to abstract stack slots rather than real stack positions (see comments below on stack-slot allocation)
- Stack slot allocation: nothing here but some broken bits and pieces. Progress in this arena is blocked by the lack of a full understanding of how to do stack-frame layout and how to deal with calling conventions. NR proposes that life would be simplified if all calls downstream from the Cmm converter were to be parameterless---the idea being to handle the calling conventions here and to put arguments and results in their conventional locations. John has done much of the work here already; the remaining bit is the actual layout of the stack slots.
- Make stack explicit: done.
- Split into multiple CmmProcs: mostly done, just a bit of patching up remains.
- New code to check invariants of output from compiler/cmm/ZipDataflow.hs
- Finish debugging compiler/cmm/ZipDataflow.hs.
- Use Simon PJ's 'common-blockifier' (which does not exist!!!) to move the Adams optimization outside compiler/cmm/CmmProcProintZ.hs
- ProcPointZ does not insert CopyOut nodes; this omission must be rectified and will require some general infrastructure for inserting predecessors.
- Simple optimizations on CopyIn and CopyOut may be required
- Define an interface for calling conventions and invariants for the output of frame layout [will require help from Simon M]
- Stack layout
- Glue the whole pipeline together and make sure it works.
Items 1-5 look like a few days apiece. Items 6 and 7 are more scary...
ToDo: main issues
- SRTs simply record live global variables. So we should use the same live-variable framework as for live local variables. That means we must be able to identify which globals are SRT-able. What about compression/encoding schemes?
- How do we write continuations in the RTS? E.g. the update-frame continuation? Michael Adams had a syntax with two sets of parameters, the the ones on the stack and the return values.
- Review code gen for calls with lots of args. In the existing codegen we push magic continuations that say "apply the return value to N more args". Do we want to do this? ToDo: how rare is it to have too many args?
- Figure out how PAPs work. This may interact with the GC check and stack check at the start of a function call.
- How do stack overflow checks work? (They are inserted by the CPS conversion, and must not generate a new info table etc.)
- Was there something about sinking spills and hoisting reloads?
ToDo: small issues
- Shall we rename Branch to GoTo?!
- Where is the "push new continuation" middle node?
- Change the C-- parser (which parses RTS .cmm files) to directly construct CmmGraph.
- (SLPJ) See let-no-escape todos in StgCmmExpr.