Material about the new code generator
This page summarises work that Norman Ramsey, Simon M, Simon PJ, and John Dias are doing on re-architecting GHC's back end. Our plan is as follows:
- Step 1: drain the "Rep swamp". This is a change of data representation that pervades the compiler, including lots and lots of tiny changes in the existing native code generators. It's done, and tested, but not yet committed to the HEAD.
- Step 2: Replace the existing Stg to Cmm code generator (a very complex and inflexible pass) with a new modular pipeline. The output of this pipeline is fed to the existing, unmodified code generators. The design of the new pipeline is here: Commentary/Compiler/NewCodeGenPipeline.
- Step 3: Expand the capability of the new pipeline so that it does native code generation too, and we can ultimately discard the existing code generators. The design of this stage is here: Commentary/Compiler/IntegratedCodeGen
In timescale terms it looks like this:
- GHC 6.10 will have nothing new at all
- Immediately after the code fork for 6.10 we'll commit the new stuff for Step 1 and Step 2. By the end of 2008 (latest) we hope to be using the Step 2 pipeline in anger, and can discard the existing code generator entirely. To be fair, at this point you probably won't see any performance improvements; indeed compilation could be a bit slower. But the pipeline will be far more modular and flexible.
- Work on Step 3 will proceed in 2009, but at a slower pace because John's internship ends in Oct 2008.
- At the same time, others can help! In particular, Cmm-to-Cmm optimisations will be easy. And some of them really should yield performance improvements.
Bug list (code-gen related bugs that we may be able to fix):
Notes about the state of play in late 2007
These notes are largely out of date, but I don't want to dump them till we're sure that we've sucked all the juice out of them.
- The Rep swamp is drained: see Commentary/Compiler/BackEndTypes
- Code generator: first draft done.
- Control-flow opt: simple ones done
- Common block elimination: done
- Block concatenation: done
- Adams optimisation: currently done in compiler/cmm/CmmProcPointZ.hs, which is incomplete because it does not insert the correct CopyOut nodes. The Adams optimization should be divorced from this module and replaced with common-block elimination, to be done after the proc-point transformation. In principle this combination may be slightly less effective than the current code, since the selection of proc-point protocols is guided by Adams's criteria, but NR thinks it will be easy to get the common, important cases nailed.
- Proc-point analysis and transformation: 'working' but largely untested. There is still no coherent plan for calling conventions, and the lack of such a plan prevents the completion of proc-point analysis, as in principle it should come up with a calling convention for each freely chosen proc point. In practice NR recommends the following procedure:
  - All optional proc points to be generated with no parameters (all live variables on the stack)
  - This situation to be remedied when the code generator is reorganized along the lines NR proposed in July 2007, i.e., the register allocator runs on C-- with calls (as opposed to C-- with jumps only) and therefore before proc-point analysis
- Add spill/reload: Implemented to NR's satisfaction in compiler/cmm/CmmSpillReload.hs, with the proviso that spilling is done to abstract stack slots rather than real stack positions (see comments below on stack-slot allocation)
- Stack slot allocation: nothing here but some broken bits and pieces. Progress in this arena is blocked by the lack of a full understanding of how to do stack-frame layout and how to deal with calling conventions. NR proposes that life would be simplified if all calls downstream from the Cmm converter were to be parameterless, the idea being to handle the calling conventions here and to put arguments and results in their conventional locations. John has done much of the work here already; the remaining bit is the actual layout of the stack slots.
- Make stack explicit: done.
- Split into multiple CmmProcs: mostly done, just a bit of patching up remains.
1. New code to check invariants of output from compiler/cmm/ZipDataflow.hs
2. Finish debugging compiler/cmm/ZipDataflow.hs.
3. Use Simon PJ's 'common-blockifier' (which does not exist!!!) to move the Adams optimization outside compiler/cmm/CmmProcPointZ.hs
4. ProcPointZ does not insert CopyOut nodes; this omission must be rectified and will require some general infrastructure for inserting predecessors.
  - Simple optimizations on CopyOut may be required
5. Define an interface for calling conventions and invariants for the output of frame layout [will require help from Simon M]
6. Stack layout
7. Glue the whole pipeline together and make sure it works.
Items 1-5 look like a few days apiece. Items 6 and 7 are more scary...
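The notes above mention common block elimination several times, both as a finished pass and as the proposed replacement for the Adams optimisation. As a rough illustration of the idea only, here is a minimal sketch over a toy control-flow graph. The types `Block`, `Term` and `Graph` and the function `commonBlockElim` are hypothetical names invented for this example; they are not GHC's Cmm types or its actual pass.

```haskell
import qualified Data.Map.Strict as M

type Label = String

-- Hypothetical toy terminator; not GHC's Cmm representation.
data Term
  = Goto Label
  | CondBranch String Label Label
  | Return
  deriving (Eq, Ord, Show)

-- Hypothetical toy block: straight-line statements plus a terminator.
data Block = Block
  { stmts :: [String]
  , term  :: Term
  } deriving (Eq, Ord, Show)

type Graph = M.Map Label Block

-- Rewrite every branch target through a label substitution.
renameTerm :: M.Map Label Label -> Term -> Term
renameTerm sub t = case t of
  Goto l           -> Goto (r l)
  CondBranch c a b -> CondBranch c (r a) (r b)
  Return           -> Return
  where
    r l = M.findWithDefault l l sub

-- Merge blocks with identical contents, keeping one representative per
-- group and redirecting branches to it. Merging can make further blocks
-- identical, so repeat until nothing changes. The entry label is never
-- dropped.
commonBlockElim :: Label -> Graph -> Graph
commonBlockElim entry g
  | M.null sub = g
  | otherwise  = commonBlockElim entry g'
  where
    -- Each distinct block body mapped to the labels that carry it.
    groups :: M.Map Block [Label]
    groups = M.fromListWith (flip (++)) [ (b, [l]) | (l, b) <- M.toList g ]

    -- Duplicate label -> representative label.
    sub :: M.Map Label Label
    sub = M.fromList
      [ (l, rep)
      | ls <- M.elems groups
      , let rep = if entry `elem` ls then entry else minimum ls
      , l <- ls
      , l /= rep
      ]

    g' = M.map (\b -> b { term = renameTerm sub (term b) })
               (M.filterWithKey (\l _ -> l `M.notMember` sub) g)

-- Two identical exits, and two blocks that become identical once the
-- exits have been merged.
exampleGraph :: Graph
exampleGraph = M.fromList
  [ ("entry", Block ["x = 1"] (CondBranch "x" "a" "b"))
  , ("a",     Block ["y = 2"] (Goto "exit1"))
  , ("b",     Block ["y = 2"] (Goto "exit2"))
  , ("exit1", Block [] Return)
  , ("exit2", Block [] Return)
  ]
```

On `exampleGraph`, the first round merges `exit2` into `exit1`; that makes `a` and `b` identical, so a second round merges `b` into `a`. This cascading is why the sketch iterates to a fixed point rather than making a single pass.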
ToDo: main issues
- SRTs simply record live global variables. So we should use the same live-variable framework as for live local variables. That means we must be able to identify which globals are SRT-able. What about compression/encoding schemes?
- How do we write continuations in the RTS? E.g. the update-frame continuation? Michael Adams had a syntax with two sets of parameters: the ones on the stack and the return values.
- Review code gen for calls with lots of args. In the existing codegen we push magic continuations that say "apply the return value to N more args". Do we want to do this? ToDo: how rare is it to have too many args?
- Figure out how PAPs work. This may interact with the GC check and stack check at the start of a function call.
- How do stack overflow checks work? (They are inserted by the CPS conversion, and must not generate a new info table etc.)
- Was there something about sinking spills and hoisting reloads?
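To make the SRT item above concrete: if globals and locals go through the same backward liveness analysis, the SRT entries at a program point are just the global part of the live set. The sketch below shows this for a straight-line block. `Var`, `Stmt`, `liveAt` and `srtEntries` are hypothetical names for this example, not GHC's. It also assumes every global is SRT-able, which is exactly the question the item leaves open.

```haskell
import qualified Data.Set as S

-- Hypothetical variable type: locals and globals share one analysis.
data Var = Local String | Global String
  deriving (Eq, Ord, Show)

-- Hypothetical statement, reduced to what it defines and what it reads.
data Stmt = Stmt
  { defs :: [Var]
  , uses :: [Var]
  }

-- Standard backward transfer function:
--   live-in = uses `union` (live-out `minus` defs)
liveIn :: Stmt -> S.Set Var -> S.Set Var
liveIn s out =
  S.fromList (uses s) `S.union` (out `S.difference` S.fromList (defs s))

-- Live sets for a straight-line block. Element i is the set live just
-- before statement i; the final element is the live-out set supplied.
liveAt :: S.Set Var -> [Stmt] -> [S.Set Var]
liveAt out = scanr liveIn out

-- Globals in a live set: the candidates for an SRT at that point.
-- Assumption: every global is SRT-able.
srtEntries :: S.Set Var -> [String]
srtEntries live = [ g | Global g <- S.toList live ]

-- Locals in a live set: what would have to be saved across a call.
liveLocals :: S.Set Var -> [String]
liveLocals live = [ l | Local l <- S.toList live ]

exampleBlock :: [Stmt]
exampleBlock =
  [ Stmt [Local "x"] [Global "f_closure"]               -- x = f_closure
  , Stmt [Local "y"] [Local "x", Global "g_closure"]    -- y = g_closure(x)
  , Stmt []          [Local "y"]                        -- return y
  ]
```

With an empty live-out set, the live set before the first statement of `exampleBlock` holds only the globals `f_closure` and `g_closure`. Before the second statement it holds the local `x` and the global `g_closure`. Because code never defines a global, globals are never killed, so the global part of a live set can only grow going backwards through the code.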
ToDo: small issues
- Shall we rename Branch to GoTo?!
- Where is the "push new continuation" middle node?
- Change the C-- parser (which parses RTS .cmm files) to directly construct
- (SLPJ) See let-no-escape todos in