New codegen more than doubles compile time of T3294
Reported by: simonmar
Owned by: simonmar
Type of failure: Compile-time performance bug
Related Tickets: #4258
I did some preliminary investigation, and there seem to be a couple of things going on.
First, the stack allocator generates lots of unnecessary reloads at a continuation, for variables that are not used. These would be cleaned up by the sinking pass (if we were running the sinking pass), but generating them in the first place costs compile time.
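The first point can be made concrete with a minimal Python sketch (not GHC code). It contrasts reloading every spilled variable at a continuation with reloading only the variables that are live on entry to it. The dict of stack offsets and both function names are hypothetical. Sp is Cmm's stack pointer register, and the [Sp+n] notation simply mirrors the [R1+8] notation used in this ticket.

```python
from typing import Dict, List, Set

def naive_reloads(saved: Dict[str, int]) -> List[str]:
    # Reload every variable that was saved on the stack before the call,
    # whether or not the continuation uses it (ordered by stack offset).
    return [f"{v} = [Sp+{off}]" for v, off in sorted(saved.items(), key=lambda kv: kv[1])]

def live_reloads(saved: Dict[str, int], live_in: Set[str]) -> List[str]:
    # Reload only the saved variables that are live on entry to the continuation.
    return naive_reloads({v: off for v, off in saved.items() if v in live_in})

# Hypothetical example: a, b and c were saved, but the continuation only uses b.
saved = {"a": 8, "b": 16, "c": 24}
print(naive_reloads(saved))        # ['a = [Sp+8]', 'b = [Sp+16]', 'c = [Sp+24]']
print(live_reloads(saved, {"b"}))  # ['b = [Sp+16]']
```

Filtering at generation time means the dead reloads never exist, so no later pass has to spend time removing them.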
Second, there is a large nested let expression of the form

    let x = let y = let z = ... in f z in f y
where each let binding has a lot of free variables. So the body of each let ends up copying a ton of variables out of its closure to build the inner let binding's closure. These sequences look like:
    x1 = [R1+8]
    x2 = [R1+16]
    ...
    [Hp-32] = x1
    [Hp-24] = x2
    ...
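The cost of this pattern can be modelled with a small Python sketch. It emits the two-phase load/store sequence for one closure with n free variables, then multiplies that by the nesting depth. The closure layout is an assumption chosen only to reproduce the offsets in the example above: field i sits at R1+8*i, and the new payload ends just below Hp. The function names are hypothetical, and the sketch does not describe GHC's actual code generator.

```python
from typing import List

WORD = 8  # assumed word size in bytes (64-bit target)

def copy_free_vars(n_free: int) -> List[str]:
    # Phase 1: load each free variable out of the enclosing closure (R1) into a local.
    # Assumed layout: field i lives at R1 + WORD*i, just after the info pointer.
    loads = [f"x{i} = [R1+{WORD * i}]" for i in range(1, n_free + 1)]
    # Phase 2: store each local into the new closure's payload, assumed to
    # occupy the last n_free words below Hp.
    stores = [f"[Hp-{WORD * (n_free - i + 1)}] = x{i}" for i in range(1, n_free + 1)]
    return loads + stores

def nested_copy_cost(depth: int, n_free: int) -> int:
    # Statements emitted when each of `depth` nested lets copies n_free variables.
    return depth * len(copy_free_vars(n_free))

print(copy_free_vars(2))         # ['x1 = [R1+8]', 'x2 = [R1+16]', '[Hp-16] = x1', '[Hp-8] = x2']
print(nested_copy_cost(10, 20))  # 400
```

With n = 4 the stores come out as [Hp-32] = x1 through [Hp-8] = x4. Each level emits 2n statements and introduces n locals, all of which the register allocator must later handle. A direct memory-to-memory copy would emit only n statements.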
CmmSink can't currently inline all the locals, because proving that [R1+8] doesn't alias [Hp-32] is tricky (see the comments in CmmSink). However, again, we're not even running the sinking pass, because this is compiled with -O0. The fact that we generate all this code in the first place is a problem. The old code generator generated
    [Hp-32] = [R1+8]
    [Hp-24] = [R1+16]
    ...
which amounts to a lot less Cmm, and a lot less trouble for the register allocator later.
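A toy sinking pass in Python can illustrate the aliasing problem. The pass tries to inline each load into the store that uses it. It does so only if no intervening store may alias the loaded address. The may_alias rule, the hp_is_fresh switch and the tuple encoding of statements are all hypothetical and much simpler than what CmmSink actually does. The sketch also assumes that every local is used exactly once, as in the sequence above. Its only purpose is to show how a conservative answer for [R1+8] versus [Hp-32] blocks the inlining.

```python
from typing import Dict, List, Tuple, Union

Addr = Tuple[str, int]  # (base register, byte offset); every access is one word wide
# A statement is ("load", local, addr) or ("store", addr, value), where value
# is a local name or, after sinking, an Addr that is read directly.
Stmt = Tuple[str, Union[str, Addr], Union[str, Addr]]

def may_alias(a: Addr, b: Addr, hp_is_fresh: bool) -> bool:
    if a[0] == b[0]:
        # Same base register: word-sized accesses overlap only at the same offset.
        return a[1] == b[1]
    if hp_is_fresh and "Hp" in (a[0], b[0]):
        # Hypothetical fact: Hp-relative stores write freshly allocated memory,
        # which no other live pointer (such as R1) can point into yet.
        return False
    # Different bases and no extra knowledge: conservatively assume aliasing.
    return True

def sink_loads(stmts: List[Stmt], hp_is_fresh: bool) -> List[Stmt]:
    # Where each local was loaded: local -> (statement index, source address).
    # Assumes every local is loaded once and used by exactly one store.
    defs: Dict[str, Tuple[int, Addr]] = {
        s[1]: (i, s[2]) for i, s in enumerate(stmts) if s[0] == "load"
    }
    inlined = set()
    out: List[Stmt] = []
    for j, s in enumerate(stmts):
        if s[0] == "store" and isinstance(s[2], str) and s[2] in defs:
            i, src = defs[s[2]]
            # Moving the load down to this store is safe only if no store in
            # between may overwrite the address it reads.
            blocked = any(
                t[0] == "store" and may_alias(t[1], src, hp_is_fresh)
                for t in stmts[i + 1 : j]
            )
            if not blocked:
                inlined.add(s[2])
                s = ("store", s[1], src)
        out.append(s)
    # Loads whose only use was inlined are now dead.
    return [s for s in out if not (s[0] == "load" and s[1] in inlined)]

def show(s: Stmt) -> str:
    fmt = lambda a: f"[{a[0]}{a[1]:+d}]"
    if s[0] == "load":
        return f"{s[1]} = {fmt(s[2])}"
    value = s[2] if isinstance(s[2], str) else fmt(s[2])
    return f"{fmt(s[1])} = {value}"

stmts: List[Stmt] = [
    ("load", "x1", ("R1", 8)),
    ("load", "x2", ("R1", 16)),
    ("store", ("Hp", -16), "x1"),
    ("store", ("Hp", -8), "x2"),
]
print([show(s) for s in sink_loads(stmts, hp_is_fresh=False)])
# ['x2 = [R1+16]', '[Hp-16] = [R1+8]', '[Hp-8] = x2']
print([show(s) for s in sink_loads(stmts, hp_is_fresh=True)])
# ['[Hp-16] = [R1+8]', '[Hp-8] = [R1+16]']
```

Under the conservative rule, only the first local is inlined, because every later load would have to move past a Hp-relative store. Assuming that Hp-relative stores cannot alias R1 recovers the direct copies that the old code generator produced.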
One thing we could do is flatten out the lets, on the grounds that the inner let binding has a lot of free variables that need to be copied when the let is nested. This could be based on a heuristic about the number of free variables and the amount of extra allocation that would be entailed if the outer let is never entered.
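The flattening idea can be sketched as a let-floating rewrite on a tiny expression type. A heuristic guards the rewrite, weighing the number of free variables against the size of the closure that would now be allocated eagerly. Everything here is hypothetical: the Var/App/Let types, should_float, the thresholds fv_threshold and max_alloc_words, and the closure-size model. The rewrite also assumes that all binders have unique names, so floating a binding outwards cannot capture a variable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: str
    arg: "Expr"

@dataclass(frozen=True)
class Let:
    # Non-recursive let: `name` is in scope in `body` only.
    name: str
    rhs: "Expr"
    body: "Expr"

Expr = Union[Var, App, Let]

def free_vars(e: Expr) -> FrozenSet[str]:
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return frozenset({e.fun}) | free_vars(e.arg)
    return free_vars(e.rhs) | (free_vars(e.body) - {e.name})

def should_float(inner: Let, fv_threshold: int, max_alloc_words: int) -> bool:
    n = len(free_vars(inner.rhs))
    # Benefit: n copies out of the outer closure are avoided. Cost if the outer
    # let is never entered: a closure of roughly 1 header word + n payload words
    # (hypothetical size model).
    return n >= fv_threshold and 1 + n <= max_alloc_words

def flatten(e: Expr, fv_threshold: int = 2, max_alloc_words: int = 16) -> Expr:
    if isinstance(e, App):
        return App(e.fun, flatten(e.arg, fv_threshold, max_alloc_words))
    if isinstance(e, Let):
        rhs = flatten(e.rhs, fv_threshold, max_alloc_words)
        body = flatten(e.body, fv_threshold, max_alloc_words)
        if isinstance(rhs, Let) and should_float(rhs, fv_threshold, max_alloc_words):
            # let x = (let y = r in b) in body  ==>  let y = r in (let x = b in body)
            # Sound only because binder names are assumed unique (y not free in body).
            return Let(rhs.name, rhs.rhs,
                       flatten(Let(e.name, rhs.body, body), fv_threshold, max_alloc_words))
        return Let(e.name, rhs, body)
    return e
```

Consider the ticket's shape, let x = let y = let z = ... in f z in f y in .... When every level passes the heuristic, repeated floating turns this into the flat chain let z = ... in let y = f z in let x = f y in .... Each closure then captures its free variables from the enclosing scope instead of copying them out of an outer closure. The price is that these closures are allocated even if x is never entered, which is exactly the cost that max_alloc_words bounds.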
Change History (14)
comment:4 Changed 2 years ago by
Component: Compiler → Compiler (CodeGen)
Type of failure: None/Unknown → Compile-time performance bug