I have committed these patches to branch wip/spj-T13397:
commit 43540c8c6b9e914f302c71213a71ab5c780be2acAuthor: Simon Peyton Jones <simonpj@microsoft.com>Date: Wed Mar 8 11:05:53 2017 +0000 Improve code generation for conditionals This patch in in preparation for the fix to Trac #13397 The code generator has a special case for case tagToEnum (a>#b) of False -> e1 True -> e2 but it was not doing nearly so well on case a>#b of DEFAULT -> e1 1# -> e2 This patch arranges to behave essentially identically in both cases. In due course we can eliminate the special case for tagToEnum#, once we've completed Trac #13397. The changes are: * Make CmmSink swizzle the order of a conditional where necessary; see Note [Improving conditionals] in CmmSink * Hack the general case of StgCmmExpr.cgCase so that it use NoGcInAlts for conditionals. This doesn't seem right, but it's the same choice as the tagToEnum version. Without it, code size increases a lot (more heap checks). There's a loose end here. * Add comments in CmmOpt.cmmMachOpFoldMcommit e49f3154a5ceb1894414f4635579aeb3aa84054fAuthor: Simon Peyton Jones <simonpj@microsoft.com>Date: Wed Mar 8 10:26:47 2017 +0000 Re-engineer caseRules to add tagToEnum/dataToTag See Note [Scrutinee Constant Folding] in SimplUtils * Add cases for tagToEnum and dataToTag. This is the main new bit. It allows the simplifier to remove the pervasive uses of case tagToEnum (a > b) of False -> e1 True -> e2 and replace it by the simpler case a > b of DEFAULT -> e1 1# -> e2 See Note [caseRules for tagToEnum] and Note [caseRules for dataToTag] in PrelRules. * This required some changes to the API of caseRules, and hence to code in SimplUtils. See Note [Scrutinee Constant Folding] in SimplUtils. * Avoid duplication of work in the (unusual) case of case BIG + 3# of b DEFAULT -> e1 6# -> e2 Previously we got case BIG of DEFAULT -> let b = BIG + 3# in e1 3# -> let b = 6# in e2 Now we get case BIG of b# DEFAULT -> let b = b' + 3# in e1 3# -> let b = 6# in e2 * Avoid duplicated code in caseRules A knock-on refactoring: * Move Note [Word/Int underflow/overflow] to Literal, as documentation to accompany mkMachIntWrap etc; and get rid of PrelRuls.intResult' in favour of mkMachIntWrap
It's good stuff generally, so I'm quite keen to keep it. It does indeed eliminate the annoying tagToEnum# stuff.
I get the nofib results below. There are some odd things happening, which is why I have not committed to HEAD.
I did not expect binary sizes to change, but the do wobble around a bit, with a net tiny increase
I did not expect allocations to change. I chased down the change in knights: it was due to increased closure sizes. That in turn was due to better CSE, which is a good thing (just made more live variables). So I think I'm ok with that. Allocations sometimes go down too. Net zero.
There are some troubling increases in execution time. Notably, n-body really does run slower, repeatably. I think. I have no idea why. I think the C-- code is the same... but perhaps we are somehow generating worse assembly code.
I got rather different results from nofib, attached above. Note that these are for a full recompilation from scratch. Judging from the Size column in your nofib output, I guess that you probably also did a full recompilation from scratch.
In the program that regressed the most in my nofib run, tak, it looks like the code generator just output basic blocks in a different order. Probably the order of the then and else branches of a conditional got reversed. tak is known to be very sensitive to (poorly-understood) alignment effects (#8279) so I'm inclined to assume this is just noise that we can't do much about.
Not really sure what to make of the larger regressions that you saw, or why I can't reproduce them.
I found another reason to do this: tagToEnum# (x ># y) is floated out by full laziness, creating a new thunk and a free variable of the function closure. But plain (x ># y) is not: it's too cheap to be worth it.
We could make tagToEnum# (x ># y) look cheaper, but if we eliminate it we don't have to bother.
Simon, I rebased your branch and fixed a comment, producing wip/dfeuer-T13397. The perf results generally look fine, with a few very small regressions:
In principle yes. But although +0.59% allocation in T783, say isn't important enough to prevent using is, it'd be good to know why it happened. Maybe there's something simple going wrong that would be easily fixed?
Or maybe it's one of those things like "with the change, f becomes small enough to inline into g, so g becomes too big to inline at its call sites and that makes the difference". If so, fine. But it'd be good to know.
I find -ticky lets you nail the changes really fast.
Simon's caseRules re-engineering was merged in 193664d4
commit 193664d42dbceadaa1e4689dfa17ff1cf5a405a0Author: Simon Peyton Jones <simonpj@microsoft.com>Date: Wed Mar 8 10:26:47 2017 +0000 Re-engineer caseRules to add tagToEnum/dataToTag See Note [Scrutinee Constant Folding] in SimplUtils * Add cases for tagToEnum and dataToTag. This is the main new bit. It allows the simplifier to remove the pervasive uses of case tagToEnum (a > b) of False -> e1 True -> e2 and replace it by the simpler case a > b of DEFAULT -> e1 1# -> e2 See Note [caseRules for tagToEnum] and Note [caseRules for dataToTag] in PrelRules. * This required some changes to the API of caseRules, and hence to code in SimplUtils. See Note [Scrutinee Constant Folding] in SimplUtils. * Avoid duplication of work in the (unusual) case of case BIG + 3# of b DEFAULT -> e1 6# -> e2 Previously we got case BIG of DEFAULT -> let b = BIG + 3# in e1 3# -> let b = 6# in e2 Now we get case BIG of b# DEFAULT -> let b = b' + 3# in e1 3# -> let b = 6# in e2 * Avoid duplicated code in caseRules A knock-on refactoring: * Move Note [Word/Int underflow/overflow] to Literal, as documentation to accompany mkMachIntWrap etc; and get rid of PrelRuls.intResult' in favour of mkMachIntWrap