Optimise calls to tagToEnum#

changed weight to 5

I have committed these patches to branch wip/spj-T13397:

commit 43540c8c6b9e914f302c71213a71ab5c780be2ac
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Wed Mar 8 11:05:53 2017 +0000

    Improve code generation for conditionals
    
    This patch is in preparation for the fix to Trac #13397
    
    The code generator has a special case for
      case tagToEnum (a>#b) of
        False -> e1
        True  -> e2
    
    but it was not doing nearly so well on
      case a>#b of
        DEFAULT -> e1
        1#      -> e2
    
    This patch arranges to behave essentially identically in
    both cases.  In due course we can eliminate the special
    case for tagToEnum#, once we've completed Trac #13397.
    
    The changes are:
    
    * Make CmmSink swizzle the order of a conditional where necessary;
      see Note [Improving conditionals] in CmmSink
    
    * Hack the general case of StgCmmExpr.cgCase so that it uses
      NoGcInAlts for conditionals.  This doesn't seem right, but it's
      the same choice as the tagToEnum version. Without it, code size
      increases a lot (more heap checks).
    
      There's a loose end here.
    
    * Add comments in CmmOpt.cmmMachOpFoldM
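
For readers who have not met the two forms, here is a minimal sketch of them in source Haskell. The function names and the "e1"/"e2" strings are made up for illustration, and the commit itself works on Core/STG/Cmm rather than on source code like this, so treat it only as a picture of the two shapes being compared.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), tagToEnum#, (>#))

-- Form 1: the Int# result of (>#) is converted to a Bool with
-- tagToEnum#, and the Bool is then scrutinised.
viaTagToEnum :: Int -> Int -> String
viaTagToEnum (I# a) (I# b) =
  case (tagToEnum# (a ># b) :: Bool) of
    False -> "e1"
    True  -> "e2"

-- Form 2: the Int# result is scrutinised directly; 1# means "true",
-- and the wildcard plays the role of DEFAULT.
viaPrimCase :: Int -> Int -> String
viaPrimCase (I# a) (I# b) =
  case a ># b of
    1# -> "e2"
    _  -> "e1"
```

Loaded into GHCi, both functions should pick the same branch for any pair of arguments; the point of the patch is that the generated code for the second form should be as good as for the first.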

commit e49f3154a5ceb1894414f4635579aeb3aa84054f
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Wed Mar 8 10:26:47 2017 +0000

    Re-engineer caseRules to add tagToEnum/dataToTag
    
    See Note [Scrutinee Constant Folding] in SimplUtils
    
    * Add cases for tagToEnum and dataToTag. This is the main new
      bit.  It allows the simplifier to remove the pervasive uses
      of     case tagToEnum (a > b) of
                False -> e1
                True  -> e2
      and replace it by the simpler
             case a > b of
                DEFAULT -> e1
                1#      -> e2
      See Note [caseRules for tagToEnum]
      and Note [caseRules for dataToTag] in PrelRules.
    
    * This required some changes to the API of caseRules, and hence
      to code in SimplUtils.  See Note [Scrutinee Constant Folding]
      in SimplUtils.
    
    * Avoid duplication of work in the (unusual) case of
         case BIG + 3# of b
           DEFAULT -> e1
           6#      -> e2
    
      Previously we got
         case BIG of
           DEFAULT -> let b = BIG + 3# in e1
           3#      -> let b = 6#       in e2
    
      Now we get
         case BIG of b'
           DEFAULT -> let b = b' + 3# in e1
           3#      -> let b = 6#      in e2
    
    * Avoid duplicated code in caseRules
    
    A knock-on refactoring:
    
    * Move Note [Word/Int underflow/overflow] to Literal, as
      documentation to accompany mkMachIntWrap etc; and get
      rid of PrelRules.intResult' in favour of mkMachIntWrap
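
The "BIG + 3#" item can be pictured in source Haskell as below. This is only a hand-written sketch of the before and after shapes, assuming e1 and e2 both use the case binder b; the names before and after are hypothetical, and the real transformation happens on Core inside the simplifier.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), (*#), (+#))

-- Original shape: scrutinise BIG + 3#, with b bound to the sum.
before :: Int -> Int
before (I# big) =
  let b = big +# 3#
  in case b of
       6# -> I# (b *# 10#)  -- stands in for e2
       _  -> I# b           -- stands in for e1

-- Shape after the rule fires: scrutinise BIG itself, compare it
-- against 6# - 3# = 3#, and rebuild b in each branch from the new
-- binder b' so that BIG is evaluated only once.
after :: Int -> Int
after (I# big) =
  case big of
    3# -> let b = 6# in I# (b *# 10#)
    b' -> let b = b' +# 3# in I# b
```

The two functions should agree on every input; the difference is only in which value the case expression inspects.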

It's good stuff generally, so I'm quite keen to keep it. It does indeed eliminate the annoying tagToEnum# stuff.

I get the nofib results below. There are some odd things happening, which is why I have not committed to HEAD.

* I did not expect binary sizes to change, but they do wobble around a bit, with a net tiny increase.
* I did not expect allocations to change. I chased down the change in knights: it was due to increased closure sizes, which in turn were due to better CSE. That is a good thing (it just made more variables live), so I think I'm OK with it. Allocations sometimes go down too. Net zero.
* There are some troubling increases in execution time. Notably, n-body really does run slower, repeatably. I think. I have no idea why. I think the C-- code is the same... but perhaps we are somehow generating worse assembly code.

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           anna          -0.0%     -0.7%      0.16      0.16     +0.0%
           ansi          +0.1%     +0.0%      0.00      0.00     +0.0%
           atom          +0.2%     +0.0%     -2.1%     -2.1%     +0.0%
         awards          +0.2%     +0.0%      0.00      0.00     +0.0%
         banner          -0.0%     +0.0%      0.00      0.00     +0.0%
     bernouilli          +0.4%     -0.0%     -0.7%     -0.9%     +0.0%
   binary-trees          +0.5%     -0.0%     +4.6%     +4.7%     +0.0%
          boyer          +0.0%     +0.0%      0.06      0.06     +0.0%
         boyer2          -0.0%     +0.0%      0.01      0.01     +0.0%
           bspt          +0.0%     +0.0%      0.01      0.01     +0.0%
      cacheprof          +0.0%     -0.0%     +1.6%     +2.3%     -1.8%
       calendar          +0.0%     +0.0%      0.00      0.00     +0.0%
       cichelli          -0.0%     +0.0%      0.12      0.12     +0.0%
        circsim          +0.1%     -0.0%     +0.7%     +0.7%     +0.0%
       clausify          +0.2%     +0.0%      0.05      0.05     +0.0%
  comp_lab_zift          -0.0%     +0.0%     +0.3%     +0.2%     +0.0%
       compress          -0.0%     +0.0%     -0.7%     +0.4%     +0.0%
      compress2          +0.3%     +0.0%     +2.3%     +2.4%     +0.0%
    constraints          +0.0%     +0.0%     +3.2%     +3.2%     +0.0%
   cryptarithm1          -0.0%     +0.0%     -9.0%     -9.0%     +0.0%
   cryptarithm2          -0.0%     +0.0%      0.01      0.01     +0.0%
            cse          -0.0%     +0.0%      0.00      0.00     +0.0%
   digits-of-e1          +0.4%     +0.0%     +2.9%     +2.9%     +0.0%
   digits-of-e2          +0.3%     +0.0%     -2.1%     -2.1%     +0.0%
          eliza          -0.0%     +0.0%      0.00      0.00     +0.0%
          event          +0.0%     +0.0%     +0.3%     +0.3%     +0.0%
         exp3_8          +0.2%     +0.0%     +0.7%     +0.7%     +0.0%
         expert          +0.1%     +0.0%      0.00      0.00     +0.0%
 fannkuch-redux          -0.0%     -0.0%     -1.1%     -1.1%     +0.0%
          fasta          +0.0%     +0.0%     +0.5%     -0.2%     +0.0%
            fem          +0.4%     +0.0%      0.04      0.04     +0.0%
            fft          +0.2%     -0.4%      0.06      0.06     +0.0%
           fft2          +0.2%     -0.1%      0.08      0.08     +0.0%
       fibheaps          +0.0%     +0.0%      0.03      0.03     +0.0%
           fish          -0.0%     +0.0%      0.02      0.02     +0.0%
          fluid          +0.2%     +0.0%      0.01      0.01     +0.0%
         fulsom          +0.1%     +0.0%     +0.1%     +0.0%     +0.0%
         gamteb          +0.2%     +0.0%      0.07      0.07     +0.0%
            gcd          +0.3%     +0.0%      0.09      0.09     +0.0%
    gen_regexps          -0.0%     +0.0%      0.00      0.00     +0.0%
         genfft          -0.1%     -0.2%      0.06      0.06     +0.0%
             gg          +0.1%     +0.0%      0.02      0.02     +0.0%
           grep          -0.1%     +0.0%      0.00      0.00     +0.0%
         hidden          +0.4%     +0.0%     +2.8%     +2.9%     +0.0%
            hpg          +0.1%     -0.0%     -1.9%     -2.1%     +0.0%
            ida          +0.0%     +0.0%      0.10      0.10     +0.0%
          infer          -0.0%     +0.0%      0.10      0.10     +0.0%
        integer          +0.5%     +0.0%     +1.6%     +1.6%     +0.0%
      integrate          +0.2%     +0.0%     +4.8%     +5.0%     +0.0%
   k-nucleotide          +0.1%     -0.1%     -1.5%     -1.6%     +0.0%
          kahan          +0.2%     +0.0%     +1.6%     +1.6%     +0.0%
        knights          +0.2%     +1.3%      0.01      0.01     +0.0%
         lambda          +0.0%     +0.0%     +6.5%     +6.5%     +0.0%
     last-piece          -0.1%     +0.3%     +2.4%     +2.5%     +0.0%
           lcss          +0.0%     +0.0%     +2.7%     +2.7%     +0.0%
           life          +0.1%     +0.0%     +0.8%     +1.0%     +0.0%
           lift          -0.0%     +0.0%      0.00      0.00     +0.0%
      listcompr          -0.0%     +0.0%      0.18      0.18     +0.0%
       listcopy          -0.0%     +0.0%      0.19      0.19     +0.0%
       maillist          -0.0%     -0.0%      0.08      0.09     -5.3%
         mandel          +0.5%     +0.0%      0.13      0.13     +0.0%
        mandel2          -0.0%     +0.0%      0.01      0.01     +0.0%
        minimax          -0.0%     +0.0%      0.01      0.01     +0.0%
        mkhprog          -0.0%     +0.0%      0.00      0.00     +0.0%
     multiplier          +0.0%     +0.0%      0.19      0.19     +0.0%
         n-body          +0.2%     +0.0%    +14.6%    +14.6%     +0.0%
       nucleic2          +0.2%     +0.0%      0.11      0.11     +0.0%
           para          -0.0%     +0.0%     -1.5%     -1.5%     +0.0%
      paraffins          -0.1%     -0.1%      0.19      0.20     +0.0%
         parser          -0.6%     +0.0%      0.04      0.04     +0.0%
        parstof          -0.0%     +0.0%      0.01      0.01     +0.0%
            pic          -0.3%     +1.1%      0.01      0.01     +0.0%
       pidigits          +0.3%     +0.0%     -0.0%     -0.0%     +0.0%
          power          +0.2%     +0.0%     +2.0%     +2.2%     +0.0%
         pretty          +0.3%     +0.0%      0.00      0.00     +0.0%
         primes          +0.0%     +0.0%      0.11      0.11     +0.0%
      primetest          +0.4%     +0.0%      0.13      0.13     +0.0%
         prolog          +0.2%     +0.0%      0.00      0.00     +0.0%
         puzzle          -0.0%     +0.0%      0.20      0.20     +0.0%
         queens          +0.0%     +0.0%      0.02      0.02     +0.0%
        reptile          -0.1%     +0.0%      0.02      0.02     +0.0%
reverse-complem          -0.0%     +0.0%     +2.4%     +2.4%     +0.0%
        rewrite          +0.0%     +0.0%      0.03      0.03     +0.0%
           rfib          +0.5%     +0.0%      0.03      0.03     +0.0%
            rsa          +0.4%     +0.0%      0.03      0.03     +0.0%
            scc          -0.0%     +0.0%      0.00      0.00     +0.0%
          sched          +0.0%     +0.0%      0.03      0.03     +0.0%
            scs          +0.0%     +0.8%     +7.6%     +7.5%     +0.0%
         simple          +0.1%     +0.0%     +4.9%     +5.0%     +0.0%
          solid          +0.2%     +0.0%      0.19      0.19     +0.0%
        sorting          -0.0%     +0.0%      0.00      0.00     +0.0%
  spectral-norm          +0.2%     +0.0%     +1.5%     +1.5%     +0.0%
         sphere          +0.0%     +0.0%      0.08      0.08     +0.0%
         symalg          +0.4%     +0.0%      0.01      0.01     +0.0%
            tak          +0.0%     +0.0%      0.02      0.02     +0.0%
      transform          -0.0%     +0.0%     -4.5%     -4.5%     +0.0%
       treejoin          -0.0%     +0.0%    +16.7%    +16.6%     +0.0%
      typecheck          -0.0%     +0.0%     -1.4%     -1.3%     +0.0%
        veritas          -0.1%     +0.0%      0.00      0.00     +0.0%
           wang          +0.2%     +0.0%      0.17      0.17     +0.0%
      wave4main          +0.0%     +0.0%     +1.9%     +1.7%     +0.0%
   wheel-sieve1          +0.0%     +0.0%     +1.5%     +1.5%     +0.0%
   wheel-sieve2          +0.0%     +0.0%     +0.6%     +0.6%     +0.0%
           x2n1          +0.1%     +0.0%      0.01      0.01     +0.0%
--------------------------------------------------------------------------------
            Min          -0.6%     -0.7%     -9.0%     -9.0%     -5.3%
            Max          +0.5%     +1.3%    +16.7%    +16.6%     +0.0%
 Geometric Mean          +0.1%     +0.0%     +1.6%     +1.6%     -0.1%
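
As an aid to reading the summary rows: a geometric mean of percentage changes is normally taken over the new/old ratios rather than over the percentages themselves. The sketch below assumes that this is how the "Geometric Mean" row is computed; the function name geoMeanPct is made up, and it is not taken from the nofib tooling.

```haskell
-- Geometric mean of a list of percentage changes.
-- Each change p (in percent) is turned into the ratio 1 + p/100,
-- the ratios are averaged in log space, and the result is turned
-- back into a percentage.
geoMeanPct :: [Double] -> Double
geoMeanPct pcts = (exp meanLog - 1) * 100
  where
    ratios  = [1 + p / 100 | p <- pcts]
    meanLog = sum (map log ratios) / fromIntegral (length ratios)
```

With this definition a doubling (+100%) and a halving (-50%) cancel out, which is why a handful of large swings in either direction need not move the mean very much.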

PS The improvement in cryptarithm1 runtime appears to be solid and real too. So there is a win here!

Attached file nofib: normal nofib output.

Attached file nofib.slow: slow nofib output.

I got rather different results from nofib, attached above. Note that these are for a full recompilation from scratch. Judging from the Size column in your nofib output, I guess that you probably also did a full recompilation from scratch.

In the program that regressed the most in my nofib run, tak, it looks like the code generator just output basic blocks in a different order. Probably the order of the then and else branches of a conditional got reversed. tak is known to be very sensitive to (poorly-understood) alignment effects (#8279) so I'm inclined to assume this is just noise that we can't do much about.

Not really sure what to make of the larger regressions that you saw, or why I can't reproduce them.

Hmm. I can re-try.

Did you see any wins either? What does your summary table (like the above) look like?

mentioned in issue #13523

I found another reason to do this: tagToEnum# (x ># y) is floated out by full laziness, creating a new thunk and a free variable of the function closure. But plain (x ># y) is not: it's too cheap to be worth it.

We could make tagToEnum# (x ># y) look cheaper, but if we eliminate it we don't have to bother.
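
To make the floating concern concrete, here is a hand-written sketch. The function names are hypothetical, and whether GHC actually floats the expression depends on the compiler version and flags, so bumpFloated is my own rendering of what full laziness might produce, not compiler output.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), isTrue#, (>#))

-- As written: the test mentions only x and y, not the lambda's z.
-- isTrue# is a wrapper around tagToEnum#.
bump :: Int -> Int -> [Int] -> [Int]
bump (I# x) (I# y) zs =
  map (\z -> if isTrue# (x ># y) then z + 1 else z) zs

-- Hand-floated version: the Bool is let-bound outside the lambda,
-- so it becomes a thunk t, and the lambda's closure captures t
-- as a free variable.
bumpFloated :: Int -> Int -> [Int] -> [Int]
bumpFloated (I# x) (I# y) zs =
  let t = isTrue# (x ># y)
  in map (\z -> if t then z + 1 else z) zs

-- With the tagToEnum# wrapper eliminated, the lambda just redoes
-- the cheap primitive comparison and no thunk is needed.
bumpPrim :: Int -> Int -> [Int] -> [Int]
bumpPrim (I# x) (I# y) zs =
  map (\z -> case x ># y of { 1# -> z + 1; _ -> z }) zs
```

All three compute the same list; they differ only in whether an extra thunk is allocated and captured by the closure.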

Simon, I rebased your branch and fixed a comment, producing wip/dfeuer-T13397. The perf results generally look fine, with a few very small regressions:

Nofib allocations
-----------------
knights    +1.35%
pic        +1.14%
scs        +0.81% (this ran slightly *faster*)
last-piece +0.29%

Nofib runtimes
--------------
cacheprof    +1.3%
last-piece   +1.23%
hidden       +1.08%
wheel-sieve1 +0.95%
digits-of-e1 +0.87%
lambda       +0.47%


Test suite allocations
----------------------
T783    +0.59%
T9675   +0.33%
T12707  +0.23%

The wins seem modest, but bigger than the regressions:

Nofib allocations
-----------------
anna    -0.7%
fft     -0.41%

Nofib runtimes
--------------
cryptarithm1   -4.29%
fannkuch-redux -3.75%
binary-trees   -3.7%
scs            -1.28%
integer        -1%
fasta          -0.98%
mate           -0.95%
digits-of-e2   -0.85%

As far as I'm concerned, this should probably be ready to merge. Should I do so?


In principle yes. But although +0.59% allocation in T783, say, isn't important enough to prevent using it, it'd be good to know why it happened. Maybe there's something simple going wrong that would be easily fixed?

Or maybe it's one of those things like "with the change, f becomes small enough to inline into g, so g becomes too big to inline at its call sites and that makes the difference". If so, fine. But it'd be good to know.

I find -ticky lets you nail the changes really fast.

mentioned in commit 6d14c148

Simon's caseRules re-engineering was merged in 193664d4

commit 193664d42dbceadaa1e4689dfa17ff1cf5a405a0
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Wed Mar 8 10:26:47 2017 +0000

    Re-engineer caseRules to add tagToEnum/dataToTag
    

It's not clear to me what remains to be done on this ticket.

mentioned in issue #14281 (closed)

closed

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedFixed

added Pnormal label

Trac field	Value
Version	8.0.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
