Opened 3 years ago

Closed 20 months ago

#5996 closed bug (fixed)

fix for CSE

Reported by: michalt Owned by: simonpj
Priority: normal Milestone: 7.8.1
Component: Compiler Version: 7.5
Keywords: Cc: conrad@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: simplCore/should_compile/T5996
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

The current version of CSE is slightly broken -- contrary to the comments
explaining it, it does not add an additional mapping when it substitutes the RHS
of some binding. Taking the example from CSE.lhs

module Test2 where

data C a b = C a b
a = undefined
{-# NOINLINE a #-}
b = undefined
{-# NOINLINE b #-}

x1 = C a b
x2 = C x1 b
y1 = C a b
y2 = C y1 b

We want to detect that y1 is the same as x1 and so rewrite it to
y1 = x1. But at this point we also want to add a new substitution that
changes y1 to x1. So that we can get y2 = C x1 b and then
y2 = x2. This is not done by the current code:

> ghc -O2 -fforce-recomp Test2.hs -ddump-simpl -dsuppress-all -rtsopts
[1 of 1] Compiling Test2            ( Test2.hs, Test2.o )

==================== Tidy Core ====================
Result size = 40

b = undefined
a = b
x1 = \\ @ a_aaf @ b_aag -> C (a) (b)
x2 = \\ @ b_aao @ a_aap @ b1_aaq -> C (x1) (b)
y1 = x1
y2 = \\ @ b_aaG @ a_aaH @ b1_aaI -> C (x1) (b)

I wrote a patch to fix that and we get the desired result:

> ~/dev/ghc-work/inplace/bin/ghc-stage2 -O2 -fforce-recomp Test2.hs -ddump-simpl -dsuppress-all -rtsopts
[1 of 1] Compiling Test2            ( Test2.hs, Test2.o )

==================== Tidy Core ====================
  Result size = 30

b = undefined
a = b
x1 = \\ @ a_aal @ b_aam -> C (b) (b)
x2 = \\ @ b_aau @ a_aav @ b1_aaw -> C (x1) (b)
y1 = x1
y2 = x2

The fix seems quite easy but it made nofib unhappy -- see nofib1
attachment. Apparently there is quite a bit additional alloctation happening in
a few benchmarks. I managed to narrow it down to the GHC.IO.Encoding.*
modules. Adding a simple GHC_OPTIONS -fno-cse seems to improve the
performance quite a bit (and above the current HEAD!) -- see nofib2 attachment.
There seems to be a bit more code bloat, but the performance looks worth it. I
haven't yet looked into exactly what causes the excessive allocation with CSE in
the GHC.IO.Encoding.* modules (and I'm also a bit surprised by that -- I
thought the main issue with CSE would be bigger memory usage). So any
suggestions are more than welcome. ;)

All the patches are in https://github.com/michalt/ghc (branch "cse") and
https://github.com/michalt/packages-base (branch "cse").

Attachments (8)

nofib1 (154.2 KB) - added by michalt 3 years ago.
nofib: current HEAD vs HEAD with CSE fix
nofib2 (154.2 KB) - added by michalt 3 years ago.
nofib: current HEAD vs HEAD with CSE fix and -fno-cse for Encoding.* modules
0001-Decrease-allocation-limits-for-T5536.patch (1.1 KB) - added by michalt 3 years ago.
Patch for T5536
0001-Whitespace-layout-only-in-simplCore-CSE.patch (19.6 KB) - added by michalt 3 years ago.
0002-Remove-old-representation-of-CSEnv.patch (8.8 KB) - added by michalt 3 years ago.
0003-Add-a-missing-mapping-when-doing-CSE.patch (1.6 KB) - added by michalt 3 years ago.
0001-Add-fno-cse-flag-to-some-IO.Encoding.-modules.patch (2.1 KB) - added by michalt 3 years ago.
Main.hs (1.7 KB) - added by igloo 2 years ago.

Download all attachments as: .zip

Change History (20)

Changed 3 years ago by michalt

nofib: current HEAD vs HEAD with CSE fix

Changed 3 years ago by michalt

nofib: current HEAD vs HEAD with CSE fix and -fno-cse for Encoding.* modules

comment:1 Changed 3 years ago by michalt

  • Status changed from new to patch
  • Summary changed from fix CSE to fix for CSE

comment:2 Changed 3 years ago by michalt

I forgot to mention that the patches introduce one validate failure, due to
smaller than expected allocation in T5536 (i.e. "stat too good"). I'll attach a
patch that fixes the allocation limits (but only for x86_64 -- I don't have
access to 32-bit machine).

Changed 3 years ago by michalt

Patch for T5536

comment:3 Changed 3 years ago by michalt

I'll be probably doing some more work on my cse branch, so I'm attaching all the
relevant patches here.

comment:4 Changed 3 years ago by simonpj

  • difficulty set to Unknown
  • Owner set to simonpj

comment:5 Changed 3 years ago by igloo

  • Milestone set to 7.8.1

comment:6 Changed 3 years ago by conrad

  • Cc conrad@… added

comment:7 Changed 2 years ago by ian@…

commit cfe92a8f8482a3c863c3bddc4be894b09fb972ff

Author: Ian Lynagh <[email protected]>
Date:   Thu Jun 6 17:55:35 2013 +0100

    Remove old representation of CSEnv; part of #5996

 compiler/coreSyn/CoreUtils.lhs |   80 ----------------------------------------
 compiler/simplCore/CSE.lhs     |   72 +-----------------------------------
 2 files changed, 1 insertions(+), 151 deletions(-)

Changed 2 years ago by igloo

comment:8 Changed 2 years ago by igloo

I get very different nofib (mode=slow) results to you, using recent HEAD:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
[...]
           ansi          -1.1%     +9.4%    +10.1%    +10.1%     +0.0%
[...]
       compress          -0.5%     -0.0%      0.10      0.10    -33.3%
[...]
            Min          -2.2%     -1.3%     -6.6%     -7.1%    -33.3%
            Max          -0.4%     +9.4%    +10.1%    +10.1%     +1.7%
 Geometric Mean          -0.6%     +0.1%     +0.5%     +0.5%     -0.4%

I took a look at the ansi test. Main.hs is a boiled down version of it. Compiling with

ghc -O2 -rtsopts --make Main -fforce-recomp

and running with

time ./Main 1400 +RTS -t < /dev/null > out

I get

<<ghc: 15797996408 bytes, 24864 GCs, 512598/945104 avg/max bytes residency (70 samples), 3M in use, 0.00 INIT (0.00 elapsed), 1.82 MUT (1.82 elapsed), 2.08 GC (2.08 elapsed) :ghc>>
./Main 1400 +RTS -t < /dev/null > out  3.88s user 0.03s system 99% cpu 3.907 total

for HEAD, and

<<ghc: 16996446952 bytes, 27034 GCs, 536827/955856 avg/max bytes residency (75 samples), 3M in use, 0.00 INIT (0.00 elapsed), 1.95 MUT (1.95 elapsed), 2.64 GC (2.64 elapsed) :ghc>>
./Main 1400 +RTS -t < /dev/null > out  4.56s user 0.03s system 99% cpu 4.594 total

with this patch.

I believe the difference is ultimately caused by the expression

loop 0 ""

which appears in the program twice.

comment:9 Changed 20 months ago by Simon Peyton Jones <simonpj@…>

In 0001d161f7f6a6f7392eb2a3229f6204c3423450/ghc:

Fix egregious omission in CSE (Trac #5996)

This patch fixes a bad omission in CSE, thanks to 'michaelt' for spotting
it, and correctly identifying the fix (in cseRhs).  The trouble was with
   x1 = C a b
   x2 = C x1 b
   y1 = C a b
   y2 = C y1 b
we were not commoning up y2=x2, because we failed to substitute y1:=x1,
so y2's RHS looked different to x2's

I also refactoring, so taht the cs_map in a CSEnv map is
       cs_map    :: CoreMap (OutExpr, Id)
instead of
       cs_map    :: CoreMap (OutExpr, OutExpr)
Much nicer!

This doesn't make much difference to allocation, but it gives a surprisingly
big benefit to binary size.

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           ansi          -1.7%     -0.8%      0.00      0.00     +0.0%
           bspt          -1.6%     -1.5%      0.01      0.01     +0.0%
      cacheprof          -1.8%     -0.2%     +1.6%     +1.9%     +2.7%
            fft          -1.4%     -1.3%      0.06      0.06    +11.1%
            ida          -1.4%     -1.0%      0.12      0.12     +0.0%
           rfib          -1.4%     -0.1%      0.03      0.03     +0.0%
            scs          -1.6%     -0.1%     +1.5%     +1.5%     +0.0%
  spectral-norm          -1.3%     -0.1%     -0.2%     -0.2%     +0.0%
            tak          -1.4%     -0.1%      0.02      0.02     +0.0%
        veritas          -1.4%     -0.1%      0.00      0.00     +0.0%
--------------------------------------------------------------------------------
            Min          -2.5%     -1.5%    -11.8%    -11.8%     -8.0%
            Max          -1.0%     +0.0%     +2.7%     +2.5%    +11.1%
 Geometric Mean          -1.3%     -0.1%     -2.6%     -2.6%     +0.0%
Last edited 20 months ago by simonpj (previous) (diff)

comment:10 Changed 20 months ago by simonpj

Commit message properly formatted

Fix egregious omission in CSE (Trac #5996)

This patch fixes a bad omission in CSE, thanks to 'michaelt' for spotting
it, and correctly identifying the fix (in cseRhs).  The trouble was with
   x1 = C a b
   x2 = C x1 b
   y1 = C a b
   y2 = C y1 b
we were not commoning up y2=x2, because we failed to substitute y1:=x1,
so y2's RHS looked different to x2's

I also refactoring, so taht the cs_map in a CSEnv map is
       cs_map    :: CoreMap (OutExpr, Id)
instead of
       cs_map    :: CoreMap (OutExpr, OutExpr)
Much nicer!

This doesn't make much difference to allocation, but it gives a surprisingly
big benefit to binary size.

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           ansi          -1.7%     -0.8%      0.00      0.00     +0.0%
           bspt          -1.6%     -1.5%      0.01      0.01     +0.0%
      cacheprof          -1.8%     -0.2%     +1.6%     +1.9%     +2.7%
            fft          -1.4%     -1.3%      0.06      0.06    +11.1%
            ida          -1.4%     -1.0%      0.12      0.12     +0.0%
           rfib          -1.4%     -0.1%      0.03      0.03     +0.0%
            scs          -1.6%     -0.1%     +1.5%     +1.5%     +0.0%
  spectral-norm          -1.3%     -0.1%     -0.2%     -0.2%     +0.0%
            tak          -1.4%     -0.1%      0.02      0.02     +0.0%
        veritas          -1.4%     -0.1%      0.00      0.00     +0.0%
--------------------------------------------------------------------------------
            Min          -2.5%     -1.5%    -11.8%    -11.8%     -8.0%
            Max          -1.0%     +0.0%     +2.7%     +2.5%    +11.1%
 Geometric Mean          -1.3%     -0.1%     -2.6%     -2.6%     +0.0%

comment:11 Changed 20 months ago by Simon Peyton Jones <simonpj@…>

comment:12 Changed 20 months ago by simonpj

  • Resolution set to fixed
  • Status changed from patch to closed
  • Test Case set to simplCore/should_compile/T5996
Note: See TracTickets for help on using tickets.