Opened 3 years ago

Closed 3 years ago

#10148 closed bug (fixed)

Optimization causes repeated computation

Reported by: akio Owned by:
Priority: normal Milestone: 7.10.2
Component: Compiler Version: 7.10.1-rc1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: stranal/should_run/T10148
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The attached program, when compiled with -O0, calls array_adjust 6 times and hence prints 6 lines of trace output. However, with -O1, it prints 37 lines of trace output instead. Moreover, this number grows quadratically rather than linearly when I increase the value of n.

-fno-state-hack did not help.

I can reproduce the issue with both GHC 7.8.4 and 7.10.0.20150123.

Attachments (4)

repeated.hs (1.1 KB) - added by akio 3 years ago.
repeated1.hs (1.2 KB) - added by akio 3 years ago.
don't rely on bangs
repeated2.hs (645 bytes) - added by nomeata 3 years ago.
even smaller
simple.patch (1.1 KB) - added by akio 3 years ago.

Download all attachments as: .zip

Change History (20)

Changed 3 years ago by akio

Attachment: repeated.hs added

comment:1 Changed 3 years ago by nomeata

Fun fact: Most changes make the problem go away.

  • Changing the data Array' a = ALeaf a to a newtype Array' a = ALeaf a
  • not calling go in array_adjust
  • adding a trace to the pattern matching in adjustA
  • removing the pattern match there completely
  • removing the Bang Pattern in gstep
  • removing the first parameter from Array
  • marking mkMaschine NOINLINE

Here is a cue: Removing the ! on t in mkMachine removes *all* trace output. So it seems that (possibly as a result of minimizing the code), t and hence array_adjust is not actually used. This (function strict in an argument, but argument is unused) might make GHC be a bit too liberal in what it does with the code.

(Didn’t look at Core so far, though).

comment:2 in reply to:  1 Changed 3 years ago by akio

Thank you for looking at this!

Replying to nomeata:

Here is a cue: Removing the ! on t in mkMachine removes *all* trace output. So it seems that (possibly as a result of minimizing the code), t and hence array_adjust is not actually used. This (function strict in an argument, but argument is unused) might make GHC be a bit too liberal in what it does with the code.

Yes, this is indeed a result of minimizing the code. I'll attach a version of the test program that does not rely solely on the bangs. In this version, if I remove the bangs in mkMachine (on both m and t), the amount of trace output grows exponentially (not quadratically) when compiled with -O1.

Changed 3 years ago by akio

Attachment: repeated1.hs added

don't rely on bangs

comment:3 Changed 3 years ago by nomeata

Thanks. It still relies on the bang in gstep.

Also funny: Replacing data Array' a = ALeaf a by newtype Array' a = ALeaf a removes some repeated computation, but not all. I’m curious about what causes this :-)

comment:4 Changed 3 years ago by nomeata

It must be related to the error call in adjustA. Removing the branch, or replacing it by something terminating, even replacing it by undefined makes the bug disappear.

comment:5 Changed 3 years ago by nomeata

I tried to further minimize the code. Note that undefined i is required; if does not exhibit the bug with just undefined.

Changed 3 years ago by nomeata

Attachment: repeated2.hs added

even smaller

comment:6 Changed 3 years ago by akio

Looking at Core and STG outputs for repeated2.hs, it looks like incorrect demand information is causing the problem.

ghc 7.10.20150123 produces the following core for gstep when invoked with -O1 -fno-state-hack:

Rec {
Main.$wgstep [InlPrag=[0], Occ=LoopBreaker]
  :: GHC.Types.Int
     -> GHC.Prim.Int#
     -> (# GHC.Types.Int -> Main.Machine, GHC.Types.Int #)
[GblId, Arity=2, Str=DmdType <L,U(U)><L,U>, Unf=OtherCon []]
Main.$wgstep =
  \ (ww :: GHC.Types.Int) (ww1 [Occ=Once] :: GHC.Prim.Int#) ->
    case GHC.Prim.<# ww1 0 of sat { __DEFAULT ->
    case GHC.Prim.tagToEnum# @ GHC.Types.Bool sat of _ [Occ=Dead] {
      GHC.Types.False ->
        let {
          ipv [Occ=OnceL, Dmd=<L,A>] :: GHC.Types.Int
          [LclId, Str=DmdType]
          ipv =
            let {
              sat [Occ=Once, Dmd=<L,1*U>] :: GHC.Base.String
              [LclId, Str=DmdType]
              sat =
                let {
                  sat [Occ=Once] :: [GHC.Types.Char]
                  [LclId, Str=DmdType]
                  sat =
                    case ww of _ [Occ=Dead] { GHC.Types.I# ww3 [Occ=Once] ->
                    case GHC.Show.$wshowSignedInt 0 ww3 (GHC.Types.[] @ GHC.Types.Char)
                    of _ [Occ=Dead] { (# ww5 [Occ=Once], ww6 [Occ=Once] #) ->
                    GHC.Types.: @ GHC.Types.Char ww5 ww6
                    }
                    } } in
                GHC.CString.unpackAppendCString# "adj "# sat } in
            Debug.Trace.trace @ GHC.Types.Int sat ww } in
        let {
          sat [Occ=Once] :: GHC.Types.Int -> Main.Machine
          [LclId, Str=DmdType]
          sat =
            \ (w [Occ=Once!] :: GHC.Types.Int) ->
              case w of _ [Occ=Dead] { GHC.Types.I# ww3 [Occ=Once] ->
              case Main.$wgstep ipv ww3
              of _ [Occ=Dead] { (# ww5 [Occ=Once], ww6 [Occ=Once] #) ->
              Main.Machine ww5 ww6
              }
              } } in
        (# sat, ww #);
      GHC.Types.True -> case GHC.Err.undefined of _ [Occ=Dead] { }
    }
    }
end Rec }

Note the ipv [Occ=OnceL, Dmd=<L,A>] part. If I understand it correctly, Dmd<L,A> claims that ipv will not be used at all. However, this is wrong and ipv will be used in the recursive call to Main.$wgstep.

This demand info then causes a single-entry closure (\s) to be generated for ipv in the STG.

If I replace the undefined i with just undefined in the source file, the corresponding definition has no demand information, and its closure is marked as updatable (\u) rather than single-entry.

comment:7 Changed 3 years ago by simonpj

Very helpful diagnosis. That does look like a real bug. Thanks.

Keep studying it because I'm tied up right now; but you've got close to the problem!

comment:8 Changed 3 years ago by akio

It looks like the simplifier does not preserve correctness of demand information.

After the worker-wrapper pass, I have this fragment of code

            of m'_axp [Dmd=<L,U(U(U))>] { Array ipv_s1i9 [Dmd=<L,A>] ->
            Main.Machine (gstep m'_axp) t_axr
            }

Here, the demand info on ipv_s1i9 is correct, it is indeed unused.

However, after one iteration of simplifier pass, it becomes:

    of m'_axp [Dmd=<L,U(U(U))>] { Array ipv_s1i9 [Dmd=<L,A>] ->
    (# \ (w_s3Ix :: Int) ->
         case w_s3Ix of _ [Occ=Dead] { GHC.Types.I# ww_X3IZ ->
         case $wgstep_s3II ipv_s1i9 ww_X3IZ
         of _ [Occ=Dead] { (# ww_X3Ja, ww_X3Jc #) ->
         Main.Machine ww_X3Ja ww_X3Jc
         }
         },
       ww_s3IA #)

Now ipv_s1i9 is referenced and will be used, but its usage information has not been updated.

I've found the note [Case alternative occ info] in Simplify.hs, which explains why the occurrence info on case-bound variables should be zapped. I tried zapping the usage info as well as the occurrence info here (patch attached). It seems to compile repeated2.hs fine, but I'm not sure about a few points:

  • Is the approach right? i.e. is it a good idea to manually modify the usage info in the simplifier?
  • Probably the patch is overly conservative, in that it zaps the strictness info as well as the usage info. However, how hard should I try to preserve information? For example, should the code have a special case for when the case binder and the scrutinee are both marked unused?

Changed 3 years ago by akio

Attachment: simple.patch added

comment:9 Changed 3 years ago by simonpj

Akio, good work. But I would be reluctant to zap demand info:

  • Occurrence info is syntactic, and changes often, so needs to be zapped
  • Demand info is (supposed to be) semantic, and so should not change, except that the demand might get stronger; it is a static analysis after all.

So my diagnosis is that the demand analyser is wrong. Consider

f x@(p,_) = if p then foo x else True

foo (p,True) = True
foo (p,q)    = foo (q,p)

Here foo is just a strict function. The interesting one is f. After strictness analysis we get

f = \ (x_an1 [Dmd=<S(SL),1*U(U,1*U)>] :: (Bool, Bool)) ->
    case x_an1
    of wild_X7 [Dmd=<L,1*U(1*U,1*U)>]
    { (p_an2 [Dmd=<S,1*U>], ds_dnz [Dmd=<L,A>]) ->
    case p_an2 of _ {
      False -> GHC.Types.True;
      True -> foo wild_X7 }

Look at that "absent" annotation on ds_dnz. It's not true that ds_dnz cannot be demanded. Look what happens when we inline foo's worker in f:

f = \ (x_an1 :: (Bool, Bool)) ->
    case x_an1
    of _ [Occ=Dead, Dmd=<L,1*U(1*U,1*U)>]
    { (p_an2 [Dmd=<S,1*U>], ds_dnz [Dmd=<L,A>]) ->
    case p_an2 of _ [Occ=Dead, Dmd=<L,A>] {
      False -> GHC.Types.True;
      True -> $wfoo_soq GHC.Types.True ds_dnz   }  }

Look at that! ds_dnz has come back to life in the call to $wfoo_soq! This is utterly bogus.

I think the problem is that the demand analyser should never have labelled it "absent" in the first place.

If you want to go further, look at the product-constructor equation for Case in DmdAnal.dmdAnal'. Here we compute an aggregate scrut_dmd (correctly) and use to to analyse the scrutinee. But that same scrut_dmd should

  • Be attached to the case_bndr
  • Be distributed over the binders of the case alternative.

Can you see how to do that? Should not be hard.

Simon

Last edited 3 years ago by simonpj (previous) (diff)

comment:10 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 9f0f99fd41ff82cc223d3b682703e508efb564d2/ghc:

Fix a long-standing bug in the demand analyser

This patch fixes Trac #10148, an outright and egregious
bug in the demand analyser.

It is explained in Note [Demand on case-alternative binders]
in Demand.hs.

I did some other minor refactoring.

To my astonishment I got some big compiler perf changes

* perf/compiler/T5837: bytes allocated -76%
* perf/compiler/T5030: bytes allocated -10%
* perf/compiler/T3294: max bytes used  -25%

Happy days

comment:11 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

comment:12 Changed 3 years ago by simonpj

Milestone: 7.10.2
Test Case: stranal/should_run/T10148

Thank you akio and Joachim for your help in tracking this one down.

Austin: please merge.

Simon

comment:13 Changed 3 years ago by simonpj

PS: on Linux I did not see the big compiler perf changes which I did on Windows. Mysterious.

comment:14 Changed 3 years ago by thomie

Status: newpatch

comment:15 Changed 3 years ago by thomie

Status: patchmerge

comment:16 Changed 3 years ago by thoughtpolice

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.