Opened 7 years ago

Last modified 5 months ago

#2132 new bug

Optimise nested comparisons

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.8.2
Keywords: Cc: v.dijk.bas@…, hackage.haskell.org@…, carter.schonwald@…, jan.stolarek@…, tom.schrijvers@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

GHC isn't capable of this optimisation:

  case (x ># y) of             ==>    case (x ># y) of
    True -> ...(x ==# y)...               True -> ...False...
    False -> ...                          False -> ...

That is, knowing that (x>y) we know that the two are not equal.

Also, consider this:

  case (x ># y) of               ==>   case (x >=# y) of
     True -> e                           True -> e
     False -> case (x ==# y) of          False -> e'
                True -> e
                False -> e'

Again this needs special knowlege about comparison operators. However, it does arise. Consider this:

  data T = MkT Int deriving( Eq, Ord )

The derived (>) operation looks like this:

Foo.$dm> =
  \ (eta_a8q :: Foo.T) (eta1_a8r :: Foo.T) ->
    case eta_a8q of wild_B1 { Foo.MkT a1_a60 ->
    case eta1_a8r of wild1_XO { Foo.MkT b1_a62 ->
    case a1_a60 of wild2_a9I { GHC.Base.I# x#_a9K ->
    case b1_a62 of wild11_a9M { GHC.Base.I# y#_a9O ->
    case GHC.Prim.<# x#_a9K y#_a9O of wild3_a9W {
      GHC.Base.False ->
        case GHC.Prim.==# x#_a9K y#_a9O of wild12_a9Z {
          GHC.Base.False -> GHC.Base.True; GHC.Base.True -> GHC.Base.False
        };
      GHC.Base.True -> GHC.Base.False

See also #2130

Attachments (3)

0001-Optimize-nested-comparisons-ticket-2132.patch (33.1 KB) - added by michalt 3 years ago.
Initial implementation.
0001-Optimize-nested-comparisons-ticket-2132.2.patch (33.8 KB) - added by michalt 3 years ago.
Fixed patch.
nofib (154.2 KB) - added by michalt 3 years ago.
Nofib results.

Download all attachments as: .zip

Change History (39)

comment:1 Changed 7 years ago by igloo

  • Milestone set to 6.8.3

comment:2 Changed 7 years ago by igloo

  • Milestone changed from 6.8.3 to 6.10.1

comment:3 Changed 7 years ago by igloo

  • Milestone changed from 6.10.1 to 6.10 branch

comment:4 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:5 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:6 Changed 6 years ago by igloo

  • Milestone changed from 6.10 branch to 6.12 branch

comment:7 Changed 5 years ago by simonmar

  • Type of failure set to Runtime performance bug

comment:8 Changed 5 years ago by igloo

  • Milestone changed from 6.12 branch to 6.12.3

comment:9 Changed 5 years ago by igloo

  • Milestone changed from 6.12.3 to 6.14.1
  • Priority changed from normal to low

comment:10 Changed 4 years ago by igloo

  • Milestone changed from 7.0.1 to 7.0.2

comment:11 Changed 4 years ago by igloo

  • Milestone changed from 7.0.2 to 7.2.1

comment:12 Changed 3 years ago by igloo

  • Milestone changed from 7.2.1 to 7.4.1

Changed 3 years ago by michalt

Initial implementation.

comment:13 Changed 3 years ago by michalt

  • Status changed from new to patch

Attached is my initial implementation of a core-to-core pass that tries to
optimize away unnecessary comparisons. Right now it is able to handle code like:

  case x < y of
    True -> .. case y <= z of
      True -> .. x == z ..  -- always false

or

  case x < 10 of
    True -> .. case y > 20 of
      True -> .. x == y ..  -- always false
              .. x > 15 ..  -- always false

It basically stores the relations between variables and their intervals and uses
both to determine if a result of some comparison is known. So far it does not
handle complex expressions (like anything that includes arithmetic), so:

  case x + 10 < y - 3 of
    True -> .. x < y ..  -- always false but not handled

It also doesn't handle the second example in the ticket description.

The patch validates (i.e. I've compiled stage2 and libraries with the new
optimization and it validates). I've also run nofib but I haven't noticed
anything that really stands out (it does find some more than 60 unnecessary
comparisons in nofib and in case of GHC/libraries more than 100).

So what do you think about it? Let me know what should be fixed/changed etc.
Also I didn't write any tests yet, since I'm not sure how to actually test it
--- would it be enough to simply write a few cases where the optimization should
fire and some where it shouldn't and then just make sure that the result is
as expected?

comment:14 Changed 3 years ago by simonmar

Wow, there's quite a lot of code here. Simon: we need to review and decide what to do. Somehow this optimisation feels similar to what the simplifier already does with case expressions (eliminating redundant cases, and branches that can't match), and perhaps it ought to be incorporated into the simplifier.

comment:15 Changed 3 years ago by tibbe

It's also similar to what a forward substitution pass at a lower level might do.

comment:16 Changed 3 years ago by igloo

  • Owner set to simonpj

comment:17 follow-up: Changed 3 years ago by basvandijk

  • Cc v.dijk.bas@… added

Would it be hard to extend the language of RULES so that a user can perform this optimization instead of the compiler? I'm thinking about something like this:

{-# RULES
forall x y t f.
  (case (x ># y) of
     True  -> t
     False -> f
  ) 
  =
  (case (x ># y) of
     True  -> t {{(x ==# y)=False}}
     False -> f
  )
  #-}

The v {{subrules}} syntax assigns sub-RULES to the variable v. subrules is a list of RULES of the form p1 = e1, p2 = e2 ... pn = en which are applied to the expression that v refers to.

I think it would be nice if sub-RULES have the same (or a very similar) syntax that normal RULES have. They should even be able to quantify new variables.

The way I expect this to be used most often is that the sub-RULES describe the actual substitutions while the top-level RULE only describes context. However it should be possible that both the top-level and the sub-rules describe transformations.

This probably deserves its own ticket so I stop now.

comment:18 in reply to: ↑ 17 ; follow-up: Changed 3 years ago by michalt

Replying to basvandijk:

Would it be hard to extend the language of RULES so that a user can perform
this optimization instead of the compiler? ..

Note that this optimization goes a bit further than your example. For
instance, once it knows x ># y and y >=# z it can optimize away x <# z.
Similarly knowing x ># 100 and y <# 10 it can get rid of x ==# y.
That would be probably quite difficult to do with rules (even extended).

comment:19 in reply to: ↑ 18 Changed 3 years ago by basvandijk

Replying to michalt:

Note that this optimization goes a bit further than your example. For
instance, once it knows x ># y and y >=# z it can optimize away x <# z.

This is possible with sub-rules:

{-# RULES
forall x y t f.
  (case (x ># y) of
     True  -> t
     False -> f
  ) 
  =
  (case (x ># y) of
     True  -> t {{ forall z t2 f2. 
                   (case (y >=# z) of
                     True  -> t2
                     False -> f2
                   )
                   =
                   (case (y >=# z) of
                     True  -> t2 {{(x <# z)=False}}
                     False -> f2
                   )
                }}
     False -> f
  )
  #-}

Similarly knowing x ># 100 and y <# 10 it can get rid of x ==# y.
That would be probably quite difficult to do with rules (even extended).

Encoding it for the concrete 100 and 10 is easy. However encoding it for all numbers is not possible without having the ability to reason about numbers in the language of RULES.

I'm sure your patch is way more powerful since I expect it will have this ability. I do wonder if sub-rules have other useful applications?

comment:20 follow-up: Changed 3 years ago by simonmar

Another thought: this can be a plugin, and shipped via Hackage, right?

comment:21 in reply to: ↑ 20 Changed 3 years ago by michalt

Replying to simonmar:

Another thought: this can be a plugin, and shipped via Hackage, right?

(I'm assuming you're referring to the optimization, not the idea of sub-rules)

Yes it could be made a plugin. Although I'm not sure it would be used
by anyone then (personally I use Hackage only when I need some library,
etc. and probably wouldn't have thought to search for compiler
optimization, unless it is something really huge).

comment:22 Changed 3 years ago by michalt

  • Owner simonpj deleted
  • Status changed from patch to new

The patch doesn't take into account variable shadowing --- I'll try to fix that
once I have some more time. I've never noticed any problems probably because
simplifier removes (most of) it and the optimization fires only in pretty
special cases (nested comparisons, no arithmetic)..

comment:23 Changed 3 years ago by simonpj

Also it'd be good to do a nofib run to see what performance boost you get

comment:24 Changed 3 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1
  • Priority changed from low to lowest

Changed 3 years ago by michalt

Fixed patch.

Changed 3 years ago by michalt

Nofib results.

comment:25 Changed 3 years ago by michalt

  • Status changed from new to patch

I think I've fixed the problem with shadowing (I've basically followed the
approach used in CSE). Everything validates.

However, the nofib results are not very encouraging (I've attached the whole
output of nofib-analyse):

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
            Min          -0.1%     -0.0%     -3.5%     -3.8%     -4.8%
            Max          +0.4%     +0.0%     +2.0%     +2.0%     +0.0%
 Geometric Mean          -0.0%     -0.0%     -1.0%     -1.0%     -0.1%

(the results are: clean GHC and ordinary nofib run vs GHC with the patch and
nofib run with EXTRA_HC_OPTS=-fcomparisons).
I've used the debugging output to see how often the optimization is actually
removing some comparison: in case of GHC compile it managed to remove around
1800 comparisons, in nofib only around 40. So I'm not sure this is worth the
whole additional optimization. It seems that either the patch is not going far
enough or there simply isn't that many opportunities for optimization...? If it's
the former, then the main idea would probably be to extend the pass to be able
to deal with arithmetic [1]. If it's the latter then maybe we should just close
it as wontfix? What do you think?

[1] Handling, for example things like:

  case x <# y + 5 of
    case x <# y + 4 of
      ...

but this will be quite a bit more complex as we need to take into account
integer overflows, etc.

comment:26 follow-up: Changed 3 years ago by simonmar

C/C++ compilers use abstract interpretation to do this, e.g. http://en.wikipedia.org/wiki/Sparse_conditional_constant_propagation. This could be expressed very nicely as a Hoopl optimisation, which would deal nicely with loops too. However, at the Hoopl level we would only be able to optimise within a single function or closure, whereas doing this at the Core level has far greater reach.

I don't have any specific comments about this particular implementation - there's a lot of code and I haven't fully reviewed it. I still think the best way forward is for this to be a plugin; I think all the infrastructure is in place to make this work.

comment:27 in reply to: ↑ 26 Changed 3 years ago by michalt

Replying to simonmar:

C/C++ compilers use abstract interpretation to do this, e.g.
http://en.wikipedia.org/wiki/Sparse_conditional_constant_propagation. This
could be expressed very nicely as a Hoopl optimisation, which would deal
nicely with loops too. However, at the Hoopl level we would only be able to
optimise within a single function or closure, whereas doing this at the Core
level has far greater reach.

Right, using abstract interpretation would definitely find more optimization
opportunities. I might look into that if I have some more free time.

Also, I think that doing it on a level of C-- or in a C/C++ compiler would
require some relational abstract domain (e.g., octagons), right? AFAIK SCCP
would only be able to optimize conditions if the values are actually constant.
Are C/C++ compilers actually doing any relational analysis (apart from things
like Graphite in GCC or Polly in LLVM)?

I don't have any specific comments about this particular implementation -
there's a lot of code and I haven't fully reviewed it. I still think the best
way forward is for this to be a plugin; I think all the infrastructure is in
place to make this work.

Now that I think about, you're probably right -- I'll refactor it into a GHC
plugin and upload to github and hackage.

Not sure if you want to keep the ticket open...

comment:28 Changed 3 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:29 Changed 2 years ago by igloo

  • Resolution set to wontfix
  • Status changed from patch to closed

If the idea is to implement it as a plugin separate from GHC, then I think we should close the ticket.

Thanks for all your work on this though, michalt.

comment:30 Changed 2 years ago by michalt

Yes, that makes sense. Also the plugin is on github if anyone is interested:
https://github.com/michalt/ghc-comparisons-plugin

comment:31 Changed 2 years ago by simonpj

  • Resolution wontfix deleted
  • Status changed from closed to new

I'm re-opening this, since it came up again in #7378. I wonder if a very simple pass, enabled with -O2, might be worth while. No overflow magic, just simple comparisons.

Simon

comment:32 Changed 2 years ago by simonpj

  • Milestone changed from 7.6.2 to _|_
  • Priority changed from lowest to normal

comment:33 Changed 2 years ago by liyang

  • Cc hackage.haskell.org@… added

comment:34 Changed 2 years ago by carter

  • Cc carter.schonwald@… added

comment:35 Changed 21 months ago by jstolarek

  • Cc jan.stolarek@… added

comment:36 Changed 5 months ago by TomSchrijvers

  • Cc tom.schrijvers@… added
Note: See TracTickets for help on using tickets.