Opened 12 years ago

Last modified 14 months ago

#149 new bug (None)

missed CSE opportunity

Reported by: nobody Owned by:
Priority: normal Milestone:
Component: Compiler Version: 5.04.2
Keywords: optimisations Cc: michal.terepeta@…, hackage.haskell.org@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: T149
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description (last modified by simonmar)

Don't know if this is a bug, but it was at least
_surprising_ to find that

playerMostOccur [a] = a
playerMostOccur (x:xs)
 | numOccur x (x:xs) > numOccur (playerMostOccur xs) xs = x
 | otherwise = playerMostOccur xs

was exponentially slower when compiled with ghc-5.04.2
-O than:

playerMostOccur [a] = a
playerMostOccur (x:xs)
 | numOccur x (x:xs) > numOccur pmo xs = x
 | otherwise = pmo
 where pmo = playerMostOccur xs

Although the student responsible for the code couldn't
spot the
obvious optimisation, I was expecting that GHC's
optimiser would. :)
If it's not a bug, could you explain it to me?

-Greg(gregm.at.cs.uwa.edu.au)

Attachments (2)

CriterionBench.hs (1.2 KB) - added by rizsotto 5 years ago.
criterion benchmark for the problem
CriterionBenchFixed.hs (1.2 KB) - added by simonmar 5 years ago.
fixed bug in CriterionBench (not using whnf)

Download all attachments as: .zip

Change History (29)

comment:1 Changed 10 years ago by simonmar

Logged In: YES 
user_id=48280

Looks like a case where GHC's CSE isn't spotting the common
subexpression.  The CSE in GHC is pretty cheap & cheerful,
there are plenty of ways it could be improved, or even
replaced by a completely new one.

comment:2 Changed 10 years ago by simonmar

  • Summary changed from strange non-optimising to missed CSE opportunity

comment:3 Changed 8 years ago by igloo

  • Architecture set to Unknown
  • Description modified (diff)
  • difficulty set to Unknown
  • Keywords optimisations added
  • Milestone set to _|_
  • Operating System set to Unknown

comment:4 Changed 8 years ago by igloo

  • Milestone changed from _|_ to 6.8

This seems to work in 6.4.1, but not in the HEAD as of a few days ago, so it's probably worth at least looking to see what's going on. I've added a test (simplrun006) to the testsuite.

comment:5 Changed 8 years ago by igloo

  • Test Case set to simplrun006

comment:6 Changed 7 years ago by simonmar

  • Milestone changed from 6.8 branch to 6.8.3
  • Owner nobody deleted
  • Status changed from assigned to new

look into for 6.8.3

comment:7 Changed 7 years ago by igloo

  • Priority changed from low to high

This is a (possibly long-standing) regression.

comment:8 Changed 7 years ago by simonmar

  • Type changed from bug to run-time performance bug

comment:9 Changed 7 years ago by simonmar

  • Description modified (diff)
  • difficulty changed from Unknown to Moderate (1 day)
  • Milestone changed from 6.8.3 to 6.8 branch
  • Priority changed from high to normal
  • severity changed from normal to minor

The problem seems to be that Float Out isn't floating out the let expression in the guard, whereas presumably it used to. The let can't be floated out past a lambda, but nevertheless floating it would reveal an opportunity for CSE.

Plan: try floating out all lets, even if they can't move past a lambda, and measure the difference on nofib.

Not urgent enough for 6.8.3.

comment:10 Changed 7 years ago by igloo

  • Milestone changed from 6.8 branch to 6.10.1

comment:11 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:12 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:13 Changed 6 years ago by igloo

  • Milestone changed from 6.10.1 to 6.10.2

comment:14 Changed 6 years ago by simonpj

  • Milestone changed from 6.10.2 to _|_

GHC currently does not do full-blown CSE, partly because it's a bit harder than what is done now, and partly because doing so can introduce space leaks. There is scope for experimentation here, but we're removing it from the immediate milestone.

Simon

comment:15 Changed 5 years ago by simonmar

  • difficulty changed from Moderate (1 day) to Moderate (less than a day)

comment:16 Changed 5 years ago by simonmar

  • Type of failure set to Runtime performance bug

Changed 5 years ago by rizsotto

criterion benchmark for the problem

comment:17 Changed 5 years ago by nominolo

  • patch set to 0

I ran the criterion benchmark on OS X 10.5 (32 bits) with GHC 6.12.1 using the default optimisation level and the runtimes are identical (163ns +/- 2ns).

Changed 5 years ago by simonmar

fixed bug in CriterionBench (not using whnf)

comment:18 Changed 5 years ago by simonmar

The version of the benchmark I just attached demonstrates the problem.

comment:19 Changed 4 years ago by michalt

  • Cc michal.terepeta@… added

I've run the fixed benchhmark on GHC 7.0.2 and the differences are quite small:

warming up
estimating clock resolution...
mean is 5.608272 us (160001 iterations)
found 2835 outliers among 159999 samples (1.8%)
  2620 (1.6%) high severe
estimating cost of a clock call...
mean is 41.08527 ns (43 iterations)
found 4 outliers among 43 samples (9.3%)
  2 (4.7%) high mild
  2 (4.7%) high severe

benchmarking playerMostOccur
collecting 100 samples, 1 iterations each, in estimated 635.0040 ms
bootstrapping with 100000 resamples
mean: 3.506386 ms, lb 3.463573 ms, ub 3.627819 ms, ci 0.950
std dev: 335.3037 us, lb 107.8520 us, ub 701.8364 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers

benchmarking playerMostOccur'
collecting 100 samples, 2 iterations each, in estimated 720.7155 ms
bootstrapping with 100000 resamples
mean: 3.606333 ms, lb 3.600894 ms, ub 3.616024 ms, ci 0.950
std dev: 36.14909 us, lb 22.68614 us, ub 52.87152 us, ci 0.950
found 10 outliers among 100 samples (10.0%)
  9 (9.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers

benchmarking playerMostOccur
collecting 100 samples, 67 iterations each, in estimated 561.6000 ms
bootstrapping with 100000 resamples
mean: 84.15780 us, lb 84.03386 us, ub 84.47981 us, ci 0.950
std dev: 953.0503 ns, lb 438.4662 ns, ub 1.977162 us, ci 0.950
found 7 outliers among 100 samples (7.0%)
  3 (3.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers

benchmarking playerMostOccur'
collecting 100 samples, 70 iterations each, in estimated 566.6148 ms
bootstrapping with 100000 resamples
mean: 81.52763 us, lb 81.32001 us, ub 81.96656 us, ci 0.950
std dev: 1.476358 us, lb 803.8508 ns, ub 2.630875 us, ci 0.950
found 11 outliers among 100 samples (11.0%)
  4 (4.0%) high mild
  7 (7.0%) high severe
variance introduced by outliers: 0.995%
variance is unaffected by outliers

benchmarking playerMostOccur
collecting 100 samples, 1 iterations each, in estimated 923.5859 ms
bootstrapping with 100000 resamples
mean: 7.817296 ms, lb 7.790770 ms, ub 7.877817 ms, ci 0.950
std dev: 193.8747 us, lb 70.80833 us, ub 338.0659 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
  3 (3.0%) high severe
variance introduced by outliers: 0.997%
variance is unaffected by outliers

benchmarking playerMostOccur'
collecting 100 samples, 1 iterations each, in estimated 757.6227 ms
bootstrapping with 100000 resamples
mean: 7.514011 ms, lb 7.495620 ms, ub 7.552087 ms, ci 0.950
std dev: 130.1032 us, lb 72.61251 us, ub 236.9545 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
  4 (4.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 0.995%
variance is unaffected by outliers

benchmarking playerMostOccur
collecting 100 samples, 1 iterations each, in estimated 82.58250 s
bootstrapping with 100000 resamples
mean: 777.5068 ms, lb 776.6123 ms, ub 778.8386 ms, ci 0.950
std dev: 5.512690 ms, lb 4.044142 ms, ub 7.447603 ms, ci 0.950
found 10 outliers among 100 samples (10.0%)
  4 (4.0%) high mild
  6 (6.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers

benchmarking playerMostOccur'
collecting 100 samples, 1 iterations each, in estimated 76.42000 s
bootstrapping with 100000 resamples
mean: 748.9900 ms, lb 747.7942 ms, ub 750.6694 ms, ci 0.950
std dev: 7.211566 ms, lb 5.591098 ms, ub 9.302462 ms, ci 0.950
found 9 outliers among 100 samples (9.0%)
  3 (3.0%) high mild
  6 (6.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers

It's interesting that the times are not the same. On the other hand, it seems
that the original issue has been fixed.

comment:20 Changed 4 years ago by igloo

  • Milestone changed from _|_ to 7.2.1
  • Owner set to igloo
  • Priority changed from normal to high

Thanks for looking into it. If it's fixed, let's add a regression test and close it.

comment:21 Changed 4 years ago by simonmar

I'm fairly sure the bug in general is not fixed, but this particular instance of it might be.

comment:22 Changed 4 years ago by igloo

  • Milestone changed from 7.2.1 to _|_
  • Owner igloo deleted
  • Priority changed from high to normal
  • Test Case changed from simplrun006 to T149

Doesn't look fixed to me. I've changed the test to be less fragile.

comment:23 Changed 2 years ago by liyang

  • Cc hackage.haskell.org@… added

comment:24 Changed 2 years ago by nfrisby

As of my fix for #7796, commit hash c7d80c6524390551b64e9c1d651e1a03ed3c7617, this test is passing again. I made no changes to CSE, though. So it needs to be further robustified. It's on my todo list, but not near the top.

comment:25 Changed 15 months ago by Ian Lynagh <igloo@…>

In ee40de9386961639e9520dee372688183b26aa5e/ghc:

Test for #149 (missed CSE opportunity)

comment:26 Changed 15 months ago by Ian Lynagh <igloo@…>

In c4068e4df8b23e7f51605bd8db9754fee28b6ff0/ghc:

simplrun006 is broken: trac #149

comment:27 Changed 14 months ago by Joachim Breitner <mail@…>

In 393ea739567206d848f53e9ca75f532a49218694/ghc:

Update test cases due to call arity

Some nice improvements on already succeeding test cases (#876, #7954
and #4267)

Test #149 needed a little change, lest call arity causes a allocation
change that we do not want to test here.
Note: See TracTickets for help on using tickets.