Add fudge-factor for performance tests run on non-validate builds
Since I'm not going to get around to this immediately, Trac'ifying for posterity:
These tests have been doing better than expected in the nightlies for some while.
Unexpected failures:
perf/compiler T3064 [stat too good] (normal)
perf/compiler T3294 [stat too good] (normal)
perf/compiler T5642 [stat too good] (normal)
perf/haddock haddock.Cabal [stat too good] (normal)
perf/haddock haddock.base [stat too good] (normal)
Unfortunately, fixing them is not a simple matter of shifting the ranges up: the tests only exceed expectations on a /perf/ build, whereas on a normal build such as 'quick' they all pass normally.
I could bump up the upper bounds so that the builder stops bleating about them; perhaps we could do something more complicated where the expected performance depends on what level of optimization GHC was built with (but I don't know how to implement this.)
The problem with just widening the bounds to cover two different types of build is that it increases the chance that performance changes won't actually be noticed by the person responsible.
Having different bounds for different build configurations is a pain, because (a) the testsuite has to work out which set of bounds to use, and (b) you now have even more wobbly values to keep up-to-date.
I think perhaps the best thing would be to add some sort of (per-test?) fudge factor for non-validate builds. That way validate will still find performance regressions, like it does today, but other builds are less likely to give false positives. (Igloo)
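To make the fudge-factor idea concrete, here is a minimal sketch in Python (the language of the testsuite driver). It is not the driver's actual code: the function name `classify_stat`, its parameters, the default factor of 2.0, and the "stat not good enough" label are all assumptions made for illustration. It widens the allowed deviation only when the build is not a validate build.

```python
def classify_stat(actual, expected, deviation_pct, validate_build, fudge_factor=2.0):
    """Classify a measured stat against an expected value.

    Hypothetical helper, not part of the real testsuite driver.
    deviation_pct is the allowed deviation (in percent) on a validate build;
    on any other build it is multiplied by fudge_factor.
    """
    # Validate builds keep the tight bounds, so regressions are still caught.
    # Other builds (e.g. perf, quick) get a wider window to avoid false positives.
    tolerance_pct = deviation_pct if validate_build else deviation_pct * fudge_factor

    lower = expected * (1 - tolerance_pct / 100)
    upper = expected * (1 + tolerance_pct / 100)

    if actual < lower:
        return "stat too good"
    if actual > upper:
        return "stat not good enough"
    return "pass"
```

With an expected value of 100 and a 5% deviation, a measurement of 92 would be reported as "stat too good" on a validate build, but would pass on a non-validate build where the window is widened to 10%. A per-test factor could be passed in the same way, which keeps a single set of expected values to maintain rather than one per build configuration.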
Trac metadata
Trac field | Value |
---|---|
Version | 7.7 |
Type | Task |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Build System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |