Opened 20 months ago

Last modified 12 months ago

#11822 new bug

Pattern match checker exceeded (2000000) iterations

Reported by: j.waldmann Owned by: gkaracha
Priority: normal Milestone:
Component: Compiler Version: 8.0.1-rc3
Keywords: PatternMatchWarnings Cc: gkaracha, ivanm
Operating System: Unknown/Multiple Architecture: x86_64 (amd64)
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

graphviz seems a nice benchmark (not too large, few dependencies) that highlights compiler performance problems on "trivial" source code (cf. https://hackage.haskell.org/package/graphviz-2999.18.0.2/docs/src/Data-GraphViz-Attributes-Complete.html#sameAttribute)

Compilation of Compiling Data.GraphViz.Attributes.Colors.X11 and Data.GraphViz.Attributes.Values takes >= 1 second each, and the following happens:

cabal install graphviz-2999.18.0.2 --allow-newer

...

[17 of 31] Compiling Data.GraphViz.Attributes.Complete ( Data/GraphViz/Attributes/Complete.hs, dist/dist-sandbox-e6f2c124/build/Data/GraphViz/Attributes/Complete.o )

Data/GraphViz/Attributes/Complete.hs:1013:1: warning:
    Pattern match checker exceeded (2000000) iterations in
    an equation for ‘sameAttribute’. (Use -fmax-pmcheck-iterations=n
    to set the maximun number of iterations to n)

I do welcome that ghc emits this warning - I expect you want its occurence to be reported as a bug?

Change History (21)

comment:1 Changed 20 months ago by carter

That's a pretty large data type! Does seem like an example where checking the coverage should work fine as is.... Especially since this isn't a gadt and doesn't have pattern guards right?

comment:2 Changed 20 months ago by simonpj

Cc: gkaracha added
Keywords: PatternMatchWarnings added

comment:3 Changed 20 months ago by NeilMitchell

I've ran into this two times, both regressions since GHC rc3. The two cases are:

Is there any recommendation on how I can avoid this warning without CPP in a way that works on older versions of GHC?

In contrast to the original reporter, I do not welcome that GHC emits this warning. It seems a warning about GHC, not my code, and not something that I can do much about, other than turn off warnings, which seems a little sad.

comment:4 Changed 20 months ago by simonpj

Owner: set to gkaracha

George, this seems odd.

First, there's nothing complicated about the example, so why is it hard.

Second, why has it regressed since 8.0?

Simon

comment:5 in reply to:  4 Changed 20 months ago by thomie

Replying to simonpj:

why has it regressed since 8.0?

Herbert made the following change recently, in Phab:D2095:

-        maxPmCheckIterations    = 10000000,
+        maxPmCheckIterations    = 2000000,

comment:6 Changed 20 months ago by NeilMitchell

I observe there are 5 instances in the GHC compiler where the revised limit was insufficient. That seems quite a high false-positive rate.

comment:7 Changed 20 months ago by gkaracha

Apologies for the late (and long) response. You can find below some explanation about the behavior of the checker, which I hope to clear things out a bit.

Why does this example exceed the limit

In principle, the match of sameAttribute gives rise to 161*161 = 25,921 cases. Also, the counter takes into account the arguments too and most constructors have at least one.

Yet, the most important reason that the match iterates >2000000 times is that all unmatched cases from the first clause are checked with the second, and the resulting ones are checked with the third, etc.

data T = A | B | C

f :: T -> ()
f A = ()
f B = () -- (*)
f C = ()

That being said, for f above, the second clause will be checked against missing cases B and C. This is a pretty small example, but in general this gives rise to (something like) a quadratic number of checks. That's why the iteration counter goes over the roof.

General Comments on Performance

About the general performance of the algorithm, there are some things worth mentioning:

  1. In principle, the new checker is slower than the previous one on average. This makes sense because we check term and type constraints, that the previous checker didn't, so we have to pay something for that. Nevertheless, the main reason that it's slower on average is that the new checker checks uncovered matches eagerly, while the past checker did it lazily (whether the specific match has GADTs or not, the new checking function is in TcM also, to be able to call the type-checker).
  1. Neither I or Simon were happy about the above, but (that's why we opened #11528 but it still needs a lot of work to figure out how to change) the constraint-based approach grew really fast, unless we performed constraint solving as soon as possible, to remove useless cases (as I said above, what is generated from one clause is passed to the next one so this was devastating).
  1. The takeaway is that, in order to avoid extreme cases (like #11195 which is really difficult for the checker), we chose to have predictable behavior for all (with/without guards or GADTs) but a bit slower. The other result is that since the checker is now strict, it had to be bounded, to avoid memory-issues.

I dislike the exceeded-iterations-warning too, but I know no better solution at the moment. My original limit was 10000000, which even worked on #11195 and gave no such warning. However, as Herbert said, memory consumption can be too high, if we let it. The best solution, the way I see it would be to collect statistics (like the examples from Neil or this whole bug report) and fine-tune it. Maybe 10000000 was too high but 2000000 seems to be too low. I do not have anything better to offer at the moment, but I am open to suggestions.

comment:8 Changed 20 months ago by NeilMitchell

Perhaps make it a warning that is off by default? That solves the immediate problem and lets people who do care about the new pattern match checker opt in to issues where their patterns are too complex to be properly matched.

comment:9 Changed 20 months ago by gkaracha

But if we do this silently, we may get no warning at all for non-exhaustive matches, even with -Wall on. Don't you think that this is much worse in terms of safety?

comment:10 Changed 20 months ago by NeilMitchell

There are some warnings that are on by default, and some that are only on with {{-Wall}}. Making this a warning that was not triggered by default but was by {{-Wall}} would give you what you want.

As to my personal opinion, I want my code to compile without warnings. If you can give me a warning that I have an undesirable problem, and I can reasonably mitigate it, I'm happy. If you give me a vague sense of unease about limitations in your checker, and I can't do anything with it other than disable warnings, I'm sad. The warning above doesn't relate to whether there are inexhaustive patterns or not, it's all about whether your checker can prove that, which is a different thing.

comment:11 Changed 20 months ago by carter

I guess one question I have here is whether we can improve the approximation here to have better bounds in the non gadt case? I do like having informative coverage warnings rather than silent anything or resource blowups on my lap top (though 8.0 does claim 1tb of virtual memory either way :))

comment:12 Changed 19 months ago by ivanm

Cc: ivanm added

comment:13 Changed 19 months ago by NeilMitchell

Note that in getting to 2M iterations it took 35s (timings are from loading into ghci). That's a significant slowdown in overall compilation time.

comment:14 Changed 18 months ago by hanjoosten

For what it is worth: I ran into it compiling simple-sql-parser.

comment:15 Changed 14 months ago by niteria

Is it quadratic even for the simplest case?

I have A.gen.sh:

N=$1
echo "module A where"
echo
echo "data X = X"
for i in $(seq 1 $N); do echo "  | X$i"; done
echo
echo "instance Enum X where"
for i in $(seq 1 $N); do echo "  toEnum $i = X$i"; done
for i in $(seq 1 $N); do echo "  fromEnum X$i = $i"; done

Trying to compile for a datatype with 4000 constructors gives me:

$ ./A.gen.sh 4000 > A.hs && ./inplace/bin/ghc-stage2 -dno-debug-output A.hs
[1 of 1] Compiling A                ( A.hs, A.o )

A.hs:8006:3: warning:
    Pattern match checker exceeded (2000000) iterations in
    an equation for ‘fromEnum’. (Use -fmax-pmcheck-iterations=n
    to set the maximun number of iterations to n)

This is not a made up example, we actually have enumeration types with 10k constructors.

comment:16 Changed 14 months ago by simonpj

It would be great if someone had time to really look at this. I think gkaracha has probably now moved on; is that true George?

It's not just a question of making a one-line fix; you'd have to dig into the algorithm.

comment:17 Changed 14 months ago by gkaracha

Replying to niteria:

Is it quadratic even for the simplest case?

Yes, indeed it is, but I don't think that this is the problem here (the previous algorithm is quadratic in this case too, yet lazily). From all the problems we have seen I think that the number of iterations is not a good metric to use to kill the checker: checking a guard counts as a single iteration, yet it calls the (rather expensive) term oracle solveOneEq. Similarly, matching against a GADT constructor calls the type oracle tyOracle. All in all, not all iterations cost the same.

I will build the HEAD one of these days and start looking into it myself but can you check for me how much time is needed to compile your example for 4000 (+1) constructors (increase the -fmax-pmcheck-iterations sufficently so that the checker can run)? I have the impression that it will not take forever, it's just that the number of iterations is a bad metric and makes the checker give up even in cases where there is no need to. Could you check that for me?

Replying to simonpj:

It would be great if someone had time to really look at this. I think gkaracha has probably now moved on; is that true George?

It's not just a question of making a one-line fix; you'd have to dig into the algorithm.

Hi Simon, I will happily look into this. I have been busy researching other stuff but the performance of the checker is still a challenge I wish to address (and the recent comments in bug reports like this one show that it is of great importance for GHC users). Starting next week I will start spending time on it to see what can be done.

comment:18 Changed 14 months ago by niteria

I don't have an optimized HEAD at hand, so I tried it with ghc-8.0 branch:

$ for i in 1 10 100 1000 2000 4000 10000 20000; do echo $i; ./A.gen.sh $i > A.hs && time ghc8.0 -dno-debug-output -fmax-pmcheck-iterations=200000000 A.hs; done
1
[1 of 1] Compiling A                ( A.hs, A.o )

real    0m0.289s
user    0m0.178s
sys     0m0.032s
10
[1 of 1] Compiling A                ( A.hs, A.o )

real    0m0.228s
user    0m0.135s
sys     0m0.031s
100
[1 of 1] Compiling A                ( A.hs, A.o )

real    0m0.299s
user    0m0.225s
sys     0m0.038s
1000
[1 of 1] Compiling A                ( A.hs, A.o )

real    0m2.087s
user    0m1.941s
sys     0m0.112s
2000
[1 of 1] Compiling A                ( A.hs, A.o )

real    0m5.885s
user    0m5.126s
sys     0m0.127s
4000
[1 of 1] Compiling A                ( A.hs, A.o )

real    0m15.612s
user    0m15.094s
sys     0m0.472s
10000
[1 of 1] Compiling A                ( A.hs, A.o )

real    1m31.634s
user    1m29.615s
sys     0m1.873s
20000
[1 of 1] Compiling A                ( A.hs, A.o )

A.hs:40006:3: warning:
    Pattern match checker exceeded (200000000) iterations in
    an equation for ‘fromEnum’. (Use -fmax-pmcheck-iterations=n
    to set the maximun number of iterations to n)

I will measure again after I build HEAD.

comment:19 Changed 14 months ago by niteria

Ok, it doesn't look like the asymptotics changed between 7.10.3 and 8.0. Numbers from 7.10.3:

$ for i in 1 10 100 1000 2000 4000 10000 20000; do echo $i; ./A.gen.sh $i > A.hs && (time ghc7.10.3 -dno-debug-output A.hs) 2>&1 | grep real; done
1
real    0m0.126s
10
real    0m0.161s
100
real    0m0.207s
1000
real    0m1.428s
2000
real    0m3.875s
4000
real    0m11.783s
10000
real    1m13.631s
Last edited 14 months ago by niteria (previous) (diff)

comment:20 Changed 14 months ago by niteria

Results for HEAD (ff225b4957ded752dc017446fccb9708a1f4ec56):

1
real    0m0.201s
10
real    0m0.202s
100
real    0m0.303s
1000
real    0m1.907s
2000
real    0m4.617s
4000
real    0m13.118s
10000
real    1m1.684s

comment:21 Changed 12 months ago by simonpj

Starting next week I will start spending time on it to see what can be done.

George, how's it going?

I've just noticed that in the compiler itself PrelRules.hs doesn't currently blow the limit, but the changes in D2858 require -fmax-pmcheck-iterations=6000000.

Note: See TracTickets for help on using tickets.