Wildcard patterns and COMPLETE sets can lead to misleading redundant pattern-match warnings

changed weight to 5

George, this is your patch.

Replying to [ticket:13363#comment:132903 simonpj]:

George, this is your patch.

Hello, I think this is Matthew's patch. If I remember mpickering's implementation correctly, both {F, TooGoodToBeTrue} and {F,T} are complete (especially in the module where everything is visible). So both combinations are exhaustive:

F, TooGoodToBeTrue -- Using COMPLETE               => _               redundant
F, _               -- Using the Boolean signature  => TooGoodToBeTrue redundant

Hence, I think this is just non-specified semantics, but not necessarily a bug. Maybe Matthew can shed some more light on this.

I think this is a bug because the match is not "redundant" as removing it will change the semantics of the function.

Similarly if you write

f T = 1
f F = 0
f _ = 3

then removing the F case leaves a complete definition but the correct thing to do is to remove the redundant wildcard clause.

Yes, I guess you're right. This seems like the discussion we had on D2669, but I thought you actually disabled the redundancy check when COMPLETE pragmas are involved so that these discrepancies don't show up, with the commit:

Don't warn about redundancy when COMPLETE is involved

But that is not the problem.

This warning comes from the built-in constructor set. I think there needs to be some more clever augmentation to the intermediate result in the case that a result of allCompleteMatches does not contain the constructor which was used to do the lookup. For example, if ToGoodToBeTrue is not in T, F.

changed milestone to %8.4.1

changed milestone to %8.6.1

This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

changed title from Wildcarn patterns and COMPLETE sets can lead to misleading redundant pattern-match warnings to Wildcard patterns and COMPLETE sets can lead to misleading redundant pattern-match warnings

This will not be addressed in GHC 8.6.

changed milestone to %8.8.1

These will not be addressed in GHC 8.6.

mentioned in issue #15554 (closed)

Trac metadata

Trac field	Value
Related	- → #15554 (closed)

changed milestone to %8.10.1

Bumping milestones of low-priority tickets.

added PatternSynonyms label

added error messages label

It seems pattern synonyms simply aren't handled in the pattern match checker yet. The uncovered set after the first clause is just {T}. The second clause tries to match the pattern synonym constructor TooGoodToBeTrue against that in ConCon, but since T /= TooGoodToBeTrue, it will just return an empty covered set. Which is not conservative here; we should instead return something like {TooGoodToBeTrue}`.

mentioned in commit 18c9f36f

mentioned in merge request !685 (closed)

mentioned in commit da98ea80

added pattern match warnings label

As I remarked in !685 (comment 191216), I'm rather baffled by the fact that the uncovered set is {T} instead of {TooGoodToBeTrue}, especially given the existence of the {F, TooGoodToBeTrue} COMPLETE set. Why does the uncovered set become {T} in the first place?

I'm rather baffled by the fact that the uncovered set is {T} instead of {TooGoodToBeTrue}

It is at least the first time we arrive at ConCon, as evident by the -ddump-ec-trace output.

Now that I understand how the pattern match checker interacts with COMPLETE better (at least I hope so), I see that it also tries to split in the ConVar case according to the COMPLETE set. It even arrives at the assumption that catchAll _ is completely overlapped, it probably just doesn't select it as its 'best' solution.

The warnings that we're seeing would make more sense if the coverage checker happened to choose {F, T} (i.e., the original set of data constructors) as its set of conlikes to use. So one question is: should GHC be choosing {F, T} or {F, TooGoodToBeTrue}? Here is the users' guide algorithm by which GHC ranks sets of conlikes:

Fewest uncovered clauses
Fewest redundant clauses
Fewest inaccessible clauses
Whether the match comes from the original set of data constructors or from a COMPLETE pragma (prioritizing the former over the latter)

In this function:

catchAll :: Boolean -> Int
catchAll F               = 0
catchAll TooGoodToBeTrue = 1
catchAll _               = error "impossible"

Neither set leads to any uncovered clauses if they are chosen.
Choosing {F, TooGoodToBeTrue} leads to a single redundant clause (catchAll _ = ...). If we choose {F, T}, however, then I'm not sure what happens—would catchAll TooGoodToBeTrue = ... be considered redundant, or inaccessible? If it's the latter, then {F, T} is the clear winner. If it's the former, then we continue on to step (4), which again favors {F, T}.

To me, favoring {F, T} feels a little strange, since {F, TooGoodToBeTrue} intuitively "feels" like a better fit than {F, T}. Perhaps I feel this way because we explicitly match on TooGoodToBeTrue, a pattern synonym which seems like it ought to bias the algorithm in favor of the COMPLETE set. I'm not quite sure yet how to change the algorithm to reflect this intuition, however.

So, the problem is that in the end, the pattern match checker has to decide between these 2 PmResults:

go
  PmResult
    FromComplete
    [[_]]
    UnPat []
    []
  PmResult
    FromBuiltin
    [[TooGoodToBeTrue]]
    UnPat []
    []

And I think that the latter qualifies as the 'least' result, because _ is more informative than TooGoodToBeTrue, so we warn about TooGoodToBeTrue.

But as we realised, this is wrong, because getResult doesn't consider the order in which patterns occur.

I'm not sure how to parse the -ddump-ec-trace output in #13363 (comment 191229). What is the significance of [[_]] and [[TooGoodToBeTrue]] in each PmResult?

[[_]] stands for the wildcard clause catchAll _ = error "impossible", whereas [[TooGoodToBeTrue]] is a constructor application (from catchAll TooGoodToBeTrue = 1). The latter is strictly more specific than the former, so we select the latter.

What I mean is: what is the significance on the second entry in a PmResult? Is it the list of redundant clauses? Inaccessible clauses? Something else?

It is the list of redundant patterns:

instance Outputable PmResult where
  ppr pmr = hang (text "PmResult") 2 $ vcat
    [ ppr (pmresultProvenance pmr)
    , ppr (pmresultRedundant pmr)
    , ppr (pmresultUncovered pmr)
    , ppr (pmresultInaccessible pmr)
    ]

OK. I don't believe that GHC has any notion of the "informativeness" of a redundant clause. If we assume that GHC adheres to the specification in #13363 (comment 191228), then since both sets would lead to the same number of redundant clauses, we fall through to step (4), which favors {F, T} over {F, TooGoodToBeTrue}.

That makes sense. Still, it's the wrong thing to do, because getResult has no notion of order, whereas the Covered property (i.e., what determined if a clause is redundant) depends on the specific uncovered set at the point it is defined.

The decision to commit to one (or multiple) specific COMPLETE set must be made at the point we see TooGoodToBeTrue, by filtering out every COMPLETE candidate set that doesn't mention TooGoodToBeTrue, for example. But that feels broken for uncovered clauses.

Actually, it's probably a little simpler than that: We commit to a COMPLETE set as soonas the first is 'completed', wrt. the number of clauses we processed.

Edit: I think that's still unsound. Consider

data T = A | B | C
  deriving Eq

pattern BC :: T ...

{-# COMPLETE A BC #-}

f A  = ...
f B  = ...
f BC = ...

This would finally commit to BC and announce that the B clause was redundant.

Frankly, I see no way to support this soundly. Just announcing COMPLETE sets is not enough information to relate to other COMPLETE sets.

Frankly, I see no way to support this soundly.

This is the conclusion that I've come to after pondering these sorts of sticky situations for a while. I do wish that there were an alternative to COMPLETE sets that you could use to indicate that a pattern synonym should be treated "transparently" by the coverage checker (i.e., deliberately mark a pattern synonym as non-abstract). Something like this:

pattern TooGoodToBeTrue = True
{-# TRANSPARENT TooGoodToBeTrue #-}

That way, catchAll TooGoodToBeTrue = ... would be coverage-checked exactly as it someone had written catchAll True = ..., and you'd get the warnings you'd expect as a result.

Maybe we should deactivate redundancy warnings as soon as there is a pattern synonym involved? I think the status quo works quite well for uncovered sets, but is unsound for covered (and divergent) sets.

That might be one approach. Of course, any design that we pick is inevitably going to shift around various warnings, so it would probably be good to draft up the behavior you desire into a GHC proposal to ensure that there is consensus around that design.

That way, catchAll TooGoodToBeTrue = ... would be coverage-checked exactly as it someone had written catchAll True = ..., and you'd get the warnings you'd expect as a result.

That even seems like something we could immediately desugar (do we want this at this point in the pipeline, though?).

it would probably be good to draft up the behavior you desire into a GHC proposal

I'll see what I can do. Probably not today :)

Reading the wiki page, this is interesting

COMPLETE pragmas only specify that a certain match is covering, because of this, we do not warn about redundant or inaccessible matches when the pattern match checking result arises from checking against a COMPLETE pragma.

So, we already don't warn about inaccessibility and redundancy if we pick a COMPLETE set. It's just that we deliberately choose to favor the original data definition's set (which makes sense).

Hmph. Perhaps I'm just naïvely expecting too much, but I would be rather dissatisfied with coverage-checking that doesn't even attempt to warn about redundancy. But then again, I'm already dissatisfied with COMPLETE signatures in general, so perhaps this is a sign that I need to adjust my expectations.

I spent some time thinking about this today.

I think the culprit is that already in the ConVar split, we commit to a particular COMPLETE set and just try to enumerate them all, giving preference to the "best match" afterwards. In reality, there's no single best match and the decision whether a clause is redundant depends on the order in which the clauses appear; a subtlety the current approach completely fails to address.

How about instead we take inspiration from PmNLit and introduce an PmNCon :: [ConLike] -> PmPat 'VA variant in PmPat? Then, instead of splitting a flexible variable on all possible alternatives of a particular COMPLETE set, we just split into the paricular PmCon we see and its corresponding complement PmNCon. Example:

catchAll :: Boolean -> Int
catchAll F               = 0
catchAll TooGoodToBeTrue = 1
catchAll _               = error "impossible"

We begin with a value vector abstraction containing only a flexible variable x. We match it against F, ConVar splits in PmCon F and PmNCon [F].
1. PmCon F is covered by F.
2. PmNCon [F] has no overlap with F
PmNCon [F] is matched against TooGoodToBeTrue. Split in ConNCon case to PmCon TooGoodToBeTrue and PmNCon [F,TooGoodToBeTrue]
1. PmCon TooGoodToBeTrue is covered by TooGoodToBeTrue, and non-redundant at this point!
2. PmNCon [F,TooGoodToBeTrue] is actually vacuous; there's no inhabitant due to the COMPLETE set. Everything's covered. Optionally detect this on construction already.
Since the incoming uncovered set is empty, the wildcard match is detected as redundant.

Success! :) A nice side-effect of this is that we don't generate huge value set abstractions for large enums when we don't check every case anyway, similar to the situation with literals and NLit.

Sounds plausible to me. And has the possible advantage that, in (say) a bit enumeration, when we pattern match on constructor K1, but instead of generating K2, K3, K4, ... Kn, you'll genererate a single "not K1". Maybe that's good for perf!

mentioned in merge request !963 (closed)

I implemented the idea in !963 (closed). I have only tested it for the reproductions from this thread so far, but the results are encouraging; It generates the expected warning for the example from the OP, for example.

It also caught an incomplete match in CmmStackLayout:450 (a match on a CmmNode O C, a GADT, lacking a case for CmmCall). It's not a false positive from what I can tell, so hooray!

As I let it compile, it catches quite a few more alleged incomplete patterns. Have to investigate tomorrow.

It also caught an incomplete match in CmmStackLayout:450 (a match on a CmmNode O C, a GADT, lacking a case for CmmCall).

Can you provide a link to the specific block of code you're talking about here? This change sounds a bit suspicious, but without context I can't gauge for sure if this change is really for the better.

I was talking about this case expression (CmmLayoutStack, not CmmStackLayout). And as I go over it again now, the CmmCall case is clearly exhaustive. So, not quite so conservative, after all, but I think that's just a bug somewhere in my logic.

TLDR; My approach needs !994 (closed). I can also reproduce the below for the classic

foo | True <- x  = 42
    | False <- x = 43

My approach seems to have problems with CoPats/guards. In particular, for the following module

{-# LANGUAGE DataKinds, TypeFamilies, TypeInType, GADTs #-}

module Lib where

data family Sing (a :: k)

data instance Sing (a :: Bool) where
  STrue  :: Sing 'True
  SFalse :: Sing 'False

fromSing :: Sing (a :: Bool) -> Bool
fromSing STrue  = True
fromSing SFalse = False

It will emit a wrong warning

test.hs:12:1: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In an equation for `fromSing': Patterns not matched: _
   |
12 | fromSing STrue  = True
   | ^^^^^^^^^^^^^^^^^^^^^^...

This is due to CoPats like the occurrence of STrue on the LHS (it's more like STrue |> co, where co = ax :: Sing{Bool} ~ SingBool) Bool) are translated as (x :: Sing{Bool} a) (STrue <- x |> co).

In the desugaring PmGrd case of pmcheck, x |> co is bound to an auxiliary binding y, which the term oracle immediately solves as y |-> x, ignoring the coercion. Then, the value abstraction variable y is matched against the pattern STrue.

Long story short, the new PmNCon logic leaves a PmNCon y [STrue] behind and will record an inequality False ~ (y = STrue), simplified to False ~ (x = STrue) in the term oracle. The same happens for the SFalse case, so we know that y can't be STrue nor SFalse. Yet, the term oracle will not refute y ~ STrue when asked: With that fact, it will happily simplify False ~ (y = STrue) to False ~ (STrue = STrue) but doesn't reduce any further.

The reason STrue = STrue doesn't reduce is this line: No arguments to either constructor were simplified, so we abort with the worst_case when in fact we can just return truePmExpr.

mentioned in merge request !994 (closed)

Trac field	Value
Version	8.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC	mpickering
Operating system
Architecture

Wildcard patterns and COMPLETE sets can lead to misleading redundant pattern-match warnings

Child items 0

Activity

Wildcard patterns and COMPLETE sets can lead to misleading redundant pattern-match warnings

Relates to

Activity