Opened 9 months ago

Last modified 2 months ago

#14899 new bug

Significant compilation time regression between 8.4 and HEAD due to coverage checking

Reported by: RyanGlScott Owned by:
Priority: highest Milestone: 8.8.1
Component: Compiler Version: 8.5
Keywords: PatternMatchWarnings, newcomer Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider the following program:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
module Bug where

data family Sing (z :: k)

class SEq k where
  (%==) :: forall (a :: k) (b :: k). Sing a -> Sing b -> ()
  infix 4 %==

data Foo a b c d
  = A a b c d |
    B a b c d |
    C a b c d |
    D a b c d |
    E a b c d |
    F a b c d

data instance Sing (z_awDE :: Foo a b c d) where
    SA :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('A a b c d)
    SB :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('B a b c d)
    SC :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('C a b c d)
    SD :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('D a b c d)
    SE :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('E a b c d)
    SF :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('F a b c d)

$([d| instance (SEq a, SEq b, SEq c, SEq d) => SEq (Foo a b c d) where
        (%==) (SA _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SF _ _ _ _) = () |])

It takes significantly longer to compile this program on 8.4 and HEAD:

$ /opt/ghc/8.4.1/bin/ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.4.1
$ time /opt/ghc/8.4.1/bin/ghc Bug.hs -fforce-recomp
[1 of 1] Compiling Bug              ( Bug.hs, Bug.o )

real    0m0.285s
user    0m0.236s
sys     0m0.036s
$ /opt/ghc/head/bin/ghc --version                      
The Glorious Glasgow Haskell Compilation System, version 8.5.20180306
$ time /opt/ghc/head/bin/ghc Bug.hs -fforce-recomp     
[1 of 1] Compiling Bug              ( Bug.hs, Bug.o )

real    0m29.684s
user    0m29.656s
sys     0m0.060s

The reason for this regression is somewhat incidental—it's due to commit ffb2738f86c4e4c3f0eaacf0a95d7326fdd2e383 (Fix #14838 by marking TH-spliced code as FromSource). Before that commit, we were supressing pattern-match coverage checking entirely on TH-quoted code. We no longer do this, which means that we coverage-check the TH-quoted instance in that program, which appears to be why it takes so long to compile.

This is a serious issue in practice because a good chunk of singletons-generated code is of this form, which means that a good amount of code is effectively uncompilable on GHC HEAD now. (See, for instance, this Travis failure on GHC HEAD.)

Change History (9)

comment:1 Changed 9 months ago by RyanGlScott

Curiously, data family instances seem to play a role in this. If I replace the data family formulation of Sing with a normal datatype:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
module Bug where

class SEq k where
  (%==) :: forall (a :: k) (b :: k). Sing a -> Sing b -> ()
  infix 4 %==

data Foo a b c d
  = A a b c d |
    B a b c d |
    C a b c d |
    D a b c d |
    E a b c d |
    F a b c d

data Sing (z_awDE :: k) where
    SA :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('A a b c d)
    SB :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('B a b c d)
    SC :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('C a b c d)
    SD :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('D a b c d)
    SE :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('E a b c d)
    SF :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('F a b c d)

$([d| instance (SEq a, SEq b, SEq c, SEq d) => SEq (Foo a b c d) where
        (%==) (SA _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SA _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SB _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SC _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SD _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SE _ _ _ _) (SF _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SA _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SB _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SC _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SD _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SE _ _ _ _) = ()
        (%==) (SF _ _ _ _) (SF _ _ _ _) = () |])

Then the compilation time for GHC HEAD goes back to being the same as in 8.4.1.

comment:2 Changed 9 months ago by RyanGlScott

This is bad. Really bad.

I looked briefly into the GHC source, and found this ominous Note:

Note [Translate CoPats]
~~~~~~~~~~~~~~~~~~~~~~~
The pattern match checker did not know how to handle coerced patterns `CoPat`
efficiently, which gave rise to #11276. The original approach translated
`CoPat`s:

    pat |> co    ===>    x (pat <- (e |> co))

Instead, we now check whether the coercion is a hole or if it is just refl, in
which case we can drop it. Unfortunately, data families generate useful
coercions so guards are still generated in these cases and checking data
families is not really efficient.

If that is to be believed, then is coverage-checking data family instances really doomed to be slow?

comment:3 Changed 9 months ago by RyanGlScott

To make the data type version of the program as slow to compile as the data family instance version, you can use explicit guards:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
module Bug where

class SEq k where
  (%==) :: forall (a :: k) (b :: k). Sing a -> Sing b -> ()
  infix 4 %==

data Foo a b c d
  = A a b c d |
    B a b c d |
    C a b c d |
    D a b c d |
    E a b c d |
    F a b c d

data Sing (z_awDE :: k) where
    SA :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('A a b c d)
    SB :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('B a b c d)
    SC :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('C a b c d)
    SD :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('D a b c d)
    SE :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('E a b c d)
    SF :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('F a b c d)

instance (SEq a, SEq b, SEq c, SEq d) => SEq (Foo a b c d) where
  (%==) x y
    | SA {} <- x
    , SA {} <- y
    = ()

    | SA {} <- x
    , SB {} <- y
    = ()

    | SA {} <- x
    , SC {} <- y
    = ()

    | SA {} <- x
    , SD {} <- y
    = ()

    | SA {} <- x
    , SE {} <- y
    = ()

    | SA {} <- x
    , SF {} <- y
    = ()

    | SB {} <- x
    , SA {} <- y
    = ()

    | SB {} <- x
    , SB {} <- y
    = ()

    | SB {} <- x
    , SC {} <- y
    = ()

    | SB {} <- x
    , SD {} <- y
    = ()

    | SB {} <- x
    , SE {} <- y
    = ()

    | SB {} <- x
    , SF {} <- y
    = ()

    | SC {} <- x
    , SA {} <- y
    = ()

    | SC {} <- x
    , SB {} <- y
    = ()

    | SC {} <- x
    , SC {} <- y
    = ()

    | SC {} <- x
    , SD {} <- y
    = ()

    | SC {} <- x
    , SE {} <- y
    = ()

    | SC {} <- x
    , SF {} <- y
    = ()

    | SD {} <- x
    , SA {} <- y
    = ()

    | SD {} <- x
    , SB {} <- y
    = ()

    | SD {} <- x
    , SC {} <- y
    = ()

    | SD {} <- x
    , SD {} <- y
    = ()

    | SD {} <- x
    , SE {} <- y
    = ()

    | SD {} <- x
    , SF {} <- y
    = ()

    | SE {} <- x
    , SA {} <- y
    = ()

    | SE {} <- x
    , SB {} <- y
    = ()

    | SE {} <- x
    , SC {} <- y
    = ()

    | SE {} <- x
    , SD {} <- y
    = ()

    | SE {} <- x
    , SE {} <- y
    = ()

    | SE {} <- x
    , SF {} <- y
    = ()

    | SF {} <- x
    , SA {} <- y
    = ()

    | SF {} <- x
    , SB {} <- y
    = ()

    | SF {} <- x
    , SC {} <- y
    = ()

    | SF {} <- x
    , SD {} <- y
    = ()

    | SF {} <- x
    , SE {} <- y
    = ()

    | SF {} <- x
    , SF {} <- y
    = ()

comment:4 Changed 9 months ago by RyanGlScott

(Warning: half-baked thoughts follow.)

For reasons that I don't fully understand, it seems that coverage-checking guards is not nearly as efficient as coverage-checking raw constructor patterns. In that case, a question arises: why are we desugaring coercion patterns (like what data family constructors give you) to guards? It seems that we could handle coercion patterns in a fairly natural way by extending the algorithm from GADTs Meet Their Match slightly.

First, we could extend the pattern syntax (figure 2 from the GADTs Meet Their Match paper) to include coercion patterns directly:

p,x ::= x | K p_1 ... p_n | G | (p |> co)

Where co : τ_1 ~ τ_2 is a coercion.

Then, we could extend the coverage checking algorithm in figure 3 to include a case for coercion patterns. For instance:

C ((p |> co) q_1 ... q_n) (Γ ⊢ u_0 u_1 ... u_n ⊳ Δ)
  = C (p q_1 ... q_n) (Γ, y : τ_1 ⊢ y u_1 ... u_n ⊳ Δ')
  where
    Γ ⊢ p   : τ_1
    Γ ⊢ u_0 : τ_2
    Γ ⊢ co  : τ_1 ~ τ_2
    y # Γ
    Δ' = Δ ∪ y ≈ (u_0 |> sym co)

And similarly for U and D. That way, we wouldn't need guards at all here—we'd just have an extra case for coercion patterns that "pushes through" the types as necessary.

Does this sound plausible?

Last edited 6 months ago by bgamari (previous) (diff)

comment:5 Changed 7 months ago by simonpj

Does this sound plausible?

Yes, something like that looks plausible, except that I think the last line should be more like

    Δ' = Δ ∪ u_0 ≈ (y |> co)

(patterns p don't appear in expressions).

comment:6 Changed 6 months ago by bgamari

It seems to me like comment:4 (which I updated to fix a few mistakes) is only part of the story. In particular, the constraint Δ' proposed in that comment is not currently admitted under the constraint syntax given in the paper. In particular, the right hand size of a term equality (e.g. the production) is expected to be an expression. However, there is no way to lift (u_0 |> sym co) into an expression. More generally, it doesn't make sense to me to lift a value abstraction like u_0 into an expression. Perhaps a new constraint variety is needed?

comment:7 Changed 4 months ago by bgamari

Milestone: 8.6.18.8.1

Unfortunately the summary here suggests that the status quo isn't acceptable for release but the future direction is rather unclear. I can see a few possibilities:

  1. Teach the coverage checker to bail out (with a warning) if it sees a "large" pattern of this form
  2. Allow a TH splice to explicitly request that coverage checking be disabled. I'm not very fond of this approach since it would involve changing the user-facing TH interface all to simply work-around the issue.
  3. Add a new FromTH Origin and a flag allowing the user to choose whether to run the coverage checker on TH splices.
  4. Advise users to disable coverage checking in any module affected by the bug
  5. Revert ffb2738f86c4e4c3f0eaacf0a95d7326fdd2e383 for 8.6
  6. Figure out how to efficiently treat data families in the coverage checkerr.

Frankly, all of these options seem pretty terrible given that this pattern is problematic only due to a (hopefully temporary) limitation of the coverage checker. That being said, we clearly need to do something for 8.6.

Last edited 2 months ago by bgamari (previous) (diff)

comment:8 Changed 3 months ago by guibou

@bgamari I'm sorry, but your second point in comment:7 seems truncated. That's minor, but I'm interested by your conclusion on that point. Thank you.

comment:9 Changed 2 months ago by bgamari

Keywords: newcomer added

Unfortunately there wasn't really a conclusion (hence nothing happened for 8.6.1. Task (6) above would make for a great task for someone with a free weekend or three. It would require working out the theory but the coverage checker paper is fairly accessible.

Note: See TracTickets for help on using tickets.