Opened 13 months ago

Closed 7 months ago

#12503 closed bug (fixed)

Template Haskell regression: GHC erroneously thinks a type variable is also a kind

Reported by: RyanGlScott Owned by: goldfire
Priority: high Milestone: 8.2.1
Component: Template Haskell Version: 8.0.1
Keywords: TypeInType Cc: goldfire
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program Test Case: tests/th/T12503
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D3022
Wiki Page:

Description

The following program compiles without issue on GHC 7.6.3 through GHC 7.10.3, but fails to compile on GHC 8.0.1 and GHC HEAD. (I added a CPP guard simply because the DataD constructor changed between 7.10 and 8.0.)

{-# LANGUAGE CPP #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Regression where

import Language.Haskell.TH

data T k
class C a

$(do TyConI (DataD [] tName [ KindedTV kName kKind]
#if __GLASGOW_HASKELL__ >= 800
                   _
#endif
                   _ _) <- reify ''T
     d <- instanceD (cxt []) (conT ''C `appT` (conT tName `appT` sigT (varT kName) kKind)) []
     return [d])
$ /opt/ghc/8.0.1/bin/ghc Regression.hs 
[1 of 1] Compiling Regression       ( Regression.hs, Regression.o )
Regression.hs:(13,3)-(19,15): Splicing declarations
    do { TyConI (DataD []
                       tName_a2RT
                       [KindedTV kName_a2RU kKind_a2RV]
                       _
                       _
                       _) <- reify ''T;
         d_a31Y <- instanceD
                     (cxt [])
                     (conT ''C
                      `appT` (conT tName_a2RT `appT` sigT (varT kName_a2RU) kKind_a2RV))
                     [];
         return [d_a31Y] }
  ======>
    instance C (T (k_avB :: k_avC))

Regression.hs:13:3: error:
    Variable ‘k_avB’ used as both a kind and a type
    Did you intend to use TypeInType?

Somewhat confusingly, you can use instance C (T (k_avB :: k_avC)) on its own, and it will compile without issue. Also, quoting it doesn't seem to trip up this bug, as this also compiles without issue:

{-# LANGUAGE CPP #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -ddump-splices #-}
module WorksSomehow where

import Language.Haskell.TH

data T k
class C a

$([d| instance C (T k) |])

The original program has several workarounds:

  1. Turn off PolyKinds (OK, this isn't a very good workaround...)
  2. Add an explicit kind signature to T, e.g., data T (k :: k1)
  3. Change the type variable of T to some other letter, e.g., data T p or data T k1

Given that (3) is a successful workaround, I strongly suspect that GHC is confusing the type variable k (which gets ddump-spliced as k_avB) with the kind variable k (which gets ddump-spliced as k_avC) due to their common prefix k.

Change History (12)

comment:1 Changed 13 months ago by RyanGlScott

The above program might seem somewhat contrived, but I actually ran into this problem in real code. The generic-deriving library does something very similar to derive Generic1 instances using Template Haskell (that is, it re-uses the kind information it gets from reify). Here's some code that triggers the same error:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PackageImports #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Regression where

-- Using generic-deriving-1.11
import "generic-deriving" Generics.Deriving.TH

data T k a
$(deriveAll1 ''T)
$ /opt/ghc/8.0.1/bin/ghc Regression.hs 
[1 of 1] Compiling Regression       ( Regression.hs, Regression.o )
Regression.hs:13:3-16: Splicing declarations
    deriveAll1 ''T
  ======>
    instance GHC.Generics.Generic1 (T (k_avv :: k_avx) :: GHC.Types.Type
                                                          -> GHC.Types.Type) where
      type GHC.Generics.Rep1 (T (k_avv :: k_avx) :: GHC.Types.Type
                                                    -> GHC.Types.Type) = GHC.Generics.D1 (GHC.Generics.MetaData "T" "Regression" "main" False) GHC.Generics.V1
      GHC.Generics.from1
        = \ val_a3ph
            -> case val_a3ph of {
                 y_a3pi
                   -> GHC.Generics.M1
                        (case y_a3pi of {
                           _ -> error "No generic representation for empty datatype T" }) }
      GHC.Generics.to1
        = \ val_a3pj
            -> case val_a3pj of {
                 GHC.Generics.M1 y_a3pk
                   -> case y_a3pk of { _ -> error "No values for empty datatype T" } }

Regression.hs:13:3: error:
    Variable ‘k_avv’ used as both a kind and a type
    Did you intend to use TypeInType?

Regression.hs:13:3: error:
    Variable ‘k_avv’ used as both a kind and a type
    Did you intend to use TypeInType?

comment:2 Changed 13 months ago by RyanGlScott

It should be noted that you don't need a type/kind variable explicitly named k to trigger this bug. You can also trigger it with an arbitrarily named variable like so:

{-# LANGUAGE CPP #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Regression2 where

import Language.Haskell.TH

data family T (a :: b)
data instance T b
class C a

$(do FamilyI
#if __GLASGOW_HASKELL__ >= 800
       (DataFamilyD tName _ _)
#else
       (FamilyD _ tName _ _)
#endif
       [DataInstD [] _ [tyVar]
#if __GLASGOW_HASKELL__ >= 800
          _
#endif
          _ _] <- reify ''T
     d <- instanceD (cxt []) (conT ''C `appT` (conT tName `appT` return tyVar)) []
     return [d])
$ /opt/ghc/8.0.1/bin/ghc Regression2.hs
[1 of 1] Compiling Regression2      ( Regression2.hs, Regression2.o )
Regression2.hs:(15,3)-(27,15): Splicing declarations
    do { FamilyI (DataFamilyD tName_a2RY _ _)
                 [DataInstD [] _ [tyVar_a2RZ] _ _ _] <- reify ''T;
         d_a322 <- instanceD
                     (cxt [])
                     (conT ''C `appT` (conT tName_a2RY `appT` return tyVar_a2RZ))
                     [];
         return [d_a322] }
  ======>
    instance C (T (b_avD :: b_avO))

Regression2.hs:15:3: error:
    Variable ‘b_avD’ used as both a kind and a type
    Did you intend to use TypeInType?

comment:3 Changed 13 months ago by RyanGlScott

As before, Regression2 does in fact compile with GHC 7.10.3:

$ /opt/ghc/7.10.3/bin/ghc Regression2.hs
[1 of 1] Compiling Regression2      ( Regression2.hs, Regression2.o )
Regression2.hs:(15,3)-(27,15): Splicing declarations
    do { FamilyI (FamilyD _ tName_a39r _ _)
                 [DataInstD [] _ [tyVar_a39s] _ _] <- reify ''T;
         d_a3id <- instanceD
                     (cxt [])
                     (conT ''C `appT` (conT tName_a39r `appT` return tyVar_a39s))
                     [];
         return [d_a3id] }
  ======>
    instance C (T (b_apE :: k_apP))

There's something interesting to note here, as in GHC 8.0.1, it tries to splice:

instance C (T (b_avD :: b_avO))

But in GHC 7.10.3, it splices this:

instance C (T (b_apE :: k_apP))

Notice that the kind variable isn't b at all, but rather an inferred k! So it looks like there was a change in behavior at some point.

comment:4 Changed 10 months ago by RyanGlScott

Milestone: 8.2.1

comment:5 Changed 9 months ago by bgamari

Richard or Ryan, could one of you pick this up? I'm trying to find owners for all of the 8.2 tickets in high or highest state.

comment:6 Changed 9 months ago by goldfire

Owner: set to goldfire

comment:7 Changed 8 months ago by RyanGlScott

I did some cursory debugging of this tonight. Here are my findings

  1. This bug was definitely introduced in 6746549772c5cc0ac66c0fce562f297f4d4b80a2. (Add kind equalities to GHC.) Not the most surprising turn of events, I know, but still nice to have confirmed.
  2. The lines of code that actually produce this error are in RnTypes:
extract_tv :: TypeOrKind -> Located RdrName -> FreeKiTyVars -> RnM FreeKiTyVars
extract_tv t_or_k ltv@(L _ tv) acc
  | isRdrTyVar tv = case acc of
      FKTV kvs k_set tvs t_set all
        |  isTypeLevel t_or_k
        -> do { when (occ `elemOccSet` k_set) $
                mixedVarsErr ltv
              ; return (FKTV kvs k_set (ltv : tvs) (t_set `extendOccSet` occ)
                             (ltv : all)) }
        |  otherwise
        -> do { when (occ `elemOccSet` t_set) $
                mixedVarsErr ltv
              ; return (FKTV (ltv : kvs) (k_set `extendOccSet` occ) tvs t_set
                             (ltv : all)) }
  | otherwise     = return acc
  where
    occ = rdrNameOcc tv

where mixedVarsErr ltv actually throws the "used as both a kind and a type" error.

I decided to do some pprTrace debugging in this neighborhood, and I discovered that when you run the program above, the error gets thrown in the isTypeLevel t_or_k case. Here's some values at the moment of the error:

RGS typelevel
  kvs                 : [k_a1ly]
  k_set               : [vESeYM :-> k]
  tvs                 : []
  t_set               : []
  all                 : [k_a1ly]
  tv                  : k_a1lx
  rdrNameOcc tv (occ) : k

What does this mean? I don't yet—I'll need to do some more digging.

comment:8 Changed 8 months ago by RyanGlScott

I think I know what's happening here. The kind equalities patch changed FreeKiTyVars from this:

type FreeKiTyVars = ([RdrName], [RdrName])

to this:

data FreeKiTyVars = FKTV { fktv_kis    :: [Located RdrName]
                         , _fktv_k_set :: OccSet  -- for efficiency,
                                                  -- only used internally
                         , fktv_tys    :: [Located RdrName]
                         , _fktv_t_set :: OccSet
                         , fktv_all    :: [Located RdrName] }

Now FreeKiTyVars is using two OccSets. (One each for the kind and type variables extracted so far? That's what I'm guessing, since Note [Kind and type-variable binders] is out of date.)

The problem is that two exact RdrNames reified from Template Haskell can have the same underlying Unique but with different Names (see Note [Local bindings with Exact Names]). That is what is shown in comment:7 — two exact RdrNames (k) have the same Unique (vESeYM), but different Names (a1lx and a1ly).

But as far as fixing this goes, I am a bit clueless. I suppose we could add NameSets to FreeKiTyVars as well that track only Exact RdrNames, and if lookupLocalRdrEnv finds a hit for an RdrName, we could use the corresponding NameSet. Otherwise, fall back to the OccSet.

Thoughts, Richard?

Last edited 8 months ago by RyanGlScott (previous) (diff)

comment:9 Changed 8 months ago by goldfire

Yuck yuck yuck. The problem is that this code assumes that two variables with the same OccName are the same. This will always be true in user-written code. But it's not true with Exact names.

Ryan's suggestion to fix this makes some sense, but it's very heavy. I will propose something altogether radical: skip the check for Exact names. This check serves but one purpose: to allow people who want -XPolyKinds but are skittish about -XTypeInType to know that they have wandered into -XTypeInType-land. I foresee a future where -XTypeInType and -XPolyKinds become synonymous (not unlike how -XRankNTypes and -XRank2Types are synonymous) and this whole check can be dropped.

Template Haskell already is somewhat anarchic about when it requires extensions. This choice would contribute to the anarchy, but in a very small way. And it's very easy to just skip the check!

To be clear: the only people this hurts are people who want to avoid -XTypeInType while still using Template Haskell and who happen to use TH to generate some code that is accepted only with -XTypeInType. (And who like the letter k.) I argue that this set of people is small.

comment:10 Changed 8 months ago by RyanGlScott

Differential Rev(s): Phab:D3022
Status: newpatch

Richard, that proposal sounds even better, so I went with that approach in Phab:D3022.

comment:11 Changed 7 months ago by Ryan Scott <ryan.gl.scott@…>

In 283a3465/ghc:

Prevent Template Haskell splices from throwing a spurious TypeInType error

Summary:
There was a rather annoying corner case where splicing poly-kinded
Template Haskell declarations could trigger an error muttering about
`TypeInType` not being enabled, whereas the equivalent non-TH code would
compile without issue. This was causing by overzealous validity check in the
renamer, wherein failed to distinguish between two different `Exact` names
with the same `OccName`. As a result, it mistakenly believed some type
variables were being used as both type and kind variables simultaneously! Ack.

This avoids the issue by simply disabling the aforementioned validity check
for Exact names. Fixes #12503.

Test Plan: ./validate

Reviewers: austin, bgamari, goldfire

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D3022

comment:12 Changed 7 months ago by RyanGlScott

Resolution: fixed
Status: patchclosed
Test Case: tests/th/T12503
Note: See TracTickets for help on using tickets.