Opened 5 months ago

Closed 5 months ago

Last modified 6 weeks ago

#13848 closed bug (duplicate)

Unexpected order of variable quantification with GADT constructor

Reported by: RyanGlScott Owned by:
Priority: normal Milestone:
Component: Compiler (Type checker) Version: 8.0.1
Keywords: TypeApplications Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #11721 Differential Rev(s):
Wiki Page:

Description

Here's some code:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Foo where

data N = Z | S N

data Vec (n :: N) a where
  VNil  :: forall a. Vec Z a
  VCons :: forall n a. a -> Vec n a -> Vec (S n) a

I want to use TypeApplications on VCons. I tried doing so in GHCi:

GHCi, version 8.2.0.20170523: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/rgscott/.ghci
[1 of 1] Compiling Foo              ( Foo.hs, interpreted )
Ok, modules loaded: Foo.
λ> :set -XTypeApplications -XDataKinds
λ> :t VCons @Z @Int 1 VNil

<interactive>:1:8: error:
    • Expected a type, but ‘'Z’ has kind ‘N’
    • In the type ‘Z’
      In the expression: VCons @Z @Int 1 VNil

<interactive>:1:11: error:
    • Expected kind ‘N’, but ‘Int’ has kind ‘*’
    • In the type ‘Int’
      In the expression: VCons @Z @Int 1 VNil

<interactive>:1:17: error:
    • Couldn't match type ‘'Z’ with ‘Int’
      Expected type: Vec Int 'Z
        Actual type: Vec 'Z 'Z
    • In the fourth argument of ‘VCons’, namely ‘VNil’
      In the expression: VCons @Z @Int 1 VNil

Huh? That's strange, I would have expected the first type application to be of kind N, and the second to be of kind *. But GHC disagrees!

λ> :set -fprint-explicit-foralls 
λ> :type +v VCons
VCons :: forall a (n :: N). a -> Vec n a -> Vec ('S n) a

That's downright unintuitive to me, since I explicitly specified the order in which the quantified variables should appear in the type signature for VCons.

Similarly, if you leave out the foralls:

data Vec (n :: N) a where
  VNil  :: Vec Z a
  VCons :: a -> Vec n a -> Vec (S n) a

Then :type +v will also report the same quantified variable order for VCons. This is perhaps less surprising, since the n and a in data Vec (n :: N) a don't scope over the constructors, so GHC must infer the topological order in which the variables appear in each constructor. But I would certainly not expect GHC to do this if I manually specify the order with forall.

Change History (9)

comment:1 Changed 5 months ago by RyanGlScott

Above, I implied that GHC was inferring the topological order for a and n in the type signature for VCons even with an explicit forall. I just realized there's more to it after reading Note [mkGADTVars]. The relevant excerpt:

Note [mkGADTVars]
~~~~~~~~~~~~~~~~~

Running example:

data T (k1 :: *) (k2 :: *) (a :: k2) (b :: k2) where
  MkT :: T x1 * (Proxy (y :: x1), z) z

We need the rejigged type to be

  MkT :: forall (x1 :: *) (k2 :: *) (a :: k2) (b :: k2).
         forall (y :: x1) (z :: *).
         (k2 ~ *, a ~ (Proxy x1 y, z), b ~ z)
      => T x1 k2 a b

You might naively expect that z should become a universal tyvar,
not an existential. (After all, x1 becomes a universal tyvar.)
The problem is that the universal tyvars must have exactly the
same kinds as the tyConTyVars. z has kind * while b has kind k2.
So we need an existential tyvar and a heterogeneous equality
constraint.

So what's really going on is that GHC is putting all of the universally quantified variables (a) before the existentially quantified ones (n).

Still, I still think that GHC ought to adhere to what the user wrote if they bother to write an explicit forall, because otherwise you have to trace out the variable order that GHC happens to infer.

comment:2 Changed 5 months ago by RyanGlScott

The question remains: how could we preserve the order in which the user wrote the foralld variables? The assumption that they should be partitioned into separate universal and existential groups is baked pretty deeply into the GHC AST, as DataCon has separate dcUnivTyVars and dcExTyVars fields, and never the twain shall meet.

One possible solution is to introduce a dcOrigTyVars :: Maybe [TyVar] field, which is Just the foralld variables (in whatever order the user writes) or Nothing otherwise. GHC could then use this information when computing dataConUserType, which I believe GHC uses for determining the order in which TypeApplications happen.

comment:3 Changed 5 months ago by simonpj

Hmm. Fortunately we already have just the hook we need.

Every interesting data constructor (including every GADT data con) has a wrapper; it is built by MkId.mkDataConRep. Look at wrapper_reqd to see which data cons have wrappers.

The wrapper mediates between the nice user view of the data constructor and the internal "representation data constructor" that GHC uses internally. So for example

  data T a where
    MkT :: forall p q. p -> q -> T (p,q)

The representation data con has this type

  MkT :: forall r. forall p q. (r ~ (p,q)) => p -> q -> T r

But the wrapper is defined like this

  $WMkT :: forall p q. p -> q -> T (p,q)
  $WMkT = /\p q. \(x::p) (y::q).
          MkT @(p,q) @p @q <(p,q)> x y

The <(p,q)> is a coercion argument (refl) witnessing (p,q)~(p,q)

Now currently dataConUserType (which claims to show the user-written type of the data con) is thus (in DataCon):

dataConUserType (MkData { dcUnivTyVars = univ_tvs,
                          dcExTyVars = ex_tvs, dcEqSpec = eq_spec,
                          dcOtherTheta = theta, dcOrigArgTys = arg_tys,
                          dcOrigResTy = res_ty })
  = mkForAllTys (filterEqSpec eq_spec univ_tvs) $
    mkForAllTys ex_tvs $
    mkFunTys theta $
    mkFunTys arg_tys $
    res_ty

This filterEqSpec business is fishy! But now we understand about wrappers, for data constructors that have wrappers we could insead use the type of the wrapper, thus:

dataConUserTYpe (MkData { dcRep = rep
                        , dcRepType = rep_ty }
  = case rep of
      DCR { dcr_wrap_id = wrap_id } -> idType wrap_id
      NoDataConRep                  -> rep_ty
        -- If there is no wrapper, the "rep-type" is the same 
        -- as the "user type"

Whizzo.

Now, to return to the ticket,

  • We should ensure that the wrapper type reflects exactly the type the user wrote including type variable order
  • That in turn may force to make a wrapper when we don't right now, just to swizzle round the type variables

Make sense?

comment:4 Changed 5 months ago by RyanGlScott

I don't think it's going to be that simple. If you look at the implementation of mkDataConRep, you'll notice that it figures out the type of the wrap_id like this:

wrap_ty = dataConUserType data_con

So obviously we don't want to change dataConUserType's implementation to use idType wrap_id, since that would be a circular definition.

We should ensure that the wrapper type reflects exactly the type the user wrote including type variable order

Sure, but that's easier said than done. The only info you have in mkDataConId to determine the wrapper type is the DataCon itself, and at that point, the type variables have already been carved up into the universal and existential ones, with no way to recover the original order. So I don't see any way to make this work without having a separate dcOrigTyVars fields, as proposed in comment:2.

comment:5 Changed 5 months ago by simonpj

Well it would take a bit more plumbing, I agree. You'd need to take the qtkvs returned by tcGadtSigType in TcTyClsDecls.tcConDecl and pass it on to buildDataCon, which can in turn give them to mkDataConRep to use instead of calling dataConUserType.

I'm not certain that's all, but I think so.

comment:6 Changed 5 months ago by RyanGlScott

OK, I think I'm starting to see the picture more clearly. However, some of the details are still hazy to me. In particular, tcConDecl is far from the only place that uses buildDataCon to construct a DataCon. However, as far as I can tell, tcConDecl is the only place that knows exactly what order the user wrote the constructor's type variables in.

This is important because we might write a GADT to an interface file and read it back in tyConToIfaceDecl, which also uses buildDataCon. However, we've lost the order of the type variables used in the wrapper type by that point, since IfaceData doesn't store the wrapper type!

This suggests to me that we need to beef up IfaceData to accommodate this change. Should we be storing the wrapper type as a field of IfaceData as well? Or am I barking up the wrong tree?

comment:7 Changed 5 months ago by simonpj

Yes you are right. So yes, we have to put more info in the IfaceConDecl.

Putting in the wrapper type doesn't feel right -- duplicates too much.

Better to put in the original tyvars.

And in that case we may as well record them in the DataCon too, as you suggested earlier. Then we wouldn't need to pass them to mkDataConRep after all, because they'll be in the DataCon.

One fiddly thing I'm not sure about: in the declaration of IfaceConDecl we see

data IfaceConDecl
  = IfCon {
        ifConName    :: IfaceTopBndr,                -- Constructor name
        ifConWrapper :: Bool,                   -- True <=> has a wrapper
        ifConInfix   :: Bool,                   -- True <=> declared infix

        -- The universal type variables are precisely those
        -- of the type constructor of this data constructor
        -- This is *easy* to guarantee when creating the IfCon
        -- but it's not so easy for the original TyCon/DataCon
        -- So this guarantee holds for IfaceConDecl, but *not* for DataCon

Reasoning is explained in MkIface where we convert a DataCon to a ConDecl

          -- Tidy the univ_tvs of the data constructor to be identical
          -- to the tyConTyVars of the type constructor.  This means
          -- (a) we don't need to redundantly put them into the interface file
          -- (b) when pretty-printing an Iface data declaration in H98-style syntax,
          --     we know that the type variables will line up
          -- The latter (b) is important because we pretty-print type constructors
          -- by converting to IfaceSyn and pretty-printing that
          con_env1 = (fst tc_env1, mkVarEnv (zipEqual "ifaceConDecl" univ_tvs tc_tyvars))
                     -- A bit grimy, perhaps, but it's simple!

Meddling with IfaceConDecl is not a big deal... it's just a serialisation format, and changes are very localised. So feel free to suggest what to do.

comment:8 Changed 5 months ago by RyanGlScott

Resolution: duplicate
Status: newclosed

Oh dear, I just realized this is a duplicate of #11721. Closing in favor of that ticket.

comment:9 Changed 6 weeks ago by Ben Gamari <ben@…>

In ef26182/ghc:

Track the order of user-written tyvars in DataCon

After typechecking a data constructor's type signature, its type
variables are partitioned into two distinct groups: the universally
quantified type variables and the existentially quantified type
variables. Then, when prompted for the type of the data constructor,
GHC gives this:

```lang=haskell
MkT :: forall <univs> <exis>. (...)
```

For H98-style datatypes, this is a fine thing to do. But for GADTs,
this can sometimes produce undesired results with respect to
`TypeApplications`. For instance, consider this datatype:

```lang=haskell
data T a where
  MkT :: forall b a. b -> T a
```

Here, the user clearly intended to have `b` be available for visible
type application before `a`. That is, the user would expect
`MkT @Int @Char` to be of type `Int -> T Char`, //not//
`Char -> T Int`. But alas, up until now that was not how GHC
operated—regardless of the order in which the user actually wrote
the tyvars, GHC would give `MkT` the type:

```lang=haskell
MkT :: forall a b. b -> T a
```

Since `a` is universal and `b` is existential. This makes predicting
what order to use for `TypeApplications` quite annoying, as
demonstrated in #11721 and #13848.

This patch cures the problem by tracking more carefully the order in
which a user writes type variables in data constructor type
signatures, either explicitly (with a `forall`) or implicitly
(without a `forall`, in which case the order is inferred). This is
accomplished by adding a new field `dcUserTyVars` to `DataCon`, which
is a subset of `dcUnivTyVars` and `dcExTyVars` that is permuted to
the order in which the user wrote them. For more details, refer to
`Note [DataCon user type variables]` in `DataCon.hs`.

An interesting consequence of this design is that more data
constructors require wrappers. This is because the workers always
expect the first arguments to be the universal tyvars followed by the
existential tyvars, so when the user writes the tyvars in a different
order, a wrapper type is needed to swizzle the tyvars around to match
the order that the worker expects. For more details, refer to
`Note [Data con wrappers and GADT syntax]` in `MkId.hs`.

Test Plan: ./validate

Reviewers: austin, goldfire, bgamari, simonpj

Reviewed By: goldfire, simonpj

Subscribers: ezyang, goldfire, rwbarton, thomie

GHC Trac Issues: #11721, #13848

Differential Revision: https://phabricator.haskell.org/D3687
Note: See TracTickets for help on using tickets.