Opened 3 years ago

Closed 3 years ago

Last modified 2 years ago

#5375 closed bug (wontfix)

Regression in newName

Reported by: reinerp Owned by:
Priority: highest Milestone: 7.2.1
Component: Template Haskell Version: 7.3
Keywords: Cc: kfisher@…, wasserman.louis@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Template Haskell's newName no longer works properly, with GHC HEAD. Consider:

{-# LANGUAGE TemplateHaskell #-}

module B where

import Language.Haskell.TH

data D = D

-- declare 'data E = D' with a fresh name 'D'
$(do
    nm <- newName "D"
    return $ [DataD [] (mkName "E") [] [NormalC nm []] []]
 )

With ghc-7.0.3, this compiles without error, as it should. However, with ghc-7.3.20110803, we get:

B.hs:8:3:
    Multiple declarations of `D'
    Declared at: B.hs:7:10
                 B.hs:8:3

which is wrong.

The following might also be related. Given:

{-# LANGUAGE TemplateHaskell #-}

module C where

import Language.Haskell.TH

g = $(do 
        nm <- newName "f"
        return $ VarE nm
     )

with ghc-7.0.3 we get the correct error message

C.hs:7:7:
    Not in scope: `f[aK5]'
    In the result of the splice:
      $(do { nm <- newName "f";
               return $ VarE nm })
    To see what the splice expanded to, use -ddump-splices
    In the expression:
      $(do { nm <- newName "f";
               return $ VarE nm })
    In an equation for `g':
        g = $(do { nm <- newName "f";
                     return $ VarE nm })

but with ghc-7.3.20110803 we get a GHC internal error message:

C.hs:7:7:
    GHC internal error: `f_aOw' is not in scope during type checking, but it passed the renamer
    tcg_type_env of environment: []
    tcl_env of environment: [(rgQ, Identifier[g::t_a, 1])]
    In the expression: f_aOw
    In the result of the splice:
      $(do { nm <- newName "f";
             return $ VarE nm })
    To see what the splice expanded to, use -ddump-splices
    In the expression:
      $(do { nm <- newName "f";
             return $ VarE nm })

Change History (16)

comment:1 Changed 3 years ago by simonpj

Aha. This is a consequence of fixing another bug #5037! (See also Note [Binders in Template Haskell] in source:compiler/hsSyn/Convert.lhs.) Consider this:

x = 4
y = $(do { x <- newName "x"; return (LamE (VarP x) (VarE (mkName "x"))) })

So the binder is made with newName, but the occurrence is made with mkName (the "dynamic binding" bit of TH). Clearly, I think, the occurrence should should bind to the lambda, not to the top-level x. So the defns should end up thus:

-- Like this
   x = 4
   y = \x -> x

-- NOT like this
   x = 4
   y = \x' -> x    

Similarly, what do you think should happen here:

$( do { x <- newName "x"; return (ValD (VarP x) (LitE 3)) } )
v = x

Current the newName expands to something that captures the x below the splice:

-- We get this
  x = 3
  v = x

-- NOT this
  x' = 3
  v = x

Remember too that the newName stuff is what happens behind the scences with a quotation like [d| x = 3 |].

For your data type example, suppose we had

-- declare 'data E = D' with a fresh name 'D'
$(do
    nm <- newName "D"
    return $ [DataD [] (mkName "E") [] [NormalC nm []] []] )

f :: E
f = D

Wouldn't you expect the D in the defn of F to bind to the D declared by the spliced data type decl? Just as if you'd written

data E = D
f :: E
f = D

Remember, again, that quotation uses newName for binders, and you'd definitely expect this to expand to the above code:

$( [d| data E = D |] )
f :: E
f = D

In short, it seems to me that the "multiple definitions" thing is right. But I accept that:

  • There is therefore no way to splice in a data type decl that is guaranteed not to clash with any existing one. That's bad. (In the case of terms, lambdas etc simply shadow existing defns, so the problem doesn't arise.) Maybe we need a way to make a fresh name that can be referred to only by knowing the name, and not by mkName or by open Haskell code?
  • The "internal error" message from the type checker is terrible.

comment:2 Changed 3 years ago by reinerp

Interesting. I've been using newName as you said at the end of your comment: to make a fresh name which can't possibly be captured anywhere except by use of the produced Name, and as far as I can tell, this is exactly how newName has behaved until after ghc-7.0.3. So:

  • I would definitely like some function which produces genuinely uncapturable names.
  • Given that newName has produced genuinely uncapturable names up to and including ghc-7.0.3, I would suggest keeping it this way. If the "partial capturing" behaviour of ghc HEAD's newName is really necessary (I don't think it is; see below), then I would at least suggest implementing it under a new function name, to avoid changing the semantics of newName.

Examples

As I said, however, I'm not convinced that a construct supporting "partial capturing" is actually necessary. On to the specifics of your examples! I'll talk about the examples without quotations [| ... |] first, and then I'll talk about quotations.

x = 4
y = $(do { x <- newName "x"; return (LamE (VarP x) (VarE (mkName "x"))) })

I would have said we produce

x = 4
y = \x' -> x

If you want to produce what you suggested, why not just write

x = 4
y = $(return (LamE (VarP (mkName "x")) (VarE (mkName "x"))) })

Similarly, with

$( do { x <- newName "x"; return (ValD (VarP x) (LitE 3)) } )
v = x

I would expect

x' = 3
v = x

Otherwise, why not use mkName? Similarly, I'd expect

-- declare 'data E = D' with a fresh name 'D'
$(do
    nm <- newName "D"
    return $ [DataD [] (mkName "E") [] [NormalC nm []] []] )

f :: E
f = D

to produce:

data E = D'
f = D

But I definitely agree with you on your examples involving quotations. So, for instance, this:

$( [d| data E = D |] )
f :: E
f = D

should compile successfully (and indeed it does in ghc-7.0.3), as should the example in #5037.

I think the way to achieve the correct behaviour for quotations is simply to use mkName in the desugarer, rather than using newName.

My suggestion

In summary, I think we should:

  • revert the recent changes to newName
  • make the desugarer use mkName instead of newName

Let me clarify what I mean when I say "use mkName" in the desugarer. Given this:

import Prelude(map)
g = [| map (\x -> x) |]

the name map should be bound to Prelude.map, whereas both occurrences of the name x should be created using mkName. That is, the quotation should desugar to:

g = AppE (VarE ''map) (LamE [VarP (mkName "x")] (VarE (mkName "x")))

Here's the rule. When encountering a name in a quotation:

  • if the binding for that name is outside the quotation (like map above), use a NameL or NameG as appropriate
  • if the binding for that name is inside the quotation, use mkName (i.e. a NameS)
  • if a binding for that name can't be found, produce an error (this is what ghc currently does)

Note that what I suggest should leave almost all Template Haskell programs with the same semantics, and allow a few more (such as the one in #5037) to compile. Even quotations relying on shadowing, such as

f = $( [| \x -> (\x -> x) |] )

should behave the same as before: the only difference is that the shadowing is resolved after splicing rather than during desugaring of the quotation.

comment:3 Changed 3 years ago by reinerp

Hmm, that doesn't seem to work. Here's an example I didn't think of before:

f = $( [| \x -> $( LamE [VarP (mkName "x")] (VarE 'x) ) |] )

In this case, we would like to produce:

f = \x' -> \x -> x'

But with my suggestion, this wouldn't be possible: under my suggestion, the only possible desugaring of the 'x would be mkName "x", which is not what we want. I have to think some more about this.

comment:4 Changed 3 years ago by simonpj

Indeed. And the classic example from every meta-programming paper: look at gen in Section 2 of the original TH paper. We write things like

f v = ... [| \x -> ...$(f [| x |])... |] ...

and the 'x' passed to f (insde the quote) must be indissolubly linked to the 'x' created by the lambda. Note that the invocation of f inside the quote may itself make a new \x and so on (this happens in gen). These lambdas must create fresh names; hence newName.

Simon

comment:5 Changed 3 years ago by reinerp

I see. Incidentally, when playing around with this, I discovered a related bug, which I've reported: #5379.

So it seems like we need three ways of making names:

  • mkName, as it currently behaves
  • newName, as it behaved in ghc-7.0.3 and before: there is absolutely no way of names it produces being accidentally captured
  • newName, as it behaves in ghc HEAD.

As I said in my previous comment, would it at least be possible to give the third option a different name, to avoid changing the semantics of newName?

Reiner

comment:6 Changed 3 years ago by igloo

  • Milestone set to 7.2.1
  • Priority changed from normal to highest

This also affects 7.2. Do we know how many packages/users are affected by this?

comment:7 Changed 3 years ago by simonpj

  • Cc kfisher@… added

I'm quite reluctant to introduce a third option on the spur of the moment. The intended semantics of newName has not changed; it simply had a bug before. The fact that quotation syntax is syntactic sugar for newName was, for example, stated explicitly in the TH2 note (section 4.2). The distinction between newName and mkName is already subtle enough, without adding a third variant to explain.

Also, the "totally fresh" semantics you want has narrow usefulness, because it can never be referred to anywhere outside that immediate quotation. You data constructor D, for example, could not be referred to from anywhere else (and could not be exported from the module) under the semantics you propose.

Really the only significant shortcoming of the status quo is that you can't make a data type that is local to a single splice and is totally invisible outside it. (Actually you can get close by using a long name or a random number, but those are really only approximations.)

Before doing the Real Work it would take (eg adding a new constructor to TH.Syntax.Name, I'd need to be convinced that it was really all worth it.

Does anyone else have an opinion? I'm adding Kathleen to the cc in case it rings a bell with her.

comment:8 Changed 3 years ago by reinerp

I should point out another (perhaps more common) need for the "totally fresh" semantics. When I write a quasiquoter which antiquotes to Haskell, for instance for interpolated strings (see http://hackage.haskell.org/package/interpolatedstring-perl6-0.8.1), consider the following:

f x = [qc|The value of x is {x}|]

(here antiquotation is delimited by {}s.) In this case, I want the antiquoted x to refer to f's parameter. My quasiquotation will evaluate to a syntax tree as follows:

f x = $( ... (VarE (mkName "x")) ... )

What happens if I want to bind a variable somewhere in the syntax tree? I might for instance want to produce the following:

f x = let theString = "The value of x is "
      in theString ++ show x

In this case the name theString should be totally fresh: it's an implementation detail of the quasiquoter, and we don't want its binding to capture the variable x. So in this case we want:

f x = 
  $(do
      theString <- newName "theString"  -- old semantics are needed here
      return $ LetE [ValD (VarP theString) (NormalB (LitE "The value of x is")) []]
                    (InfixE (VarE theString) '(++) (VarE (mkName "x")))
  )

Cheers,
Reiner

comment:9 Changed 3 years ago by simonpj

I think that's the wrong approach. It places a global burden on every binding enclosing the occurrence of "x".

But the problem is not with those bindings; rather, it's with the occurrence of x. The quasiquoter should not use mkName to generate a string-name for x -- as you yourself pointed out in #4429. Rather it should use the proper Name of x! Just like 'x in Haskell code

So how can a quasiquoter get hold of the Name for x? That's a whole separate question, which we have open in #4429. If we had the lookupValue described there, the quasiquoter could use it to look x up.

Now, I grant that #4429 is languishing unloved, but I think that's the right fix because it adds a simple-to-understand facility that may be independently useful, rather than making a regrettably-complex situation more complex still.

comment:10 Changed 3 years ago by reinerp

So how do quasiquoters handle shadowing? For instance, what should we do when we encounter the following:

f x = [some_quasi| ... (\x -> x)...|]

This should become

f x = ... (\x' -> x') ...

But I don't see how this would work if we used lookupName on all the occurrences of x in the quasiquote.

Put in a different way, what would happen if I wrote this:

f x = 
 $(do
     x' <- lookupName "x"
     return $ LamE [VarP x'] (VarE x')
  )

comment:11 Changed 3 years ago by simonpj

Quasiquotation, by definition, puts all the control in the hands of the author of the quasiquoter. All TH can hope to do is to make facilities available that let them do what they want. lookupName is such a facility. I can't answer your first example without knowing precisely what the quasiquoter is trying to do.

For your second example, you are doing a Bad Thing. It's precisely the same Bad Thing as this:

f x = return (LamE (VarP 'x) (VarE 'x))

Here the 'x is the Name of the x that is f's argument. You should not put such a Name in a binding position. (I forget whether TH will emit an error message if you do so, but it Jolly Well Should!)

comment:12 Changed 3 years ago by reinerp

Okay, consider the interpolated-string quasiquoter from before. Given:

f x = [qc|The value is { (x, let x = 3 in x) } |]

I would like to produce:

f x = let theString = "The value is "
      in theString ++ show (x, let x' = 3 in x')

It seems that I need to use lookupName for the x in the fst of the tuple (to make sure that it doesn't get captured by the theString binding, for example), and I need to use mkName for the occurrences of x in the snd of the tuple.

This is possible to do, certainly, but it means my quasiquoter has keep in mind the scoping rules of Haskell. Or is there a simpler way?

comment:13 Changed 3 years ago by simonpj

Yes, of course the quasiquoter needs to understand the scoping rules of the language it is parsing. In some weird language it might be that in (x, let x=3 in x) the first x is bound by the let x=3. It depends on the language!

Now, if you want to embed chunks of Haskell, you need a Haskell parser. I'm not sure what Kathleen uses, maybe haskell-src-exts or something. Then you need to convert that parser's output to TH syntax, and when you do that then, yes, you need to take care with scopes. That is how you express the semantics of the quasiquoted language, which just happens to be Haskell in this case.

comment:14 Changed 3 years ago by reinerp

Ok, makes sense. Sounds good.

comment:15 Changed 3 years ago by simonpj

  • Resolution set to wontfix
  • Status changed from new to closed

Great. I think we have agreed to

  • Close this ticket
  • Implement #4429
  • Fix #5379 (done already).

Thank you for uncovering these issues, and especially for finding #5379.

Simon

comment:16 Changed 2 years ago by LouisWasserman

  • Cc wasserman.louis@… added

"Also, the "totally fresh" semantics you want has narrow usefulness, because it can never be referred to anywhere outside that immediate quotation. You data constructor D, for example, could not be referred to from anywhere else (and could not be exported from the module) under the semantics you propose."

I wanted to point out my use case for exactly these semantics, and I think that it's more common than you seem to suggest.

This bug broke my package unpack-funcs. I generated class instances with an associated data type, and for that data type, I *did* want a data constructor that couldn't be exported from the model or referred to anywhere else, because I wanted the only access to be via the functions in the type class.

Instead, what I got was that I couldn't generate multiple class instances, because after the change to newName, the data constructors in each instance started conflicting!

As it stands, it seems like the only way around this is the dirty, dirty hack of generating unique names with an unsafePerformIO'd global IORef for a unique ID counter. It's really depressing. =(

Note: See TracTickets for help on using tickets.