Opened 4 years ago

Closed 4 years ago

#5348 closed feature request (fixed)

Template Haskell quasiquote escapes

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.0.3
Keywords: Cc: gale@…, mainland@…, kfisher@…, michael@…, jonas.duregard@…, gershomb@…, haskell@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

This ticket arises from a brief email thread
http://www.haskell.org/pipermail/glasgow-haskell-users/2011-June/020550.html and http://www.haskell.org/pipermail/glasgow-haskell-users/2011-July/020564.html

In a Template Haskell quasiquote, what would you expect this to do?

  [myparser|blah\1blah|]

That is, what String would you expect to be passed to the parser myparser? Currently you get exactly the 10-character string

blah\1blah

The backslash is passed on literally. The principle is to do as little meddling as possible, leaving it up to myparser to do what it wants. However, backslash does initiate some limited escape behaviour

  [myparser|blah\|blah|]    -- gives "blah|blah"  (9 chars)
  [myparser|blah\|]blah|]   -- gives "blah|]blah" (10 chars)
  [myparser|blah\]blah|]    -- gives "blah]blah"  (9 chars)

The motivation here is that you might want the sequence "|]" in your quoted string, and to do so you have to escape it.

However:

  • Currently "\]" is an escape sequence, but it does not need to be.
  • Currently "\|" is an escape sequence, whereas it could be "\|]". Then [myparser|blah\|blah|] would yield the (10-char) string blah\|blah.
  • There is no way to have a quote that ends in a backslash, thus [myparser|\blahblah\|]. This could be fixed by adding "\\" as an escape sequence for backslash.

An alterntive would be to support no escapes at all, but instead ask myparser itself to determine where the quotation ended. Instead of String -> Q Exp the parser would have type String -> Q (String,Exp). This seems to me to be a bridge too far. Better to get GHC to determine where the quote ends.

Proposal:

  • Get rid of "\]" as an escape sequence
  • Replace "\]" as an escape sequence by "\|]"
  • Add "\\" as an escape sequence

Change History (19)

comment:1 Changed 4 years ago by simonpj

  • Type changed from bug to feature request

comment:2 follow-up: Changed 4 years ago by YitzGale

Another alternative: support no escapes, without asking
myparser to determine where the quotation ends. Just use the
current closing oxford bracket, without a hard-wired way to
escape it.

If someone needs to be able to allow literal oxford brackets
in quasi-quotes, they can define their own syntax for that.
There is no need for GHC to provide it.

Of course there is then a limitation on what syntax can
be used for this purpose - it cannot contain
an entire closing oxford bracket, because that will be intercepted
by GHC to end the quasi-quote. So, for example, it can be
"|\]", or "'|']", but not "\|]".

I think that limitation is well worth it. GHC must do its utmost
to avoid interfering with the syntax that the quasi-quoter
is trying to define. Frankly, I was shocked to learn of this
backslash meddling.

Whereas the need to allow literal oxford brackets in a quasi-quote
seems to be somewhat of a corner case. None of the popular
quasi-quoters I know about have any particular need for that.
Though I can imagine it coming up sometimes.

Perhaps, as a convenience, the library should provide a combinator
for backslash quoting. Users can optionally compose it with
their quasi-quoter. For example, something like this:

backslashQuoter :: String -> String
backslashQuoter s@('|':'\\':s') = case span (== '\\') s' of
                                    (bs, cs@(']':_)) -> '|' : bs ++ cs
                                    _                -> s
backslashQuoter s                                     = s

I.e., recognize backslash quoting only for a sequence
of backslashes directly sandwiched between
'|' and ']'. Otherwise stay out of the way.
Or we could define the combinator to be more like what
Simon suggested.

If we must have hard-wired backslash quoting, I would much
prefer something more polite, like the above, over what Simon
suggested.

To summarize, I am making two alternative proposals:

  • No hard-wired quoting; provide a combinator for convenience. This is what I prefer. Or,
  • Hard-wired quoting that only affects oxford-bracket-like syntax.

comment:3 Changed 4 years ago by int-e

As Yitz points out, third alterntive would be to let quasi-quotes end with |]. A downside is that then different quasi-quotation authors will pick different encodings. I like the idea of having a convenience function like Yitz proposed, because it'll encourage using a standard encoding.

Btw I totally agree that letting the quasi-quoters determine where the quote ends goes too far. It's not just GHC, it affects other development tools as well, starting with syntax highlighting. Those tools would then have to learn about the intricacies of every quasi-quoter in existence to do a decent job, which is clearly impossible. (And finally, don't forget the poor human reading the code.)

comment:4 in reply to: ↑ 2 ; follow-up: Changed 4 years ago by simonpj

Replying to YitzGale:

Another alternative: support no escapes, without asking
myparser to determine where the quotation ends. Just use the
current closing oxford bracket, without a hard-wired way to
escape it.

I don't think this is feasible. Suppose we want to support random language X inside quasiquotes, and X includes string literals. Then we might want to say

 [xlang|  f x = "abc|]def" |]

With my proposal you'd say

 [xlang| f x = "abc\|]def" |]

With yours you can't say this at all. You may say that you simply have to adjust the xlang parser so that X never requires you to have the character sequence "|]" in the code. So you'd have to say something like

  [xlang| f x = "abc\Cdef" |]

(Notice that we have to avoid the character sequence "|]") which seems a little strange. On the other hand, if "|]" isn't relevant in language X, my proposal has no effect. (Oh, except for backslash itself... that's true... Maybe "\\" should not be an escape sequence... instead "\\|]" should produce a single trailing backslash and close the quote. That'd do it.

comment:5 Changed 4 years ago by snoyberg

  • Cc michael@… added

Just wanted to mention that I actually ran into this issue in the past. The Yesod scaffolder produces a number of Haskell files, some of which include quasiquotes. And in an older version, I specified the code within a quasiquote itself, so I needed a way to allow |] to show up within the content of the QQ.

The work-around I implemented was to hardcode a value (I think , I'm not certain at this point) that would be completely ignored by the quasiquoter. This worked out very well and was simple to implement. Since then, I've moved from QQ to an external file (via qRunIO and readFile) for the same effect and have been able to avoid this hack.

I'm probably one of the biggest (ab)users of QQ, and this hasn't been an issue that's bitten me too much. If it stays "broken" as is, I won't shed any tears. But I would prefer not implementing some kind of arbitrary escape mechanism within QQ. We already use a bit of backslash escaping within Hamlet, and if the QQ mechanism itself *also* did backslash escaping, code would get ugly (not to mention breakage between GHC versions).

comment:6 Changed 4 years ago by JonasDuregard

  • Cc jonas.duregard@… added

I posted this suggestion on the GHC users list, should probably just have posted it here instead:

We introduce an alternative start tag i.e. a quote can start with [q| or [q\| (or something similar). If [q| is used then the quoted text is used verbatim and if the alternative start-tag is used then the escape sequences suggested (
and \|]) are enabled.

Examples:

These are all as in the current implementation:
[q|nothing special|]
[q|a \ in the text|] (uses "a
in the text")

These is an incompatible change:
[q|a \|] |] will be a parse error, is currently "a |] "

These are all parse errors (I believe) that will work:
[q|a \|] will use "a
"
[q\|nothing special|] will use "nothing special"
[q\|a
in the text|] will use "a
in the text"
[q\|both \|] and
|] will use "both |] and
"

Also [q\|a \ in the text|] should be a parse error, i.e. all backslashes must be escape sequences when using the alternative tag.

Arguably the incompatible change could be removed, but then the alternative tag must be used to end a quote with backslash and it seems cleaner to have one fully verbatim and one fully escapable tag.

One advantage of this approach is that we can add more escape sequences in the future without breaking code (since all backslash sequences are reserved in the alternative tag).

comment:7 in reply to: ↑ 4 Changed 4 years ago by YitzGale

...we might want to say

  [xlang|  f x = "abc|]def" |]

With my proposal you'd say

  [xlang| f x = "abc\|]def" |]

With mine you'd say

 [xlang| f x = "abc|\]def" |]

With yours you can't say this at all.

I don't understand why you feel "|\]"
is so much worse than "\|]".

You may say that you simply have to adjust
the xlang parser so that X never requires
you to have the character sequence "|]" in the code.

From the perspective of a user, it is obviously the
case that the usual syntax for xlang will need to be
modified for use within the quasi-quote to avoid "|]".
The question is whether we force this change upon the
user in a particular way, or whether we leave it up to
the implementer of the quasi-quoter.

(Notice that we have to avoid the character sequence "|]")
which seems a little strange.

True. But your proposal at least requires quoting that sequence.
There is no avoiding doing something to it.

(Oh, except for backslash itself... that's
true... Maybe "
" should not be an escape
sequence... instead "
|]" should produce a
single trailing backslash and close the
quote. That'd do it.

Not quite. What if you want a literal closing
oxford preceded by one or more backslashes? Shall
the interpretation of the bracket as literal or
closing depend on whether the number of preceding
backslashes is odd or even?

Michael Snoyman once used "|~]" when he
needed literal brackets in a quasi-quoter. It
seems simpler to work out all the cases when the
quoting characters are placed inside the sandwich,
rather than as a topping.

In any case, I still don't understand why we can't
leave as much as possible up to the author of the
quasi-quoter.

comment:8 Changed 4 years ago by gershomb

  • Cc gershomb@… added

I just know that I really don't like the idea of "
" being an escape sequence. Its already too common an escape sequence elsewhere in the PL world. So if I'm writing, say, inline javascript code, that includes string literals with escaped backslashes, then I have to double-escape those escaped backslashes, which can get ugly quickly. Also,
is the list difference operator in Haskell, so if I'm writing an inline haskell-like language, then I've lost a fairly standard bit of syntax.

Moving to just "\|]" to escape the end of a quote sounds like a good idea. I don't think its a great loss to not have the ability to end a quasiquote with a backslash. Any quasiquoter I could imagine would/should be able to handle trailing whitespace, in which case one could always end the quote with "\ |]".

comment:9 Changed 4 years ago by benmachine

  • Cc haskell@… added

I agree with Yitz. Adding an escaping mechanism adds no expressivity, just complexity and surprise. I spent a little time implementing the |\] escape mechanism, and it's only while testing it that I noticed that \|] existed. I really think the former is more natural and doesn't require any "magic" to support. I don't think lack of uniformity is a sensible argument, since nothing else about quasiquoters is uniform in any case, and if we added library support as suggested then that would at least encourage a common pattern.

comment:10 follow-ups: Changed 4 years ago by simonpj

I see your point about double escaping. Getting rid of the escape mechanism altogether is fine with me -- it's easy to implement!

But can you outline how Kathleen's PadsHaskell will work? She has quasiquotes that involve fragments of Haskell:

  [pads|  blah blah { f x = "wubble" } blah blah |]

where the "blah blah" is in the PADS DSL, but then there's a bit of proper Haskell inside. Now that proper Haskell is parsed with, well, a Haskell parser. So if she want the sequence "|]" in that literal string, what should she do?

Simon

comment:11 in reply to: ↑ 10 Changed 4 years ago by JonasDuregard

Replying to simonpj:

where the "blah blah" is in the PADS DSL, but then there's a bit of proper Haskell inside. Now that proper Haskell is parsed with, well, a Haskell parser. So if she want the sequence "|]" in that literal string, what should she do?

I think what benmachine means by library support is a function like

escaped :: QuasiQuoter -> QuasiQuoter

That takes any quoter and produces a new one which preprocesses the input string, replacing [\| with [|. Ideally we could then write [escaped q|blah |\] blah|], but this requires a relaxed syntax for quoter identifiers. Currently we must use q' = escaped q and [q'|...]. And i believe q' must be defined in a separate module.

I think this solution is great, text is used verbatim unless the user decides to go with a modified quoter (and this should only be done if the verbatim quoter gives a parse error). This way the user only needs to worry about escaping in the very rare occasions when a |] is in the quoted text. Of course if you need both |\] and |] in your text you can use a preprocessor with a more advanced escaping mechanism, with the disadvantage that you need to be more careful with the rest of your text but only in this particular quote!

comment:12 in reply to: ↑ 10 Changed 4 years ago by int-e

Replying to simonpj:

But can you outline how Kathleen's PadsHaskell will work? She has quasiquotes that involve fragments of Haskell:

  [pads|  blah blah { f x = "wubble" } blah blah |]

With Yitz' proposal, she could anticipate this problem and use the predefined convenience unescape function in the definition of the quasi-quoter, or define a custom one if |\], |\\]... clash with the Pads syntax for some reason.

I really like the simplicity of this approach, and it preserves a reasonable amount of flexibility (in particular the choice of escape character can be made by the quasi-quoter author).

comment:13 follow-up: Changed 4 years ago by kfisher

Simon asked me to comment on this thread. I'm afraid I don't fully understand the proposal. Could you explain it in a bit more detail, perhaps working out a small example that shows a bit of Haskell containing a |] contained within the pads/haskell quasi-quoter?

Practically our quoted Haskell is very simple and so is unlikely to contain an '|]' symbol. However, it would be nice to get the corner cases right. I don't want to have to define multiple quasi quoters in multiple files to get this corner case right, which is what it sounds like I might have to do under this proposal. I certainly don't want my end user to see that kind of complexity.

comment:14 in reply to: ↑ 13 Changed 4 years ago by YitzGale

Replying to kfisher:

Could you explain it in a bit more detail, perhaps
working out a small example that shows a bit of
Haskell containing a |] contained within the pads/haskell
quasi-quoter?

Hmm, I wasn't immediately able to find much information
about the pads quasiquoter. But I'll do my best to work
out a simple hypothetical example below.

I don't want to have to define multiple quasi quoters
in multiple files to get this corner case right

You certainly would not need to do that.

Let's say that you currently define the pads
quasiquoter for top-level declarations. So your
definition of the quasiquoter looks something like
this:

pads :: QuasiQuoter
pads = QuasiQuoter { quoteDec  = padsQuoteDec }

padsQuoteDec  = ...

Suppose you'd now like to allow embedded
oxford brackets.

You would simply replace the definition of
padsQuoteDec with something like:

padsQuoteDec = padsQuoteDecBase . escapeBrackets
padsQuoteDecBase = ...

Use the exact same definition for padsQuoteDecBase
as you used before for padsQuoteDec. That's it!

Now, let's say users want to use a pads quasiquoter
to describe data that looks like this:

[| 42 |]

They might then write:

[pads|
  Pstruct oxford_t {
    "[| "; Puint32 num;
    " |\]";
  }
|]

The PADS expression parser will see "|]" where the
user wrote "|\]".

If the users want a literal "|\]" to be part of the
PADS expression, they can write "|\\]". And so on.

The character '\' only has this special pre-processing when
it is directly sandwiched between '|' and ']'.
In any other context, the '\' is passed literally
to the base PADS quasiquoter as before.

Perhaps this syntax for quoting the closing oxford bracket
does not suit you for PADS, e.g., due to interaction with
a special meaning of '\' in PADS itself.
Then instead of the escapeBracket library function used
above, you can define your own syntax for a quoted closing
oxford bracket:

padsQuoteDec = padsQuoteDecBase . myEscapeBracket

myEscapeBracket :: String -> String
myEscapeBracket = ...

padsQuoteDecBase = ...

For example, one quasiquoter author mentioned above that
he used "|~]" to represent "|]".
See my earlier comment in this ticket for an example of
how to write a bracket escape function.

With this proposal, you have complete flexibility to decide
what syntax users use to embed a literal "|]".
You can easily use the built-in syntax as above, or you can define
your own syntax without too much more effort, or you can decide
not to allow them at all.

comment:15 Changed 4 years ago by kfisher

Thanks for the more detailed explanation. That proposal seems fine to me.

comment:16 Changed 4 years ago by YitzGale

Since we mentioned my example bracket quoter again,
I will fix some bugs that were in it,
with apologies:

backslashQuoter :: String -> String
backslashQuoter ('|':s) = '|' :
    case span (== '\\') s of
       (_:bs, ']':cs) -> bs ++ ']' : backslashQuoter cs
       (bs, cs)       -> bs ++       backslashQuoter cs
backslashQuoter (c:cs) = c :         backslashQuoter cs
backslashQuoter _      = []

comment:17 Changed 4 years ago by simonpj@…

commit 70c7eb8e3c10f0841521e164507a52aecda1bd03

Author: Simon Peyton Jones <[email protected]>
Date:   Wed Aug 3 16:16:50 2011 +0100

    Remove all escape handling from quasiquotes; fixes Trac #5348
    
    There is a long discussion in the ticket.

 compiler/parser/Lexer.x |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

comment:18 Changed 4 years ago by simonpj@…

commit 617edd7799da3d5b524385584585e56f25915e5f

Author: Simon Peyton Jones <[email protected]>
Date:   Wed Aug 3 16:44:00 2011 +0100

    Document the (lack of) escape conventions in quasiquotes
    
    See Trac #5348

 docs/users_guide/glasgow_exts.xml |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

comment:19 Changed 4 years ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed

Done. I'm inclined *not* to push this to the 7.2 branch, unless one of you really wants it. It's a change in functionality (albeit a small one) and we are trying to stablise the branch.

Simon

Note: See TracTickets for help on using tickets.