Opened 8 months ago

Closed 3 months ago

#13608 closed bug (fixed)

Expose the type of quasiquotes

Reported by: facundo.dominguez Owned by: facundo.dominguez
Priority: normal Milestone: 8.4.1
Component: Template Haskell Version: 8.0.1
Keywords: QuasiQuotes Cc: simonpj, mboes, goldfire, bgamari
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: 12778 Differential Rev(s): Phab:D3610 Phab:D3821
Wiki Page:

Description (last modified by facundo.dominguez)

It happens with inline-java that

[java| 0.0 |]

produces a static method in java

Object fresh_name() { return 0.0; }

where

double fresh_name() { return 0.0; }

would be preferred. This is better because the user would get an error if the expression does not match the expected result type.

Examining the type that GHC expects of the quasiquote would allow to build the later variant. However, GHC provides no access to it.

The quasiquote desugars as follows:

  [java| 0.0 |]
====>
  $(quoteExp java " 0.0 ")

We have experimented with a patch that desugars instead like

  [java| 0.0 |]
====>
  let __ghc_qq_<loc_hash> = $(quoteExp java " 0.0 ")
   in __ghc_qq_<loc_hash>

The quasiquoter can learn then of the expected type by doing:

do qqName <- getCurrentQuasiQuoteName
   addModFinalizer $ reify qqName >>= ...
 where
   getCurrentQuasiQuoteName :: Q Name
   getCurrentQuasiQuoteName = do
     loc <- TH.location
     return $ mkName $ "__ghc_qq_" ++ hash loc

Where getCurrentQuasiQuoteName can be provided in Language.Haskell.TH.Quote.

This addresses the same concern that I initially intended to solve with the more complex proposal in #12778.

I hope to submit a patch this week, but it would be useful if people want to provide some feedback meanwhile.

Change History (24)

comment:1 Changed 8 months ago by facundo.dominguez

Owner: set to facundo.dominguez

comment:2 Changed 8 months ago by facundo.dominguez

Description: modified (diff)

comment:3 Changed 8 months ago by simonpj

Can you fill in the details? What does parseExp do? What does the finaliser do? What if the type mentions in-scope type variables (existentially or lambda bound)?

comment:4 Changed 8 months ago by facundo.dominguez

What does parseExp do?

Sorry, I meant quoteExp there from Language.Haskell.TH.Quote.

In the case of the java quasiquoter:

  • it adds a finalizer which generates the java method to call, e.g. Object fresh_name() { return 0.0; }
  • it inserts at the quasiquote location some foreign calls to have the generated java method invoked, and the result marshaled to Haskell.

What if the type mentions in-scope type variables (existentially or lambda bound)?

In that case, the variables will likely show up in the type returned by reify. We don't care much about that case, as the user of inline-java would be asked to add enough of a type signature to provide as much information as necessary to infer a reasonable type in java.

Also, if you want to explore it, there would be an alternative design where it is possible to avoid introducing the variable __ghc_qq_<hash_loc>. This would involve extending the Quasi monad with a method

qTypeOfCurrentQuasiQuote :: m TH.Type

which must be invoked in a module finalizer and yields the type of the quasiquote, which would require some interaction with the typechecker.

Last edited 7 months ago by facundo.dominguez (previous) (diff)

comment:5 Changed 8 months ago by rwbarton

The need to access the type from a finalizer seems a bit roundabout and restrictive. Of course you can't access the quote's type from within the expansion of the quasiquote for causality reasons, but what about a type class and typed quasiquotes?

comment:6 Changed 7 months ago by facundo.dominguez

but what about a type class and typed quasiquotes?

How would this work in order to get type of the quasiquote?

comment:7 Changed 7 months ago by facundo.dominguez

Description: modified (diff)

comment:8 Changed 7 months ago by facundo.dominguez

Description: modified (diff)

comment:9 Changed 7 months ago by simonpj

This is a change to the source language, so you should really make a GHC propsal for it. That way you would get good feedback.

Is the current TH finaliser design (with the recent modificadtions you put in) written up anywhere? If not, it would be good to do that at the same time.

I Utterly Hate the idea of making up a funny name based on the hash of a location, and then having to guess what it is (inside your function getCurrentQuasiQuoteName). Yurgh.

Could you not arrange that your Java parser, instead of producing some Haskell expression e, produced the Haskell expression let my_name = e in my_name, where my_name is a TH name that you generate. Now you know what it is!

But now you'll tell me that it's not in scope in the typechecker's environment when it encounters the quasi-quotes... but then quasi-quotes run in the renamer anyway. I'm very lost as you can see, but the current design just smells wrong to me. There are lots of clever people around GHC. Perhaps if you explain the original problem, and your current solution, someone may have a good idea.

comment:10 Changed 7 months ago by facundo.dominguez

Differential Rev(s): Phab:D3610
Status: newpatch

comment:11 Changed 7 months ago by facundo.dominguez

I just submitted the patch with the work in progress I have. But I'm fine further discussing a proposal.

I Utterly Hate the idea of making up a funny name based on the hash of a location ...

Well, the programmer doesn't have to do it. The function getCurrentQuasiQuoteName would be provided in the template-haskell package for that sake.

I hope the test in https://phabricator.haskell.org/D3610 makes clear how it works.

And I agree that it might help discussing the use case further.

Last edited 7 months ago by facundo.dominguez (previous) (diff)

comment:12 Changed 7 months ago by facundo.dominguez

I don't submit a GHC proposal because we don't have yet a good solution to propose, we only have a problem and a couple of leads to investigate.

I state the problem here in a general form. An account that Mathieu did a while ago can be found in ticket:12778#comment:8.

Template Haskell quasiquotes allow to embed other languages in Haskell programs. One can use this ability to generate and compile code in a foreign language and then have the result invoked from Haskell.

For quasiquotations to be typesafe though, the implementation of the quasiquoter needs to tell the foreign compiler which types are expected of the antiquoted variables and of the returned value. Quasiquoters currently have no way to find the expected return type if the programmer does not supply it explicitly.

Let's consider the following example using inline-java. The package inline-java implements a quasiquote which allows to embed fragments of Java programs in Haskell modules.

jappendWorld :: Text -> IO Text
jappendWorld = [java| $x + " World!" |]

This generates some Java code that is compiled by a Java compiler. It also generates some Haskell code which marshals values between Java and Haskell and invokes the result of compiling the Java code.

The java code that is generated looks like

    class ClassFreshName {
        public static Object freshName(String x) {
            return x + " World!";
        }
    }

The quasiquoter knows that the antiquote x has type String in Java, because it knows that x has the type Text in Haskell and it can marshal the values between the two types. The quasiquoter can find the Haskell type of x via the Template Haskell function reify as implemented in ticket:11832.

However, the quasiquoter has currently no way to find the expected return type. Therefore, it assumes that any return value is of the catch-all Java type java.lang.Object. This is problematic, because it is up to the programmer to use the return value in a way appropriate to its type. If the value returned by the quasiquote does not match the type expected by the programmer on the Haskell side, the program has undefined behavior.

Solutions:

  1. Have the programmer supply the return type, this is how the package inline-c works to embed C programs in Haskell. This involves effort on the part of the user to write the return type in every quasiquotation.
  1. Use typed splices instead of quasiquotes. e.g.
   $$(java [string| $x + " World!" |])

Typed splices do expose the expected type to the implementation, and the generated code could be tailored by using type classes. This is rather clumsy to write while quasiquotes are the best fit.

  1. Implement typed quasiquotes, so we can write
    [java|| $x + " World!" ||]

which desugars to

    $$(typedQuoteExp java " $x + \" World!\")
  1. What this ticket proposed.
  1. Similar to (4), but avoid introducing a name with a hash of the location. For this, we extend the type checker so when it finds a splice, it adds a binding to the typing environment which has the type of the splice and an identifier uniquely associated to the splice. Calls to reify can then find this binding and yield the type in the same fashion that it is done with antiquoted variables.

It might no be possible to make neither of (4) or (5) work with splices appearing in quotation brackets as the splice points are not recorded. It is the same challenge we found with ticket:12778.

Let me know if this should still be sent to GHC proposals. Besides that, any thoughts or advice?

Last edited 7 months ago by facundo.dominguez (previous) (diff)

comment:13 Changed 7 months ago by facundo.dominguez

Looks like typed splices and quasiquotes will pose some gotchas.

-- Q.hs
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
module Q where

import Data.Proxy
import Language.Haskell.TH.Syntax

class C a where
  method :: Proxy a -> Q (TExp a)

instance C Int where
  method _ = [|| 1 :: Int ||]

instance C Char where
  method _ = [|| 'a' ||]


q :: forall a. C a => Q (TExp a)
q = method (Proxy :: Proxy a)
-- testQ.hs
{-# LANGUAGE TemplateHaskell #-}
import Q
main :: IO ()
main = print ($$(q) `asTypeOf` (0 :: Int))
$ ghc --make testQ.hs
[1 of 2] Compiling Q                ( Q.hs, Q.o )
[2 of 2] Compiling Main             ( testQ.hs, testQ.o )

testQ.hs:6:18: error:
    • Ambiguous type variable ‘a0’ arising from a use of ‘q’
      prevents the constraint ‘(C a0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘a0’ should be.
      These potential instances exist:
        instance C Char -- Defined at Q.hs:14:10
        instance C Int -- Defined at Q.hs:11:10
    • In the expression: q
      In the Template Haskell splice $$(q)
      In the first argument of ‘asTypeOf’, namely ‘$$(q)’
  |
6 | main = print ($$(q) `asTypeOf` (0 :: Int))
  |                  ^

comment:14 Changed 6 months ago by simonpj

Looks like typed splices and quasiquotes will pose some gotchas.

Ah, yes, I understand what's happening here, and (once again) it's awkward.

GHC has to compile and run the term inside the splice, here $$(q). But since q :: forall a. C a => Q (TExp a), looking at $$(q) in isolation we just see that q has type Q (TExp alpha) with constraint C alpha, but we don't know what alpha is. It'll ultimately be fixed by the asTypeOf (0::Int) part, but not yet.

If you change it to $$(q) :: Int then it does work because the information about the Int type is pushed inwards from the type signature. That is horribly delicate, and I had not realised it before.

The robust way to do it would be $$(q :: Q (TExp Int)), putting all the type info inside the splice.

This doesn't happen for untyped splices because they don't expect to get any type info from the context.

I think I should probably stop pushing type info from the context into a typed splice, so that it would fail reliably.

comment:15 Changed 6 months ago by simonpj

I don't submit a GHC proposal because we don't have yet a good solution to propose

Indeed. And I don't feel comfortable about any of the solutions you propose, because they all feel so specific and ad-hoc.

Is there anything we could do to have a more basic mechanism that is also more flexible? It seems that, for a given top-level splice (or quasiquote), you want to have the opportunity to do some arbitrary work "later", when type checking is complete; a bit like a core-to-core pass that works through those splices.

Suppose you could say

addPostProcessor :: (CoreExpr -> IO (CoreExpr, [CoreBind])) -> Q ()

So addPostProcessor f would say

  • When type checking and desugaring is complete, please run f on the spliced-in expression.
  • f e will return a new expression (of the same type) to replace it with (often just e).
  • ...and perhaps some new top-level bindings. (Might need more too...foreign stubs etc.)

The nice thing about this is that when we are in Core every Id has its type "inside" it; we don't need to consult any type environment etc, which has given us a lot of trouble with the addModFinaliser stuff (which this would replace).

The type of the entire splice (the original reason for this ticket) is also readily available, via exprType.

Just thinking aloud. I don't want us to get stuck in a deeper and deeper pile of sticking plasters.

Last edited 6 months ago by simonpj (previous) (diff)

comment:16 Changed 6 months ago by facundo.dominguez

addPostProcessor :: (CoreExpr -> IO (CoreExpr, [CoreBind])) -> Q ()

Will this require linking ghc with the application? CoreExpr and CoreBind are not exposed in a leaner library AFAIK. If it were not for this problem, it could be useful. Only the compiler needs to execute this code, so perhaps there is a way.

I just proposed an alternative in ticket:12778#comment:11

Last edited 6 months ago by facundo.dominguez (previous) (diff)

comment:17 Changed 6 months ago by simonpj

Will this require linking ghc with the application?

Ah yes, I suppose it would. Would that matter? I suppose it'd make the binary bigger.

I suppose that one could imagine modules that guarantee to contain only compile-time code, and hence which do not need to be linked into the final executable.

Keep thinking! I'm seeking a single, simple mechanism that'll solve multiple problems at once.

comment:18 Changed 6 months ago by facundo.dominguez

I think this proposal would stumble with the same rock that ticket:12778 and approaches (4) and (5). Given a nested splice, how do you associate it with a post-processor added with addPostProcessor?

This approach doesn't look very different from using compiler plugins. If the user can annotate the splice locations somehow, a plugin pass could spot them and add the necessary stubs built from the type information found in the Core program.

Last edited 6 months ago by facundo.dominguez (previous) (diff)

comment:19 Changed 6 months ago by simonpj

Given a nested splice, how do you associate it with a post-processor

Well, GHC would apply the post-procssor to the expression for the top-level splice. It would pass the expression, so no need for any other association.

I don't understand the nested-splice issue.

Yes, it's a bit like a plugin. But then TH splices are already a bit like a plugin: both provide code that the compiler links dynamically and runs at compile time.

I'm a bit ouf of my depth. Is anyone else interested in this? Designing for a single use-case is sometimes justified, but it's better if there are more.

comment:20 Changed 6 months ago by facundo.dominguez

I don't understand the nested-splice issue.

Suppose we have solved this problem for top-level splices, [java| ... |] gets the types it needs.

Then one day someone tries [| ... [java| ... |] ... |], and finds that it fails because the java quasiquoter gets the type of the top-level splice instead of its own type which occurs nested in the outer brackets.

The solution discussed in ticket:12778#comment:11 is more attractive in this regard. It says: design any feature to work with top-level splices. Presto! It will also work with nested splices because they become top-level splices before running.

Last edited 6 months ago by facundo.dominguez (previous) (diff)

comment:21 Changed 5 months ago by facundo.dominguez

Here's a more general approach that relies on plugins.

addCorePlugin :: GhcPlugins.Plugin -> Q ()

This arranges for a plugin to be inserted in the core2core passes. It saves the user the trouble of adding it at the top of the module:

{-# OPTIONS_GHC -fplugin=... #-}

The plugin can find the result of quasiquotations by making a pass over the module looking for some special function inserted by the quasiquoter for that sake.

I have observed that GHC does not link the GHC api into the final executable when using plugins, I would hope that it doesn't do it either if we add it this way.

comment:22 Changed 4 months ago by facundo.dominguez

Differential Rev(s): Phab:D3610Phab:D3610 Phab:D3821

Implemented

-- | Adds a core plugin to the compilation pipeline.
--
-- @addCorePlugin m@ has almost the same effect as passing @-fplugin=m@ to ghc
-- in the command line. The major difference is that the plugin module @m@
-- must not belong to the current package. When TH executes, it is too late
-- to tell the compiler that we needed to compile first a plugin module in the
-- current package.
addCorePlugin :: String -> Q ()

See the phabricator diff for details.

comment:23 Changed 3 months ago by Ben Gamari <ben@…>

In 1755869/ghc:

Implement TH addCorePlugin.

This allows template-haskell code to add plugins to the compilation
pipeline. Otherwise, the user would have to pass -fplugin=... to ghc.

For now, plugin modules in the current package can't be used. This is
because when TH runs, it is too late to let GHC know that the plugin
modules needed to be compiled first.

Test Plan: ./validate

Reviewers: simonpj, bgamari, austin, goldfire

Reviewed By: bgamari

Subscribers: angerman, rwbarton, mboes, thomie

GHC Trac Issues: #13608

Differential Revision: https://phabricator.haskell.org/D3821

comment:24 Changed 3 months ago by bgamari

Milestone: 8.4.1
Resolution: fixed
Status: patchclosed
Note: See TracTickets for help on using tickets.