Expose the type of quasiquotes

mentioned in issue #12778

changed weight to 5

added Tbug TemplateHaskell Trac import labels

changed the description

Can you fill in the details? What does parseExp do? What does the finaliser do? What if the type mentions in-scope type variables (existentially or lambda bound)?

What does parseExp do?

Sorry, I meant quoteExp there from Language.Haskell.TH.Quote.

In the case of the java quasiquoter:

it adds a finalizer which generates the java method to call, e.g. Object fresh_name() { return 0.0; }
it inserts at the quasiquote location some foreign calls to have the generated java method invoked, and the result marshaled to Haskell.

What if the type mentions in-scope type variables (existentially or lambda bound)?

In that case, the variables will likely show up in the type returned by reify. We don't care much about that case, as the user of inline-java would be asked to add enough of a type signature to provide as much information as necessary to infer a reasonable type in java.

Also, if you want to explore it, there would be an alternative design where it is possible to avoid introducing the variable __ghc_qq_<hash_loc>. This would involve extending the Quasi monad with a method

qTypeOfCurrentQuasiQuote :: m TH.Type

which must be invoked in a module finalizer and yields the type of the quasiquote, which would require some interaction with the typechecker.

The need to access the type from a finalizer seems a bit roundabout and restrictive. Of course you can't access the quote's type from within the expansion of the quasiquote for causality reasons, but what about a type class and typed quasiquotes?

but what about a type class and typed quasiquotes?

How would this work in order to get type of the quasiquote?

changed the description

This is a change to the source language, so you should really make a GHC propsal for it. That way you would get good feedback.

Is the current TH finaliser design (with the recent modificadtions you put in) written up anywhere? If not, it would be good to do that at the same time.

I Utterly Hate the idea of making up a funny name based on the hash of a location, and then having to guess what it is (inside your function getCurrentQuasiQuoteName). Yurgh.

Could you not arrange that your Java parser, instead of producing some Haskell expression e, produced the Haskell expression let my_name = e in my_name, where my_name is a TH name that you generate. Now you know what it is!

But now you'll tell me that it's not in scope in the typechecker's environment when it encounters the quasi-quotes... but then quasi-quotes run in the renamer anyway. I'm very lost as you can see, but the current design just smells wrong to me. There are lots of clever people around GHC. Perhaps if you explain the original problem, and your current solution, someone may have a good idea.

Trac metadata

Trac field	Value
Differential revisions	- → D3610

I just submitted the patch with the work in progress I have. But I'm fine further discussing a proposal.

I Utterly Hate the idea of making up a funny name based on the hash of a location ...

Well, the programmer doesn't have to do it. The function getCurrentQuasiQuoteName would be provided in the template-haskell package for that sake.

I hope the test in https://phabricator.haskell.org/D3610 makes clear how it works.

And I agree that it might help discussing the use case further.

I don't submit a GHC proposal because we don't have yet a good solution to propose, we only have a problem and a couple of leads to investigate.

I state the problem here in a general form. An account that Mathieu did a while ago can be found in ticket:12778#comment:135621.

Template Haskell quasiquotes allow to embed other languages in Haskell programs. One can use this ability to generate and compile code in a foreign language and then have the result invoked from Haskell.

For quasiquotations to be typesafe though, the implementation of the quasiquoter needs to tell the foreign compiler which types are expected of the antiquoted variables and of the returned value. Quasiquoters currently have no way to find the expected return type if the programmer does not supply it explicitly.

Let's consider the following example using inline-java. The package inline-java implements a quasiquote which allows to embed fragments of Java programs in Haskell modules.

jappendWorld :: Text -> IO Text
jappendWorld = [java| $x + " World!" |]

This generates some Java code that is compiled by a Java compiler. It also generates some Haskell code which marshals values between Java and Haskell and invokes the result of compiling the Java code.

The java code that is generated looks like

    class ClassFreshName {
        public static Object freshName(String x) {
            return x + " World!";
        }
    }

The quasiquoter knows that the antiquote x has type String in Java, because it knows that x has the type Text in Haskell and it can marshal the values between the two types. The quasiquoter can find the Haskell type of x via the Template Haskell function reify as implemented in #11832 (closed).

However, the quasiquoter has currently no way to find the expected return type. Therefore, it assumes that any return value is of the catch-all Java type java.lang.Object. This is problematic, because it is up to the programmer to use the return value in a way appropriate to its type. If the value returned by the quasiquote does not match the type expected by the programmer on the Haskell side, the program has undefined behavior.

Solutions:

Have the programmer supply the return type, this is how the package inline-c

works to embed C programs in Haskell. This involves effort on the part

of the user to write the return type in every quasiquotation.

Use typed splices instead of quasiquotes. e.g.

$$(java [string| $x + " World!" |])

Typed splices do expose the expected type to the implementation, and the generated code

could be tailored by using type classes. This is rather clumsy to write while

quasiquotes are the best fit.

Implement typed quasiquotes, so we can write

[java|| $x + " World!" ||]

which desugars to

$$(typedQuoteExp java " $x + \" World!\")

What this ticket proposed.
Similar to (4), but avoid introducing a name with a hash of the location.

For this, we extend the type checker so when it finds a splice, it adds a

binding to the typing environment which has the type of the splice and an

identifier uniquely associated to the splice. Calls to reify can then

find this binding and yield the type in the same fashion that it is done

with antiquoted variables.

It might no be possible to make neither of (4) or (5) work with splices appearing in quotation brackets as the splice points are not recorded. It is the same challenge we found with #12778.

Let me know if this should still be sent to GHC proposals. Besides that, any thoughts or advice?

Looks like typed splices and quasiquotes will pose some gotchas.

-- Q.hs
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
module Q where

import Data.Proxy
import Language.Haskell.TH.Syntax

class C a where
  method :: Proxy a -> Q (TExp a)

instance C Int where
  method _ = [|| 1 :: Int ||]

instance C Char where
  method _ = [|| 'a' ||]


q :: forall a. C a => Q (TExp a)
q = method (Proxy :: Proxy a)

-- testQ.hs
{-# LANGUAGE TemplateHaskell #-}
import Q
main :: IO ()
main = print ($$(q) `asTypeOf` (0 :: Int))

$ ghc --make testQ.hs
[1 of 2] Compiling Q                ( Q.hs, Q.o )
[2 of 2] Compiling Main             ( testQ.hs, testQ.o )

testQ.hs:6:18: error:
    • Ambiguous type variable ‘a0’ arising from a use of ‘q’
      prevents the constraint ‘(C a0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘a0’ should be.
      These potential instances exist:
        instance C Char -- Defined at Q.hs:14:10
        instance C Int -- Defined at Q.hs:11:10
    • In the expression: q
      In the Template Haskell splice $$(q)
      In the first argument of ‘asTypeOf’, namely ‘$$(q)’
  |
6 | main = print ($$(q) `asTypeOf` (0 :: Int))
  |                  ^

Looks like typed splices and quasiquotes will pose some gotchas.

Ah, yes, I understand what's happening here, and (once again) it's awkward.

GHC has to compile and run the term inside the splice, here $$(q). But since q :: forall a. C a => Q (TExp a), looking at $$(q) in isolation we just see that q has type Q (TExp alpha) with constraint C alpha, but we don't know what alpha is. It'll ultimately be fixed by the asTypeOf (0::Int) part, but not yet.

If you change it to $$(q) :: Int then it does work because the information about the Int type is pushed inwards from the type signature. That is horribly delicate, and I had not realised it before.

The robust way to do it would be $$(q :: Q (TExp Int)), putting all the type info inside the splice.

This doesn't happen for untyped splices because they don't expect to get any type info from the context.

I think I should probably stop pushing type info from the context into a typed splice, so that it would fail reliably.

I don't submit a GHC proposal because we don't have yet a good solution to propose

Indeed. And I don't feel comfortable about any of the solutions you propose, because they all feel so specific and ad-hoc.

Is there anything we could do to have a more basic mechanism that is also more flexible? It seems that, for a given top-level splice (or quasiquote), you want to have the opportunity to do some arbitrary work "later", when type checking is complete; a bit like a core-to-core pass that works through those splices.

Suppose you could say

addPostProcessor :: (CoreExpr -> IO (CoreExpr, [CoreBind])) -> Q ()

So addPostProcessor f would say

When type checking and desugaring is complete, please run f on the spliced-in expression.
f e will return a new expression (of the same type) to replace it with (often just e).
...and perhaps some new top-level bindings. (Might need more too...foreign stubs etc.)

The nice thing about this is that when we are in Core every Id has its type "inside" it; we don't need to consult any type environment etc, which has given us a lot of trouble with the addModFinaliser stuff (which this would replace).

The type of the entire splice (the original reason for this ticket) is also readily available, via exprType.

Just thinking aloud. I don't want us to get stuck in a deeper and deeper pile of sticking plasters.

addPostProcessor :: (CoreExpr -> IO (CoreExpr, [CoreBind])) -> Q ()

Will this require linking ghc with the application? CoreExpr and CoreBind are not exposed in a leaner library AFAIK. If it were not for this problem, it could be useful. Only the compiler needs to execute this code, so perhaps there is a way.

I just proposed an alternative in ticket:12778#comment:137690

Will this require linking ghc with the application?

Ah yes, I suppose it would. Would that matter? I suppose it'd make the binary bigger.

I suppose that one could imagine modules that guarantee to contain only compile-time code, and hence which do not need to be linked into the final executable.

Keep thinking! I'm seeking a single, simple mechanism that'll solve multiple problems at once.

I think this proposal would stumble with the same rock that #12778 and approaches (4) and (5). Given a nested splice, how do you associate it with a post-processor added with addPostProcessor?

This approach doesn't look very different from using compiler plugins. If the user can annotate the splice locations somehow, a plugin pass could spot them and add the necessary stubs built from the type information found in the Core program.

Given a nested splice, how do you associate it with a post-processor

Well, GHC would apply the post-procssor to the expression for the top-level splice. It would pass the expression, so no need for any other association.

I don't understand the nested-splice issue.

Yes, it's a bit like a plugin. But then TH splices are already a bit like a plugin: both provide code that the compiler links dynamically and runs at compile time.

I'm a bit ouf of my depth. Is anyone else interested in this? Designing for a single use-case is sometimes justified, but it's better if there are more.

I don't understand the nested-splice issue.

Suppose we have solved this problem for top-level splices, [java| ... |] gets the types it needs.

Then one day someone tries [| ... [java| ... |] ... |], and finds that it fails because the java quasiquoter gets the type of the top-level splice instead of its own type which occurs nested in the outer brackets.

The solution discussed in ticket:12778#comment:137690 is more attractive in this regard. It says: design any feature to work with top-level splices. Presto! It will also work with nested splices because they become top-level splices before running.

Here's a more general approach that relies on plugins.

addCorePlugin :: GhcPlugins.Plugin -> Q ()

This arranges for a plugin to be inserted in the core2core passes. It saves the user the trouble of adding it at the top of the module:

{-# OPTIONS_GHC -fplugin=... #-}

The plugin can find the result of quasiquotations by making a pass over the module looking for some special function inserted by the quasiquoter for that sake.

I have observed that GHC does not link the GHC api into the final executable when using plugins, I would hope that it doesn't do it either if we add it this way.

Implemented

-- | Adds a core plugin to the compilation pipeline.
--
-- @addCorePlugin m@ has almost the same effect as passing @-fplugin=m@ to ghc
-- in the command line. The major difference is that the plugin module @m@
-- must not belong to the current package. When TH executes, it is too late
-- to tell the compiler that we needed to compile first a plugin module in the
-- current package.
addCorePlugin :: String -> Q ()

See the phabricator diff for details.

Trac metadata

Trac field	Value
Differential revisions	D3610 → D3610, D3821

mentioned in commit 17558690

closed

changed milestone to %8.4.1

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedFixed

added Pnormal label

Expose the type of quasiquotes

Child items 0

Activity