Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#10796 closed bug (fixed)

Illegal data constructor name: `fromList' ... When splicing a TH expression

Reported by: erisco Owned by: RyanGlScott
Priority: normal Milestone: 8.0.1
Component: Template Haskell Version: 7.8.3
Keywords: newcomer Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D1313
Wiki Page:

Description

At the end is a log of my ghci session reproducing the issue. It is expected that the splice should succeed, but GHC complains "Illegal data constructor name: `fromList' ... When splicing a TH expression".

Full definitions being used can be found at http://hackage.haskell.org/package/regex-tdfa-quasiquoter-0.2.0.0 . Briefly, this is the splice being used.

[e|patternToRegex $patternExp $compOptsExp $execOptsExp|]
where
patternExp = dataToExpQ (const Nothing) pattern
compOptsExp = dataToExpQ (const Nothing) compOpts
execOptsExp = dataToExpQ (const Nothing) execOpts

Data and Typeable instances are derived (using -XStandaloneDeriving -XDeriveDataTypeable) as orphan instances in the Internal module.

The issue occurs when using bracket expressions (in POSIX regular expressions). There may be other cases but I have not uncovered them.

> ghci
GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
>>> :set -XQuasiQuotes
>>> import Text.Regex.TDFA.QuasiQuoter
>>> :t [re|;[a]|]
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package array-0.5.0.0 ... linking ... done.
Loading package deepseq-1.3.0.2 ... linking ... done.
Loading package bytestring-0.10.4.0 ... linking ... done.
Loading package containers-0.5.5.1 ... linking ... done.
Loading package mtl-2.1.3.1 ... linking ... done.
Loading package regex-base-0.93.2 ... linking ... done.
Loading package text-1.2.0.3 ... linking ... done.
Loading package parsec-3.1.9 ... linking ... done.
Loading package regex-tdfa-1.2.0 ... linking ... done.
Loading package pretty-1.1.1.1 ... linking ... done.
Loading package template-haskell ... linking ... done.
Loading package regex-tdfa-quasiquoter-0.2.0.0 ... linking ... done.

<interactive>:1:5:
    Illegal data constructor name: `fromList'
    When splicing a TH expression:
      Text.Regex.TDFA.TDFA.patternToRegex (GHC.Tuple.(,) (Text.Regex.TDFA.Pattern.POr ((GHC.Types.:) (Text.Regex.TDFA.Pattern.PConcat ((GHC.Types.:) (Text.Regex.TDFA.Pattern.PAny (Text.Regex.TDFA.Common.DoPa 1) (Text.Regex.TDFA.Pattern.PatternSet (Data.Maybe.Just (Data.Set.Base.fromList ((GHC.Types.:) 'a' GHC.Types.[]))) Data.Maybe.Nothing Data.Maybe.Nothing Data.Maybe.Nothing)) GHC.Types.[])) GHC.Types.[])) (GHC.Tuple.(,) 0 (Text.Regex.TDFA.Common.DoPa 1))) (Text.Regex.TDFA.Common.CompOption GHC.Types.True GHC.Types.True GHC.Types.True GHC.Types.True GHC.Types.False) (Text.Regex.TDFA.Common.ExecOption GHC.Types.True)
>>>

Change History (13)

comment:1 Changed 2 years ago by simonpj

This could be the regex-tdfa-quasiquoter package generating illegal Haskell, no?

comment:2 Changed 2 years ago by goldfire

A little poking through shows me that this is an infelicity in the containers library. The splice above calls Language.Haskell.TH.Quote.dataToQa, which performs lifting (in the sense of TH's lift operation and Lift class) based on a Data instance. The data involved includes a Data.Set.Set. And Set's Data instance reads, in part

instance (Data a, Ord a) => Data (Set a) where
  toConstr _     = fromListConstr

fromListConstr :: Constr
fromListConstr = mkConstr setDataType "fromList" [] Prefix

Note that the "constructor" is named fromList. But this is a lie, of course. It's done to preserve abstraction, which is a laudable goal, so I'm not calling it a bug.

An easy solution here would be to generalize dataToQa to check if it's given a data constructor or just a function and react accordingly. It really should be getting a constructor, but there seems to be no harm in allowing Data instances to lie like this and continue gracefully.

comment:3 Changed 2 years ago by goldfire

Keywords: newcomer added
Milestone: 7.12.1

Milestoning this so that I return to it when dispatching a bunch of TH tickets this fall. But this would be easy for just about anyone to tackle. Just update dataToQa!

comment:4 Changed 2 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:5 Changed 2 years ago by thomie

Architecture: x86_64 (amd64)Unknown/Multiple
Operating System: WindowsUnknown/Multiple

Not just Windows.

comment:6 Changed 2 years ago by RyanGlScott

One thing isn't clear to me about the changes proposed. We could certainly modify dataToQa so that dataToExpQ (and thus liftData) works with both function and constructor names. What about dataToPatQ, though? There's no way to pattern-match on an expression directly, so what would this translate to?

f :: Set Char -> Set Char
f s@($(dataToPatQ (const Nothing) (fromList "test"))) = s

The only way I could envision this working is if it were translated to something like:

f :: Set Char -> Set Char
f x | x == fromList "test" = x

which is reminiscent of how pattern-matching on literals works, I suppose.

comment:7 Changed 2 years ago by goldfire

Good point. But I'm not terribly concerned here. The current behavior of dataToPatQ in this situation (when it sees a function instead of a constructor) is to create some bogus Haskell. The new behavior of dataToPatQ in this situation will be to create some bogus Haskell. The bogus Haskell changes, but I don't think it's any worse. And, as you say, with this change, we could improve dataToExpQ to create correct Haskell.

Perhaps better would be for calls to dataToQa to know when it's working with a pattern and fail more gracefully. That would indeed be an improvement.

But I don't think that we should try too hard here. The Data instance involved is lying, calling something a constructor when it's not. If we can accommodate the lie easily -- as we can in dataToExpQ -- then great. Otherwise, I think it's OK to fail.

comment:8 Changed 2 years ago by RyanGlScott

Differential Rev(s): Phab:D1313
Owner: set to RyanGlScott

I can agree that it's probably not worth the effort to make such patterns work, especially since we'd have to convert the subpatterns to expressions to make this work, which would probably require a rewrite of dataToQa... I'll just stick to the dataToExpQ case.

Phab:D1313 includes a simple test case. I also compiled regex-tdfa-quasiquoter against these changes, and [re|;[a]|] now splices successfully.

Last edited 2 years ago by RyanGlScott (previous) (diff)

comment:9 Changed 2 years ago by RyanGlScott

On the discussion for Phab:D1313, Richard noted that in order to fix this bug properly, it would be necessary for template-haskell to know precisely which identifiers correspond to variables, and which correspond to data constructors. This can be surprisingly tricky, since a litmus test for variables vs. constructors must take into account infix symbols and special characters (e.g., Japanese katakana or hiragana).

Such a task seems best left to a function separate from dataToQa, but which one? There's startsVarSym, startsVarId, startsConSym, and startsConId from Lexeme in GHC, but template-haskell can't depend on GHC. We ought not move these to base either, since that we need this functionality to compile stage 1.

Therefore, it seems apt to move this to a package which both ghc and template-haskell could depend on. Luckily, there's some momentum behind this already: Simon PJ suggested reappropriating the bin-package-db package to become a library which contains functionality that is needed across multiple GHC boot libraries (including GHC itself). Less fortunately, we'd probably need to give it a new name to reflect this.

Here is what I propose to resolve this quagmire:

  1. Rename bin-package-db to ghc-common (other naming suggestions are welcome).
  2. Move startsVarSym, startsVarId, startsConSym, startsConId, startsVarSymASCII, and isVarSymChar from Lexeme to ghc-common. There are other things that could be moved, but these would be enough to fix this ticket, and don't rely on a GHC internal data type (e.g., FastString).
  3. Make ghc and template-haskell depend on ghc-common. No other changes to ghc should be necessary.
  4. Use startsVarSym and startsVarId to check if a string corresponds to a variable name within template-haskell.

comment:10 Changed 2 years ago by goldfire

I approve this plan.

Name suggestion: ghc-boot. It reflects the reality of the situation: it's a package needed "before" ghc proper. And it seems to wave a big flag saying "I'm an internal package. Don't look at me!"

comment:11 Changed 2 years ago by Austin Seipp <austin@…>

In d2f9972/ghc:

Make dataToQa aware of Data instances which use functions to implement toConstr

Trac #10796 exposes a way to make `template-haskell`'s `dataToQa` function
freak out if using a `Data` instance that produces a `Constr` (by means of
`toConstr`) using a function name instead of a data constructor name. While
such `Data` instances are somewhat questionable, they are nevertheless present
in popular libraries (e.g., `containers`), so we can at least make `dataToQa`
aware of their existence.

In order to properly distinguish strings which represent variables (as opposed
to data constructors), it was necessary to move functionality from `Lexeme` (in
`ghc`) to `GHC.Lexeme` in a new `ghc-boot` library (which was previously named
`bin-package-db`).

Reviewed By: goldfire, thomie

Differential Revision: https://phabricator.haskell.org/D1313

GHC Trac Issues: #10796

comment:12 Changed 2 years ago by thoughtpolice

Resolution: fixed
Status: newclosed

Thanks Ryan - I think this should be handled now, so closing this.

comment:13 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 3340fe01/ghc:

Build system: fix `make -j1` (#10973)

There are multiple hacks all over the build system to account for the
fact that the ghc package uses different build subdirectories
(stage1/stage2) than the other packages (dist/dist-install).

One such hack filtered on 'ghc%', with the intention of filtering the
ghc package only. After renaming bin-package-db to ghc-boot
(d2f9972a35ce05ceb8a78893e433ef1df06f73ef, Phab:D1313, #10796), ghc-boot
also got caught in the hack, which broke the build when running without
parallelism.

This patch replaces the before mentioned hack by a different one, such
that filtering on 'ghc%' is no longer necessary. See Note [inconsistent
distdirs].

Reviewed by: austin

Differential Revision: https://phabricator.haskell.org/D1333
Note: See TracTickets for help on using tickets.