Opened 9 years ago

Closed 4 years ago

Last modified 9 months ago

#2507 closed feature request (fixed)

quotation characters in error messages

Reported by: Isaac Dupree
Owned by:
Priority: lowest
Milestone: 7.8.1
Component: Compiler
Version: 6.8.3
Keywords:
Cc: dterei
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #2811, #3398
Differential Rev(s):
Wiki Page:

Description

(wasn't there a ticket for this already?)

Currently identifiers etc. are quoted like this, with the "grave accent" and symmetric single-quote characters:

    Ambiguous type variable `m' in the constraint:
      `Monad m' arising from a use of `>>=' at gw.hs:6:47-71

This is not only an incorrect use of the "grave accent"; it can also be confusing when an identifier name contains the prime symbol, which is the same character used here to end the quote.

What should we do? Well, I just noticed that gcc-4.2.3 uses the Unicode begin-single-quote and end-single-quote characters for the purpose (and it actually looks quite nice on my terminal). If GCC was willing to do it, perhaps we should be too! To be precise, it uses them in my default locale, "en_US.UTF-8", which must have been the Ubuntu default that I didn't even remember I had. With env LANG=C, GCC emits ASCII single-quotes for both the begin and the end.

    > cat errory.c
    syntax error
    > gcc errory.c
    errory.c:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘error’
    > env LANG=C gcc errory.c
    errory.c:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'error'

I propose copying GCC's behavior (which might involve first looking into exactly what its behavior is in the general case).

Attachments (1)

0001-Use-U-2018-instead-of-U-201B-quote-mark-in-compiler-.patch.gz (105.6 KB) - added by hvr 4 years ago.


Change History (39)

comment:1 Changed 9 years ago by simonpj

difficulty: Unknown

I avoid discussions involving the word "unicode". I'm certainly not in principle opposed to the change suggested above, although I worry a bit that it might give us a new portability headache.

The comment I wanted to add is that the bracketing single-quotes in GHC's error message are (pretty much without exception I think) added by a single function, Outputable.quotes. So if someone figures out the details, actually making the change should be easy.
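To make the "single function" point concrete, here is a minimal sketch of what a quoting helper with a selectable style might look like. It is not GHC's actual Outputable.quotes (which works on pretty-printer documents, not strings); the names QuoteStyle and quotesWith are hypothetical and exist only for illustration.

```haskell
-- Hypothetical sketch; not GHC's actual Outputable.quotes.
-- The three styles discussed in this ticket.
data QuoteStyle
  = LegacyAscii    -- grave accent + apostrophe:  `x'
  | PlainAscii     -- apostrophe on both sides:   'x'
  | UnicodeQuotes  -- U+2018 and U+2019:          ‘x’
  deriving (Eq, Show)

-- Wrap a string in the quote characters of the chosen style.
quotesWith :: QuoteStyle -> String -> String
quotesWith LegacyAscii   s = "`" ++ s ++ "'"
quotesWith PlainAscii    s = "'" ++ s ++ "'"
quotesWith UnicodeQuotes s = "\8216" ++ s ++ "\8217"
```

With the two ASCII styles an identifier such as x' ends in the same character that closes the quote, which is the ambiguity raised in the description; the Unicode style keeps the prime visually distinct from the closing mark.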

Simon

comment:2 Changed 9 years ago by simonmar

Milestone: 6.10 branch

Yes, I also noticed that gcc now uses the correct single quote characters, because the font I'm using in my xterm doesn't have those glyphs so they appear as boxes :-)

It is the right thing to do, but it needs to wait at least until we have locale encoding/decoding support in Handle I/O.

comment:3 Changed 9 years ago by simonmar

Architecture: Unknown → Unknown/Multiple

comment:4 Changed 9 years ago by simonmar

Operating System: Unknown → Unknown/Multiple

comment:5 Changed 9 years ago by igloo

Milestone: 6.10 branch → 6.12 branch

comment:6 Changed 8 years ago by igloo

See also #3398.

comment:7 Changed 8 years ago by igloo

Milestone: 6.12 branch → 6.12.3

comment:8 Changed 8 years ago by igloo

Milestone: 6.12.3 → 6.14.1
Priority: normal → low

comment:9 Changed 7 years ago by igloo

Milestone: 7.0.1 → 7.0.2

comment:10 Changed 7 years ago by igloo

Milestone: 7.0.2 → 7.2.1

comment:11 Changed 7 years ago by dterei

Cc: dterei added
Type of failure: None/Unknown

comment:12 Changed 6 years ago by igloo

Milestone: 7.2.1 → 7.4.1

comment:13 Changed 6 years ago by igloo

Milestone: 7.4.1 → 7.6.1
Priority: low → lowest

comment:14 Changed 5 years ago by igloo

Milestone: 7.6.1 → 7.6.2

comment:15 Changed 5 years ago by morabbin

Since #2811 is closed as fixed, ought this to be doable now?

comment:16 Changed 5 years ago by ian@…

commit e2bea6019fd523d4b6061174b114c49f55fa981c

Author: Ian Lynagh <ian@well-typed.com>
Date:   Sun Feb 24 00:26:07 2013 +0000

    Use unicode quote characters in error messages etc; fixes #2507
    
    We only use the unicode characters if the locale supports them.

 compiler/main/DynFlags.hs      |   15 ++++++++++++++-
 compiler/main/DynFlags.hs-boot |    1 +
 compiler/utils/Outputable.lhs  |    7 ++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)
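One way to decide whether "the locale supports them" is to round-trip the quote characters through the locale's text encoding and see whether they survive. The sketch below follows the same idea as the GHCi session in comment 18; it is an illustration under that assumption, not a copy of the committed code, and the names roundTrips and localeSupportsUnicodeQuotes are hypothetical.

```haskell
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getLocaleEncoding)
import System.IO.Error (catchIOError)

-- Encode a string to a C string in the given encoding and decode it
-- again. An encoding that cannot represent the string either raises
-- an I/O error (treated as False) or yields a different string.
roundTrips :: TextEncoding -> String -> IO Bool
roundTrips enc str =
  (GHC.withCString enc str $ \cstr -> do
      str' <- GHC.peekCString enc cstr
      return (str == str'))
    `catchIOError` \_ -> return False

-- Check U+2018 and U+2019 against the current locale encoding.
localeSupportsUnicodeQuotes :: IO Bool
localeSupportsUnicodeQuotes = do
  enc <- getLocaleEncoding
  roundTrips enc "\8216\8217"
```

The result depends on the environment in which the program runs, so it can differ between an interactive terminal, an editor subprocess, and a test harness.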

comment:17 Changed 5 years ago by igloo

Resolution: fixed
Status: new → closed

Done

comment:18 in reply to:  17 ; Changed 4 years ago by refold

Replying to igloo:

Done

It looks like it's not possible to disable unicode quotes with LANG=C (at least on my system, Ubuntu 12.04):

    $ env LANG=C ~/bin/ghc-head/bin/ghci
    Prelude GHC.IO.Encoding GHC.Foreign System.IO> let str = "‛’"
    > let enc = localeEncoding
    > (withCString enc str $ \cstr -> do { str' <- peekCString enc cstr; return (str == str') })
    True

comment:19 in reply to:  18 Changed 4 years ago by refold

Replying to refold:

Replying to igloo:

Done

It looks like it's not possible to disable unicode quotes with LANG=C

Looks like one must use LC_ALL instead.

comment:20 Changed 4 years ago by igloo

It works here:

    $ ghc -c q.hs

    q.hs:1:1: The IO action ‛main’ is not defined in module ‛Main’

    $ LANG=C ghc -c q.hs

    q.hs:1:1: The IO action `main' is not defined in module `Main'

I suspect that you have LC_CTYPE or LC_ALL set to a different locale, and that is overriding LANG.
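The override order can be sketched as a small pure function. Under POSIX, a non-empty LC_ALL takes precedence over LC_CTYPE, which in turn takes precedence over LANG. The function name effectiveCtype is hypothetical, and falling back to "C" when nothing is set is an assumption (the standard leaves that default implementation-defined).

```haskell
import Data.Maybe (mapMaybe)

-- Given an environment as key/value pairs, return the locale name that
-- governs character encoding: LC_ALL, then LC_CTYPE, then LANG.
-- Unset and empty variables are skipped.
effectiveCtype :: [(String, String)] -> String
effectiveCtype env =
  case mapMaybe nonEmpty ["LC_ALL", "LC_CTYPE", "LANG"] of
    (v : _) -> v
    []      -> "C"  -- assumed default when no variable is set
  where
    nonEmpty key = case lookup key env of
      Just v | not (null v) -> Just v
      _                     -> Nothing
```

For example, effectiveCtype [("LANG", "C"), ("LC_ALL", "en_US.UTF-8")] evaluates to "en_US.UTF-8", which is why setting only LANG=C had no effect in comment 18.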

comment:21 in reply to:  20 Changed 4 years ago by refold

Replying to igloo:

I suspect that you have LC_CTYPE or LC_ALL set to a different locale, and that is overriding LANG.

Yes, that's the case.

comment:22 Changed 4 years ago by goldfire

Resolution: fixed
Status: closed → new

Test T2507 is failing both on MacOS 10.7.5 and 10.8.4. I have no special locale settings, to my knowledge.

comment:23 Changed 4 years ago by goldfire

Here is the diff in the output:

 T2507.hs:5:7:
-    Couldn't match expected type `Int' with actual type `()'
+    Couldn't match expected type ‛Int’ with actual type ‛()’
     In the expression: ()
-    In an equation for `foo': foo = ()
+    In an equation for ‛foo’: foo = ()

comment:24 Changed 4 years ago by ezyang

This is because in OS X, the C locale has now been aliased to UTF-8, so quotes are enabled. I guess this is fine, but we'll need to find a different locale to force non-Unicode quotes...

comment:25 Changed 4 years ago by ezyang

Here is a Bison developer running into a similar bug: http://lists.gnu.org/archive/html/bug-bison/2012-01/msg00120.html

Unfortunately, he doesn't say what one ought to do to force ASCII output.

comment:26 Changed 4 years ago by ezyang

I suggest we just punt the test entirely.

comment:27 Changed 4 years ago by goldfire

It might be worth noting that when I run ghc in a bash shell from within Emacs on my Mac, the Unicode quotes cause formatting codes to be printed in place of the quotes. For example:

    The type signature for \342\200\233foo\342\200\231 lacks an accompanying binding
      (The type signature must be given where \342\200\233foo\342\200\231 is declared)
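The escapes above are not formatting codes as such: they are the UTF-8 bytes of the two quote characters, printed in octal because the buffer is not decoding them. A minimal sketch that reproduces them by hand (the helper names utf8Bytes and octalEscapes are hypothetical, and no validation of surrogates is attempted):

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Numeric (showOct)

-- Encode one character as its UTF-8 byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render each byte as a backslash followed by its octal value.
octalEscapes :: Char -> String
octalEscapes = concatMap (\b -> '\\' : showOct b "") . utf8Bytes
```

Here octalEscapes '\8219' (U+201B) gives \342\200\233 and octalEscapes '\8217' (U+2019) gives \342\200\231, matching the output shown.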

I couldn't seem to disable Unicode printing in this case, which mattered, as I was planning on using ghc in this configuration for a presentation. As it turned out, I found an emacs-specific solution (included below for the curious; I forget the source, sorry), but I can imagine a scenario where there's no easy solution to this problem. Perhaps use something like -fno-unicode-quotes?

Emacs solution that worked for me, added to my .emacs file:

    (setq locale-coding-system 'utf-8)
    (set-terminal-coding-system 'utf-8-unix)
    (set-keyboard-coding-system 'utf-8)
    (set-selection-coding-system 'utf-8)
    (prefer-coding-system 'utf-8)

comment:28 Changed 4 years ago by simonpj

I've been putting up with the same goop in my emacs buffer for some time; thanks for the incantations to add to my .emacs, Richard.

If this happens a lot, a flag to get rid of the unicode quotes would be a Good Thing. (It could be a static flag I guess.)

Simon

comment:29 in reply to:  28 Changed 4 years ago by jstolarek

Replying to simonpj:

(It could be a static flag I guess.)

Oh, I thought that there is a long-term goal of removing static flags altogether?

comment:30 Changed 4 years ago by ErlendH

This patch uses the following symbols:

  • U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
  • U+2019 RIGHT SINGLE QUOTATION MARK

Shouldn't the first one be U+2018 LEFT SINGLE QUOTATION MARK to match the right quotation mark? Right now it looks really unbalanced with the fonts I tested. Is this intentional?

Here's a screenshot, using Adobe's Source Code Pro font:

http://i.imgur.com/RkwwPB6.png

comment:31 Changed 4 years ago by hvr

Status: new → patch

comment:32 Changed 4 years ago by Herbert Valerio Riedel <hvr@…>

In 018676c7f883886b388652c913c99a10d2591b0b/ghc:

Use U+2018 instead of U+201B quote mark in compiler messages

This matches GCC's choice of Unicode quotation marks (i.e. U+2018 and U+2019)
and therefore looks more familiar on the console. This addresses #2507.

Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>

comment:33 Changed 4 years ago by hvr

Status: patch → merge

I'm setting this ticket to merge to have this considered for merging into GHC 7.8 (meaning, I'll leave the decision up to Austin :-)

comment:34 Changed 4 years ago by hvr

FYI, I just found this related article, "ASCII and Unicode quotation marks", which is summarised as follows:

Please do not use the ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote'). Your text will otherwise appear rather strange with most modern fonts (e.g., on Windows and Mac systems). Only old X Window System fonts and some old video terminals show ASCII 0x60/0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards instead.

If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote'). If you can use Unicode characters, nice directional quotation marks are available in the form of characters U+2018, U+2019, U+201C, and U+201D (as in ‘quote’ or “quote”).

comment:35 in reply to:  33 Changed 4 years ago by hvr

Resolution: fixed
Status: merge → closed

Replying to hvr:

I'm setting this ticket to merge to have this considered for merging into GHC 7.8 (meaning, I'll leave the decision up to Austin :-)

After conferring with Austin, I've merged this into ghc-7.8 as [ebb9bd36b80040dc/ghc] (and release notes in [939fe6c827a6/ghc]).


comment:36 Changed 4 years ago by hvr

Milestone: 7.6.2 → 7.8.1

comment:37 Changed 9 months ago by pacak

Benefits: Looks pretty.

Disadvantages: If LANG is not set, any attempt to read ghc output from a Haskell program fails with this instead:

    hGetContents: invalid argument (invalid byte sequence)

I think I've wasted much more time debugging this problem (in my scenario things are still broken) than actually looking at those pretty quotes.
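The scenario is not spelled out here, so the following is only a guess at one workaround: if the reading program's own locale encoding cannot decode UTF-8 bytes, it can set the encoding on the handle explicitly before reading. The name readAllUtf8 is hypothetical, and the sketch assumes the bytes on the handle really are UTF-8.

```haskell
import System.IO

-- Read everything from a handle, decoding as UTF-8 regardless of the
-- reader's locale. The contents are forced before returning so that
-- decoding errors surface here rather than at a later use site.
readAllUtf8 :: Handle -> IO String
readAllUtf8 h = do
  hSetEncoding h utf8
  s <- hGetContents h
  length s `seq` return s
```

The same function could be applied to the stderr pipe of a spawned ghc process; how that process is spawned is outside the scope of this sketch.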

comment:38 Changed 9 months ago by bgamari

pacak, if there is an issue can you open a ticket? I'm afraid I don't know what you mean and this ticket is already closed. Do feel free to leave a reference to your new ticket here, however.
