Opened 3 years ago

Closed 3 years ago

#5518 closed bug (fixed)

Some Unicode symbols are not allowed in character or string literals

Reported by: ertai
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.0.3
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: None/Unknown
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

main = putChar 'ₖ'

This program is rejected with the following error message:
lexical error in string/character literal at character '\8342'
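
One possible workaround for the affected compilers, which this ticket does not confirm: spell the character as a numeric escape so the source stays ASCII. As far as I know, GHC's lexer only range-checks numeric escapes such as '\8342' and does not look up their Unicode category, so the escape should be accepted even where the literal is not. The name `subscriptK` is hypothetical.

```haskell
import Data.Char (ord)

-- '\8342' is the decimal escape for U+2096 LATIN SUBSCRIPT SMALL LETTER K,
-- the same character as in the literal above, but written in plain ASCII.
subscriptK :: Char
subscriptK = '\8342'

main :: IO ()
main = do
  print (ord subscriptK)  -- the code point, 8342
  print subscriptK        -- show escapes non-ASCII characters: '\8342'
  -- putChar subscriptK behaves like the original program; whether the glyph
  -- displays correctly depends on the locale encoding of stdout.
```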

There are at least a few other characters with the same issue; for
instance, this whole string should be accepted:
"ₕₖₗₘₙₒₚᵣₛₜᵤᵥₓ"

A related issue is that GHCi does not let me paste these characters either.
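
One way to see how the compiler in use classifies these characters is to print the code point and Unicode general category of each one with `Data.Char.generalCategory`. A minimal sketch; `suspects` and `report` are hypothetical names, and the string is written with hexadecimal escapes so the file compiles even where the literals are rejected.

```haskell
import Data.Char (generalCategory, ord, toUpper)
import Numeric (showHex)

-- The characters from the report, in the same order, written as hexadecimal
-- escapes so the file stays ASCII and compiles even where the literals fail.
suspects :: String
suspects = "\x2095\x2096\x2097\x2098\x2099\x2092\x209A\x1D63\x209B\x209C\x1D64\x1D65\x2093"

-- Code point (hex and decimal) and general category, as known to the
-- Unicode tables of the base library this program is built against.
report :: Char -> String
report c =
  "U+" ++ map toUpper (showHex (ord c) "") ++ " (" ++ show (ord c) ++ "): "
    ++ show (generalCategory c)

main :: IO ()
main = mapM_ (putStrLn . report) suspects
```

Characters that show up as NotAssigned here are the ones the running compiler's Unicode tables do not know about.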

Attachments (1)

q.hs (23 bytes) - added by ertai 3 years ago.


Change History (7)

comment:1 Changed 3 years ago by judahj

GHC requires that source files be encoded in UTF-8. Can you please check whether that's the case for your program? If you're not sure or if that didn't fix the problem, can you please attach the bad program to this ticket?

For ghci: What terminal are you using (e.g. xterm, urxvt, etc.)? Also, please let us know the results of running these commands in that terminal:

echo $TERM
echo $LANG
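
The same details can also be printed from the GHC side, which shows the encoding the runtime actually derived from the locale. A sketch that uses only `System.IO` and `System.Environment`; `isUtf8` is a hypothetical helper that just compares encoding names, and the handles checked belong to the compiled program, which is not necessarily how GHCi reads interactive input.

```haskell
import System.Environment (getEnvironment)
import System.IO

-- True when an encoding's name is UTF-8 (GHC may append suffixes such as
-- "//ROUNDTRIP" to the name, so only the part before '/' is compared).
isUtf8 :: TextEncoding -> Bool
isUtf8 enc = takeWhile (/= '/') (show enc) == "UTF-8"

main :: IO ()
main = do
  env <- getEnvironment
  -- The variables asked about above, plus the ones that override LANG.
  mapM_ (\v -> putStrLn (v ++ "=" ++ maybe "(unset)" id (lookup v env)))
        ["TERM", "LANG", "LC_CTYPE", "LC_ALL"]
  -- The encoding GHC derived from the locale at startup.
  putStrLn ("localeEncoding: " ++ show localeEncoding)
  -- What stdin and stdout actually use (Nothing means binary mode).
  inEnc <- hGetEncoding stdin
  outEnc <- hGetEncoding stdout
  putStrLn ("stdin: " ++ maybe "binary" show inEnc
            ++ ", stdout: " ++ maybe "binary" show outEnc)
  putStrLn (if isUtf8 localeEncoding then "locale is UTF-8" else "locale is NOT UTF-8")
```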

comment:2 Changed 3 years ago by igloo

  • Status changed from new to infoneeded

It works for me:

$ hexdump -C q.hs
00000000  0a 6d 61 69 6e 20 3d 20  70 75 74 43 68 61 72 20  |.main = putChar |
00000010  27 e2 82 96 27 0a 0a                              |'...'..|
00000017
$ ghc -c q.hs
$
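
The three bytes between the quotes in the dump, `e2 82 96`, follow the three-byte UTF-8 layout `1110xxxx 10yyyyyy 10zzzzzz`. Decoding them by hand gives 2·4096 + 2·64 + 22 = 8342 = 0x2096, the same character as the '\8342' in the error message. This means the file is valid UTF-8 and the rejection is not an encoding problem. A sketch; `decode3` is a hypothetical helper that handles only the three-byte case and does not validate the marker bits.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Decodes a three-byte UTF-8 sequence 1110xxxx 10yyyyyy 10zzzzzz into the
-- code point xxxxyyyyyyzzzzzz. The marker bits are masked off, not checked.
decode3 :: Word8 -> Word8 -> Word8 -> Int
decode3 b1 b2 b3 =
      (fromIntegral (b1 .&. 0x0F) `shiftL` 12)
  .|. (fromIntegral (b2 .&. 0x3F) `shiftL` 6)
  .|.  fromIntegral (b3 .&. 0x3F)

main :: IO ()
main = print (decode3 0xE2 0x82 0x96)  -- 8342, i.e. U+2096
```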

Changed 3 years ago by ertai

comment:3 Changed 3 years ago by ertai

  • Version changed from 7.2.1 to 7.0.3

I reproduced the same file as igloo, and hexdump gives the same output.

However ghc -c q.hs yields:

q.hs:2:17:

lexical error in string/character literal at character '\8342'

(the GHC version I use is actually 7.0.3; I updated the ticket info)

echo $TERM
rxvt-unicode-256color

echo $LANG
en_US.UTF-8

comment:4 Changed 3 years ago by judahj

I could reproduce the issue with ghc-7.0.3 and ghc-7.0.4.

I looked into this since it seemed to be affecting Haskeline too. The cause (for both problems) was that older versions of GHC support an older version of Unicode:

$ ghc-7.0.3 -e "Data.Char.generalCategory '\8342'"
NotAssigned
$ ghc-7.0.4 -e "Data.Char.generalCategory '\8342'"
NotAssigned
$ ghc-7.2.1 -e "Data.Char.generalCategory '\8342'"
ModifierLetter
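
The diagnosis above boils down to a one-line check that can be run against any compiler: keep the characters whose `generalCategory` is `NotAssigned`. A sketch; `unassigned` is a hypothetical helper, and its answer depends only on the Unicode tables of the `base` it is built with. Going by the output above, one would expect it to report '\8342' under ghc-7.0.x and nothing under 7.2.1.

```haskell
import Data.Char (GeneralCategory (NotAssigned), generalCategory)

-- Characters of a string that the linked base library classifies as
-- NotAssigned (Unicode category Cn). Per the diagnosis above, such
-- characters were the ones rejected inside literals by ghc-7.0.x.
unassigned :: String -> String
unassigned = filter ((== NotAssigned) . generalCategory)

main :: IO ()
main = case unassigned "\8342" of
  [] -> putStrLn "U+2096 is assigned in this base"
  cs -> putStrLn ("not assigned in this base: " ++ show cs)
```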

So if you want to use those characters, you will probably need to upgrade to ghc-7.2.1.

comment:5 Changed 3 years ago by ertai

Ok, thank you.

comment:6 Changed 3 years ago by igloo

  • Resolution set to fixed
  • Status changed from infoneeded to closed

Yup, I can also reproduce it with 7.0.2 but not 7.2.1.
