Opened 7 years ago
Last modified 11 hours ago
#5518 patch bug
Some Unicode symbols are not allowed in character or string literals
| Reported by: | ertai | Owned by: | ulysses4ever |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Compiler | Version: | |
| Keywords: | | Cc: | |
| Operating System: | Linux | Architecture: | x86_64 (amd64) |
| Type of failure: | None/Unknown | Test Case: | |
| Blocked By: | | Blocking: | |
| Related Tickets: | | Differential Rev(s): | Phab:D5066 |
| Wiki Page: | | | |
Description
main = putChar 'ₖ'
This program is rejected with the following error message: lexical error in string/character literal at character '\8342'
There are at least a few other characters with the same issue; for
instance, this whole string should be accepted:
"ₕₖₗₘₙₒₚᵣₛₜᵤᵥₓ"
A related issue is that GHCi does not let me paste these characters either.
Attachments (1)
Change History (12)
comment:1 Changed 7 years ago by
comment:2 Changed 7 years ago by
| Status: | new → infoneeded |
|---|
It works for me:
$ hexdump -C q.hs
00000000  0a 6d 61 69 6e 20 3d 20  70 75 74 43 68 61 72 20  |.main = putChar |
00000010  27 e2 82 96 27 0a 0a                              |'...'..|
00000017
$ ghc -c q.hs
$
Changed 7 years ago by
comment:3 Changed 7 years ago by
| Version: | 7.2.1 → 7.0.3 |
|---|
I reproduced the same file as igloo and I get the same output from hexdump.
However, ghc -c q.hs yields:
q.hs:2:17:
lexical error in string/character literal at character '\8342'
(the GHC version I use is actually 7.0.3, I updated the ticket info)
$ echo $TERM
rxvt-unicode-256color
$ echo $LANG
en_US.UTF-8
comment:4 Changed 7 years ago by
I could reproduce the issue with ghc-7.0.3 and ghc-7.0.4.
I looked into this since it seemed to be affecting Haskeline too. The cause (for both problems) was that older versions of GHC support an older version of Unicode:
$ ghc-7.0.3 -e "Data.Char.generalCategory '\8342'"
NotAssigned
$ ghc-7.0.4 -e "Data.Char.generalCategory '\8342'"
NotAssigned
$ ghc-7.2.1 -e "Data.Char.generalCategory '\8342'"
ModifierLetter
So if you want to use those characters, you will probably need to upgrade to ghc-7.2.1.
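To run the same check from a source file rather than with ghc -e, something like the following sketch may help. It asks Data.Char.generalCategory about the two code points mentioned in this ticket; a result of NotAssigned suggests that the Unicode tables shipped with that compiler predate the character. The names isAssigned and ticketCodePoints are made up for this example, and the code points are written numerically so that the file itself lexes on an older compiler.

```haskell
import Data.Char (GeneralCategory (NotAssigned), chr, generalCategory)

-- | True when this compiler's Unicode tables classify the code point
-- as anything other than NotAssigned. (Hypothetical helper name.)
isAssigned :: Int -> Bool
isAssigned cp = generalCategory (chr cp) /= NotAssigned

-- | Code points taken from this ticket: '\8342' and '\129366'.
ticketCodePoints :: [Int]
ticketCodePoints = [0x2096, 0x1F956]

main :: IO ()
main = mapM_ report ticketCodePoints
  where
    -- Print the code point, its category, and whether it is assigned.
    report cp =
      putStrLn $
        show cp ++ ": " ++ show (generalCategory (chr cp))
          ++ (if isAssigned cp then "" else " (unknown to this GHC)")
```

The output depends on the GHC version used, which is the point of the check.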
comment:6 Changed 7 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | infoneeded → closed |
Yup, I can also reproduce it with 7.0.2 but not 7.2.1.
comment:7 Changed 29 hours ago by
Similarly, with ghc 8.2.2 (debian), this is not accepted:
main = putChar '🥖'
That's U+1F956 baguette. ghc says:
lexical error in string/character literal at character '\129366'
My system is fully utf-8 enabled and the original problem character works ok.
I guess this is just lag in getting the Unicode character tables updated. However, while it seems reasonable for ghc not to let me define a function, e.g.
(🥖) = (</>)
since it doesn't know what kind of symbol baguette is, it seems much less reasonable to not accept any unicode inside a string.
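Until the tables are updated, a possible workaround is to write the character as a numeric escape instead of pasting it raw. As far as I understand the lexer, escapes are only checked against the maximum code point and not against the general-category tables, but treat that as an assumption and test it on your GHC. The name baguette is just for illustration.

```haskell
import Data.Char (ord)

-- | U+1F956 written as a numeric escape rather than as a raw character.
baguette :: Char
baguette = '\x1F956'  -- the same code point as '\129366'

main :: IO ()
main = do
  print (ord baguette)  -- prints 129366
  putStrLn [baguette]   -- needs a UTF-8 locale to display correctly
```

This only helps for literals; it does nothing for operator names such as (🥖).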
comment:9 Changed 11 hours ago by
| Differential Rev(s): | → Phab:D5066 |
|---|---|
| Resolution: | fixed → |
| Status: | closed → new |
| Version: | 7.0.3 → |
I renewed the Unicode tables as described here, and this fixed the issue. Merge?
comment:10 Changed 11 hours ago by
| Owner: | set to ulysses4ever |
|---|
comment:11 Changed 11 hours ago by
| Status: | new → patch |
|---|

GHC requires that source files be encoded in UTF-8. Can you please check whether that's the case for your program? If you're not sure or if that didn't fix the problem, can you please attach the bad program to this ticket?
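If it is not obvious how to check the encoding, one rough way is to decode the file explicitly as UTF-8 and see whether that fails. The sketch below uses only base; checkUtf8 is a made-up name, and q.hs is the file name used elsewhere in this ticket. Note that a missing or unreadable file also ends up in the Left case.

```haskell
import Control.Exception (evaluate, try)
import System.IO

-- | Decode the whole file as UTF-8. Left carries the I/O or decoding
-- error; Right carries the number of characters decoded.
checkUtf8 :: FilePath -> IO (Either IOError Int)
checkUtf8 path = try $ withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  contents <- hGetContents h
  evaluate (length contents)  -- force the lazy read before the handle closes

main :: IO ()
main = do
  result <- checkUtf8 "q.hs"
  putStrLn $ case result of
    Left err -> "not valid UTF-8 (or unreadable): " ++ show err
    Right n  -> "decoded " ++ show n ++ " characters as UTF-8"
```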
For ghci: What terminal are you using (e.g. xterm, urxvt, etc.)? Also, please let us know the results of running these commands in that terminal: