Show instance of Char should print literals for non-ASCII printable characters

changed weight to 5

One kind of application that this change would break is one where a customized Show instance of a type is controlled by other fields of that type. For example, the following code simulates a press code that respects the privacy of people under the age of 20.

data Sex = Male | Female
data Person = Person {name :: String, age :: Int, sex :: Sex}

instance Show Person where
  show (Person _ a Male  ) | a < 20 = "A boy (" ++ show a ++ ")"
  show (Person _ a Female) | a < 20 = "A girl (" ++ show a ++ ")"
  show (Person n a _     )          = n

assert $ show (Person "村主崇行" 19 Male) == "A boy (19)"
assert $ show (Person "村主崇行" 20 Male) == "\26449\20027\23815\34892"

I'm very much looking forward to learning about other drawbacks of this change.

Absolutely any code in the entire world that relies on the current behavior will break. The current behavior is expressed in the reference implementation in the Haskell 2010 report. Frankly, changing it is not an option. You can write your own function to unescape valid Unicode. You can also write your own UShow class if you like with a method for showing various things using Unicode generally. You can then try to convince other developers to depend on your package and write instances of your class.

Dear dfeuer, thank you for pointing out that Show for Char is specified in Haskell 2010. I believe the corresponding section is the following:

https://www.haskell.org/onlinereport/haskell2010/haskellch16.html#x24-21700016.6

16.6 String representations

showLitChar :: Char -> ShowS
    Convert a character to a string using only printable characters, 
    using Haskell source-language escape conventions. For example:

     showLitChar '\n' s  =  "\\n" ++ s

where "Haskell source-language escape conventions" are defined, in turn, in Section 2.6 https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-200002.6 .

Correct me if I'm wrong.
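For reference, the behavior described in that section can be observed directly with showLitChar from Data.Char. The following is only a minimal sketch; the binding names (escapedNewline, escapedNo, shownString) are made up for illustration, and the source file is assumed to be saved as UTF-8.

```haskell
import Data.Char (showLitChar)

-- The report's own example: a newline becomes the two characters '\' and 'n'.
escapedNewline :: String
escapedNewline = showLitChar '\n' ""

-- In current GHC, a non-ASCII character becomes a decimal escape
-- (U+306E is 12398 in decimal).
escapedNo :: String
escapedNo = showLitChar 'の' ""

-- show on a String applies the same escaping between double quotes.
shownString :: String
shownString = show "の父"
```

Here escapedNewline is the two-character string \n, escapedNo is the six-character string \12398, and shownString is "\12398\29238" including the surrounding double quotes.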

You can put something like this in your .ghci file:

:seti -XScopedTypeVariables

:{
let myShow :: Show a => a -> String
    myShow x = go (show x) where
      go :: String -> String
      go [] = []
      go s@(x:xs) = case x of
          '\"' -> '\"' : str ++ "\"" ++ go rest
          '\'' -> '\'' : char : '\'' : go rest'
          _    -> x : go xs
        where
          (str :: String, rest):_ = reads s
          (char :: Char, rest'):_ = reads s
:}

:{
let myPrint :: Show a => a -> IO ()
    myPrint = putStrLn . myShow
:}

:set -interactive-print=myPrint

Example:

Prelude> [(++"の父"), (++"の母")] <*> ["田中", "山田"]
["田中の父","山田の父","田中の母","山田の母"]

closed

Dear thomie, thank you for your comment. Yes, -interactive-print is a great feature! I regret that I failed to discover that it has been available for years.

There are also several customized show functions that have been proposed, like myShow here. However, when I used them in some detail, I found that printing in Unicode has many corner cases and is more difficult than it seems. As far as I have searched, I could not find a Unicode-printing function that satisfies read . unicode_show == id for sufficiently many types. For example, https://gist.github.com/nushio3/4a10f3c0092295696daf

Thus, I decided to start a small package for use with -interactive-print: http://hackage.haskell.org/package/unicode-show . I hope it helps many people enjoy Haskell!

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedInvalid

By the way, now that I know this behavior is a language feature rather than a missing implementation, I think it is proper to close this ticket.

GHCi could be changed to show unicode characters nicely by default. The code is in the function tcUserStmt in compiler/typecheck/TcRnDriver.hs.

Expressions:

        -- The plans are:
        --   A. [it <- e; print it]     but not if it::()
        --   B. [it <- e]
        --   C. [let it = e; print it]

Statements:

        -- The plans are:
        --      [stmt; print v]         if one binder and not v::()
        --      [stmt]                  otherwise

Replace print by putStrLn . uShow, with a suitable uShow. That shouldn't break anyone's code.

reopened

Oh, silly me, print in tcUserStmt of course uses that interactive printer setting I mentioned in ticket:11529#comment:115501.

So my suggestion is to change the default interactive printer to display unicode characters nicely.

Trac metadata

Trac field	Value
Resolution	ResolvedInvalid → Unresolved

There is also a bug: with -fprint-bind-result, GHCi doesn't use the interactive printer for statements:

Prelude> :set -fprint-bind-result
Prelude> let x = "の父"
"\12398\29238"
Prelude> "の父"
"の父"

(+1) to the suggestion to change the default interactive printer to display Unicode characters nicely. The algorithm in unicode-show might be suitable for the purpose, although there are bound to be various opinions on what the "nice way to print Unicode" is.

By the way, if we update the default interactive printer, will we break the doctests that show values containing Unicode characters, forcing their authors to update the expected results from the interpreter?

I would love for something like ticket:11529#comment:115501 to become the default in ghci. It could even be simpler/stupider and just replace any sequence like \12345 with the corresponding Unicode character wherever it appears. I mean when would you ever have such a string in the output of show, short of a weird custom Show instance? And it would be more robust to other weird custom Show instances, that used quotes in an unbalanced fashion.

I don't think we should replace \n or \ESC or especially \\ though. Just printable Unicode characters outside the ASCII range, probably. And we could decline to do the replacement if the replacement character can't be encoded in the user's locale.
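A rough sketch of that "simpler/stupider" replacement, under the constraints just listed, might look like the following. All three function names are hypothetical (nothing like them exists in base or GHCi), and the locale check is deliberately left out because it would need IO.

```haskell
import Data.Char (chr, isAscii, isDigit, isPrint)

-- | Hypothetical helper: replace decimal escapes such as \12398 in the
-- output of 'show' with the character they denote, but only when that
-- character is printable and outside the ASCII range.
unescapeUnicode :: String -> String
-- An escaped backslash is copied verbatim so that digits following it
-- are not mistaken for a numeric escape.
unescapeUnicode ('\\' : '\\' : rest) = '\\' : '\\' : unescapeUnicode rest
unescapeUnicode ('\\' : rest)
  | (digits@(_ : _), rest') <- span isDigit rest
  , Just c <- printableNonAscii digits
  = c : unescapeUnicode (dropEmptyEscape rest')
-- Everything else, including \n, \ESC and unprintable escapes, is kept.
unescapeUnicode (c : rest) = c : unescapeUnicode rest
unescapeUnicode [] = []

-- | Decode a run of decimal digits; succeed only for printable non-ASCII.
printableNonAscii :: String -> Maybe Char
printableNonAscii digits
  | n <= 0x10FFFF, not (isAscii c), isPrint c = Just c
  | otherwise = Nothing
  where
    n = read digits :: Integer
    c = chr (fromInteger n)

-- | 'show' emits \& after a numeric escape that is followed by a digit;
-- once the escape has been replaced, that separator is no longer needed.
dropEmptyEscape :: String -> String
dropEmptyEscape ('\\' : '&' : rest) = rest
dropEmptyEscape rest = rest
```

Something like putStrLn . unescapeUnicode . show could then be installed with -interactive-print, in the same way as myPrint in the earlier comment. Note that this works on the text of the show output only, so it shares the weakness mentioned above: a custom Show instance that emits a backslash followed by digits for its own reasons would be rewritten too.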

One drawback is that the user's font might not contain the Unicode characters in question, like mine does not contain \12345. So there should probably be an option to disable these replacements.

This recently came up again on ghc-devs: https://mail.haskell.org/pipermail/ghc-devs/2016-March/011655.html. It has also come up repeatedly in the past, as pointed out in that thread:

• 2016: https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122874.html
• 2012: http://stackoverflow.com/questions/14039726/how-to-make-haskell-or-ghci-able-to-show-chinese-characters-and-run-chinese-char
• 2012 again: https://mail.haskell.org/pipermail/haskell-cafe/2012-July/102569.html
• 2011: http://stackoverflow.com/questions/5535512/how-to-hack-ghci-or-hugs-so-that-it-prints-unicode-chars-unescaped
• 2010: https://mail.haskell.org/pipermail/haskell-cafe/2010-August/082823.html

Replying to [ticket:11529#comment:115492 dfeuer]:

Absolutely any code in the entire world that relies on the current behavior will break. The current behavior is expressed in the reference implementation in the Haskell 2010 report. Frankly, changing it is not an option. You can write your own function to unescape valid Unicode. You can also write your own UShow class if you like with a method for showing various things using Unicode generally. You can then try to convince other developers to depend on your package and write instances of your class.

I disagree. I think the current implementation is actually wrong and does not adhere to the standard. Section 16.6 of the standard specifies showLitChar as follows:

Convert a character to a string using only printable characters, using Haskell source-language escape conventions.

However, the current implementation of showLitChar fails to use isPrint; instead it uses a naive condition, c > '\DEL', to decide which characters to escape. This is wrong.

The solution is simple: replace the condition c > '\DEL' by not (isPrint c) in the definition of showLitChar.
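To make the proposal concrete, here is a minimal sketch written as a wrapper around the existing function rather than as a patch to base. The name showLitCharU is hypothetical, and the sketch assumes one particular reading of the proposal: ASCII characters keep their current treatment, and only non-ASCII characters that isPrint accepts are emitted literally.

```haskell
import Data.Char (isAscii, isPrint, showLitChar)

-- | Hypothetical variant of 'showLitChar' that decides by 'isPrint'
-- instead of by comparison with '\DEL'.
showLitCharU :: Char -> ShowS
showLitCharU c
  -- Printable non-ASCII characters are emitted as themselves.
  | not (isAscii c) && isPrint c = showChar c
  -- Everything else (ASCII, control characters, unassigned code points)
  -- is delegated to the existing implementation and escaped as before.
  | otherwise = showLitChar c
```

With this definition, showLitCharU 'の' "" yields the single character の, while showLitCharU '\n' "" still yields the escape \n and the unprintable '\128' still yields \128.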

isPrint does not answer the question "can this character be displayed by the current user given their current locale?". That would require it to be in IO, and would limit the ability to use it in other contexts.

isPrint answers the question "is the Unicode codepoint contained in the given Char considered printable by the version of the Unicode standard to which the runtime conforms?".

It is not the correct question to ask here.

Replying to [ticket:11529#comment:118694 allbery_b]:

isPrint does not answer the question "can this character be displayed by the current user given their current locale?". That would require it to be in IO, and would limit the ability to use it in other contexts.

isPrint answers the question "is the Unicode codepoint contained in the given Char considered printable by the version of the Unicode standard to which the runtime conforms?".

It is not the correct question to ask here.

It is, however, what the standard prescribes. IMHO it is also the right thing to do as it leads to less unexpected behaviour than the current implementation.

I believe there is an ambiguity in the specification:

showLitChar :: Char -> ShowS
    "Convert a character to a string using only printable characters"

It does not say whether "printable" means ASCII printable or Unicode printable. How shall we resolve the ambiguity?

By the way, I use map fromEnum to inspect the contents of a string when I lack an appropriate font or when I am debugging a pretty printer.
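A tiny sketch of that technique (the name codePoints is only illustrative, and the source file is assumed to be saved as UTF-8):

```haskell
-- | Code points of each character in a string, independent of the
-- available fonts and of how Show escapes the characters.
codePoints :: String -> [Int]
codePoints = map fromEnum

-- The string from the -fprint-bind-result example above.
example :: [Int]
example = codePoints "の父"
```

Here example is [12398,29238], which matches the escapes \12398\29238 that show prints for the same string.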

added Pnormal label

Trac field	Value
Version	7.10.3
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
