Opened 5 years ago

Closed 5 years ago

#5512 closed bug (fixed)

UTF-16//ROUNDTRIP encoding behaves weirdly

Reported by: batterseapower
Priority: normal
Milestone: 7.4.1
Component: libraries/base
Version: 7.2.1
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime

Description (last modified by batterseapower)

Try this program:

module Main where

import System.IO

main :: IO ()
main = do
    -- "UTF16" (no hyphen) is not a built-in encoding name, so iconv handles it
    roundtrip_enc <- mkTextEncoding "UTF16//ROUNDTRIP"
    h <- openFile "out.temp" WriteMode
    hSetEncoding h roundtrip_enc
    hPutStr h "Hi\xEFE8Hi"

It fails with:

hSetEncoding: invalid argument (Invalid argument)

If you change UTF16 to UTF-16 (so we use the builtin encoding rather than iconv) it works, but the output file only contains the first Hi.

I think part of what is going on here is that iconv does not report EILSEQ for identity transformations, such as the one between a UTF-16 text file and our UTF-16 CharBuffers. Since that error is never raised, we never get the chance to fix up the lone surrogates we use to encode roundtrip characters.
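To make the EILSEQ point concrete, here is a minimal Haskell sketch contrasting a UTF-16 to UTF-16 pass that validates its input with one that is a pure identity. All names are hypothetical (this is not GHC or iconv code), and the value 0xDCE8 merely stands in for a lone-surrogate roundtrip escape; which code point GHC actually uses for an escape is an assumption here.

```haskell
import Data.Word (Word16)

-- Hypothetical helpers (not GHC's API): classify UTF-16 code units.
isHighSurrogate, isLowSurrogate :: Word16 -> Bool
isHighSurrogate w = w >= 0xD800 && w <= 0xDBFF
isLowSurrogate  w = w >= 0xDC00 && w <= 0xDFFF

-- | A validating UTF-16 -> UTF-16 "conversion": copies code units unchanged,
-- but stops with Left i at the first lone surrogate. This is the point where
-- an iconv-style converter would report EILSEQ, giving the caller a chance
-- to fix the input up.
validateUtf16 :: [Word16] -> Either Int [Word16]
validateUtf16 = go 0
  where
    go _ [] = Right []
    go i (hi : lo : rest)
      | isHighSurrogate hi && isLowSurrogate lo =
          ((hi :) . (lo :)) <$> go (i + 2) rest        -- well-formed pair
    go i (w : rest)
      | isHighSurrogate w || isLowSurrogate w = Left i  -- lone surrogate
      | otherwise = (w :) <$> go (i + 1) rest

-- | An identity "conversion" never inspects its input, so it can never
-- signal an error for a lone surrogate; the fix-up hook is never reached.
identityUtf16 :: [Word16] -> Either Int [Word16]
identityUtf16 = Right
```

For the code units of "Hi", a lone 0xDCE8, then "Hi", `validateUtf16` stops at index 2, while `identityUtf16` passes everything through silently.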

Change History (4)

comment:1 Changed 5 years ago by batterseapower

Description: modified (diff)

comment:2 Changed 5 years ago by igloo

Milestone: 7.4.1

What's the expected output? I got a 0-byte output file, but if I add "hClose h" then I get

$ ls -l out.temp; hexdump -C out.temp
-rw-r--r-- 1 ian ian 11 Nov 10 01:01 out.temp
00000000  fe ff 00 48 00 69 e8 00  48 00 69                 |...H.i..H.i|

(HEAD, amd64/Linux)
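A version of the reproducer that closes the handle might look like the following sketch. It uses the built-in "UTF-16//ROUNDTRIP" name from the description, and `writeRoundtripDemo` is a hypothetical name; `withFile` closes, and therefore flushes, the handle when the action returns, which is what adding "hClose h" achieves by hand.

```haskell
import System.IO

-- Hypothetical helper: the original reproducer, but with the handle closed.
-- withFile closes the handle (flushing its buffer) once the action finishes,
-- so the encoded text actually reaches the file.
writeRoundtripDemo :: FilePath -> IO ()
writeRoundtripDemo path = do
  enc <- mkTextEncoding "UTF-16//ROUNDTRIP"
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h "Hi\xEFE8Hi"
```

The file should start with the fe ff BOM followed by 00 48 00 69 for "Hi". The bytes written for '\xEFE8' may differ from the hexdump above on other GHC versions, since they depend on how a given version represents roundtrip escapes.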

comment:3 Changed 5 years ago by batterseapower

You are seeing exactly the expected output. My recent change to have mkTextEncoding try our Haskell TextEncodings before it falls back to iconv may have made this better.
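Why that output is expected can be checked against a small pure model of the encoder. This sketch is not GHC's implementation and its names are hypothetical; it assumes, consistently with the hexdump in comment 2, a big-endian BOM followed by big-endian code units, with code points U+EF80..U+EFFF acting as roundtrip escapes that each become one raw byte (so U+EFE8 becomes 0xE8).

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- Assumed escape range: U+EF80..U+EFFF stand for raw bytes 0x80..0xFF.
isRoundtripEscape :: Char -> Bool
isRoundtripEscape c = ord c >= 0xEF80 && ord c <= 0xEFFF

-- Hypothetical model of UTF-16//ROUNDTRIP output: BOM, then big-endian code
-- units, with roundtrip escapes written back as the single byte they encode.
-- Only BMP characters are handled, which is enough for this ticket's input.
encodeUtf16BERoundtrip :: String -> [Word8]
encodeUtf16BERoundtrip s = [0xFE, 0xFF] ++ concatMap encodeChar s
  where
    encodeChar c
      | isRoundtripEscape c = [fromIntegral (ord c .&. 0xFF)]  -- one raw byte
      | ord c <= 0xFFFF =
          [fromIntegral (ord c `shiftR` 8), fromIntegral (ord c .&. 0xFF)]
      | otherwise = error "encodeUtf16BERoundtrip: non-BMP input not modelled"
```

Applied to "Hi\xEFE8Hi", the model yields exactly the 11 bytes from comment 2: fe ff 00 48 00 69 e8 00 48 00 69.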

In practice, we don't really care whether UTF-16//ROUNDTRIP works: we only use ROUNDTRIP for the fileSystemEncoding (a modified localeEncoding), UTF-16 is not an ASCII superset, and IIRC the POSIX standard requires the locale encoding to be an ASCII superset.
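The "ASCII superset" property can be stated as a small check: every ASCII character must encode to the single byte equal to its code point. In this sketch, encoders are modelled as plain `Char -> [Word8]` functions with hypothetical names; this is an illustration, not GHC's TextEncoding API.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- True if every ASCII character (0..127) encodes to exactly one byte equal to
-- its code point, i.e. the encoding agrees with ASCII on ASCII text.
isAsciiSuperset :: (Char -> [Word8]) -> Bool
isAsciiSuperset enc = all ok [0 .. 127]
  where
    ok n = enc (chr n) == [fromIntegral n]

-- Latin-1: one byte per character (only meaningful for code points < 256).
latin1Char :: Char -> [Word8]
latin1Char c = [fromIntegral (ord c)]

-- UTF-16BE without a BOM, BMP characters only: two bytes per character.
utf16beChar :: Char -> [Word8]
utf16beChar c = [fromIntegral (ord c `shiftR` 8), fromIntegral (ord c .&. 0xFF)]
```

Latin-1 passes, while UTF-16BE fails immediately: 'A' becomes 00 41 rather than 41, so UTF-16 can never be a POSIX-style locale encoding.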

comment:4 Changed 5 years ago by batterseapower

Resolution: fixed
Status: new → closed

I've just realised that the first paragraph above is rubbish: I didn't change mkTextEncoding, only localeEncoding. So perhaps it was just a foolish missing hClose that was causing this behaviour. Sorry for the noise.

The second paragraph still stands.

Closing the ticket.
