Opened 8 years ago

Closed 7 years ago

#3455 closed proposal (wontfix)

Add a setting to change how Unicode encoding errors are handled

Reported by: judahj Owned by:
Priority: normal Milestone: Not GHC
Component: libraries/base Version: 6.10.4
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I propose that we augment ghc-6.12.1's support for Unicode Handles by adding the following functions to System.IO:

hSetOnEncodingError :: Handle -> OnEncodingError -> IO ()
hGetOnEncodingError :: Handle -> IO OnEncodingError

as well as the enumeration OnEncodingError with three constructors:

  • ThrowEncodingError: Throw an exception at the first encoding or decoding error.
  • SkipEncodingError: Skip all invalid bytes or characters.
  • TranslitEncodingError: Replace undecodable bytes with U+FFFD, and unencodable characters with '?'.
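
To make the three behaviours concrete, here is a minimal pure sketch. It is not the code from the attached patch: decodeAscii and encodeAscii are hypothetical helpers that model a 7-bit ASCII codec over lists, and the Left result stands in for the exception that ThrowEncodingError would raise on a real Handle.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Mirrors the enumeration proposed in this ticket (illustrative only).
data OnEncodingError
  = ThrowEncodingError
  | SkipEncodingError
  | TranslitEncodingError
  deriving (Eq, Show)

-- Hypothetical helper: decode bytes as 7-bit ASCII.
-- Bytes >= 0x80 are treated as undecodable.
decodeAscii :: OnEncodingError -> [Word8] -> Either String String
decodeAscii policy = go
  where
    go [] = Right []
    go (b:bs)
      | b < 0x80 = (chr (fromIntegral b) :) <$> go bs
      | otherwise = case policy of
          ThrowEncodingError    -> Left ("invalid byte " ++ show b)
          SkipEncodingError     -> go bs                  -- drop the byte
          TranslitEncodingError -> ('\xFFFD' :) <$> go bs -- replacement char

-- Hypothetical helper: encode characters as 7-bit ASCII.
-- Characters above U+007F are treated as unencodable.
encodeAscii :: OnEncodingError -> String -> Either String [Word8]
encodeAscii policy = go
  where
    go [] = Right []
    go (c:cs)
      | ord c < 0x80 = (fromIntegral (ord c) :) <$> go cs
      | otherwise = case policy of
          ThrowEncodingError    -> Left ("unencodable character " ++ show c)
          SkipEncodingError     -> go cs                  -- drop the character
          TranslitEncodingError -> (fromIntegral (ord '?') :) <$> go cs
```

In this model the same policy value drives both directions, matching the proposal that one setting covers both read and write modes.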

I have implemented this functionality in the attached patch. Haddock docs are here: http://code.haskell.org/~judah/new-io-docs/System-IO.html#23

The choice of error handler is orthogonal to the choice of encoder. Additionally, the same setting is used for both read and write modes. For portability, the handlers are written in pure Haskell rather than using GNU iconv's TRANSLIT feature.

Note that the text package, for example, provides more sophisticated error-handling options. However, I think the above choices are useful enough without making the API too complicated.

Discussion deadline: September 9

Attachments (1)

encoding-error-handlers.dpatch (38.5 KB) - added by judahj 8 years ago.

Change History (4)

Changed 8 years ago by judahj

comment:1 Changed 8 years ago by igloo

difficulty: Unknown
Milestone: Not GHC

comment:2 Changed 8 years ago by simonmar

It looks like the main question here is whether the IOError should be returned explicitly (as in your patch), or whether we should just catch the exception. All things being equal, catching the exception would be simpler, as it wouldn't require any changes in the codecs. Is there a reason why you didn't do it that way? Perhaps because you want to be sure that the exception is really an encoding error, and not some other kind of exception? If that's the case, then we should introduce a new exception for encoding errors (that's probably a good idea anyway).
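
A rough sketch of the exception-based alternative, assuming a dedicated exception type existed. EncodingError and recoverWith are hypothetical names for illustration; neither comes from the patch or from base as discussed in this ticket. The point is that catching at a specific type leaves unrelated exceptions alone.

```haskell
import Control.Exception (Exception, throwIO, try)

-- Hypothetical exception type dedicated to encoding and decoding failures.
newtype EncodingError = EncodingError String
  deriving Show

instance Exception EncodingError

-- Run a decoding action and substitute a fallback character if, and only
-- if, it fails with EncodingError. Any other exception propagates unchanged.
recoverWith :: Char -> IO Char -> IO Char
recoverWith fallback action = do
  result <- try action
  case result of
    Left (EncodingError _) -> pure fallback
    Right c                -> pure c
```

With this shape, recoverWith '?' (throwIO (EncodingError "bad byte")) yields '?', while an IOError raised inside the action is not intercepted.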

comment:3 Changed 7 years ago by igloo

Resolution: wontfix
Status: new → closed
Type of failure: None/Unknown

Looks like an abandoned proposal. If that's not the case, please re-open and give a discussion summary and consensus decision, as described on the process page.
