text decoding doesn't use recover on eof
GHC 7.2.1 provides a way for a TextEncoding to recover from decoding errors. However, that mechanism does not work for an incomplete byte sequence at the end of a file: in that case, an error is thrown regardless of the recovery function. This is a problem because it makes it difficult to ensure that a program won't throw an exception on bad input.
Reproduction steps:
ghc --make GetChar.hs
ghc -e "Data.ByteString.hPut System.IO.stdout (Data.ByteString.pack [200])" | ./GetChar
where GetChar.hs is the following module:
{-# LANGUAGE RecordWildCards #-}
module Main where

import System.IO
import GHC.IO.Encoding
import GHC.IO.Encoding.Failure

main = do
    mkRecoveringLocaleEncoding "UTF-8" >>= hSetEncoding stdin
    getChar >>= print

mkRecoveringLocaleEncoding :: String -> IO TextEncoding
mkRecoveringLocaleEncoding name = do
    enc <- mkTextEncoding name
    return $ case enc of
        TextEncoding {..} -> TextEncoding {
            mkTextDecoder = fmap (setRecover $ recoverDecode TransliterateCodingFailure)
                mkTextDecoder,
            mkTextEncoder = fmap (setRecover $ recoverEncode TransliterateCodingFailure)
                mkTextEncoder, ..
            }
  where
    setRecover r x = x { recover = r }
Result:
GetChar: <stdin>: hGetChar: invalid argument (invalid byte sequence for this encoding)
In the course of investigating the issue, I found the following comment near the definition of GHC.IO.Handle.streamEncode:
-- FIXME: we should use recover to deal with EOF, rather than always throwing an
-- IOException (ioe_invalidCharacter).
So I guess this ticket records my vote to fix that problem.
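Until that FIXME is addressed, one possible stopgap is to catch the exception at the call site. The following is only a sketch under some assumptions: `ReadResult`, `classifyReadError` and `safeGetChar` are hypothetical names that do not exist in base, and it assumes the failure always surfaces as an `IOException` whose `ioe_type` is `InvalidArgument`, as in the output above. It does not make `recover` run at EOF, and I have not checked what state the handle is left in afterwards, so `BadInput` is probably best treated as terminal.

```haskell
import Control.Exception (try)
import GHC.IO.Exception (IOException (..), IOErrorType (..))
import System.IO (Handle, hGetChar)
import System.IO.Error (isEOFError)

-- | Hypothetical helper type (not part of base): the outcome of asking a
-- handle for one character.
data ReadResult
  = GotChar Char   -- a character was decoded normally
  | BadInput       -- the decoder rejected the bytes (InvalidArgument)
  | EndOfInput     -- clean end of file
  deriving (Eq, Show)

-- | Classify an IOException raised by hGetChar. Returns Nothing for errors
-- unrelated to decoding or EOF, so that the caller can rethrow them.
classifyReadError :: IOException -> Maybe ReadResult
classifyReadError e
  | isEOFError e                  = Just EndOfInput
  | ioe_type e == InvalidArgument = Just BadInput
  | otherwise                     = Nothing

-- | Read one character without letting a decoding failure escape as an
-- exception. Any other I/O error is rethrown unchanged.
safeGetChar :: Handle -> IO ReadResult
safeGetChar h = do
  r <- try (hGetChar h)
  case r of
    Right c -> return (GotChar c)
    Left e  -> maybe (ioError e) return (classifyReadError e)
```

In GetChar.hs this would replace `getChar >>= print` with `safeGetChar stdin >>= print`. It only hides the symptom: the truncated sequence is reported as `BadInput` instead of being transliterated by the recovery function.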
Trac metadata
Trac field | Value |
---|---|
Version | 7.2.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |