bytestring: incorrect documentation for hGetContents
(I guess this is the ByteString
bug tracker? This copies an email I sent directly to Don Stewart and Duncan Coutts.)
The current documentation [1] for Data.ByteString.hGetContents
says:
"As with hGet
, the string representation in the file is assumed to be ISO-8859-1."
This confuses me -- why would ByteString
care what the "string representation" of a file is, or that it even has one?
Perhaps it's for the benefit of this function's appearance in ByteString.Char8
. But the caveats of that module are widely documented throughout, and it's a hack that most people shouldn't use. I think the docs for hGetContents
should reflect the "main" module and its intended use on possibly opaque binary data.
Don Stewart suggested that this is "a property of the underlying handle", but I have confirmed that ByteString
is insensitive to the handle's designated encoding.
$ cat test-bytestring-handle-encoding.hs
import System.IO
import qualified Data.ByteString as B
test :: TextEncoding -> IO ()
test enc = do
hdl <- openFile "snowman" ReadMode
hSetEncoding hdl enc
B.hGetContents hdl >>= (print . B.unpack)
hClose hdl
main :: IO ()
main = do
test utf8
test utf32
test latin1
$ ghc -V
The Glorious Glasgow Haskell Compilation System, version 7.0.4
$ ghc-pkg list bytestring
/var/lib/ghc/package.conf.d
bytestring-0.9.1.10
$ echo -ne '\xe2\x98\x83' > snowman
$ hexdump -C snowman
00000000 e2 98 83 |...|
00000003
$ runhaskell test-bytestring-handle-encoding.hs
[226,152,131]
[226,152,131]
[226,152,131]
Trac metadata
Trac field | Value |
---|---|
Version | |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | libraries (other) |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |