Opened 5 years ago

Closed 4 years ago

#7522 closed bug (fixed)

segfault when ignoring invalid byte sequence when decoding UTF8//IGNORE

Reported by: ganesh
Owned by: batterseapower
Priority: high
Milestone: 7.6.2
Component: Compiler
Version: 7.6.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:


The code below segfaults on a variety of platforms and GHC versions: I've tried 7.4.1 and 7.6.1, on Windows and Linux.

It seems to depend on (a) the specific choice of UTF8 (it doesn't happen with UTF16, UTF32, etc.) and (b) the invalid byte sequence being at the end of the input being decoded.

import qualified Data.ByteString as B
import System.IO

tempFile :: FilePath
tempFile = "temp"

main :: IO ()
main = do
   utf8Ignore <- mkTextEncoding "UTF8//IGNORE"
   -- 128 (0x80) is a lone continuation byte, so it is invalid UTF-8 on its own
   B.writeFile tempFile (B.pack [128])
   h <- openFile tempFile ReadMode
   hSetEncoding h utf8Ignore
   hGetContents h >>= putStrLn
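
To probe observations (a) and (b) separately, the reproduction can be parameterised over the encoding name and the byte payload. The following is only a sketch: the helper `decodeBytesWith` and the scratch file name `probe.tmp` are hypothetical names introduced here, not part of the report. On an affected GHC the first payload would be expected to crash as described above, while on a base library containing the fix the invalid byte should simply be dropped.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO

-- Hypothetical scratch file name (an assumption, not from the ticket).
scratchFile :: FilePath
scratchFile = "probe.tmp"

-- Hypothetical helper (not from the ticket): write raw bytes to the scratch
-- file, then read them back as text through the named encoding.
decodeBytesWith :: String -> [Word8] -> IO String
decodeBytesWith encName bytes = do
  enc <- mkTextEncoding encName
  B.writeFile scratchFile (B.pack bytes)
  withFile scratchFile ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    -- Force the lazy string so decoding finishes before withFile closes h.
    length s `seq` return s

main :: IO ()
main = mapM_ probe
  [ [128]            -- invalid byte is the whole input (the ticket's case)
  , [104, 105, 128]  -- "hi" followed by a trailing invalid byte
  , [128, 104, 105]  -- invalid byte first, valid text after it
  ]
  where
    probe bytes = do
      s <- decodeBytesWith "UTF8//IGNORE" bytes
      putStrLn (show bytes ++ " -> " ++ show s)
```

Swapping the encoding name passed to `decodeBytesWith` (for example to "UTF16//IGNORE") gives a quick way to check whether the behaviour is specific to UTF8, and comparing the second and third payloads shows whether the position of the invalid byte matters.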

Change History (2)

comment:1 Changed 5 years ago by simonmar

difficulty: Unknown
Milestone: 7.6.2
Owner: set to batterseapower
Priority: changed from normal to high

I suggest Max might be the best person to look at this, since he implemented the support for //IGNORE.

comment:2 Changed 4 years ago by batterseapower

Resolution: fixed
Status: changed from new to closed

Fixed and tested in commit 1ac38ef6e9decc3f4763848f3d43c0cc68d1d390 of the base library.
