Opened 5 years ago

Closed 3 years ago

#6016 closed bug (fixed)

On Windows, runhaskell hits an error on UTF-8 files with a BOM

Reported by: vsajip
Owned by:
Priority: normal
Milestone: 7.10.1
Component: Compiler (Parser)
Version: 7.0.4
Keywords: BOM
Cc: dagitj@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program
Test Case:
Blocked By:
Blocking:
Related Tickets: #1744
Differential Rev(s): Phab:D176
Wiki Page:

Description (last modified by thomie)

The file

#!/usr/bin/env runhaskell
main = putStrLn "Hello, world!"

works as expected:

C:\Temp>runhaskell hello.hs
Hello, world!

However, if the file is saved as UTF-8 with a byte-order mark (BOM), which Windows Notepad, for example, sometimes adds to files, an error occurs:

C:\Temp>runhaskell hello2.hs

hello2.hs:1:1: parse error on input `#!/'

I'm using the Haskell Platform 2011.4.0.0.

I believe that runhaskell/runghc should handle the presence of a BOM correctly; some Windows programs insert a BOM unbeknownst to the user.
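
For illustration only, here is a minimal sketch of the byte-level check being asked for. The UTF-8 BOM is the three bytes EF BB BF (the encoding of U+FEFF); all function names below are hypothetical and are not part of runhaskell or GHC. The sketch shows why a file that starts with a BOM no longer starts with `#!`, and that removing the BOM restores the shebang as the first two bytes.

```haskell
import Data.List (isPrefixOf, stripPrefix)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- The UTF-8 encoding of U+FEFF, which some editors prepend to files.
utf8Bom :: [Word8]
utf8Bom = [0xEF, 0xBB, 0xBF]

-- True when the byte sequence begins with a UTF-8 BOM.
hasUtf8Bom :: [Word8] -> Bool
hasUtf8Bom = (utf8Bom `isPrefixOf`)

-- Remove a leading UTF-8 BOM if present; otherwise return the input unchanged.
stripUtf8Bom :: [Word8] -> [Word8]
stripUtf8Bom bytes = fromMaybe bytes (stripPrefix utf8Bom bytes)

-- "#!" as ASCII bytes.
shebang :: [Word8]
shebang = [0x23, 0x21]

-- True when the byte sequence begins with "#!".
startsWithShebang :: [Word8] -> Bool
startsWithShebang = (shebang `isPrefixOf`)
```

With these definitions, `startsWithShebang` fails on the raw bytes of a file like hello2.hs but succeeds after `stripUtf8Bom`.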

This behaviour was observed on Windows XP (32-bit) and Windows 7 (32-bit and 64-bit).

Attachments (2)

hello.hs (60 bytes) - added by vsajip 5 years ago.
Script which works (no BOM)
hello2.hs (61 bytes) - added by vsajip 5 years ago.
Script which fails (with BOM)


Change History (10)

Changed 5 years ago by vsajip

Attachment: hello.hs added

Script which works (no BOM)

Changed 5 years ago by vsajip

Attachment: hello2.hs added

Script which fails (with BOM)

comment:1 Changed 5 years ago by pcapriotti

difficulty: Unknown
Milestone: 7.6.1

Thanks for the report.

comment:2 Changed 5 years ago by igloo


comment:3 Changed 5 years ago by dagit

Cc: dagitj@… added

comment:4 Changed 3 years ago by thoughtpolice


Moving to 7.10.1.

comment:5 Changed 3 years ago by thomie

Description: modified (diff)
Operating System: Windows → Unknown/Multiple

comment:6 Changed 3 years ago by thomie

Differential Rev(s): Phab:D176
Status: new → patch

comment:7 Changed 3 years ago by Austin Seipp <austin@…>

In 9e939403241b758a685834c9ff62edcd3172a2cf/ghc:

StringBuffer should not contain initial byte-order mark (BOM)

Just skipping over a BOM, but leaving it in the StringBuffer, is not
sufficient. The Lexer calls prevChar when a regular expression starts
with '^' (which is a shorthand for '\n^'). It would never match on the
first line, since instead of '\n', prevChar would still return '\xfeff'.
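
The difference between the two approaches can be sketched with a toy model. This is not GHC's actual StringBuffer (which works on a byte buffer); `Buf`, `prevCharModel`, `skipBom`, `dropBom` and `atLineStart` are hypothetical names introduced only to illustrate the reasoning above, assuming a '^'-anchored rule fires only when the previous character is '\n'.

```haskell
-- Toy model only: a String plus a cursor, standing in for a lexer buffer.
data Buf = Buf
  { contents :: String  -- entire input
  , cursor   :: Int     -- index of the next character to read
  }

bom :: Char
bom = '\xfeff'

-- Character just before the cursor, or the given default at the very start.
prevCharModel :: Buf -> Char -> Char
prevCharModel (Buf _ 0) deflt = deflt
prevCharModel (Buf s n) _     = s !! (n - 1)

-- Approach 1: keep the BOM in the buffer and start the cursor after it.
skipBom :: String -> Buf
skipBom s@(c:_) | c == bom = Buf s 1
skipBom s                  = Buf s 0

-- Approach 2: drop the BOM so the buffer begins at the first real character.
dropBom :: String -> Buf
dropBom (c:rest) | c == bom = Buf rest 0
dropBom s                   = Buf s 0

-- A '^'-anchored rule can match only if the previous character is a newline.
atLineStart :: Buf -> Bool
atLineStart b = prevCharModel b '\n' == '\n'
```

In this model, `atLineStart (skipBom input)` is False for input that begins with a BOM, because the previous character is '\xfeff', while `atLineStart (dropBom input)` is True.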

Test Plan: validate

Reviewers: austin, ezyang

Reviewed By: austin, ezyang

Subscribers: simonmar, ezyang, carter

Differential Revision:

GHC Trac Issues: #6016

comment:8 Changed 3 years ago by thomie

Resolution: fixed
Status: patch → closed