Opened 3 years ago

Closed 7 months ago

#6016 closed bug (fixed)

On Windows, runhaskell hits an error on UTF-8 files with a BOM

Reported by: vsajip
Owned by:
Priority: normal
Milestone: 7.10.1
Component: Compiler (Parser)
Version: 7.0.4
Keywords: BOM
Cc: dagitj@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program
Test Case:
Blocked By:
Blocking:
Related Tickets: #1744
Differential Revisions: Phab:D176

Description (last modified by thomie)

The file

#!/usr/bin/env runhaskell
main = putStrLn "Hello, world!"

works as expected:

C:\Temp>runhaskell hello.hs
Hello, world!

However, if the file is saved as UTF-8 with a BOM (Windows Notepad, for example, sometimes adds this BOM to files), an error occurs:

C:\Temp>runhaskell hello2.hs

hello2.hs:1:1: parse error on input `#!/'

I'm using the Haskell Platform 2011.4.0.0.

I believe that runhaskell/runghc should handle the presence of a BOM correctly; some Windows programs insert a BOM unbeknownst to the user.

This behaviour was observed on Windows XP (32-bit) and Windows 7 (32-bit and 64-bit).
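
For readers unfamiliar with the BOM: in UTF-8 it is the three bytes EF BB BF (the encoding of U+FEFF) at the very start of the file, ahead of the `#!` line. The following is a minimal sketch, assuming the `bytestring` package, of how such a prefix could be detected and removed on the user's side. The names `utf8Bom`, `hasUtf8Bom` and `stripUtf8Bom` are hypothetical helpers for illustration and are not part of GHC or runhaskell.

```haskell
import qualified Data.ByteString as B

-- | The UTF-8 encoding of U+FEFF (the byte-order mark).
utf8Bom :: B.ByteString
utf8Bom = B.pack [0xEF, 0xBB, 0xBF]

-- | Hypothetical helper: does the input start with a UTF-8 BOM?
hasUtf8Bom :: B.ByteString -> Bool
hasUtf8Bom = B.isPrefixOf utf8Bom

-- | Hypothetical helper: drop a leading UTF-8 BOM, if one is present.
-- Input without a BOM is returned unchanged.
stripUtf8Bom :: B.ByteString -> B.ByteString
stripUtf8Bom bs
  | hasUtf8Bom bs = B.drop (B.length utf8Bom) bs
  | otherwise     = bs
```

In GHCi, something like `hasUtf8Bom <$> B.readFile "hello2.hs"` could be used to check whether a given script carries the prefix.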

Attachments (2)

hello.hs (60 bytes) - added by vsajip 3 years ago.
Script which works (no BOM)
hello2.hs (61 bytes) - added by vsajip 3 years ago.
Script which fails (with BOM)


Change History (10)

Changed 3 years ago by vsajip

Script which works (no BOM)

Changed 3 years ago by vsajip

Script which fails (with BOM)

comment:1 Changed 3 years ago by pcapriotti

  • difficulty set to Unknown
  • Milestone set to 7.6.1

Thanks for the report.

comment:2 Changed 3 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:3 Changed 2 years ago by dagit

  • Cc dagitj@… added

comment:4 Changed 9 months ago by thoughtpolice

  • Milestone changed from 7.6.2 to 7.10.1

Moving to 7.10.1.

comment:5 Changed 8 months ago by thomie

  • Description modified (diff)
  • Operating System changed from Windows to Unknown/Multiple

comment:6 Changed 7 months ago by thomie

  • Differential Revisions set to Phab:D176
  • Status changed from new to patch

comment:7 Changed 7 months ago by Austin Seipp <austin@…>

In 9e939403241b758a685834c9ff62edcd3172a2cf/ghc:

StringBuffer should not contain initial byte-order mark (BOM)

Summary:
Just skipping over a BOM, but leaving it in the StringBuffer, is not
sufficient. The Lexer calls prevChar when a regular expression starts
with '^' (which is a shorthand for '\n^'). It would never match on the
first line, since instead of '\n', prevChar would still return '\xfeff'.

Test Plan: validate

Reviewers: austin, ezyang

Reviewed By: austin, ezyang

Subscribers: simonmar, ezyang, carter

Differential Revision: https://phabricator.haskell.org/D176

GHC Trac Issues: #6016
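
The difference described in the commit summary can be sketched with a toy model. This is not GHC's actual `StringBuffer`, which is implemented differently; `Buf`, `prevCharToy`, `atLineStart`, `skipBom` and `dropBom` are hypothetical names, and the only assumption carried over from the summary is that a `^`-anchored rule requires the previous character to be `'\n'`, with `'\n'` used as the default at the start of the buffer.

```haskell
-- | Toy stand-in for a lexer input buffer: full contents plus a cursor index.
data Buf = Buf String Int

-- | Character just before the cursor, or the given default at the very start.
prevCharToy :: Buf -> Char -> Char
prevCharToy (Buf s i) def
  | i <= 0    = def
  | otherwise = s !! (i - 1)

-- | A '^'-anchored rule may fire only if the previous character is a newline.
atLineStart :: Buf -> Bool
atLineStart b = prevCharToy b '\n' == '\n'

-- | The byte-order mark as a decoded character.
bom :: Char
bom = '\xfeff'

-- | Approach 1: keep the BOM in the buffer and only move the cursor past it.
skipBom :: String -> Buf
skipBom s = case s of
  (c:_) | c == bom -> Buf s 1
  _                -> Buf s 0

-- | Approach 2: remove the BOM before the buffer is built.
dropBom :: String -> Buf
dropBom s = case s of
  (c:rest) | c == bom -> Buf rest 0
  _                   -> Buf s 0
```

With an input such as `bom : "#!/usr/bin/env runhaskell"`, `atLineStart (skipBom input)` is `False`, because the character before the cursor is still the BOM, while `atLineStart (dropBom input)` is `True`. That matches the summary's point that the BOM has to be removed from the buffer, not merely skipped.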

comment:8 Changed 7 months ago by thomie

  • Resolution set to fixed
  • Status changed from patch to closed