GHC refuses to compile a file that starts with a Byte Order Mark (BOM)

Trying to compile a file that starts with a Byte Order Mark (BOM) results in a message like:

Camels.hs:1:1: lexical error at character '\65279'

No compilation is done. Note that, if a file is saved as UTF-8, Notepad adds this BOM to the beginning of the file.
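
For anyone puzzled by the number in the message: 65279 is the decimal value of U+FEFF, the code point used as the BOM, and UTF-8 encodes it as the three bytes EF BB BF. A minimal Python sketch to check whether a file's contents start with those bytes (the name `has_utf8_bom` is made up for illustration):

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# 65279 decimal is U+FEFF, and its UTF-8 encoding is the BOM byte sequence.
assert chr(65279) == "\ufeff"
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"
```

It could be called on the raw bytes of the source file, e.g. `has_utf8_bom(open("Camels.hs", "rb").read())`.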

This is definitely not a bug on GHC's part, but rather on Notepad's.

BOMs cause many problems when used in UTF-8 and are highly discouraged, so it should come as no surprise that GHC complains about it.

When I remove the BOM by saving the file in ANSI encoding (using Notepad), I get the following message from GHC:


lexical error in string/character literal (UTF-8 decoding error)

This is because of an o-umlaut in the comments. The file can be found at:

(Geany states that the file is encoded in CP1252 and displays it correctly)

Currently, GHC's lexer assumes its input to be ASCII or UTF-8 (for which a BOM is rather pointless, as a UTF-8 stream has only one possible byte order).

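The CP1252 encoding (same with ISO-8859-1, btw), however, is only compatible with UTF-8 for the lowest 128 code points, so any non-ASCII character such as the o-umlaut is encoded differently and fails UTF-8 decoding.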
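
To make the mismatch concrete, here is a small Python sketch, assuming the file really is CP1252 as Geany reports. The o-umlaut is the single byte 0xF6 in CP1252, which is not a valid UTF-8 sequence, while the same character in UTF-8 is the two bytes C3 B6. The helper name `cp1252_to_utf8` and the sample comment line are hypothetical.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode CP1252 bytes as UTF-8 (without a BOM)."""
    return data.decode("cp1252").encode("utf-8")

# A sample comment line as an "ANSI" (CP1252) editor would store it.
line = "-- Köln\n".encode("cp1252")  # contains the byte 0xF6

# Check whether those bytes are acceptable to a UTF-8 decoder.
try:
    line.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Transcoding yields bytes that a UTF-8 decoder accepts.
fixed = cp1252_to_utf8(line)
```

Running the whole source file through such a conversion once (read as bytes, write the result back) should leave GHC with valid UTF-8 input. Note that Python's `cp1252` codec rejects the few byte values that CP1252 leaves undefined, so a file containing them would need separate handling.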

I believe the usual recommendation is to use Notepad++, which can write UTF-8 without that gratuitous BOM.
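
If switching editors is not an option, the BOM can also be stripped after the fact. A rough sketch in Python, assuming the file is otherwise valid UTF-8; `strip_utf8_bom` and `strip_bom_in_file` are illustrative names, not an existing tool:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; return other input unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_in_file(path: str) -> None:
    """Rewrite the file at `path` without its leading BOM, if it has one."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:  # only touch the file when something changed
        with open(path, "wb") as f:
            f.write(cleaned)
```

For the file from the report this would be `strip_bom_in_file("Camels.hs")`, run before invoking GHC.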

I am bringing you good news from #6016. A fix for BOMs in Haskell source files will be in GHC 7.10.

