Opened 2 years ago

Closed 18 months ago

#10623 closed bug (fixed)

Handling of ASCII encodings introduced in D898 breaks Unicode terminal detection

Reported by: bgamari Owned by:
Priority: normal Milestone: 8.2.1
Component: Test Suite Version: 7.10.2-rc2
Keywords: Cc:
Operating System: Windows Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: T8958a, libraries/base/tests/IO/encoding005
Blocked By: Blocking:
Related Tickets: #10298, #7695 Differential Rev(s): Phab:D1059, Phab:D1085, Phab:D2262
Wiki Page:

Description (last modified by bgamari)

Phab:D898 was primarily intended to fix hangs in the event that iconv was unavailable (namely #10298 and #7695). In addition to this fix, it also introduced self-contained handling of ANSI terminals to allow compiled executables to run in minimal environments lacking iconv.

However, the behavior that the patch introduced is highly suspicious. Specifically, it gives the user a UTF-8 encoding even if they requested ASCII.

This has the potential to break quite a lot of code. At very least it breaks GHC's Unicode terminal detection logic, which attempts to catch an invalid character when encoding a pair of smart-quotes. Of course, this exception will never be thrown if a UTF-8 encoder is used.

Change History (17)

comment:1 Changed 2 years ago by bgamari

Description: modified (diff)

comment:2 Changed 2 years ago by bgamari

Differential Rev(s): Phab:1059

comment:3 Changed 2 years ago by Ben Gamari <ben@…>

In d69dfba4e27c4ec33459906fd87c9a56a371f510/ghc:

Fix self-contained handling of ASCII encoding

D898 was primarily intended to fix hangs in the event that iconv was
unavailable (namely #10298 and #7695). In addition to this fix, it also
introduced self-contained handling of ANSI terminals to allow compiled
executables to run in minimal environments lacking iconv.

However, the behavior that the patch introduced is highly suspicious.
Specifically, it gives the user a UTF-8 encoding even if they requested
ASCII.

This has the potential to break quite a lot of code. At very least it
breaks GHC's Unicode terminal detection logic, which attempts to catch
an invalid character when encoding a pair of smart-quotes. Of course,
this exception will never be thrown if a UTF-8 encoder is used.

Here we use the `char8` encoding to handle requests for ASCII encodings
in the event that we find iconv to be non-functional.

Fixes #10623.

Test Plan: Validate with T8959a

Reviewers: rwbarton, hvr, austin, hsyl20

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1059

GHC Trac Issues: #10623

comment:4 Changed 2 years ago by bgamari

Resolution: fixed
Status: newclosed

A version of this is also present in the ghc-7.10 branch as 677552f21690761b89255d05e42976679be4d374.

comment:5 Changed 2 years ago by rwbarton

Differential Rev(s): Phab:1059Phab:D1059, Phab:D1085
Milestone: 7.10.2
Priority: normalhigh
Resolution: fixed
Status: closednew

I realized that at least hsyl20 and perhaps others actually intend to run their programs in the (what I would consider to be broken) setting of a chroot without a working iconv installation. In that case we really shouldn't use char8 when asked for an ASCII locale. If the programmer wants char8 they can use it explicitly. I implemented an ASCII encoding in Phab:D1085.

comment:6 Changed 2 years ago by Ben Gamari <ben@…>

In dbe6dac/ghc:

When iconv is unavailable, use an ASCII encoding to encode ASCII

D898 and D1059 implemented a fallback behavior to handle the case
that the end user's iconv installation is broken (typically due to
running inside a chroot in which the necessary locale files and/or
gconv modules have not been installed). In this case, if the
program requests an ASCII locale, GHC's char8 encoding is used
rather than the program failing.

However, silently mangling data like char8 does when the programmer
did not ask for it is poor behavior, for reasons described in D1059.

This commit implements an ASCII encoding and uses it in the fallback
case when iconv is unavailable and the user has requested ASCII.

Test Plan:
Added tests for the encodings defined in Latin1.
Also, manually ran a statically-linked executable of that test
in a chroot and the tests passed (up to the ones that call
mkTextEncoding "LATIN1", since there is no fallback from iconv
for that case yet).

Reviewers: austin, hvr, hsyl20, bgamari

Reviewed By: hsyl20, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1085

GHC Trac Issues: #7695, #10623

comment:7 Changed 2 years ago by bgamari

Resolution: fixed
Status: newclosed

comment:8 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 89060374/ghc:

Testsuite: mark encoding005 expect_broken(#10623) on Windows

comment:9 Changed 2 years ago by thomie

Operating System: Unknown/MultipleWindows
Resolution: fixed
Status: closednew

The test encoding005 is failing on Windows with an unknown ASCII encoding error.

comment:10 Changed 2 years ago by thoughtpolice

Milestone: 7.10.27.10.3

Moving out to 7.10.3, as it seems there was a regression (maybe a new ticket is needed?)

Last edited 2 years ago by thoughtpolice (previous) (diff)

comment:11 Changed 2 years ago by rwbarton

It's not a regression, just a new test (I added it in dbe6dac96543f426297a59d8d16c3f5afacf42d4). The test uses mkTextEncoding "ASCII" which clearly (if you read GHC.IO.Encoding.mkTextEncoding') won't work on Windows. It didn't work in the past either except between e28462de700240288519a016d0fe44d4360d9ffd and d69dfba4e27c4ec33459906fd87c9a56a371f510.

Should we make it work, i.e., handle encodings in our hard-coded list as ASCII on Windows too? Personally I don't see why not.

comment:12 Changed 2 years ago by bgamari

Milestone: 7.10.38.0.1

The issue mentioned in comment:11 is still outstanding. Punting to 8.0.1.

comment:13 Changed 2 years ago by thomie

Component: CompilerTest Suite
Priority: highnormal

comment:14 Changed 20 months ago by thoughtpolice

Milestone: 8.0.18.0.2

Punting to 8.0.2, since this just affects the testsuite.

comment:15 Changed 18 months ago by thomie

Differential Rev(s): Phab:D1059, Phab:D1085Phab:D1059, Phab:D1085, Phab:D2262
Status: newpatch

I have another patch for this series.

comment:16 Changed 18 months ago by Thomas Miedema <thomasmiedema@…>

In 1319363f/ghc:

Always use native-Haskell de/encoders for ASCII and latin1

This fixes test encoding005 on Windows (#10623).

Reviewed by: austin, bgamari

Differential Revision: https://phabricator.haskell.org/D2262

comment:17 Changed 18 months ago by thomie

Milestone: 8.0.28.2.1
Resolution: fixed
Status: patchclosed
Test Case: T8958aT8958a, libraries/base/tests/IO/encoding005
Note: See TracTickets for help on using tickets.