Opened 2 years ago

Last modified 2 years ago

#12971 upstream bug

Paths are encoded incorrectly when invoking GCC

Reported by: erikprantare
Owned by: Phyx
Priority: normal
Milestone:
Component: Compiler
Version: 8.0.1
Keywords:
Cc:
Operating System: Windows
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D2917 Phab:/D2942
Wiki Page:

Description

Hello, I've been trying to get ghc to compile a .hs file to a binary. I get an error that seems to be caused during the C compilation phase by the special character ä in my user directory. When compiling I get this:

C:\Users\Erik Präntare>ghc C:\code\Haskell\test.hs
Linking C:\code\Haskell\test.exe ...
realgcc.exe: error: C:\Users\Erik Pr├ñntare\AppData\Local\Temp\ghc28912_0\ghc_1.c: No such file or directory
realgcc.exe: fatal error: no input files
compilation terminated.
`gcc.exe' failed in phase `C Compiler'. (Exit code: 1)

Notice how it says Pr├ñntare instead of Präntare. The UTF-8 encoding of ä is 0xC3 0xA4, which, surprise surprise, corresponds to ├ñ when those two bytes are read in the console's OEM code page (for example CP437 or CP850). I managed to get around this by changing the tmpdir to my actual temporary directory, but as a new user of ghc I found it very troublesome to find a solution. Maybe consider changing the default tmpdir, at least if special characters are encountered?
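
The garbling described above can be reproduced outside GHC. The following is a minimal sketch in Python, assuming the console decoded the output with an OEM code page such as CP437; the ticket does not say which code page was actually active.

```python
# Assumption: the console decodes program output with an OEM code page
# (CP437 here); the ticket does not state which one was in use.
name = "Präntare"

utf8_bytes = name.encode("utf-8")              # ä becomes the two bytes C3 A4
as_console_shows = utf8_bytes.decode("cp437")  # each byte read as one character

print(utf8_bytes.hex(" "))
print(as_console_shows)
```

The two bytes of the UTF-8 sequence are each rendered as a separate character, which is why the displayed name is one character longer than the real one.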

Change History (14)

comment:1 Changed 2 years ago by erikprantare

Summary: gcc not finding default temporary directory → Change default tmpdir

comment:2 Changed 2 years ago by bgamari

Milestone: 8.2.1
Operating System: Unknown/Multiple → Windows
Priority: normal → highest
Summary: Change default tmpdir → Paths are encoded incorrectly when invoking GCC
Type: feature request → bug

Oh dear, it looks like we are failing to encode a filename with UTF-16 as expected by Windows.

Last edited 2 years ago by bgamari

comment:3 Changed 2 years ago by bgamari

Indeed I can reproduce this with,

$ mkdir Präntare
$ export TMP=`pwd`/Präntare
$ echo 'main = putStrLn "hello world!"' > Hello.hs
$ ghc Hello.hs
[1 of 1] Compiling Main             ( Hello.hs, Hello.o )
Linking Hello.exe ...
realgcc.exe: error: C:\msys64\home\ben\Präntare\ghc2596_0\ghc_4.c: No such file or directory
realgcc.exe: fatal error: no input files
compilation terminated.
`gcc.exe' failed in phase `C Compiler'. (Exit code: 1)

$ export TMP=/tmp
$ ghc Hello.hs
Linking Hello.exe ...

Last edited 2 years ago by bgamari

comment:4 Changed 2 years ago by bgamari

ghc -v reveals that the failing command is,

Linking Hello.exe ...
*** C Compiler:
"C:\msys64\home\ben\ghc-8.0.1-i386\lib/../mingw/bin/gcc.exe" "-U__i686" \
    "-march=i686" "-fno-stack-protector" "-DTABLES_NEXT_TO_CODE" "-c" \
    "C:\msys64\home\ben\Prntare\ghc4688_0\ghc_4.c" \
    "-o" "C:\msys64\home\ben\Prntare\ghc4688_0\ghc_5.o" \
    "-IC:\msys64\home\ben\ghc-8.0.1-i386\lib/include"
realgcc.exe: error: C:\msys64\home\ben\Präntare\ghc4688_0\ghc_4.c: No such file or directory
realgcc.exe: fatal error: no input files
compilation terminated.

Note the oddly missing ä in the gcc command line. However, -keep-tmp-files confirms that the directory was created with the correct name.

Last edited 2 years ago by bgamari

comment:5 Changed 2 years ago by bgamari

It seems quite likely that DriverPipeline.mkExtraObj is responsible for this call, but it has no immediately evident issues. SysTools.newTempName looks like the more probable culprit.

Last edited 2 years ago by bgamari

comment:6 Changed 2 years ago by bgamari

Hmm, the missing ä above may have just been due to the terminal. Redirecting ghc's output to a file and viewing the file with vim shows the following,

Linking Hello.exe ...
*** C Compiler:
"C:\msys64\home\ben\ghc-8.0.1-i386\lib/../mingw/bin/gcc.exe" "-U__i686" \
    "-march=i686" "-fno-stack-protector" "-DTABLES_NEXT_TO_CODE" "-c" \
    "C:\msys64\home\ben\Pr<84>ntare\ghc4344_0\ghc_4.c" \
    "-o" "C:\msys64\home\ben\Pr<84>ntare\ghc4344_0\ghc_5.o" \
    "-IC:\msys64\home\ben\ghc-8.0.1-i386\lib/include"
realgcc.exe: error: C:\msys64\home\ben\Präntare\ghc4344_0\ghc_4.c: No such file or directory
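
The `<84>` that vim displays is a single raw byte, 0x84. One possible reading, offered as an assumption rather than something the ticket confirms, is that the verbose output was written in an OEM code page, where ä is exactly that byte. A short Python sketch comparing a few single-byte encodings:

```python
# Which single-byte encodings map "ä" to the byte 0x84 that vim shows as <84>?
# The list of candidate code pages is an assumption chosen for illustration.
candidates = ["cp437", "cp850", "latin-1", "cp1252"]

encoded = {codec: "ä".encode(codec) for codec in candidates}
for codec, raw in encoded.items():
    print(f"{codec:8} -> {raw.hex()}")

matches = [codec for codec, raw in encoded.items() if raw == b"\x84"]
print(matches)
```

Only the OEM code pages in this list produce 0x84; the ANSI-style encodings (Latin-1, CP1252) produce 0xE4 instead.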

comment:7 Changed 2 years ago by bgamari

Owner: set to Phyx

Phyx kindly said he would take this from here.

comment:8 Changed 2 years ago by Ben Gamari <ben@…>

In 8f0546bf/ghc:

testsuite: Add test for #12971

Test Plan: Validate

Reviewers: austin

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2855

GHC Trac Issues: #12971

comment:9 Changed 2 years ago by Phyx-

Ok, this issue is an upstream one.

The problem is with how response files are read into GCC. We write out a UTF-8 file, but the relevant code in libiberty[1] assumes 1 byte per character. It works fine if response files aren't used, since the rest of the argument handling code seems to work fine with UTF-16 and UTF-8.

Assuming we want to keep the response files, one possible workaround would be to convert all paths to DOS 8.3 short paths.

[1] https://github.com/gcc-mirror/gcc/blob/master/libiberty/argv.c#L420
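
To illustrate the mismatch described in this comment, here is a simplified Python simulation. It is not GHC's or libiberty's code: `write_response_file` and `read_response_file_bytewise` are hypothetical stand-ins, the response file format is reduced to one quoted argument per line, and Latin-1 decoding is used only as a way to model "one byte equals one character".

```python
import os
import tempfile

def write_response_file(path, args):
    # Hypothetical writer side: one quoted argument per line, encoded as UTF-8.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for arg in args:
            f.write(f'"{arg}"\n')

def read_response_file_bytewise(path):
    # Hypothetical reader side: treats every byte as one character.
    # Latin-1 maps each byte to exactly one code point, which models that assumption.
    with open(path, "rb") as f:
        text = f.read().decode("latin-1")
    return [line.strip('"') for line in text.split("\n") if line]

original = r"C:\msys64\home\ben\Präntare\ghc4688_0\ghc_4.c"
with tempfile.TemporaryDirectory() as tmp:
    rsp = os.path.join(tmp, "args.rsp")
    write_response_file(rsp, [original])
    seen = read_response_file_bytewise(rsp)[0]

print(original)
print(seen)
print(len(original), len(seen))
```

The reader ends up with a path that is one character longer and names a directory that does not exist, which is consistent with the "No such file or directory" error in the earlier comments.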

comment:10 Changed 2 years ago by Phyx-

Differential Rev(s): Phab:D2917
Status: new → patch

I've made a temporary workaround in Phab:D2917. I can patch libiberty for 8.4.

@bgamari, I'll leave it up to you to decide whether to include this or wait.

Last edited 2 years ago by Phyx-

comment:11 Changed 2 years ago by Phyx-

Differential Rev(s): Phab:D2917 → Phab:D2917 Phab:/D2942

Added another possible fix.

comment:12 Changed 2 years ago by Tamar Christina <tamar@…>

In f63c8ef/ghc:

Use latin1 code page on Windows for response files.

Summary:
D2917 added a change that will make paths on Windows response files
use DOS 8.3 shortnames to get around the fact that `libiberty` assumes
a one byte per character encoding.

This is actually not the problem; the actual problem is that GCC on
Windows doesn't seem to support Unicode at all.

This comes down to how Unicode characters are handled on POSIX versus
Windows. On Windows, Unicode is only supported through wide characters
(UTF-16 stored in `wchar_t`) with calls to the appropriate wide versions of
the APIs (names suffixed with `W`). On POSIX I believe the standard
`char` is used and, based on its value, it is decoded to the correct string.

GCC doesn't seem to make calls to the wide versions of the Windows APIs,
and even if it did, its character representation would be wrong. So I
believe GCC just does not support UTF-8 paths on Windows.

So the hack in D2917 is the only way to get Unicode support. The problem is
however that `GCC` is not the only tool with this issue and we don't use response
files for every invocation of the tools. Most of the tools probably don't support it.

Furthermore, DOS 8.3 shortnames only exist when the path or file physically exists on
disk. We pass lots of paths to GCC that don't exist yet, like the output file.
D2917 works around this by splitting the path from the file and trying to shorten that.

But this may not always work.

In short, even if we do Unicode correctly (which we don't atm; the GCC driver we build
uses `char` instead of `wchar_t`), we won't be able to compile using Unicode paths that
need to be passed to `GCC`. So I'm not sure about the point of D2917.

What we can do is support the most common non-ascii characters by writing the response
files out using the `latin1` code page.

Test Plan: compile + make test TEST=T12971

Reviewers: austin, bgamari, erikd

Reviewed By: bgamari

Subscribers: thomie, #ghc_windows_task_force

Differential Revision: https://phabricator.haskell.org/D2942

GHC Trac Issues: #12971
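
The trade-off in the commit message above is that a single-byte encoding covers only some non-ASCII characters. A rough Python sketch of which path components would survive: `encodable_as_latin1` is a hypothetical helper, the sample names other than Präntare are made up for illustration, and Python's `latin-1` codec (ISO-8859-1) is assumed to be a fair stand-in for the code page GHC selects.

```python
def encodable_as_latin1(path):
    # True when every character of the path has a single-byte Latin-1 encoding.
    try:
        path.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Sample names are illustrative only; just "Präntare" comes from the ticket.
samples = ["Präntare", "café", "Łukasz", "日本語"]
for s in samples:
    print(s, encodable_as_latin1(s))
```

Names limited to Western European accented letters fit in one byte per character, while names in other scripts cannot be written to such a response file at all, which matches the "most common non-ascii characters" wording of the commit.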

comment:13 Changed 2 years ago by Phyx-

Priority: highest → normal
Status: patch → upstream

This is as fixed as it's going to get with the current toolchain.

We'll have to revisit this if we ever move toolchains.

comment:14 Changed 2 years ago by Phyx-

Milestone: 8.2.1 → (none)