msys has bad Unicode support
Tests 3307 and environment001 pass on Cygwin and Linux but fail on msys:
> lib/IO 3307 [bad exit code] (normal)
> lib/IO environment001 [bad stdout] (normal)
Here is Max's diagnosis:
Basically, msys has kind of bad Unicode support. If you write a program "len.c" like this:
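#include <windows.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
    /* Fetch the command line as UTF-16 directly from the Windows API. */
    LPWSTR cmdLine = GetCommandLineW();
    int argc;
    LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
    if (argv == NULL || argc < 2) {
        fprintf(stderr, "usage: a.exe <arg>\n");
        return 1;
    }
    /* wcslen returns size_t, so cast it to match %d. */
    printf("%d args, %d wide chars in first arg\n", argc, (int)wcslen(argv[1]));
    return 0;
}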
Create a UTF-8 encoded file called "utf8" containing two characters:
不好
And then execute it like so:
gcc len.c && ./a.exe $(cat utf8)
(NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is an issue with the shells)
You get different results on msys and Cygwin:
- On Cygwin, you get 2 wide characters in the first argument; i.e. the UTF-16 encoded Chinese text
- On msys, you get 6 wide characters in the first argument; i.e. one 16-bit value for every byte in the UTF-8 encoded Chinese text (each of the two characters takes 3 bytes in UTF-8)
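To make the 2-versus-6 difference concrete, here is a minimal portable sketch of the two counts. The function names `utf16_units` and `bytewise_units` are made up for illustration and are not part of any library. The byte-wise variant is only a guess at what msys effectively does, based on the observed output, and the decoder assumes well-formed UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of UTF-16 code units needed to represent a
   well-formed UTF-8 string. No validation is done in this sketch. */
size_t utf16_units(const char *s) {
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                     /* continuation byte, already counted */
        units += (*p >= 0xF0) ? 2 : 1;    /* 4-byte sequences need a surrogate pair */
    }
    return units;
}

/* Hypothetical helper: the count you get if every input byte is widened
   to its own 16-bit value, which is what the msys result suggests. */
size_t bytewise_units(const char *s) {
    return strlen(s);
}
```

For the UTF-8 bytes of the two-character test string (E4 B8 8D E5 A5 BD), `utf16_units` gives 2, matching Cygwin, and `bytewise_units` gives 6, matching msys.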
IMHO the msys behaviour is broken because the command line arguments supplied via the Windows API are meant to be UTF-16. It does match the behaviour of Windows cmd if you do this:
set /p myvar= < utf8
a.exe %myvar%
(You get "6 wide characters" printed)
Perhaps the issue in cmd stems from the fact that the Windows console is stuck in code page 850 and doesn't support the UTF-8 "code page". But msys really has no excuse since it reports itself as being UTF-8.
I'm not sure what to do here because I don't think our code actually has a problem, and the test does pass (and check something useful) in Linux, OS X and Cygwin. But still, something is not working quite right here. Perhaps just mark it as expect-fail in msys?
Trac metadata
Trac field | Value |
---|---|
Version | 7.2.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |