Opened 7 years ago

Closed 6 years ago

#4006 closed bug (fixed)

System.Process doesn't encode its arguments.

Reported by: Khudyakov
Owned by:
Priority: normal
Milestone: 7.2.1
Component: libraries/process
Version: 6.12.1
Keywords:
Cc: rabeslik@…
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

System.Process doesn't encode strings when creating new processes; it truncates them instead. Here is a simple test case: testUnicode should always return True, but it does so only for ASCII strings.

import System.Process (readProcess)

testUnicode :: String -> IO Bool
testUnicode str = ((== str) . init) `fmap` (readProcess "echo" [str] "")

In GHCi I get the following:

*Main> testUnicode "It works here"
True
*Main> testUnicode "А здесь сломалось"
False

Unlike #3307 and #3309, I don't think this bug is controversial, since there is no possible information loss.

Change History (7)

comment:1 Changed 7 years ago by igloo

"truncate" here means it just takes the low 8 bits of each Char:

import System.Process

testUnicode :: String -> IO ()
testUnicode str = do str' <- readProcess "echo" [str] ""
                     print (length str, length str')
                     print (str, str')
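
To make the "low 8 bits" behaviour concrete, here is a minimal pure model of it. This is only a sketch of the effect described above, not the code in the process library; the names truncateTo8Bits and survivesTruncation are made up for illustration.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- | Hypothetical model of the truncation: keep only the low 8 bits
-- of each character's code point, one output character per input character.
truncateTo8Bits :: String -> String
truncateTo8Bits = map (chr . (.&. 0xFF) . ord)

-- | True when truncation leaves the string unchanged, which is the
-- case exactly when every code point is below 256.
survivesTruncation :: String -> Bool
survivesTruncation s = truncateTo8Bits s == s
```

Under this model the length of the string is preserved, but a character such as 'А' (U+0410) collapses to the control character U+0010, so any comparison with the original fails.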

It's not clear to me that UTF8 encoding is the right thing to do. Shouldn't there be a low level function which takes [Word8] rather than String, and then perhaps a String function on top of that which does UTF8 encoding?
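
A rough sketch of the String layer in that two-level design might look like the following. The low-level [Word8] process function is not shown because it does not exist in this ticket; encodeChar, encodeUtf8 and encodeArgs are hypothetical names, and the encoder is written by hand here purely for illustration.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single code point as UTF-8 bytes.
-- Note: this sketch does not reject surrogate code points.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- | Encode a whole String as UTF-8.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

-- | String-level convenience layer: encode every argument, ready to be
-- handed to a (hypothetical) byte-level process API.
encodeArgs :: [String] -> [[Word8]]
encodeArgs = map encodeUtf8
```

With this split, callers who need to pass arbitrary bytes would use the byte-level function directly, while everyone else gets a fixed, lossless encoding of their Strings.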

comment:2 Changed 7 years ago by igloo

Milestone: 6.14.1

comment:3 in reply to:  1 Changed 7 years ago by beroal

Cc: rabeslik@… added

I tracked down this issue to 'hackage.base.Foreign.C.String.withCString*' . That function uses 'hackage.base.Foreign.C.String.withCString.castCharToCChar', which is not supposed to handle Unicode at all. Test it with

withCStringLen "А здесь сломалось" $
	\(cs, sl) -> withBinaryFile "/tmp/user" WriteMode $
	\h -> hPutBuf h cs sl

So, 'withCString*' incorrectly implements the FFI addendum: "The marshalling converts each Haskell character, representing a Unicode code point, to one or more bytes in a manner that, by default, is determined by the current locale." see http://www.cse.unsw.edu.au/~chak/haskell/ffi/ffi/ffise6.html#x10-420006.3
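
For comparison, marshalling with an explicit encoding can be sketched as below. This assumes a version of base that provides the GHC.Foreign module, whose withCStringLen takes a TextEncoding as its first argument; that module is not mentioned in this ticket, and utf8ByteLength is a made-up helper name.

```haskell
import qualified GHC.Foreign as GHC
import System.IO (utf8)

-- | Number of bytes produced when a String is marshalled as UTF-8.
-- Assumption: GHC.Foreign.withCStringLen is available in the installed base.
utf8ByteLength :: String -> IO Int
utf8ByteLength s = GHC.withCStringLen utf8 s (\(_ptr, len) -> return len)
```

If the marshalling encodes rather than truncates, the byte length of a non-ASCII string is larger than its length in characters, which gives a quick way to tell the two behaviours apart.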

Replying to igloo:

It's not clear to me that UTF8 encoding is the right thing to do. Shouldn't there be a low level function which takes [Word8] rather than String, and then perhaps a String function on top of that which does UTF8 encoding?

Supposing 'withCString*' worked properly, command arguments would be encoded using the current locale. I (and, I guess, the ticket author) would be satisfied with this solution. Whether a user should be allowed to pass arbitrary binary data as command arguments is a broader question; the same question arises for file names.

comment:4 Changed 7 years ago by simonmar

See #1414 for the withCString encoding issue.

comment:5 Changed 6 years ago by igloo

Milestone: 7.0.1 → 7.0.2

comment:6 Changed 6 years ago by igloo

Milestone: 7.0.2 → 7.2.1

comment:7 Changed 6 years ago by batterseapower

Resolution: fixed
Status: new → closed