Implement FFI spec behaviour for *CString family
|Reported by:|batterseapower|
|Type of failure:|None/Unknown|
Although the FFI spec requires the *CString functions in Foreign.C.String to use the locale encoding to interpret the supplied CString, the implementation currently uses ASCII.
I have implemented this feature and, at the same time, changed the encoder's behaviour upon encountering non-decodable characters: it now silently ignores them. The rationale is simply that the previously specified behaviour (replace them with ?) was also unimplemented, and ignoring them is marginally easier.
Here are some discussion points:
- It seems a shame not to expose a general interface to peek CStrings in any supported TextEncoding.
- I'm not a fan of either the "ignore" error handling behaviour or the "replace with ?" behaviour. In my opinion we should throw an exception upon encoding failure, because how to recover in this situation will, in general, depend on the user application.
- I could implement the replace-with-? error handling behaviour with modest extra effort, if it is deemed necessary.
- To ensure that this patch does not change the behaviour of GHC in any way, I replaced every instance of a *CString function with a call to the CAString equivalent, and marked the source with a comment of the form "-- UNICODE". The intention is that if and when this patch is accepted I will then go back and figure out what is really going on in each case and choose the correct function to call.
- Some of the occurrences of CString in my GHC repo came from projects with a distinct upstream, such as Cabal. Should I be submitting these patches upstream rather than here?
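To illustrate the mechanical replacement described in the list above, here is a minimal sketch of what a marked call site might look like. The `cLength` wrapper and the choice of `strlen` are hypothetical and not taken from the GHC sources; only `withCAString` and the "-- UNICODE" marker come from the description.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCAString)
import Foreign.C.Types (CSize (..))

-- Hypothetical foreign import, used only to have something to call.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Previously this would have read: withCString s c_strlen
-- Switching to withCAString keeps the old byte-per-character marshalling,
-- and the trailing marker flags the call site for later review.
cLength :: String -> IO Int
cLength s = fromIntegral <$> withCAString s c_strlen -- UNICODE

main :: IO ()
main = cLength "hello" >>= print
```

Searching the tree for the marker then gives the list of call sites that still need a deliberate choice of function.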
I note that if we did expose a version of the CString functions that took a TextEncoding, it would then be easy for the user to decode ignoring errors instead, because they could simply supply a TextEncoding with different error-handling behaviour.
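As a rough sketch of what that could look like from the caller's side, the example below assumes the `TextEncoding`-taking `withCString` and `peekCString` exported by `GHC.Foreign` in later versions of base, together with the `//IGNORE` suffix accepted by `mkTextEncoding`. The helpers `roundTrip` and `decodeBytes` are hypothetical names introduced only for illustration.

```haskell
import Control.Exception (IOException, try)
import Data.Word (Word8)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (withArray0)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, mkTextEncoding, utf8)

-- Marshal a String out to a C buffer and back in, using an explicit encoding.
roundTrip :: TextEncoding -> String -> IO String
roundTrip enc s = GHC.withCString enc s (GHC.peekCString enc)

-- Decode a NUL-terminated buffer holding the given raw bytes.
decodeBytes :: TextEncoding -> [Word8] -> IO String
decodeBytes enc bytes =
  withArray0 0 (map fromIntegral bytes :: [CChar]) (GHC.peekCString enc)

main :: IO ()
main = do
  -- Valid text survives a round trip through UTF-8.
  roundTrip utf8 "na\239ve" >>= print

  -- 0xFF can never appear in well-formed UTF-8.
  let bad = [0x61, 0xFF, 0x62]

  -- The stock utf8 encoding raises an exception on the invalid byte.
  strict <- try (decodeBytes utf8 bad) :: IO (Either IOException String)
  putStrLn (either (const "strict decode failed") show strict)

  -- The same bytes decoded with an encoding that skips invalid input.
  lenient <- mkTextEncoding "UTF-8//IGNORE"
  decodeBytes lenient bad >>= print
```

With this shape the error-handling policy lives entirely in the `TextEncoding` value, so the marshalling functions themselves need no extra parameter for it.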
I have validated this patch on Windows and OS X, and not seen any reproducible failures above and beyond the usual set.