Improve performance of a few functions in Foreign.Marshal.*
Several functions in Foreign.Marshal.* are relatively slow, for two reasons:
- Division and multiplication are used when computing the size of a memory block in words (bit shifts should be used instead).
- The functions are not inlined, so computations that depend on the data type in question are not optimized away.
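
To illustrate the two points above, here is a minimal sketch; it is not taken from the patches. All names (`wordSizeBytes`, `wordShift`, `bytesToWordsDiv`, `bytesToWordsShift`, `wordsFor`) are hypothetical, and an 8-byte machine word is assumed.

```haskell
import Data.Bits (shiftR)
import Foreign.Storable (Storable, sizeOf)

-- Assumption: 8-byte machine word (64-bit platform).
wordSizeBytes :: Int
wordSizeBytes = 8

-- log2 of wordSizeBytes, used as the shift amount.
wordShift :: Int
wordShift = 3

-- Round a byte count up to whole words using division.
bytesToWordsDiv :: Int -> Int
bytesToWordsDiv n = (n + wordSizeBytes - 1) `div` wordSizeBytes

-- Same rounding, expressed as a right shift by log2 of the word size.
bytesToWordsShift :: Int -> Int
bytesToWordsShift n = (n + wordSizeBytes - 1) `shiftR` wordShift

-- With INLINE, a call at a known type may let GHC reduce sizeOf,
-- and hence the whole expression, to a compile-time constant.
{-# INLINE wordsFor #-}
wordsFor :: Storable a => a -> Int
wordsFor x = bytesToWordsShift (sizeOf x)
```

Both rounding functions agree for every byte count; the shift form simply avoids a division instruction when the compiler does not perform that strength reduction itself.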
Two patches fix at least some of these performance issues. With both applied, a basic benchmark on the non-threaded RTS gives the following results:

Test name | Before | After |
---|---|---|
withCString | 146.391 ns | 133.646 ns |
alloca | 51.424 ns | 15.208 ns |
allocaBytes | 31.872 ns | 14.501 ns |
mallocForeignPointer | 34.630 ns | 17.498 ns |
bytestring | 94.872 ns | 58.938 ns |
mvar | 61.473 ns | 54.806 ns |
alloca+advancePtr | 54.480 ns | 14.687 ns |
new/finalizerFree | 61.172 ns | 44.144 ns |
with | 69.096 ns | 14.600 ns |

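
For readers who want to take comparable measurements, here is a rough timing sketch. It is not the benchmark that produced the numbers above. The helpers `timePerIter`, `allocaRoundTrip` and `withRoundTrip` are hypothetical, and `GHC.Clock` from a recent `base` is assumed to be available.

```haskell
import Control.Monad (replicateM_)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Utils (with)
import Foreign.Storable (peek, poke)
import GHC.Clock (getMonotonicTimeNSec)

-- Hypothetical helper: average wall-clock nanoseconds per iteration.
timePerIter :: Int -> IO () -> IO Double
timePerIter iters act = do
  start <- getMonotonicTimeNSec
  replicateM_ iters act
  end <- getMonotonicTimeNSec
  pure (fromIntegral (end - start) / fromIntegral iters)

-- Allocate temporary storage, write a value, and read it back.
allocaRoundTrip :: Int -> IO Int
allocaRoundTrip x = alloca $ \p -> poke p x >> peek p

-- Same round trip using 'with', which allocates and pokes in one step.
withRoundTrip :: Int -> IO Int
withRoundTrip x = with x peek
```

A call such as `timePerIter 1000000 (allocaRoundTrip 42 >>= \_ -> pure ())` gives a coarse per-call figure. The optimizer may remove or transform such a simple loop, so a dedicated benchmarking library is likely to give more trustworthy numbers.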
Could someone please take a look at the patches and merge them into the repository?
One of them is for the runtime system (Cmm definitions); the other is for Foreign.Marshal.*.
Trac metadata
Trac field | Value |
---|---|
Version | 6.12.2 |
Type | Task |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Runtime System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |