Consider using xchg instead of mfence for CS stores

To get sequential consistency for atomicWriteIntArray# we use an mfence instruction. An alternative is to use an xchg instruction (which has an implicit lock prefix), which might have lower latency. We should check what other compilers do.

Trac metadata

Trac field	Value
Version	7.9
Type	FeatureRequest
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information