wiki:AtomicPrimops

Version 6 (modified by tibbe, 11 months ago)

Introduction

Atomic operations on basic numeric (and pointer) types can be used as a foundation for building higher-level constructs (e.g. mutexes). They are also useful on their own (e.g. to implement an atomic counter). This page outlines a design for adding atomic primops.

The primops

The new primops are modeled after those provided by C11, C++11, GCC, and LLVM.

atomicReadIntArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Int# #)
atomicWriteIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
fetchAddIntArray#
    :: MutableByteArray# s   -- Array to modify
    -> Int#                  -- Index, in words
    -> Int#                  -- Amount to add
    -> State# s
    -> (# State# s, Int# #)  -- Value held previously
fetchSubIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #)
fetchOrIntArray#  :: MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #)
fetchXorIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #)
fetchAndIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #)
casIntArray# :: MutableByteArray# s -> Int# -> Int# -> Int# -> State# s -> (# State# s, Int# #)

fetchAddIntArray# and casIntArray# already exist, but are implemented as out-of-line primops.
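Since the primops are modeled after C11, their intended behavior can be sketched with <stdatomic.h>. The block below is only an illustration, not GHC code: the names fetch_add_word, cas_word, word_trylock and word_unlock are hypothetical, and atomic_long stands in for a word-sized Int#. It assumes that the fetch operations return the value held previously (as the comment on fetchAddIntArray# says) and that the Int# returned by casIntArray# is the value found in the array before the operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical analogue of fetchAddIntArray#: add n to arr[i] and
   return the value held previously. */
static long fetch_add_word(atomic_long *arr, size_t i, long n) {
    return atomic_fetch_add(&arr[i], n);
}

/* Hypothetical analogue of casIntArray#: store `desired` in arr[i] only if
   it currently holds `expected`. Returns the value that was found, so the
   swap happened exactly when the result equals `expected`. */
static long cas_word(atomic_long *arr, size_t i, long expected, long desired) {
    long seen = expected;
    /* On failure, the current value of arr[i] is written into `seen`. */
    atomic_compare_exchange_strong(&arr[i], &seen, desired);
    return seen;
}

/* A higher-level construct built on CAS: a try-lock stored in one word
   (0 = free, 1 = held). Returns nonzero if the lock was acquired. */
static int word_trylock(atomic_long *arr, size_t i) {
    return cas_word(arr, i, 0, 1) == 0;
}

/* Release the lock with an atomic write (cf. atomicWriteIntArray#). */
static void word_unlock(atomic_long *arr, size_t i) {
    assert(atomic_load(&arr[i]) == 1); /* releasing a free lock is a bug */
    atomic_store(&arr[i], 0);
}
```

Word 0 of such an array could serve as an atomic counter via fetch_add_word, while word 1 acts as a simple lock via word_trylock and word_unlock.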

Implementation

The primops are implemented as CallishMachOps so that we can emit LLVM intrinsics when using the LLVM backend. This also lets us provide a fallback implementation in C on platforms where we don't want to implement backend support for these operations. The fallbacks can be implemented using the GCC/LLVM atomic built-ins, e.g. __sync_fetch_and_add.
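A minimal sketch of what such C fallbacks might look like, using only the GCC/LLVM __sync_* built-ins. The function names (fallback_fetch_add and so on), the hs_word typedef and the word_at helper are assumptions made for this illustration and are not the names used in GHC. The array payload is modeled as a word-aligned buffer indexed in words, matching the "Index, in words" comment in the signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t hs_word; /* assumption: stands in for a word-sized Int# */

/* Address of word i in a byte-array payload (hypothetical helper). */
static volatile hs_word *word_at(void *payload, size_t i) {
    /* Atomic word accesses need a word-aligned payload. */
    assert((uintptr_t)payload % sizeof(hs_word) == 0);
    return (volatile hs_word *)payload + i;
}

/* Each fetch operation returns the value held previously. */
static hs_word fallback_fetch_add(void *payload, size_t i, hs_word n) {
    return __sync_fetch_and_add(word_at(payload, i), n);
}

static hs_word fallback_fetch_sub(void *payload, size_t i, hs_word n) {
    return __sync_fetch_and_sub(word_at(payload, i), n);
}

static hs_word fallback_fetch_and(void *payload, size_t i, hs_word n) {
    return __sync_fetch_and_and(word_at(payload, i), n);
}

static hs_word fallback_fetch_or(void *payload, size_t i, hs_word n) {
    return __sync_fetch_and_or(word_at(payload, i), n);
}

static hs_word fallback_fetch_xor(void *payload, size_t i, hs_word n) {
    return __sync_fetch_and_xor(word_at(payload, i), n);
}

/* Compare-and-swap: returns the value found before the operation. */
static hs_word fallback_cas(void *payload, size_t i,
                            hs_word expected, hs_word desired) {
    return __sync_val_compare_and_swap(word_at(payload, i), expected, desired);
}
```

The __sync_* built-ins act as full barriers, which fits the sequentially consistent semantics chosen on this page.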

Ensuring ordering of non-atomic operations

A CallishMachOp already acts as a memory barrier: the Cmm optimizer will not float loads/stores past it. We'll document the reliance on this behavior in the sinking pass, so that if it ever changes, the optimizer is taught to handle these atomic operations in some other way.

The CallishMachOp will be translated to the correct instructions in the backends (e.g. lock; add on x86).

Because the Cmm code generator cannot reorder reads/writes around prim calls (i.e. CallishMachOps), the memory_order_seq_cst semantics should be preserved, as long as the backend also emits a memory barrier to prevent the CPU from reordering memory accesses around the operation.
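The guarantee matters for code that mixes ordinary and atomic accesses. As a hedged illustration in C11 terms (the names payload, ready, publish and try_consume are invented for this sketch), a writer fills in non-atomic data and then sets an atomic flag. A reader that sees the flag set may rely on seeing the data only because neither the compiler nor the CPU moves the plain accesses across the sequentially consistent ones.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;       /* ordinary, non-atomic data */
static atomic_bool ready; /* static storage, so it starts out false */

/* Writer: the plain store to payload must not sink below the flag store. */
static void publish(int value) {
    assert(!atomic_load(&ready)); /* this sketch supports one publish only */
    payload = value;              /* non-atomic write */
    atomic_store(&ready, true);   /* memory_order_seq_cst by default */
}

/* Reader: the plain load of payload must not float above the flag load.
   Returns false if nothing has been published yet. */
static bool try_consume(int *out) {
    if (!atomic_load(&ready)) {   /* memory_order_seq_cst by default */
        return false;
    }
    *out = payload;               /* non-atomic read */
    return true;
}
```

In the design described here, the atomic store and load would be CallishMachOps, so the Cmm sinking pass leaves the surrounding plain accesses where they are and the backend is responsible for the hardware barrier.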

Ordering of non-atomic operations

We'll use sequential consistency, which corresponds to the C11/C++11 memory_order_seq_cst, Java volatile, and the GCC-compatible __sync_* builtins. This is the strongest consistency guarantee. We can provide weaker guarantees in the future, if needed.