Threaded RTS performing badly on recent OS X (10.8?)

This ticket is to remind us about the following problem: OS X now uses llvm-gcc as its system C compiler, and as a result GHC's garbage collector is much slower with -threaded than it should be (roughly 30% slower in overall runtime). Some results here: http://www.haskell.org/pipermail/cvs-ghc/2011-July/063552.html

This is because the GC code relies on fast access to thread-local state. It uses one of two methods: either a global register variable (supported only by gcc) or __thread variables (which aren't supported on OS X). To make things work on OS X, we fall back to calling pthread_getspecific instead (see #5634), which is quite slow even though it compiles to inline assembly.
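To illustrate the difference between the two portable access methods, here is a minimal sketch. It is not GHC's RTS code: the names gc_state, set_gc_state, get_gc_state, copy_word and run_gc_workers are hypothetical. By default it stores the per-thread state in a __thread variable; compiling with -DUSE_PTHREAD_KEY switches every access to a pthread_getspecific lookup, which is the kind of fallback described above for OS X. The register-variable method is omitted because it is gcc-only and architecture-specific.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_GC_THREADS 16

/* Hypothetical per-thread GC state (not GHC's real gc_thread struct). */
typedef struct {
    unsigned long words_copied;
    unsigned long iters;
} gc_state;

#if defined(USE_PTHREAD_KEY)
/* Fallback: every access goes through a pthread key lookup. */
static pthread_key_t gc_state_key;
static pthread_once_t gc_key_once = PTHREAD_ONCE_INIT;

static void make_gc_key(void) { pthread_key_create(&gc_state_key, NULL); }

static void set_gc_state(gc_state *s) {
    pthread_once(&gc_key_once, make_gc_key);
    pthread_setspecific(gc_state_key, s);
}

static gc_state *get_gc_state(void) {
    return pthread_getspecific(gc_state_key);
}
#else
/* Compiler-supported TLS: each thread gets its own copy of this pointer. */
static __thread gc_state *current_gc_state;

static void set_gc_state(gc_state *s) { current_gc_state = s; }
static gc_state *get_gc_state(void) { return current_gc_state; }
#endif

/* Hot path: stands in for work done many times during a collection. */
static void copy_word(void) {
    get_gc_state()->words_copied++;
}

static void *gc_worker(void *arg) {
    gc_state *s = arg;
    set_gc_state(s); /* install this thread's state once */
    for (unsigned long i = 0; i < s->iters; i++)
        copy_word(); /* each call fetches the state from thread-local storage */
    return NULL;
}

/* Runs nthreads workers; returns the total words "copied" by the threads
   that started, or 0 if nthreads is out of range. */
unsigned long run_gc_workers(int nthreads, unsigned long iters) {
    pthread_t threads[MAX_GC_THREADS];
    gc_state states[MAX_GC_THREADS];
    unsigned long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_GC_THREADS)
        return 0;
    for (int i = 0; i < nthreads; i++) {
        states[i].words_copied = 0;
        states[i].iters = iters;
        if (pthread_create(&threads[i], NULL, gc_worker, &states[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += states[i].words_copied;
    }
    return total;
}
```

Both variants compute the same result; they differ only in how copy_word reaches its state. Timing the two builds (for example, calling run_gc_workers(4, 100000000) and compiling with -pthread) is one way to see the lookup cost, though this toy will not reproduce the exact slowdown measured for GHC.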

I don't recall which OS X / Xcode versions are affected; perhaps a Mac expert could fill in the details.

We have tried other fixes, such as passing around the thread-local state as extra arguments, but performance wasn't good. Ideally Apple will implement TLS in OS X at some point and we can start to use it.
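For comparison, a minimal sketch of the "extra argument" alternative mentioned above, again with hypothetical names rather than GHC's actual code: instead of looking the state up in thread-local storage, each hot-path function receives it as a parameter. The trade-off is that the pointer has to be threaded through every function on the hot path; the ticket reports that this did not perform well for GHC, and this toy does not measure that.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_GC_THREADS 16

/* Hypothetical per-thread GC state (not GHC's real gc_thread struct). */
typedef struct {
    unsigned long words_copied;
    unsigned long iters;
} gc_state;

/* Hot path: the state arrives as an explicit argument, no TLS lookup. */
static void copy_word_explicit(gc_state *s) {
    s->words_copied++;
}

static void *gc_worker_explicit(void *arg) {
    gc_state *s = arg;
    for (unsigned long i = 0; i < s->iters; i++)
        copy_word_explicit(s); /* pointer passed down on every call */
    return NULL;
}

/* Runs nthreads workers; returns the total words "copied" by the threads
   that started, or 0 if nthreads is out of range. */
unsigned long run_gc_workers_explicit(int nthreads, unsigned long iters) {
    pthread_t threads[MAX_GC_THREADS];
    gc_state states[MAX_GC_THREADS];
    unsigned long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_GC_THREADS)
        return 0;
    for (int i = 0; i < nthreads; i++) {
        states[i].words_copied = 0;
        states[i].iters = iters;
        if (pthread_create(&threads[i], NULL, gc_worker_explicit, &states[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += states[i].words_copied;
    }
    return total;
}
```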

A workaround is to install a real gcc (using Homebrew?) and use it to compile GHC. Whoever builds the GHC distributions for OS X should probably do it that way, so that everyone benefits.

Trac metadata

Trac field	Value
Version	7.6.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system	Unknown/Multiple
Architecture	Unknown/Multiple
