Threaded RTS performing badly on recent OS X (10.8?)
|Reported by:||simonmar||Owned by:||thoughtpolice|
|Keywords:||thread-local state, TLS clang||Cc:||johan.tibell@…, chak@…, anton.nik@…, george.colpitts@…, simonmar|
|Operating System:||MacOS X||Architecture:||x86_64 (amd64)|
|Type of failure:||Runtime performance bug||Test Case:|
|Related Tickets:||Differential Rev(s):|
This ticket is to remind us about the following problem: OS X is now using llvm-gcc, and as a result GHC's garbage collector with -threaded is much slower than it should be (approx 30% slower overall runtime). Some results here: http://www.haskell.org/pipermail/cvs-ghc/2011-July/063552.html
This is because the GC code relies on having fast access to thread-local state. It uses one of two methods: either a register variable (gcc only) or __thread variables (which aren't supported on OS X). To make things work on OS X, we use calls to pthread_getspecific instead (see #5634), which is quite slow, even though it compiles to inline assembly.
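For illustration, here is a minimal sketch of the two portable access styles mentioned above: a `__thread` variable and a `pthread_getspecific` lookup. It is not the RTS code. The names `gc_state`, `bind_state`, `copy_objects_tls` and `copy_objects_key` are hypothetical stand-ins, and the `__thread` half assumes a compiler and platform that support it. The gcc-only register-variable method is left out because it does not compile elsewhere.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread GC state (not GHC's real struct). */
typedef struct {
    unsigned long objects_copied;
} gc_state;

/* Style 1: compiler-supported TLS. Assumes __thread is available. */
static __thread gc_state *tls_state;

/* Style 2: a POSIX thread-specific key, created once per process. */
static pthread_key_t state_key;
static pthread_once_t state_key_once = PTHREAD_ONCE_INIT;

static void make_state_key(void)
{
    pthread_key_create(&state_key, NULL);
}

/* Bind the calling thread's state under both access styles. */
void bind_state(gc_state *s)
{
    pthread_once(&state_key_once, make_state_key);
    pthread_setspecific(state_key, s);
    tls_state = s;
}

/* Hot loop reading the state through the __thread variable. */
unsigned long copy_objects_tls(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
        tls_state->objects_copied++;
    }
    return tls_state->objects_copied;
}

/* Same loop, but every iteration looks the state up via the key,
 * which is the access pattern used as the fallback on OS X. */
unsigned long copy_objects_key(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
        gc_state *s = (gc_state *)pthread_getspecific(state_key);
        s->objects_copied++;
    }
    return ((gc_state *)pthread_getspecific(state_key))->objects_copied;
}
```

Each thread would call `bind_state` once with its own `gc_state` before entering either loop. Timing the two loops against each other on a given platform is one way to see how much the per-access lookup costs there.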
I don't recall which OS X / Xcode versions are affected; maybe a Mac expert could fill in the details.
We have tried other fixes, such as passing around the thread-local state as extra arguments, but performance wasn't good. Ideally Apple will implement TLS in OS X at some point and we can start to use it.
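As a rough picture of what "passing around the thread-local state as extra arguments" means, the sketch below threads a state pointer through every call instead of looking it up. The names `gc_ctx`, `evacuate_one` and `scavenge_n` are hypothetical and only loosely modelled on GC terminology; this is not the patch that was tried.

```c
#include <stddef.h>

/* Hypothetical per-thread GC context (not GHC's real struct). */
typedef struct {
    unsigned long objects_copied;
} gc_ctx;

/* Every function on the hot path takes the context explicitly,
 * so no thread-local lookup is needed inside it. */
static void evacuate_one(gc_ctx *ctx)
{
    ctx->objects_copied++;
}

/* The caller has to forward the same pointer to each callee. */
unsigned long scavenge_n(gc_ctx *ctx, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
        evacuate_one(ctx);
    }
    return ctx->objects_copied;
}
```

The trade-off is that the pointer has to be added to the signature of every function that touches the state, and forwarded at every call site.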
A workaround is to install a real gcc (using Homebrew?) and use that to compile GHC. Whoever builds the GHC distributions for OS X should probably do it that way, so that everyone benefits.
Change History (55)
comment:14 Changed 4 years ago by thoughtpolice
- Architecture changed from Unknown/Multiple to x86_64 (amd64)
- Operating System changed from Unknown/Multiple to MacOS X
- Owner set to thoughtpolice
- Version changed from 7.6.1 to 7.7
comment:23 Changed 3 years ago by simonmar
- Milestone changed from _|_ to 7.8.1
- Priority changed from normal to high
comment:34 in reply to: ↑ 33 Changed 3 years ago by Andrea
comment:37 Changed 3 years ago by thoughtpolice
- Milestone changed from 7.8.1 to 7.8.2
- Version 7.8.1-rc1 deleted
comment:39 Changed 3 years ago by George
- Keywords thread-local state TLS clang added
- Type of failure changed from None/Unknown to Runtime performance bug