Opened 8 years ago

Closed 7 years ago

Last modified 4 years ago

#3553 closed bug (fixed)

parallel gc suffers badly if one thread is descheduled

Reported by: simonmar
Owned by: simonmar
Priority: normal
Milestone: 6.12.2
Component: Runtime System
Version: 6.10.4
Keywords:
Cc: sveina@…, simonmar
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

The parallel GC synchronisation uses pure spinlocks, which leads to a severe decline in performance when one thread is descheduled. This is the main cause of the "last core parallel slowdown": using a -N value that matches the number of cores in the machine can be slower than using one fewer. The effect seems to be quite bad on Linux; reports are that it is less of an issue on OS X.

Switching to mutexes would help, but it isn't easy because we sometimes unlock these from a different thread than they were locked from, and standard mutexes don't let you do that (the locks in question are mut_spin and gc_spin in the GcThread structure).
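
For context: POSIX makes unlocking a pthread mutex from a thread other than the one that locked it undefined behaviour (or an error, for error-checking mutexes), so a straight swap is out; the replacement would have to be something like a condition-variable-based binary semaphore, which any thread may release. A rough sketch (not the RTS code; the names are illustrative):

  #include <pthread.h>

  /* A binary semaphore: unlike a pthread mutex, it may legally be
     released by a different thread than the one that acquired it. */
  typedef struct {
      pthread_mutex_t lock;
      pthread_cond_t  cond;
      int             available;    /* 1 = free, 0 = held */
  } BinSem;

  static void binsem_init(BinSem *s, int available)
  {
      pthread_mutex_init(&s->lock, NULL);
      pthread_cond_init(&s->cond, NULL);
      s->available = available;
  }

  static void binsem_acquire(BinSem *s)
  {
      pthread_mutex_lock(&s->lock);
      while (!s->available)
          pthread_cond_wait(&s->cond, &s->lock);
      s->available = 0;
      pthread_mutex_unlock(&s->lock);
  }

  /* Safe to call from any thread, not just the acquirer. */
  static void binsem_release(BinSem *s)
  {
      pthread_mutex_lock(&s->lock);
      s->available = 1;
      pthread_cond_signal(&s->cond);
      pthread_mutex_unlock(&s->lock);
  }

As comment:1 below shows, a variant of this was tried, and the extra kernel traffic made it significantly slower than spinning.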

Attachments (1)

futex-spinlocks.patch.bz2 (207.0 KB) - added by simonmar 7 years ago.

Change History (16)

comment:1 Changed 8 years ago by simonmar

I tried using condition variables, but it was significantly slower than spinlocks. Experimental patch attached (or would be, if it wasn't so large).

Tue Oct  6 16:32:58 BST 2009  Simon Marlow <marlowsd@gmail.com>
  * EXPERIMENT: use condition variables instead of spinlocks for the GC barrier
  This was significantly slower, averaging +20% with 7 cores on
  nofib/parallel.
    {
    hunk ./rts/sm/GC.c 132
    +#if defined(THREADED_RTS)
    +Mutex     gc_threads_ready_lock;
    +Condition gc_threads_ready_cond;
    +nat       gc_threads_ready;
    +
    +Mutex     gc_threads_go_lock;
    +Condition gc_threads_go_cond;
    +rtsBool   gc_threads_go;
    +#endif
    +
    hunk ./rts/sm/GC.c 885
    -    initSpinLock(&t->gc_spin);
    -    initSpinLock(&t->mut_spin);
    -    ACQUIRE_SPIN_LOCK(&t->gc_spin);
    hunk ./rts/sm/GC.c 937
    +
    +        initMutex(&gc_threads_ready_lock);
    +        initCondition(&gc_threads_ready_cond);
    +        gc_threads_ready = 0;
    +        initMutex(&gc_threads_go_lock);
    +        initCondition(&gc_threads_go_cond);
    +        gc_threads_go = rtsFalse;
    hunk ./rts/sm/GC.c 1094
    -    // Wait until we're told to wake up
    -    RELEASE_SPIN_LOCK(&gct->mut_spin);
    hunk ./rts/sm/GC.c 1095
    +
    +    // Wait until we're told to wake up
    +    ACQUIRE_LOCK(&gc_threads_ready_lock);
    +    gc_threads_ready++;
    +    debugTrace(DEBUG_gc, "GC thread %d: gc_threads_ready = %d", gct->thread_index, gc_threads_ready);
    +    if (gc_threads_ready >= RtsFlags.ParFlags.nNodes-1) {
    +        signalCondition(&gc_threads_ready_cond);
    +    }
    +    RELEASE_LOCK(&gc_threads_ready_lock);
    +
    hunk ./rts/sm/GC.c 1106
    -    ACQUIRE_SPIN_LOCK(&gct->gc_spin);
    +
    +    ACQUIRE_LOCK(&gc_threads_go_lock);
    +    while (!gc_threads_go) {
    +        waitCondition(&gc_threads_go_cond, &gc_threads_go_lock);
    +    }
    +    RELEASE_LOCK(&gc_threads_go_lock);
    hunk ./rts/sm/GC.c 1134
    -    // Wait until we're told to continue
    -    RELEASE_SPIN_LOCK(&gct->gc_spin);
    hunk ./rts/sm/GC.c 1135
    +
    +    // Wait until we're told to continue
    +    ACQUIRE_LOCK(&gc_threads_ready_lock);
    +    gc_threads_ready++;
    +    if (gc_threads_ready >= RtsFlags.ParFlags.nNodes-1) {
    +        signalCondition(&gc_threads_ready_cond);
    +    }
    +    RELEASE_LOCK(&gc_threads_ready_lock);
    +
    hunk ./rts/sm/GC.c 1146
    -    ACQUIRE_SPIN_LOCK(&gct->mut_spin);
    +
    +    ACQUIRE_LOCK(&gc_threads_go_lock);
    +    while (gc_threads_go) {
    +        waitCondition(&gc_threads_go_cond, &gc_threads_go_lock);
    +    }
    +    RELEASE_LOCK(&gc_threads_go_lock);
    +
    hunk ./rts/sm/GC.c 1155
    +    gct->wakeup = GC_THREAD_INACTIVE;
    +
    hunk ./rts/sm/GC.c 1169
    -    nat i, j;
    +    nat i;
    hunk ./rts/sm/GC.c 1172
    -    while(retry) {
    -        for (i=0; i < n_threads; i++) {
    -            if (i == me) continue;
    -            if (gc_threads[i]->wakeup != GC_THREAD_STANDING_BY) {
    -                prodCapability(&capabilities[i], cap->running_task);
    -            }
    -        }
    -        for (j=0; j < 10000000; j++) {
    -            retry = rtsFalse;
    -            for (i=0; i < n_threads; i++) {
    -                if (i == me) continue;
    -                write_barrier();
    -                setContextSwitches();
    -                if (gc_threads[i]->wakeup != GC_THREAD_STANDING_BY) {
    -                    retry = rtsTrue;
    -                }
    -            }
    -            if (!retry) break;
    +    for (i=0; i < n_threads; i++) {
    +        if (i == me) continue;
    +        if (gc_threads[i]->wakeup != GC_THREAD_STANDING_BY) {
    +            prodCapability(&capabilities[i], cap->running_task);
    hunk ./rts/sm/GC.c 1178
    +    setContextSwitches();
    +    $
    +    gc_threads_go = rtsFalse;
    +    $
    +    ACQUIRE_LOCK(&gc_threads_ready_lock);
    +    while (gc_threads_ready < n_threads-1) {
    +        debugTrace(DEBUG_gc, "waitForGcThreads: gc_threads_ready = %d", gc_threads_ready);
    +        waitCondition(&gc_threads_ready_cond, &gc_threads_ready_lock);
    +    } $
    +    gc_threads_ready = 0;
    +    RELEASE_LOCK(&gc_threads_ready_lock);
    hunk ./rts/sm/GC.c 1213
    -        ACQUIRE_SPIN_LOCK(&gc_threads[i]->mut_spin);
    -        RELEASE_SPIN_LOCK(&gc_threads[i]->gc_spin);
    +//        ACQUIRE_SPIN_LOCK(&gc_threads[i]->mut_spin);
    +//        RELEASE_SPIN_LOCK(&gc_threads[i]->gc_spin);
    hunk ./rts/sm/GC.c 1216
    +
    +    ACQUIRE_LOCK(&gc_threads_go_lock);
    +    gc_threads_go = rtsTrue;
    +    broadcastCondition(&gc_threads_go_cond);
    +    RELEASE_LOCK(&gc_threads_go_lock);
    hunk ./rts/sm/GC.c 1231
    -    nat i;
    -    for (i=0; i < n_threads; i++) {
    -        if (i == me) continue;
    -        while (gc_threads[i]->wakeup != GC_THREAD_WAITING_TO_CONTINUE) { write_barrier(); }
    +    ACQUIRE_LOCK(&gc_threads_ready_lock);
    +    while (gc_threads_ready < n_threads-1) {
    +        debugTrace(DEBUG_gc, "shutdown_gc_threads: %d", gc_threads_ready);
    +        waitCondition(&gc_threads_ready_cond, &gc_threads_ready_lock);
    hunk ./rts/sm/GC.c 1236
    +    gc_threads_ready = 0;
    +    RELEASE_LOCK(&gc_threads_ready_lock);
    hunk ./rts/sm/GC.c 1243
    -releaseGCThreads (Capability *cap USED_IF_THREADS)
    +releaseGCThreads (Capability *cap STG_UNUSED)
    hunk ./rts/sm/GC.c 1245
    -    nat n_threads = RtsFlags.ParFlags.nNodes;
    -    nat me = cap->no;
    -    nat i;
    -    for (i=0; i < n_threads; i++) {
    -        if (i == me) continue;
    -        if (gc_threads[i]->wakeup != GC_THREAD_WAITING_TO_CONTINUE) $
    -            barf("releaseGCThreads");
    -        $
    -        gc_threads[i]->wakeup = GC_THREAD_INACTIVE;
    -        ACQUIRE_SPIN_LOCK(&gc_threads[i]->gc_spin);
    -        RELEASE_SPIN_LOCK(&gc_threads[i]->mut_spin);
    -    }
    +    debugTrace(DEBUG_gc, "releaseGCThreads");
    +    ACQUIRE_LOCK(&gc_threads_go_lock);
    +    gc_threads_go = rtsFalse;
    +    broadcastCondition(&gc_threads_go_cond);
    +    RELEASE_LOCK(&gc_threads_go_lock);
    hunk ./rts/sm/GCThread.h 119
    -    SpinLock   gc_spin;
    -    SpinLock   mut_spin;
    }

comment:2 Changed 7 years ago by simonmar

Owner: set to igloo
Type: changed from bug to merge
Type of failure: None/Unknown

This patch helps a lot:

Fri Jan 22 16:49:11 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * When acquiring a spinlock, yieldThread() every 1000 spins (#3553, #3758)
  
  This helps when the thread holding the lock has been descheduled,
  which is the main cause of the "last-core slowdown" problem.  With
  this patch, I get much better results with -N8 on an 8-core box,
  although some benchmarks are still worse than with 7 cores.
  
  I also added a yieldThread() into the any_work() loop of the parallel
  GC when it has no work to do. Oddly, this seems to improve performance
  on the parallel GC benchmarks even when all the cores are busy.
  Perhaps it is due to reducing contention on the memory bus.
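
For reference, the spin-then-yield idea in that patch amounts to roughly the following (a sketch in plain C with GCC atomics; it is not the RTS source, and the exact primitives and spin count are illustrative). The lock word is 1 when free and 0 when held:

  #include <sched.h>

  #define SPIN_THRESHOLD 1000

  typedef unsigned int SpinLock;    /* initialise to 1 (unlocked) */

  static void spin_lock(SpinLock *p)
  {
      for (;;) {
          int i;
          for (i = 0; i < SPIN_THRESHOLD; i++) {
              /* atomically swap in 0; a non-zero old value means we got the lock */
              if (__atomic_exchange_n(p, 0, __ATOMIC_ACQUIRE) != 0)
                  return;
          }
          /* the holder has probably been descheduled: give up our
             timeslice instead of burning the rest of it spinning */
          sched_yield();
      }
  }

  static void spin_unlock(SpinLock *p)
  {
      __atomic_store_n(p, 1, __ATOMIC_RELEASE);
  }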

See this blog post for more info.

I'm going to close the bug, since I think this is the best we can do until we implement on-the-fly minor GCs.

comment:3 Changed 7 years ago by wuzzeb

I would be careful about adding sched_yield. Ingo Molnar, the developer who wrote and maintains the Linux scheduler, says he has never seen a valid use of sched_yield. See http://kerneltrap.org/Linux/Using_sched_yield_Improperly for the discussion. At least with the CFS scheduler in Linux, using sched_yield might not always do what you expect.

Also, from your blog post at least, it seems like futexes are perfect for what you need. In the uncontended case, they are completely in userspace. They can be acquired on one thread and released on another thread. Any reasons why futexes don't work?

http://people.redhat.com/drepper/futex.pdf http://en.wikipedia.org/wiki/Futex

comment:4 Changed 7 years ago by simonmar

I tried using futexes today, and so far haven't managed to beat the performance of the sched_yield spinlocks, but I might not be using them right. I fully expect futexes to be better. (Also, futexes are not exactly a real API: the syscall isn't even exposed by glibc, so you have to define it manually, as I discovered with some help from folks on #ghc.)

I can entirely believe that sched_yield is never the absolute right thing, but it might be the best portable solution we have.
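
For anyone reading along, the hand-rolled definition amounts to something like this (a sketch; the wrapper names here are made up):

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/futex.h>

  static long futex_wait(int *uaddr, int expected)
  {
      /* sleep only if *uaddr still equals 'expected' */
      return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
  }

  static long futex_wake(int *uaddr, int nwake)
  {
      /* wake up to 'nwake' threads sleeping on uaddr */
      return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwake, NULL, NULL, 0);
  }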

Changed 7 years ago by simonmar

Attachment: futex-spinlocks.patch.bz2 added

comment:5 Changed 7 years ago by simonmar

Patch to use futexes attached. This is significantly slower than the sched_yield version currently in use. I don't know why - as far as I can tell I'm using futexes correctly. The protocol I'm using is from Drepper's paper, and I tried it with and without some user-space spinning in the acquire case.
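
The protocol in question is roughly Drepper's three-state lock, something like the following (a sketch using GCC atomics and futex_wait()/futex_wake() wrappers over the raw syscall, as sketched earlier; it is not the code in the attached patch):

  /* 0 = unlocked, 1 = locked with no waiters, 2 = locked, possibly waiters */
  typedef int FutexLock;

  static void futex_lock(FutexLock *f)
  {
      int c = 0;
      /* fast path: 0 -> 1 without entering the kernel */
      if (!__atomic_compare_exchange_n(f, &c, 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
          /* contended: advertise that there are waiters, then sleep */
          if (c != 2)
              c = __atomic_exchange_n(f, 2, __ATOMIC_ACQUIRE);
          while (c != 0) {
              futex_wait(f, 2);      /* sleeps only if *f is still 2 */
              c = __atomic_exchange_n(f, 2, __ATOMIC_ACQUIRE);
          }
      }
  }

  static void futex_unlock(FutexLock *f)
  {
      /* may be called by a thread other than the one that locked it */
      if (__atomic_fetch_sub(f, 1, __ATOMIC_RELEASE) != 1) {
          /* there may be waiters: reset and wake one */
          __atomic_store_n(f, 0, __ATOMIC_RELEASE);
          futex_wake(f, 1);
      }
  }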

nofib/parallel/ray on 8 cores, first with futexes and then with yield:

$ ./ray 1000 +RTS -N8 -s >/dev/null
  14,784,695,584 bytes allocated in the heap
     246,403,264 bytes copied during GC
         108,232 bytes maximum residency (169 sample(s))
         310,000 bytes maximum slop
               6 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4606 collections,  4605 parallel,  4.31s,  2.16s elapsed
  Generation 1:   169 collections,   169 parallel,  0.22s,  0.08s elapsed

  Parallel GC work balance: 1.56 (30214391 / 19430130, ideal 8)

  SPARKS: 1000000 (978174 converted, 21636 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    8.96s  (  1.86s elapsed)
  GC    time    4.53s  (  2.24s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   13.49s  (  4.11s elapsed)


$ ./ray-yield 1000 +RTS -N8 -s >/dev/null 
  14,834,802,304 bytes allocated in the heap
     237,105,080 bytes copied during GC
          97,736 bytes maximum residency (158 sample(s))
         299,160 bytes maximum slop
               6 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4515 collections,  4514 parallel,  7.73s,  1.65s elapsed
  Generation 1:   158 collections,   158 parallel,  0.39s,  0.06s elapsed

  Parallel GC work balance: 1.53 (29092959 / 18954020, ideal 8)

  SPARKS: 1000000 (980491 converted, 19356 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   10.74s  (  1.93s elapsed)
  GC    time    8.11s  (  1.71s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   18.86s  (  3.64s elapsed)

The extra CPU time in the yield version is due to the higher spin threshold, but I've tried different spin thresholds in the futex case and it didn't help.

comment:6 Changed 7 years ago by igloo

Owner: changed from igloo to simonmar
Type: changed from merge to bug

Patch merged. Simon, I'm bouncing it back to you in case you want it left open for the futex investigation.

comment:7 Changed 7 years ago by simonmar

Resolution: fixed
Status: changed from new to closed

I'm not going to investigate this any more.

comment:8 Changed 7 years ago by wuzzeb

I know you said you weren't going to investigate further and the ticket is closed, so I am just adding a pointer to some info for future investigation if someone decides to look into it again (I might myself if I become more familiar with GHC internals).

There is a recent patch to the Linux kernel to implement a futex which spins in the kernel; see http://thread.gmane.org/gmane.linux.kernel/970412

You can spin in the kernel until the lock is released or you are scheduled out, which might be closer to what GHC wants.

comment:9 Changed 7 years ago by Baughn

Cc: sveina@… added

comment:10 Changed 4 years ago by parcs

Cc: simonmar added

I noticed that if I change nofib/parallel/ray to do its work in a separate thread (see diff), this "last core slowdown" completely vanishes. Can anybody explain this massive variance?

diff --git a/parallel/ray/Main.lhs b/parallel/ray/Main.lhs
index a1a72e6..bcd020f 100644
--- a/parallel/ray/Main.lhs
+++ b/parallel/ray/Main.lhs
@@ -5,10 +5,15 @@ Michaelson for SML, converted to (parallel) Haskell by Kevin Hammond!
 > import Control.Parallel
 > import Control.Parallel.Strategies (Strategy, withStrategy, rseq, parBuffer)
 > import System.Environment
+> import Control.Concurrent

 > main = do
->   [detail] <- fmap (map read) getArgs
->   putStr (top detail 10.0 7.0 6.0 sc)
+>   v <- newEmptyMVar
+>   forkIO $ do
+>       [detail] <- fmap (map read) getArgs
+>       putStr (top detail 10.0 7.0 6.0 sc)
+>       putMVar v ()
+>   takeMVar v

 > type Coord = (Double,Double,Double)

Before patch:

parcs@wolfgang:~/ghc/nofib/parallel/ray$ perf stat ./ray 3000 +RTS -N > /dev/null

 Performance counter stats for './ray 3000 +RTS -N':

      94255.765778 task-clock                #    6.343 CPUs utilized
         2,686,596 context-switches          #    0.029 M/sec
             6,592 CPU-migrations            #    0.000 M/sec
             2,243 page-faults               #    0.000 M/sec
   338,901,646,069 cycles                    #    3.596 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   264,852,020,580 instructions              #    0.78  insns per cycle
    48,676,712,628 branches                  #  516.432 M/sec
     1,114,612,869 branch-misses             #    2.29% of all branches

      14.859603166 seconds time elapsed

After patch:

parcs@wolfgang:~/ghc/nofib/parallel/ray$ perf stat ./ray 3000 +RTS -N > /dev/null

 Performance counter stats for './ray 3000 +RTS -N':

      65701.145514 task-clock                #    7.919 CPUs utilized
            41,217 context-switches          #    0.001 M/sec
               101 CPU-migrations            #    0.000 M/sec
             2,274 page-faults               #    0.000 M/sec
   242,921,371,383 cycles                    #    3.697 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   216,526,115,297 instructions              #    0.89  insns per cycle
    39,079,392,369 branches                  #  594.805 M/sec
       840,462,837 branch-misses             #    2.15% of all branches

       8.296947901 seconds time elapsed


Last edited 4 years ago by tibbe

comment:11 Changed 4 years ago by tibbe

I noticed that if I change nofib/parallel/ray to do its work in a separate thread (see diff), this "last core slowdown" completely vanishes. Can anybody explain this massive variance?

It could be because main runs in a bound thread. Why that makes a huge difference I'm not sure (more context switching?), but that's a likely cause.

comment:12 Changed 4 years ago by parcs

I also think it's worth noting that with the above change to nofib/parallel/ray applied, yieldThread() can be made a no-op without changing the runtime of the program.

comment:13 Changed 4 years ago by simonmar

Interesting - I hadn't realised that the main thread being bound could affect the performance of ordinary parallel programs too.

I'm very tempted to change the main thread to be non-bound.

comment:14 in reply to:  13 ; Changed 4 years ago by parcs

Replying to simonmar:

I'm very tempted to change the main thread to be non-bound.

If so, would the earlier commits that helped mitigate this particular slowdown be undone (like the one that makes ACQUIRE_SPIN_LOCK() call yieldThread() every 1000 spins)?

Last edited 4 years ago by parcs

comment:15 in reply to:  14 Changed 4 years ago by simonmar

Replying to parcs:

If so, would the earlier commits that helped mitigate this particular slowdown be undone (like the one that makes ACQUIRE_SPIN_LOCK() call yieldThread() every 1000 spins)?

No, the problem is still very real, and the yieldThread() helps a lot. Making the main thread non-bound would just make it less likely to manifest in some cases.
