Version 10 (modified by nr, 11 years ago) (diff)


The Scheduler

The scheduler is the heart of the runtime: it is the single part of the system through which all entry to the Haskell world goes, and it handles requests from outside to invoke Haskell functions (foreign export).

In this part of the commentary we'll discuss the threaded version of the runtime (see Commentary/Rts/Config), that is, the version of the runtime that uses multiple OS threads, because it is by far the most complex beast.

We begin by discussing the basic abstractions used in the scheduler.

OS Threads

Source files: includes/OSThreads.h, rts/win32/OSThreads.c, rts/posix/OSThreads.c

We assume that the OS provides some kind of native threads, and for SMP parallelism we assume that the OS will schedule multiple OS threads across the available CPUs.

When running on an SMP, we use a fixed number of OS threads for running Haskell code, typically chosen to be the the same as the number of CPU cores in the machine. There may be more than this number of OS threads in the runtime, but only this number will be executing Haskell code at any one time.

The RTS provides a platform-independent abstraction layer for OS threads in rts/OSThreads.h.

Haskell threads

A Haskell thread is represented by a TSO. There are two kinds of Haskell thread:

  • A bound thread is created as the result of a call-in from outside Haskell; that is, a call to foreign export or foreign import "wrapper". A bound thread is tied to the OS thread that made the call; all further foreign calls made by this Haskell thread are made in the same OS thread. (this is part of the design of the FFI, described in the paper Extending the Haskell Foreign Function Inteface with Concurrency).
  • An unbound thread is created by Control.Concurrent.forkIO. Foreign calls made by an unbound thread are made by an arbitrary OS thread.


Source files: rts/Task.h, rts/Task.c

A Task is a further layer of abstraction over an OS thread.

One Task is created for each call-in to the runtime. When a call-in is made, a Task is allocated, a new Haskell thread is created, and the two are bound together for the duration of the call: task->tso points to the TSO, and tso->bound points to the Task. If a thread makes a call-in, followed by a call-out, and another call-in and so on, there could be a whole stack of tasks associated with a single OS thread.

Additionally, there is one Task for each worker OS thread in the system. A worker OS thread is used for executing the unbound Haskell threads. tso->bound is NULL for a worker Task.

The function

  Task *myTask (void);

returns the Task associated with the current OS thread. However, there may be multiple Tasks associated with a particular OS thread, for example if a call-in executes some code that makes a foreign call, and that foreign call makes a further call-in, and so on. If an OS thread has multiple Tasks associated with it, then myTask returns the topmost, or most recently created, one. The myTask function is implemented using OS thread-local state.

The important components of a Task are:

  • The OS thread that owns this Task
  • The Capability that this Task holds (see below)
  • The TSO that this Task is bound to, if any
  • A condition variable on which this Task can put itself to sleep
  • Some link fields for placing the Task on various queues


Source files: rts/Capability.h, rts/Capability.c

A Capability is a virtual CPU for executing Haskell code. The number of capabilities in the system is chosen by the +RTS -N option. This value should be chosen to be the same as the number of real CPU cores, so that we never try to run more Haskell threads simultaneously than we have real CPUs available.

Invariant: a task that holds a capability is not blocked in the operating system.

This makes some parts of the system simpler - for example, we can use spin locks that spin indefinitely, because we can ensure that the spin lock is only held by a currently executing CPU, and will therefore be released in a finite (and short) amount of time.

Also we can maximise the advantage of our lightweight threading by not using OS-level context switching. We still use OS-level blocking I/O, however - only the OS knows how to do that in general.

A Capability is in one of two states:

  • It is free if cap->running_task == NULL. The Capability is dormant, not currently executing any code.
  • Otherwise, it is held by a Task, and cap->running_task points to the Task that is currently holding it.

The important components of a Capability are:

  • The registers of the virtual machine, for executing Haskell code (although while actually executing, some of these registers may be held in real machine registers, they are only saved to the Capability when returning to the scheduler).
  • The Task that is currently animating this Capability.
  • A queue of runnable Haskell threads (the run queue).
  • A list of Haskell thrads currently making safe foreign calls.
  • A list of worker OS threads.
  • A list of Tasks waiting to return to Haskell from foreign calls.
  • A list of Haskell threads waiting to wake up on this Capability.

Handing over a Capability

The Task currently holding a Capability might need to relinquish it for one of the following reasons:

  • The Haskell thread at the head of the run queue is not appropriate for this Task: it is bound to another Task, or it is unbound and the current Task is bound.
  • There is a Task waiting to return to Haskell from a foreign call (these are given priority over Haskell threads in the run queue, because in general they haven't had a full time slice yet).
  • Another Task is trying to garbage collect (in the current single-threaded GC, all activity has to stop in order to GC).

The details of handing over a Capability are rather subtle, so look at the code for the definitive picture. Broadly speaking, when handing over a Capability to a Task, we make the Task aware that it should wake up and on which Capability, and we mark the Capability as free. The Task wakes up, tries to acquire ownership of the Capability. If it fails because another Task is holding the Capability (this is entirely possibly, since the Capability was marked free momentarily), then it goes back to sleep: the other Task will release and hand it over at some point.

One reason behind marking a Capability as free when it is handed over is to support fast callouts. When making a safe foreign call we have to release the Capability, and therefore hand it over to another worker thread. If the foreign call is short, we don't want to incur the cost of a context switch on returning, but since we marked the Capability as free there's a good chance the returning Task will be able to re-acquire it immediately and continue. The worker that we woke up will find that the Capability is owned, and go back to sleep again (this may incur a double context switch if there are no free CPUs on which to run the worker, however).

The Scheduler's main loop

A transcript of Simon's explanation at the board:

  for (;;) {
    yieldCapability(cap);  /* give cap to anybody wanting in from outside */
    tso = popRunQueue(cap);
    result = StgRun(tso);
    case result of
      out of heap -> re-enqueue tso; call GC;
      out of stack -> enlarge tso; re-enqueue tso;
      time expired -> put tso on end of queue; /* round robin */
      finished -> 
        if (tso is a bound thread)

Affinity and migration