Startup / Shutdown support in the RTS
All our parallel variants share one property: the runtime system starts as several coordinated instances on different machines, and these instances must also coordinate a clean shutdown.
- We added a suitable (deliberately simple) message passing layer (PVM | MPI | home-brewed, corresponding to parpvm | mpi | cp)
rts/parallel/MPSystem.h is the interface
rts/parallel/PVMComm.c|MPIComm.c|CpComm.c implement the methods (TODO: CpComm.c contains only stubs for now)
- Managing the startup of several machines (parsed by start script)
rts/parallel/ParInit.c::startupParallelSystem|synchroniseSystem, which call the suitable functions in the message passing layer
- Each machine runs its own (empty) scheduler, started by a function added to
Historic: the first version simply used workerStart from the threaded RTS way to run the empty scheduler. This released the capability once the scheduler returned, which caused an assertion failure when rts_unlock released it again in RtsMain. Now we have our own (tiny) function to start the scheduler.
- Shutdown is triggered via system message PP_FINISH from node 1 to the others.
In a first version, the empty scheduler instances blocked on a call to MP_recv in getRemoteWork. Now that there are more types of messages between the machines, we handle it inside a more complex message processing function.
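
The points above can be illustrated with a small, self-contained C sketch. It is not the RTS code. It assumes a backend-neutral interface (the hypothetical MsgLayer struct, standing in for what a header like MPSystem.h might declare), a toy in-process backend in place of PVM or MPI, and a message processing function that dispatches on the message tag. Only PP_FINISH is taken from the text; its numeric value, PP_WORK, and all function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only PP_FINISH is named in the text; its value and PP_WORK are invented. */
enum { PP_FINISH = 1, PP_WORK = 2 };

/* Hypothetical backend-neutral interface. A PVM, MPI or custom transport
   would each supply its own pair of functions. */
typedef struct {
    bool (*send)(int dest, int tag);       /* false if delivery failed */
    bool (*try_recv)(int self, int *tag);  /* non-blocking; false if inbox empty */
} MsgLayer;

/* Toy in-process backend: a small ring buffer of tags per node. */
#define N_NODES   4
#define QUEUE_CAP 8
typedef struct { int tags[QUEUE_CAP]; int head, count; } Inbox;
static Inbox inboxes[N_NODES + 1];         /* index 0 unused; nodes start at 1 */

static bool loop_send(int dest, int tag) {
    if (dest < 1 || dest > N_NODES) return false;
    Inbox *q = &inboxes[dest];
    if (q->count == QUEUE_CAP) return false;
    q->tags[(q->head + q->count) % QUEUE_CAP] = tag;
    q->count++;
    return true;
}

static bool loop_try_recv(int self, int *tag) {
    assert(tag != NULL);
    if (self < 1 || self > N_NODES) return false;
    Inbox *q = &inboxes[self];
    if (q->count == 0) return false;
    *tag = q->tags[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}

static const MsgLayer loopback = { loop_send, loop_try_recv };

/* Node 1 asks every other node to shut down.
   Returns the number of nodes that were notified. */
static int broadcast_finish(const MsgLayer *mp) {
    int notified = 0;
    for (int dest = 2; dest <= N_NODES; dest++)
        if (mp->send(dest, PP_FINISH)) notified++;
    return notified;
}

typedef struct { bool finished; int work_msgs; } NodeState;

/* Message processing for one node: drain the inbox, dispatch on the tag,
   and stop as soon as PP_FINISH is seen. A real scheduler would keep
   polling instead of returning on an empty inbox. */
static NodeState process_messages(const MsgLayer *mp, int self) {
    NodeState st = { false, 0 };
    int tag;
    while (!st.finished && mp->try_recv(self, &tag)) {
        switch (tag) {
        case PP_FINISH: st.finished = true; break;
        default:        st.work_msgs++;     break;
        }
    }
    return st;
}
```

Because the receive is non-blocking and the dispatch sits in one function, adding further message types only means adding cases to the switch. This is one plausible reason to move away from blocking on a single receive call once more message types exist.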