Opened 15 months ago

Last modified 15 months ago

#9321 new feature request

Support for waiting on multiple MVars

Reported by: schyler Owned by: simonmar
Priority: normal Milestone:
Component: Runtime System Version: 7.8.3
Keywords: mvar Cc: simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):


A lot of code in servers uses MVars because they seem to have more desirable scalability characteristics than STM. Unfortunately, unlike STM which is composable (i.e. readTChan chan1 <|> readTChan chan2), MVars often require extra inefficient intermediate steps to funnel many-to-one.

A common thing for people to do when they need to funnel N MVars into one is to create 1 MVar and N forks where each fork attempts to read from its associated MVar and then writes it into the one MVar where another fork is waiting to process the data.

This is such a waste; it produces more forks and another MVar where contention can occur.

In some ways it would be better if the internal representation of an MVar had a pointer to the "next MVar" so that we could use a function like eitherMVar to merge two (or more) MVars together into one which can be waited on and yield values from any of the containing MVars.

(I believe) the runtime would need to provide appropriate support in the scheduler so that the list is traversed instead of only the single MVar checked. The overhead for code which does not use this feature would probably be only 1 branch, which is acceptable.

Attachments (1)

Bench.hs (3.0 KB) - added by schyler 15 months ago.
STM benchmark mentioned in my comment

Download all attachments as: .zip

Change History (11)

comment:1 Changed 15 months ago by schyler

My understanding of how this would work is as follows.

You start with 2 MVars:

m1 <- newMVar
m2 <- newMVar

You then fuse them, as MVars would [need to be extended to] have an inbuilt linked list of StgMVars they actually represent:

mx <- fuseMVar m1 m2

Readers need to subscribe to every StgMVar in their list of MVars ( ). They should also write a pointer to the list of MVars they are waiting on to their lightweight thread blocking reason (see below).

Writers need to consider that the lightweight thread they are waking may already have been served, or even, processed data so fast they are re-subscribed into the queue in time to save their spot (unlikely, maybe this should be stopped with a gauge). Writers need to test the blocking reason for the lightweight thread (around ), and if it's blocking on something, ensure that they are in the supplied list of StgMVars. This is actually only an O(n) operation to do, so it still costs O(1) + 1 branch if the MVar hasn't been fused.

When a lightweight thread is finally served it comes out of blocking, and at that point a late writer would realise it's not blocking anymore (something like ), hence ignoring it and move along the list to the next one.

Last edited 15 months ago by schyler (previous) (diff)

comment:2 Changed 15 months ago by schyler

And finally, the motivation.

Use case: Some server which can read from an MVar containing packets from the socket, personal messages from other clients on the server (emplaced and processed in O(1) by other forks) or global server events (emplaced and processed in O(1) by other forks). Each client has only 1 state and 1 fork. Funnelling everything into the one stateful handler is really tricky to do at the moment;

Advantages of STM:

  • Many-to-one waiting without spawning new forks. Can just do i.e.
    next <- atomically $ readTVar packets <|> readTVar events <|> readTVar whispers

Disadvantages of STM:

  • Does not scale well because it's not fair
  • When contention happens, transactions can become really slow as a result

Advantage of MVar:

  • It's fair
  • It's fast

Disadvantages of MVar:

  • To wait on N channels, have to use N extra forks, 2 MVar puts and N+1 total MVar read operations

Advantages of MVar multiple waiting support:

  • To wait on N channels, have to use no extra forks, 1 MVar put and 1 MVar read operation (with N-1 waiter ignores, but these can be seen as "free").
Last edited 15 months ago by schyler (previous) (diff)

comment:3 Changed 15 months ago by carter

could the workload you want to handle work with something like ?

comment:4 Changed 15 months ago by carter

likewise, it seems like the async package provides a version of this

waitEither :: Async a -> Async b -> IO (Either a b)

among others.

do you have a specific test case where the mvar or async approaches are too slow?

comment:5 Changed 15 months ago by carter

I think simon marlow has mentioned wanting to explore providing an abstraction thats kinda in between MVar and TMVar, though I don't recall where I've seen him say that, so maybe he can chime in on that (which might be relevant to what you want)

comment:6 Changed 15 months ago by schyler

(The async package uses STM, for the benefit of others)

comment:7 Changed 15 months ago by simonpj

Some things to think about here:

  • Could we instead make the STM implementation fairer or faster, if those are the problems?
  • Before going anywhere, you need to define a complete API (e.g. is fuseMVar the only new operation?) and then give the semantics of the new operations. The original Concurrent Haskell paper, or Tackling the Awkward Squad gives the style for this semantics. That will force to the surface questions like:
    • What if you take or put to a fused MVar?
    • Can the same MVar be fused in to several different compounds?


comment:8 Changed 15 months ago by simonmar

I think it's really really hard to add support for multi-MVar operations. So I'm with Simon: we should put our efforts into improving STM instead.

On the fairness point, MVar's fairness guarantee means that MVar actually performs quite badly under contention, due to the excessive number of context switches. Have you measured your application using both MVar and STM? STM transactions with only a few TVars perform quite well in my experience, and tend to outperform MVar under heavy contention.

Nevertheless, if you want to go ahead with this, you'll need (a) a clear description of the desired semantics, and (b) a detailed implementation plan. I think you'll encounter lots of problems trying to do both of these.

Changed 15 months ago by schyler

STM benchmark mentioned in my comment

comment:9 Changed 15 months ago by schyler

I have been hacking together a benchmark for STM (attached) first, then later I will put together the equivalent thing for MVar. On my i7 haswell macbook pro I compile and run it like this:

$ ghc -O2 -threaded -fforce-recomp Bench.hs 
$ time ./Bench +RTS -N8

This emulates the exact use case I mentioned for the server above, actually. We have

  • A packets queue, bounded, written and read by only 1
  • An events chan, unbounded and written by everything (but individual forks)
  • A commands queue, bounded, written and read by 5% of the clients (but individual forks)

In terms of the configurable parameters:

  • threshold is how many events, commands OR packets clients want before they just disconnect. I needed a fair simulation of throughput
  • clients is the number of clients to simulate connected to the server
  • There are adjustable delays on the client-local chans you can adjust, packetDelay, eventDelay and commandDelay. This is just to give a kind of estimate of the actual distribution of messages in a real server (I expect i.e. there to be 50 events per 1 actual client packet per 5 client-specific commands)

Interestingly the bench terminates in 4 seconds when clients = 20. I set clients = 40 and couldn't get it to terminate for over 5 minutes (and my laptop was really hot) so I killed it.

comment:10 Changed 15 months ago by schyler

I will see if benching MVar over the same thing is any better, and then (to make it even faster and more elegant) talk about proposed API, semantics etc soon.

Last edited 15 months ago by schyler (previous) (diff)
Note: See TracTickets for help on using tickets.