Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#4322 closed bug (invalid)

High CPU usage during idle time due to GC

Reported by: hesselink
Owned by:
Priority: normal
Milestone:
Component: Runtime System
Version: 6.12.3
Keywords: idle GC
Cc:
Operating System: MacOS X
Architecture: x86
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

We have a web application using happstack. It always used ~1% CPU, which was fine. Recently, we started caching more things in memory, and CPU usage during idle time rose to 15-30%. We eventually determined this was due to the idle time GC (we compile with -threaded). Using +RTS -I2 removed the problem.

This is similar to #3408, except that that ticket says the problem was fixed in 6.12 and that the interval was set to 5s. The docs [1] still mention 0.3s, though, and since changing the interval to 2s fixes our problem, the default does not appear to be 5s.

It is difficult to attach a simple test case, but I can test things if needed.

[1] http://www.haskell.org/ghc/docs/6.12.2/html/users_guide/runtime-control.html#rts-options-gc

Attachments (2)

server-debug.log.gz (16.9 KB) - added by hesselink 8 years ago.
Log of our server running with -Ds
server-debug.log.2.gz (129.6 KB) - added by hesselink 8 years ago.
Log of our server running with -Ds -vt


Change History (11)

comment:1 Changed 8 years ago by simonmar

The idle GC is supposed to run just once, and then the application should go idle. However, there are some things that can cause it to run repeatedly:

  • the idle GC invokes a finalizer; the finalizer creates a new finalizable object, which then gets finalized by the next idle GC, and so on
  • the idle GC detects a deadlocked thread and sends it an exception. The deadlocked thread catches the exception and proceeds to deadlock again.
  • if your application has a timer signal that executes some Haskell code every 1s, say, then you'll get an idle GC after each one. Setting the idle GC to a longer interval than your timer signal (e.g. 2s) will prevent this from happening.
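A minimal sketch of the third scenario (names are illustrative, not from any real application): a thread that wakes up once a second to do a little Haskell work. Each wakeup counts as activity to the RTS, so the idle-GC timer (0.3s by default) restarts and fires again after every tick.

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef

-- One "tick" of a hypothetical timer thread: do some trivial Haskell
-- work, then sleep for 1s -- longer than the 0.3s default idle-GC
-- interval, so an idle GC fires between every pair of ticks.
tick :: IORef Int -> IO ()
tick ref = do
  modifyIORef' ref (+1)   -- trivial work done on each wakeup
  threadDelay 1000000     -- sleep 1s, then the RTS sees activity again

main :: IO ()
main = do
  ref <- newIORef 0
  tick ref                -- one tick shown; a real timer would loop forever
  readIORef ref >>= print
```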

In the case of #3408, the last of these scenarios was happening: the Windows console was sending events every 1s, which caused some code to execute in GHCi.

So my question is, do any of these scenarios apply to you? You can compile the program with -debug and watch the events with +RTS -Ds to see what is happening during the idle time. On a Unix system, strace may help too, or Process Monitor on Windows.

comment:2 Changed 8 years ago by hesselink

I compiled using -debug, and ran using +RTS -N -Ds. Here is the output I get:

    b0103000: cap 0: starting GC
    all threads:
    threads on capability 0:
    threads on capability 1:
    	thread   32 @ 0x8c2d000 is not blocked (TSO_DIRTY)
    other threads:
    	thread   29 @ 0x8c6246c is blocked on an MVar @ 0x7c9c744
    	thread    2 @ 0x8c2b000 is blocked on an external call
    	thread    3 @ 0x8da3000 is blocked on an MVar @ 0x7ca5840
    	thread   10 @ 0x7cb0af0 is blocked on an MVar @ 0x8c2f368
    	thread   11 @ 0x8c1a250 is blocked on an STM operation
    	thread   13 @ 0x8c63000 is blocked on an STM operation
    b0185000: cap 1: starting GC
    a039a720: woken up on capability 1
    a039a720: capability 1 is owned by another task
    b0185000: cap 1: finished GC
    b0185000: cap 1: running thread 32 (ThreadRunGHC)
    b0185000: cap 1: thread 32 stopped (yielding)
    b0185000: cap 1: thread 32 appended to run queue
    b0185000: cap 1: running thread 32 (ThreadRunGHC)
    b0185000: cap 1: thread 32 stopped (heap overflow)
    b0185000: cap 1: requesting parallel GC
    b0185000: ready_to_gc, grabbing GC threads
    b0103000: cap 0: finished GC

This repeats continuously. Sometimes, after ready_to_gc, there are 10-20 lines saying b0185000: passing capability 0 to worker 0xb0103000. I've attached a full log to this ticket which shows a slightly different run. There is a lot of noise during startup. The server is running at line 25194. At 26162, I do a simple request; this triggers the CPU usage. Before that, CPU usage is 0%, after that, 9%. At 28362, I shut down the server again.

As for the three points: we have finalizers that free/destroy C/C++ structures, nothing else. We don't catch deadlock exceptions anywhere. I'm not sure about the timers; I don't see anything in the -Ds logs, but I don't really know how to read them.

One thing I forgot to mention earlier: the cache we use is some Data.Map's and Data.Set's in TVar's. When we 'deepseq' these before storing them, the CPU usage drops to 2-10%, which is still high, but lower than before. I'm guessing this is just the GC having less indirections to traverse, but perhaps it is useful info.
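A hedged sketch of the caching pattern described above (all names are illustrative): forcing the Map to normal form with deepseq before writing it into the TVar, so later GCs traverse fully evaluated structure instead of chains of thunks.

```haskell
import Control.Concurrent.STM (TVar, atomically, newTVarIO, readTVarIO, writeTVar)
import Control.DeepSeq (deepseq)
import qualified Data.Map as Map

-- Force the cache value completely before storing it, so the GC does
-- not have to chase unevaluated thunks on every traversal.
storeCache :: TVar (Map.Map String Int) -> Map.Map String Int -> IO ()
storeCache cache m = m `deepseq` atomically (writeTVar cache m)

main :: IO ()
main = do
  cache <- newTVarIO Map.empty
  storeCache cache (Map.fromList [("a", 1), ("b", 2)])
  readTVarIO cache >>= print . Map.size
```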

Does this shed any light on anything?

Changed 8 years ago by hesselink

Attachment: server-debug.log.gz added

Log of our server running with -Ds

comment:3 Changed 8 years ago by simonmar

It might help to add the timestamps too (+RTS -vt). Anyway, it looks like between every two idle GCs the IO manager thread (thread 2) wakes up one of the program threads (thread 29). I imagine you have a thread in your program that is sitting in a loop calling threadDelay, with maybe a 1 second timeout?

I don't understand why this is happening:

    a039a720: woken up on capability 1
    a039a720: resuming capability 1
    a039a720: giving up capability 1
    a039a720: freeing capability 1
    a039a720: woken up on capability 1
    a039a720: resuming capability 1
    a039a720: giving up capability 1
    a039a720: freeing capability 1
    a039a720: woken up on capability 1
    a039a720: resuming capability 1
    a039a720: giving up capability 1
    a039a720: freeing capability 1
    a039a720: woken up on capability 1
    a039a720: resuming capability 1
    a039a720: giving up capability 1
    a039a720: freeing capability 1

that's the main thread waking up several times; finding nothing to do, it goes back to sleep again.

comment:4 Changed 8 years ago by hesselink

Yes, that led me to it! In happstack there is Happstack.Server.HTTP.Clock, which starts a thread that fills an IORef with the current time every second, using threadDelay. It is used for the request headers. I don't know why it is there; perhaps it is a caching optimization to avoid querying the system time on every request.

I imagine removing this would fix the problem, but haven't tried yet.
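A rough sketch of the clock pattern just described (this is not Happstack's actual code; names are illustrative): a background thread refreshes a cached timestamp once per second, so request handlers read an IORef instead of calling getCurrentTime on every request. The 1s threadDelay is exactly the wakeup that keeps retriggering the idle GC.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef
import Data.Time.Clock (UTCTime, getCurrentTime)

-- Start a thread that overwrites the cached time once per second.
-- Every wakeup counts as activity to the RTS, so a 0.3s idle GC fires
-- after each tick even when no requests are being served.
startClock :: IO (IORef UTCTime)
startClock = do
  ref <- newIORef =<< getCurrentTime
  _ <- forkIO . forever $ do
    threadDelay 1000000              -- wake up once per second
    getCurrentTime >>= writeIORef ref
  return ref

main :: IO ()
main = do
  ref <- startClock
  t <- readIORef ref                 -- handlers would read the cached time like this
  t `seq` putStrLn "clock running"
```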

comment:5 Changed 8 years ago by simonmar

Resolution: invalid
Status: new → closed

Thanks; I'm closing the bug (don't be put off by the "invalid", it's just to indicate that the ticket didn't turn out to be a bug in GHC).

comment:6 Changed 8 years ago by hesselink

Would you say that this is a bug in happstack then, or just something users should be aware of? This is a hard-to-debug problem (profiling, for example, doesn't point you to the right place), and it would be nice if we could change something to prevent others from running into it in the future.

comment:7 Changed 8 years ago by simonmar

I have an idea. Suppose we had an operation:

   idleThread :: Bool -> IO ()

which marks a thread so that its wakeups are not counted as activity by the runtime, and therefore do not trigger another idle GC. We would mark the IO manager thread the same way, since all these wakeups involve it too.
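To make the proposal concrete, here is a purely hypothetical usage sketch. idleThread does not exist in GHC's libraries; it is stubbed as a no-op below solely to show how a timer thread might opt out of counting as activity.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Data.IORef

-- Stub for the proposed operation. A real version would be an RTS
-- primitive; this no-op exists only so the sketch compiles.
idleThread :: Bool -> IO ()
idleThread _ = return ()

-- A periodic thread that marks itself "idle" first, so that (under the
-- proposal) its 1s wakeups would no longer retrigger the idle GC.
timerThread :: IORef Int -> IO ()
timerThread ref = do
  idleThread True          -- this thread's wakeups are not "activity"
  forever $ do
    threadDelay 1000000
    modifyIORef' ref (+1)

main :: IO ()
main = idleThread True >> putStrLn "thread marked (stub)"
```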

Even with this, I suggest that for your application you probably want to set the idle GC to something large like 30s.

comment:8 Changed 8 years ago by hesselink

That sounds like it could be a solution.

Note that we don't see any difference between an idle GC time of 2s and 30s. Once we get above the 1s of the threadDelay, the CPU usage drops sharply from ~10% to ~0.25% and stays stable.

It seems I still don't understand something. If the threadDelay causes an idle GC, I should see a GC run every 1s, right? (the threadDelay is 1e6 microseconds.) But I just created a log with -vt, and, assuming the timestamps are in microseconds as well (which they seem to be), I see a GC run every 1000-2000 microseconds, which is 1000x more often than I'd expect.

I've attached the log again.

Changed 8 years ago by hesselink

Attachment: server-debug.log.2.gz added

Log of our server running with -Ds -vt

comment:9 Changed 8 years ago by simonmar

If you search for "deadlocked" in the log, you can see where the idle GCs happen.

  • line 26335, we've just finished a GC (16.0s).
  • line 26338, the IO manager thread wakes up thread 30 (16.27s).
  • line 26417, we do an idle GC (16.59s).

So the idle GC happens roughly 0.3s after the last activity, which is correct.

From around this point I see the idle GCs happening once per second, as we expect.
