Opened 2 years ago

Closed 2 years ago

#6061 closed bug (fixed)

threadDelay broken on Windows

Reported by: simonmar Owned by: pcapriotti
Priority: highest Milestone: 7.6.1
Component: Compiler Version: 7.4.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

There seems to be new breakage in threadDelay on Windows. I noticed conc070 failing in a validate:

=====> conc070(ghci) 191 of 3265 [0, 0, 0]

cd ./concurrent/should_run && 'c:/simonmar/ghc-validate/bindisttest/install   dir/bin/ghc.exe' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts  -fno-ghci-history conc070.hs --interactive -v0 -ignore-dot-ghci +RTS -I0.1 -RTS   <conc070.genscript 1>conc070.interp.stdout 2>conc070.interp.stderr

Actual stdout output differs from expected:

--- ./concurrent/should_run/conc070.stdout	2011-08-02 14:43:53 +0100
+++ ./concurrent/should_run/conc070.run.stdout	2012-04-27 10:54:04 +0100
@@ -1 +1 @@
-[ThreadBlocked BlockedOnMVar,ThreadBlocked BlockedOnMVar,ThreadRunning,ThreadFinished]
+[ThreadFinished,ThreadBlocked BlockedOnMVar,ThreadRunning,ThreadFinished]

*** unexpected failure for conc070(ghci)

and strangely if you try threadDelay 1000000 in GHCi it returns immediately.

This test is also failing:

=====> ThreadDelay001(threaded1) 7 of 7 [0, 0, 0]
cd . && 'c:/simonmar/ghc-validate/inplace/bin/ghc-stage2.exe' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts  -fno-ghci-history -o ThreadDelay001 ThreadDelay001.hs -threaded -debug   >ThreadDelay001.comp.stderr 2>&1
cd . && ./ThreadDelay001    </dev/null >ThreadDelay001.run.stdout 2>ThreadDelay001.run.stderr
Actual stdout output differs from expected:
--- /dev/null   2012-04-30 11:17:41 +0100
+++ ./ThreadDelay001.run.stdout 2012-04-30 11:17:41 +0100
@@ -0,0 +1,2 @@
+(Mon Apr 30 11:17:29 GMT Daylight Time 2012,Mon Apr 30 11:17:29 GMT Daylight Time 2012,1000000,0,-1000000,-1.0e-6)
+(Mon Apr 30 11:17:29 GMT Daylight Time 2012,Mon Apr 30 11:17:29 GMT Daylight Time 2012,5000000,0,-5000000,-5.0e-6)
*** unexpected failure for ThreadDelay001(threaded1)

Possibly related to the fix for #5865

Attachments (2)

6061.patch (6.7 KB) - added by pcapriotti 2 years ago.
6061-base.patch (3.6 KB) - added by pcapriotti 2 years ago.

Download all attachments as: .zip

Change History (9)

comment:1 Changed 2 years ago by pcapriotti

  • Owner set to pcapriotti

I can't reproduce the first two failures.

The ThreadDelay001 test is failing because threadDelay sometimes sleeps for slightly less time than requested. I'm looking into that.

comment:2 Changed 2 years ago by pcapriotti

I think this is due to clock skew among different CPUs when using QPC. The skew itself shouldn't be very large (up to 5 μs on my system), so it's not a concern for the purpose of threadDelay, but it does break the invariant that the actual delay time is greater or equal to the requested time. Maybe we could just change the test and add a few μs tolerance.

I'm not sure why you saw threadDelay 1000000 return immediately. I haven't been able to reproduce that here. Maybe the clock skew can got over 1 second somehow? That would mean we can't really rely on QueryPerformanceCounter.

comment:3 Changed 2 years ago by pcapriotti

  • Status changed from new to patch

I realized that we can actually work around the clock skew issue by calculating the delay target directly in the IO manager thread, assuming it always wakes up promptly enough to make the difference in timing negligible.

The attached patch fixes the ThreadDelay001 test for me.

Changed 2 years ago by pcapriotti

Changed 2 years ago by pcapriotti

comment:4 Changed 2 years ago by pcapriotti

As discussed with Simon, I moved getMonotonicUSec to the RTS (actually, I made it into getMonotonicNSec to avoid an intermediate conversion).

For the moment, it's used only on Windows, but it could possibly be used by the other platforms as well, to avoid duplication. That can be done separately, in any case.

comment:5 Changed 2 years ago by simonmar

Ok, looks good.

comment:6 Changed 2 years ago by simonmar

BTW - I suggest not calculating the delay target on the IO manager thread, because that could introduce an arbitrary extra delay if the IO manager is not scheduled quickly. Instead we should allow some tolerance (maybe only on Windows) for ThreadDelay001.

comment:7 Changed 2 years ago by pcapriotti

  • Resolution set to fixed
  • Status changed from patch to closed

Pushed to GHC:

commit c04619769d5a09325d9e7f28b1382f52df6051b4
Author: Paolo Capriotti <p.capriotti@gmail.com>
Date:   Tue May 8 13:05:14 2012 +0100

    Move getMonotonicUSec from base to the RTS.

and base:

commit 29ef12e8b6b4fa38d1c70588e902e205b82e9d52
Author: Paolo Capriotti <p.capriotti@gmail.com>
Date:   Tue May 8 13:06:12 2012 +0100

    Use RTS version of getMonotonicNSec on Windows (#6061)
Note: See TracTickets for help on using tickets.