System.IO.openTempFile does not scale
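While hunting a bug in darcs (http://bugs.darcs.net/issue2364) I noticed two very bad properties of openTempFile: the names it generates are very predictable, and creating n temp files from the same template costs O(n^2) open() attempts, because each call has to step over the temp files that already exist.
Predictability lets some very entertaining bugs survive in buggy programs, for example: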
thread1:
(fn, fh) <- openTempFile "." "hello"
renameFile fn "something"
-- some time after
when (some_rare_buggy_condition) $
-- oops, reused temp name, but too late, other thread killed it
writeFile fn contents
thread2:
(fn, fh) <- openTempFile "." "hello"
workWithFn fn -- nobody should touch it, right?
It's very hard to debug data corruption when all temp files are named "foo{pid}" and sometimes "foo{pid+1}".
A more serious bug: the more threads try to create temp files from the same template, the worse performance gets.
The attached program shows the following numbers:
$ time ./bench-temps same 2000
real 0m2.795s
user 0m1.516s
sys 0m1.190s
$ time ./bench-temps diff 2000
real 0m0.161s
user 0m0.043s
sys 0m0.115s
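This is an open() storm that grows as O(N^2): every call starts probing from the same name, so the k-th temp file needs about k open() attempts before it finds a free one.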
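The growth can be checked with a small pure model. This is only a sketch: it assumes every call starts probing at the same number (the pid, judging by the "foo{pid}" names above) and moves on by one on each collision; `createOne` and `totalProbes` are made-up names, and Data.IntSet comes from the containers package shipped with GHC.

```haskell
module Main where

import qualified Data.IntSet as IntSet
import Data.List (foldl')

-- | Model of creating one temp file: probe start, start + 1, ... until a
-- number is free. Returns the updated set of taken numbers and the number
-- of probes (open() attempts) that were needed.
createOne :: Int -> IntSet.IntSet -> (IntSet.IntSet, Int)
createOne start taken = go start 1
  where
    go x probes
      | x `IntSet.member` taken = go (x + 1) (probes + 1)
      | otherwise               = (IntSet.insert x taken, probes)

-- | Total probes when n files are created, all starting from the same number.
totalProbes :: Int -> Int -> Int
totalProbes start n = snd (foldl' step (IntSet.empty, 0) [1 .. n])
  where
    step (taken, acc) _ =
      let (taken', p) = createOne start taken
      in (taken', acc + p)

main :: IO ()
main = mapM_ report [10, 100, 2000]
  where
    -- 4242 stands in for the pid; any starting number gives the same totals.
    report n = putStrLn (show n ++ " files: " ++ show (totalProbes 4242 n) ++ " probes")
```

In this model n files cost n(n+1)/2 probes, so the 2000 files of the "same" run above correspond to 2001000 open() attempts, against 2000 when no names collide.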
https://github.com/ghc/ghc/blob/master/libraries/base/System/IO.hs#L465
FileExists -> findTempName (x + 1)
This is the source of the problem. I'd suggest always using a random name instead. For portability reasons I suggest mixing in at least an insecure random value from the C library's rand().
That way we will usually succeed in opening the temp file on the first attempt.
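Until something like that is in base, a caller can get a similar effect by putting a random component into the template. The sketch below is a caller-side workaround, not the proposed fix to openTempFile itself; `randomTemplate` and `openRandomTempFile` are hypothetical helpers, and rand() is used unseeded and is not guaranteed to be thread-safe, so treat it as insecure.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CInt (..))
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (Handle, hClose, openTempFile)

-- Insecure random number from the C library, as suggested above.
foreign import ccall unsafe "stdlib.h rand" c_rand :: IO CInt

-- | Hypothetical helper: build a template that carries a random component.
randomTemplate :: String -> CInt -> String
randomTemplate prefix r = prefix ++ show r ++ "-"

-- | Hypothetical helper: openTempFile still creates the file exclusively,
-- but two callers rarely share a template, so the first attempt usually wins.
openRandomTempFile :: FilePath -> String -> IO (FilePath, Handle)
openRandomTempFile dir prefix = do
  r <- c_rand
  openTempFile dir (randomTemplate prefix r)

main :: IO ()
main = do
  dir <- getTemporaryDirectory
  (fn1, h1) <- openRandomTempFile dir "hello"
  (fn2, h2) <- openRandomTempFile dir "hello"
  putStrLn fn1
  putStrLn fn2
  -- Clean up the two files created for the demonstration.
  mapM_ hClose [h1, h2]
  mapM_ removeFile [fn1, fn2]
```

This also makes the names much harder to guess by accident, which helps with the stale-name bug shown at the top.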
Trac metadata
Trac field | Value |
---|---|
Version | 7.8.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |