Concurrent modifications of package.cache are not safe
There are a couple of different issues here.
- On Linux, issuing `ghc-pkg register` for multiple packages in parallel can result in lost updates to the package database because of how the `registerPackage` function works: it reads the existing package databases, picks the one to modify, checks that the package info for the package being registered is fine, and then replaces the package database with what was read at the beginning plus the new package info.
  Therefore, if updates interleave, it can happen that process 1 reads the database, process 2 then updates it while process 1 still holds the old version and uses it for its own update later, so the update made by process 2 is lost (see the sketch below).
- On Windows, updating the package database may simply fail: GHC attempts to update it using the rename trick, which fails whenever any other process has the file to be replaced open for reading. Combine that with the fact that GHC reads the package database when compiling packages and you get problems in both Stack (https://github.com/commercialhaskell/stack/issues/2617) and Cabal (https://github.com/haskell/cabal/issues/4005).
BTW, the rename trick (used for atomic database updates) not only doesn't work on Windows, it's also not atomic on e.g. NFS (https://stackoverflow.com/questions/41362016/rename-atomicity-and-nfs).
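To make both failure modes concrete, here is a heavily simplified sketch of the read-modify-write flow described above. It is not GHC's actual code; `readDb`, `applyRegistration`, `writeDbAtomic` and the temp-file naming are made up for illustration, but the shape of the problem is the same: nothing serialises the read and the write, and the final rename is what breaks on Windows.

```haskell
import System.Directory (renameFile)

-- Stand-in for the real package database structure.
type PackageDb = [String]

-- 1. Read the current database.
readDb :: FilePath -> IO PackageDb
readDb path = lines <$> readFile path

-- 2. Add the new package info to what was read.
applyRegistration :: String -> PackageDb -> PackageDb
applyRegistration pkg db = db ++ [pkg]

-- 3. "Atomic" write via the rename trick: write a temporary file and
--    rename it over the original.  On Windows the rename fails if any
--    other process has `path` open for reading.
writeDbAtomic :: FilePath -> PackageDb -> IO ()
writeDbAtomic path db = do
  let tmp = path ++ ".tmp"
  writeFile tmp (unlines db)
  renameFile tmp path

register :: FilePath -> String -> IO ()
register dbPath pkg = do
  db <- readDb dbPath
  -- Another ghc-pkg process can rewrite dbPath between the read above and
  -- the write below; whatever it registered is then silently lost.
  writeDbAtomic dbPath (applyRegistration pkg db)
```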
The solution to both problems is to use OS-specific features to lock the database file (in shared mode when reading and in exclusive mode when writing). On Windows this can be done with LockFileEx. Unfortunately, for POSIX things are a bit more complicated.
There are two ways to lock a file on Linux:
- Using `fcntl(F_SETLK)` (the POSIX API)
- Using `flock` (the BSD API)
However, `fcntl` locks have a serious limitation (quoting the fcntl(2) man page):
> The record locks described above are associated with the process (unlike the open file description locks described below). This has some unfortunate consequences:
>
> - If a process closes any file descriptor referring to a file, then all of the process's locks on that file are released, regardless of the file descriptor(s) on which the locks were obtained. This is bad: it means that a process can lose its locks on a file such as /etc/passwd or /etc/mtab when for some reason a library function decides to open, read, and close the same file.
>
> - The threads in a process share locks. In other words, a multithreaded program can't use record locking to ensure that threads don't simultaneously access the same region of a file.
`flock`, on the other hand, is not guaranteed to work with NFS; according to https://en.wikipedia.org/wiki/File_locking#Problems:

> Whether and how flock locks work on network filesystems, such as NFS, is implementation dependent. On BSD systems, flock calls on a file descriptor open to a file on an NFS-mounted partition are successful no-ops. On Linux prior to 2.6.12, flock calls on NFS files would act only locally. Kernel 2.6.12 and above implement flock calls on NFS files using POSIX byte-range locks. These locks will be visible to other NFS clients that implement fcntl-style POSIX locks, but invisible to those that do not.
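For reference, here is roughly what taking a whole-file `flock` lock looks like from Haskell. The unix package only wraps the `fcntl`-style locks (`System.Posix.IO.setLock` / `waitToSetLock`), so `flock` needs either a third-party package or a small FFI binding. This is only a sketch: the `LOCK_*` values are the ones from Linux/BSD `<sys/file.h>` (a real implementation would obtain them via hsc2hs or the C API rather than hard-coding them), `openFd` is used with its unix-2.7-era signature, the path is a placeholder, and the NFS caveats quoted above still apply.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Error (throwErrnoIfMinus1_)
import Foreign.C.Types (CInt)
import System.Posix.IO (OpenMode (ReadWrite), defaultFileFlags, openFd)
import System.Posix.Types (Fd)

-- Direct binding to flock(2); a "safe" call so a blocking lock does not
-- stall the other Haskell threads.
foreign import ccall safe "flock"
  c_flock :: CInt -> CInt -> IO CInt

-- Values from <sys/file.h> on Linux/BSD; hard-coded here for brevity only.
lockShared, lockExclusive, lockNonBlocking, lockUnlock :: CInt
lockShared      = 1  -- LOCK_SH
lockExclusive   = 2  -- LOCK_EX
lockNonBlocking = 4  -- LOCK_NB (combine with SH/EX via Data.Bits.(.|.) for a non-blocking attempt)
lockUnlock      = 8  -- LOCK_UN

-- Apply a flock operation to an already open descriptor.
flockFd :: Fd -> CInt -> IO ()
flockFd fd op = throwErrnoIfMinus1_ "flock" (c_flock (fromIntegral fd) op)

main :: IO ()
main = do
  -- Placeholder path; the real target would be the package.cache being rewritten.
  fd <- openFd "package.cache" ReadWrite Nothing defaultFileFlags
  flockFd fd lockExclusive   -- blocks until the exclusive lock is granted
  -- ... rewrite the database through this descriptor ...
  flockFd fd lockUnlock
```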
Assuming the solution is to go with locking the database, we would need to:

- In `registerPackage`, lock all databases that are read in shared mode, except for the database that will later be modified, which has to be locked in exclusive mode. The handle would also need to be kept open, passed to `changeDB` later, and used for rewriting the database with the updated version in `GHC.PackageDb.writePackageDb`, instead of using `writeFileAtomic` (which is not actually unconditionally atomic, as demonstrated above).
- `GHC.PackageDb.decodeFromFile` would lock the file in the appropriate mode and, where appropriate, return the handle to the open file.
- Add support for locking a file. This should be fairly easy to do in `GHC.IO.Handle.FD` by extending the `openFile'` function with appropriate parameters and then adding a wrapper function such as `openLockedFile`. We could add both blocking and non-blocking locking so that `ghc-pkg` can report when it is waiting for a locked package database. Alternatively, we could add a function similar to `hLock :: Handle -> LockMode -> Bool {- block -} -> IO Bool`, but that requires extracting the file descriptor from the Handle, which as far as I can see is problematic. (A sketch of such a wrapper is given below.)
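To make the last item more concrete, here is a sketch of what an `openLockedFile`-style wrapper could look like if built outside GHC on top of the unix package's `fcntl` wrappers (again using the unix-2.7-era `openFd` signature). `LockMode` and `openLockedFile` are the names proposed above, not an existing API; opening the descriptor first and only then wrapping it in a Handle sidesteps the Handle-to-file-descriptor problem, and the non-blocking case throws rather than returning `False` as the proposed `hLock` would.

```haskell
import System.IO (Handle, SeekMode (AbsoluteSeek))
import System.Posix.IO
  ( LockRequest (ReadLock, WriteLock), OpenMode (ReadOnly, ReadWrite)
  , defaultFileFlags, fdToHandle, openFd, setLock, waitToSetLock )

-- Name taken from the proposal above; not an existing GHC API.
data LockMode = SharedLock | ExclusiveLock

openLockedFile :: FilePath -> LockMode -> Bool {- block? -} -> IO Handle
openLockedFile path lockMode block = do
  fd <- openFd path openMode Nothing defaultFileFlags
  let wholeFile = (request, AbsoluteSeek, 0, 0)  -- lock the whole file
  if block
    then waitToSetLock fd wholeFile  -- F_SETLKW: wait for the lock
    else setLock fd wholeFile        -- F_SETLK: throws if already locked
  fdToHandle fd                      -- hand the locked descriptor back as a Handle
  where
    (openMode, request) = case lockMode of
      SharedLock    -> (ReadOnly,  ReadLock)
      ExclusiveLock -> (ReadWrite, WriteLock)
```

Note that, because these are `fcntl` locks, closing the returned Handle (or any other descriptor the process has open for the same file) releases the lock, per the man page excerpt above.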
Is going with locking an acceptable solution here?
Trac metadata
Trac field | Value |
---|---|
Version | 8.0.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | ghc-pkg |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |