Opened 5 months ago

Closed 6 weeks ago

#13945 closed bug (fixed)

'ghc-pkg update' fails due to bad file descriptor error

Reported by: mpickering
Owned by: bgamari
Priority: high
Milestone: 8.2.2
Component: Compiler
Version: 8.2.1-rc3
Keywords:
Cc: arybczak, goldfire
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D3897
Wiki Page:

Description (last modified by bgamari)

In the final part of installation, when packages are registered, the following command fails:

"inplace/bin/ghc-cabal" register libraries/ghc-prim dist-install "/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708/bin/ghc" "/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708/bin/ghc-pkg" "/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708" '' '/home/pgrads/mp16005/linux/installed-ghc' '/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708' '/home/pgrads/mp16005/linux/installed-ghc/share/doc/ghc-8.2.0.20170708/html/libraries' NO

with the following output:

Registering library for ghc-prim-0.5.0.0..
ghc-cabal:
'/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708/bin/ghc-pkg'
exited with an error:
ghc-pkg: Couldn't open database
/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708/package.conf.d
for modification: {handle:
/home/pgrads/mp16005/linux/installed-ghc/lib/ghc-8.2.0.20170708/package.conf.d/package.cache.lock}:
hLock: invalid argument (Bad file descriptor)

It might be something to do with the permissions I have on my machine but I used to be able to install ghc if I gave it a suitable prefix other than /usr/local.

A workaround is to make sure that HAVE_FLOCK is not defined. I did this by commenting out the three relevant lines in libraries/base/configure.ac.

geekosaur suggests on IRC that the problem might be the following:

What might be happening is that hLock uses fcntl locking, ghc-pkg opens a database read-only if it can't open it read-write, and it then tries to acquire a write lock (which will fail with EBADF if the file descriptor is only open for reading). There may also be an SELinux context prohibiting your process from opening the database for writing.
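The EBADF part of this hypothesis can be checked in isolation. The following is a minimal sketch, not code from GHC: `try_write_lock` is a hypothetical helper name, and it assumes a local POSIX filesystem. It requests an fcntl write lock through a descriptor opened with the given flags and reports the resulting errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of GHC): try to take a POSIX write lock
 * on `path` through a descriptor opened with `open_flags`.
 * Returns 0 on success, otherwise the errno reported by fcntl(). */
int try_write_lock(const char *path, int open_flags) {
  int fd = open(path, open_flags);
  assert(fd >= 0);        /* the sketch expects the file to exist */

  struct flock fl = {0};
  fl.l_type = F_WRLCK;    /* exclusive (write) lock */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;           /* 0 means "through end of file" */

  int err = 0;
  if (fcntl(fd, F_SETLK, &fl) == -1)
    err = errno;          /* EBADF if fd was not opened for writing */

  close(fd);              /* closing also drops any lock we acquired */
  return err;
}
```

Calling `try_write_lock(path, O_RDONLY)` on an existing file should report EBADF, while `O_RDWR` should succeed if the file is writable and not locked by another process.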

Attachments (1)

test.c (925 bytes) - added by bgamari 4 months ago.
An end-to-end testcase


Change History (28)

comment:1 Changed 5 months ago by mpickering

Cc: goldfire added
Description: modified
Summary: "make install fails when registering packages" → "'ghc-pkg update' fails due to bad file descriptor error"

I updated the ticket because I think this will be a more general problem when using 8.2.1 on any system where these restrictions exist. I can't properly diagnose the problem, but geekosaur offered a speculative explanation that seems plausible.

This also seems to be the same issue that Richard was having back in March: https://mail.haskell.org/pipermail/ghc-devs/2017-March/013915.html

comment:2 Changed 5 months ago by mpickering

Priority: normal → high
Version: 8.0.1 → 8.2.1-rc3

I confirmed this happens with the ghc-8.2.1-rc3 bindist as well.

comment:3 Changed 5 months ago by mpickering

With a NO_FLOCK build, I have to compile packages with cabal using -j1 -v1; otherwise very strange errors occur.

Last edited 5 months ago by mpickering

comment:4 Changed 5 months ago by bgamari

Operating System: Unknown/Multiple → Linux

comment:5 Changed 5 months ago by bgamari

Description: modified

comment:6 Changed 5 months ago by bgamari

mpickering is using CentOS (which indeed uses SELinux, IIRC). goldfire, are you as well?

comment:7 Changed 5 months ago by j.waldmann

I too am seeing this error when running "make install" after building rc3 from source, configured with a non-standard prefix, on Debian, where the target directory is on an NFS-mounted volume.

comment:8 Changed 5 months ago by bgamari

j.waldmann, is the NFS lock daemon running?

comment:9 Changed 4 months ago by bgamari

Milestone: 8.2.2

comment:10 Changed 4 months ago by bgamari

I am able to reproduce this locally. Here is a small reproducer,

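#include <errno.h>
#include <fcntl.h>      /* open(), O_RDONLY */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>   /* flock() */

int main () {
  /* Open the existing lockfile read-only. */
  int fd = open("libraries/bootstrapping.conf/package.cache.lock", O_RDONLY);
  int res;

  /* Note: errno is printed unconditionally after each call, so a stale
     value from an earlier failure can reappear on later lines. */
  struct stat st;
  res = fstat(fd, &st);
  printf("stat: %s\n", strerror(errno));

  /* Request an exclusive lock on the read-only descriptor. */
  res = flock(fd, LOCK_EX);
  printf("flock: %s\n", strerror(errno));

  res = flock(fd, LOCK_UN);
  printf("funlock: %s\n", strerror(errno));

  close(fd);
  return 0;
}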

When run in an NFS-mounted GHC tree with an existing lockfile this will fail with,

$ gcc test.c && ./a.out 
stat: Success
flock: Bad file descriptor
funlock: Bad file descriptor

Strangely if one changes O_RDONLY to O_RDWR the failure becomes,

$ gcc test.c && ./a.out 
stat: Success
flock: No locks available
funlock: No locks available

So I think this may be in part due to the read-only nature of the fd, but there may be more at play.

Update: Indeed there was more at play: the server wasn't running statd. With statd running the program works as expected.

Last edited 3 months ago by bgamari

Changed 4 months ago by bgamari

Attachment: test.c added

An end-to-end testcase

comment:11 Changed 4 months ago by bgamari

Here is a standalone test. I have confirmed that this runs on my local filesystem, yet not on my NFS mount (where both the server and client are running statd, rpcbind, and portmapper).

comment:12 Changed 4 months ago by bgamari

Ahh, the issue appears to be that NFS is more strict about the privileges necessary to take an exclusive (LOCK_EX) flock. Namely, you need write access. If you only have read access to a file then you can only take a LOCK_SH lock.
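To make the rule above concrete, here is a small illustrative sketch. It is not what ghc-pkg does, and `lock_mode_for_fd` is a hypothetical name. It picks the strongest flock() mode that a strict filesystem such as the NFS mount described here would be expected to accept, based on how the descriptor was opened.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: return LOCK_EX if `fd` was opened for writing,
 * otherwise LOCK_SH, which is all a read-only descriptor is expected
 * to be granted on a filesystem that enforces the rule above. */
int lock_mode_for_fd(int fd) {
  int flags = fcntl(fd, F_GETFL);   /* query the file status flags */
  assert(flags != -1);              /* fd must be a valid descriptor */
  return ((flags & O_ACCMODE) == O_RDONLY) ? LOCK_SH : LOCK_EX;
}
```

A caller would then write `flock(fd, lock_mode_for_fd(fd))`. A shared lock is no substitute when the database is about to be modified, so this only illustrates the constraint; it is not a fix.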

comment:13 Changed 3 months ago by bgamari

Differential Rev(s): Phab:D3897
Status: new → patch

So this is problematic: lockPackageDb first tries to open (and then exclusively lock) the lockfile as read-only to account for the possibility that we are opening a global package database for which we only have read access. However, NFS does not allow this as mentioned above.

I believe Phab:D3897 is one possible fix.

comment:14 Changed 3 months ago by Ben Gamari <ben@…>

In f86de44/ghc:

ghc-pkg: Try opening lockfiles in read-write mode first

As pointed out in #13945, some filesystems only allow exclusive
locks if the fd being locked was opened for write access. This causes
ghc-pkg to fail as it first attempts to open and exclusively lock its
lockfile in read-only mode to accommodate package databases for which we
lack write permissions (e.g. global package databases).

Instead, we now try read-write mode first, falling back to read-only
mode if this fails.

Reviewers: austin

Subscribers: rwbarton, thomie

GHC Trac Issues: #13945

Differential Revision: https://phabricator.haskell.org/D3897
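
The strategy in this commit message can be sketched in C as follows. This is an illustration only: the actual change is in ghc-pkg's Haskell code, `open_lockfile` is a hypothetical name, and creating the file with O_CREAT and mode 0666 is an assumption made for this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open a lockfile read-write if possible, falling
 * back to read-only if that fails. Returns the descriptor, or -1 if both
 * attempts fail. *writable is set to 1 or 0 to report the mode obtained. */
int open_lockfile(const char *path, int *writable) {
  assert(path && writable);

  /* First choice: read-write, so an exclusive lock is allowed even on
   * filesystems that require write access for it. */
  int fd = open(path, O_RDWR | O_CREAT, 0666);
  if (fd >= 0) {
    *writable = 1;
    return fd;
  }

  /* Fallback: read-only, e.g. for a global package database that the
   * current user cannot modify. */
  fd = open(path, O_RDONLY);
  if (fd >= 0) {
    *writable = 0;
    return fd;
  }
  return -1;
}
```

With this ordering, a user who can write to the database gets a descriptor on which `flock(fd, LOCK_EX)` is acceptable to stricter filesystems, while read-only users keep the previous behaviour.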

comment:15 Changed 3 months ago by Ben Gamari <ben@…>

In 779b9e6/ghc:

PackageDb: Explicitly unlock package database before closing

Reviewers: austin

Subscribers: rwbarton, thomie

GHC Trac Issues: #13945

Differential Revision: https://phabricator.haskell.org/D3874
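
The ticket does not record the rationale for this commit beyond its title. As a rough C analogue, assuming flock-style locking, the idea is to release the lock explicitly instead of relying on the implicit release that happens once every descriptor for the open file has been closed. `unlock_and_close` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: release the flock() lock on `fd` explicitly, then
 * close it. Returns 0 on success, or -1 with errno set by the first step
 * that failed. */
int unlock_and_close(int fd) {
  assert(fd >= 0);

  int unlock_res = flock(fd, LOCK_UN);
  int unlock_errno = errno;         /* save before close() can change it */

  int close_res = close(fd);        /* always close, even if unlock failed */

  if (unlock_res == -1) {
    errno = unlock_errno;
    return -1;
  }
  return close_res;
}
```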

comment:16 Changed 3 months ago by bgamari

Status: patch → infoneeded

My testing suggests that comment:14 should be sufficient to fix this. It would be great if someone could confirm this. Note that in my experience it is the boot compiler's ghc-pkg that fails, so you will need to build a new compiler with this patch and use it to bootstrap another tree to test this.

comment:17 Changed 3 months ago by j.waldmann

I manually applied the patch to a source distribution of 8.2.1. Then, "make && make install" worked, and the installed ghc seems good.

Last edited 3 months ago by j.waldmann

comment:18 Changed 3 months ago by bgamari

Thanks for the data point, j.waldmann.

comment:19 Changed 2 months ago by simonpj

I'm still stuck on this. Sigh.

comment:20 Changed 2 months ago by bgamari

Sigh indeed; this is on master, yes? Same exact error?

comment:21 Changed 2 months ago by bgamari

Status: infoneeded → new

Reopening so we don't lose track of this.

comment:22 Changed 2 months ago by bgamari

Owner: set to bgamari

comment:23 Changed 2 months ago by j.waldmann

I built from ghc-8.2.2-rc1 source and "make install" fails as before - unsurprisingly, since the above patch is not in there. Should it be?

comment:24 Changed 2 months ago by bgamari

In this case the ticket hasn't yet been closed, which is when I typically backport patches. That being said, I suppose comment:14 and comment:15 are almost certainly going to be part of the final solution so I'll go ahead and merge them.

comment:26 Changed 6 weeks ago by Ben Gamari <ben@…>

In 3b784d4/ghc:

base: Implement file locking in terms of POSIX locks

Hopefully these are more robust to NFS malfunction than BSD flock-style
locks.  See #13945.

Test Plan: Validate via @simonpj

Reviewers: austin, hvr

Subscribers: rwbarton, thomie, erikd, simonpj

GHC Trac Issues: #13945

Differential Revision: https://phabricator.haskell.org/D4129
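
For readers unfamiliar with the distinction, the sketch below shows what whole-file locking with POSIX fcntl() locks looks like, as opposed to BSD flock(). It is not the code from the commit: the helper names are hypothetical, and using the classic F_SETLKW command is an assumption made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take a POSIX lock covering the whole file, blocking
 * until it is granted. `exclusive` selects a write lock (requires fd to be
 * open for writing) or a read lock. Returns 0 on success, -1 with errno set. */
int posix_lock_whole_file(int fd, int exclusive) {
  assert(fd >= 0);

  struct flock fl = {0};
  fl.l_type = exclusive ? F_WRLCK : F_RDLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;                     /* 0 means "through end of file" */

  int res;
  do {
    res = fcntl(fd, F_SETLKW, &fl); /* wait for the lock */
  } while (res == -1 && errno == EINTR);  /* retry if a signal interrupted */
  return res;
}

/* Hypothetical helper: release the whole-file POSIX lock held via `fd`. */
int posix_unlock_whole_file(int fd) {
  assert(fd >= 0);

  struct flock fl = {0};
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;
  return fcntl(fd, F_SETLK, &fl);
}
```

Classic POSIX locks are owned by the process, not the descriptor, and closing any descriptor for the file releases them, so this sketch suits a short-lived tool that holds a single descriptor per lockfile.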

comment:27 Changed 6 weeks ago by bgamari

Resolution: fixed
Status: new → closed