Opened 3 years ago

Closed 2 years ago

#7592 closed bug (fixed)

Building the latest master branch on FreeBSD 9.1 fails

Reported by: kazu-yamamoto Owned by: igloo
Priority: normal Milestone: 7.8.1
Component: Build System Version: 7.7
Keywords: Cc: pgj@…
Operating System: FreeBSD Architecture: Unknown/Multiple
Type of failure: Building GHC failed Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

The following sequence fails on FreeBSD 9.1:

% perl boot
% ./configure --prefix=/ghc-head --with-iconv-includes=/usr/local/include --with-iconv-libraries=/usr/local/lib --with-gmp-includes=/usr/local/include --with-gmp-libraries=/usr/local/lib --with-gcc=/usr/local/bin/gcc47 --with-gcc-4.2=/usr/local/bin/gcc47
% gmake -j10 -d MAKE=gmake
...
Putting child 0x801e5fd80 (ghc/stage1/package-data.mk) PID 36063 on the chain.
 Recipe of `ghc/stage1/package-data.mk' is being run.
 Pruning file `compiler/stage1/package-data.mk'.
Live child 0x801e5fd80 (ghc/stage1/package-data.mk) PID 36063 
Configuring ghc-bin-7.7.20130111...
Warning: 'data-dir: ..' is a relative path outside of the source tree. This
will not work when generating a tarball with 'sdist'.
Reaping winning child 0x801e5fd80 PID 36063 
Removing child 0x801e5fd80 PID 36063 from chain.
 Considering target file `ghc/stage1/package-data.mk'.
 File `ghc/stage1/package-data.mk' was considered already.
Re-executing[2]: gmake -r --no-print-directory -f ghc.mk phase=0 phase_0_builds
GNU Make 3.82
Built for amd64-portbld-freebsd9.1
Copyright (C) 2010  Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
Reading makefile `ghc.mk'...
ghc.mk:84: *** Make has restarted itself 2 times; is there a makefile bug?.  Sto
p.
Reaping losing child 0x80156e290 PID 34605 
gmake: *** [all] Error 2
Removing child 0x80156e290 PID 34605 from chain.

If I type "gmake -j10 MAKE=gmake" gain, the building completes.

I believe this is relating to http://www.haskell.org/pipermail/glasgow-haskell-users/2012-March/022150.html

Actually if I remove "error" on line 84, this does not happen and the building completes without any problems.

GHC for bootstrapping: 7.4.1, GCC 4.7.3, gmake: 3.82

Attachments (2)

LOG.2.gz (31.3 KB) - added by kazu-yamamoto 3 years ago.
gzipped log of gmake -j10 -dr --no-print-directory -f ghc.mk phase=0 phase_0_builds
LOG.3.gz (31.7 KB) - added by kazu-yamamoto 2 years ago.
Log with modified makefile

Download all attachments as: .zip

Change History (22)

comment:1 Changed 3 years ago by pgj

  • Cc pgj@… added
  • Operating System changed from Unknown/Multiple to FreeBSD
  • Owner set to pgj
  • Type of failure changed from None/Unknown to Building GHC failed

Take.

comment:2 Changed 3 years ago by pgj

I think this a very special concurrency bug that can be only produced on really fast systems. I could not do that with my own machines, but this indeed caused problems at the FreeBSD package building cluster [1] so it has been patched for the corresponding port in the Ports Collection.

But let me know if you have any recommendations on how to solve this.

[1] http://www.freebsd.org/cgi/cvsweb.cgi/ports/lang/ghc/files/patch-ghc.mk?rev=1.2;content-type=text%2Fx-cvsweb-markup

comment:3 Changed 3 years ago by kazu-yamamoto

I know this patch. Probably my machine("12 cores" machine (Intel Xeon E5645, two sockets, 6 cores per 1 CPU, two QPI between two CPUs)) is too fast. On my machine, 3 tries are not good enough. I modified the make file so as to retry 4 times.

I should note that I have another machine of same spec and run Linux 3 on it. It does not cause this problem without this kind patch. So, I believe that we can solve this even on FreeBSD.

comment:4 Changed 3 years ago by igloo

  • difficulty set to Unknown

I assume this only happens when a -j flag is used?

If running this:

gmake -dr --no-print-directory -f ghc.mk phase=0 phase_0_builds

after the boot and configure commands also fails, can you attach the full output of it please?

comment:5 Changed 3 years ago by igloo

Sorry, I meant:

gmake -j10 -dr --no-print-directory -f ghc.mk phase=0 phase_0_builds

(assuming the -j is necessary).

comment:6 Changed 3 years ago by kazu-yamamoto

gmake -j10 -dr --no-print-directory -f ghc.mk phase=0 phase_0_builds

succeeded.

Considering target file `phase_0_builds'.
 File `phase_0_builds' does not exist.
  Considering target file `utils/hsc2hs/dist/build/.depend.haskell'.
  File `utils/hsc2hs/dist/build/.depend.haskell' was considered already.
  Considering target file `utils/hsc2hs/dist/build/.depend.c_asm'.
  File `utils/hsc2hs/dist/build/.depend.c_asm' was considered already.
  Considering target file `utils/genprimopcode/dist/build/.depend.haskell'.
  File `utils/genprimopcode/dist/build/.depend.haskell' was considered already.
  Considering target file `utils/genprimopcode/dist/build/.depend.c_asm'.
  File `utils/genprimopcode/dist/build/.depend.c_asm' was considered already.
 Finished prerequisites of target file `phase_0_builds'.
Must remake target `phase_0_builds'.
Successfully remade target file `phase_0_builds'.

comment:7 Changed 3 years ago by igloo

If that's the full output then it must be from an already built tree. Could you try in a fresh tree please?

comment:8 Changed 3 years ago by kazu-yamamoto

No. Just a tail part. I did:

% gmake maintainer-clean
% perl boot
% ./configure --prefix=/ghc-head --with-iconv-includes=/usr/local/include --with-iconv-libraries=/usr/local/lib --with-gmp-includes=/usr/local/include --with-gmp-libraries=/usr/local/lib --with-gcc=/usr/local/bin/gcc47 --with-gcc-4.2=/usr/local/bin/gcc47
% gmake -j10 -dr --no-print-directory -f ghc.mk phase=0 phase_0_builds

comment:9 Changed 3 years ago by kazu-yamamoto

I made a mistake. I built GHC on a tree where I modified ghc.mk. I built GHC again on a vanilla tree. I will attach a log.

Changed 3 years ago by kazu-yamamoto

gzipped log of gmake -j10 -dr --no-print-directory -f ghc.mk phase=0 phase_0_builds

comment:10 Changed 3 years ago by igloo

Hmm, on line 3039 of the log inplace/bin/ghc-cabal is created:

Invoking recipe from utils/ghc-cabal/ghc.mk:22 to update target `inplace/bin/ghc-cabal'.
"cp" utils/ghc-cabal/dist/build/tmp/ghc-cabal inplace/bin/ghc-cabal

but then on line 5291 it appears to be out-of-date and gets rebuilt:

    Prerequisite `utils/ghc-cabal/dist/build/tmp/ghc-cabal' is newer than target `inplace/bin/ghc-cabal'.
    Prerequisite `inplace/bin//.' is order-only for target `inplace/bin/ghc-cabal'.
   Must remake target `inplace/bin/ghc-cabal'.
Need a job token; we don't have children
Invoking recipe from utils/ghc-cabal/ghc.mk:22 to update target `inplace/bin/ghc-cabal'.
"cp" utils/ghc-cabal/dist/build/tmp/ghc-cabal inplace/bin/ghc-cabal

I can't see any reason for it being out-of-date. Please could you try modifying the GHC_CABAL_INPLACE rule on line 21 of utils/ghc-cabal/ghc.mk to this:

$(GHC_CABAL_INPLACE) : $(GHC_CABAL_DIR)/dist/build/tmp/ghc-cabal$(exeext) | $$(dir $$@)/.
    -stat $<
    -stat $@
    "$(CP)" $< $@
    -stat $<
    -stat $@

and do a build? Perhaps the output will give some insight.

Changed 2 years ago by kazu-yamamoto

Log with modified makefile

comment:11 Changed 2 years ago by kazu-yamamoto

Attached.

comment:12 follow-up: Changed 2 years ago by igloo

Here are the stat outputs:

First time rules fires:
Before copy:
100 10437839 -rwxr-xr-x 1 kazu kazu 20851248 9446798 "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:21 2013" 32768 18560 0 utils/ghc-cabal/dist/build/tmp/ghc-cabal
(inplace/bin/ghc-cabal doesn't exist)

After copy:
100 10437839 -rwxr-xr-x 1 kazu kazu 20851248 9446798 "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:21 2013" 32768 18560 0 utils/ghc-cabal/dist/build/tmp/ghc-cabal
100  9395816 -rwxr-xr-x 1 kazu kazu 18824736 9446798 "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" 32768 18560 0 inplace/bin/ghc-cabal

Second time rules fires:
Before copy:
100 10437839 -rwxr-xr-x 1 kazu kazu 20851248 9446798 "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:21 2013" 32768 18560 0 utils/ghc-cabal/dist/build/tmp/ghc-cabal
100  9395816 -rwxr-xr-x 1 kazu kazu 18824736 9446798 "Feb  3 05:22:48 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" 32768 18560 0 inplace/bin/ghc-cabal

After copy:
100 10437839 -rwxr-xr-x 1 kazu kazu 20851248 9446798 "Feb  3 05:22:50 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:22 2013" "Feb  3 05:22:21 2013" 32768 18560 0 utils/ghc-cabal/dist/build/tmp/ghc-cabal
100  9395816 -rwxr-xr-x 1 kazu kazu 18821696 9446798 "Feb  3 05:22:48 2013" "Feb  3 05:22:50 2013" "Feb  3 05:22:50 2013" "Feb  3 05:22:22 2013" 32768 18560 0 inplace/bin/ghc-cabal

I can't see anything wrong there. I've just looked at the make 3.82 code, and it certainly looks like a dependency needs to be strictly newer than the target for it to be considered changed:

d->changed |= noexist || d_mtime > this_mtime;

I think this will need someone who can reproduce the problem to diagnose exactly what's going on (it might be easiest to make a cut-down testcase first). One possibility is that the filesystem has sub-second precision, stat is ignoring the extra precision, and cp is giving inconsistent times (perhaps truncating the extra precision in the file it creates). Another is that the stat syscall is giving the wrong result for some reason. It's also not impossible that this is a make bug.

comment:13 follow-up: Changed 2 years ago by kazu-yamamoto

comment:14 in reply to: ↑ 13 Changed 2 years ago by graphov

Hi. I'll try to reproduce it in OpenIndiana at the weekend if needed.
Also my machine was really slow (it was virtual machine run on old hardware) and if I remember well, I didn't use -j at all because VM had only one hardware core assigned.

comment:15 in reply to: ↑ 12 Changed 2 years ago by pgj

Replying to igloo:

I think this will need someone who can reproduce the problem to diagnose exactly what's going on (it might be easiest to make a cut-down testcase first). One possibility is that the filesystem has sub-second precision, stat is ignoring the extra precision, and cp is giving inconsistent times (perhaps truncating the extra precision in the file it creates). Another is that the stat syscall is giving the wrong result for some reason. It's also not impossible that this is a make bug.

I am not sure if this helps, but my attention has just been called to that precision of file-system timestamps can be set by the vfs.timestamp_precision sysctl(3) variable. This is 0 by default, which means "seconds", but increasing the value could increase the precision as well. (Use sysctl -d vfs.timestamp_precision to see them.)

comment:16 follow-up: Changed 2 years ago by kazu-yamamoto

Great!

% sudo -w vfs.timestamp_precision=1

solves this problem. I think that displaying a warning message is useful for FreeBSD users.

comment:17 Changed 2 years ago by igloo

  • Milestone set to 7.8.1
  • Owner changed from pgj to igloo

comment:18 in reply to: ↑ 16 Changed 2 years ago by pgj

Replying to kazu-yamamoto:

I think that displaying a warning message is useful for FreeBSD users.

I have added a paragraph on this to the build troubleshooting page on the wiki.

comment:19 Changed 2 years ago by ian@…

commit 83a9f4f4cc3d1358d76a4b3815f0d567560b35e7

Author: Ian Lynagh <[email protected]>
Date:   Sun May 19 20:44:32 2013 +0100

    Refer to the wiki page in the "Make has restarted itself n times" error
    
    Fixes #7592.

 ghc.mk |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

comment:20 Changed 2 years ago by igloo

  • Resolution set to fixed
  • Status changed from new to closed

I've added a link to the troubleshooting wiki page in the error message.

Note: See TracTickets for help on using tickets.