Opened 3 months ago

Closed 5 weeks ago

Last modified 8 days ago

#8696 closed bug (fixed)

linking fails with 'relocation R_X86_64_PC32 against undefined symbol'

Reported by: Kata
Owned by:
Priority: high
Milestone: 7.8.1
Component: Compiler
Version: 7.8.1-rc2
Keywords:
Cc: jan.stolarek@…, bgamari@…, simonmar
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Other
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Building ekmett's lens package from HEAD (revision 80c1fddf) fails with the following error fragment:

Last 10 lines of the build log ( /home/kata/.cabal/logs/lens-4.0.log ):
[77 of 85] Compiling Data.Vector.Lens ( src/Data/Vector/Lens.hs, dist/build/Data/Vector/Lens.o )
[78 of 85] Compiling Data.Vector.Generic.Lens ( src/Data/Vector/Generic/Lens.hs, dist/build/Data/Vector/Generic/Lens.o )
[79 of 85] Compiling Generics.Deriving.Lens ( src/Generics/Deriving/Lens.hs, dist/build/Generics/Deriving/Lens.o )
[80 of 85] Compiling GHC.Generics.Lens ( src/GHC/Generics/Lens.hs, dist/build/GHC/Generics/Lens.o )
[81 of 85] Compiling System.Exit.Lens ( src/System/Exit/Lens.hs, dist/build/System/Exit/Lens.o )
[82 of 85] Compiling System.FilePath.Lens ( src/System/FilePath/Lens.hs, dist/build/System/FilePath/Lens.o )
[83 of 85] Compiling System.IO.Error.Lens ( src/System/IO/Error/Lens.hs, dist/build/System/IO/Error/Lens.o )
/usr/bin/ld: dist/build/Control/Lens/TH.dyn_o: relocation R_X86_64_PC32 against undefined symbol `lenszm4zi0_ControlziLensziInternalziTH_appsE1zulgo_info' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
cabal: Error: some packages failed to install:
lens-4.0 failed during the building phase. The exception was:
ExitFailure 1

Full log is attached. I'm not trying to build with shared libraries, or profiling, or anything like that.

I can reproduce this failure starting from a fresh Debian testing install as follows:

  1. Bootstrap GHC HEAD.
    sudo aptitude install ghc=7.6.3-6 happy=1.19.0-1 alex=3.1.0-1
    sudo apt-get build-dep ghc
    cd /var/tmp
    git clone git://git.haskell.org/ghc.git
    cd ghc
    git checkout e01367ff
    ./sync-all get
    ./boot
    ./configure
    # Adjust the -j option to taste.
    make -j4
    sudo make install
    
  2. Bootstrap cabal-install.
    cd /var/tmp
    wget http://www.haskell.org/cabal/release/cabal-install-1.18.0.2/cabal-install-1.18.0.2.tar.gz -O - | tar xzf -
    cd cabal-install-1.18.0.2
    patch -p3 < cabal-install-ghc-HEAD.patch # see attached
    sudo aptitude install zlib1g-dev
    ./bootstrap.sh
    export PATH=$HOME/.cabal/bin:$PATH
    cabal update
    
  3. Build lens.
    cd /var/tmp
    git clone https://github.com/ekmett/lens
    cd lens
    git checkout 80c1fddf
    cabal install cpphs
    cabal install # this fails
    

Using gcc Debian 4.8.2-14, ld 2.24. The GHC fingerprint and output of ghc --info is attached.

Attachments (6)

ghc-info (1.7 KB) - added by Kata 3 months ago.
output of ghc --info
ghc-fingerprint (2.1 KB) - added by Kata 3 months ago.
output of utils/fingerprint/fingerprint.py create
cabal-install-ghc-HEAD.patch (1.6 KB) - added by Kata 3 months ago.
patch to make cabal-install build
lens-4.0.log (16.3 KB) - added by Kata 3 months ago.
log of cabal install -v2
0001-Test-case-for-8696.patch (2.2 KB) - added by rwbarton 3 months ago.
simple reproducer in ghci
0001-Temporary-workaround-for-8696-allow-dynamically-link.patch (1.0 KB) - added by rwbarton 2 months ago.


Change History (39)

Changed 3 months ago by Kata

output of ghc --info

Changed 3 months ago by Kata

output of utils/fingerprint/fingerprint.py create

Changed 3 months ago by Kata

patch to make cabal-install build

Changed 3 months ago by Kata

log of cabal install -v2

comment:1 Changed 3 months ago by Kata

Weirdly, if I run cabal install again after the failure, it seems to succeed. I don't know whether it actually succeeded or if it's just lying to me and will blow up in my face at some point.

comment:2 Changed 3 months ago by rwbarton

I looked into this a bit (with cabal install --ghc-option=-v3 -v -v -v) and it looks like

  • there are (at least) two modules in lens which use Template Haskell, Control.Lens.At and System.IO.Error.Lens
  • when building the first one, Control.Lens.At, ghc correctly links the modules it needs to run the TH into a little shared library
  • when building the second one, System.IO.Error.Lens, ghc builds another shared library for TH but leaves out any modules that were built before Control.Lens.At (I only did a couple of spot checks here). In particular, it includes Control.Lens.TH but not Control.Lens.Internal.TH, which the former depends on.

I didn't attempt to investigate in the source code why this is happening.

comment:3 Changed 3 months ago by jstolarek

  • Cc jan.stolarek@… added

I reported a similar problem on ghc-devs recently:

http://www.haskell.org/pipermail/ghc-devs/2014-January/003877.html

In my case the failure also seems related to Template Haskell.

comment:4 Changed 3 months ago by thoughtpolice

  • Priority changed from normal to highest

comment:5 Changed 3 months ago by thoughtpolice

  • Milestone set to 7.8.1

comment:6 Changed 3 months ago by bgamari

  • Cc bgamari@… added

Changed 3 months ago by rwbarton

simple reproducer in ghci

comment:7 Changed 3 months ago by rwbarton

The relevant code is in compiler/ghci/Linker.lhs, functions linkDependencies, getLinkDeps, dynLinkObjs and dynLoadObjs. When linking in a module GHC attempts to link all the object files that are not already loaded into GHC into a single shared library and then dlopen that library. That fails here because T8696B.o apparently contains a PC-relative relocation to T8696A.o and therefore cannot be dynamically linked to it.

I noticed that both getLinkDeps and dynLinkObjs seem to be guilty of pruning modules/object files that have already been loaded into GHC from the list of dependencies to link together. But even if they did not prune any dependencies, would there then be problems with two copies of T8696A.o being loaded into GHC? So I'm not sure what the right fix is here.
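The pruning described in this comment can be illustrated with a small model. This is a minimal Python sketch, not GHC's actual code: the function names `transitive_deps` and `link_for` are hypothetical, and the only facts taken from the ticket are the module names T8696A and T8696B and the dependency of the latter on the former. It shows that once already-loaded objects are pruned from the link set, the second shared library contains a reference it cannot satisfy internally.

```python
from typing import Dict, List, Set, Tuple


def transitive_deps(module: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the module together with everything it depends on."""
    seen: Set[str] = set()
    stack = [module]
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(deps.get(m, []))
    return seen


def link_for(
    module: str, deps: Dict[str, List[str]], loaded: Set[str]
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Model one link step: only objects not yet loaded go into the new library.

    Returns the objects linked and the (from, to) references that point
    outside the new library. With static intra-package references, those
    dangling references are what the linker rejects.
    """
    to_link = transitive_deps(module, deps) - loaded
    dangling = {
        (m, d) for m in to_link for d in deps.get(m, []) if d not in to_link
    }
    return to_link, dangling


# Dependency graph of the test case: T8696B depends on T8696A.
deps = {"T8696A": [], "T8696B": ["T8696A"]}
loaded: Set[str] = set()

lib1, dangling1 = link_for("T8696A", deps, loaded)
loaded |= lib1
lib2, dangling2 = link_for("T8696B", deps, loaded)

print(lib1, dangling1)
print(lib2, dangling2)
```

In this model the first library links cleanly, while the second contains only T8696B and is left with a dangling reference to T8696A.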

comment:8 Changed 3 months ago by carter

Is this bug possibly the culprit behind https://github.com/bscarlet/llvm-general/issues/84#issuecomment-33917697 ?
namely
lookupSymbol failed in relocateSection (RELOC_GOT)
/usr/local/Cellar/llvm33/3.3/lib/llvm-3.3/lib/libLLVMSupport.a: unknown symbol `_dso_handle'
ghc: unable to load package `llvm-general-3.3.8.2'

This happens in the context of trying to build llvm-general with Haskell shared libs while linking to the static lib version of LLVM's libs.

comment:10 Changed 3 months ago by rwbarton

So here are three approaches that come to mind for the original Template Haskell issue.

  1. When dynamically loading module M from package P, just link together all the dependencies of M in package P, regardless of whether they have been loaded already, and hope that loading two copies of a dependency won't cause any problems. Unclear whether the latter is the case.
  2. After building and loading the shared library of dependencies and running Template Haskell for a module, unload the shared library. (Or, lazily unload it only when we need to load more modules that have intra-package dependencies on modules we have loaded already.) I don't know how difficult this is, but I understand that unloading shared libraries is supposed to basically work, so perhaps it's not hard.
  3. Build a separate "flavor" of object file .really_dyn_o that uses dynamic references even to symbols in the same package, so we can freely dynamically load individual modules. These are needed only while building a package that uses Template Haskell; once the package is completely built they can be thrown away. This is the most obviously correct option IMO, but it might impose a pretty high build time cost.

I think option 2 is best, unless it turns out to be a lot harder than I anticipate, with option 3 my second choice.

There's a closely related issue in ghci with incrementally loading modules in the "main" (i.e., no) package, which is what the test case I attached actually demonstrates. In this setting, unloading old modules so that we can reload them linked against new ones seems less reasonable, since we might be resetting global state I guess. So here I think option 3 is best. It would be very easy to implement: simply disallow static linking to any symbol in the "main" package. (Is there a good reason to build .dyn_o files for an executable in the first place?) (EDIT: Checking for package "main" is technically not correct because you can ghci -package-name foo (but why would you?) Urk. Maybe there really should just be a third flavor for object files that are intended to be loaded individually into ghci.)

Last edited 3 months ago by rwbarton

comment:11 Changed 3 months ago by rwbarton

Hmm, option 2 won't interact very well with the parallel upsweep...

comment:12 Changed 2 months ago by simonmar

  • Cc simonmar added

comment:13 Changed 2 months ago by simonmar

@rwbarton diagnosed it correctly. The problem is that

  • when we compile objects for dynamic linking, references to symbols in the same package are static references, resolved when we link the dynamic library together.
  • But in GHCi, we are not loading all the modules together, we're loading only the modules we need to satisfy the dependencies of the current expression, linked together as a shared library. If we need more modules later, we link those as a separate shared library, which breaks the assumption that all the objects of a package are linked into the same shared library. In the test case, T8696A is loaded first, then when we try to load T8696B we can't resolve the reference to T8696A because it is in a different shared library.

We can link *all* the objects together, regardless of what we actually depend on, and load the whole shared library (and all the package dependencies). The problem with this is that when compiling multiple modules with TH we'll end up loading O(n^2) modules in total. Unloading doesn't work with dynamic linking.
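The quadratic cost can be made concrete with a back-of-the-envelope count. This is a minimal Python sketch under a simplifying assumption that is not stated in the ticket: a package of n modules is compiled in order, every module uses TH, and compiling each module loads one fresh library containing all objects built before it. The function name `total_objects_loaded` is hypothetical.

```python
def total_objects_loaded(n: int) -> int:
    """Count object loads when each TH module reloads every earlier object.

    Assumption: module i (0-based) triggers one library containing the
    i objects compiled before it.
    """
    return sum(i for i in range(n))


for n in (4, 85, 100):
    total = total_objects_loaded(n)
    # The sum 0 + 1 + ... + (n - 1) has the closed form n * (n - 1) / 2.
    assert total == n * (n - 1) // 2
    print(n, total)
```

Under that assumption the total grows as n(n-1)/2, so doubling the number of modules roughly quadruples the number of object loads.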

Sigh. I hate dynamic linking. This all used to work properly before.

I've run out of time for this today, but there isn't an obviously right solution so it needs a bit more thought anyway.

comment:14 follow-up: Changed 2 months ago by simonpj

> which breaks the assumption that all the objects of a package are linked into the same shared library

Could we not simply drop that assumption? At least when we aren't loading a whole package but instead are doing this "load module and its dependencies" stuff.

Simon

comment:15 in reply to: ↑ 14 Changed 2 months ago by simonmar

Replying to simonpj:

> > which breaks the assumption that all the objects of a package are linked into the same shared library
>
> Could we not simply drop that assumption? At least when we aren't loading a whole package but instead are doing this "load module and its dependencies" stuff.

It's a compile-time choice, so we can't do one thing for compiling a package and another when running TH. I think it's probably a big performance hit to drop this optimisation (but I don't have measurements). Basically the optimisation means that all the intra-package references can be direct, rather than going via the indirection table.
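The difference between direct references and references that go through an indirection table can be sketched with a toy loader. This is a minimal Python model, not how the dynamic linker or GHC is implemented: the `SharedLib` class, the `load_all` function, and the symbol names `T8696A_f` and `T8696B_g` are all hypothetical. A direct reference must be satisfied inside the library that contains it, whereas an indirect one is looked up among everything loaded so far.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SharedLib:
    name: str
    defines: Set[str]
    # Symbols from the same package that this library refers to.
    intra_package_refs: List[str] = field(default_factory=list)


def load_all(libs: List[SharedLib], direct: bool) -> List[str]:
    """Load libraries in order and return unresolved-reference errors.

    direct=True models the optimisation: intra-package references are
    bound at link time and must be defined in the same library.
    direct=False models lookup through a table of all loaded symbols.
    """
    loaded_symbols: Set[str] = set()
    errors: List[str] = []
    for lib in libs:
        for sym in lib.intra_package_refs:
            if direct:
                resolved = sym in lib.defines
            else:
                resolved = sym in lib.defines or sym in loaded_symbols
            if not resolved:
                errors.append(f"{lib.name}: undefined symbol {sym}")
        loaded_symbols |= lib.defines
    return errors


# Two libraries from the same package, loaded one after the other.
libs = [
    SharedLib("lib1", {"T8696A_f"}),
    SharedLib("lib2", {"T8696B_g"}, ["T8696A_f"]),
]

direct_errors = load_all(libs, direct=True)
indirect_errors = load_all(libs, direct=False)
print(direct_errors)
print(indirect_errors)
```

With direct references the second library fails to resolve the symbol defined in the first; with the indirection table both load, at the cost of an extra lookup per reference.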

comment:16 Changed 2 months ago by rwbarton

Perhaps it makes sense to drop the optimization for 7.8-RC2, though, so that testers can find issues that are masked by this one? Patch attached. I've tested that it fixes "cabal install lens" and am currently benchmarking it.

comment:17 Changed 2 months ago by rwbarton

(I haven't tried validating that patch, I guess it will probably fail due to -Werror because the argument this_pkg is now unused.)

comment:18 Changed 2 months ago by duncan

So one thought that we were just mulling over in #ghc is...

Use bytecode for the local modules in the current package. This already works for GHCi of course. We don't have to use the .dyn_o files. We do know when we start a --make job if we have any modules that use TH/QQ. If we do then we could turn on a mode "compile to bytecode in mem for all modules and spit out object code", a bit like the existing (albeit hacky) -dynamic-too flag.

comment:19 follow-up: Changed 2 months ago by carter

how would that impact compilation time? (using the bytecode versions?)

comment:20 in reply to: ↑ 19 ; follow-up: Changed 2 months ago by duncan

Replying to carter:

> how would that impact compilation time? (using the bytecode versions?)

It'd increase it a bit because we'd have to generate and retain the bytecode. The factors are:

  • we only need to do it for sets of modules involving TH & QQ
  • we could do it just for the modules that are dependencies of modules using TH
  • however we may have to run the pipeline twice because the bytecode generator cannot handle -O results (because of unboxed tuples etc). Though in principle we only need to do that when you build with -O, and it doesn't have to re-run all stages.
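The second factor, generating bytecode only for modules that TH-using modules depend on, amounts to a reachability computation over the import graph. This is a minimal Python sketch with a made-up import graph; the module names and the functions `strict_deps` and `needs_bytecode` are hypothetical and do not correspond to anything in GHC.

```python
from typing import Dict, List, Set


def strict_deps(module: str, imports: Dict[str, List[str]]) -> Set[str]:
    """Return everything the module imports, directly or transitively."""
    seen: Set[str] = set()
    stack = list(imports.get(module, []))
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(imports.get(m, []))
    return seen


def needs_bytecode(imports: Dict[str, List[str]], uses_th: Set[str]) -> Set[str]:
    """Modules that some TH-using module depends on."""
    result: Set[str] = set()
    for m in uses_th:
        result |= strict_deps(m, imports)
    return result


# Hypothetical package: only UsesTH contains a Template Haskell splice.
imports = {
    "Util": [],
    "Gen": ["Util"],
    "UsesTH": ["Gen"],
    "Other": ["Util"],
    "Main": ["UsesTH", "Other"],
}

bytecode = needs_bytecode(imports, {"UsesTH"})
print(sorted(bytecode))
```

In this graph only Gen and Util would need bytecode; Other, Main, and UsesTH itself would not, since no TH-using module depends on them.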

comment:21 in reply to: ↑ 20 Changed 2 months ago by carter

Replying to duncan:

Ok, so the bytecode approach gives a "fast path" that avoids the general-case "2 passes" fallback?

So, e.g., a package with unboxed tuples in every module would trigger the fallback case?

comment:22 Changed 2 months ago by thoughtpolice

Reid's fix does in fact work and I'm inclined to go with it for RC2 since it's slightly overdue. I've checked that lens installs properly with the 7.8 branch.

After thinking about it this weekend, I think this should go in. In the meantime, we can think up a better way to fix this.

comment:23 Changed 2 months ago by Austin Seipp <austin@…>

In ed1aced403b50f1a15fbe06cc7eeca5b23e69e37/ghc:

Fix #8696 - don't generate static intra-package references.

See the comments in Packages.lhs and the ticket for some more explanation.

This is a temporary fix while we consider a way to re-enable intra-package
references in the mean time.

Authored-by: Reid Barton <rwbarton@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>

comment:24 Changed 2 months ago by thoughtpolice

  • Priority changed from highest to high

This is now in the 7.8 branch. I'm dropping the priority but not closing or punting this ticket off, since it's at least mitigated a bit.

comment:25 Changed 2 months ago by simonmar

While this change is in, someone could measure how much effect it has on runtime and binary sizes. Since we still default to static linking, perhaps we can live with a performance hit for dynamic code.

comment:26 Changed 7 weeks ago by thoughtpolice

  • Version changed from 7.8.1-rc1 to 7.8.1-rc2

comment:27 Changed 5 weeks ago by thoughtpolice

BTW, I ran nofib earlier and the results seem pretty conclusive on amd64: this change has minimal - if any - impact. Here's the overview:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            Min          +0.0%     -0.2%     -6.5%     -4.1%     +0.0%
            Max          +0.0%     +0.0%     +5.4%     +5.4%     +3.0%
 Geometric Mean          -0.0%     -0.0%     -0.1%     +0.1%     +0.0%

I imagine the differences here are mostly noise due to my machine - the affected programs don't even use dynamic when compiled with nofib. Compile times remain effectively unchanged:

        -1 s.d.                -----           -2.7%
        +1 s.d.                -----           +3.4%
        Average                -----           +0.3%

I saw this consistently on both my Linux/amd64 machines and OSX/amd64 machines. I'll get i386 benchmarks soon, as this is where we will likely see any possible difference.

comment:28 Changed 5 weeks ago by gidyn

  • Cc gideon@… added

comment:29 Changed 5 weeks ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from new to closed

The results for i386 aren't bad either, compile times seem OK as well:

        -1 s.d.                -----           -2.0%
        +1 s.d.                -----           +1.7%
        Average                -----           -0.2%

So I think this ticket can be considered closed - the penalties seem very small overall on GHC, which is the primary case for dynamic linking (and probably the largest). Simon, do re-open if you disagree.

comment:30 Changed 8 days ago by nomeata

Is this related to the failure

configure: Building in-tree ghc-pwd
/usr/bin/ld: utils/ghc-pwd/dist-boot/Main.o: relocation R_X86_64_32 against `stg_CHARLIKE_closure' can not be used when making a shared object; recompile with -fPIC
utils/ghc-pwd/dist-boot/Main.o: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
configure: error: Building ghc-pwd failed

at http://deb.haskell.org/dailies/2014-04-16/ghc_7.9.20140416-0.daily_amd64.build which prevents up-to-date packages from appearing on deb.haskell.org?

comment:31 Changed 8 days ago by simonmar

No, that's different. What LDFLAGS are being passed to configure?

comment:32 Changed 8 days ago by gidyn

  • Cc gideon@… removed

comment:33 Changed 8 days ago by nomeata

> No, that's different. What LDFLAGS are being passed to configure?

Ok, reported as #9007.
