Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#10322 closed bug (fixed)

In ghci object code loader, linking against the previous temp dylib is not enough on OS X

Reported by: rwbarton
Owned by:
Priority: high
Milestone: 7.10.2
Component: Compiler
Version: 7.10.1
Keywords:
Cc: George, a.ulrich
Operating System: MacOS X
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D852
Wiki Page:

Description

joelteon encountered this issue in a rather complicated program, and we worked out the cause over IRC.

Suppose that ghci needs to sequentially load three modules A, B and C, where B refers to symbols from A and C refers to symbols from both A and B. (For example, modules B, C and another module D all contain Template Haskell that refers to the previous module(s).) The object code linker currently works like this:

  • Link the module A.o into a dylib ghc_1.dylib
  • Link the module B.o against ghc_1.dylib into a new dylib ghc_2.dylib
  • Link the module C.o against ghc_2.dylib into a new dylib ghc_3.dylib

As a result, ghc_2.dylib ends up with a NEEDED (or whatever it's called in Mach-O) entry for ghc_1.dylib, and ghc_3.dylib ends up with a NEEDED entry for ghc_2.dylib.

However, this apparently does not satisfy the OS X dlopen implementation, which complains about a missing symbol _A_foo that is referenced by ghc_3.dylib and defined in ghc_1.dylib. Apparently the dynamic loader only checks the direct dependencies when trying to resolve undefined symbols.

(The Linux dynamic loader seems to be perfectly happy to use an indirect dependency to resolve an undefined symbol. But I found out that the linker gold has the same sort of behavior as the OS X dynamic loader. I don't know whether there is any standard here, but it seems that we cannot rely on the Linux dynamic loader's behavior.)
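The difference between the two behaviors can be illustrated with a toy model. This is only a sketch of the symbol lookup described above, not how dyld or ld.so are actually implemented; the symbol names `_B_bar` and `_C_baz` and all function names are made up for illustration.

```python
# Toy model of the three temporary dylibs from the description.
# Each entry lists the symbols a library defines, the symbols it leaves
# undefined, and its direct dependencies (what it was linked against).
# _B_bar and _C_baz are hypothetical symbol names.
LIBS = {
    "ghc_1.dylib": {"defines": {"_A_foo"}, "undefined": set(), "deps": []},
    "ghc_2.dylib": {"defines": {"_B_bar"}, "undefined": {"_A_foo"},
                    "deps": ["ghc_1.dylib"]},
    "ghc_3.dylib": {"defines": {"_C_baz"}, "undefined": {"_A_foo", "_B_bar"},
                    "deps": ["ghc_2.dylib"]},
}


def direct_deps(name):
    """Only the libraries this library was linked against (OS X-like)."""
    return list(LIBS[name]["deps"])


def transitive_deps(name):
    """Direct and indirect dependencies (Linux-like)."""
    seen, stack = [], list(LIBS[name]["deps"])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.append(dep)
            stack.extend(LIBS[dep]["deps"])
    return seen


def missing_symbols(name, search):
    """Undefined symbols of `name` that no library returned by `search` defines."""
    available = set()
    for dep in search(name):
        available |= LIBS[dep]["defines"]
    return sorted(LIBS[name]["undefined"] - available)


print(missing_symbols("ghc_3.dylib", direct_deps))      # ['_A_foo']
print(missing_symbols("ghc_3.dylib", transitive_deps))  # []
```

With direct-only lookup, ghc_3.dylib cannot find `_A_foo` because ghc_1.dylib is only reachable through ghc_2.dylib.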

Presumably the fix is to keep track of all the previous temporary dylibs (rather than just one in last_temp_so) and link against all of them when building a new temporary dylib. I'm slightly worried about quadratic behavior here, but I think it's unlikely to be an issue in practice.
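A minimal sketch of that bookkeeping, assuming a hypothetical `TempDylibLinker` class (this is not the GHC linker code, and no real linking takes place), also shows where the quadratic growth comes from: the n-th dylib is linked against n - 1 earlier ones.

```python
class TempDylibLinker:
    """Hypothetical model: remembers every temp dylib built so far."""

    def __init__(self):
        self.previous = []      # all earlier temp dylibs, oldest first
        self.link_entries = 0   # total number of dependency entries written

    def link(self, obj):
        """Pretend to link `obj` into a new dylib against all earlier dylibs."""
        name = f"ghc_{len(self.previous) + 1}.dylib"
        deps = list(self.previous)  # all of them, not just the last one
        self.link_entries += len(deps)
        self.previous.append(name)
        return name, deps


def total_entries(n):
    """Dependency entries written after linking n modules one by one."""
    linker = TempDylibLinker()
    for i in range(n):
        linker.link(f"M{i}.o")
    return linker.link_entries


linker = TempDylibLinker()
for obj in ["A.o", "B.o", "C.o"]:
    name, deps = linker.link(obj)
    print(obj, "->", name, "linked against", deps)

print(total_entries(3))    # 0 + 1 + 2 = 3
print(total_entries(100))  # 100 * 99 / 2 = 4950
```

The total grows as n(n-1)/2, which stays small for the number of modules typically loaded as object code in one ghci session.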

I have a reproducer at http://lpaste.net/808723564239781888 (which I'll add to the test suite next) and oherrala ran the test on an OS X system with the following results:

=====> Last(ghci) 1167 of 4480 [0, 0, 0] 
cd ./ghci/scripts && HC="/Users/oherrala/gits/ghc/inplace/bin/ghc-stage2" HC_OPTS="-dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-warn-tabs -fno-ghci-history " "/Users/oherrala/gits/ghc/inplace/bin/ghc-stage2" -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-db -rtsopts -fno-warn-tabs -fno-ghci-history  --interactive -v0 -ignore-dot-ghci +RTS -I0.1 -RTS    <Last.script > Last.run.stdout 2> Last.run.stderr
Actual stderr output differs from expected:
--- /dev/null	2015-04-18 17:23:26.000000000 +0300
+++ ./ghci/scripts/Last.run.stderr	2015-04-18 17:23:34.000000000 +0300
@@ -0,0 +1,9 @@
+ghc-stage2: panic! (the 'impossible' happened)
+  (GHC version 7.11.20150418 for x86_64-apple-darwin):
+	Loading temp shared object failed: dlopen(/var/folders/64/90jfy8lj65bcm1k02syxz_l80000gn/T/ghc18812_0/libghc18812_12.dylib, 5): Symbol not found: _LastA_a_closure
+  Referenced from: /var/folders/64/90jfy8lj65bcm1k02syxz_l80000gn/T/ghc18812_0/libghc18812_12.dylib
+  Expected in: flat namespace
+ in /var/folders/64/90jfy8lj65bcm1k02syxz_l80000gn/T/ghc18812_0/libghc18812_12.dylib
+
+Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
+
Actual stdout output differs from expected:
--- ./ghci/scripts/Last.stdout	2015-04-18 16:26:55.000000000 +0300
+++ ./ghci/scripts/Last.run.stdout	2015-04-18 17:23:34.000000000 +0300
@@ -1,3 +1,2 @@
 3
 4
-7
*** unexpected failure for Last(ghci)

Change History (16)

comment:1 Changed 3 years ago by Reid Barton <rwbarton@…>

In 88b84063c11a48820011805a8341d95f7fcd59db/ghc:

Test case for indirect dependencies in ghci linker (#10322)

comment:2 Changed 3 years ago by rwbarton

Differential Rev(s): Phab:D852

comment:3 Changed 3 years ago by trommler

Interesting, Mach-O doesn't re-export imported symbols but ELF does.

I wonder if there is a linker flag to make Mach-O export imported symbols anyway.

comment:4 Changed 3 years ago by George

Not sure but the -flat_namespace option might be what you are looking for. The following is from the Mac man page for ld. I wonder whether ghc has always had this bug or whether this is a regression?

Two-level namespace

By default all references resolved to a dynamic library record the library to which they were resolved. At runtime, dyld uses that information to directly resolve symbols. The alternative is to use the -flat_namespace option. With flat namespace, the library is not recorded. At runtime, dyld will search each dynamic library in load order when resolving symbols. This is slower, but more like how other operating systems resolve symbols.

Indirect dynamic libraries

If the command line specifies to link against dylib A, and when dylib A was built it linked against dylib B, then B is considered an indirect dylib. When linking for two-level namespace, ld does not look at indirect dylibs, except when re-exported by a direct dylibs. On the other hand when linking for flat namespace, ld does load all indirect dylibs and uses them to resolve references. Even though indirect dylibs are specified via a full path, ld first uses the specified search paths to locate each indirect dylib. If one cannot be found using the search paths, the full path is used.

-flat_namespace

Alters how symbols are resolved at build time and runtime. With -two_levelnamespace (the default), the linker only searches dylibs on the command line for symbols, and records in which dylib they were found. With -flat_namespace, the linker searches all dylibs on the command line and all dylibs those original dylibs depend on. The linker does not record which dylib an external symbol came from, so at runtime dyld again searches all images and uses the first definition it finds. In addition, any undefines in loaded flat_namespace dylibs must be resolvable at build time.
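As a rough illustration of the runtime half of the man page text, the following toy model contrasts the two lookup strategies. It is a sketch only: the library names, symbol names and function names are all hypothetical, and real dyld behavior has many more cases.

```python
# Images in load order, each mapping symbol names to a value that
# identifies which definition was picked. All names are hypothetical.
LOADED = [
    ("libfirst.dylib", {"_shared_name": "first"}),
    ("libsecond.dylib", {"_shared_name": "second", "_only_second": "second"}),
]


def lookup_two_level(symbol, recorded_library):
    """Two-level namespace: use the library recorded at link time."""
    for name, symbols in LOADED:
        if name == recorded_library:
            return symbols.get(symbol)
    return None


def lookup_flat(symbol):
    """Flat namespace: search every image in load order, first hit wins."""
    for _name, symbols in LOADED:
        if symbol in symbols:
            return symbols[symbol]
    return None


print(lookup_two_level("_shared_name", "libsecond.dylib"))  # second
print(lookup_flat("_shared_name"))                          # first
print(lookup_flat("_only_second"))                          # second
```

In the flat model a symbol is found wherever it is defined among the loaded images, at the cost of searching them all and of the first definition shadowing later ones.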


comment:5 Changed 3 years ago by George

Cc: George added

comment:6 Changed 3 years ago by thoughtpolice

Status: new → patch

comment:7 Changed 3 years ago by Austin Seipp <austin@…>

In b0b11ad93cf8470caed572dc16e5cf91304fa355/ghc:

In ghci linker, link against all previous temp sos (#10322)

The OS X dlopen() appears to only resolve undefined symbols in
the direct dependencies of the shared library it is loading.

Reviewed By: trommler, austin

Differential Revision: https://phabricator.haskell.org/D852

GHC Trac Issues: #10322

comment:8 Changed 3 years ago by Austin Seipp <austin@…>

In 470a94947b076cb74a6adcbcf9b39057a67e1fba/ghc:

Revert "In ghci linker, link against all previous temp sos (#10322)"

This reverts commit b0b11ad93cf8470caed572dc16e5cf91304fa355.

It apparently made Harbormaster sad.

comment:9 in reply to:  8 Changed 2 years ago by trommler

Replying to Austin Seipp <austin@…>:

In 470a94947b076cb74a6adcbcf9b39057a67e1fba/ghc:

[...]
It apparently made Harbormaster sad.

Do you have details on what went wrong with Harbormaster?

Linking all libraries looked to me like the most promising way to do dynamic linking in the future and also to implement library unload.

comment:10 in reply to:  4 ; Changed 2 years ago by trommler

Replying to George:

Not sure but the -flat_namespace option might be what you are looking for.

How bad as in breaking assumptions made by MacOS would it be to use -flat_namespace?

comment:11 Changed 2 years ago by a.ulrich

Cc: a.ulrich added

comment:12 Changed 2 years ago by Austin Seipp <austin@…>

In a52f1444ea4045a2075dc88bb973a9289ee7e2cf/ghc:

In ghci linker, link against all previous temp sos (#10322)

The OS X dlopen() appears to only resolve undefined symbols in
the direct dependencies of the shared library it is loading.

Reviewed By: trommler, austin

Differential Revision: https://phabricator.haskell.org/D852

GHC Trac Issues: #10322

comment:13 Changed 2 years ago by thoughtpolice

Status: patch → merge

comment:14 Changed 2 years ago by thoughtpolice

Resolution: fixed
Status: merge → closed

Merged to ghc-7.10.

comment:15 in reply to:  10 Changed 2 years ago by rwbarton

Replying to trommler:

Replying to George:

Not sure but the -flat_namespace option might be what you are looking for.

How bad as in breaking assumptions made by MacOS would it be to use -flat_namespace?

I don't know, but I found this cryptic comment in compiler/main/SysTools.hs:

-- This feature requires Mac OS X 10.3 or later; there is
-- a similar feature, -flat_namespace -undefined suppress,
-- which works on earlier versions, but it has other
-- disadvantages.

comment:16 Changed 2 years ago by George

see also #10568
