Opened 16 months ago

Closed 13 months ago

Last modified 5 weeks ago

#12479 closed bug (fixed)

build fail of commercialhaskell.com with stack build on mac os x sierra

Reported by: stephenb
Owned by:
Priority: highest
Milestone: 8.0.2
Component: Compiler
Version: 8.0.1
Keywords:
Cc: lelf, glguy, chak@…, mnislaih, dleuschner, tohnann
Operating System: MacOS X
Architecture: Unknown/Multiple
Type of failure: Compile-time crash
Test Case:
Blocked By:
Blocking:
Related Tickets: #12198
Differential Rev(s): Phab:D2532
Wiki Page:

Description

Hi,

I did a git clone (commit dfe55e97ed86567aafca2e5f3c19096e2a4cb50f, Sep 20th 2016) of commercialhaskell.com from GitHub:

https://github.com/commercialhaskell/commercialhaskell.com.git

I followed the instructions from that repo exactly for OS X (stack setup installed ghc-7.10.2, but I have seen a similar issue with 8):

$ brew install icu4c
Add the following to your ~/.stack/stack.yaml:

   extra-include-dirs:
   - /usr/local/opt/icu4c/include
   extra-lib-dirs:
   - /usr/local/opt/icu4c/lib
Now:

   $ stack build

I hit one minor issue with icu4c in the build process, which I resolved with the following stack build invocation:

stack build --extra-lib-dirs=/usr/local/opt/icu4c/lib --extra-include-dirs=/usr/local/opt/icu4c/include

The build then reaches yesod-auth and fails:

yesod-auth-1.4.6: configure
yesod-auth-1.4.6: build
texmath-0.8.2.2: copy/register
Progress: 35/38
--  While building package yesod-auth-1.4.6 using:
      /Users/stephen/.stack/setup-exe-cache/x86_64-osx/setup-Simple-Cabal-1.22.4.0-ghc-7.10.2 --builddir=.stack-work/dist/x86_64-osx/Cabal-1.22.4.0 build --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure 1
    Logs have been written to: /Users/stephen/Documents/github/commercialhaskell.com/.stack-work/logs/yesod-auth-1.4.6.log

    Configuring yesod-auth-1.4.6...
    Building yesod-auth-1.4.6...
    Preprocessing library yesod-auth-1.4.6...
    [ 1 of 11] Compiling Yesod.PasswordStore ( Yesod/PasswordStore.hs, .stack-work/dist/x86_64-osx/Cabal-1.22.4.0/build/Yesod/PasswordStore.o )
    
    /private/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/stack49268/yesod-auth-1.4.6/Yesod/PasswordStore.hs:166:31: Warning:
        Defaulting the following constraint(s) to type ‘Integer’
          (Integral b0)
            arising from a use of ‘^’ at Yesod/PasswordStore.hs:166:31
          (Num b0)
            arising from the literal ‘32’ at Yesod/PasswordStore.hs:166:32-33
        In the first argument of ‘(-)’, namely ‘2 ^ 32’
        In the first argument of ‘(*)’, namely ‘(2 ^ 32 - 1)’
        In the second argument of ‘(>)’, namely ‘(2 ^ 32 - 1) * hLen’
    
    /private/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/stack49268/yesod-auth-1.4.6/Yesod/PasswordStore.hs:419:1: Warning:
        Defined but not used: ‘toStrict’
    
    /private/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/stack49268/yesod-auth-1.4.6/Yesod/PasswordStore.hs:422:1: Warning:
        Defined but not used: ‘fromStrict’
    [ 2 of 11] Compiling Yesod.Auth.Message ( Yesod/Auth/Message.hs, .stack-work/dist/x86_64-osx/Cabal-1.22.4.0/build/Yesod/Auth/Message.o )
    
    /private/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/stack49268/yesod-auth-1.4.6/Yesod/Auth/Message.hs:22:1: Warning:
        The import of ‘Data.Monoid’ is redundant
          except perhaps to import instances from ‘Data.Monoid’
        To import instances alone, use: import Data.Monoid()
    [ 3 of 11] Compiling Yesod.Auth.Routes ( Yesod/Auth/Routes.hs, .stack-work/dist/x86_64-osx/Cabal-1.22.4.0/build/Yesod/Auth/Routes.o )
    [ 4 of 11] Compiling Yesod.Auth       ( Yesod/Auth.hs, .stack-work/dist/x86_64-osx/Cabal-1.22.4.0/build/Yesod/Auth.o )
    ghc: panic! (the 'impossible' happened)
      (GHC version 7.10.2 for x86_64-apple-darwin):
    	Loading temp shared object failed: dlopen(/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/ghc64990_0/libghc_21.dylib, 5): no suitable image found.  Did find:
    	/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/ghc64990_0/libghc_21.dylib: malformed mach-o: load commands size (34176) > 32768
    
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

So I am reporting it. I've seen the same error when attempting to brew install haskell-stack on Sierra betas 3 and 4, but on those occasions libghc_44.dylib was the library causing the problem.

[ 4 of 87] Compiling System.Process.Read ( src/System/Process/Read.hs, dist/dist-sandbox-558713ad/build/System/Process/Read.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 8.0.1 for x86_64-apple-darwin):
	Loading temp shared object failed: dlopen(/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/ghc67839_0/libghc_44.dylib, 5): no suitable image found.  Did find:
	/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/ghc67839_0/libghc_44.dylib: malformed mach-o: load commands size (40560) > 32768

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

stack version:

Version 1.1.2, Git revision cebe10e845fed4420b6224d97dcabf20477bbd4b (3646 commits) x86_64 hpack-0.14.0

stack exec env returns:

stack exec env
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.tEAdTiAWFI/Render
GHC_PACKAGE_PATH=/Users/stephen/Documents/github/commercialhaskell.com/.stack-work/install/x86_64-osx/lts-3.0/7.10.2/pkgdb:/Users/stephen/.stack/snapshots/x86_64-osx/lts-3.0/7.10.2/pkgdb:/Users/stephen/.stack/programs/x86_64-osx/ghc-7.10.2/lib/ghc-7.10.2/package.conf.d
HASKELL_DIST_DIR=.stack-work/dist/x86_64-osx/Cabal-1.22.4.0
HASKELL_PACKAGE_SANDBOX=/Users/stephen/.stack/snapshots/x86_64-osx/lts-3.0/7.10.2/pkgdb
HASKELL_PACKAGE_SANDBOXES=/Users/stephen/Documents/github/commercialhaskell.com/.stack-work/install/x86_64-osx/lts-3.0/7.10.2/pkgdb:/Users/stephen/.stack/snapshots/x86_64-osx/lts-3.0/7.10.2/pkgdb:
HOME=/Users/stephen
LANG=en_IE.UTF-8
LOGNAME=stephen
NAME=Stephen Barrett
OLDPWD=/Users/stephen/Documents/github
PATH=/Users/stephen/Documents/github/commercialhaskell.com/.stack-work/install/x86_64-osx/lts-3.0/7.10.2/bin:/Users/stephen/.stack/snapshots/x86_64-osx/lts-3.0/7.10.2/bin:/Users/stephen/.stack/programs/x86_64-osx/ghc-7.10.2/bin:/Users/stephen/Library/Haskell/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
PWD=/Users/stephen/Documents/github/commercialhaskell.com
SECURITYSESSIONID=186a6
SHELL=/bin/bash
SHLVL=1
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.RTM6V8Lo2y/Listeners
STACK_EXE=/Library/Haskell/ghc-8.0.1-x86_64/bin/stack
TERM=xterm-256color
TERM_PROGRAM=Apple_Terminal
TERM_PROGRAM_VERSION=377
TERM_SESSION_ID=020844C9-1E8A-410E-8892-43DC0C5A8C0B
TMPDIR=/var/folders/3k/ycnfbqgx33n7qdytkl9ryx7m0000gn/T/
USER=stephen
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
_=/usr/local/bin/stack
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x6C

Attachments (3)

load_commands.txt (78.4 KB) - added by mistydemeo 15 months ago.
Output of otool -l libghc_29.dylib
objdump_x.txt (46.2 KB) - added by mistydemeo 15 months ago.
Output of gobjdump -x (GNU objdump, not LLVM objdump)
libghc_29.dylib.xz (14.8 KB) - added by mistydemeo 15 months ago.

Change History (58)

comment:1 Changed 16 months ago by mpickering

Stephen, thanks for the instructions, but it would be much easier to make progress if the reproduction were simpler. Ideally it would:

  • Not rely on stack (essential)
  • Be reproducible with the latest GHC version
  • Not depend on lots of packages. Could you try removing code and packages/dependencies from yesod-auth so that we can get a standalone reproducible case?

Ideally the reproduction would just be a single command line invocation of GHC.

comment:2 Changed 15 months ago by bgamari

Status: new → infoneeded

Is it possible that your libghc is built with -split-sections? It sounds like we are producing too many sections for Darwin's linker.

comment:3 Changed 15 months ago by thomie

Operating System: Unknown/Multiple → MacOS X

Also reported as #12198.

comment:4 Changed 15 months ago by mistydemeo

I reported #12198, which happened when the ghc being used to build stack was installed via Homebrew.

As far as I'm aware, -split-sections is off by default, and the Homebrew buildscript doesn't explicitly enable it: https://github.com/Homebrew/homebrew-core/blob/df621e49f7006c9aa7ed7d1f570c955447f160a8/Formula/ghc.rb

I don't believe stack is building with it either.

comment:5 Changed 15 months ago by bgamari

Can someone please provide the output of objdump -x (or Darwin's equivalent) from the libghc in question? I really don't know what could be producing tens of thousands of load commands if not -split-sections.

comment:6 Changed 15 months ago by mistydemeo

GHC is deleting the temp shared object immediately after the build fails; is there a flag I can set to keep it?

comment:7 Changed 15 months ago by mistydemeo

Okay, got it - I fetched the actual ghc invocation from cabal and added -keep-tmp-files. I've attached the output of otool -l, which contains the load commands, and the dylib itself.

Changed 15 months ago by mistydemeo

Attachment: load_commands.txt added

Output of otool -l libghc_29.dylib

Changed 15 months ago by mistydemeo

Attachment: objdump_x.txt added

Output of gobjdump -x (GNU objdump, not LLVM objdump)

Changed 15 months ago by mistydemeo

Attachment: libghc_29.dylib.xz added

comment:8 Changed 14 months ago by bgamari

Carter astutely pointed out that config.mk enables GHC's SplitObjs build system setting on those platforms that support it (of which OS X is one).

The real question is why this only started breaking now. I checked the Mach-O specification and there is no mention of any limit on the number of load commands, so presumably this is some artificial limit imposed by the linker. Carter suggested that perhaps Sierra moved to the LLVM linker, lld, but I see no mention of this error message in the lld repository. Very mysterious.

I think we really have no choice here but to mark Sierra as having broken split-objs support. Unfortunately, I can't think of a good way to make an autoconf test for this short of generating 30000 sections.

comment:9 Changed 14 months ago by bgamari

Milestone: 8.0.2

Actually, it looks like we have little choice but to disable split sections on all OS X builds since we won't know what linker is being used to do the final link until runtime. Sadly this means that Darwin binary distribution sizes will grow but that's a price that OS X users will just need to pay. I'll get a patch in to 8.0.2 doing this.

comment:10 Changed 14 months ago by bgamari

As it turns out, there is already autoconf logic in configure.ac to detect related Apple brokenness (#4013). That being said, it seems quite insufficient since, as related above, there's no reason to believe that the linker at configure-time is the same linker that will be used by the built compiler.

comment:11 Changed 14 months ago by bgamari

Differential Rev(s): Phab:D2532

I think we want something like Phab:D2532 although I'm still investigating.

comment:12 Changed 14 months ago by bgamari

Austin says,

<thoughtpolice> bgamari: FWIW, the load commands stuff is part of dyld, not LLVM at all. Which makes sense because this is a dynamic linking problem at load time, and not one with the object linker trying to statically link some objects together at runtime ("Loading temp shared object failed..." implies it actually built the dylib).

<thoughtpolice> The bad part about this is Apple hasn't published the new source code for dyld in Sierra, so there's no rhyme or reason as to why just yet. Look at 'ImageLoaderMachO::sniffLoadCommands' here: http://opensource.apple.com/source/dyld/dyld-353.2.1/src/ImageLoaderMachO.cpp?txt

comment:13 Changed 14 months ago by rwbarton

We did some investigation last night in IRC and came to the conclusion that the runtime loader in Sierra has a new limit on the sizeofcmds field which is the total size of all the load commands in the dynamic library. Looking at the otool -l output in an earlier attachment, we can see that the main contributors to the size are primarily the RPATH entries and to a lesser extent the LOAD_DYLIB entries. There are over 100 of each, and the paths in the RPATH entries are quite long (I guess this is why stack is affected in particular).

So, I don't expect that disabling split objects will help at all, unfortunately. There seems to just be a limit on the total size of RPATH entries we can use.

In fact Cabal is the one choosing these RPATHs, not GHC itself. So this will require some kind of change to Cabal to decrease the total size of the RPATH entries it produces.
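To illustrate why many long RPATH entries add up quickly, here is a rough Python sketch that estimates the bytes taken by the RPATH and LOAD_DYLIB commands. It assumes the usual layout from <mach-o/loader.h> (a 12-byte fixed part for an rpath command, 24 bytes for a dylib command, each followed by a NUL-terminated path and padded to 8 bytes); the exact padding applied by the linker is an assumption, and the sample paths are hypothetical.

```python
# Rough estimate of the space taken by LC_RPATH and LC_LOAD_DYLIB commands.
# Assumed layout: a fixed header followed by a NUL-terminated path, with the
# whole command padded to a multiple of 8 bytes.
RPATH_HEADER = 12        # cmd, cmdsize, path offset (3 x uint32)
DYLIB_HEADER = 24        # cmd, cmdsize, name offset, timestamp, 2 x version
DYLD_LIMIT = 32 * 1024   # limit reported in the panic message


def padded(size, align=8):
    """Round size up to the next multiple of align."""
    return (size + align - 1) // align * align


def command_size(header, path):
    """Size of one load command holding the given path string."""
    return padded(header + len(path.encode("utf-8")) + 1)


def estimate(rpaths, dylibs):
    """Total bytes used by the RPATH and LOAD_DYLIB commands."""
    return (sum(command_size(RPATH_HEADER, p) for p in rpaths)
            + sum(command_size(DYLIB_HEADER, d) for d in dylibs))


if __name__ == "__main__":
    # Hypothetical paths, shaped like the ones a stack snapshot produces.
    prefix = ("/Users/me/.stack/snapshots/x86_64-osx/lts-3.0/7.10.2/"
              "lib/x86_64-osx-ghc-7.10.2/")
    rpaths = [prefix + "package-%d-1.0.0" % i for i in range(120)]
    dylibs = ["@rpath/libHSpackage-%d-1.0.0-ghc7.10.2.dylib" % i
              for i in range(120)]
    total = estimate(rpaths, dylibs)
    print(total, "bytes;", "over" if total > DYLD_LIMIT else "under", "the limit")
```

The estimate ignores every other load command (segments, symbol tables and so on), so the real sizeofcmds value would be somewhat larger.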

comment:14 Changed 14 months ago by rwbarton

Actually GHC also sets RPATHs on the temporary dynamic libraries it builds (see linkDynLib), which is the instance in this ticket. So both GHC and Cabal will need some kind of workaround.

comment:15 Changed 14 months ago by jonathan

Just for reference, I built GHC with split-sections disabled with no luck as previously expected. As @rwbarton mentioned, this occurs when there are a large number of dependencies which overflow past the RPATH limits (building Stack itself is a good test).

Has a ticket been created for Cabal yet?

comment:16 Changed 14 months ago by rwbarton

The OS X linker has an option

-dylib_file install_name:file_name

Specifies that a dynamic shared library is in a different location than its standard location. Use this option when you link with a library that is dependent on a dynamic library, and the dynamic library is in a location other than its default location. install_name specifies the path where the library normally resides. file_name specifies the path of the library you want to use instead. For example, if you link to a library that depends upon the dynamic library libsys and you have libsys installed in a nondefault location, you would use this option: -dylib_file /lib/libsys_s.A.dylib:/me/lib/libsys_s.A.dylib.

So GHC could do something like this. When it needs to link a temporary shared library against a Haskell package, rather than using -Ll -Wl,-rpath -Wl,l to add the package's library directory l to the rpath, since l typically has a form like /home/rwbarton/.cabal/lib/x86_64-linux-ghc-7.8.4/text-1.1.1.3, instead add the parent directory of l to the rpath and use -dylib_file to link against the library with an install name of @rpath/text-1.1.1.3/libHStext-1.1.1.3-ghc7.8.4.dylib. Since the parent directories of the package libraries will generally be mostly the same (like /home/rwbarton/.cabal/lib/x86_64-linux-ghc-7.8.4), they can be deduplicated to save most of the size of the load commands.

This is kind of a hack that relies on the directory structure that Cabal produces to be effective, but it has the advantage of being a local change. Cabal would also need some similar change in order for the final libraries that it links to be loadable.
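The deduplication idea can be sketched in Python as follows. This only shows the path arithmetic, not the linker invocation; the function name plan_rpaths is hypothetical, and the sketch assumes every library lives in a per-package directory directly below a shared parent, as in the example paths above.

```python
import posixpath


def plan_rpaths(lib_dirs, lib_files):
    """For each package library directory, use its parent as the rpath and
    fold the last path component into the install name."""
    rpaths = []          # deduplicated parent directories, in first-seen order
    install_names = []
    for lib_dir, lib_file in zip(lib_dirs, lib_files):
        parent, pkg = posixpath.split(lib_dir.rstrip("/"))
        if parent not in rpaths:
            rpaths.append(parent)
        install_names.append("@rpath/%s/%s" % (pkg, lib_file))
    return rpaths, install_names


if __name__ == "__main__":
    base = "/home/rwbarton/.cabal/lib/x86_64-linux-ghc-7.8.4"
    dirs = [base + "/text-1.1.1.3", base + "/hypothetical-pkg-0.1"]
    files = ["libHStext-1.1.1.3-ghc7.8.4.dylib",
             "libHShypothetical-pkg-0.1-ghc7.8.4.dylib"]
    print(plan_rpaths(dirs, files))
```

With packages that share a parent directory, the rpath list collapses to a single entry while each install name grows only by the short package directory name.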

comment:17 Changed 14 months ago by bgamari

Perhaps this would be a good excuse to address #11587.

comment:18 Changed 14 months ago by rwbarton

Absolutely, but it might be tough to do on short notice; I'm guessing it requires a coordinated change to Cabal, and might break other build systems.

comment:19 Changed 14 months ago by glguy

Cc: glguy added

comment:20 Changed 14 months ago by duncan

rwbarton said,

This is kind of a hack that relies on the directory structure that Cabal produces to be effective, but it has the advantage of being a local change. Cabal would also need some similar change in order for the final libraries that it links to be loadable.

Unfortunately, in general it is the user/builder calling Cabal that gets to choose the layout, so it's hard to pick a fixed scheme.

That said, we can in principle compute an optimal (or near-optimal) set of shared prefixes. We start with the info from the ghc-pkg db (InstalledPackageInfo) which has the full path to the libraries (or enough info to compute it). Then we would take the set of those paths, make a trie, count the number of elements beneath each trie node and take as shared prefixes all the ones with a count of > 1. Or something like that.
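A minimal Python sketch of the prefix-counting idea, using a counter keyed by directory prefix in place of an explicit trie (the counts are the same as the per-node counts of a trie). All function names are hypothetical. Excluding the root directory and picking the longest shared prefix per library are choices made for this sketch, not something decided in this ticket.

```python
from collections import Counter


def prefixes(path):
    """Directory prefixes of an absolute POSIX path, shortest first,
    excluding the root (which every path shares trivially)."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def shared_prefixes(lib_dirs):
    """Count how many distinct library directories sit beneath each prefix
    and keep the prefixes with a count > 1."""
    counts = Counter()
    for d in set(lib_dirs):
        counts.update(prefixes(d))
    return {p: n for p, n in counts.items() if n > 1}


def choose_rpath(lib_dir, shared):
    """Pick the longest shared prefix of lib_dir, or lib_dir itself."""
    candidates = [p for p in prefixes(lib_dir) if p in shared]
    return candidates[-1] if candidates else lib_dir


if __name__ == "__main__":
    dirs = ["/opt/lib/ghc/text-1", "/opt/lib/ghc/base-4", "/usr/local/lib/foo"]
    shared = shared_prefixes(dirs)
    for d in dirs:
        print(d, "->", choose_rpath(d, shared))
```

A real implementation would still have to weigh shorter rpaths against the longer install names they imply.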

comment:21 Changed 14 months ago by rwbarton

I suppose another dumb hack would be to just create symlinks to all our dependencies in the GHC temporary directory and set that as the RPATH...
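As a sketch of what the symlink approach might look like, the Python below creates one link per dependency inside a scratch directory, which could then be the single RPATH entry. The function name is hypothetical, and the sketch assumes that no two dependencies share a file name; a real implementation would need to handle such collisions.

```python
import os
import tempfile


def link_dependencies(dylib_paths, tmp_dir):
    """Create one symlink per dependency inside tmp_dir and return tmp_dir,
    which can then serve as the only rpath."""
    for path in dylib_paths:
        link = os.path.join(tmp_dir, os.path.basename(path))
        if not os.path.lexists(link):   # skip links that already exist
            os.symlink(path, link)
    return tmp_dir


if __name__ == "__main__":
    # Hypothetical dependency file created only for the demonstration.
    src = tempfile.mkdtemp()
    lib = os.path.join(src, "libHSexample-0.1.dylib")
    open(lib, "w").close()
    rpath = link_dependencies([lib], tempfile.mkdtemp())
    print(rpath, os.listdir(rpath))
```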

comment:22 in reply to:  13 Changed 14 months ago by chak

Replying to rwbarton:

We did some investigation last night in IRC and came to the conclusion that the runtime loader in Sierra has a new limit on the sizeofcmds field which is the total size of all the load commands in the dynamic library. Looking at the otool -l output in an earlier attachment, we can see that the main contributors to the size are primarily the RPATH entries and to a lesser extent the LOAD_DYLIB entries. There are over 100 of each, and the paths in the RPATH entries are quite long (I guess this is why stack is affected in particular).

So, I don't expect that disabling split objects will help at all, unfortunately. There seems to just be a limit on the total size of RPATH entries we can use.

In fact Cabal is the one choosing these RPATHs, not GHC itself. So this will require some kind of change to Cabal to decrease the total size of the RPATH entries it produces.

I suspect that this particular use of RPATHs is a remnant of trying to deal with dylibs on macOS as it's done on Linux. On macOS, there is a pretty simple fix to this (which is what I use in Haskell for Mac btw). The library name of a dylib in macOS can include a path component. So, instead of just using the filename of a library by itself as its install name, simply use the filename prefixed by the package directory. So, instead of base-x.y.z-ABCD.dylib use base-x.y.z/base-x.y.z-ABCD.dylib.

Then, only one RPATH is needed, namely the GHC LIBDIR (or wherever the package db is). This keeps the load command size down and solves the problem in #11587.

(Incidentally, this approach in combination with using @loader_path in RPATH specs also makes it possible to relocate package dbs, which is important if you want to ship a macOS app that includes dynamically linked Haskell code.)
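To make the single-RPATH scheme concrete, this Python sketch expands an @rpath-relative install name against a list of RPATH entries in order, which is roughly how the dynamic loader searches. The function name and the example directory are hypothetical, and other tokens such as @loader_path are not handled.

```python
import posixpath


def candidates(install_name, rpaths):
    """Expand an @rpath-relative install name against each rpath in order.
    Names without the @rpath/ prefix are returned unchanged."""
    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        return [install_name]
    rest = install_name[len(prefix):]
    return [posixpath.join(r, rest) for r in rpaths]


if __name__ == "__main__":
    # One rpath (a hypothetical GHC libdir) covers every package whose
    # install name carries its package directory.
    print(candidates("@rpath/base-x.y.z/base-x.y.z-ABCD.dylib",
                     ["/hypothetical/ghc/libdir"]))
```

With the package directory folded into the install name, the candidate list has one entry per library instead of one per RPATH directory.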

comment:23 Changed 14 months ago by chak

Cc: chak added

comment:24 Changed 14 months ago by darchon

Given that I'm the one who wrote GHC's @rpath-relative install_name code, and Cabal's RPATH-handling code, I want to point out one problem that needs some thought before we make all the install_names @rpath/libname-x.y.z/libname-x.y.z.dylib: sometimes, the libraries that we link against aren't in their installed location yet.

And there is one particular case when this is a problem: when a Cabal testsuite executable is dynamically linked against the library under development. In this case, the libname-x.y.z.dylib is still in ./dist/build. Now, because (currently) the install_name is simply @rpath/libname-x.y.z.dylib, we can simply execute the testsuite executable where the DYLD_LIBRARY_PATH environment variable contains ./dist/build.

If the install_name is going to be changed to @rpath/libname-x.y.z/libname-x.y.z.dylib[1], then we have to take care of the above-mentioned problem. For example by changing Cabal to build the library under development in ./dist/build/libname-x.y.z

[1] Another solution is of course to put (or symlink) all the .dylibs in the $lib directory instead of $lib/libname-x.y.z, which I've seen being suggested, but that is something I personally don't like.

comment:25 Changed 14 months ago by darchon

Also, I'll be attending the hackathon on 8 and 9 October (after Haskell Exchange 2016), so if this problem isn't already fixed by that time, I volunteer to work on a patch during the hackathon. I won't have time to work on it on any other date or time.

comment:26 Changed 14 months ago by mnislaih

Cc: mnislaih added

comment:27 Changed 14 months ago by simonpj

Priority: normal → highest

Gershom thinks that shipping GHC 8.0.2 that does not work on OSX's latest release would be worse than delaying 8.0.2.

So I'm making priority = highest. (Suggestion only of course.)

Simon

comment:28 Changed 14 months ago by rwbarton

So I think there are three places in total where RPATHs are chosen.

  • In GHC, when building a temporary shared library to load the current package for ghci or TH.
  • In GHC, when doing the final link of a dynamic executable or library using -dynload system (the default)
  • In Cabal, when doing the final link of a dynamic executable or library (invoking GHC with -dynload deploy and explicit -optl options to set the RPATH)

The first issue is essentially local, since the temporary shared library is needed just once. But the other two issues are not, and changing the way we set up the RPATHs and install names has side effects, for example in the case darchon mentioned above, or for someone who deploys their application with all the Haskell libraries in a single directory. So it's a bit of a sticky situation.

comment:29 Changed 14 months ago by gershomb

I'm not sure how the triage system works these days. Should this be removed from infoneeded to show up in the standard lists at this point of tickets with status new?

comment:30 Changed 14 months ago by bgamari

Status: infoneeded → new

Indeed it should be in new. Thanks for pointing this out, Gershom.

comment:31 Changed 14 months ago by carter

Just to be clear, the GHC workflows that are impacted are those which need the system linker, namely any dynamically linked executables, ghci, or TH.

I agree with Gershom's assessment that 8.0.2 should wait till this is validated as being fixed for 10.12 and at least doesn't trigger any regressions on 10.11 and 10.10.

@ilovezfs will you be able to help us consistently test that version range for correctness and absence of regressions? I could probably help with some guidance.

I guess this is the first time Mac OS X support range has come up since the cpp / clang challenges in recent years. Or the more recent issue of some object code tool from llvm, nm, not defaulting to posix format for certain outputs.

I would propose that we treat at least the 2-3 most recent OS X releases as "core tier 1" of our tier 1 Apple support. At least as an informal guide post?

comment:32 Changed 14 months ago by chak

Cc: chak@… added; chak removed

comment:33 Changed 14 months ago by bgamari

To be clear, while I agree that this is an important issue, 8.0.2 needs to leave the dock at some point soon as 8.2.1 is on the horizon. I'm okay with pushing the 8.0.2 release off by up to two weeks for this, but at that point I will really need to release it regardless of whether this issue is fixed.

I am working with Oleg to get our OS X test box running Sierra, but I can't make any promises about being able to fix this myself. It would be quite helpful if someone could summarize the proposed solution.

Even better, it would be amazing if someone with an affected machine and knowledge of dynamic linking on OS X (a small set, I know!) could take ownership of this issue and carry it to solution. This may be darchon, but I'm a bit worried what might happen if he doesn't have time to finish, October 10th arrives, and we have no solution.

comment:34 Changed 14 months ago by bgamari

carter said,

I would propose that we treat at least the 2-3 most recent OS X releases as "core tier 1" of our tier 1 Apple support. At least as an informal guide post?

I'd say we really can't do that unless we have the ability to readily test these releases. In principle Tier 1 platforms are supposed to have a sponsor and some sort of regular testing. I am working to get Harbormaster building on OS X but we have only one test box and I don't see that changing in the near future.

comment:35 Changed 14 months ago by carter

@darchon could you share any further brain dump / background / design references, either here or by email? I'm willing to carve out some work time this week to see if I can help get the ball rolling, but I'd need at least a brain dump or light guidance. I'm not exactly familiar with linker shenanigans, but I can probably get at least a skeleton of a patch together if you could orient me towards which pieces of GHC/Cabal need tweaking.

I just need a little guidance to make sure it's a good jump start for you.

comment:36 Changed 14 months ago by dleuschner

Cc: dleuschner added

comment:37 Changed 14 months ago by darchon

@carter, braindump as requested:

  • After reading @rwbarton's comment about a person wanting to pack up dynlibs for deployment in a single directory, I think I've changed my mind about installation layout. That is, I think the install_name of dynlibs should stay @rpath/libname-x.y.z.dylib as they are now.
  • This means that, upon installation, dynlibs should be copied to $libdir/libname-x.y.z.dylib, instead of the current $libdir/libname-x.y.z/libname-x.y.z.dylib.
  • The .hi files should still be copied to the $libdir/libname-x.y.z/ directory.
  • Perhaps the static lib should also be copied to $libdir/libname-x.y.z.a, for consistency with the dynamic lib.
  • Due to the use of nub, Cabal (in the case of dynload deploy) will then already make sure we only end up with a couple of RPATHs equal to the $libdirs of the installed packages.
  • I'm not sure, but I think GHC also nubs the RPATHs (in the case of dynload system), this would have to be checked.
  • This plan doesn't involve mucking about with dynamic linking commands.

So, if we go with the "put-all-the-dynlibs-in-single-directory" plan, which is what I now prefer, then we would need to do the following:

  • Add a --libifacedir setting to Cabal, which is, like --libsubdir, implicitly prefixed by $libdir. --libifacedir determines where the .hi files go, and the default should be the same as the current --libsubdir
  • Change the purpose of --libsubdir to mean the location of the .dylib/.so/.a files.
  • For OSX only, change the default --libsubdir to ., i.e. equal to $libdir.
  • Update GHC's Makefile, so that, on OS X, make install puts the .dylib/.so/.a in $libdir.

Things that still need figuring out:

  • Do the above changes mean changes in the package database format? if so, what?
  • Do we need to update stack to use the new --libifacedir as well? or does it simply take over the default Cabal settings?

comment:38 Changed 14 months ago by carter

Let the record show that darchon is ChristianB :)

I'll ask you some more off line so I can better understand

comment:39 Changed 14 months ago by carter

I'm not sure how much headway I'll make between now and the weekend, what pieces of this plan are / are not obvious? One issue that came up in the IRC chat with Duncan is cabal new build style needs treatment too?

What are the low-hanging action items, the easy ones, and the tricky bits?

I have Sierra hardware now, so there's that

At the very least I'll make sure I file a radar this week

comment:40 Changed 14 months ago by bgamari

darchon said,

I'm not sure, but I think GHC also nubs the RPATHs (in the case of dynload system), this would have to be checked.

For the record, I believe we compute the linker command line in SysTools.linkDynLib, which calls collectLibraryPaths to compute the set of needed RPATH directories. collectLibraryPaths indeed calls nub, but it does not sort before doing so, meaning that nub won't always remove all duplicates. This should be fixed.

Cabal should also be checked for this issue.

comment:41 Changed 14 months ago by bgamari

Oops, please excuse the cognitive malfunction above; nub does not require sorted input. It seems GHC is fine with respect to RPATH deduplication.
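For readers unfamiliar with nub, here is a small Python analogue (not GHC's code) showing why no sorting is needed: each element is compared against everything already kept, so duplicates are removed wherever they occur and the first occurrence keeps its position.

```python
def nub(xs):
    """Remove duplicates, keeping the first occurrence of each element,
    in the manner of Haskell's Data.List.nub (quadratic, equality only)."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out


if __name__ == "__main__":
    # Hypothetical, unsorted RPATH directories with non-adjacent duplicates.
    print(nub(["/b", "/a", "/b", "/c", "/a"]))
```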

comment:42 Changed 14 months ago by bgamari

darchon said,

  • For OSX only, change the default --libsubdir to ., i.e. equal to $libdir.
  • Update GHCs Makefile, so that, on OS X, make install puts the .dylib/.so/.a in $libdir.

I'm not sure we want to limit this to OS X. Having many RPATH directories is also a startup performance issue on Linux (and likely other platforms) since the dynamic linker does a linear search through every RPATH directory for every library it needs to load. This was noticed in #11587.

The solution to this is to move all dynamic libraries into $libdir, as you suggest.

comment:43 Changed 14 months ago by darchon

Once https://github.com/haskell/cabal/pull/3955 is merged, and a new point-release of Cabal-v1.24 is made, I think this issue will be solved by including the new point release in GHC-8.0.2. I did a cabal install stack --enable-executable-dynamic on OS X Sierra, and the build succeeded, and the resulting stack executable worked as expected.

comment:44 Changed 14 months ago by bgamari

Status: new → upstream

comment:45 Changed 14 months ago by lelf

Cc: lelf added

comment:46 Changed 14 months ago by tohnann

Cc: tohnann added

comment:47 Changed 14 months ago by fhoffmeyer

Summary: build fail of commercialhaskell.com with stack build on mac os x sierra beta 4 → build fail of commercialhaskell.com with stack build on mac os x sierra

comment:48 Changed 14 months ago by ilovezfs

There's a proposal for a partial fix in https://github.com/haskell/cabal/pull/3982 (and backported to Cabal 1.24 in https://github.com/haskell/cabal/pull/3983).

"Partial" fix in that the problem persists unaddressed for cabal new-build, which will still trigger the "malformed mach-o: load commands size" failure.

I have used the PR to build git-annex in a cabal sandbox successfully (hurray). Attempting to do the same with cabal new-build still fails as "expected."

comment:49 Changed 13 months ago by Edward Z. Yang <ezyang@…>

In f41a8a36/ghc:

Add and use a new dynamic-library-dirs field in the ghc-pkg info

Summary:
Build systems / package managers want to be able to control the file
layout of installed libraries. In general they may want/need to be able
to put the static libraries and dynamic libraries in different places.
The ghc-pkg library registration needs to be able to handle this.

This is already possible in principle by listing both a static lib dir
and a dynamic lib dir in the library-dirs field (indeed some previous
versions of Cabal did this for shared libs on ELF platforms).

The downside of listing both dirs is twofold. There is a lack of
precision, if we're not careful with naming then we could end up
picking up the wrong library. The more immediate problem however is
that if we list both directories then both directories get included
into the ELF and Mach-O shared object runtime search paths. On ELF this
merely slows down loading of shared libs (affecting prog startup time).
On the latest OSX versions this provokes a much more serious problem:
that there is a rather low limit on the total size of the section
containing the runtime search path (and lib names and related) and thus
listing any unnecessary directories wastes the limited space.

So the solution in this patch is fairly straightforward: split the
static and dynamic library search paths in the ghc-pkg db and its use
within ghc. This is a traditional solution: pkg-config has the same
static / dynamic split (though it describes it in terms of private and
public, but it translates into different behaviour for static and
dynamic linking).

Indeed it would make perfect sense to also have a static/dynamic split
for the list of the libraries to use i.e. to have dynamic variants of
the hs-libraries and extra-libraries fields. These are not immediately
required so this patch does not add it, but it is a reasonable
direction to follow.

To handle compatibility, if the new dynamic-library-dirs field is not
specified then its value is taken from the library-dirs field.

Contains Cabal submodule update.

Test Plan:
Run ./validate

Get christiaanb and carter to test it on OSX Sierra, in combination
with Cabal/cabal-install changes to the default file layout for
libraries.

Reviewers: carter, austin, hvr, christiaanb, bgamari

Reviewed By: christiaanb, bgamari

Subscribers: ezyang, Phyx, thomie

Differential Revision: https://phabricator.haskell.org/D2611

GHC Trac Issues: #12479
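The compatibility rule from the commit message above (an unspecified dynamic-library-dirs field takes its value from library-dirs) can be sketched in Python. Representing a package entry as a plain dictionary is an assumption made for this illustration and is not how ghc-pkg stores its data.

```python
def dynamic_library_dirs(pkg):
    """Return the dynamic search path for a package entry, falling back to
    library-dirs when dynamic-library-dirs is not specified."""
    dirs = pkg.get("dynamic-library-dirs")
    return dirs if dirs is not None else pkg.get("library-dirs", [])


if __name__ == "__main__":
    # Hypothetical package entries.
    old_style = {"library-dirs": ["/opt/ghc/lib/text-1.2"]}
    new_style = {"library-dirs": ["/opt/ghc/lib/text-1.2"],
                 "dynamic-library-dirs": ["/opt/ghc/lib"]}
    print(dynamic_library_dirs(old_style))
    print(dynamic_library_dirs(new_style))
```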

comment:50 Changed 13 months ago by bgamari

Resolution: fixed
Status: upstream → closed

comment:51 Changed 13 months ago by Ben Gamari <ben@…>

In 7eae862a/ghc:

ghc-pkg: Munge dynamic library directories

Otherwise we end up looking in the wrong place for dynamic libraries on
Windows. This addresses a regression introduced by D2611. See #12479.

Test Plan: validate across platforms

Reviewers: austin

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2640

GHC Trac Issues: #12479

comment:53 Changed 12 months ago by ilovezfs

Sources for Sierra's dyld have now been posted:

https://opensource.apple.com/source/dyld/dyld-421.1/

https://opensource.apple.com/tarballs/dyld/dyld-421.1.tar.gz

https://opensource.apple.com/source/dyld/dyld-421.1/src/ImageLoader.h.auto.html

#define MAX_MACH_O_HEADER_AND_LOAD_COMMANDS_SIZE (32*1024)

https://opensource.apple.com/source/dyld/dyld-421.1/src/ImageLoaderMachO.cpp.auto.html

if ( sizeofcmds > (MAX_MACH_O_HEADER_AND_LOAD_COMMANDS_SIZE-sizeof(macho_header)) )
		dyld::throwf("malformed mach-o: load commands size (%u) > %u", sizeofcmds, MAX_MACH_O_HEADER_AND_LOAD_COMMANDS_SIZE);

Note the changes from El Capitan https://opensource.apple.com/source/dyld/dyld-360.22/src/ImageLoaderMachO.cpp.auto.html

Other Sierra sources: https://opensource.apple.com/release/os-x-1012.html
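The check quoted above can be reproduced outside dyld. The following Python sketch reads the sizeofcmds field from a Mach-O header and applies the same comparison. It assumes a thin, little-endian, 64-bit image (the mach_header_64 layout of eight 32-bit fields); fat binaries and other variants are not handled, and the function names are hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
# mach_header_64: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds,
# flags, reserved
HEADER_FMT = "<IiiIIIII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 32 bytes
MAX_HEADER_AND_CMDS = 32 * 1024             # value of the dyld #define


def sizeofcmds(data):
    """Return the sizeofcmds field of a thin little-endian 64-bit image."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short for a mach_header_64")
    fields = struct.unpack_from(HEADER_FMT, data)
    if fields[0] != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O file")
    return fields[5]


def exceeds_sierra_limit(data):
    """Apply the comparison quoted from ImageLoaderMachO.cpp."""
    return sizeofcmds(data) > MAX_HEADER_AND_CMDS - HEADER_SIZE


if __name__ == "__main__":
    # Synthetic header carrying the size from the original report (34176).
    header = struct.pack(HEADER_FMT, MH_MAGIC_64, 0x01000007, 3, 6, 0, 34176, 0, 0)
    print(sizeofcmds(header), exceeds_sierra_limit(header))
```

To inspect a real library, the bytes of a dylib kept with -keep-tmp-files could be read from disk and passed to exceeds_sierra_limit.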

comment:54 Changed 5 weeks ago by dredozubov

I'm experiencing the same problem with OS X Sierra:

[ 53 of 176] Compiling Auth.DB.Model.User ( src/Auth/DB/Model/User.hs, .stack-work/dist/x86_64-osx/Cabal-2.0.0.2/build/Auth/DB/Model/User.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 8.2.1 for x86_64-apple-darwin):
	Loading temp shared object failed: dlopen(/var/folders/f8/2_rc4tgd1gj9vbgv7q9gbk4c0000gn/T/ghc45626_0/libghc_261.dylib, 5): no suitable image found.  Did find:
	/var/folders/f8/2_rc4tgd1gj9vbgv7q9gbk4c0000gn/T/ghc45626_0/libghc_261.dylib: malformed mach-o: load commands size (32936) > 32768

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug


Not sure if that's a GHC or stack issue. I can reproduce it on only a few big projects with 8.2.1, while a larger number of projects fail with 7.10.3.

comment:55 Changed 5 weeks ago by bgamari

dredozubov, since this ticket is getting a bit long could you open another one?
