Opened 9 years ago

Closed 3 years ago

#2615 closed bug (fixed)

ghci doesn't play nice with linker scripts

Reported by: AlecBerryman
Owned by:
Priority: high
Milestone: 6.12.3
Component: GHCi
Version: 7.0.3
Keywords: dlopen, dynamic linking
Cc: maeder, fasta, slyfox, ghc@…, hvr
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

I'm trying to use HsHyperEstraier with ghci. I can compile and run the included examples, but when I run them in ghci, I see:

$ ghci
GHCi, version 6.8.3: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude> :l HelloWorld.hs
[1 of 1] Compiling Main             ( HelloWorld.hs, interpreted )
Ok, modules loaded: Main.
*Main> main
[...]
Loading package HsHyperEstraier-0.2.1 ... can't load .so/.DLL for: c
(/usr/lib/libc.so: invalid ELF header)

I see a similar error message if I specify '-package HsHyperEstraier' on the command line.

I did some looking and came up with these messages:

http://www.haskell.org/pipermail/glasgow-haskell-users/2004-May/006632.html
http://www.nabble.com/RE:-idea-to-allow-ghci-to-use-a-different-libs-list-p1830432.html

Debian's /usr/lib/libc.so is indeed a GNU linker script, not an actual shared library. If I remove all the libraries in HsHyperEstraier's ~/.ghc/.../package.conf that are linker scripts (pthreads and c), it loads up fine.

Could ghci either recognize or ignore linker scripts?
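
One way a loader could recognize a linker script before handing the file to dlopen is to check for the ELF magic bytes (0x7f 'E' 'L' 'F') at the start of the file. The following is only a sketch of that idea, not a description of what GHC does; the function name starts_with_elf_magic is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file begins with the ELF magic
   bytes, 0 if it does not (e.g. a text linker script), and -1 if the
   file cannot be opened. */
int starts_with_elf_magic(const char *path)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    unsigned char buf[4];
    size_t n;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, f);   /* short files yield n < 4 */
    fclose(f);
    return n == sizeof buf && memcmp(buf, magic, sizeof magic) == 0;
}
```

A caller could use a 0 result as a hint that the .so file is a script and should be inspected as text rather than loaded directly.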

Attachments (4)

T2615a.dsend (53.1 KB) - added by hgolden 8 years ago.
FIX #2615 - ghc repository
T2615b.dsend (42.9 KB) - added by hgolden 8 years ago.
FIX #2615 - testsuite repository
libncursesw.so (32 bytes) - added by greenrd 6 years ago.
libc.so (253 bytes) - added by Jinhui_Chen 3 years ago.


Change History (48)

comment:1 Changed 9 years ago by igloo

difficulty: Unknown
Milestone: 6.10.2

This is a long-standing bug, but I can't find a ticket for it. Anyway, we should fix it.

comment:2 Changed 9 years ago by maeder

Architecture: x86_64 (amd64) → Unknown/Multiple
Cc: maeder added

Here is another example, using ghc-6.8.3:

Loading package cairo-0.9.13 ... can't load .so/.DLL for: pthread (/usr/lib/libpthread.so: invalid ELF header)

/usr/lib/libpthread.so contains:

/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf32-i386)
GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a )

comment:3 Changed 9 years ago by maeder

Version: 6.8.3 → 6.10.1

This bug stops me from using Template Haskell (which uses ghci) and gtk in the sources for a cabal package.

comment:4 Changed 9 years ago by simonmar

Priority: normal → high

Seems important to do something about this, but I'm not sure exactly what.

comment:5 Changed 9 years ago by maeder

Removing "pthread" (and "m") from the extraLibraries of the gtk package in my package.conf solved the problem. (I've left in "-pthread" under ldOptions.)

comment:6 Changed 8 years ago by igloo

Owner: set to igloo

comment:7 Changed 8 years ago by simonmar

As Duncan says, this won't be a problem when we're using shared libraries:

It means that ghci will not need to link to system shared libs except when someone uses -lblah on the ghci command line. That's because when we link a Haskell package as a shared lib, the system linker interprets any linker scripts and embeds the list of dependencies on other shared libs (other Haskell packages and system libs). Then ghci just dlopens the shared libs for the directly used Haskell packages, and that automatically resolves all their deps on other Haskell and system shared libs.

comment:8 Changed 8 years ago by igloo

Priority: high → normal

The problem is illustrated by this C program:

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *p;

    p = dlopen("/usr/lib/libgmp.so", RTLD_LAZY | RTLD_GLOBAL);
    if (p) printf("OK\n");
    else   printf("%s\n", dlerror());
    p = dlopen("/usr/lib/libpthread.so", RTLD_LAZY | RTLD_GLOBAL);
    if (p) printf("OK\n");
    else   printf("%s\n", dlerror());

    return 0;
}

which fails to dlopen /usr/lib/libpthread.so because it's a linker script:

$ gcc -ldl c.c -o c
$ ./c
OK
/usr/lib/libpthread.so: invalid ELF header
$ cat /usr/lib/libpthread.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a )

This most commonly crops up with -lpthread and -lc, and in both these cases you can work around it by just not passing the flag.

I've done some digging, but haven't been able to find a replacement for dlopen that can handle linker scripts. There are two things we could do:

  • Special case -lpthread and -lc. This wouldn't solve the problem in general, but would fix the most common instances of it.
  • Make an empty library linked with -lpthread (or whatever other -l flags we're given) and dlopen that library. Then the system linker takes care of it for us. This is ugly, but if we have code to generate .so libraries anyway (for making dynamic Haskell libraries) then at least it's not too much work to implement.

comment:9 Changed 8 years ago by igloo

Owner: igloo deleted

comment:10 Changed 8 years ago by igloo

Milestone: 6.10.2 → 6.12.1

comment:11 Changed 8 years ago by fasta

Cc: fasta added

comment:12 Changed 8 years ago by hgolden

At least on Gentoo, I think this can be dealt with as follows:

  1. In Linker.c if dlopen fails, search the file with a regular expression that would recognize "GROUP ( ... )" where ... is the important part. In Gentoo, when a .so file contains a linker script, the actual file is specified by the GROUP ( ... ).
  2. If this is found, try the dlopen again using the filename.
  3. If this fails, report an error.

I'm not familiar with Debian or Debian-based distros. Do they use a similar approach? If so, a regular expression search for their filename in the script could be added as well.
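
As a rough illustration of step 1, the sketch below uses POSIX regex.h to pull the first file name out of a GROUP ( ... ) command. It is not the code from the attached patch; the function name first_group_member and the exact pattern are assumptions made for this example.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first file name found inside
   GROUP ( ... ) into out. Returns 1 on success, 0 if the script has no
   GROUP command or out is too small. */
int first_group_member(const char *script, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    assert(script != NULL && out != NULL);
    /* "GROUP", optional space, "(", optional space, then one token. */
    if (regcomp(&re, "GROUP[[:space:]]*\\([[:space:]]*([^[:space:])]+)",
                REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, script, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, script + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

For the libpthread.so script quoted in comment:2 this would yield /lib/libpthread.so.0, which is the name the second dlopen attempt in step 2 would use.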

comment:13 Changed 8 years ago by slyfox

Cc: slyfox added
Type of failure: None/Unknown

comment:14 Changed 8 years ago by igloo

Milestone: 6.12.1 → 6.14.1

comment:15 Changed 8 years ago by hgolden

Keywords: dlopen dynamic linking added
Owner: set to hgolden
Type of failure: None/Unknown → Incorrect result at runtime

I have been testing a patch which has been reviewed by Simon M. and Duncan C. I am now incorporating the changes they requested and preparing a test case. I expect to have this completed by December 14, 2009.

comment:16 Changed 8 years ago by guest

Cc: ghc@… added

I want to use the llvm package in GHCi. To this end I converted all of the libLLVM*.a files to local libLLVM*.so. When I run the main function of a Haskell program that uses LLVM functions, I get the familiar error:

  Loading package llvm-0.6.7.0 ... can't load .so/.DLL for: pthread (/usr/lib/libpthread.so: invalid ELF header)

My libpthread.so is also a script like that shown by Christian Maeder. However, pthread is not mentioned in llvm wrapper source files. It only appears in the files generated by configuration.

  $ grep -r pthread .
  Match in binary file ./dist/setup/setup.
  ./config.status:S["llvm_ldflags"]="-L/usr/lib/llvm  -lpthread -ldl -lm "
  ./config.status:S["LDFLAGS"]="-L/usr/lib/llvm  -lpthread -ldl -lm  "
  ./config.log:configure:3698: gcc -o conftest -g -O2 -I/usr/include  -D_DEBUG  -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS  -L/usr/lib/llvm  -lpthread -ldl -lm   conftest.c  >&5
  ./config.log:configure:4010: g++ -o conftest -g -O2 -I/usr/include  -D_DEBUG  -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS  -L/usr/lib/llvm  -lpthread -ldl -lm   conftest.c -lLLVMCore  -lLLVMSupport -lLLVMSystem  >&5
  ./config.log:LDFLAGS='-L/usr/lib/llvm  -lpthread -ldl -lm  '
  ./config.log:llvm_ldflags='-L/usr/lib/llvm  -lpthread -ldl -lm '
  ./llvm.buildinfo:ld-options: -L/usr/lib/llvm  -lpthread -ldl -lm  /usr/lib/llvm/LLVMX86AsmPrinter.o /usr/lib/llvm/LLVMX86CodeGen.o -lLLVMSelectionDAG -lLLVMAsmPrinter /usr/lib/llvm/LLVMExecutionEngine.o /usr/lib/llvm/LLVMJIT.o -lLLVMCodeGen -lLLVMScalarOpts -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMTarget -lLLVMCore -lLLVMSupport -lLLVMSystem -lstdc++

Where do I have to remove pthread?

comment:17 in reply to:  16 Changed 8 years ago by hgolden

Replying to guest:

Where do I have to remove pthread?

I think you can compile your LLVM modules using the -normal way instead of the -threaded way.

My patch fixes this problem. I'm still working on a test, but the patch works. I could send it to you immediately if you are willing to rebuild your ghc. (Try the above suggestion first!)

Changed 8 years ago by hgolden

Attachment: T2615a.dsend added

FIX #2615 - ghc repository

Changed 8 years ago by hgolden

Attachment: T2615b.dsend added

FIX #2615 - testsuite repository

comment:18 Changed 8 years ago by hgolden

Owner: changed from hgolden to igloo

My patches above pass validation. Please let me know if I need to do anything else.

comment:19 in reply to:  16 Changed 8 years ago by guest

Replying to guest:

Where do I have to remove pthread?

In the special case of LLVM I could just remove occurrences of pthread from ~/.ghc/i386-linux-6.10.4/package.conf for the llvm package in order to solve that problem.

comment:20 Changed 8 years ago by simonmar

Looks good to me. It needs validating on OS X though: I think the #ifdefs at the top may need to be tweaked, as I don't think the #include <regex.h> is enabled under OBJFORMAT_MACHO.

Ian, could you validate & push?

comment:21 Changed 8 years ago by igloo

Resolution: → fixed
Status: new → closed

Done.

comment:22 Changed 7 years ago by igloo

Milestone: 6.14.1 → 6.12.3
Resolution: fixed →
Status: closed → reopened
Type: bug → merge

comment:23 Changed 7 years ago by igloo

Priority: normal → high

comment:24 Changed 7 years ago by igloo

Status: new → merge

comment:25 Changed 7 years ago by igloo

Type: merge → bug

comment:26 Changed 7 years ago by igloo

Resolution: → fixed
Status: merge → closed

Didn't get merged

comment:27 Changed 6 years ago by greenrd

Owner: igloo deleted
Resolution: fixed →
Status: closed → new

This still doesn't work for me with ghc 7.0.2 on Fedora 15.

The file /usr/lib/libncursesw.so contained the text INPUT(libncursesw.so.5 -ltinfo)

ghc gave the error "file too short". I know this bug has recently been fixed, so I tried to make the linker produce the other error message, "invalid ELF header", by adding lots of newlines onto the end of the file. It does change the error message:

Loading package terminfo-0.3.1.3 ... <command line>: can't load .so/.DLL for: ncursesw (/usr/lib/libncursesw.so: invalid ELF header)

but ghc doesn't pick up on the error message and do the right thing.

I think this error arises from an attempt to use Template Haskell, rather than ghci - could this be relevant?

Or did my newlines mess it up somehow?

comment:28 Changed 6 years ago by hgolden

Status: new → infoneeded

I'll take a look at this. Please attach the complete /usr/lib/libncursesw.so file to this ticket. At first glance, Fedora 15 may be using a different linker script pattern from other systems (e.g., Gentoo).

Changed 6 years ago by greenrd

Attachment: libncursesw.so added

comment:29 Changed 6 years ago by greenrd

Status: infoneeded → new

comment:30 Changed 6 years ago by hgolden

Owner: set to hgolden
Version: 6.10.1 → 7.0.3

My original patch was too simplistic. It only handled the GROUP( ... ) command, not the INPUT( ... ) command. Apparently, the Fedora 15 scripts use INPUT( ... ) for redirection. I will add this to the code.
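
A pattern that accepts either command could look like the sketch below. This is an illustration only and not the regular expression that ended up in Linker.c; the function name first_script_member is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first name found inside either
   GROUP ( ... ) or INPUT ( ... ) into out. Returns 1 on success,
   0 if neither command is present or out is too small. */
int first_script_member(const char *script, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[3];
    int found = 0;

    assert(script != NULL && out != NULL);
    /* Group 1 is the keyword, group 2 is the first token after "(". */
    if (regcomp(&re,
                "(GROUP|INPUT)[[:space:]]*\\([[:space:]]*([^[:space:])]+)",
                REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, script, 3, m, 0) == 0) {
        size_t len = (size_t)(m[2].rm_eo - m[2].rm_so);
        if (len < outlen) {
            memcpy(out, script + m[2].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

For the Fedora script reported above, INPUT(libncursesw.so.5 -ltinfo), this returns libncursesw.so.5. That is a bare name without a slash, so dlopen would look it up through its usual library search path rather than as a file path.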

comment:31 Changed 6 years ago by greenrd

It looks like such a change has already been made in git head - so I guess this is fixed in head.

comment:32 in reply to:  31 Changed 6 years ago by hgolden

Replying to greenrd:

It looks like such a change has already been made in git head - so I guess this is fixed in head.

I didn't see this when I looked. Could you send me a link?

comment:34 in reply to:  33 Changed 6 years ago by hgolden

Resolution: → fixed
Status: new → closed

Looks good to me. I'm closing this based on igloo's patch linked above.

comment:35 Changed 6 years ago by SimonHengel

Fixed in 7.2.

comment:36 Changed 6 years ago by alexp

I'm having the same problem as greenrd with /usr/lib/libncursesw.so.

So I thought I would use the current ghc 7.2.2. But the ghc homepage advises against building ghc manually and recommends using haskell-platform instead. If I were to do that, I would be waiting eleven months, as I'm on Fedora 16, and the next Fedora release in May 2012 will also ship the current haskell-platform 2011.4.0.0 with ghc-7.0.4. So all hopes are on the release after that, which ships in November 2012 and might include ghc>7.2. Eleven months.

I'm thinking my best option is to patch the existing 7.0.4 ghc. I can also send this to the Fedora maintainer so it might go in as an update.

I see a whole lot of patches above, but I can't follow the code to put together a single patch for 7.0.4.

comment:37 in reply to:  36 Changed 6 years ago by hgolden

Replying to alexp:

I see a whole lot of patches above, but I can't follow the code to put together a single patch for 7.0.4.

There's very little change between Linker.c in 7.0.4 and 7.2.2. I think the only thing you need to change is the regular expression. I suggest you do a diff between the 7.0.4 version and the 7.2.2 version and change the regular expression in 7.0.4 to match.

comment:38 Changed 6 years ago by SimonHengel

Any chance to get this into the next 7.0 minor release (if any)?

comment:39 in reply to:  38 Changed 6 years ago by simonmar

Replying to SimonHengel:

Any chance to get this into the next 7.0 minor release (if any)?

There won't be another 7.0 release, I'm afraid.

comment:40 Changed 3 years ago by Jinhui_Chen

Cc: hvr added
Owner: hgolden deleted
Resolution: fixed →
Status: closed → new

I recently encountered a problem which I think is related to this bug. It is illustrated by this Haskell program:

import ObjLink
import Foreign
import Foreign.C.Types
import Foreign.C.String

foreign import ccall "setlocale" c_setlocale :: CInt -> CString -> IO CString

main = do
  withCString "zh_CN.UTF-8" $ \lc -> c_setlocale 5 lc
  r <- loadDLL "/usr/lib/libc.so"
  putStrLn (show r)

which outputs:

Just "/usr/lib/libc.so: \26080\25928\30340 ELF \22836"

The "\26080\25928\30340 ELF \22836" part is "无效的ELF头" in Chinese.

I suspect the problem is caused by the addDLL function, which expects dlopen to return "invalid ELF header"; that is not true in locales other than C or English ones.

comment:41 Changed 3 years ago by hgolden

I suspect that the problem is caused by the error message being in a different language. I wonder if it is possible to add code to the patch to change the locale momentarily while in addDLL and revert it before returning? If this can be done, then the error message would be in the language we are expecting and the result should be to look for the true shared library by scanning the linker script. Of course, this assumes that the linker script is still in English. (I don't know if this is the case.)

If the linker script is in a different language, then the problem becomes much harder to solve without identifying the language and selecting the appropriate strings to use in the regular expressions.

Jinhui, please attach the text of /usr/lib/libc.so on your test system so we can see what language it is in. Thanks.

Changed 3 years ago by Jinhui_Chen

Attachment: libc.so added

comment:42 Changed 3 years ago by Jinhui_Chen

Yes, the libc.so is in English. And you are right, I have tried to change the locale momentarily, and it works.

   olc <- withCString "C" $ \lc -> c_setlocale 5 lc
   r <- loadDLL "/usr/lib/libc.so"
   putStrLn (show r)
   c_setlocale 5 olc

5 is for LC_MESSAGES.
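
On the C side, the same idea might be sketched as a save/restore pair around the dlopen call. This is an assumption-laden illustration, not code from GHC: both function names are invented here. Note that setlocale returns the name of the locale it has just set when given a non-NULL locale argument, so the previous value has to be queried first with a NULL argument and copied.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: switches LC_MESSAGES to "C" and returns a heap
   copy of the previous setting, or NULL on failure. */
char *force_c_messages_locale(void)
{
    const char *current = setlocale(LC_MESSAGES, NULL);  /* query only */
    char *saved;

    if (current == NULL)
        return NULL;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, current);  /* copy: later setlocale calls may overwrite it */
    if (setlocale(LC_MESSAGES, "C") == NULL) {
        free(saved);
        return NULL;
    }
    return saved;
}

/* Hypothetical helper: restores the setting saved above and frees it. */
void restore_messages_locale(char *saved)
{
    assert(saved != NULL);
    setlocale(LC_MESSAGES, saved);
    free(saved);
}
```

A caller would wrap the dlopen and dlerror calls between the two functions. Since setlocale changes process-wide state, this approach would need care in a multithreaded program.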

comment:43 Changed 3 years ago by hgolden

I have opened a new ticket:10046, so I am closing this ticket again.

comment:44 Changed 3 years ago by hgolden

Resolution: → fixed
Status: new → closed