Linker does not correctly resolve symbols in previously loaded objects
Summary: Given two object files (created from C files) A and B, where B refers to symbols defined in A, if we load B before A, calling resolveObjs
after each object file, symbol resolution goes wrong. Full test case attached.
Detailed description: Consider
// a.c
#include <stdio.h>
void defined_in_A() {
printf("In A\n");
}
// b.c
#include <stdio.h>
void defined_in_A();
void defined_in_B() {
printf("In B\n");
defined_in_A();
printf("In B\n");
}
and
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import System.IO
foreign import ccall "defined_in_B" defined_in_B :: IO ()
main :: IO ()
main = defined_in_B
and we use the GHC API to load the object files a.o
and b.o
(corresponding to a.c
and b.c
), calling resolveObjs
after loading each object, and then load Main
and call Main.main
, everything works fine (the attached test case contains both a.c
, b.c
and Main.hs
as well as the test program that calls into the GHC API).
However, if we load b.o
before a.o
, things go wrong. When we load b.o
and call resolveObjs
then resolveObjs
(quite reasonably) complains that
lookupSymbol failed in relocateSection (relocate external)
b.o: unknown symbol `_defined_in_A'
since we haven't loaded a.o
yet. But when we then load a.o
and call resolveObjs
again, resolveObjs
reports okay, but something goes wrong because the program subsequently segfaults.
A successful run
I took a closer look at what happens exactly. First, let's consider the test run where we load A before B, and everything works fine, and let's look at the precise code that we actually execute when Main.main
does the foreign call to defined_in_B
:
# lldb Linkerbug
Current executable set to 'Linkerbug' (x86_64).
(lldb) breakpoint set -n ffi_call
Breakpoint 1: where = Linkerbug`ffi_call + 29 at ffi64.c:421, address = 0x0000000102be2eed
(lldb) run
Process 92361 launched: 'Linkerbug' (x86_64)
Loading object "a.o"
symbol resolution ok
Loading object "b.o"
symbol resolution ok
Loading Haskell module "Main.hs"
ok
Running "Main.main"
Process 92361 stopped
* thread #1: tid = 0x32048b, 0x0000000102be2eed Linkerbug`ffi_call(cif=0x000000010cc68500, fn=0x0000000104c6c210, rvalue=0x00007fff5fbff920, avalue=0x00007fff5fbff8c0) + 29 at ffi64.c:421, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000102be2eed Linkerbug`ffi_call(cif=0x000000010cc68500, fn=0x0000000104c6c210, rvalue=0x00007fff5fbff920, avalue=0x00007fff5fbff8c0) + 29 at ffi64.c:421
(lldb) disassemble -c 14 -s fn
0x104c6c210: pushq %rbp
0x104c6c211: movq %rsp, %rbp
0x104c6c214: pushq %rbx
0x104c6c215: pushq %rax
0x104c6c216: leaq 0x1d(%rip), %rbx
0x104c6c21d: movq %rbx, %rdi
0x104c6c220: callq 0x104c6c3c8
0x104c6c225: xorl %eax, %eax
0x104c6c227: callq 0x104c6b210
0x104c6c22c: movq %rbx, %rdi
0x104c6c22f: addq $0x8, %rsp
0x104c6c233: popq %rbx
0x104c6c234: popq %rbp
0x104c6c235: jmpq 0x104c6c3c8
This disassembly is the compiled code for B which, in order, does a call to printf
, then to defined_in_A
, and then back to printf
(the last call is jmpq
rather than callq
: the C compiler did a tail call optimization). Let's make sure that this is actually true; the first call is a call to
(lldb) disassemble -c 2 -s 0x104c6c3c8
0x104c6c3c8: jmpq *-0xe(%rip)
0x104c6c3ce: addb %al, (%rax)
which is an indirect jump to
(lldb) memory read -f A -c 1 "0x104c6c3ce - 0xe"
0x104c6c3c0: 0x00007fff84fd5b9b libsystem_c.dylib`puts
to puts
, okay, good. Seems an unnecessary level of indirection, but that's okay. The next call is to
(lldb) disassemble -c 5 -s 0x104c6b210
0x104c6b210: pushq %rbp
0x104c6b211: movq %rsp, %rbp
0x104c6b214: leaq 0x6(%rip), %rdi
0x104c6b21b: popq %rbp
0x104c6b21c: jmpq 0x104c6b370
which is the compiled code for defined_in_A
, which should be just a call to printf
. Let's confirm:
(lldb) disassemble -c 2 -s 0x104c6b370
0x104c6b370: jmpq *-0xe(%rip)
0x104c6b376: addb %al, (%rax)
(lldb) memory read -f A -c 1 "0x104c6b376 - 0xe"
0x104c6b368: 0x00007fff84fd5b9b libsystem_c.dylib`puts
Ok, all good.
A failed run
Now let's check what happens when we load the objects in the opposite order. The code that we execute is
(lldb) disassemble -c 14 -s fn
0x104c6b210: pushq %rbp
0x104c6b211: movq %rsp, %rbp
0x104c6b214: pushq %rbx
0x104c6b215: pushq %rax
0x104c6b216: leaq 0x1d(%rip), %rbx
0x104c6b21d: movq %rbx, %rdi
0x104c6b220: callq 0x104c6b3c8
0x104c6b225: xorl %eax, %eax
0x104c6b227: callq 0x104c6c210
0x104c6b22c: movq %rbx, %rdi
0x104c6b22f: addq $0x8, %rsp
0x104c6b233: popq %rbx
0x104c6b234: popq %rbp
0x104c6b235: jmpq 0x104c6b556
This immediately looks suspicious: the address of the jmpq
(the second -- tail -- call to printf
) is different from the first. The first callq
is okay:
(lldb) disassemble -c 2 -s 0x104c6b3c8
0x104c6b3c8: jmpq *-0xe(%rip)
0x104c6b3ce: addb %al, (%rax)
(lldb) memory read -f A -c 1 "0x104c6b3ce - 0xe"
0x104c6b3c0: 0x00007fff84fd5b9b libsystem_c.dylib`puts
and the call to defined_in_A
, as well as the code for defined_in_A
, are both ok:
(lldb) disassemble -c 5 -s 0x104c6c210
0x104c6c210: pushq %rbp
0x104c6c211: movq %rsp, %rbp
0x104c6c214: leaq 0x6(%rip), %rdi
0x104c6c21b: popq %rbp
0x104c6c21c: jmpq 0x104c6c370
(lldb) disassemble -c 2 -s 0x104c6c370
0x104c6c370: jmpq *-0xe(%rip)
0x104c6c376: addb %al, (%rax)
(lldb) memory read -f A -c 1 "0x104c6c376 - 0xe"
0x104c6c368: 0x00007fff84fd5b9b libsystem_c.dylib`puts
However, that second call to printf
(the jmpq
) in defined_in_B
is indeed wrong: it's jumping into nowhere:
(lldb) memory read -c 8 0x104c6b556
0x104c6b556: 00 00 00 00 00 00 00 00 ........
Note that somewhat surprisingly is it not the resolution of the symbol from a.o
that is wrong; that does get resolved okay once we load a.o
; rather, it is the jump to puts
which is wrong (and then only the one of them).
Trac metadata
Trac field | Value |
---|---|
Version | 7.8.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | simonmar |
Operating system | Unknown/Multiple |
Architecture |