Opened 3 years ago

Closed 2 years ago

#9907 closed bug (fixed)

"Unknown PEi386 section name `.text$printf'" error in GHCi on Windows

Reported by: mmikolajczyk
Owned by: Phyx-
Priority: normal
Milestone: 7.10.3
Component: GHCi
Version: 7.8.3
Keywords:
Cc: hvr, Phyx-, igloo
Operating System: Windows
Architecture: x86
Type of failure: GHCi crash
Test Case:
Blocked By:
Blocking:
Related Tickets: #7103, #10051, #7056, #8546
Differential Rev(s): Phab:D1244
Wiki Page:

Description

I work on a Haskell library that interfaces with a foreign C++ library. While trying to use it in GHCi on 64-bit Windows 8.1, I encountered the following error message:

<loading other libraries>
Loading package library-0.1.0.0 ... <interactive>: Unknown PEi386 section name `.text$_ZNSt6vectorIcSaIcEED1Ev' (while processing: c:\path\to\file.o)
ghc.exe: panic! (the 'impossible' happened)
(GHC version 7.8.3 for i386-unknown-mingw32):
loadObj "C:\\path\\to\\file.o": failed
 
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug 

I've prepared a minimal example triggering this bug and attached it to this bug report. On Linux (Arch 64bit), after cabal build and cabal repl it behaves as intended:

GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading object (static) dist/build/cbits/ex.o ... done
final link ... done
[1 of 1] Compiling Example          ( Example.hs, interpreted )
Ok, modules loaded: Example.
λ: foo
Test

However, on Windows 8.1 it crashes while loading the object file:

GHCi, version 7.8.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading object (static) dist\build\cbits\ex.o ... ghc.exe: Unknown PEi386 section name `.text$printf' (while processing: dist\build\cbits\ex.o)
ghc.exe: panic! (the 'impossible' happened)
(GHC version 7.8.3 for i386-unknown-mingw32):
loadObj "dist\\build\\cbits\\ex.o": failed
 
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug 

Because GHCi crashes, I cannot use it with programs that contain TH splices, which is important to me.

Attachments (1)

ghcilinkerbug.zip (1.8 KB) - added by mmikolajczyk 3 years ago.
Minimal cabal project triggering this bug

Change History (21)

Changed 3 years ago by mmikolajczyk

Attachment: ghcilinkerbug.zip added

Minimal cabal project triggering this bug

comment:1 Changed 3 years ago by kdmadej

I've encountered a similar error while working on Windows myself. I checked with the attached example and it also crashes. Is there a chance this will get looked into?

comment:2 Changed 3 years ago by rwbarton

comment:3 in reply to:  2 Changed 3 years ago by danilo2

Replying to rwbarton: Hello rwbarton! I'm working with both kdmadej and mmikolajczyk, and this bug is something of a blocker for us. I would love to ask you if we can discuss possible solutions / workarounds to make our use case work? We would like to collaborate with you as closely as we can, and we can try to investigate it further if you provide any hints for us - anything.

As a side note: we are using GHCi (or, more strictly, the GHC API) under the hood, and GHCi is the interpreter our product is built on. Because of that, and because our release deadline is very close, we are worried about this issue. I would be very thankful for any help! :)

comment:4 Changed 3 years ago by rwbarton

I would love to ask you if we can discuss possible solutions / workarounds to make our use case work?

Use Linux? :)

Sorry, I have no Windows experience and no access to a Windows machine.

comment:5 in reply to:  4 Changed 3 years ago by danilo2

Replying to rwbarton:

Use Linux? :) Sorry, I have no Windows experience and no access to a Windows machine.

I hope that was not meant to be offensive (although I feel it was). We cannot convert everyone our software is aimed at to Linux, can we? Additionally, I thought GHC is meant to be a serious cross-platform compiler, so I think solving this bug is somewhat important for everybody.

We've got some Windows experience and people who can help you (though they are not Haskellers). Additionally, I can provide you at any time with a remote machine on Amazon with everything configured, so you could connect to it and check things out. We will also help you as much as we can. What do you think?

comment:6 Changed 3 years ago by thoughtpolice

(Adding related tickets.)

Basically, the check for these debugging symbol sections is a real hack. See https://github.com/ghc/ghc/blob/master/rts/Linker.c#L4388-L4407 for the relevant code. GHC tries to ignore sections containing debugging information in its own linker code, but this has proven pretty painful for us in the long run, because MinGW/binutils changes mean we frequently hit sections we didn't know about before, so things like this fail (even though those sections are almost certainly harmless).

I suspect the best thing to do honestly is remove this code, or at least rework it. It is probably better to add a message which is printed out when linked with -debug (and using a debugging runtime flag) about what unknown sections we found, instead of always erroring out when an unknown section is found like we do today.

This fix would be pretty simple and also fix the root issue of most of the related tickets (since they're basically all dupes of different colors). If someone would submit a (tested!) patch, that would be excellent!
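
The difference between the current behaviour and the proposed one can be sketched as follows. This is only an illustration in Python, not the C code in rts/Linker.c: the `KNOWN_SECTIONS` list, the `select_sections` function and its `strict`/`debug` parameters are hypothetical names chosen for the example.

```python
# Illustrative sketch only; the real check lives in C in rts/Linker.c.
# KNOWN_SECTIONS is a hypothetical, deliberately short list.
KNOWN_SECTIONS = {".text", ".data", ".rdata", ".bss"}


class UnknownSectionError(Exception):
    """Raised by the strict policy when a section name is not recognised."""


def select_sections(names, strict, debug=False, log=print):
    """Split section names into (kept, skipped).

    strict=True  models the behaviour described in this ticket:
                 any unrecognised name aborts loading.
    strict=False models the proposal: skip unrecognised sections and
                 only report them when debugging is enabled.
    """
    kept, skipped = [], []
    for name in names:
        if name in KNOWN_SECTIONS:
            kept.append(name)
        elif strict:
            raise UnknownSectionError(f"Unknown PEi386 section name `{name}'")
        else:
            skipped.append(name)
            if debug:
                log(f"ignoring unknown section {name}")
    return kept, skipped
```

With `strict=True`, a name such as `.text$printf` raises an error, mirroring the failure reported above; with `strict=False` it ends up in the skipped list and is only mentioned when `debug` is set.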

comment:7 Changed 3 years ago by Phyx-

Architecture: x86_64 (amd64) → x86
Owner: set to Phyx-

comment:8 Changed 3 years ago by Phyx-

Differential Rev(s): D671
Status: new → patch

comment:9 Changed 3 years ago by Austin Seipp <austin@…>

In a293925d810229fbea77d95f2b3068e78f8380cc/ghc:

rts/linker: ignore unknown PE sections

Summary: Currently the linker tries to see if it understands/knows every section in the PE file before it continues. If it encounters a section it doesn't know about it errors out. Every time there's a change in MinGW compiler that adds a new section to the PE file this will break the ghc linker. The new sections don't need to be understood by `ghc` to continue so instead of erroring out the section is just ignored. When running with `-debug` the sections that are ignored will be printed.

Test Plan:
See the file `ghcilinkerbug.zip` in #9907.

 1) unzip file content.
 2) open examplecpp.cabal and change base <4.8 to <4.9.
 3) execute cabal file with cabal repl.

Applying the patch makes `cabal repl` in step 3) work.

Note that the file will fail on a `___mingw_vprintf` not being found. This is because of the `cc-options` specifying `-std=c++0x`, which will also require `libmingwex.a` to be linked in but wasn't specified in the cabal file. To fix this, remove the `cc-options` which defaults to c99.

Reviewers: austin

Reviewed By: austin

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D671

GHC Trac Issues: #9907, #7103, #10051, #7056, #8546

comment:10 Changed 3 years ago by thoughtpolice

Differential Rev(s): D671 → Phab:D671
Milestone: 7.10.1
Status: patch → merge

Merged, thanks!

comment:11 Changed 3 years ago by thoughtpolice

Resolution: fixed
Status: merge → closed

Merged to ghc-7.10 (via ad628657cd56362964d17677728f4ae4d6868613).

comment:12 Changed 2 years ago by ezyang

Owner: Phyx- deleted
Resolution: fixed
Status: closed → new

I am reopening this ticket, because by suppressing these errors we have opened up users to a more pernicious situation: GHC silently ignores a section it doesn't understand (failing to map it into memory) when a program ACTUALLY needs it to function. Previously, it was pretty obvious that something bad had happened and it was because GHC didn't support a section, but now the errors can be a lot more obscure, e.g. #10672 and #10563.

Is there any reason we can't take an alternate approach, where by default we attempt to map in ALL sections in an object file, except ones we've specifically blacklisted?
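
A minimal sketch of why the two policies differ for a program that needs a section the linker does not recognise. The section names, symbol names, and the `build_symbol_table` helper are hypothetical and only model the idea; they are not taken from the RTS.

```python
# Illustrative sketch only; section and symbol names are hypothetical.
WHITELIST = {".text", ".data", ".rdata", ".bss"}   # sections we "know"
BLACKLIST = {".debug_info", ".debug_line"}          # sections we refuse


def build_symbol_table(sections, policy):
    """Return {symbol: section} for every section the policy loads.

    sections maps a section name to the list of symbols it defines.
    policy is "whitelist" (load only known sections, silently skip the
    rest) or "blacklist" (load everything except listed sections).
    """
    if policy not in ("whitelist", "blacklist"):
        raise ValueError(f"unknown policy: {policy}")
    table = {}
    for name, symbols in sections.items():
        if policy == "whitelist":
            loaded = name in WHITELIST
        else:
            loaded = name not in BLACKLIST
        if loaded:
            for symbol in symbols:
                table[symbol] = name
    return table
```

Given an object with `.text`, `.text$printf` and `.debug_info`, the whitelist policy silently drops the symbols defined in `.text$printf`, so a later lookup fails far from the cause, whereas the blacklist policy keeps them and drops only the debug section.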

comment:13 Changed 2 years ago by Phyx-

Cc: Phyx- added

I have been trying something similar, but then I encountered another error: unhandled PEi386 relocation type 3. Looking at the code there is indeed no case for relocation type 3 which unless I'm mistaken is:

IMAGE_REL_AMD64_ADDR32NB 0x0003 The 32-bit address without an image base (RVA).

But curiously while trying to understand what the linker is doing (and I may have the wrong idea since I'm new to this part) I see under the x86_64 cases this case 17: /* R_X86_64_32S */.

However in the PE doc I can't find any relocation type 0x0011. So am I looking at the wrong place or should this have been 0x0003?

comment:14 Changed 2 years ago by ezyang

Cc: igloo added

Ian Lynagh would be able to say better, having authored the patch, but what I think happened was, because the PE spec doesn't actually say how to process relocations, the code was written by cross-referencing against relocations in ELF. But it does look like R_X86_64_32S was given the wrong constant...

comment:15 Changed 2 years ago by Phyx-

Owner: set to Phyx-

comment:16 Changed 2 years ago by thoughtpolice

Milestone: 7.10.1 → 7.10.3

Moving to 7.10.3, in case there's a fix. Thanks Phyx-!

comment:17 Changed 2 years ago by Phyx-

@thoughtpolice Yes there will be a fix :) I have changed the code to identify most of the sections based on the flags in the PE file instead of the section names. So we don't have to keep a list of white-listed sections, so it should be much more resilient to changes.

The only sections still being ignored by names are a few debugging ones, but it's fine since those are reserved names and currently we don't do debug section relocations.

I'm just trying to find where the constant 17 comes from, but from looking at the generated .s files, I think this might be a bug in GAS. I'll submit a diff this weekend after I finish checking if the related bug reports are also fixed :)
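
A rough sketch of what flag-based identification could look like. The flag values are the section `Characteristics` constants from the PE/COFF format; the `classify_section` function, its return labels, the precedence of the checks, and the use of a `.debug` name prefix are assumptions made for illustration, not the logic of the actual patch.

```python
# Illustrative sketch only; not the logic used in rts/Linker.c.
# Section Characteristics flags from the PE/COFF format.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080


def classify_section(name, characteristics):
    """Classify a section mostly by its flags rather than its name.

    Returns one of "ignored", "init", "code", "bss" or "data".
    The labels and the order of the checks are hypothetical.
    """
    # Assumption: debug sections are the few still matched by name.
    if name.startswith(".debug"):
        return "ignored"
    # .ctors has no distinguishing flag, so it is matched by name.
    if name == ".ctors":
        return "init"
    if characteristics & IMAGE_SCN_CNT_CODE:
        return "code"
    if characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA:
        return "bss"
    # Everything else is treated as ordinary data.
    return "data"
```

Under this scheme a section called `.text$printf` needs no entry in any list: it is classified as code because its flags say so.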

comment:18 Changed 2 years ago by Phyx-

Differential Rev(s): Phab:D671 → Phab:D1244
Status: new → patch

comment:19 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 620fc6f9/ghc:

Make Windows linker more robust to unknown sections

The Windows Linker has 3 main parts that this patch changes.

1) Identification and classification of sections
2) Adding of symbols to the symbols tables
3) Relocation of sections

1.
Previously section identification used to be done on a whitelisted
basis. It was also exclusively being done based on the names of the
sections. This meant that there was a bit of a cat and mouse game
between `GCC` and `GHC`. Every time `GCC` added new sections there was a
good chance `GHC` would break. Luckily this hasn't happened much in the
past because the `GCC` versions `GHC` used were largely unchanged.

The new code instead treats all new sections as `CODE` or `DATA`
sections, and changes the classifications based on the `Characteristics`
flag in the PE header. By doing so we no longer have the fragility of
changing section names. The one exception to this is the `.ctors`
section, which has no differentiating flag in the PE header, but we know
we need to treat it as initialization data.

The check to see if the sections are aligned by `4` has been removed.
The reason is that debug sections are often `1 aligned` but do have
relocation symbols. In order to support relocations of `.debug` sections
this check needs to be gone. Crucially this assumption doesn't seem to
be in the rest of the code. We only check if there are at least 4 bytes
to realign further down the road.

2.
The second loop iterates over all the symbols in the file and tries
to add them to the symbols table. Because the classification of the
sections we did previously is (currently) not available in this phase
we still have to exclude the sections by hand. If we don't we will
load in symbols from sections we've explicitly ignored in step 1. This
whole part should be rewritten to avoid this. But didn't want to do it in
this commit.

3.
Finally the sections are relocated. But for some reason the PE files
contain a Linux relocation constant in them, `0x0011`. This constant as
far as I can tell does not come from GHC (or I couldn't find where it's
being set). I believe this is probably a bug in GAS. But because the
constant is in the output we have to handle it. I am thus mapping it to
the constant I think it should be `0x0003`.

Finally, static linking *should* work, but won't. At least not if you
want to statically link `libgcc` with exceptions support. Doing so would
require you to link `libgcc` and `libstdc++` but also `libmingwex`. The
problem is that `libmingwex` also defines a lot of symbols that the RTS
automatically injects into the symbol table, presumably because they're
symbols that it needs, like `coshf`. These symbols are not in a
section that is declared with weak symbols support. So if we ever want
to get this working, we should either a) Ask mingw to declare the
section as such, or b) treat all imported symbols as being weak.
Though this doesn't seem like it's a good idea.

Test Plan:
Running ./validate for both x86 and x86_64

Also running the specific test case for #10672

make TESTS="T10672_x86 T10672_x64"

Reviewed By: ezyang, thomie, austin

Differential Revision: https://phabricator.haskell.org/D1244

GHC Trac Issues: #9907, #10672, #10563
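
The relocation-type mapping described in part 3 of the commit message can be sketched as below. The numeric values for the `IMAGE_REL_AMD64_*` names are the ones defined by the PE/COFF format; the `STRAY_RELOC_TYPE` name and the `normalise_reloc_type` and `describe_relocation` helpers are hypothetical, and the sketch only names the relocation rather than applying it.

```python
# Illustrative sketch only; it names relocations and does not apply them.
RELOC_NAMES = {
    0x0000: "IMAGE_REL_AMD64_ABSOLUTE",
    0x0001: "IMAGE_REL_AMD64_ADDR64",
    0x0002: "IMAGE_REL_AMD64_ADDR32",
    0x0003: "IMAGE_REL_AMD64_ADDR32NB",
    0x0004: "IMAGE_REL_AMD64_REL32",
}

# The constant observed in the object files, per the commit message.
STRAY_RELOC_TYPE = 0x0011


def normalise_reloc_type(raw_type):
    """Map the stray 0x0011 constant onto 0x0003, as the commit describes."""
    if raw_type == STRAY_RELOC_TYPE:
        return 0x0003
    return raw_type


def describe_relocation(raw_type):
    """Return the PE name for a relocation type, or fail loudly."""
    reloc_type = normalise_reloc_type(raw_type)
    try:
        return RELOC_NAMES[reloc_type]
    except KeyError:
        raise ValueError(f"unhandled PEi386 relocation type {raw_type}")
```

Normalising the type before dispatching keeps the workaround in one place, so it can be removed if the origin of the `0x0011` constant is ever tracked down.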

comment:20 Changed 2 years ago by thomie

Resolution: fixed
Status: patch → closed

Should be really fixed now.