Opened 17 months ago

Last modified 8 months ago

#12388 new bug

Don't barf on failures in the RTS linker

Reported by: dobenour Owned by:
Priority: normal Milestone: 8.4.1
Component: Runtime System (Linker) Version: 8.0.1
Keywords: Cc: erikd, bgamari, simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D2570
Wiki Page:

Description

The RTS linker currently calls barf() when it fails. This is a problem because:

  1. It appears that there is a bug in GHC, when there is no bug.
  2. Failures to load code really should be recoverable.

According to a TODO in the code, the culprit is resource deallocation, which is very difficult due to the code being written in C and having complicated control flow. There are a few solutions:

  • Port the RTS linker to C++ and use RAII for resource management. Failures would be handled (internally to the linker) by throwing a C++ exception. This is actually my favorite, but might not be popular with the GHC devs.
  • Build a huge context struct containing all needed resources and free it before returning. Signal errors with longjmp().
  • Try to find each and every place where resources need to be free, and free them by hand. Signal errors with return codes. This seems too error-prone.

Change History (13)

comment:1 Changed 16 months ago by bgamari

Cc: erikd bgamari simonmar added

Ccing the usual RTS people.

comment:2 Changed 16 months ago by erikd

I would be the first to agree that the RTS Linker code is nowhere near as nice as it should be be. This code has been put together via accretion over a decade or more. It has had over a dozen people work on it and all of those people know C but would prefer to write Haskell. Another problem is that code has to support at least 6 CPU architecture about 5 different Unix variants and Windows.

I personally think porting the linker to C++ is a really bad idea. The linker is already under-resourced (in terms of people working on it) and using C++ instead of C would make it significantly more difficult for newcomers to work on it.

As for your other two suggestions I have no strong feelings for or against, but would happy to have anyone work on the linker to help improve it.

One last thought, a system using or based on talloc (https://talloc.samba.org/talloc/doc/html/index.html) may help.

Last edited 16 months ago by erikd (previous) (diff)

comment:3 Changed 16 months ago by simonmar

I already fixed a bunch of these in 5300099edf106c1f5938c0793bd6ca199a0eebf0, but there are definitely more. I think you're talking about loadArchive, right? That's where I ran out of energy, because it needs a lot of refactoring.

C++ would make it easy, but I don't think it's worth going to C++ just for this one bit of code. I would refactor it carefully, splitting up the function into smaller pieces and building a decent test suite.

comment:4 Changed 16 months ago by simonmar

Something that might help here is gcc's cleanup attribute: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html which gives you some of the benefits of RAII in plain C.

Going to C++ even for a small bit of the codebase would be a big step because we would have to use g++ as the linker and we get dependencies on C++ standard libs etc.

comment:5 Changed 14 months ago by dobenour

I have a patch. It doesn't solve the problem completely, but it solves the problem in loadArchive(). Failures in that function now cause an error return and a message to be written via errorBelch(). This isn't great, but it is better than the current situation.

Currently running ./validate --slow to test.

comment:6 Changed 13 months ago by thomie

Differential Rev(s): Phab:D2570
Status: newpatch

comment:7 Changed 13 months ago by Ben Gamari <ben@…>

In 488a9ed3/ghc:

rts/linker: Move loadArchive to new source file

Test Plan: Validate

Reviewers: erikd, simonmar, austin, DemiMarie

Reviewed By: erikd, simonmar, DemiMarie

Subscribers: hvr, thomie

Differential Revision: https://phabricator.haskell.org/D2615

GHC Trac Issues: #12388

comment:8 Changed 13 months ago by Ben Gamari <ben@…>

In aa10c67e/ghc:

rts/linker: Move loadArchive to new source file

Test Plan: Validate

Reviewers: DemiMarie, austin, simonmar, erikd

Reviewed By: DemiMarie

Subscribers: Phyx, thomie, hvr

Differential Revision: https://phabricator.haskell.org/D2642

GHC Trac Issues: #12388

comment:9 Changed 12 months ago by Ben Gamari <ben@…>

In 83d69dc/ghc:

Don't barf() on failures in loadArchive()

This patch replaces calls to barf() in loadArchive() with proper
error handling.

Test Plan: GHC CI

Reviewers: rwbarton, erikd, hvr, austin, simonmar, bgamari

Reviewed By: bgamari

Subscribers: thomie

Tags: #ghc

Differential Revision: https://phabricator.haskell.org/D2652

GHC Trac Issues: #12388

comment:10 Changed 12 months ago by bgamari

Resolution: fixed
Status: patchclosed

comment:11 Changed 12 months ago by bgamari

Resolution: fixed
Status: closednew

Actually, there are plenty more barfs in the linker. This will be a long road...

comment:12 Changed 12 months ago by Ben Gamari <ben@…>

In c766d53/ghc:

rts/linker: Fix LoadArchive build on Windows

Test Plan: Validate on Windows.

Reviewers: austin, erikd, simonmar

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2798

GHC Trac Issues: #12388

comment:13 Changed 8 months ago by bgamari

Milestone: 8.2.18.4.1

Given that 8.2.1-rc1 is imminent, I'm bumping these off to the 8.4

Note: See TracTickets for help on using tickets.