Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#8039 closed task (fixed)

RTS linker: unloadObj() does not actually unload the code

Reported by: simonmar Owned by: simonmar
Priority: high Milestone: 7.8.1
Component: Runtime System Version: 7.6.3
Keywords: Cc: ggreif@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: rts/linker_unload
Blocked By: Blocking: #7746
Related Tickets: Differential Rev(s):
Wiki Page:

Description

We've known about this for a long time, but it hasn't been a pressing issue in GHCi. The problem with actually unloading code is that there might be pointers into it from the heap or other RTS data structures, so we need to do a full heap traversal to discover whether it is safe to unload code or not.

This is an issue for a project at Facebook that needs to have long running processes that regularly unload and load code using the RTS linker, and right now the process grows over time.

This ticket is to track the issue, and I'm also working on a fix. The same problem affects the dynamic linker, although I'm not planning to fix that (yet).

Attachments (2)

0001-Really-unload-object-code-when-it-is-safe-to-do-so-8.patch (7.2 KB) - added by simonmar 6 years ago.
preview patch
0001-Really-unload-object-code-when-it-is-safe-to-do-so-8.2.patch (16.2 KB) - added by simonmar 6 years ago.
fixed version of the patch

Download all attachments as: .zip

Change History (21)

comment:1 Changed 6 years ago by ezyang

Excellent! I've run into this problem, and I'm curious what the fix's approach is.

Changed 6 years ago by simonmar

preview patch

comment:2 Changed 6 years ago by simonmar

I've attached a patch that sort of works. There's at least one problem still to fix (it doesn't get the static objects correctly when the GC is running in parallel).

It seems to work with some simple testing in GHCi, but I still need to make some automated test cases.

Changed 6 years ago by simonmar

fixed version of the patch

comment:3 Changed 6 years ago by igloo

By "in GHCi", I assumed you mean in 7.6, or HEAD with DYNAMIC_GHC_PROGRAMS=NO?

comment:4 Changed 6 years ago by ezyang

OK, so essentially, do a heap census. Sounds good to me. (I think I'd like it to work for the dynamic linker to but we want to eject that code, don't we?)

comment:5 Changed 6 years ago by simonmar

@igloo: yes, I have DYNAMIC_GHC_PROGRAMS=NO in my mk/build.mk. I don't know how to do this with the system linker yet, because there doesn't seem to be an easy way to get hold of the start and end address of the image in memory - there's a fairly complex API in rtld-audit, but even that only seems to give you the start address.

Anyway, for the application I need this for, we're using the RTS linker and not shared libraries. So please don't make the RTS linker go away :-)

comment:6 Changed 6 years ago by ezyang

OK, so maybe the comments here are out of date? http://hackage.haskell.org/trac/ghc/wiki/DynamicByDefault

comment:7 Changed 6 years ago by heisenbug

Cc: ggreif@… added

Great! This is a dearly needed feature for http://hackage.haskell.org/package/dynamic-loader too...

comment:8 in reply to:  5 ; Changed 6 years ago by igloo

Replying to simonmar:

Anyway, for the application I need this for, we're using the RTS linker and not shared libraries. So please don't make the RTS linker go away :-)

My plan was to leave it in 7.8, but not used by default, and then to remove it from HEAD shortly after 7.8.1 is released. The goal is to stop maintaining the ghci linker, after all!

comment:9 in reply to:  8 Changed 6 years ago by simonmar

Replying to igloo:

My plan was to leave it in 7.8, but not used by default, and then to remove it from HEAD shortly after 7.8.1 is released. The goal is to stop maintaining the ghci linker, after all!

I thought we had some platforms that didn't have dynamic linking support, e.g. ARM/Linux?

I personally don't care if we get rid of the MACH-O support, and maybe even the COFF support from the linker. The ELF support is fairly solid though, and for the use case I'm interested in we don't run into any of the known limitations.

Maybe I could use shared libraries. But (a) this is a scenario where dynamic linking is generally shunned because it creates more problems than it solves (deploying to large numbers of machines), (b) we really care about performance, and dynamic linking adds unnecessary overhead, and (c) we need to unload objects, and I don't know how to do that with shared libraries yet.

I still buy the arguments for moving to dynamic linking for GHCi. But applications with plugin support is a use case we weren't really thinking about during that discussion, and I think the tradeoffs are different.

comment:10 Changed 6 years ago by igloo

For platforms that don't support either the GHCi linker or dynlibs currently, then I think it would be easier and more useful to add dynlib support. There may be one or two (I'd guess that Arm Linux and PPC Linux are the only likely candidates) that currently support the GHCi linker and not dynlibs, but I think that it would be better for people who care about such platforms to add dynlib support rather than continuing to maintain the GHCi linker.

comment:11 Changed 5 years ago by Simon Marlow <marlowsd@…>

In bdfefb3b72a71cd0afca6e7766456c0d97c47c86/ghc:

Really unload object code when it is safe to do so (#8039)

The next major GC after an unloadObj() will do a traversal of the heap
to determine whether the object code can be removed from memory or
not.  We'll keep doing these until it is safe to remove the object
code.

In my experiments with GHCi, the objects get unloaded immediately,
which is a good sign: we're not accidentally holding on to any
references anywhere in the GHC data structures.

Changes relative to the patch earlier posted on the ticket:
 - fix two memory leaks discovered with Valgrind, after
   testing with tests/rts/linker_unload.c

comment:12 Changed 5 years ago by Simon Marlow <marlowsd@…>

In 269c89062dd5b6d2b8bc4d41e5dca0eca2308d02/testsuite:

Add a test for unloading object files in the linker (#8039)

comment:13 Changed 5 years ago by simonmar

Not ready to close yet: the linker doesn't work when DYNAMIC_GHC_PROGRAMS is on. I can't think of a reason for this.

comment:14 Changed 5 years ago by simonmar

Resolution: fixed
Status: newclosed
Test Case: rts/linker_unload

comment:15 Changed 5 years ago by heisenbug

I have pushed a new branch on my github ghc fork. It contains just one commit, about unloading dlopened shared object files.

It is actually untested, and non-windows. But it may be of use for doing what Simon did with unloading .o files in this ticket. Might be interesting to folks.

Of course the GC must be wired up in a similar fashion, etc. Be careful.

comment:16 Changed 5 years ago by simonmar

The big missing piece with unloading of .so's is knowing their boundaries in memory so that the GC can detect whether there are any references left. I don't know of a way to do that with the system linker, if anyone knows of a way then I'd be happy to add support for unloading .so's.

comment:17 in reply to:  16 Changed 5 years ago by heisenbug

Replying to simonmar:

The big missing piece with unloading of .so's is knowing their boundaries in memory

Eli Bendersky's article suggests to use the dl_iterate_phdr function for finding information about loaded libraries. Seems to be linux only. There is a workaround on OSX, though, on StackOverflow.

And here is how Böhm-Demers-Weiser's GC implement a callback function for dl_iterate_phdr.

Last edited 5 years ago by heisenbug (previous) (diff)

comment:18 Changed 5 years ago by ezyang

Blocking: 7746 added

comment:19 Changed 5 years ago by simonmar

Created #8238 to track unloading of shared libraries.

Note: See TracTickets for help on using tickets.