Opened 23 months ago

Last modified 4 months ago

#9314 new bug

Each object file in a static archive file (.a) is loaded into its own mmap()ed page

Reported by: kazu-yamamoto
Owned by:
Priority: high
Milestone: 8.2.1
Component: Runtime System
Version: 7.8.3
Keywords:
Cc: michael@…, chak@…, simonmar
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Other
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

The GHC API in 7.8.x uses much more memory than in 7.6.x. The two attached files demonstrate this:

  • A.hs -- Simple program using GHC API (copied from Wiki)
  • B.hs -- A target file, just hello world

You can compile A.hs as follows:

% ghc A.hs -package ghc -package ghc-paths

The compiled A.hs stays alive for 10 seconds, so we can investigate its memory usage with the "top" command. The results are as follows:

            Mac (64bit)  Linux (64bit)
GHC 7.6.3:         20MB            4MB
GHC 7.8.3:        106MB           13MB

Attachments (2)

A.hs (613 bytes) - added by kazu-yamamoto 23 months ago.
A simple program using the GHC API
B.hs (45 bytes) - added by kazu-yamamoto 23 months ago.
A target file, just hello world


Change History (26)

Changed 23 months ago by kazu-yamamoto

A simple program using the GHC API

Changed 23 months ago by kazu-yamamoto

A target file, just hello world

comment:1 Changed 23 months ago by kazu-yamamoto

From Karel Gardas, on Solaris 11 i386 (32-bit binary):

GHC 7.6.3: 53MB (size), 44MB (RSS)
GHC 7.8.2: 91MB (size), 81MB (RSS)

comment:2 Changed 23 months ago by snoyberg

  • Cc michael@… added

comment:3 Changed 23 months ago by JohnWiegley

I'm seeing a problem too, with an application that builds fine using 7.6.3. When building with -O2 and 7.8.3, GHC exhausts system memory (16 GB) and is ultimately killed. With 7.8.3 and -O1, or with 7.6.3, it finishes.

I can't paste my code here, but would be happy to try any suggestions to help isolate the problem.

comment:4 Changed 23 months ago by simonpj

-O2 adds SpecConstr, a notorious source of blow-up. Try switching it off with -fno-spec-constr.

The SpecConstr blow-up needs love and attention, and I keep being too distracted. Help most welcome. I don't think it's fundamental.

Simon

comment:5 Changed 23 months ago by simonmar

  • Milestone set to 7.8.4
  • Priority changed from normal to high

comment:6 Changed 23 months ago by kazu-yamamoto

In my case, this happens even with -O0.

comment:7 Changed 23 months ago by chak

  • Cc chak@… added

comment:8 Changed 23 months ago by kazu-yamamoto

Were any -fxxx flags added in GHC 7.8 that enlarge the loaded modules?

comment:9 Changed 23 months ago by simonmar

GHC loads more interface files in 7.8.x, due to the AMP (Applicative-Monad Proposal) warnings. This does make it use more memory (but a constant amount per compilation).

comment:10 Changed 22 months ago by rwbarton

OK, here's the situation. Kazu's program is statically linked against the GHC API (ghc builds statically linked executables by default), which means GHC is loading the static library versions of ghc-prim, integer-gmp and base. This was also the case in GHC 7.6, but that version shipped with .o files (e.g. HSghc-prim-0.3.0.0.o) which the GHC linker can read as a single unit. These are not included with GHC 7.8 on platforms that use dynamic libraries by default (I guess because GHCi would not use them), so GHC 7.8 has to read the .a files instead. As described in a comment in loadArchive(), for alignment reasons each member object file of an archive is loaded into its own mmap()ed page(s). The base package consists of approximately a zillion tiny object files (split objects), each of which occupies at least one 4-kilobyte page, so, as the comment mentions, this is quite wasteful. Almost all of the memory mapped by the program under 7.8 is attributable either to these mappings or to the executable file itself.
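
To put rough numbers on that waste, the C sketch below compares two policies for a set of archive members: one mapping per member, each rounded up to whole pages, versus packing the members back to back at 8-byte alignment and rounding only the total. It assumes 4096-byte pages and made-up member sizes; the function names are hypothetical and are not taken from rts/Linker.c.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to a whole number of pages; page_size must be a power of two. */
static size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/* Bytes reserved when every archive member gets its own page-rounded mapping. */
static size_t bytes_one_mapping_per_member(const size_t *sizes, size_t n, size_t page_size)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += round_up_to_page(sizes[i], page_size);
    return total;
}

/* Bytes reserved when members are packed back to back, each starting at an
   8-byte aligned offset, and only the grand total is rounded to pages. */
static size_t bytes_packed(const size_t *sizes, size_t n, size_t page_size)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++)
        used = ((used + 7) & ~(size_t)7) + sizes[i];
    return round_up_to_page(used, page_size);
}
```

For example, 1000 hypothetical 500-byte members reserve 4,096,000 bytes with one mapping each, but only 507,904 bytes (124 pages) when packed.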

We could perhaps come up with a more efficient scheme for loading archive files, but a workaround is to just build the executable with -dynamic, so that it will load the GHC API as a shared library and avoid this excessive allocation.

comment:11 Changed 22 months ago by kazu-yamamoto

I confirmed that -dynamic reduces the memory usage.

comment:12 Changed 22 months ago by simonpj

Reid, thank you. I have no opinion about dynamic linking (except that it is generally the work of the devil, and has caused us a totally unreasonable amount of pain), but we should all be very grateful to you for diagnosing what is going on. I would never have thought of that in a million (or zillion) years.

Thanks!

Simon

comment:13 Changed 20 months ago by thoughtpolice

  • Milestone changed from 7.8.4 to 7.10.1

Moving (in bulk) to 7.10.1

comment:14 Changed 19 months ago by thomie

  • Component changed from GHC API to Compiler
  • Summary changed from Huge space leak of GHC API 7.8.x to Each object file in a static archive file (.a) is loaded into its own mmap()ed page

The comment in rts/Linker.c that rwbarton referred to in comment:10:

             /* We can't mmap from the archive directly, as object
                files need to be 8-byte aligned but files in .ar
                archives are 2-byte aligned. When possible we use mmap
                to get some anonymous memory, as on 64-bit platforms if
                we use malloc then we can be given memory above 2^32.
                In the mmap case we're probably wasting lots of space;
                we could do better. */
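
The alignment problem described in this comment can be illustrated with the classic ar(5) layout, assuming no BSD-style "#1/N" long names (which put the file name at the start of the member data): an 8-byte global header ("!<arch>\n"), then for each member a 60-byte header followed by the member's data, padded to an even length. Data offsets are therefore always even but need not be multiples of 8; even the first member's data starts at offset 68. The hypothetical helpers below compute those offsets and copy a member into a malloc()ed buffer. Note that the RTS prefers mmap() over malloc() for the low-address reason given in the comment; this sketch ignores that constraint.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC_LEN 8   /* "!<arch>\n" global header */
#define AR_HDR_LEN   60  /* fixed-size per-member header */

/* File offset of the data of member `index`, given the data sizes of the
   members in archive order. Assumes no BSD "#1/N" long names. */
static size_t member_data_offset(const size_t *sizes, size_t index)
{
    size_t off = AR_MAGIC_LEN;            /* first member header */
    for (size_t i = 0; i < index; i++) {
        off += AR_HDR_LEN + sizes[i];     /* skip header and data */
        off += off & 1;                   /* data is padded to an even length */
    }
    return off + AR_HDR_LEN;              /* data follows its header */
}

/* Nonzero if `p` is 8-byte aligned. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Copy `len` bytes at `off` into a fresh malloc()ed buffer, which is aligned
   for any object type (at least 8 bytes on common platforms).
   Returns NULL on allocation failure; the caller frees the buffer. */
static unsigned char *copy_member_aligned(const unsigned char *archive, size_t off, size_t len)
{
    assert(archive != NULL);
    unsigned char *buf = malloc(len ? len : 1);
    if (buf != NULL)
        memcpy(buf, archive + off, len);
    return buf;
}
```

In other words, a pointer straight into a mapped archive would generally violate the 8-byte alignment the comment asks for, so each member has to be copied somewhere else first.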

comment:15 Changed 17 months ago by dfeuer

  • Milestone changed from 7.10.1 to 7.12.1

comment:16 Changed 12 months ago by hsyl20

I have made a patch for this one (https://phabricator.haskell.org/D985)

With the example given in the ticket I get:

                   Linux (64bit)
7.10.1             174 MB / 20209 calls to mmap
HEAD with patch     95 KB /   332 calls to mmap
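
The ticket does not show how the patch achieves this beyond the m32_alloc name mentioned in comment:22. One plausible way to get this kind of reduction in mmap() calls is to pack many small objects into shared pages instead of giving each its own mapping. The C sketch below illustrates that idea with a simple bump allocator over anonymous pages; it is not the code from Phab:D985, and the names (struct arena, arena_alloc), the fixed 4096-byte page and the half-page threshold for "large" requests are assumptions made for illustration. It never frees memory, which is acceptable only for a sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ARENA_PAGE ((size_t)4096)   /* assumed page size; real code would ask sysconf() */

/* Hypothetical bump allocator: small requests share an anonymous page and a
   new page is mapped only when the current one cannot fit the request. */
struct arena {
    unsigned char *page;   /* current shared page, or NULL before first use */
    size_t used;           /* bytes handed out from the current page */
    size_t mmap_calls;     /* number of successful mmap() calls so far */
};

static void *map_anon(struct arena *a, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    a->mmap_calls++;
    return p;
}

/* Return 8-byte aligned memory for `size` bytes (size > 0). Requests larger
   than half a page get a dedicated mapping; nothing is ever freed. */
static void *arena_alloc(struct arena *a, size_t size)
{
    assert(size > 0);
    if (size > ARENA_PAGE / 2)
        return map_anon(a, (size + ARENA_PAGE - 1) & ~(ARENA_PAGE - 1));
    size_t start = (a->used + 7) & ~(size_t)7;     /* keep 8-byte alignment */
    if (a->page == NULL || start + size > ARENA_PAGE) {
        unsigned char *fresh = map_anon(a, ARENA_PAGE);
        if (fresh == NULL)
            return NULL;
        a->page = fresh;
        start = 0;
    }
    a->used = start + size;
    return a->page + start;
}
```

With 40-byte requests, 102 fit in each 4096-byte page, so 200 requests need 2 mmap() calls instead of the 200 that one mapping per object would take.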

comment:17 Changed 12 months ago by hsyl20

  • Cc simonmar added
  • Component changed from Compiler to Runtime System
  • Differential Rev(s) set to Phab:D985
  • Status changed from new to patch

comment:18 Changed 9 months ago by thoughtpolice

  • Milestone changed from 7.12.1 to 8.0.1

Milestone renamed

comment:19 Changed 7 months ago by thomie

Phab:D985 has been merged into Phab:D975, which has been merged into HEAD, but I'm not seeing the huge improvements that hsyl20 mentioned in comment:16.

GHC                    RSS
7.11.20151111 (HEAD)  123M
7.10.2                142M
7.8.4                 131M
7.6.3                  41M

I compile with ghc A.hs -package ghc -package ghc-paths, run ./A, read the RES column in top, and convert from KB to MB.

What am I missing?
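
For repeatable numbers, the peak resident set size of a child process can also be read programmatically instead of from top. The C sketch below (POSIX plus the widely available BSD wait4() call) runs a command and returns the ru_maxrss value reported for it. This is the peak RSS over the whole run, not the instantaneous RES value that top shows, and ru_maxrss is in kilobytes on Linux but in bytes on macOS. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up in PATH) with arguments argv, wait for it, and return
   its peak resident set size from wait4(). Units: kilobytes on Linux, bytes on
   macOS. Returns -1 if the command could not be run or did not exit with 0. */
static long run_and_report_maxrss(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                       /* only reached if exec failed */
    }
    int status;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

For example, calling it with {"./A", NULL} on Linux and dividing the result by 1024 gives a figure in MB, comparable in spirit (though not identical) to the RES values in the table above.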

comment:20 Changed 7 months ago by thomie

  • Differential Rev(s) Phab:D985 deleted
  • Status changed from patch to new

Edit: fixing this might also make the use of driver/utils/merge_sections.ld unnecessary (see https://phabricator.haskell.org/D1242#inline-12085).

Last edited 7 months ago by thomie

comment:21 Changed 7 months ago by hsyl20

@simonmar wrote: "I like not having to memcpy all the memory for an object file into the 1-2GB region, we just mmap it - this is also useful because various tools (like perf) understand the mapping and can give symbol names. For .a files I think it's fine to memcpy the bits though." (https://phabricator.haskell.org/D985#26591)

I think that's why D975 doesn't bring the memory improvement that D985 did. Maybe we should add a flag (or use -g) to switch between the mmap mode, which tools like perf understand, and the much cheaper memcpy mode (the latter becoming the default).
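
The trade-off discussed here hinges on a constraint of file-backed mmap(): the file offset must be a multiple of the page size. A standalone .o file (offset 0) can be mapped directly, giving a file-backed mapping that appears with its path in /proc/<pid>/maps on Linux, which is what lets tools such as perf attribute addresses to it; an archive member at an arbitrary offset cannot be mapped that way. The C sketch below shows a hypothetical mode switch with a pread() copy fallback; the enum, the function names and the fallback policy are illustrative assumptions, not the RTS's actual interface.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

enum load_mode { LOAD_MMAP_FILE, LOAD_COPY };

/* Read exactly `len` bytes at `off` into `dest`; returns 0 on success. */
static int copy_object(int fd, off_t off, size_t len, unsigned char *dest)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, dest + done, len - done, off + (off_t)done);
        if (n <= 0)
            return -1;          /* error or unexpected end of file */
        done += (size_t)n;
    }
    return 0;
}

/* In LOAD_MMAP_FILE mode, use a private, writable, file-backed mapping when
   `off` is page-aligned. Otherwise (or in LOAD_COPY mode) copy the bytes into
   the caller-provided `dest`, e.g. memory from a packing allocator.
   Returns NULL on failure. */
static unsigned char *load_object(int fd, off_t off, size_t len,
                                  enum load_mode mode, unsigned char *dest)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (mode == LOAD_MMAP_FILE && off % page == 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, off);
        if (p != MAP_FAILED)
            return p;
    }
    if (dest == NULL || copy_object(fd, off, len, dest) != 0)
        return NULL;
    return dest;
}
```

MAP_PRIVATE keeps relocation writes copy-on-write, so the object file on disk is never modified in either mode.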

comment:22 Changed 7 months ago by hsyl20

I have created a diff to put the fix back: Phab:D1470

It shouldn't be an issue for perf-like tools, because the mmap calls I replaced with m32_alloc calls were not mapping a file.

comment:23 Changed 4 months ago by thomie

bgamari: what is the status here?

comment:24 Changed 4 months ago by bgamari

  • Milestone changed from 8.0.1 to 8.2.1

> bgamari: what is the status here?

hsyl20 and I were discussing it a few weeks ago; I fixed a few issues in Phab:D1470 but there is still a fair amount of debugging to do before it is mergeable. It certainly isn't 8.0 material at this point.
