Opened 9 years ago

Last modified 3 years ago

#3571 new bug

Bizarrely bloated binaries

Reported by: guest
Owned by:
Priority: lowest
Milestone:
Component: Compiler (Linking)
Version: 6.10.4
Keywords:
Cc: batterseapower@…, michal.terepeta@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:


Compiling a trivial test program:

module Main where

main = print "Hello World"

Using GHC 6.10.4 produces a VERY suspicious PE file. (NB: this applies to DLL as well as EXE output).

The two problems that I have observed are:

1) The PE always contains a .stab and .stabstr section totalling 0x2A00 of debug data. Looking at the contents of stabstr, this appears to originate from a libffi object file. Perhaps we could disable stabs when building libffi to remove this bloat from output binaries.

2) The PE contains *A LOT* of trailing junk. My hello world program is 691K, and the PE contains 0x4FAFC bytes (about 318K) of data which doesn't live in any section! Trimming this data appears to have no effect on the correctness of the program! The amount of junk grows in proportion to the amount of real code and data - I have observed e.g. 18MB DLLs of which 9MB is trailing junk.

To repeat: we could potentially *halve* GHC binary sizes by fixing this linker behaviour.
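
The figures above can be checked with a few lines of Haskell. This is only arithmetic on the numbers quoted in this description; the names are made up for illustration, and "691K" is assumed to mean KiB. Under that assumption the trailing data comes to roughly 318.7K, or about 46% of the file.

```haskell
-- Trailing byte count reported in the ticket description.
trailingBytes :: Integer
trailingBytes = 0x4FAFC

-- Convert a byte count to KiB (1 KiB = 1024 bytes).
toKiB :: Integer -> Double
toKiB n = fromIntegral n / 1024

-- Share of a file (size given in KiB) that is taken up by the trailing data.
-- Assumption: the 691K quoted in the description is a size in KiB.
trailingShare :: Double -> Double
trailingShare fileKiB = toKiB trailingBytes / fileKiB
```

In GHCi, "toKiB trailingBytes" and "trailingShare 691" give the two numbers.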

I'm not sure where exactly the fault lies - whether it is a GHC problem or some bug in ld.

To test trimming your executables and DLLs, you can use this utility I whipped up. Usage is "trimpe <file1> ... <fileN>". It will trim useless data from the end of the files in place:

{-# LANGUAGE ScopedTypeVariables #-}
module Main (main) where

import Control.Monad

import Data.Binary
import Data.Binary.Get

import qualified Data.ByteString.Lazy as ByteString
import Data.Word

import System.Environment

import Debug.Trace

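-- Specialised to Get so that 'fail' resolves both on old compilers
-- (where it is a Monad method) and on newer ones (where it lives in MonadFail)
assertM :: Bool -> Get ()
assertM True  = return ()
assertM False = fail "assertM"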

newtype PEImageLength = PEImageLength Word32

instance Binary PEImageLength where
    get = do
        -- Skip the MS DOS stub
        skip 0x3c
        pe_sig_offset <- getWord32le
        -- Skip to the PE signature
        skip (fromIntegral pe_sig_offset - (0x4 + 0x3c))
        -- Read the PE signature itself
        -- NB: this will always be the string "PE\0\0"
        _sig <- getWord32le
        assertM (_sig == 0x00004550)
        -- Read COFF file header
        _machine <- getWord16le
        _no_of_sects <- getWord16le
        _time_date_stamp <- getWord32le
        _ptr_to_sym_tab <- getWord32le
        _no_of_syms <- getWord32le
        _size_of_opt_header <- getWord16le
        assertM (_size_of_opt_header /= 0)
        _characteristics <- getWord16le
        -- Read the "optional" header
        magic <- getWord16le
        let pe32plus = magic == 0x20B
        _maj_link_ver :: Word8 <- get
        _min_link_ver :: Word8 <- get
        _size_of_code <- getWord32le
        _size_of_init_data <- getWord32le
        _size_of_uninit_data <- getWord32le
        _addr_of_entry_point <- getWord32le
        _base_of_code <- getWord32le
        when (not pe32plus) $ do _base_of_data <- getWord32le; return ()
        -- Read the optional header Windows fields
        if pe32plus
         then do _image_base <- getWord64le; return ()
         else do _image_base <- getWord32le; return ()
        _sect_alignment <- getWord32le
        _file_alignment <- getWord32le
        _maj_os_version <- getWord16le
        _min_os_version <- getWord16le
        _maj_image_version <- getWord16le
        _min_image_version <- getWord16le
        _maj_subsys_version <- getWord16le
        _min_subsys_version <- getWord16le
        _win32_version <- getWord32le
        size_of_image <- getWord32le
        -- There is more stuff later, but I simply don't care about it
        -- NB: we could trim a little more aggressively if we interpreted
        -- the sections as well...
        return $ PEImageLength size_of_image
    put = error "Binary PEImageLength: put"

main :: IO ()
main = do
    files <- getArgs
    forM_ files trimPEToImageSize

trimPEToImageSize :: FilePath -> IO ()
trimPEToImageSize file = do
    putStrLn $ file
    pe_contents <- ByteString.readFile file
    let PEImageLength image_size = decode pe_contents
    -- Force the file to close so that the write may succeed
    (ByteString.last pe_contents) `seq` return ()
    when (ByteString.length pe_contents > fromIntegral image_size) $ do
        putStrLn $ "* Trimming to image size (" ++ show image_size ++ ")"
        let pe_contents' = ByteString.take (fromIntegral image_size) pe_contents
        ByteString.writeFile file pe_contents'
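
The NB in the code above mentions interpreting the sections as well. A minimal sketch of what that could look like follows; it is not part of trimpe, and the names Section, getSections and endOfRawData are hypothetical. It assumes the section table directly follows the optional header, that the PE signature offset is at least 0x40, and it does not resolve long section names that the linker stores as "/<offset>" references into the string table. It only reports where section data ends; whether everything past that point is safe to discard is a separate question that this ticket does not settle.

```haskell
import Control.Monad (replicateM, when)
import Data.Binary.Get
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Char8 as Char8
import qualified Data.ByteString.Lazy as Lazy
import Data.Word (Word32)

-- The three section header fields this sketch cares about
data Section = Section
    { sectName    :: String  -- up to 8 bytes, NUL padding removed
    , sectRawSize :: Word32  -- SizeOfRawData
    , sectRawPtr  :: Word32  -- PointerToRawData
    } deriving (Show, Eq)

-- Read every section header of a PE image
getSections :: Get [Section]
getSections = do
    -- Skip the MS DOS stub and jump to the PE signature
    skip 0x3c
    peSigOffset <- getWord32le
    skip (fromIntegral peSigOffset - 0x40)
    sig <- getWord32le
    when (sig /= 0x00004550) $ fail "not a PE image"
    -- COFF file header
    _machine <- getWord16le
    noOfSects <- getWord16le
    skip 12  -- TimeDateStamp, PointerToSymbolTable, NumberOfSymbols
    sizeOfOptHeader <- getWord16le
    skip 2   -- Characteristics
    -- Assumption: the section table starts right after the optional header
    skip (fromIntegral sizeOfOptHeader)
    replicateM (fromIntegral noOfSects) getSection

-- Read one 40-byte section header
getSection :: Get Section
getSection = do
    rawName <- getByteString 8
    skip 8   -- VirtualSize, VirtualAddress
    rawSize <- getWord32le
    rawPtr <- getWord32le
    skip 16  -- relocation and line number fields, Characteristics
    let name = Char8.unpack (Strict.takeWhile (/= 0) rawName)
    return (Section name rawSize rawPtr)

-- File offset just past the last byte of raw data owned by any section
endOfRawData :: [Section] -> Integer
endOfRawData sects =
    maximum (0 : [ fromIntegral (sectRawPtr s) + fromIntegral (sectRawSize s)
                 | s <- sects ])
```

In GHCi, something like "fmap (runGet getSections) (Lazy.readFile file)" lists the sections, so "map sectName" shows whether .stab and .stabstr are present, and comparing endOfRawData with the file length gives the amount of data outside any section. Checking whether the _ptr_to_sym_tab field read by trimpe points into that trailing region would be one way to test whether the junk is a COFF symbol table, though that is a guess, not something established here.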

Change History (21)

comment:1 Changed 9 years ago by guest

I'm finding it very hard to believe that this data really is useless. It remains in the object file whether I used ld 2.17.50, 2.18.50 or 2.19.1 to do the final link. However, running "strip" on the executable has a similar effect and indeed ensures that the PE image size is not exceeded by the file length.

The actual binary contents of the EXE's useless rump is strange. It is a mixture of z-encoded symbols, section names (.rodata, .bss, .data, .txt), and purely binary data.

comment:2 Changed 9 years ago by guest

I can also confirm that using strip is the normal thing to do. I've done this for years, usually followed by upx, which further reduces the size to about 120KB.

comment:3 Changed 9 years ago by igloo

difficulty: Unknown
Milestone: 6.12 branch

Thanks for the report.

comment:4 Changed 9 years ago by simonmar

Milestone: 6.12 branch → 6.12.2
Type of failure: None/Unknown

Let's investigate this for 6.12.2. It's probably normal, but we ought to discover why it's happening and record the knowledge somewhere.

comment:5 Changed 8 years ago by igloo


comment:6 Changed 8 years ago by igloo

Priority: normal → low

comment:7 Changed 8 years ago by michalt

Cc: michal.terepeta@… added

I don't know much about the PE format, but if strip fixes the problem, isn't the whole thing just about discarding symbols/debug info from an executable? As for the size reduction from the "trimming" mentioned in the description, I get very similar results by simply stripping the executable.

For "hello world":

> ll Test
-rwxr-x--- 1 m m 975K Dec 12 18:50 Test*
> strip Test
> ll Test
-rwxr-x--- 1 m m 649K Dec 12 18:51 Test*

So the difference is 326KB. For xmonad:

> ll xmonad-x86_64-linux
-rwxr-x--- 1 m m 6.4M Dec 12 18:49 xmonad-x86_64-linux*
> strip xmonad-x86_64-linux
> ll xmonad-x86_64-linux
-rwxr-x--- 1 m m 3.6M Dec 12 18:51 xmonad-x86_64-linux*

So for larger programs the size is almost halved.

Btw. cabal strips executables by default.

comment:8 Changed 8 years ago by batterseapower

Of course, it is expected behaviour that strip eliminates the bloat. The point of the trimming code above is to demonstrate that almost all of the effect of strip is just because it removes useless gunk that lives outside *any PE section*! This trimming can be accomplished by just truncating the PE to the image length encoded within it.

Even debug data should have a section - indeed many PE executables have a .debug section - but this stuff at the end is not in any section and therefore just seems entirely redundant.

So even though we know that strip fixes the problem, we do not know why the problem exists in the first place...

comment:9 Changed 8 years ago by michalt

Looking at the PE spec:

The .debug Section


The next section describes the format of the debug directory, which can be
anywhere in the image. Subsequent sections describe the "groups" in object files
that contain debug information.

The default for the linker is that debug information is not mapped into the
address space of the image. A .debug section exists only when debug information
is mapped in the address space.

Debug Directory (Image Only)

Image files contain an optional debug directory that indicates what form of
debug information is present and where it is. This directory consists of an
array of debug directory entries whose location and size are indicated in the
image optional header.

The debug directory can be in a discardable .debug section (if one exists), or
it can be included in any other section in the image file, or not be in a
section at all.

Each debug directory entry identifies the location and size of a block of debug
information. The specified RVA can be zero if the debug information is not
covered by a section header (that is, it resides in the image file and is not
mapped into the run-time address space). If it is mapped, the RVA is its

So if the debugging stuff is not mapped into memory, then I don't think it is going to be included in the size_of_image.

Does that make any sense, or did I misunderstand the problem?

comment:10 Changed 8 years ago by batterseapower

OK, that does make sense. That quote contradicts what I had thought the behaviour of PE was.

I'm still in favour of keeping the ticket open, because that debug info shouldn't be linked in at all (except maybe with -debug?), and it might help with our linking time problems if it wasn't.

Furthermore, I had to write the "trimmer" above because I found weird behaviour with (IIRC - it's some time ago now) Cygwin strip when used on Windows DLLs that caused it to strip out the DLL's exports, contrary to what the manpage said. If GHC binaries were svelte by default I would not have had to grapple with getting strip to do what I want.

comment:11 Changed 8 years ago by igloo


comment:12 Changed 8 years ago by igloo


comment:13 Changed 7 years ago by igloo


comment:14 Changed 7 years ago by igloo

Priority: low → lowest

comment:15 Changed 6 years ago by igloo


comment:16 Changed 4 years ago by thoughtpolice


Moving to 7.10.1.

comment:17 Changed 4 years ago by thoughtpolice


Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:18 Changed 4 years ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:19 Changed 3 years ago by ezyang

Component: Compiler → Compiler (Linking)

comment:20 Changed 3 years ago by thoughtpolice


Milestone renamed

comment:21 Changed 3 years ago by thomie

Milestone: 8.0.1