When trying to embed some files into an executable using Template Haskell, I find that memory usage during compilation exceeds 4GB and often crashes my laptop. The files I am trying to embed are only about 25MB in size (35MB in total).
I made a somewhat minimal example to demonstrate this problem. To embed the files, I am using the `file-embed` package (the issue persists when using the alternative `wai-app-static` package too). The demonstration code runs on Linux and is available here - https://github.com/donatello/file-embed-exp. To try it out, just clone the repository and run `make` (it uses the Haskell Stack tool and the Linux `dd` utility).
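For readers who do not want to clone the repository, the kind of embedding in question looks roughly like the sketch below. This is not the code from the repository; the path `assets/blob.bin` is a hypothetical placeholder, and the file has to exist when the module is compiled.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main (main) where

import qualified Data.ByteString as BS
import Data.FileEmbed (embedFile)

-- The splice runs at compile time and turns the file contents into a
-- ByteString literal. "assets/blob.bin" is a hypothetical path.
blob :: BS.ByteString
blob = $(embedFile "assets/blob.bin")

main :: IO ()
main = print (BS.length blob)
```

Compiling a module like this with a file of a few MB should be enough to observe the memory growth described above.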
This appears to be an issue in GHC. Is there any way to mitigate the issue in the current version?
No, compiling with -O0 or -O2 has no effect. I see that embedding a 3MB file takes over 2.5GB of RAM!
I have updated the code to use only cabal and have managed to inline specific parts of file-embed (I am not very familiar with Template Haskell) - the problem still persists. Now I am only trying to embed a 3MB file (created by the Makefile).
The embedded bytestring generates a large literal bytestring in the assembly code, represented by `CmmString [Word8]`. The `pprASCII` function generates a list of Lit SDoc values and then uses `hcat` to combine them.
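To make the cost concrete, here is a simplified sketch of this style of escaping. It is not GHC's actual `pprASCII`; the names `escapeByte` and `escapeAll` are made up for illustration, and plain `String` stands in for SDoc. Each input byte becomes its own small piece of output, and non-printable bytes expand to a four-character octal escape.

```haskell
module Main (main) where

import Data.Char (chr, isAscii, isPrint)
import Data.Word (Word8)
import Numeric (showOct)

-- Escape a single byte for use inside an assembler string literal.
-- Printable ASCII is kept as-is; everything else becomes \ooo (octal).
escapeByte :: Word8 -> String
escapeByte w = case chr (fromIntegral w) of
  '\t' -> "\\t"
  '\n' -> "\\n"
  '"'  -> "\\\""
  '\\' -> "\\\\"
  c | isAscii c && isPrint c -> [c]
    | otherwise              -> '\\' : pad (showOct w "")
  where
    -- Left-pad the octal digits to exactly three characters.
    pad s = replicate (3 - length s) '0' ++ s

-- One small piece per input byte, concatenated at the end.
escapeAll :: [Word8] -> String
escapeAll = concatMap escapeByte

main :: IO ()
main = putStrLn (escapeAll [72, 105, 0, 255, 10])
```

With binary data most bytes take the octal branch, so the intermediate output is several times larger than the input before it is ever written out.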
I have made some optimizations to `pprASCII` in D4384. After this patch, `pprASCII` still accounts for most of the memory allocation, but the total allocation drops significantly.
Before:
total time = 2.43 secs (2429 ticks @ 1000 us, 1 processor)
total alloc = 4,741,422,496 bytes (excludes profiling overheads)
After:
total time = 0.85 secs (851 ticks @ 1000 us, 1 processor)
total alloc = 1,343,531,416 bytes (excludes profiling overheads)
Thank you for the fix, it looks promising - but I am not sure if the problem is completely solved.
The profiling output says that total allocations were reduced from 4.7GB to 1.3GB, which is a 3.5x improvement. However, the goal in my initial program was to embed ~100MB of static data, whereas the bug report demonstrates the problem with only a 3MB embedded string.
Is there any way I could get a built version of the ghc master for 64-bit x86 Linux (from a CI server perhaps), so I could try it out myself?
I'm not sure embedding a 100mb file into a program is really supported. What are you doing after you embed this file? Can't you just read the file when the program runs?
I want to embed some static assets used by my program (which is also built as a static binary), into the binary itself to enable easy distribution/deployment - simply download and execute a single (binary) file. It is quite common in some other languages (e.g. https://github.com/elazarl/go-bindata-assetfs#readme).
Due to this issue, I am currently reading the static assets in at start, but I would prefer to build all the assets into the binary itself.
The PPA does not seem to have the most recent commits, so I will wait for it to be updated before I try this out.
Embedding ~100MB of static data in Haskell code may consume around 40GB of memory. Currently in TH, `StringPrimL` is built with `[Word8]` rather than `ByteString`.
Unpacking a ~100MB bytestring to `[Word8]` and escaping it already consumes GBs of memory.
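As a rough back-of-the-envelope check of the `[Word8]` cost alone, the sketch below assumes a 64-bit platform, three words per cons cell, and two words per boxed `Word8` when the boxes are not shared. These are estimates of heap layout, not measurements, and they only describe peak usage if the whole list is live at once.

```haskell
module Main (main) where

-- Bytes per machine word, assuming a 64-bit platform.
wordBytes :: Integer
wordBytes = 8

-- A cons cell: header word, pointer to the head, pointer to the tail.
consBytes :: Integer
consBytes = 3 * wordBytes

-- A boxed Word8: header word plus one payload word (assumed not shared).
boxBytes :: Integer
boxBytes = 2 * wordBytes

-- Lower bound: cons cells only (every element box shared).
listBytesLower :: Integer -> Integer
listBytesLower n = n * consBytes

-- Upper bound: every element also has its own box.
listBytesUpper :: Integer -> Integer
listBytesUpper n = n * (consBytes + boxBytes)

main :: IO ()
main = do
  let n = 100 * 1024 * 1024  -- a 100MB file, one list element per byte
  print (listBytesLower n)   -- 2516582400
  print (listBytesUpper n)   -- 4194304000
```

That puts the list itself at roughly 2.4GB to 4GB for a 100MB file under these assumptions, before any escaping or pretty-printing happens.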