I've just taken over an existing haskell package and in the process of getting it to work with ghc 8.0 discovered that when compiling one file ghc tires to use all my ram. It basically ends up using 40% of 16 Gig before being killed by the OOM (out-of-memory) killer.
Maybe not too surprisingly, removing all the INLINE pragmas (51 in 500 lines of code) allows the code to compile even with -O2. (suggested by @mpickering).
A guess would be that in 8.0’s library version, liftM is implemented in a way that mentions fmap, and liftM is inlined to expose that. Then all your unconditinally marked inline functions become effectively recursive, but still keep inlining. But didn’t investigate deeper.
This looks bad. Inlining should not go on for ever ... there's a tick counter as a last-ditch hard stop. It could just possibly be too many INLINE pragmas, but it's quite difficult to eat 16G.
Thomie, THANK YOU for the small repro case in ticket:12425#comment:122699. With its help I rapidly nailed the bug. Here's what is happening.
Suppose we have
let rec { f = ...g...g... ; g = ...f...f... }incase x of True -> ...f.. False -> ..f...
In each iteration of the simplifier the occurrence analyser OccAnal chooses a loop breaker. Suppose in iteration 1 it choose g as the loop breaker. That means it is free to inline f.
Suppose that GHC decides to inline f in the branches of the case, but (for some reason; eg it is not satureated) in the rhs of g. So we get
let rec { f = ...g...g... ; g = ...f...f... }incase x of True -> ...g...g..... False -> ..g..g....
Now suppose that, for some reason, in the next iteraion the occurrence analyser chooses f as the loop breaker, so it can freely inling g. And again for some reason the simplifer inlines g at its calls in the case branches, but not in the RHS of f. Then we get
let rec { f = ...g...g... ; g = ...f...f... }incase x of True -> ...(...f...f...)...(...f..f..)..... False -> ..(...f...f...)...(..f..f...)....
You can see where this is going! Each iteration of the simplifier doubles the number of calls to f or g. No wonder GHC is slow!
(In the particular example in ticket:12425#comment:122699, f and g are the two mutually recursive fmap instances for CondT and Result. They are both marked INLINE which, oddly, is why they don't inline in each other's RHS, because the call there is not saturated.)
The root cause is that we flip-flop on our choice of loop breaker. I always thought it didn't matter, and indeed for any single iteration to terminate, it doesn't matter. But when we iterate, it matters a lot!!
I think this was not happening before because, by a fluke, we tended to always choose the same one. But, perhaps as a resul of Bartosz's work on making GHC deterministic, the occurrence analyser now reliably flip-flops in each iteration, as above.
What to do? We want the occurrence analyser to choose a loop breaker based on stable criteria, not arbitrary things. Simple idea: if two or more potential loop breakers look equally plausible, choose the one that was a loop breaker in the previous iteration. That should make it stable.
Stay tuned. But I thought I'd explain what is happening in case I get distracted.
I built a compiler from git HEAD, but that wouldn't build the dependencies for my library (numerous problems with the primitve and vector libraries).
I tried checking out the ghc-8.0 branch and then cherry-picking commit d03e41b4f5 but that failed due to merge conflicts in compiler/simplCore/OccurAnal.hs that seem rather tricky to resolve.
Sorry erikd, several of those dependencies not building on GHC HEAD were probably my fault. After adjusting some template-haskell upper version bounds on dependent packages on Hackage, I was able to successfully install conduit-find with GHC HEAD, and I did not observe any noticeable slowness.
Thanks @RyanGlScott . As long it compiled with -O1 or above, then its fixed. The original problem was GHC chewing up all memory when *compiling* conduit-find.
Just for the record, this bug can be reproduced by compiling Hackage's find-conduit 0.4.4 with GHC 8.0.1. find-conduit is the old version of conduit-find prior to a maintainer change, and prior to the workarounds to avoid this bug that have been incorporated into conduit-find.