Compiler performance

This is where we track various efforts to characterize and improve the performance of the compiler itself. If you are interested in the performance of code generated by GHC, see Performance/Runtime.

Relevant tickets

Identify tickets by setting the "Type of failure" field to "Compile time performance bug".

Open Tickets:

Superclass `Monad m =>` makes program run 100 times slower
Do more coercion optimisation on the fly
Fix performance regressions from #14737
Memory strain while compiling HLint
2-fold memory usage regression GHC 8.2.2 -> GHC 8.4.1 compiling `mmark` package
Compile speed regression
Recompilation avoidance fails after a LANGUAGE change
The size of FastString table is suboptimal for large codebases
Hole-y partial type signatures greatly slow down compile times
High-memory usage during compilation using Template Haskell
Investigate performance of CoreTidy
2 modules / 2500LOC takes nearly 3 minutes to build
Minor regressions from removal of non-linear behavior from simplifier
Investigate regressions from simplifier refactor
Fix fusion for GHC's utility functions
Linker paths carry substantial N*M overhead when many libaries are used
Certain inter-module specializations run out of simplifier ticks
LLVM does not need to trash caller-saved registers.
Investigate compile-time regressions in regex-tdfa-1.2.2
Compile-time regression in 8.2 when compiling bloodhound's test suite
ghc --make seems to leak memory
Compiler allocations on sched in nofib regressed by 10% between 091333313 and 1883afb2
Why does memory usage increase so much during CoreTidy?
vector test suite uses excessive memory on GHC 8.2
compile-time memory-usage regression for DynFlags between GHC 8.0 and GHC 8.2
Poor compiler performance with type families
foldr/nil rule not applied consistently
Introduce fast path through simplifier for static bindings
Check known-key lists
Make Core Lint faster
Exponential compilation time with RWST & ReaderT stack with `-02`
Compiler allocation regressions from top-level string literal patch
family instance consistency checks are too pessimistic
Program uses 8GB of memory
Splitter is O(n^2)
Consider using compact regions in GHC itself to reduce GC overhead
GeneralizedNewtypeDeriving + MultiParamTypeClasses sends typechecker into an infinite loop
ghci -fobject-code -O2 doesn't do the same optimisations as ghc --make -O2
Don't optimize coercions with -O0
Compile time regression in GHC 8.
SIMD things introduce a metric ton of known key things
GHC panic: simplifier ticks exhausted
Performance regression with large numbers of equation-style decls
Large let bindings are 6x slower (since 6.12.x to 7.10.x)
Pattern match checker exceeded (2000000) iterations
Optimize coercionKind
Strictness signature blowup
Representation of value set abstractions as trees causes performance issues
Compiling a 10.000 line file exhausts memory
powerpc64: recomp015 fails with redundant linking
"Simplifier ticks exhausted" that resolves with fsimpl-tick-factor=200
Re-compilation driver/recomp11 test fails
TypeInType performance regressions
T3064 regresses with wildcard refactor
Deriving Read instance from datatype with N fields leads to N^2 code size growth
CallStack should not be inlined
GHC 7.10.2 takes much longer to compile some packages
Installation of SFML failed
Increased memory usage with GHC 7.10.1
Performance regression GHC 7.8.4 to GHC HEAD
dep_orphs in Dependencies redundantly records type family orphans
Unreasonable memory usage on large data structures
Long compile time/high memory usage for modules with many deriving clauses
Deriving instances is slow
unfolding info as seen when building a module depends on flags in a previously-compiled module
(super!) linear slowdown of parallel builds on 40 core machine
large performance regression in type checker speed in 7.8
Massive blowup of code size on trivial program
Transitivity of Auto-Specialization
long compilation time for module with large data type and partial record selectors
blowup in space/time for type checking and object size for high arity tuples
ghc -c recompiles TH every time while --make doesn't
GHC uses nub
Exponential behavior in instance resolution on fixpoint-of-sum
Interface hashes include time stamp of dependent files (UsageFile mtime)
TypeFamilies painfully slow
Superclass methods are left unspecialized
Regression in optimisation time of functions with many patterns (6.12 to 7.4)?
GHC compile times are seriously non-linear in program size
Compiling DynFlags is jolly slow
GHC retains unnecessary binding
Deriving Generic of a big type takes a long time and lots of space
Improve consistency checking for family instances
SpecConstr should exploit cases where there is exactly one call pattern
Improve float-in
Compilation of large source files requires a lot of RAM
ghc runs preprocessor too much

Closed Tickets:

Slowdown in ghc compile times from GHC 8.0.2 to GHC 8.2.1 when doing Called arity analysis
Underconstrained typed holes are non-performant
TH eats 50 GB memory when creating ADT with multiple constructors
Improve performance of Simplify.simplCast
GHC 8.4.1-alpha loops infinitely when typechecking
Redundant computation in fingerprintDynFlags when compiling many modules
Computing imp_finst can take up significant amount of time
Slow compile times for Happy-generated source
Compiling a function with a lot of alternatives bottlenecks on insertIntHeap
Quadratic constructor tag allocation
GHCi spins forever
GHC 8.2.1 regression: -ddump-tc-trace hangs forever
Unreasonably high memory use when compiling with profiling and -O2/-O2
The Binary instance for TypeRep smells a bit expensive
Performance Problems on AST Dump
Look into haddock performance regressions due to desugaring on -fno-code
checkFamInstConsistency dominates compile time
GHCi 2x slower without -keep-tmp-files
Bug report: "AThing evaluated unexpectedly tcTyVar a_alF"
Skylighting package compilation is glacial
3x slowdown on GHC HEAD with file containing lots of overloaded string literals
Space leak / quadratic behavior when inlining
Core string literal patch regresses compiler performance considerably
COMPLETE pragma causes compilation to hang forever under certain scenarios
Code size explosion with with inlined instances for fixed point of functor
High memory usage during compilation
Deriving Foldable causes GHC to take a long time (GHC 8.0 ONLY)
Use gold linker by default if available on ELF systems
GHC 8.0.1 uses copious amounts of RAM and time when trying to compile lambdabot-haskell-plugins
Adding an explicit export list halves compilation time.
`ghc --make` recompiles unchanged files when using `-fplugin` OPTIONS
Compilation time/space regression in GHC 8.0/8.1 (search in type-level lists and -O)
With -O1 and above causes ghc to use all available memory before being killed by OOM killer
Commit adding instances to GHC.Generics regression compiler performance
Increasing maximum constraint tuple size significantly blows up compiler allocations
'deriving Eq' on recursive datatype makes ghc eat a lot of CPU and RAM
regression: out of memory with -O2 -ddump-hi on a complex INLINE function
7% allocation regression in Haddock performance tests
Compile time performance degradation on code that uses undefined/error with CallStacks
Generics deriving is quadratic
T9872d bytes allocated has regressed terribly on 32-bit Linux
Cache coercion kinds and roles
Optimize cmpTypeX
Test TcCoercibleFail hangs with substitution sanity checks enabled
SPECIALIZE pragma does not work + compilation times regression in GHC 8.0-rc1
pandoc-types fails to build on 4 GB machine
-XTypeInType uses up all memory when used in data family instance
Solver hits iteration limit in code without recursive constraints
Type aliases twice as slow to compile as closed type families.
Pattern matching against sets of strings sharing a prefix blows up pattern checker
Split objects makes static linking really slow
New exhaustiveness checker breaks T5642
T783 regresses severely in allocations with new pattern match checker
New exhaustiveness checker breaks concurrent/prog001
New exhaustiveness checker breaks ghcirun004
-O0 -g slows GHC down on list literals (compared to -O0 without -g)
invalid fixup in runtime linker
D757 (emit Typeable at type definition site) regresses T3294 max_bytes_used by factor of two
Smaller generated Ord instances
ghc 7.8.4 on arm - panic: Simplifier ticks exhausted
Constant-time indexing of closed type family axioms
vector-0.11 compile time increased substantially with 7.10.1
Defining mapM_ in terms of traverse_ causes substantial blow-up in ByteCodeAsm
Profile ghc -j with an eye for performance issues
compile time performance regression with OverloadedStrings and Text
Regression, simplifier explosion with Accelerate, cannot compile, increasing tick factor is not a workaround
Compile time regression in OpenGLRaw
CallArity taking 20% of compile time
compiling huge HashSet hogs memory
compile-time performance regression compiling genprimcode
Performance problem with TrieMap
Excessive memory usage compiling T3064
compile-time performance regression (probably due to Generics)
poor performance when compiling modules with many Text literals at -O1
Recompilation avoidance doesn't work for -fno-code/-fwrite-interface
Compiler performance regression
Compiler memory use regression
Forcing the type to be IO {} instead of IO() causes a "panic! The impossible has happened" output.
small SPECIALIZE INLINE program taking gigabytes of memory to compile
compile hang and memory blowup when using profiling and optimization
7.8.1 uses a lot of memory when compiling attoparsec programs using <|>
Investigate recent 32bit compiler performance regressions
Exponential-long compilation of code with Implicit params
Linking in Windows is slow
GHC should not load packages for TH if they are not used
Compiling profiling CCS registration .c file takes far too long
Maintain per-generation lists of weak pointers
GHC 7.7 cannot link primitives
Memory Leak in CoreM (CoreWriter)
split-objs not supported for ARM
plugins always trigger recompilation
GHC doesn't optimise away primitive identity conversions
GHCi erroneously unloads modules after a failed :reload
New codegen more than doubles compile time of T3294
Extensive Memory usage (regression)
Regression: space leak in HEAD vs. 7.4
quadratic slowdown with very long module names
Type checker hangs
ghc with incorrect arguments deletes source file
T3016 takes long time to compile with LLVM
Compilation slowdown from 7.0.x to 7.2.x
mc03 -O -fliberate-case -fspec-constr runs out of memory
Very slow (nonterminating?) compilation if libraries compiled with -fexpose-all-unfoldings
Very slow constraint solving for type families
Simplifier performance regression (or infinite loop)
Compilation speed regression
New codegen: CmmStackLayout igraph memory explosion
ghc struggles to compile a large case statement
Slow type checking of type-level computation heavy code.
Performance regression in the type checker regression for GADTs and type families
object code size fairly large for ghc-7.0.1 with optimization
LLVM mangler takes too long at runtime
stand-alone deriving sometimes fails for GADTs
T3016 failed with timeout (hpc and optasm)
barton-mangler-bug failed with timeout (multiple ways)
Compilation performance regression
Compiler space regression in 7.0.1 RC 1
Template Haskell: Splicing Infinite Syntax Tree doesn't stop
deriving Enum fails for data instances
ghci leaks memory when loading a file
ghc 6.12.1 and 6.13.20090922 consume a lot more memory than 6.10.4 when compiling language-python package
reading a large String as Double takes too long
GHC leaks memory when compiling many files
GHC 6.12 dependency checking many times slower than 6.10
Ghc eats tremendous heaps of RAM in -prof build (highlighting-kate)
Code compiled WITHOUT profiling many times slower than compiled WITH profiling on
Large compilation time/memory consumption
Very long compile times with type functions
Reduce coercion terms to normal form
Excessive heap usage
Type-checking performance regression
Compiling with -O2 is 7x slower than -O
memory performance problem when compiling lots of derived instances in a single file
Compiling DoCon with 6.8.3 has 3x slow-down compared with 6.8.2
Use a more efficient representation than [DynFlag]
reading the package db is slow
problems with very large (list) literals
enormous compile times
Compiling with -O is 30 times slower than with -Onot
debugger: :trace is wasting time
High memory use when compiling many let bindings.

Type pile-up

Some programs can produce very deeply nested types whose size grows non-linearly with the size of the source program. See Scrap your type applications for a way to improve these bad cases.

  • #9198: large performance regression in type checker speed in 7.8
    • Types in Core blowing up quadratically (as seen in -ddump-ds output)
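To give a feel for how such quadratic growth can arise, here is a rough model, not taken from the ticket. It assumes that every pair constructor in a right-nested tuple `(x1, (x2, (... xn)))` is annotated in Core with the full types of both of its components, which is the situation Scrap your type applications sets out to improve. The function names and the node-counting convention are assumptions made for this sketch.

```python
def tuple_type_size(k):
    """Nodes in the type of a right-nested tuple with k atomic components.

    There are k leaf types plus (k - 1) pair type constructors.
    """
    return 2 * k - 1


def annotation_size(n):
    """Total size of the type arguments attached to the pair constructors
    when building a right-nested tuple of n atomic components.

    Each pair constructor carries the type of its head (1 node) and the
    full type of its tail, so tail types are repeated at every level.
    """
    total = 0
    for k in range(2, n + 1):  # a tuple of k components has a tail of k - 1
        total += 1 + tuple_type_size(k - 1)
    return total


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, annotation_size(n))
```

Under these assumptions the total works out to n * (n - 1) type nodes, so doubling the nesting depth roughly quadruples the size of the annotations.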

Coercion pile-up

One theme that pops up rather often is the production of Core with long strings of coercions, whose size scales non-linearly with the size of the types in the source program. These may or may not be due to similar root causes.

  • #8095: TypeFamilies painfully slow
    • Here a recursive type family instance leads to quadratic blow-up of coercions
    • This ticket has a discussion about a way to snip off coercions when not using -dcore-lint.
  • #7428: GHC compile times are seriously non-linear in program size
    • Here a CPS'd State monad is leading to a quadratic blowup in Core size over successive simplifier iterations
  • #5642: Deriving Generic of a big type takes a long time and lots of space
  • #14338: Simplifier fails with "Simplifier ticks exhausted"
    • Specialised dictionaries parametrized on a type-level list produce very large coercions.

One possible solution (proposed in #8095) is to eliminate coercions from the Core AST during usual compilation, instead only including them when we want to lint the Core.

Deriving instances

Another recurring theme is perceived slowness when compiling code that derives instances. This could be due to a number of reasons:

  1. the implementation of the logic responsible for producing the instance code is inefficient
  2. the instance itself is large but could be expressed more concisely
  3. the instance itself is large but irreducibly so

While it's possible to fix (1) and (2), (3) is inherent.
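One way to tell these cases apart is to measure how compile cost scales with the size of the data type. The following is a minimal sketch of a generator for such test inputs. The module name `BigN`, the type `Big`, the field names, and the default class list are all hypothetical choices for this example, not part of any existing test.

```python
def make_record_module(n_fields, classes=("Eq", "Ord", "Show", "Read")):
    """Return Haskell source for a module containing a single record type
    with n_fields Int fields and a deriving clause for the given classes."""
    if n_fields < 1:
        raise ValueError("need at least one field")
    fields = ["field%d :: Int" % i for i in range(n_fields)]
    lines = [
        "module Big%d where" % n_fields,
        "",
        "data Big = Big",
        "  { " + "\n  , ".join(fields),
        "  } deriving (" + ", ".join(classes) + ")",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a few sizes to disk so they can be compiled one by one.
    for n in (10, 50, 100):
        with open("Big%d.hs" % n, "w") as handle:
            handle.write(make_record_module(n))
```

Compiling each generated file with something like `ghc -c Big100.hs +RTS -s` and comparing the reported allocations across sizes gives a first indication: growth that is clearly faster than linear in the number of fields suggests cause (1) or (2) rather than (3).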

Uncategorised compiler performance issues

  • #2346: desugaring let-bindings
  • #10228: increase in compiler memory usage, regression from 7.8.4 to 7.10.1
  • #10289: 2.5k static HashSet takes too much memory to compile
    • Significantly improved in memory usage from #10370, but worse at overall wall-clock time!
  • #7450: Regression in optimisation time of functions with many patterns (6.12 to 7.4)?
  • #10800: vector-0.11 compile time increased substantially with 7.10.1
    • Regression in vector testsuite perhaps due to change in inlinings
  • #13639: Skylighting package compilation is glacial

nofib results

tests/perf/compiler results

7.6 vs 7.8

  • A bit difficult to decipher, since a lot of the stats/surrounding numbers were totally rewritten due to some Testsuite API overhauls.
  • The results are mixed: some metrics, such as peak_megabytes_allocated, were bumped up a lot, but many tests also had bytes_allocated go down.

7.8 vs 7.10

  • Things mostly got better according to these, not worse!
  • Many of them had drops in bytes_allocated, for example, T4801.
  • The average improvement range is something like 1-3%.
  • But one got much worse; T5837's bytes_allocated jumped from 45520936 to 115905208, 2.5x worse!
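For reference, the T5837 figure can be checked with a couple of lines of arithmetic. The helper names below are made up for this sketch; the two byte counts are the ones quoted above.

```python
def ratio(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


def percent_change(before, after):
    """Return the change from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0


t5837_before = 45520936
t5837_after = 115905208

t5837_ratio = ratio(t5837_before, t5837_after)
t5837_percent = percent_change(t5837_before, t5837_after)

print("T5837 bytes_allocated: %.2fx (%+.1f%%)" % (t5837_ratio, t5837_percent))
```

The same two helpers apply to any other pair of bytes_allocated or max_bytes_used values from the perf tests.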

7.10 vs HEAD

  • Most results actually got better, not worse!
  • Silent superclasses made HEAD drop in several places, some noticeably over 2x
    • max_bytes_used increased in some cases, but not much, probably GC wibbles.
  • No major regressions, mostly wibbles.

Compile/build times

(NB: Sporadically updated)

As of April 22nd, 2016:

  • GHC HEAD: 14m9s (via 7.8.3) (because of Joachim's call-arity improvements)
  • GHC 7.10: 15m43s (via 7.8.3)
  • GHC 7.8: 12m54s (via 7.8.3)
  • GHC 7.6: 8m19s (via 7.4.1)

Random note: GHC 7.10's build system actually disabled DPH (half a dozen more packages and probably a hundred extra modules), yet things *still* got slower over time!
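The build times above can be turned into release-over-release ratios with a small script. This is only a sketch: the function names and the `BUILD_TIMES` table are introduced here for illustration, and the comparison is rough because the 7.6 build used a different bootstrap compiler (7.4.1) from the others (7.8.3).

```python
import re


def parse_duration(text):
    """Convert a string such as '14m9s' into a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


# Build times as listed above, oldest release first.
BUILD_TIMES = [
    ("7.6", "8m19s"),
    ("7.8", "12m54s"),
    ("7.10", "15m43s"),
    ("HEAD", "14m9s"),
]


def slowdowns(times):
    """Pair each release with its build time relative to the previous one."""
    result = []
    for (_, previous), (name, current) in zip(times, times[1:]):
        result.append((name, parse_duration(current) / parse_duration(previous)))
    return result


if __name__ == "__main__":
    for name, factor in slowdowns(BUILD_TIMES):
        print("%-5s %.2fx the previous build time" % (name, factor))
```

A factor above 1 means the build got slower than the previous entry; a factor below 1 means it got faster.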

Interesting third-party library numbers

  • Compile time of some example program (fluid-tree) of fltkhs library increased from about 15 seconds to more than a minute (original message).
  • GHC takes significantly more memory compiling the xmlhtml library with -j4 than -j1 (1GB vs 150MB). See #9370.
  • The Language.Haskell.Exts.Annotated.Syntax module of haskell-src-exts takes many tens of seconds to compile. However, this may not be surprising: it consists of roughly 70 data definitions, some with many constructors, deriving (Eq,Ord,Show,Typeable,Data,Foldable,Traversable) on most of them as well as defining Functor.
  • vector-algorithms may be a nice test and reportedly got slower to compile and run in recent GHC releases.

Relevant changes

GHC 7.10 to GHC 8.0

GHC 8.0 to GHC 8.2

GHC 8.2 to GHC 8.4

Last modified on Apr 4, 2018 9:41:01 AM