In this repo there is a small program which performs much better with a late specialisation pass. There is a plugin which implements this pass. Instructions about how to build the repo are in the README.
The code in question first uses a type class to generate an overloaded function. The overloaded function is not immediately apparent: it is defined in terms of combinators which must be inlined first, and only after that do we get calls to fmap next to a dictionary which can be specialised on.
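To make the shape of the problem concrete, here is a minimal sketch. It is not the code from the repo: HasCount, Box, over and bump are hypothetical names invented for illustration, and a program this small may well be optimised fine without any extra pass. It only shows how fmap ends up hidden behind a combinator and a class method whose result type is itself constrained.

```haskell
module Main where

import Data.Functor.Identity (Identity (..))

-- Hypothetical class (not from the repo): the method's type is itself
-- constrained, so every instance method takes a Functor dictionary.
class HasCount a where
  count :: Functor f => (Int -> f Int) -> a -> f a

data Box = Box Int String
  deriving (Eq, Show)

instance HasCount Box where
  -- This fmap goes through whichever Functor dictionary the caller passes.
  count k (Box n s) = fmap (\n' -> Box n' s) (k n)

-- Hypothetical combinator: it fixes the functor to Identity, but that only
-- becomes visible at a use site once `over` has been inlined.
over :: ((b -> Identity b) -> a -> Identity a) -> (b -> b) -> a -> a
over l g = runIdentity . l (Identity . g)
{-# INLINE over #-}

-- The overloaded function. Specialising it to Box exposes a call to the
-- Box instance of `count` applied to the Functor Identity dictionary,
-- which is a second, later specialisation opportunity.
bump :: HasCount a => a -> a
bump = over count (+ 1)

main :: IO ()
main = print (bump (Box 1 "x"))
```

Here the first specialisation (bump at Box) is what reveals the second one (count at Identity), which is the kind of opportunity a single early specialisation pass cannot see.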
Diffing the Core output immediately shows where the difference is. In the bad version there are lots of calls to fmap which are not eliminated, because the function containing them is not specialised.
I don't know if this is related. I asked Johan Tibell the other day why unordered-containers marks almost everything INLINE instead of INLINABLE. He replied that when an INLINE function calls an INLINABLE one, we end up calling to specialize. He also indicated that he'd opened a ticket about this long ago; I don't know which one.
commit afad5561d88f04744c398ef0640d846db6262aa0
Author: Matthew Pickering <matthewtpickering@gmail.com>
Date:   Mon Mar 19 13:29:14 2018 -0400

    Add -flate-specialise which runs a later specialisation pass

    Runs another specialisation pass towards the end of the optimisation
    pipeline. This can catch specialisation opportunities which arose from
    the previous specialisation pass or other inlining.

    You might want to use this if you are you have a type class method
    which returns a constrained type. For example, a type class where one
    of the methods implements a traversal.

    It is not enabled by default or any optimisation level. Only by
    manually enabling the flag `-flate-specialise`.

    Reviewers: bgamari

    Reviewed By: bgamari

    Subscribers: rwbarton, thomie, carter

    Differential Revision: https://phabricator.haskell.org/D4457
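Since the flag is not implied by any optimisation level, it has to be requested explicitly. A minimal sketch of doing so per module, assuming a GHC that includes the commit above: the class Container and the type Pair are hypothetical and stand in for "a type class where one of the methods implements a traversal". Whether the extra pass changes anything for this particular module is not something the sketch demonstrates.

```haskell
{-# OPTIONS_GHC -O2 -flate-specialise #-}
-- The pragma above requests the late specialisation pass for this module only.
module Main where

-- Hypothetical class whose method implements a traversal, so its result
-- type carries its own Applicative constraint.
class Container t where
  visit :: Applicative f => (a -> f a) -> t a -> f (t a)

newtype Pair a = Pair (a, a)
  deriving (Eq, Show)

instance Container Pair where
  visit k (Pair (x, y)) = (\x' y' -> Pair (x', y')) <$> k x <*> k y

main :: IO ()
main = print =<< visit (\x -> pure (x * 2)) (Pair (1 :: Int, 2))
```

The same flag can instead be set for a whole package through the ghc-options field of the cabal file.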