Version 11 (modified by thoughtpolice, 3 years ago)


New Plugins work

Max originally did the work on GHC plugins during his GSoC 2008 hacking sprint. It involved the implementation of annotations as well as a dynamic loading mechanism for GHC. While the annotations work was merged into GHC HEAD, the loading infrastructure was not. This document describes the current work (as of 2011) to get it integrated into GHC HEAD so you can write Core plugins, and future extensions to the interface, primarily writing C-- passes and new backends.

1/17/11: I (Austin Seipp) am working on getting the patch cleaned up a little more and tidying it up before it gets integrated. Still need testsuite patches.

NB. Ridiculously incomplete writing/documentation.

Current overview

Get GHC from its HEAD repository, and apply this patch:

Then build GHC like normal.

Now GHC understands the -fplugin and -fplugin-arg options. You install plugins for GHC essentially by cabal-installing them, and then invoking GHC along the lines of:

$ ghc -fplugin=Some.Plugin.Module -fplugin-arg=Some.Plugin.Module:no-fizzbuzz a.hs

Some.Plugin.Module should export a symbol named 'plugin' - see the following repository for an example that does Common Subexpression Elimination:

Basic overview of the plugins API for Core

Modules can be loaded by GHC as compiler plugins by exposing a declaration called 'plugin' of type 'GHCPlugins.Plugin', which is an ADT containing a function that installs a pass into the Core pipeline.

module Some.Plugin.Module (plugin) where
import GHCPlugins

plugin :: Plugin
plugin = defaultPlugin {
  installCoreToDos = install
  }

-- type CommandLineOption = String

install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
install _options passes = do
    -- Splice your own passes into 'passes' here; returning the list
    -- unchanged installs nothing.
    return passes

We can think of a CoreToDo as being a type synonym for (Core -> Core) - that is, an installation function inserts a pass into the compilation pipeline simply by splicing it into the list of passes and returning the result. For example, the CSE plugin adds a gentle simplification pass followed by CSE to the front of the pipeline:

module CSE.Plugin where


install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
install _options todos = do
    -- You should probably run this with -fno-cse !
    return $ CoreDoPasses [defaultGentleSimplToDo, cse_pass] : todos

cse_pass = CoreDoPluginPass "Plugged-in common sub-expression" (BindsToBindsPluginPass cseProgram)

More specifically, a CoreToDo describes a particular pass over a Core program that can be invoked as many times as you like. For reference, defaultGentleSimplToDo is constructed using CoreDoSimplify. In this case, we construct a CoreDoPluginPass, which takes a name and a PluginPass, which looks like the following:

data PluginPass = BindsToBindsPluginPass ([CoreBind] -> CoreM [CoreBind]) -- ^ Simple pass just mutating the Core bindings
                | ModGutsToBindsPluginPass (ModGuts -> CoreM [CoreBind])  -- ^ Pass that has access to the information from a 'ModGuts'
                                                                          -- from which to generate its bindings
                | ModGutsToModGutsPluginPass (ModGuts -> CoreM ModGuts)   -- ^ Pass that can change everything about the module being compiled.
                                                                          -- Do not change any field other than 'HscTypes.mg_binds' unless you
                                                                          -- know what you're doing! Plugins using this are unlikely to be stable
                                                                          -- between GHC versions

Most people will be using the first case - that is, writing a BindsToBindsPluginPass that just manipulates every individual Core binding.
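As a toy illustration of that first case, here is a hypothetical BindsToBindsPluginPass that leaves the program untouched but reports how many top-level bindings it saw. The names countBinds and count_pass are made up for this sketch; it assumes the patched API described above, plus CoreMonad's putMsgS for logging.

```haskell
-- A do-nothing BindsToBindsPluginPass: log the number of top-level
-- bindings, then hand them back unchanged.
countBinds :: [CoreBind] -> CoreM [CoreBind]
countBinds binds = do
    putMsgS ("countBinds: " ++ show (length binds) ++ " top-level bindings")
    return binds

count_pass :: CoreToDo
count_pass = CoreDoPluginPass "Bind counter" (BindsToBindsPluginPass countBinds)
```

An install function would then splice count_pass into the pipeline exactly as the CSE example does.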

The Future

Plugins for Cmm

Aside from manipulating the core language, we would also like to manipulate the C-- representation GHC generates for modules too.

The new code generator (more precisely, the optimizing conversion from STG to Cmm part of GHC) based on hoopl for dataflow analysis is going to be merged into HEAD Real Soon Now. It would be best perhaps to leave this part of the interface alone until it is merged - clients of the interface could then use hoopl to write dataflow passes for Cmm.

  • New code generator is based on several phases:
    • Conversion from STG to Cmm
    • Basic optimisations
    • CPS Conversion
    • More basic optimisations on CPS form
    • Conversion to old Cmm representation, then passed to backend code generator.
  • We need some sort of interface to describe how to insert it into the pipeline - is the Core approach best here?
  • Add new interface to Plugin next to installCoreToDos i.e. installCmmPass, that installs a pass of type CmmGraph -> CmmGraph into the optimization pipeline somehow.
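One possible shape for that interface, purely hypothetical and mirroring installCoreToDos - none of these names exist in GHC yet:

```haskell
-- Hypothetical sketch only: a Cmm pass is just a graph transformer,
-- and installation splices passes into a list, as for Core.
type CmmPass = CmmGraph -> CmmGraph

-- A Plugin field alongside installCoreToDos might look like:
--   installCmmPasses :: [CommandLineOption] -> [CmmPass] -> [CmmPass]
installCmmPasses :: [CommandLineOption] -> [CmmPass] -> [CmmPass]
installCmmPasses _options passes = myCmmPass : passes
  where
    myCmmPass :: CmmPass
    myCmmPass = id   -- a real pass would use hoopl to rewrite the graph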

New Backends

Backends could be written using plugins as well. This would make it possible, for example, to pull the LLVM code generator out of GHC and into a cabal package using the llvm bindings on hackage (like the dragonegg plugin for GCC), among other crazy things.

  • New interface to Plugin that is used by CodeOutput for custom backends?
    • TODO FIXME any assumptions about the backend that would invalidate this general idea?

Currently, when in use, the new code generator converts its Hoopl-based Cmm to the old Cmm representation so it can be passed on to the existing native code generators. So adding this part of the API is fairly independent of the status of the new backend - the backend API just has to consume the old Cmm representation.

All backends are given the final Cmm programs in the form of the RawCmm datatype.

Possible interface: extend Plugin's constructor with a new field that can hold a Cmm backend (TODO should DynFlags argument to plugin be replaced with type [CommandLineOption] that is already in use?)

type CmmBackend = DynFlags -> FilePath -> [RawCmm] -> IO ()
type CmmBackendPlugin = Maybe (String, CmmBackend)

data Plugin = Plugin {
  installCmmBackend :: CmmBackendPlugin
  -- plus the existing fields (installCoreToDos, ...)
  }

defaultPlugin = Plugin {
  installCmmBackend = Nothing
  }

Then, to use:

module Some.Cmm.Plugin (plugin) where
import GHCPlugins

plugin :: Plugin
plugin = defaultPlugin {
  installCmmBackend = Just ("Wharble code generator backend", backend)
  }

backend :: DynFlags -> FilePath -> [RawCmm] -> IO ()
backend dflags filenm flat_absC =
    -- A real backend would emit code for 'flat_absC' into 'filenm' here.
    return ()

backend is expected, roughly, to produce some sort of intermediate code (like .S files for GNU as, or .bc files for LLVM).

Modifications to compiler pipeline:

  • Dynamic code loading can be provided by the same code that works for Core plugins, so this is DONE
  • Extending HscTarget to recognize the new compilation output case
    • Might not be necessary. We can load plugins whenever, and scrutinize the 'installCmmBackend' field: if it is Nothing, invoke the normal pipeline; otherwise call the plugin's backend and exit afterwards.
  • Modify compiler/main/CodeOutput.lhs to invoke the plugin callback.
    • Should plugin-based backends automatically take priority over built-in backends (i.e., if one is loaded through -fplugin, it gets used, no questions asked)?
  • DriverPipeline needs to be aware of how to integrate a new backend into the overall compilation phase - for example, see compiler/main/DriverPipeline.hs, specifically runPhase, which runs the LLVM optimizer, compiler and LLVM mangler when the LLVM backend is invoked. Afterwards, the assembler is invoked on the resultant object files.
    • Even though backends are normally responsible for code generation up to, but not including, linking, Cmm backends need some concept of how to link the final program, and GHC needs to give them the necessary information. A plugin may well want to do its own linking or final compilation steps for good reasons - the backend could be doing something fancy and need to control the final link.
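The dispatch described in the bullets above (scrutinizing installCmmBackend and preferring a plugin backend when present) might be sketched roughly like this. This is hypothetical driver-side code: nativeCodeOutput is a stand-in name for the existing CodeOutput path, not a real function.

```haskell
-- Hypothetical sketch of the CodeOutput dispatch: if any loaded plugin
-- offers a Cmm backend, use the first one; otherwise fall through to the
-- built-in backends.
codeOutput :: DynFlags -> FilePath -> [RawCmm] -> [Plugin] -> IO ()
codeOutput dflags filenm rawcmms plugins =
    case [ b | p <- plugins, Just (_name, b) <- [installCmmBackend p] ] of
      (backend:_) -> backend dflags filenm rawcmms
      []          -> nativeCodeOutput dflags filenm rawcmms  -- existing path
```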