Changes between Version 3 and Version 4 of Ghc/Hooks


Ignore:
Timestamp:
Sep 5, 2013 12:52:36 AM (2 years ago)
Author:
luite
Comment:

--

  • Ghc/Hooks

    v3 v4  
    11= GHC API Hooks =
    22
    3 This document describes a proposal for a **hooks** interface to customise certain phases of GHC's source transformation facilities.  A hook is simply a callback that is called at certain points in the program.
     3This document describes a proposal for a **hooks** interface to customise certain phases of GHC's source transformation facilities.  A hook is a callback, supplied by the user of the GHC API, that overrides the built-in functionality when installed.
    44
    55== The Problem ==
    66
    7 The GHC API can be used for many different purposes, but some of these purposes require deviating behaviour from what GHC would normally do.  For example, if we want to use GHC as a front-end to a Haskell compiler, we may want to replace the code generator with a custom backend (e.g., a byte code format or JavaScript code).  Similarly, some tools want to inject information into the compiler (e.g., during quasi-quoting).
     7Much of GHC is exposed as a library, making it possible to use the compiler in various new ways, like extracting type information from Haskell code or generating code from one of the intermediate representations. Often, these use cases require some change from the default behaviour at certain points during compilation. For example, someone writing a custom code generator would want to reuse as much of the GHC pipeline as possible, to get dependency chasing and recompilation avoidance, but replace the part where the native code generator would normally be called.
     8
     9Unfortunately, the GHC API offers no configuration options for this, and it is hard to customize the compiler manually. For example, once {{{load LoadAllTargets}}} is called, GHC runs its own pipeline; replacing specific functionality would require the user to copy large parts of `GhcMake` and `DriverPipeline`.
    810
    911== Proposed Solution ==
     
    1113One option is to split GHC's stages into lots of small function calls and allow the user to wire these stages together as needed.  Unfortunately, this is very difficult to do and wouldn't necessarily be very flexible since there are a number of expected invariants not encoded in the types.
    1214
    13 Instead we identify "interesting" places inside the compiler and allow users of the GHC API to specify a call-back that gets invoked when execution reaches that place.  For example, instead of calling {{{typecheckRename}}} function directly, we first look up whether there is a hook specified for and if so call that function instead.  The hook may choose to perform its own renaming and type checking passes (unlikely) or call the GHC's {{{typecheckRename}}} function and inspect its output and generate additional files on disk.
     15Instead we propose hooks: Hooks are an extensible way to replace specific parts of the GHC pipeline by user-supplied callbacks. The goal is to add them to strategic points in the library, to cover most use cases without sprinkling hooks carelessly throughout the GHC code. The API should be considered in flux, since it could require some experimentation to find the best hook locations.
     16
     17An example that uses all currently implemented hooks, along with notes on who uses them, can be found here:
     18
     19[https://gist.github.com/luite/6444273 Hooks demonstration program]
    1420
    1521== Example ==
     22{{{
     23hooksExample :: [Located String] -> [FilePath] -> IO ()
     24hooksExample args targetFiles =
     25    defaultErrorHandler defaultFatalMessager defaultFlushOut $
     26      runGhc (Just libdir) $ do
     27        dflags0 <- getSessionDynFlags
     28        (dflags1, _leftover, _warns) <- parseDynamicFlags dflags0 args
     29        let ah hk h dfs = dfs { hooks = insertHook hk h (hooks dfs) }
     30            dflags2 = dflags1 { hooks = insertHook RunQuasiQuoteHook myRunQuasiQuote (hooks dflags1) }
     31        setSessionDynFlags dflags2
     32        setTargets =<< mapM (\file -> guessTarget file Nothing) targetFiles
     33        successFlag <- sourceErrorHandler (load LoadAllTargets)
     34        when (failed successFlag) (throw $ ExitFailure 1)
     35
     36sourceErrorHandler m = handleSourceError (\e -> do
     37  GHC.printException e
     38  liftIO $ exitWith (ExitFailure 1)) m
     39
     40myRunQuasiQuote :: HsQuasiQuote Name -> RnM (HsQuasiQuote Name)
     41myRunQuasiQuote q@(HsQuasiQuote name span quoted) = do
     42  liftIO $ putStrLn ("myRunQuasiQuote: running quasiquoter on\n" ++ show quoted)
     43  return q -- don't change the quote or quoter for the example
     44}}}
     45
     46`myRunQuasiQuote` is called for every quasiquote.
     47
     48[https://gist.github.com/luite/6444273 Demonstration program that uses all hooks]
     49
     50== Design ==
     51
     52Each hook has a potentially different type from all the other hooks. Additionally, we need to be able to communicate hooks to all the locations where they may be invoked. We can achieve this by storing the hooks in the {{{DynFlags}}}.
     53
     54Unfortunately, it is impossible to store them directly as a record, since that would lead to huge cyclic imports: the data types used by the hooks would depend on {{{DynFlags}}}, but {{{DynFlags}}} would depend on the modules defining the types in the hooks record.
     55
     56Instead we implement {{{Hooks}}} as a heterogeneous map. The public API allows one to insert and look up hooks safely; the correctness of the types is guaranteed by the {{{Hook}}} type family. Internally, the {{{TypeRep}}} of {{{a}}} is used as the actual key for {{{Hook a}}}:
     57
     58{{{
     59newtype Hooks = Hooks (M.Map TypeRep Any)
     60
     61type family Hook a :: *
     62
     63insertHook :: Typeable a => a -> Hook a -> Hooks -> Hooks
     64insertHook tag hook (Hooks m) =
     65  Hooks (M.insert (typeOf tag) (unsafeCoerce hook) m)
     66
     67lookupHook :: Typeable a => a -> Hooks -> Maybe (Hook a)
     68lookupHook tag (Hooks m) =
     69  fmap unsafeCoerce (M.lookup (typeOf tag) m)
     70}}}
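
The map above can be tried outside of GHC. The following is a minimal standalone sketch, assuming GHC 8.0 or later (where {{{Data.Kind}}} exists and {{{Typeable}}} instances are derived automatically). The keys {{{RenderHook}}} and {{{ScaleHook}}}, the helper {{{emptyHooks}}} and the example values are hypothetical names made up for illustration; they are not part of the GHC API.

{{{
{-# LANGUAGE TypeFamilies #-}

import qualified Data.Map as M
import Data.Kind (Type)
import Data.Typeable (Typeable, TypeRep, typeOf)
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)

-- The heterogeneous map: values are stored untyped, keyed by the
-- TypeRep of the hook's key type.
newtype Hooks = Hooks (M.Map TypeRep Any)

-- Hypothetical helper: a map with no hooks installed.
emptyHooks :: Hooks
emptyHooks = Hooks M.empty

-- Maps a key type to the type of the hook stored under it.
type family Hook a :: Type

insertHook :: Typeable a => a -> Hook a -> Hooks -> Hooks
insertHook tag hook (Hooks m) =
  Hooks (M.insert (typeOf tag) (unsafeCoerce hook) m)

-- The coercion back is safe only because insertHook is the sole way
-- to add an entry, so the value under key 'a' always has type 'Hook a'.
lookupHook :: Typeable a => a -> Hooks -> Maybe (Hook a)
lookupHook tag (Hooks m) =
  fmap unsafeCoerce (M.lookup (typeOf tag) m)

-- Two hypothetical keys with different hook types.
data RenderHook = RenderHook
type instance Hook RenderHook = Int -> String

data ScaleHook = ScaleHook
type instance Hook ScaleHook = Double -> Double

exampleHooks :: Hooks
exampleHooks =
    insertHook RenderHook (\n -> "#" ++ show n)
  . insertHook ScaleHook (* 2)
  $ emptyHooks

-- Each lookup returns a function of the type declared for its key.
rendered :: Maybe String
rendered = fmap ($ 7) (lookupHook RenderHook exampleHooks)

scaled :: Maybe Double
scaled = fmap ($ 1.5) (lookupHook ScaleHook exampleHooks)
}}}

Loaded into GHCi, {{{rendered}}} evaluates to {{{Just "#7"}}} and {{{scaled}}} to {{{Just 3.0}}}. Passing a hook of the wrong type to {{{insertHook}}} is rejected at compile time, which is the guarantee the type family provides.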
     71
     72== Installing a hook ==
     73
     74To use a hook, just add it to the {{{DynFlags}}} for your session, using the hook's key:
     75
     76{{{
     77dflags1 = dflags0 { hooks = insertHook SomeHook myImplementation (hooks dflags0) }
     78}}}
     79
     80Keep in mind that Hooks is a low-level API, so it is easy to break things. Also, in some cases inserting one hook may require inserting another. For example, if you use `TcForeignsHook` to accept extra types for your foreign imports, you'll need `DsForeignsHook` to desugar them; otherwise GHC will not know what to do with them.
     81
     82== Making a new hook ==
     83
     84Every hook requires a key, a data type that has to implement {{{Typeable}}}, since we use its {{{TypeRep}}} as a globally unique key for the {{{Hooks}}} map. Additionally, a type family instance of {{{Hook}}} is required, to map the key to the actual hook type. You might want to export the original unhooked function, or extra types and functions that users of the hook will need.
     85
     86If you're in a monad with a {{{HasDynFlags}}} instance, you can use the {{{getHooked}}} function from {{{DynFlags}}}:
     87
     88{{{
     89data HscFrontendHook = HscFrontendHook deriving Typeable
     90type instance Hook HscFrontendHook = ModSummary -> Hsc TcGblEnv
     91
     92genericHscFrontend :: ModSummary -> Hsc TcGblEnv
     93genericHscFrontend mod_summary =
     94  getHooked HscFrontendHook genericHscFrontend' >>= ($ mod_summary)
     95
     96-- original function
     97genericHscFrontend' :: ModSummary -> Hsc TcGblEnv
     98genericHscFrontend' mod_summary
     99    | ExtCoreFile <- ms_hsc_src mod_summary =
     100        panic "GHC does not currently support reading External Core files"
     101    | otherwise = do
     102         hscFileFrontEnd mod_summary
     103}}}
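
To make the role of {{{getHooked}}} concrete, here is a standalone sketch of how such a function could be written. Everything in it is a simplified stand-in: {{{DynFlags}}} is reduced to a single {{{hooks}}} field, {{{Env}}} is a hypothetical reader-style monad, and {{{GreetHook}}}, {{{greet}}} and {{{greet'}}} are invented for the example. The definition of {{{getHooked}}} shown here is an assumption about its shape, not a copy of the GHC source.

{{{
{-# LANGUAGE TypeFamilies #-}

import qualified Data.Map as M
import Data.Kind (Type)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable, TypeRep, typeOf)
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)

-- Hooks map, as in the Design section.
newtype Hooks = Hooks (M.Map TypeRep Any)
type family Hook a :: Type

emptyHooks :: Hooks
emptyHooks = Hooks M.empty

insertHook :: Typeable a => a -> Hook a -> Hooks -> Hooks
insertHook tag hook (Hooks m) =
  Hooks (M.insert (typeOf tag) (unsafeCoerce hook) m)

lookupHook :: Typeable a => a -> Hooks -> Maybe (Hook a)
lookupHook tag (Hooks m) = fmap unsafeCoerce (M.lookup (typeOf tag) m)

-- Simplified stand-ins for GHC's DynFlags and HasDynFlags.
newtype DynFlags = DynFlags { hooks :: Hooks }

class HasDynFlags m where
  getDynFlags :: m DynFlags

-- Hypothetical reader-style monad that carries the DynFlags.
newtype Env a = Env { runEnv :: DynFlags -> a }

instance Functor Env where
  fmap f (Env g) = Env (f . g)
instance Applicative Env where
  pure = Env . const
  Env f <*> Env g = Env (\d -> f d (g d))
instance Monad Env where
  Env g >>= k = Env (\d -> runEnv (k (g d)) d)
instance HasDynFlags Env where
  getDynFlags = Env id

-- Assumed shape of getHooked: return the installed hook,
-- or the supplied default when none is installed.
getHooked :: (Functor f, HasDynFlags f, Typeable a)
          => a -> Hook a -> f (Hook a)
getHooked tag def =
  fmap (fromMaybe def . lookupHook tag . hooks) getDynFlags

-- A hypothetical hook, following the same pattern as HscFrontendHook.
data GreetHook = GreetHook
type instance Hook GreetHook = String -> Env String

greet :: String -> Env String
greet name = getHooked GreetHook greet' >>= ($ name)

-- original function
greet' :: String -> Env String
greet' name = pure ("hello, " ++ name)

unhooked, hooked :: String
unhooked = runEnv (greet "ghc") (DynFlags emptyHooks)
hooked   = runEnv (greet "ghc")
             (DynFlags (insertHook GreetHook
                          (\n -> pure ("HOOKED " ++ n)) emptyHooks))
}}}

With no hook installed, {{{unhooked}}} is {{{"hello, ghc"}}}; with the hook installed, {{{hooked}}} is {{{"HOOKED ghc"}}}. Callers of {{{greet}}} do not need to know whether a hook is present.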
     104
     105Otherwise, use the hooks field from `DynFlags` directly:
    16106
    17107{{{
    18108
    19   withGhc libdir $ do
    20     dflags0 <- getSessionDynFlags
    21     let dflags1 = dflags0{ hooks = insertHook LocateLibHook myLocateLib
    22                                  . insertHook LinkDynLibHook myLinkDynLibHook
    23                                  $ hooks dflags0 }
    24     setSessionDynFlags dflags1
     109data HscCompileOneShotHook =
     110  HscCompileOneShotHook deriving Typeable
     111type instance Hook HscCompileOneShotHook =
     112  HscEnv -> FilePath -> ModSummary -> SourceModified -> IO HscStatus
    25113
    26 myLocateLib :: DynFlags -> Bool -> [FilePath] -> String -> IO LibrarySpec
    27 myLocateLib
     114 hscCompileOneShot :: HscEnv
     115                   -> FilePath
     116                   -> ModSummary
     117                   -> SourceModified
     118                   -> IO HscStatus
     119hscCompileOneShot env =
     120  fromMaybe hscCompileOneShot'
     121    (lookupHook HscCompileOneShotHook . hooks . hsc_dflags $ env) env
    28122
    29 myLinkDynLibHook :: DynFlags -> [FilePath] -> [PackageId] -> IO ()
    30 myLinkDynLibHook dflags paths ids = do
     123-- original function
     124hscCompileOneShot' :: HscEnv
     125                   -> FilePath
     126                   -> ModSummary
     127                   -> SourceModified
     128                   -> IO HscStatus
     129hscCompileOneShot' hsc_env extCore_filename mod_summary src_changed
     130   = ...
    31131}}}
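
One reason to export the original unhooked function is that a hook can then delegate to it and only add behaviour on top. The sketch below illustrates this with the direct-lookup pattern. {{{Session}}}, {{{CompileHook}}}, {{{compile}}}, {{{compile'}}} and {{{loggingCompile}}} are hypothetical names for this example only and do not correspond to GHC functions.

{{{
{-# LANGUAGE TypeFamilies #-}

import qualified Data.Map as M
import Data.Kind (Type)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable, TypeRep, typeOf)
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)

-- Hooks map, as in the Design section.
newtype Hooks = Hooks (M.Map TypeRep Any)
type family Hook a :: Type

emptyHooks :: Hooks
emptyHooks = Hooks M.empty

insertHook :: Typeable a => a -> Hook a -> Hooks -> Hooks
insertHook tag hook (Hooks m) =
  Hooks (M.insert (typeOf tag) (unsafeCoerce hook) m)

lookupHook :: Typeable a => a -> Hooks -> Maybe (Hook a)
lookupHook tag (Hooks m) = fmap unsafeCoerce (M.lookup (typeOf tag) m)

-- Hypothetical stand-in for an environment that carries the hooks.
newtype Session = Session { sessionHooks :: Hooks }

data CompileHook = CompileHook
type instance Hook CompileHook = Session -> String -> [String]

-- Hooked entry point: use the installed hook, else the original.
compile :: Session -> String -> [String]
compile env =
  fromMaybe compile' (lookupHook CompileHook (sessionHooks env)) env

-- original function
compile' :: Session -> String -> [String]
compile' _ src = ["compiled " ++ src]

-- A hook that runs the original and appends its own step.
-- It must call compile', not compile: calling the hooked entry
-- point from inside the hook would look up the hook again and loop.
loggingCompile :: Session -> String -> [String]
loggingCompile env src = compile' env src ++ ["logged " ++ src]

plainResult, hookedResult :: [String]
plainResult  = compile (Session emptyHooks) "A.hs"
hookedResult =
  compile (Session (insertHook CompileHook loggingCompile emptyHooks)) "A.hs"
}}}

Here {{{plainResult}}} is {{{["compiled A.hs"]}}} and {{{hookedResult}}} is {{{["compiled A.hs","logged A.hs"]}}}.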
    32132
    33 The two functions will be called whenever GHC needs to locate or link a dynamically loaded library.
     133== List all currently available hooks ==
    34134
    35 == The Hook datatype ==
    36 
    37 Each hook has a potentially different type from all the other hooks. Additionally, we need to be able to communicate hooks to all the locations where they may be invoked. This is achieved by storing the list of hooks in the {{{DynFlags}}}.  This, however, means that hooks cannot be defined as an ADT, as that would lead to huge cyclic imports (the data types used by the hooks will depend on {{{DynFlags}}}, but the {{{DynFlags}}} will depend on the hook data type.  Instead we {{{Hooks}}} is an untyped key-value store.  The keys are single constructor types and the {{{Hooks}}} map is indexed by their {{{TypeRep}}}.  We recover the hook type via a type family:
    38 
    39 {{{
    40 --- Implementation Sketch -----------------------
    41 data Hook = forall a. Hook TypeRep a
    42 
    43 type Hooks = Map TypeRep Hook
    44 
    45 type family HookType a :: *
    46 
    47 insertHook :: forall a. Typeable a => a -> HookType a -> Hooks -> Hooks
    48 insertHook key value hooks =
    49   let hook = Hook (typeOf key) value in
    50   Map.insert (typeOf key) hook hooks
    51 
    52 lookupHook :: forall a. Typeable a => Hooks -> Maybe (HookType a)
    53 lookupHook hooks =
    54   let key = typeOf (undefined :: a) in
    55   case Map.lookup key of
    56      Nothing -> Nothing
    57      Just (Hook _ h) -> Just (unsafeCoerce h :: HookType a)  -- the tricky bit
    58 }}}
    59 
    60 {{{
    61 -- Defining a hook type:
    62 data LocateLibHook = LocateLibHook deriving Typeable
    63 type instance HookType LocateLibHook = DynFlags -> Bool -> [FilePath] -> String -> IO LibrarySpec
    64 }}}
    65 
    66 == TODO: List all currently available hooks ==
    67 
     135TODO: add the list here; for now, see the [https://gist.github.com/luite/6444273 demo program].