Changes between Version 1 and Version 2 of HaddockComments


Timestamp:
Oct 21, 2006 6:54:12 PM (9 years ago)
Author:
waern
Comment:

--

  • HaddockComments

    v1 v2  
    11= Work in Progress = 
    22= A description of the Haddock comment support in GHC = 
    3 Haddock comment support was added to GHC as part of a [http://code.google.com/soc Google Summer Of Code] project aiming to port the existing Haddock program to use the GHC API. Thus, GHC understands Haddock comments and they are available through the GHC API. 
     3Haddock comment support was added to GHC as part of a [http://code.google.com/soc Google Summer Of Code] project. The aim of the project was to port the existing Haddock program to use the GHC API. The project is now over -- GHC can understand Haddock comments and they are available through the GHC API. 
    44 
    55 
    6  
    7  
    8  
    9  
    10  
    11  
    12  
    13  
    14  
    15  
    16  
    17  
    18  
    19  
    20 Due to the SoC initiative GHC can understand Haddock comments 
    21  
    22 During the summer of 2006 I worked on this project, sponsored by the [http://code.google.com/soc Google SoC] initiative. My mentors were Simon Marlow and David Himmelstrup (lemmih). 
    23  
    24 It has been a lot of fun and I've learnt a huge amount, but the reader must be warned that I am still a beginner in many respects and that my knowledge of ghc is very shallow. So please read what follows with that in mind. 
    25  
    26 The project has made two main contributions: 
    27  * A closure viewer, capable of showing intermediate computations without forcing them and without depending on types (which of course rules out any dependency on Show instances). 
    28  * A system of dynamic breakpoints for ghci, built on the basic `breakpoint` primitive. 
    29  
    30 = The closure viewer = 
    31 The closure viewer functionality is provided by the following function in the GHC module: 
    32 {{{ 
    33 obtainTerm :: Session -> Bool -> Id -> IO (Maybe Term) 
    34 }}} 
    35  
    36 The `Term` datatype is defined in the module `RtClosureInspect` in the ghci folder. This datatype represents a partially evaluated Haskell value as an annotated tree: 
    37 {{{ 
    38 data Term = Term { ty        :: Type  
    39                  , dc        :: DataCon  
    40                  , val       :: HValue  
    41                  , subTerms  :: [Term] } 
    42  
    43           | Prim { ty        :: Type 
    44                  , value     :: String } 
    45  
    46           | Suspension { ctype    :: ClosureType 
    47                        , mb_ty    :: Maybe Type 
    48                        , val      :: HValue 
    49                        , bound_to :: Maybe Name   -- Does not belong here, but useful for printing 
    50                        } 
    51 }}} 
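
To make the "annotated tree" idea concrete, here is a minimal sketch of printing such a tree without forcing anything. It uses a simplified, hypothetical model of `Term`: types and constructors are plain strings, the `HValue` and `ClosureType` fields are dropped, and `showTerm`, `boundTo` and `example` are names invented for the illustration, not part of `RtClosureInspect`.

```haskell
-- Simplified, hypothetical model of Term: the real datatype carries GHC's
-- Type, DataCon and HValue; here everything is a plain String.
data Term
  = Term { ty :: String, dc :: String, subTerms :: [Term] }
  | Prim { ty :: String, value :: String }
  | Suspension { boundTo :: Maybe String }

-- Render a term. A suspension prints as the name it is bound to, if any,
-- and as "_" otherwise; nothing is evaluated along the way.
showTerm :: Term -> String
showTerm (Prim _ v)            = v
showTerm (Suspension (Just n)) = n
showTerm (Suspension Nothing)  = "_"
showTerm (Term _ c [])         = c
showTerm (Term _ c ts)         = "(" ++ unwords (c : map showTerm ts) ++ ")"

-- A cons cell whose head is evaluated and whose tail is still a thunk.
example :: Term
example = Term "[Int]" ":" [Prim "Int" "1", Suspension Nothing]
```

In this toy model `showTerm example` renders the cons cell in prefix form, with the unevaluated tail shown as `_`.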
    52  
    53 A few other comments on this module: 
    54  * It is not meant to be included in the stage1 compiler  
    55  * It is imported by GHC, so to avoid introducing more cyclical dependencies I've tried to keep all `Session`-related code in the GHC module. 
    56  
    57 == Implementation details == 
    58 Quoting Simon Marlow on the ghc-cvs list: 
    59   (..)being in GHCi, we have all the compiler's information about the code to hand - including full definitions of data types.  So for a given constructor application in the heap, we can print a source-code representation of it 
    60  
    61 === DataCon recovery === 
    62 The closure viewer obtains the heap address of a Haskell value, finds the address of its associated info table, and traces back to the DataCon corresponding to this info table. This is possible because the ghc runtime allocates a static info table for each and every datacon, so all we have to do is extend the linker with a dictionary relating the static info table addresses to a DataCon name. 
    63 Moreover, the ghci linker can load interpreted code containing new `data` or `newtype` declarations, so the dynamic linker code is extended in the same way. To sum up: 
    64  * `linker.c` has a new hashtable for datacons. 
    65  * `ghci/Linker.hs` has been extended in a similar way. The Persistent Link State datatype now includes a datacons environment. At `linkExpr` and `dynLinkBCOs` the environment is extended with _any_ new datacons witnessed. 
    66    * Since this scheme makes no distinction between statically and dynamically loaded info tables, a lot of redundancy goes into this environment. It may be worth fixing this. 
    67  
    68 Two new primitive ops have been created. They make it possible to obtain the address of a closure's info table and to obtain the closure payload (i.e. if it is a value, the arguments of the datacon).  
    69 {{{ 
    70 infoPtr# :: a -> Addr# 
    71 closurePayload# :: a -> (# Array# b, ByteArr# #) 
    72 }}} 
    73 The use of these primitives is encapsulated in the `RtClosureInspect` module, which provides:  
    74 {{{ 
    75 getClosureType  :: a -> IO ClosureType 
    76 getInfoTablePtr :: a -> Ptr StgInfoTable 
    77 getClosureData  :: a -> IO Closure 
    78  
    79 data Closure = Closure { tipe         :: ClosureType  
    80                        , infoTable    :: StgInfoTable 
    81                        , ptrs         :: Array Int HValue 
    82                        , nonPtrs      :: ByteArray#  
    83                        } 
    84  
    85 data ClosureType = Constr  
    86                  | Fun  
    87                  | Thunk Int  
    88                  | ThunkSelector 
    89                  | Blackhole  
    90                  | AP  
    91                  | PAP  
    92                  | Indirection Int  
    93                  | Other Int 
    94  deriving (Show, Eq) 
    95 }}} 
    96  
    97 The implementation of datacon recovery is spread over several functions: 
    98 {{{ 
    99 Linker.recoverDataCon :: a -> TcM Name 
    100  |- recoverDCInDynEnv :: a -> IO (Maybe Name) 
    101  |- recoverDCInRTS    :: a -> TcM Name 
    102     |- ObjLink.lookupDataCon :: Ptr StgInfoTable -> IO (Maybe String) 
    103 }}} 
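
The lookup chain above can be pictured as two dictionaries consulted in turn. The following is a minimal sketch, not the actual linker code: info table pointers are modelled as plain `Int`s, both tables are hypothetical association lists with made-up addresses, and it assumes the dynamic environment is tried before the RTS table, as the order of the outline suggests.

```haskell
import Control.Applicative ((<|>))

type InfoPtr     = Int      -- hypothetical stand-in for Ptr StgInfoTable
type DataConName = String

-- Datacons from interpreted code, as recorded by the dynamic linker.
dynEnv :: [(InfoPtr, DataConName)]
dynEnv = [(0x2000, "Main.MyCon")]

-- Datacons from compiled code, as recorded in the RTS linker's hashtable.
rtsTable :: [(InfoPtr, DataConName)]
rtsTable = [(0x1000, "GHC.Base.:"), (0x1008, "GHC.Base.[]")]

-- Try the dynamic environment first and fall back to the RTS table.
recoverDataCon :: InfoPtr -> Maybe DataConName
recoverDataCon p = lookup p dynEnv <|> lookup p rtsTable
```

An address that appears in neither table yields `Nothing`, which in this model corresponds to a closure that is not a known constructor application.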
    104 First we must make sure that we are dealing with a whnf value (i.e. a Constr), as opposed to a thunk, fun, indirection, etc. This information is retrieved from the info table itself (StgInfoTable comes with a Storable instance, defined in ByteCodeItbls). From here on I will simply use constr to refer to a Constr closure. 
    105  
    106 Once we can recover the datacon of a constr, and thus its (possibly polymorphic) type, we can construct its tree representation. The payload of a closure is an ordered set of pointers and non-pointers (words). For a Constr closure, the non-pointers are primitive unboxed values and correspond to leaves of the tree, while the pointers are references to other closures, the so-called subTerms. 
    107  
    108 === Type reconstruction === 
    109 `obtainTerm` recursively traverses all the closures that make up a term. Indirections are followed and suspensions are optionally forced. The only problem here is dealing with types. DataCons can have polymorphic types which we would want to instantiate, so knowledge of the datacon alone is not enough. There are two other sources of type information: 
    110  1. The typechecker, via the `Id` argument to `obtainTerm`. 
    111  2. The concrete types of the subterms, if they are sufficiently evaluated. 
    112  
    113 The process followed to reconstruct the types of a value as much as possible is: 
    114  
    115  1. Obtain the subTerms of the value by recursively calling `obtainTerm` with the available type info (dataCon plus typechecker), discovering new type info in the process. 
    116  2. Refine the type of the value. This is accomplished by unifying (1) and (2) above and matching the result with the type of the datacon, which yields bindings for the tyvars that are then used to instantiate it. This step obtains the most concrete type.  
    117    * Note that tyvars need renaming to avoid collisions. 
    118  3. Refine the type of the subterms (inductively) with the reconstructed type.  
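
As a rough illustration of the refinement step, the sketch below instantiates a constructor's result type from the types of its evaluated subterms. It is a toy under stated assumptions: `Ty`, `Subst`, `match`, `apply` and `refine` are hypothetical names, only one-way matching is implemented (not the full unification with the typechecker's information described above), and tyvar renaming is ignored.

```haskell
import Control.Monad (foldM)

-- A toy type representation: type variables and constructor applications.
data Ty = TVar String | TCon String [Ty] deriving (Eq, Show)

type Subst = [(String, Ty)]

-- One-way matching of a datacon argument type (left) against the type
-- discovered for a subterm (right), extending the substitution.
match :: Subst -> (Ty, Ty) -> Maybe Subst
match s (_, TVar _) = Just s            -- unevaluated subterm: no new info
match s (TVar v, t) = case lookup v s of
  Nothing             -> Just ((v, t) : s)
  Just t' | t' == t   -> Just s
          | otherwise -> Nothing        -- conflicting bindings
match s (TCon c xs, TCon d ys)
  | c == d && length xs == length ys = foldM match s (zip xs ys)
match _ _ = Nothing

-- Instantiate the bound type variables; unbound ones are left as they are.
apply :: Subst -> Ty -> Ty
apply s (TVar v)    = maybe (TVar v) id (lookup v s)
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- Given a datacon's argument types and result type, plus the types found
-- for its subterms, compute the refined result type.
refine :: [Ty] -> Ty -> [Ty] -> Maybe Ty
refine argTys resTy subTys
  | length argTys /= length subTys = Nothing
  | otherwise = do
      s <- foldM match [] (zip argTys subTys)
      pure (apply s resTy)
```

For a constructor like `Just :: a -> Maybe a` whose single subterm is known to be an `Int`, `refine` instantiates `a` and yields `Maybe Int`; a subterm that is still suspended leaves its type variable untouched.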
    119  
    120  
    121 === About handling suspensions in the interactive environment === 
    122 The interactive ui uses `GHC.obtainTerm` to implement the :print and :sprint commands. The difference is that :print, additionally, binds suspended values. 
    123 Thus, suspensions inside partially evaluated terms are bound by `:print` to _t,,xx,, names in the interactive environment, where they are available to the user.  
    124  
    125 This is done in `InteractiveUI.pprintClosure`. Whenever the suspensions are not sufficiently typed, tyvars are substituted with the type `GHC.Base.Unknown`, which has an associated Show instance that instructs the user to `seq` them to recover the type.  
    126  
    127 There are two quirks with the current solution: 
    128  * It does not remember previous bindings. Two consecutive uses of `:print` will generate two separate bindings for the same thing, creating redundancy and potential confusion. But... 
    129  * Since type reconstruction (for polymorphic/untyped things) can eventually happen whenever the suspensions are forced, it is necessary to use `:print` again to obtain a properly typed binding. 
    130    * It is future work to make ghci do this type reconstruction implicitly on the existing polymorphic bindings. This would be ''nice'' for the _t,,xx,, things, but even nicer for the local bindings in the context of a breakpoint. 
    131  
    132 === Pretty printing of terms === 
    133 We want to customize the printing of some values, such as Integers, Floats, Doubles, Lists, Tuples, Arrays, and so on. 
    134 The `RtClosureInspect` module contains some infrastructure to build a custom printer, along with a basic custom printer that covers the types listed above. 
    135  
    136 In InteractiveUI.hs the function `pprintClosure` takes advantage of this and makes use of a custom printer that uses Show instances if available. 
    137  
    138 === Recovering non-pointers === 
    139 This happens in `RtClosureInspect.extractUnboxed` and may break on some architectures. 
    140  
    141 = Breakpoints = 
    142  
    143 == `breakpoint`  Implementation == 
    144 When compiling to bytecodes, breakpoints are desugared to 'fake' jump functions, i.e. functions that are not defined anywhere. Later, in the interactive environment, we link them to something:  
    145 {{{ 
    146 breakpoint => breakpointJump 
    147 breakpointCond => breakpointCondJump 
    148 breakpointAuto => breakpointAutoJump 
    149 }}} 
    150 The types would be: 
    151 {{{ 
    152 breakpointAutoJump, breakpointJump ::  
    153                     Int                         -- Address of a StablePtr containing the Ids 
    154                  -> [()]                        -- Local bindings list 
    155                  -> (String, String, Int)       -- Package, Module and site number 
    156                  -> String                      -- Location message (filename + srcSpan) 
    157                  -> b -> b                  
    158 breakpointCondJump :: Int -> [()] -> (String,String,Int) -> String -> Bool -> b -> b 
    159 }}} 
    160 In the desugarer they are filled in with the pointer to the ids in scope, their values, the site, a message, and the wrapped value. Everything is served with the right amounts of unsafeCoerce sauce and TyApp dressing to make the generated Core lint. 
    161  
    162 The site number is relevant only for 'auto' breakpoints, explained later. For the other two types of breakpoints its value should be 0. 
    163  
    164 The desugarer monad has been extended with an OccEnv of Ids to track the bindings in scope. Of course this environment is probably too ad hoc to be used for anything else. The monad also carries a mutable table of breakpoint sites for the current module. This is explained below. 
    165  
    166 === Default HValues for the Jump functions === 
    167 The dynamic linker has been modified so that it won't panic if one of the jump functions fails to resolve. 
    168 Now, if the dynamic linker fails to find a HValue for a Name, before looking for a static symbol it will ask  
    169 {{{ 
    170 DsBreakpoint.lookupBogusBreakpointVal :: Name -> Maybe HValue 
    171 }}} 
    172 which returns a "just return the wrapped thing" value if the name is one of the Jump names, and Nothing otherwise. 
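
A minimal sketch of what such defaults could look like, following the jump function types given earlier. This is an illustration only: `Name` is modelled as a `String`, and `defaultJump`, `defaultCondJump`, `jumpNames` and `isJumpName` are hypothetical names, not the contents of `DsBreakpoint`.

```haskell
type Name = String   -- hypothetical stand-in for GHC's Name

-- Default for breakpointJump and breakpointAutoJump: ignore all the
-- breakpoint arguments and return the wrapped value unchanged.
defaultJump :: Int -> [()] -> (String, String, Int) -> String -> b -> b
defaultJump _ _ _ _ wrapped = wrapped

-- Default for breakpointCondJump: same, also ignoring the condition.
defaultCondJump :: Int -> [()] -> (String, String, Int) -> String -> Bool -> b -> b
defaultCondJump _ _ _ _ _ wrapped = wrapped

-- The names that get a default when the linker cannot resolve them.
jumpNames :: [Name]
jumpNames = ["breakpointJump", "breakpointCondJump", "breakpointAutoJump"]

isJumpName :: Name -> Bool
isJumpName = (`elem` jumpNames)
```

With defaults of this shape, code containing an unlinked breakpoint simply behaves as if the breakpoint were not there.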
    173  
    174 This is necessary because a TH function might contain a call to a breakpoint function. If the module it lives in is compiled to bytecodes, the breakpoints will be desugared to 'jumps'. Whenever this code is spliced, the linker will fail to find the jump functions unless there is a default. 
    175  
    176 Why didn't I address the problem by forbidding breakpoints inside TH code? I couldn't find an easy solution for this, considering the user is free to put a manual breakpoint wherever. 
    177 Why did I introduce the default as a special case in the linker? 
    178 I considered other options: 
    179  * Running TH splices in an extended link env. This would probably scatter breakpoint related code deep in the typechecker, and is ugly. 
    180  * Making the 'jump' functions real, by giving them equations and types, maybe in the GHC.Exts module. This solution seemed fine but I wasn't sure how it would interact with dynamic linking of 'jumps'.  
    181  
    182                                     
    183 === A note about bindings in scope in a breakpoint === 
    184 While I was trying to get the generated core for a breakpoint to lint, I decided not to make things bound in a recursive group available in the breakpoint context. This includes lets, wheres, and mdo notation. The latter case however is not enforced: I haven't found the time to work it out yet. 
    185  
    186  
    187 = Dynamic Breakpoints = 
    188 The approach followed here has been the well-known 'do the simplest thing that could possibly work'. We instrument the code with 'auto' breakpoints at event ''sites''. Currently event sites are statements and code locations where names are bound: 
    189  * let declarations 
    190  * where declarations  
    191  * top level declarations  
    192  * case alternatives  
    193  * lambda abstractions 
    194  * do statements (any variant of them) 
    195  
    196 The instrumentation is also done in the desugarer, which has been extended accordingly. We distinguish between 'auto' breakpoints, which are introduced by the desugarer, and 'normal' breakpoints, which the user creates by calling the `breakpoint` function directly. 
    197  
    198 == Overhead == 
    199 The instrumentation scheme potentially introduces overhead at two stages: compile-time and run-time. Compile-time overhead is unnoticeable for general programs, although there are no benchmarks available to support this claim. Run-time overhead is much more noticeable. 
    200 Run-time overhead has been measured informally to range between 9x and 25x, depending on the code of the program under consideration.  
    201  
    202 With an always-on breakpoints scenario in mind, we do a number of things to mitigate this overhead in the absence of enabled breakpoints. One of these is to allow a ghc-api client to disable auto breakpoints via the ghc-api functions: 
    203 {{{  
    204 enableAutoBreakpoints  :: Session -> IO () 
    205 disableAutoBreakpoints :: Session -> IO () 
    206 }}} 
    207  
    208 GHCi would keep breakpoints disabled until the user defines the first breakpoint, and thus for normal use we could always keep the -fdebugging flag enabled. 
    209 The problem is that for `disableAutoBreakpoints` (respectively `enableAutoBreakpoints`) to be effective at all, we need to implement it by relinking the `breakpointAutoJump` function to a new "do nothing" lambda (respectively to the user-set bkptHandler).  
    210 This would imply a relink, which is quite annoying to a user of GHCi since any top level bindings are lost. This is why this functionality is only a proof of concept and is disabled for now. I wish I had a better understanding of how the dynamic linker and the top level environment in ghci work. 
    211  
    212 We also try to do some simple breakpoint coalescing.  
    213  
    214 === Breakpoint coalescing === 
    215 ''.. implemented, to be documented..'' 
    216  
    217 == Modifications in the renamer == 
    218 This section is easy. There are NO modifications in the renamer, other than removing Lemmih's original code for the `breakpoint` function. All the stuff that we had originally placed here was moved to the desugarer in the final stage of the project. 
    219  
    220 == Modifications to the desugarer == 
    221 ''summarize the code instrumentation stuff'' 
    222  
    223 == Passing the sitelist of a module around == 
    224 ''summarize the modifications made to thread the site list of a module from the renamer to the ghc-api'' 
    225 TcGblEnv is extended with a dictionary of sites and coordinates (TODO: switch the coordinate datatype to the ghc-standard SrcLoc) introduced in the module at the desugarer. 
    226  
    227  
    228 == The `Opt_Debugging` flag == 
    229 This is activated on the command line via `-fdebugging` and can be disabled with `-fno-debugging`. 
    230 This flag simply enables breakpoint instrumentation in the desugarer. 
    231  
    232 `-fno-debugging` is different from `-fignore-breakpoints` in that user-inserted breakpoints will still work. 
    233  
    234 == Interrupting at exceptions == 
    235 Ideally, a breakpoint that witnessed an exception would stop the execution, no questions asked. Sadly, it seems impossible to 'witness' an exception. Throw and catch are essentially primitives (throw#, throwio# and catch#). We could install an exception handler at every breakpoint site, but that: 
    236  * Would add more overhead 
    237  * Would require serious instrumentation to embed everything in IO, and thus 
    238  * Would alter the evaluation order 
    239  
    240 So it is not doable via this route. 
    241  
    242 We could try and use some tricks. For instance, at every 'throw' we spot, we insert a breakpoint based on the condition on this throw. At every 'assert' we do the same. But this would see only user exceptions, missing system exceptions (pattern match failures for instance), asynchronous exceptions and others, which is not acceptable imho.  
    243  
    244 I don't know if a satisfactory solution is possible with the current scheme for breakpoints. 
    245  
    246 == The breakpoints api at ghc-api == 
    247 Once an 'auto' breakpoint, that is a breakpoint inserted by the desugarer, is hit, an action is taken. There are hooks to customize this behaviour in the ghc-api. The GHC module provides: 
    248 {{{ 
    249 data BkptHandler a = BkptHandler { 
    250      handleBreakpoint  :: forall b. Session -> [(Id,HValue)] -> BkptLocation a ->  String -> b -> IO b 
    251    , isAutoBkptEnabled :: Session -> BkptLocation a -> IO Bool 
    252    } 
    253 }}} 
    254 '' to be finished'' 
    255  
    256  
    257 = Pending work = 
    258 Call stack traces. 
    259 Interruption at unexpected conditions (exceptions). 
    260  
    261 ''Put together all the small todos here''