Resource Limits

This page describes a proposed resource limits capability for GHC. The idea is to give users the ability to create and utilize resource containers inside programs, and then provide in-program access to heap census and other information. The semantics of resource containers are quite similar to the cost-centres used in profiling, except that they do not have "stack" semantics (more on this later). The end result is the ability to impose resource limits on space usage.

Code generation changes

Resource limits is implemented as a new compilation way (similar to the profiled and dynamic ways). Here are the relevant changes to code generation:

Dynamic closure allocation

compiler/codeGen/StgCmmHeap.hs:allocDynClosureCmm (reached via StgCmmCon; it also handles StgCmmBind:mkRhsClosure/cgRhsStdThunk. link_caf needs special treatment.)

// profDynAlloc rep use_cc
// use_cc == CurCCS
         I64[CCCS + 72] = I64[CCCS + 72] + %MO_UU_Conv_W64_W64(4 - 2);
// ALLOCATE THE OBJECT
// emitSetDynHdr base info_ptr use_cc
         I64[Hp - 24] = Data.Maybe.Just_con_info; // info_ptr
         I64[Hp - 16] = CCCS;                     // use_cc
         I64[Hp - 8] = (%MO_UU_Conv_W32_W64(I32[era]) << 30) | 0; // dynLdvInit
// let (cmm_args, offsets) = unzip amodes_w_offsets
// hpStore base cmm_args offsets
         I64[Hp + 0] = I64[R1 + 32];

With resource limits, this changes to:

// invariant: Hp points to nursery of current resource container
         I64[Hp - 8] = Data.Maybe.Just_con_info; // info_ptr
         I64[Hp + 0] = I64[R1 + 32];

CAF Allocation

compiler/codeGen/StgCmmBind.hs:link_caf

Warning: The rest of this document describes an old iteration of the system, which directly used the profiling cost-centre machinery.

Front-end changes

The basic idea behind this patch is that data collected during profiling can also be used at runtime to enforce limits. So most of the API involves (1) dynamically setting cost-centres, which GHC uses to do profiling, and (2) querying and receiving callbacks when certain events happen during profiling. Costs can be collected anywhere you could have placed an SCC annotation statically.

-- | A cost-centre; not garbage-collected.
data CostCentre

-- | A cost-centre stack.  Cost-centres are composed into cost-centre
-- stacks, for which costs are actually attributed.  Cost-centre stacks
-- are not garbage-collected.
data CostCentreStack

-- | A listener on a cost-centre stack.  Active listeners are considered
-- roots, so be sure to unlisten when you are done.
data Listener

-- | Type of profiling information to track.  We currently support two
-- types: instantaneous heap residence, and overall memory allocation
-- (which is monotonically increasing).
data ProfType = Resident | Allocated

-- | Allocates a new cost-centre.
newCC :: IO CostCentre

-- | Pushes a cost-centre onto a new cost-centre stack.  This function
-- is memoized, so if you push the same CostCentre onto the same CostCentreStack, you will
-- get the same CostCentreStack back.
pushCC :: CostCentreStack -> CostCentre -> IO CostCentreStack

-- | Attaches a listener to a cost-centre.  The resolution of the
-- listener depends on the type and runtime options.  For resident
-- memory listeners, listeners are checked whenever a heap census is run
-- (which is controllable using @-i@).  For allocated memory listeners,
-- listeners are checked every GC.  When you are no longer interested
-- in events from a listener, make sure you unregister the listener, or
-- you will leak memory.
listenCCS :: CostCentreStack -> ProfType -> Int -> IO () -> IO Listener

-- | Unregisters a listener, so that its action will no longer be run.
unlistenCCS :: Listener -> IO ()

-- | Sets the cost-centre stack of an object on the heap.
setCCSOf :: CostCentreStack -> a -> IO ()

-- | Runs an IO action with the current cost-centre stack (CCCS) set to @ccs@.
withCCS :: CostCentreStack -> IO a -> IO a

-- | Allocates a new dynamic cost-centre stack; generally, if you want
-- something to check usage, this is what you want.
newCCS :: IO CostCentreStack

-- | Queries for memory statistics about a cost-centre stack.
queryCCS :: CostCentreStack -> ProfType -> IO Int

-- | Root cost-centre stack for dynamically allocated cost-centre
-- stacks.
ccsDynamic :: CostCentreStack

The general usage of this API goes like:

import Control.Concurrent
import Control.Exception (evaluate)
-- plus the resource-limits API above: CostCentreStack, ProfType(..),
-- newCC, pushCC, ccsDynamic, listenCCS, unlistenCCS, withCCS

f n =
    let xs = [1..n::Integer]
    in  sum xs * product xs

-- Allocate a fresh dynamic cost-centre stack by pushing a new
-- cost-centre onto the dynamic root.
newCCS :: IO CostCentreStack
newCCS = pushCC ccsDynamic =<< newCC

main :: IO ()
main = do
  m <- newEmptyMVar
  _ <- forkIO $ do
    x <- newCCS
    tid <- myThreadId
    -- kill the worker once this CCS holds more than 2000 units of
    -- resident memory
    l <- listenCCS x Resident 2000
             (putStrLn "Too much memory is being used" >> killThread tid)
    _ <- withCCS x $ evaluate (f 20000)
    unlistenCCS l
    putMVar m ()
  takeMVar m

Another use-case is more fine-grained SCCs based on runtime properties, not source-level features.
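For instance, here is a hedged sketch of attributing work to a cost-centre stack chosen at runtime, per client. It uses the proposed API above; perClientCCS, serve, and the per-client map are illustrative names, not part of any existing library, and a real implementation would want atomic map updates.

import Data.IORef
import qualified Data.Map as Map
-- plus the proposed API above: CostCentreStack, newCC, pushCC,
-- ccsDynamic, withCCS

-- Hand out one dynamic cost-centre stack per client name, creating it
-- on first use.
perClientCCS :: IO (String -> IO CostCentreStack)
perClientCCS = do
  ref <- newIORef Map.empty
  return $ \client -> do
    m <- readIORef ref
    case Map.lookup client m of
      Just ccs -> return ccs
      Nothing  -> do
        ccs <- pushCC ccsDynamic =<< newCC
        writeIORef ref (Map.insert client ccs m)
        return ccs

-- Attribute the costs of an action to the given client's stack.
serve :: (String -> IO CostCentreStack) -> String -> IO a -> IO a
serve lookupCCS client act = do
  ccs <- lookupCCS client
  withCCS ccs act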

I am planning on providing a formal semantics, based on GHC's current profiling semantics. Notice that the current story says nothing about retainers (so there is some careful library writing to be done, to prevent untrusted code from holding onto large system structures for too long).

Some points to bikeshed:

  • Instead of the current interface, we could publish STM variables which are updated by the GC; listening would then be just an ordinary STM transaction. This might be tricky to implement (a rough sketch follows below).
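For concreteness, here is a rough sketch of what the STM variant might look like. usageVar is a hypothetical runtime-provided primitive, not part of the API above.

import Control.Concurrent.STM

-- Hypothetical primitive: the GC (or heap census) would keep this TVar
-- up to date with the latest figure for the given cost-centre stack.
-- usageVar :: CostCentreStack -> ProfType -> IO (TVar Int)

-- Listening is then an ordinary STM transaction, which composes with
-- other TVars; this blocks until the published usage exceeds the limit.
waitForLimit :: TVar Int -> Int -> IO ()
waitForLimit var limit = atomically $ do
  n <- readTVar var
  check (n > limit)   -- retry until the GC publishes a larger value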

Runtime changes

  • Listener is a new garbage collected object; we expect it can be implemented as a simple PRIM using techniques similar to StgMVarTSOQueue.
  • Checks for listeners occur during heap census, so you will need to pass the -hc flag for this machinery to do anything. We have also added a new flag -hl, which behaves like -hc but does not write an .hp file. See also #7751, which will dramatically improve performance. Right now, running a heap census is very slow; if censuses can be made incremental, their cost can be amortized against ordinary garbage collector work. (A toy model of the listener check follows this list.)
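As a toy model of the check itself (plain Haskell with illustrative names, not RTS code): after each census, the runtime walks the active listeners and runs the callback of any listener whose cost-centre stack has exceeded its threshold.

import qualified Data.Map as M

type CCSId = Int

-- 'census' maps each cost-centre stack to its current residence (or
-- total allocation); each listener carries a threshold and a callback.
checkListeners :: M.Map CCSId Int -> [(CCSId, Int, IO ())] -> IO ()
checkListeners census listeners =
  sequence_ [ callback
            | (ccs, threshold, callback) <- listeners
            , M.findWithDefault 0 ccs census > threshold ]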

Commentary

CAFs

A CAF is never attributed to a dynamic CC, because CAFs always get their own cost-centre for their evaluation. This is correct in a sense: multiple dynamic CCs can attempt to evaluate a CAF, and we should not unduly penalize the first one to evaluate it. However, this means you have to be very careful about optimizations that introduce CAFs that were not present at the source level, e.g. -ffull-laziness. Dynamically loaded CAFs from untrusted code, of course, need to be relabeled appropriately.
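A hedged illustration of the -ffull-laziness hazard (the floating shown here is typical, not guaranteed):

-- Under -ffull-laziness, GHC may float the n-independent expression
-- 'sum [1 .. 1000000]' to the top level, turning it into a CAF.  Its
-- space is then attributed to that CAF's own cost-centre rather than
-- to whichever dynamic cost-centre stack called 'go', so a limit on
-- the caller's CCS never sees it.
go :: Int -> Int
go n = n + sum [1 .. 1000000]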

Interaction with traditional profiling

Resource limits must be compiled with -prof; we'd like to treat profiling as semantics-preserving, but resource limits are anything but. In the long term, it is probably desirable to consider these to be distinct systems which happen to share a common mechanism. As a first draft, we don't intend to support profiling and resource limits simultaneously: the dynamic SCC machinery can be used for enhanced profiling or for marking resource limits, but not both. It may be possible to extend the resource limit machinery to handle "superfluous" cost-centres, but this would be more complicated and costly, since costs would now be spattered over many CostCentre objects and would need to be recombined. Currently, the profiling machinery can perform this calculation, but it only calculates inherited resource usage at the very end, so this could be expensive.

Since non-dynamic SCCs can interfere with accurate cost attribution, we add a new flag -fprof-drop which drops all SCC pragmas.
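For example (illustrative code; -fprof-drop is the flag proposed here), a source-level annotation like the one below would be discarded, so the costs of sum stay attributed to whatever dynamic cost-centre stack is current:

-- With -fprof-drop, the SCC pragma is dropped and no static
-- cost-centre intercepts the attribution.
g :: [Int] -> Int
g xs = {-# SCC "sumPhase" #-} sum xs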

Memory leaks

One of the most important intended use-cases of resource limits is when you are rapidly loading and unloading large amounts of untrusted code (think http://tryhaskell.org/). So an important thing to get right is avoiding long term memory leakage, either from leftover objects from the untrusted code or related infrastructure.

On the unloading code front, one technique that could be employed is to replace all third-party closures with “error” frames upon unloading. Similar techniques are already being employed in GHC, and it is semantically sound even if another thread has already witnessed the full value of the data structure: one can imagine some supervisor process sending an asynchronous exception when some unloaded data is accessed. (XXX this may have bad interactions with mask and uninterruptibleMask).
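A hedged sketch of the mask concern (nothing here is an existing revocation API; it only shows why delivery of the revocation exception can be delayed):

import Control.Exception

-- If revocation of unloaded data is delivered as an asynchronous
-- exception, a thread inside uninterruptibleMask_ will not receive it
-- until the masked region exits, so 'untrusted' stays reachable for
-- the whole masked computation.
holdsUnloadedData :: Integer -> IO Integer
holdsUnloadedData untrusted =
  uninterruptibleMask_ $ do
    x <- evaluate (sum [1 .. 1000000 :: Integer])  -- long, non-interruptible work
    evaluate (untrusted + x)                       -- keeps 'untrusted' live throughout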

On the cost centre front, the runtime currently assumes that cost centres are permanent and never deallocated. One technique for deallocating a cost-centre goes as follows. We first allocate a distinguished “catch-all” cost-centre which tracks all deallocated cost centres. When we would like to deallocate a cost centre, we mark the cost centre as killed, and upon the next major garbage collection, we look at the cost-centres pointed to by all of the heap objects and rewrite them if they correspond to a killed cost-centre. We also donate all of the cost-centre’s statistics to the catch-all. It is also necessary to rewrite any in-Haskell references to the cost-centre (so we need a new infotable to mark these references.) Once this is done, we’ve removed all references to the cost-centre and it can be dropped. (This is not quite true; CC_LIST and any cost-centre stacks also have to be updated.)
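A toy model of the bookkeeping step (plain Haskell with illustrative names, not RTS code), to make the donate-and-rewrite idea concrete:

import qualified Data.Map as M

type CCId  = Int
type Stats = M.Map CCId Int   -- per-cost-centre allocation counts

-- Donate the statistics of every killed cost-centre to the catch-all
-- and produce the remapping the GC would apply to heap objects'
-- cost-centre header words during the next major collection.
retire :: CCId       -- ^ catch-all cost-centre
       -> [CCId]     -- ^ cost-centres marked as killed
       -> Stats
       -> (Stats, CCId -> CCId)
retire catchAll killed stats = (stats', remap)
  where
    donated = sum [ M.findWithDefault 0 k stats | k <- killed ]
    stats'  = M.insertWith (+) catchAll donated (foldr M.delete stats killed)
    remap c | c `elem` killed = catchAll
            | otherwise       = c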

Callback triple fault

A finalizer could trigger a new finalizer, ad infinitum. However, as long as you don't allocate a new finalizer in the callback, you should be fine.

Discussion

These mailing list threads may be of interest: