Changes between Version 1 and Version 2 of GarbageCollectorNotes


Ignore:
Timestamp:
May 15, 2006 12:07:08 PM (9 years ago)
Author:
guest
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • GarbageCollectorNotes

    v1 v2  
    1313 
    1414== The Scheduler == 
    15 Most of the interesting things related to scheduling and multithreading in Haskell centers around the function “schedule” that is define in Schedule.c 
     15Most of the interesting things related to scheduling and multithreading in Haskell center around the function schedule() that is define in Schedule.c. This is the part of schedule that take a thread from the run and decides what to do with it.  
    1616 
    1717{{{ 
     
    5050}}} 
    5151 
    52 The scheduler picks up a thread off the run queand decides what to do with it. If it is runnable, then it called StgRun to run it. It then later examines the “ret” value that is set to see why the the thread stopped.  
     52The scheduler picks up a thread off the run queand decides what to do with it. If it is runnable, then it calles the function StgRun() to run it. At the end of the code block, the variable “ret” is set to indicate why the the thread stopped.  
    5353 
    54 Haskell threads are not time sliced via timer in the way that OS threads are. [cross check is there is some time sliced mechanism] Instead they are interreupted by certain commonly occuring events. Due to the lazy nature of Haskell thunks need to be created and values need to be computed very often. Hence the execution of a thread entails lots of of memory allocation. Whenever a thread has run out of space in its current block, it returns control back to the scheduler.  
     54Haskell threads are not time-sliced via a timer (potentially a time rinterrupt) the way OS threads are [cross check if there is some time sliced mechanism]. Instead they are interreupted by certain commonly occuring events. Due to the lazy nature of Haskell thunks need to be created and values need to be computed very often. Hence the execution of a thread entails lots of of memory allocation. One of the ways the execution of a thread is interrupted is when a thread has run out of space in its current block - it then returns control back to the scheduler.  
     55 
     56A GHC block is a 4k page that is page aligned for the OS VM system.   
     57 
     58Here is what the scheduler does with the "ret" -  
    5559 
    5660{{{ 
     
    8589}}} 
    8690 
    87 The scheduleHandleHeapOverflow(cap,t) call decides to give the thread another block, (or a set of blocks if the thread was asking for allocation of a large object – an object that is larger than a block). If however the scheduleHandleHeapOverflow() function feels that there is not enough free block left, then it decides to Garbage Collect. 
     91The scheduleHandleHeapOverflow(cap,t) call decides to give the thread another block, (or a set of blocks if the thread was asking for allocation of a large object (a large object is one that is larger than a block). If the scheduleHandleHeapOverflow() function feels that there aren't enough free blocks left, it decides to Garbage Collect. This is the point at which everything else stops and the GC kicks in.  
    8892 
    8993 
     
    9498 
    9599== GC Data Structures == 
     100 
     101=== Blocks and MegaBlocks === 
     102 
     103The GC allocates memory from the OS in 1 Mb sized chunks, called mega blocks, which it then divides into pages and manages itself. Each 4k page is called a block and is associated with a block descripter (the abbrevaition BD is used a lot). The BDs for all the blocks on the are stroed at the start of the mega block.  
     104 
     105The BD keeps information about a block. This the BD defintion from blocks.h in the RTS. 
     106 
     107{{{ 
     108typedef struct bdescr_ { 
     109  StgPtr start;                 /* start addr of memory */ 
     110  StgPtr free;                  /* first free byte of memory */ 
     111  struct bdescr_ *link;         /* used for chaining blocks together */ 
     112  union {  
     113      struct bdescr_ *back;     /* used (occasionally) for doubly-linked lists*/ 
     114      StgWord *bitmap; 
     115  } u; 
     116  unsigned int gen_no;          /* generation */ 
     117  struct step_ *step;           /* step */ 
     118  StgWord32 blocks;             /* no. of blocks (if grp head, 0 otherwise) */ 
     119  StgWord32 flags;              /* block is in to-space */ 
     120#if SIZEOF_VOID_P == 8 
     121  StgWord32 _padding[2]; 
     122#else 
     123  StgWord32 _padding[0]; 
     124#endif 
     125} bdescr; 
     126}}} 
     127 
     128Most of the fields ina BD are self explanatory. Let me add a few quick decriptions however. Each bdescr or BD maintains its start address and a "free" pointer used to indicate the next free location in the block. Each block also knows about what generation it belongs to and has a pointer to the step it belongs to (more about generations and steps later).  
     129 
     130If a large object is allocated and a block is a part of a large object, then the first block is has a count of the number of blocks that are part of the object. The link list of blocks making up the object is maintained by the link pointer. [This may not be entirely correct - I will come back to this later]. 
     131 
     132 
     133 
     134 
     135 
     136 
     137 
    96138== Scavenging == 
    97139== Copy ==