Memory Management and Garbage Collection

TCL API

C Routine Interface

General Tools

Introduction

Dervish contains a specialized memory manager and garbage collector. The Dervish memory manager can entirely replace the system memory manager (malloc() and friends), or it can be used alongside it. Nor does it preclude the use of a third-party memory manager. The only stipulation is that calls must not be mixed between memory managers; for instance, do not use a Dervish memory manager routine to deallocate a memory block that was allocated by a call to the system memory manager. Consistency is the key here.

It is recommended (required? after all, this is still a democracy) that all framework and pipeline code use Dervish memory manager calls. Besides making code uniform, this has the added benefit of collecting statistics about how memory is used during a pipeline run. These statistics can be analyzed to determine the memory requirements of a pipeline.

Description of the Dervish Memory Manager

The Dervish memory manager works by maintaining two (or sometimes three) lists or pools: the Allocated Memory Pool (AMP), the Free Memory Pool (FMP), and optionally the Malloced Memory Pool (MMP), a pool of memory that has been allocated from the operating system but is not currently an active part of Dervish's memory management system. The AMP holds all memory blocks currently in use. The FMP holds all memory blocks that have been freed but not returned to the MMP. Note that freeing memory does not imply returning it to the operating system; rather, the freed block is kept in the FMP or MMP for later re-use.
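The movement of a block between pools can be illustrated with a toy model. This is only a sketch under stated assumptions, not Dervish's implementation: the names toy_alloc(), toy_free(), and ToyBlock are hypothetical, the pools are plain singly linked lists searched first-fit, and the optional MMP is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block header; NOT the layout Dervish actually uses. */
typedef struct ToyBlock {
    struct ToyBlock *next;  /* link within whichever pool holds the block */
    size_t size;            /* usable bytes that follow the header */
} ToyBlock;

static ToyBlock *toy_amp = NULL;        /* toy AMP: blocks currently in use */
static ToyBlock *toy_fmp = NULL;        /* toy FMP: freed blocks kept for re-use */
static unsigned long toy_os_calls = 0;  /* how often malloc() was really called */

/* Unlink blk from the list at *head; returns 1 if it was found. */
static int toy_unlink(ToyBlock **head, ToyBlock *blk)
{
    for (ToyBlock **p = head; *p != NULL; p = &(*p)->next) {
        if (*p == blk) {
            *p = blk->next;
            return 1;
        }
    }
    return 0;
}

void *toy_alloc(size_t size)
{
    ToyBlock *blk = NULL;

    /* First fit: re-use the first free block that is large enough. */
    for (ToyBlock *p = toy_fmp; p != NULL; p = p->next) {
        if (p->size >= size) {
            blk = p;
            break;
        }
    }
    if (blk != NULL) {
        toy_unlink(&toy_fmp, blk);
    } else {
        /* Nothing suitable in the free pool: go to the system allocator. */
        blk = malloc(sizeof *blk + size);
        if (blk == NULL)
            return NULL;
        blk->size = size;
        toy_os_calls++;
    }
    blk->next = toy_amp;    /* the block now lives in the allocated pool */
    toy_amp = blk;
    return blk + 1;         /* user memory starts just past the header */
}

void toy_free(void *ptr)
{
    if (ptr == NULL)
        return;
    ToyBlock *blk = (ToyBlock *)ptr - 1;
    int found = toy_unlink(&toy_amp, blk);
    assert(found);  /* freeing a block this manager did not allocate is a bug */
    (void)found;
    blk->next = toy_fmp;    /* kept for re-use, not returned to the OS */
    toy_fmp = blk;
}

unsigned long toy_os_call_count(void)
{
    return toy_os_calls;
}
```

In this sketch, allocating 64 bytes, freeing them, and then asking for 32 bytes hands back the same block without a second call to malloc(), which is the re-use behaviour described above. The assert in toy_free() also shows why calls must not be mixed between memory managers: a pointer from another allocator is simply not in the allocated pool.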

The basic algorithm is as follows:

Garbage Collection and Memory Leaks

Garbage collection is the ability of the operating environment to ensure that unused memory blocks are deallocated automatically and returned to the operating environment. Unfortunately, garbage collection is difficult in C and C++, since these languages put the onus of memory management on the user.

The Dervish Memory Manager provides some help to the user in the area of garbage collection. Each block allocated by Dervish memory management routines carries a unique number: its serial number. Serial numbers increase with time, so successively allocated memory blocks have successively larger serial numbers. This can be used to ensure that there are no memory `leaks': blocks of allocated memory that are no longer being used (and may have been lost entirely, the classic cause being that the only record of their existence was a local variable in a subroutine).

Garbage collection can then be performed by calling an API that deallocates all memory blocks between two serial numbers (inclusive). The Dervish routine shMemFreeBlocks() does exactly that. Note well that this will cause havoc if some of the blocks in the range are actually still in use; it is much safer to use Dervish's ability to list allocated blocks to check that no memory is unaccounted for.
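The serial-number idea can be sketched with a small stand-alone model. Everything here is an assumption made for illustration: the ser_* names are hypothetical, the fixed-size table stands in for Dervish's own bookkeeping, and the signature of the real shMemFreeBlocks() is not reproduced. ser_count_range() plays the role of listing allocated blocks before deciding whether a range free is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SER_MAX_BLOCKS 64   /* arbitrary capacity of this toy table */

typedef struct {
    void *ptr;              /* the live block */
    unsigned long serial;   /* serial number assigned at allocation time */
} SerEntry;

static SerEntry ser_live[SER_MAX_BLOCKS];
static size_t ser_nlive = 0;
static unsigned long ser_next = 1;  /* serial numbers only ever increase */

/* Allocate a block and report its serial number through serial_out. */
void *ser_alloc(size_t size, unsigned long *serial_out)
{
    if (ser_nlive == SER_MAX_BLOCKS)
        return NULL;
    void *p = malloc(size ? size : 1);
    if (p == NULL)
        return NULL;
    ser_live[ser_nlive].ptr = p;
    ser_live[ser_nlive].serial = ser_next;
    ser_nlive++;
    if (serial_out != NULL)
        *serial_out = ser_next;
    ser_next++;
    return p;
}

/* Free a single block; returns 1 if it was live, 0 otherwise. */
int ser_free(void *p)
{
    for (size_t i = 0; i < ser_nlive; i++) {
        if (ser_live[i].ptr == p) {
            free(p);
            ser_live[i] = ser_live[ser_nlive - 1];  /* fill the hole */
            ser_nlive--;
            return 1;
        }
    }
    return 0;
}

/* Count live blocks with lo <= serial <= hi: a leak check. */
size_t ser_count_range(unsigned long lo, unsigned long hi)
{
    size_t n = 0;
    for (size_t i = 0; i < ser_nlive; i++) {
        if (ser_live[i].serial >= lo && ser_live[i].serial <= hi)
            n++;
    }
    return n;
}

/* Free every live block with lo <= serial <= hi; returns how many. */
size_t ser_free_range(unsigned long lo, unsigned long hi)
{
    size_t freed = 0;
    size_t i = 0;
    while (i < ser_nlive) {
        if (ser_live[i].serial >= lo && ser_live[i].serial <= hi) {
            free(ser_live[i].ptr);
            ser_live[i] = ser_live[ser_nlive - 1];
            ser_nlive--;
            freed++;        /* re-examine slot i: it holds a new entry */
        } else {
            i++;
        }
    }
    return freed;
}
```

A typical pattern with such an interface is to note the serial number before a processing step, run the step, and then ask how many blocks allocated since then are still live. A non-zero count is either a leak or data that is legitimately still in use, which is exactly why the range free is the dangerous half of the pair.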

Defragmentation

Memory fragmentation is the tendency of memory managers to reduce the memory in their charge to a very large number of hopelessly small pieces; the result, of course, is that they must repeatedly return to the operating system for more large memory blocks.

Dervish is not immune to this disease, although the structure of the FMP was designed to minimise it. Fortunately there is an API to reduce its effects, memDefragment; it has been used to reduce the memory appetite of SDSS pipelines by a factor of two or so.
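This page does not describe how memDefragment works internally. As a generic illustration only, one common defragmentation step is to merge free pieces that happen to be adjacent in memory, so that a larger request can be satisfied without going back to the operating system. The sketch below assumes free pieces are described by hypothetical (offset, size) records sorted by offset; none of these names come from Dervish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one free piece inside a larger arena. */
typedef struct {
    size_t offset;  /* start of the free piece, in bytes from the arena base */
    size_t size;    /* length of the free piece in bytes */
} FragExtent;

/*
 * Merge adjacent free extents in place.
 * The input must be sorted by offset and must not overlap.
 * Returns the number of extents left after merging.
 */
size_t frag_coalesce(FragExtent *ext, size_t n)
{
    if (n == 0)
        return 0;
    size_t out = 0;  /* index of the extent currently being grown */
    for (size_t i = 1; i < n; i++) {
        assert(ext[i].offset >= ext[out].offset + ext[out].size);
        if (ext[out].offset + ext[out].size == ext[i].offset) {
            ext[out].size += ext[i].size;   /* touching: absorb it */
        } else {
            out++;
            ext[out] = ext[i];              /* gap: start a new extent */
        }
    }
    return out + 1;
}

/* Size of the largest single free extent, i.e. the biggest request
 * that could be satisfied without asking the OS for more memory. */
size_t frag_largest(const FragExtent *ext, size_t n)
{
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        if (ext[i].size > best)
            best = ext[i].size;
    }
    return best;
}
```

With three touching 16-byte pieces at offsets 0, 16, and 32 plus an isolated 8-byte piece at offset 64, the largest satisfiable request before merging is 16 bytes; after frag_coalesce() the first three pieces form a single 48-byte extent.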

Other interesting features of Dervish Memory Management

In addition to tracking allocated memory and allowing statistical analysis, the Dervish Memory Manager provides a host of other interesting features as well. Primary among them are two abilities:
Referencing
The Dervish Memory Manager allows multiple pointers to reference the same object without the fear of accidentally (or intentionally) deleting the object and leaving dangling pointers. This is called referencing, and two APIs are provided for this purpose: shMemRefCntrIncr() and shMemRefCntrDecr().
Callback mechanism
The Dervish Memory Manager allows the user to register a callback function against a particular serial number. When a memory block with that serial number is allocated, the registered callback function is triggered. shMemSerialCB() is provided for registering the callback function; you can also get a callback when a given block is freed -- see the discussion under Debugging Memory Problems.
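Both abilities can be sketched in a few lines of stand-alone C. This is a toy model under stated assumptions, not the Dervish API: the rc_* names are hypothetical, the reference count is assumed to start at one and the block to be released when it reaches zero, and only a single watched serial number is supported. The real signatures and semantics of shMemRefCntrIncr(), shMemRefCntrDecr(), and shMemSerialCB() should be taken from the C routine documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored in front of each block (toy layout;
 * payload alignment is not handled rigorously). */
typedef struct {
    unsigned long serial;  /* assigned at allocation, always increasing */
    unsigned int refs;     /* number of holders; block dies at zero */
} RcHdr;

typedef void (*RcCallback)(unsigned long serial, void *ptr);

static unsigned long rc_next = 1;        /* next serial to hand out */
static unsigned long rc_watch_serial = 0;
static RcCallback rc_watch_fn = NULL;

/* State written by the sample callback below. */
static unsigned long rc_last_hit = 0;
static void *rc_last_hit_ptr = NULL;

/* Sample callback: just records which block triggered it. In a
 * debugging session this is where a breakpoint would go. */
void rc_record_hit(unsigned long serial, void *ptr)
{
    rc_last_hit = serial;
    rc_last_hit_ptr = ptr;
}

/* Serial number the next allocation will receive. */
unsigned long rc_peek_serial(void)
{
    return rc_next;
}

/* Ask for fn to be called when the block with this serial is allocated. */
void rc_watch(unsigned long serial, RcCallback fn)
{
    rc_watch_serial = serial;
    rc_watch_fn = fn;
}

void *rc_alloc(size_t size)
{
    RcHdr *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->serial = rc_next++;
    h->refs = 1;  /* the allocating code is the first holder */
    if (rc_watch_fn != NULL && h->serial == rc_watch_serial)
        rc_watch_fn(h->serial, h + 1);
    return h + 1;
}

/* Register one more holder of the block. */
void rc_incr(void *p)
{
    RcHdr *h = (RcHdr *)p - 1;
    h->refs++;
}

/* Drop one holder; returns 1 if the block was actually released. */
int rc_decr(void *p)
{
    RcHdr *h = (RcHdr *)p - 1;
    assert(h->refs > 0);
    if (--h->refs > 0)
        return 0;  /* someone else still points at it */
    free(h);
    return 1;
}
```

Because serial numbers are handed out in a fixed order, a run that reports a leaked block can be repeated with a callback registered on that block's serial number, which stops the program at the exact allocation responsible. The reference count covers the other case: a second holder calls rc_incr(), and the block survives the first holder's rc_decr().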

Dervish Memory Management and Garbage Collection routines have both C and TCL bindings.