DERVISH Specific Arrays

Arrays under Dervish

ARRAYs are defined in the header file shCArray.h.

Dervish provides a general ARRAY type object schema to permit the creation and manipulation of n-dimensional arrays. Besides storing array data, ARRAYs allow data to be accessed with C array referencing notation without knowing the array dimensions at compilation time. Array indices are in row-major order, where the rightmost (last) index is the fastest varying index.

ARRAY Structures/Object Schemas

For all arrays, information about the size of the array and where the data is located is maintained in the ARRAY structure. The following interfaces are provided to create and access ARRAYs:

  • ANSI C
  • Tcl

    ARRAY Fields

    The ARRAY structure allows access to data in an array. The data area is considered part of (owned by) an ARRAY: when the ARRAY is deleted (shArrayDel), the storage area containing the data is deleted as well.

    The description below is from a C point of view, but its use can be easily extended to Tcl and object schemas. The C header file, shCArray.h, contains the ARRAY declaration.

    +-----------+
    | dimCnt    |		Number of dimensions in the data.
    +-----------+---//---+
    | dim       |   ...  |	Array dimensions (in row-major order).
    +-----------+---//---+
    | subArrCnt |   ...  |	Element count in subarray based on
    +-----------+---//---+	   "depth."
    | arrayPtr o+-->	Access to data.  arrayPtr is tree root
    +-----------+------+	which allows array referencing without
    |       dataPtr   o+-->	knowing array bounds. dataPtr points to
    |       -----------+	1st data element (all are contiguous).
    |       schemaType |	Data must be shMalloced. schemaType is
    |       -----------+	    makeio's type for the data.
    | data  size       |	Size (bytes) of an object schema in the
    |       -----------+	    array.
    |       align      |	Alignment factor of 1st object schema
    |       -----------+	    in the array.
    |       incr       |	Address increment between object schemas
    +-----------+------+	    in the array.
    | nStar     |		Amount of indirection outside the ARRAY.
    +-----------+
    | info     o+-->	Optional auxiliary information describing
    +-----------+		the array data.  infoType is makeio's
    | infoType  |		type for the info structure.
    +-----------+
    

    ARRAYs describe the dimensions of an array with dimCnt (number of dimensions) and dim (the actual dimensions). ARRAYs are limited to 34 dimensions (a Dervish compile-time limitation). This should handle most needs, including FITS Binary Tables. Dimensions (dim) are in row-major order, where the last index varies fastest. Indices are 0-indexed.

    If available, data.schemaType describes the object schema type of the data (if it's not available, data.schemaType should be set to the UNKNOWN object schema type). ARRAYs do not restrict the user to primitive types of data. Structures (object schemas) can also be stored within an ARRAY. The data can additionally be described with optional auxiliary information. If available, the infoType field describes which object schema type is pointed to by info.

    nStar indicates the amount of indirection outside the ARRAY. nStar of zero (0) indicates that the data in the ARRAY data area is of data.schemaType. If nStar is one (1), it indicates that the data area is an array of pointers to data of type data.schemaType. In essence, nStar is the number of asterisks (*) in a C declaration for a pointer. For example, if data.schemaType is equivalent to FLOAT and nStar is two (2), the data area contains pointers to pointers to FLOATs. The pointers to the FLOATs and the FLOATs themselves are outside the ARRAY and its data area.
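
    As a rough illustration of nStar, the sketch below follows nStar levels of indirection from one slot of the ARRAY data area to the final datum. The helper name followIndirection is hypothetical (it is not part of shCArray.h), and the code assumes that all data pointers share one representation, which holds on common platforms but is not guaranteed by ANSI C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Dervish): starting from the address of
 * one slot in the ARRAY data area, follow nStar pointers to reach the datum.
 * With nStar == 0 the slot itself is the datum.
 */
const void *followIndirection(const void *slot, int nStar)
{
    assert(slot != NULL && nStar >= 0);
    while (nStar-- > 0) {
        const void *next;
        memcpy(&next, slot, sizeof next);   /* read the pointer held in the slot */
        slot = next;
    }
    return slot;
}
```

    With data.schemaType equivalent to FLOAT, a slot would be passed with nStar of 0 when the data area holds FLOATs, 1 when it holds pointers to FLOATs, and 2 when it holds pointers to pointers to FLOATs.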

    If no data is available, no hierarchy of array pointers is allocated, nor is any space for data allocated. Both the ARRAY structure members, arrayPtr and data.dataPtr, are set to point to a zero (0) address (null).

    subArrCnt contains the count of array elements at each "level" within the array. The array can be considered to be a tree, where the leaves are the array elements. The intermediate nodes form the hierarchy of array pointers. This is where the concept of "levels" within the array comes from. The sizes of subtrees at different depths within the array can be used to compute array element locations.

    Data Type Conventions

    In general, object schema types are self-explanatory. However, one common data type may seem to be used in an unconventional manner: STR, the (null terminated) character string. Consider an example where nStar is 0, that is, all data (the character strings per se) are contained in the ARRAY data area. In that case, the fastest varying (rightmost) dimension is the number of characters in the string.

    The intention is to indicate that the ARRAY is an array of characters, not binary-valued bytes. Using the STR object schema type to describe the ARRAY contents distinguishes characters from UCHAR and SCHAR, which are numeric data types. Because the ARRAY is still an array of characters, the datum size and increment (data.size and data.incr respectively) reflect the size of a single character and the spacing between single characters, rather than the complete character string. (The alignment, data.align, is the same for a single character and a character string, so the user does not need to make any distinction there.)

    Now consider an example where nStar is 1 and data.schemaType is STR. In this case, the ARRAY data area contains pointers to character strings. The datum size, data.size, reflects the size of a pointer, not of a character or character string.

            Realm of the ARRAY proper
    .--------------------------------------.
    |  object schema            data area  |
         |                          |
         V      .                   V
    |           :        |      +-------+  --. dimCnt & dim control
    +-----------+--------+ .--->|       |    | indexing to data area,
    | dimCnt    |          |    +-------+    | but not the final data
    +-----------+---//---+ |    /   .   /    | (such as a character
    | dim       |   ...  | |    /   :   /    | from a string).
    +-----------+---//---+ |    +-------+  --'
    | arrayPtr o+----------' .->|    o--+-.    ARRAY data area has
    +-----------+------+     |  +-------+ |    pointers to char.
    |       dataPtr   o+-----'  | null  | |
    |       -----------+        +-------+ |   +---+---+---+------+
    | STR = schemaType |        /   .   / `-->| a | b | c | \000 |
    |       -----------+        /   :   /     +---+---+---+------+
    /           .      /        +-------+
    /           :      /        |    o--+---.    +---+---+------+
    +-----------+------+        +-------+    `-->| y | z | \000 |
    | nStar = 1 |                                +---+---+------+
    +-----------+                            |                    |
    |     .     |                            `--------------------'
          :                          Final data: the strings exist
                                     outside the ARRAY data area.
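
    The two STR conventions can be contrasted with a small sketch. The function names are hypothetical, and the inline case assumes that data.incr equals the size of one character, so that strings are packed back to back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * nStar == 0 (hypothetical helper): characters are stored inline. strLen is
 * the fastest varying dimension, the per-string character count including
 * the null terminator.
 */
const char *strInline(const char *dataPtr, size_t strLen, size_t row)
{
    assert(dataPtr != NULL);
    return dataPtr + row * strLen;
}

/*
 * nStar == 1 (hypothetical helper): the data area holds pointers to strings
 * that live outside the ARRAY. A slot may be null.
 */
const char *strIndirect(char *const *dataPtr, size_t row)
{
    assert(dataPtr != NULL);
    return dataPtr[row];
}
```

    In the inline case, dim describes both the number of strings and their length. In the indirect case, dim only indexes the pointer slots and says nothing about the string lengths.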
    

    Accessing Data

    Data is accessed through the arrayPtr and data fields of the ARRAY structure. These two fields describe arbitrary n-dimensional arrays and allow access to them.

    The notation used throughout follows C's row-major ordering of array elements, that is, the last subscript varies most rapidly. The analogy to C arrays applies only to notation (other limitations with respect to the notation also apply). Since the ARRAY format is generic and array dimensions are not known until run-time, arrays are stored in such a manner that regular C syntax can be used to access them without knowing their dimensions at compilation time (thus, they're not stored the same as C arrays).

    Hierarchy of Array Pointers

    The C concept of an array name representing a pointer to the array is used to accomplish this type of addressing. For n-dimensional arrays, n sets of pointers are used. These array pointers are arranged in a hierarchical fashion, with n-1 levels of pointers. The nth level contains the array data. The top level (first set) of pointers is associated with the array name. For a 1-dimensional array, there is only one pointer, to the start of the data itself. For an n-dimensional array, the first set of pointers points to the second set of pointers. For a 2-dimensional array, this second set of pointers points to the starts of the data for the 1-dimensional arrays. Otherwise, the second set of pointers (next to top level set) points to the third set of pointers, and so forth.

    Consider a general example of an n-dimensional array with dimensions (in row-major order)

         (d , d , ..., d   )
           0   1        n-1
    
    The hierarchy of array pointers would be linked as shown below. The bottom level contains the actual data.

    level                                     +---+---+-//-+----+
    -----                                     |[0]|[1]| ...| d  |
                                              |   |   |    |  0 |
      0                                       +-o-+-o-+-//-+-o--+
                                                |   |        |
                             .------------------'   |
                             V                      V
                    +---+---+-//-+----+           +---+---+-//-+----+
                    |[0]|[1]| ...| d  |    ...    |[0]|[1]| ...| d  |
                    |   |   |    |  1 |           |   |   |    |  1 |
      1             +-o-+-o-+-//-+-o--+           +-o-+-o-+-//-+-o--+
                      |   |        |                |   |        |
                          .        .                             .
                          :        :                             :
                            .--...-'                      .--...-'
                            V                             V
                    +---+---+-//-+----+           +---+---+-//-+----+
                    |[0]|[1]| ...|d   |    ...    |[0]|[1]| ...|d   |
                    |   |   |    | n-2|           |   |   |    | n-2|
     n-2            +-o-+-o-+-//-+-o--+           +-o-+-o-+-//-+-o--+
                      |   |        |                |   |        |
                      |   |                                      |
                 .----'   `-----------.                  .-------'
                 V                    V                  V
            +---+---+-//-+----+ +---+---+-//-+----+     +---+---+-//-+----+
            |[0]|[1]| ...|d   | |[0]|[1]| ...|d   | ... |[0]|[1]| ...|d   |
     n-1    +---+---+-//-+-n-1+ +---+---+-//-+-n-1+     +---+---+-//-+-n-1+

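
    To make the linkage concrete, here is a minimal sketch that builds such a hierarchy over an existing contiguous data area. It is not the Dervish implementation: the function name buildArrayPtrs is hypothetical, malloc stands in for whatever allocator Dervish uses, and the casts rely on the same pointer-representation assumptions as the C array notation described in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: build pointer levels 0..n-2 over contiguous data and
 * return the root, i.e. the value that would be stored in arrayPtr.
 * *block receives the malloc'ed pointer storage (NULL for 1-D arrays) so
 * that the caller can free it.
 */
void *buildArrayPtrs(unsigned char *data, size_t incr,
                     const size_t *dim, int dimCnt, void ***block)
{
    size_t total = 0, cnt = 1, p;
    void **level;
    int l;

    assert(data != NULL && dim != NULL && block != NULL && dimCnt >= 1);
    *block = NULL;
    if (dimCnt == 1) {
        return data;                  /* 1-D: the root points at the data */
    }
    for (l = 0; l < dimCnt - 1; l++) {
        cnt *= dim[l];                /* number of pointers on level l */
        total += cnt;
    }
    *block = malloc(total * sizeof **block);
    if (*block == NULL) {
        return NULL;
    }
    level = *block;
    cnt = 1;
    for (l = 0; l < dimCnt - 1; l++) {
        cnt *= dim[l];
        for (p = 0; p < cnt; p++) {
            if (l < dimCnt - 2) {     /* point at the next pointer level */
                level[p] = level + cnt + p * dim[l + 1];
            } else {                  /* last pointer level: point at data */
                level[p] = data + p * dim[l + 1] * incr;
            }
        }
        level += cnt;
    }
    return *block;
}
```

    For dimensions (2, 3, 2) this allocates 2 + 6 = 8 pointers, and casting the returned root to int *** lets an expression such as root[1][2][1] reach the last of the 12 data elements.
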
    Bouncing Down the Hierarchy of Array Pointers

    It's possible to access array elements by "bouncing" down the hierarchy of array pointers. The C code example below shows how this can be done. array is a pointer to the ARRAY. idx is an array of all indices.

        /*
         * Bounce down the hierarchy of pointers to find the first element that
         * we're interested in.
         *
         *   o   It's assumed that unspecified (trailing) array indices were set
         *       to zero.
         */

        elemPtr = ((unsigned char *)array->arrayPtr);
        for (dimIdx = 0; dimIdx < (array->dimCnt - 1); dimIdx++) {
           elemPtr = ((unsigned char **)elemPtr)[idx[dimIdx]];
        }

        /*
         *   o   Apply the fastest changing (last) index.
         */

        elemPtr += (array->data.incr * idx[array->dimCnt-1]);
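
    The same traversal can be wrapped in a function and tried on a hand-built 2-dimensional hierarchy. MiniArray is a simplified, hypothetical stand-in that carries only the fields used here; the real ARRAY declaration is in shCArray.h, where the increment is data.incr.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the ARRAY fields used here (an assumption). */
typedef struct {
    int    dimCnt;     /* number of dimensions                */
    void  *arrayPtr;   /* root of the hierarchy of pointers   */
    size_t incr;       /* stands in for data.incr (in bytes)  */
} MiniArray;

/* Return the address of the element selected by dimCnt indices. */
void *bounceDown(const MiniArray *array, const int *idx)
{
    unsigned char *elemPtr;
    int dimIdx;

    assert(array != NULL && idx != NULL && array->dimCnt >= 1);
    elemPtr = (unsigned char *)array->arrayPtr;
    for (dimIdx = 0; dimIdx < array->dimCnt - 1; dimIdx++) {
        elemPtr = ((unsigned char **)elemPtr)[idx[dimIdx]];
    }
    /* Apply the fastest changing (last) index. */
    return elemPtr + array->incr * (size_t)idx[array->dimCnt - 1];
}
```

    For a 2 by 3 array of shorts with one row pointer per row, the indices (1, 2) select the last element of the second row.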

    Because of the physical layout of ARRAY data, it's also possible to compute the location of an ARRAY element given a set of indices.

    Describing Scalars and Arrays by Example

    At a minimum, valid ARRAYs must be 1-dimensional. The n-dimensional examples in subsequent sections show the user declaring myArray in order to reference array data, but never setting its value (pointing it off to data). Consider the following example, in C, where array points to an ARRAY structure:

        #include "shCArray.h"

        short int **myArray;       /* We know a priori this is a 2-D array */
                    .              /* of shorts.                           */
                    :
        myArray = ((short int **)array->arrayPtr);
                    .
                    :
        ... myArray[y][x] ...

    array->arrayPtr should be used to initialize myArray in all cases. array->data.dataPtr, the pointer to the first byte of data in the array, should not be used. If C's array referencing notation is to be used, array->data.dataPtr will work properly only if the data is 1-dimensional. array->arrayPtr will work for any n-dimensional array.

    1-Dimensional Arrays

    A 1-dimensional array of i elements is stored as follows:

                 +-------+     +-----+
        myArray: |   o---|---->| [0] |
                 +-------+     +-----+
                               | [1] |
                               +-----+
                               /  .  /
                               /  :  /
                               +-----+
                               |[i-1]|
                               +-----+

    where the C declaration

        int *myArray;              /* Ptr to (array of) int */

    will allow element x to be accessed with a C expression such as

        ... myArray[x] ...

    2-Dimensional Arrays

    A 2-dimensional array of j by i elements is stored as follows:

                 +-------+     +-------+     +-----+-----+--//--+-----+
        myArray: |   o---|---->|[0]  o-|---->| [0] | [1] |  ... |[i-1]|
                 +-------+     +-------+     +-----+-----+--//--+-----+
                               |[1]  o-|--.
                               +-------+  |  +-----+-----+--//--+-----+
                               |   .   |  `->| [0] | [1] |  ... |[i-1]|
                               /   :   /     +-----+-----+--//--+-----+
                               |       |
                               +-------+     +-----+-----+--//--+-----+
                               |[j-1]o-|---->| [0] | [1] |  ... |[i-1]|
                               +-------+     +-----+-----+--//--+-----+

    where the C declaration

        int **myArray;             /* Ptr to ptr to (array of) int */

    will allow element [y,x] to be accessed with a C expression such as

        ... myArray[y][x] ...

    If myArray were declared as char **, an array of character strings, the above example will access one character from a character string. The whole string is referenced with

        ... myArray[y] ...

    3-Dimensional Arrays

    A 3-dimensional array of k by j by i elements is stored as follows:

                                              +-----+ +-----+        +-----+
                                           .->| [0] | | [0] |<-.  .->| [0] |
                                           |  +-----+ +-----+  |  |  +-----+
                                           |  | [1] | | [1] |  |  |  | [1] |
                                           |  +-----+ +-----+  |  |  +-----+
                                           |  /  .  / /  .  /  |  |  /  .  /
                                           |  /  :  / /  :  /  |  |  /  :  /
                                           |  +-----+ +-----+  |  |  +-----+
                                           |  |[i-1]| |[i-1]|  |  |  |[i-1]|
                                           |  +-----+ +-----+  |  |  +-----+
                 +-------+     +-------+   `-------.       .---'  `-------.
        myArray: |   o---|---->|[0]  o-|---        |       |              |
                 +-------+     +-------+     +-----|-+-----|-+--//--+-----|-+
                               |[1]  o-|---->|[0]  o |[1]  o |  ... |[j-1]o |
                               +-------+     +-------+-------+--//--+-------+
                               |   .   |
                               /   :   /     +-------+-------+--//--+-------+
                               |       |  .->|[0]  o |[1]  o |  ... |[j-1]o |
                               +-------+  |  +-----|-+-----|-+--//--+-----|-+
                               |[k-1]o-|--'        |       |              |
                               +-------+           V       V              V
                                                +-----+ +-----+        +-----+
                                                | [0] | | [0] |        | [0] |
                                                +-----+ +-----+        +-----+
                                                | [1] | | [1] |        | [1] |
                                                +-----+ +-----+        +-----+
                                                /  .  / /  .  /        /  .  /
                                                /  :  / /  :  /        /  :  /
                                                +-----+ +-----+        +-----+
                                                |[i-1]| |[i-1]|        |[i-1]|
                                                +-----+ +-----+        +-----+

    where the C declaration

        int ***myArray;            /* Ptr to ptr to ptr to (array of) int */

    will allow element [z,y,x] to be accessed with a C expression such as

        ... myArray[z][y][x] ...

    4-Dimensional Arrays

    A 4-dimensional array of m by k by j by i elements is stored as follows:

        +-----+--//--+-----+                                  +-----+--//--+-----+
        | [0] |  ... |[i-1]|<-. +-------+        +-------+ .->| [0] |  ... |[i-1]|
        +-----+--//--+-----+  `-|-o  [0]|<-.  .->|[0]  o-|-'  +-----+--//--+-----+
                                +-------+  |  |  +-------+
        +-----+--//--+-----+  .-|-o  [1]|  |  |  |[1]  o-|-.  +-----+--//--+-----+
        | [0] |  ... |[i-1]|<-' +-------+  |  |  +-------+ `->| [0] |  ... |[i-1]|
        +-----+--//--+-----+    /   .   /  |  |  /   .   /    +-----+--//--+-----+
                                /   :   /  |  |  /   :   /
        +-----+--//--+-----+    +-------+  |  |  +-------+    +-----+--//--+-----+
        | [0] |  ... |[i-1]|<---|-o[j-1]|  |  |  |[j-1]o-|--->| [0] |  ... |[i-1]|
        +-----+--//--+-----+    +-------+  |  |  +-------+    +-----+--//--+-----+
                                           |  `---------------.
                 +-------+     +-------+       `---.              |
        myArray: |   o---|---->|[0]  o-|---        |              |
                 +-------+     +-------+     +-----|-+--//--+-----|-+
                               |[1]  o-|---->|[0]  o |  ... |[k-1]o |
                               +-------+     +-------+--//--+-------+
                               |   .   |
                               /   :   /     +-------+--//--+-------+
                               |       |  .->|[0]  o |  ... |[k-1]o |
                               +-------+  |  +-----|-+--//--+-----|-+
                               |[m-1]o-|--'        |              |
                               +-------+       .---'              |
                                           |  .---------------'
        +-----+--//--+-----+               |  |               +-----+--//--+-----+
        | [0] |  ... |[i-1]|<-. +-------+  |  |  +-------+ .->| [0] |  ... |[i-1]|
        +-----+--//--+-----+  `-|-o  [0]|<-'  `->|[0]  o-|-'  +-----+--//--+-----+
                                +-------+        +-------+
        +-----+--//--+-----+  .-|-o  [1]|        |[1]  o-|-.  +-----+--//--+-----+
        | [0] |  ... |[i-1]|<-' +-------+        +-------+ `->| [0] |  ... |[i-1]|
        +-----+--//--+-----+    /   .   /        /   .   /    +-----+--//--+-----+
                                /   :   /        /   :   /
        +-----+--//--+-----+    +-------+        +-------+    +-----+--//--+-----+
        | [0] |  ... |[i-1]|<---|-o[j-1]|        |[j-1]o-|--->| [0] |  ... |[i-1]|
        +-----+--//--+-----+    +-------+        +-------+    +-----+--//--+-----+

    where the C declaration

        int ****myArray;           /* Ptr to ptr to ptr to ptr to (array of) int */

    will allow element [t,z,y,x] to be accessed with a C expression such as

        ... myArray[t][z][y][x] ...

    Limitations in Using the C Array Notation

    As mentioned above, the C notation for accessing arrays used here only reflects the access of data, not the layout of data in memory. This notation has some other restrictions that apply:

    Limiting the Number of Dimensions

    The maximum number of ARRAY dimensions is limited to 34. This value should be large enough for most applications. The value was chosen based on the practical number of dimensions that a FITS Binary Table could support. Because of the FITS header line size (80 bytes) and the format of the TDIMn keyword, FITS is practically limited to 33 dimensions. The additional dimension permits the slowest varying (first) index to reference the Table row.

    Physical Layout of Array Data in Memory

    Now that the limitations in using the C array notation have been described, they can be relaxed a bit. Data is stored in a C fashion (row-major) with the fastest varying indexed elements being stored adjacent to each other, then the next fastest varying indexed array is stored, etc. Consider an example of a 3-dimensional array of k by j by i elements (addresses increase left to right and top to bottom):

      k        .-- j-index [0] ---.-- j-index [1] ---.      .-- j-index [j-1] -.
    index      |                  |                  |      |                  |
               V                  V                  V      V                  V
          .--> +-----+--//--+-----+-----+--//--+-----+--//--+-----+--//--+-----+
    [0]   |    | [0] |  ... |[i-1]| [0] |  ... |[i-1]|  ... | [0] |  ... |[i-1]|
          +--> +-----+--//--+-----+-----+--//--+-----+--//--+-----+--//--+-----+
    [1]   |    | [0] |  ... |[i-1]| [0] |  ... |[i-1]|  ... | [0] |  ... |[i-1]|
          +--> +-----+--//--+-----+-----+--//--+-----+--//--+-----+--//--+-----+
     .         /                               .                               /
     :         /                               :                               /
          +--> +-----+--//--+-----+-----+--//--+-----+--//--+-----+--//--+-----+
    [k-1] |    | [0] |  ... |[i-1]| [0] |  ... |[i-1]|  ... | [0] |  ... |[i-1]|
          `--> +-----+--//--+-----+-----+--//--+-----+--//--+-----+--//--+-----+

    If the 3-dimensional array size is known at compilation time, for example consider a function argument formally declared as

        int myArray[][j][i];       /* 3-dimensional array w/ unknown dimension */

    then the normal C array referencing notation can also be used

        ... myArray[z][y][x] ...

    Notice that the slowest varying dimension does not need to be known at compilation (as the slowest varying index can be left unspecified in C). If the source was compiled with array bounds checking, the j and i bounds will be checked, but the unknown dimension index will not be checked.

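
    As a small sketch of this relaxed use, the function below views a contiguous data area through a pointer to a 2-dimensional array whose trailing dimensions are fixed at compilation time. The names elemAt, J_DIM and I_DIM are hypothetical, and the sketch assumes the elements are ints packed with no padding (data.incr equal to sizeof(int)).

```c
#include <assert.h>
#include <stddef.h>

enum { J_DIM = 3, I_DIM = 2 };   /* trailing dimensions, fixed for this sketch */

/*
 * View contiguous data (as addressed by data.dataPtr) as
 * int [][J_DIM][I_DIM] and fetch element [z][y][x].
 */
int elemAt(void *dataPtr, int z, int y, int x)
{
    int (*view)[J_DIM][I_DIM] = dataPtr;

    assert(dataPtr != NULL);
    return view[z][y][x];
}
```

    Only the slowest varying dimension is left open here; the hierarchy of array pointers is not consulted at all.
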
    Computing Array Element Locations

    Rather than bouncing down the hierarchy of array pointers, it is possible to compute the location of an array element, given its indices (0-indexed for this description). ARRAY's subArrCnt contains the count of array elements at each "level" within the array. There are dimCnt valid elements in subArrCnt (just as for dim).

    The level (0-indexed) refers to the depth within the hierarchy of array pointers, where level n-1 contains the leaves, namely the array data itself. The level also refers to the number of array indices provided, less 1. Consider a 3-dimensional array with dimensions (2, 3, 2) (in row-major order):

         level
         -----
          -1            ___.___
                       /       \
                      /         \
           0        _0_         _1_  <-----+- slowest varying indices
                   /   \       /   \       |
                  /     \     /     \      |  hierarchy of array
           1     0---1---2   0---1---2     |  pointers (arrayPtr)
                /_\ /_\ /_\ /_\ /_\ /_\    |
                0 1 0 1 0 1 0 1 0 1 0 1  <-+- fastest varying indices
               +-+-+-+-+-+-+-+-+-+-+-+-+ --.
           2   | | | | | | | | | | | | |   |  data (data.dataPtr)
               +-+-+-+-+-+-+-+-+-+-+-+-+ --'
    
    At level l (0 <= l < dimCnt), each subarray of the level above is divided into dim[l] subarrays, giving dim[0] * ... * dim[l] subarrays at that level in total. In the above example, at level 0, there are two subarrays. If one index is provided, it explicitly refers to only one subarray at level 0 (as the index references only the slowest varying index). Passing two indices references only one of six subarrays at level 1. Passing three indices references only one of 12 array elements at level 2. Level -1 does not exist in ARRAY's subArrCnt. But, providing no indices to reference an entire array is equivalent to accessing a subarray at level -1.

    For an n-dimensional array with dimensions (in row-major order)

         (d , d , ..., d   )
           0   1        n-1
    
    the size of each subarray at level l (0 <= l < n-1), expressed as an array element count, a, is
              ____n-1
               ||
         a  =  ||    d
          l     ||     j
                  j=l+1
    
    By definition,
         a  =  1
          n-1
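
    The product above can be evaluated from the last level upward, since each level's count is the next level's count multiplied by the next dimension. The following is a sketch only; fillSubArrCnt is a hypothetical name, and plain int arrays stand in for the dim and subArrCnt members of ARRAY.

```c
#include <assert.h>
#include <stddef.h>

/* Fill subArrCnt[l] with the product of dim[l+1] .. dim[dimCnt-1]. */
void fillSubArrCnt(const int *dim, int dimCnt, int *subArrCnt)
{
    int l;

    assert(dim != NULL && subArrCnt != NULL && dimCnt >= 1);
    subArrCnt[dimCnt - 1] = 1;                 /* by definition */
    for (l = dimCnt - 2; l >= 0; l--) {
        subArrCnt[l] = subArrCnt[l + 1] * dim[l + 1];
    }
}
```

    For the dimensions (2, 3, 2) used in the example above, this yields the counts (6, 2, 1).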
    

    Because array elements are laid out sequentially, the array may be treated 1-dimensionally. Given a set of n indices, x, into the n-dimensional array

         (x , x , ..., x   )
           0   1        n-1
    
    a 1-dimensional index, i, can be computed
                             n-2
                         .---.
         i  =  x   +      \     a * x
                n-1       /      k   k
                         '---'
                             k=0
    
    If fewer than n indices were passed for x, the remaining indices should be treated as zeros (as it's 0-indexed) to locate the starting array element of a subarray. The above expression degenerates by changing the upper bound on the sum to the number of passed indices, j, less 1 and dropping the initial addend:
                   j-1
               .---.
         i  =   \     a * x
                /      k   k
               '---'
                   k=0
    
    In either case, multiplying i by data.incr gives the offset (from data.dataPtr) of the element or subarray (respectively) indexed by x.

    These computations can be done using subArrCnt. For example, the following C code returns elemOff, the 0-indexed offset (expressed in array element units) to the start of the referenced element or subarray. idxCnt is the number of indices (in array idx) passed by the caller (which must not exceed the number of array dimensions) and array points to the desired ARRAY:

        if (idxCnt == array->dimCnt) {
           elemOff = idx[idxCnt-1];
           dimIdx  = idxCnt - 2;
        } else {
           elemOff = 0;
           dimIdx  = idxCnt - 1;
        }
        for ( ; dimIdx >= 0; dimIdx--) {
           elemOff += (array->subArrCnt[dimIdx] * idx[dimIdx]);
        }

    The number of elements within a subarray, subArrSize, referenced by idxCnt indices can be obtained with:

        subArrSize = (idxCnt > 0) ? array->subArrCnt[idxCnt-1]
                                  : array->subArrCnt[0] * array->dim[0];

    Specific handling for the case where no indices are passed is needed, as level -1 does not exist.
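
    The two fragments above can be packaged as functions and checked against the (2, 3, 2) example. This is a sketch under assumptions: the function names are hypothetical, and plain int arrays stand in for the subArrCnt and dim members of ARRAY.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements, of the element or subarray selected by idxCnt indices. */
long elemOffset(const int *subArrCnt, int dimCnt, const int *idx, int idxCnt)
{
    long elemOff = 0;
    int dimIdx = idxCnt - 1;

    assert(subArrCnt != NULL && dimCnt >= 1);
    assert(idxCnt >= 0 && idxCnt <= dimCnt);
    if (idxCnt == dimCnt) {
        elemOff = idx[idxCnt - 1];     /* last index counts single elements */
        dimIdx = idxCnt - 2;
    }
    for ( ; dimIdx >= 0; dimIdx--) {
        elemOff += (long)subArrCnt[dimIdx] * idx[dimIdx];
    }
    return elemOff;
}

/* Number of elements in the subarray selected by idxCnt indices. */
long subArrSize(const int *subArrCnt, const int *dim, int idxCnt)
{
    assert(subArrCnt != NULL && dim != NULL && idxCnt >= 0);
    return (idxCnt > 0) ? (long)subArrCnt[idxCnt - 1]
                        : (long)subArrCnt[0] * dim[0];
}
```

    With subArrCnt of (6, 2, 1), the indices (1, 2, 1) give offset 1 + 2*2 + 6*1 = 11, while the partial indices (1, 2) give offset 10 and a subarray size of 2.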

    Given a 0-indexed 1-dimensional offset, elemOff, into an n-dimensional array, the corresponding n indices can be computed. Using subArrCnt, the following C code illustrates how this can be accomplished:

        elemIdx = elemOff;
        for (dimIdx = 0; dimIdx < (array->dimCnt - 1); dimIdx++) {
           indices[dimIdx] = elemIdx / array->subArrCnt[dimIdx];
           elemIdx        %= array->subArrCnt[dimIdx];
        }
        indices[dimIdx] = elemIdx;             /* Use final remainder */

    All variables are signed integers.
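
    As a check on the inverse computation, the sketch below wraps it in a function. The name offsetToIndices is hypothetical, and a plain int array stands in for ARRAY's subArrCnt.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a 0-indexed element offset into dimCnt array indices. */
void offsetToIndices(const int *subArrCnt, int dimCnt, long elemOff,
                     int *indices)
{
    long elemIdx = elemOff;
    int dimIdx;

    assert(subArrCnt != NULL && indices != NULL);
    assert(dimCnt >= 1 && elemOff >= 0);
    for (dimIdx = 0; dimIdx < dimCnt - 1; dimIdx++) {
        indices[dimIdx] = (int)(elemIdx / subArrCnt[dimIdx]);
        elemIdx %= subArrCnt[dimIdx];
    }
    indices[dimIdx] = (int)elemIdx;    /* final remainder is the last index */
}
```

    With subArrCnt of (6, 2, 1), offset 11 maps back to the indices (1, 2, 1): 11 / 6 is 1 with remainder 5, 5 / 2 is 2 with remainder 1, and the remainder 1 is the last index.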

    As mentioned before, STR object schema types are handled somewhat unusually. For STRs, the subarray count represents the total number of characters (including null terminators) in the subarray rather than the number of character strings.

    Memory Use

    For an n-dimensional array with dimensions (in row-major order)

         (d , d , ..., d   )
           0   1        n-1
    
    the number of pointers used in the hierarchy of array pointers is
             n-2
         .---.  ____ j
          \      ||    d
          /      ||     i
         '---'      i=0
             j=0
    
    The number of data elements is
                ____n-1
                 ||
                 ||    d
                 ||     i
                    i=0
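
    Both counts are simple running products, as the following sketch shows. The function names are hypothetical, and a plain int array stands in for ARRAY's dim.

```c
#include <assert.h>
#include <stddef.h>

/* Pointers in levels 0..n-2 of the hierarchy (zero for a 1-D array). */
long pointerCount(const int *dim, int dimCnt)
{
    long total = 0, level = 1;
    int j;

    assert(dim != NULL && dimCnt >= 1);
    for (j = 0; j <= dimCnt - 2; j++) {
        level *= dim[j];               /* pointers on level j */
        total += level;
    }
    return total;
}

/* Total number of data elements. */
long elementCount(const int *dim, int dimCnt)
{
    long total = 1;
    int i;

    assert(dim != NULL && dimCnt >= 1);
    for (i = 0; i < dimCnt; i++) {
        total *= dim[i];
    }
    return total;
}
```

    For dimensions (2, 3, 2) the hierarchy uses 2 + 6 = 8 pointers for 12 data elements. The single root pointer associated with the array name is not part of this count.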
    

    As mentioned before, STR object schema types are handled somewhat unusually. For STRs, the data element count represents the total number of characters (including null terminators) in the array rather than the number of character strings.