ARRAYs are defined in the header file shCArray.h.
Dervish provides a general ARRAY type object schema to permit the creation and manipulation of n-dimensional arrays. Besides storing array data, ARRAYs allow data to be accessed with C array referencing notation without knowing array dimensions. Array indices are in row-major order, where the rightmost (last) index is the fastest varying index.
For all arrays, information about the size of the array and where the data is located is maintained in the ARRAY structure. The following interfaces are provided to create and access ARRAYs:
The ARRAY structure allows access to data in an array. The data area is considered part of (owned by) an ARRAY, especially when the ARRAY is deleted (shArrayDel), as the storage area containing the data is also deleted.
The description below is from a C point of view, but its use can be easily extended to Tcl and object schemas. The C header file, shCArray.h, contains the ARRAY declaration.
Field             Description
----------------  ---------------------------------------------------------
dimCnt            Number of dimensions in the data.
dim[]             Array dimensions (in row-major order).
subArrCnt[]       Element count in a subarray, based on "depth."
arrayPtr          Access to data. arrayPtr is the tree root, which allows
                  array referencing without knowing array bounds.
data.dataPtr      Points to the 1st data element (all are contiguous).
                  Data must be shMalloced.
data.schemaType   makeio's type for the data.
data.size         Size (bytes) of an object schema in the array.
data.align        Alignment factor of the 1st object schema in the array.
data.incr         Address increment between object schemas in the array.
nStar             Amount of indirection outside the ARRAY.
info              Optional auxiliary information describing the array data.
infoType          makeio's type for the info structure.
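As a rough picture of the layout above, the following C declaration is a sketch only. The field names come from the description above, but the field types, the names ARRAY_SKETCH, SKETCH_DIM_LIM and arraySketchInitEmpty, and the use of plain int and void * are assumptions made for illustration; the authoritative declaration is the one in shCArray.h.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DIM_LIM 34   /* dimension limit quoted in this document */

/* Hypothetical stand-in for ARRAY; the real types live in shCArray.h. */
typedef struct {
    int   dimCnt;                      /* number of dimensions              */
    int   dim[SKETCH_DIM_LIM];         /* dimensions, row-major             */
    int   subArrCnt[SKETCH_DIM_LIM];   /* element count per subarray level  */
    void *arrayPtr;                    /* root of the pointer hierarchy     */
    struct {
        void  *dataPtr;                /* first data element (contiguous)   */
        int    schemaType;             /* assumed: integer type identifier  */
        size_t size;                   /* bytes per object schema           */
        size_t align;                  /* alignment of the first element    */
        size_t incr;                   /* byte stride between elements      */
    } data;
    int   nStar;                       /* indirection outside the ARRAY     */
    void *info;                        /* optional auxiliary information    */
    int   infoType;                    /* assumed: integer type identifier  */
} ARRAY_SKETCH;

/* Describe an array without data: no hierarchy and no data area exist. */
void arraySketchInitEmpty(ARRAY_SKETCH *a)
{
    assert(a != NULL);
    a->dimCnt = 0;
    a->arrayPtr = NULL;
    a->data.dataPtr = NULL;
    a->data.size = a->data.align = a->data.incr = 0;
    a->nStar = 0;
    a->info = NULL;
}
```

Real code should create and delete ARRAYs through the Dervish interfaces (such as shArrayDel) rather than by filling in a structure by hand.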
ARRAYs describe the dimensions of an array with dimCnt (number of dimensions) and dim (the actual dimensions). ARRAYs are limited to 34 dimensions (a Dervish compile-time limitation). This should handle most needs, including FITS Binary Tables.
Dimensions (dim) are in row-major order, where the last index varies fastest. Indices are 0-indexed.
If available, data.schemaType describes the object schema type of the data (if it's not available, data.schemaType should be set to the UNKNOWN object schema type).
ARRAYs do not restrict the user to primitive types of data.
Structures (object schemas) can also be stored within an ARRAY.
The data can additionally be described with optional auxiliary information (info). If available, the infoType field describes which object schema type is pointed to by info.
nStar indicates the amount of indirection outside the ARRAY. An nStar of zero (0) indicates that the data in the ARRAY data area is of data.schemaType. If nStar is one (1), the data area is an array of pointers to data of type data.schemaType. In essence, nStar is the number of asterisks (*) in a C declaration for a pointer. For example, if data.schemaType is equivalent to FLOAT and nStar is two (2), the data area contains pointers to pointers to FLOATs. The pointers to the FLOATs and the FLOATs themselves are outside the ARRAY and its data area.
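The nStar levels described above map directly onto C pointer declarations. The sketch below is illustrative only: the readNStar0, readNStar1 and readNStar2 helpers are hypothetical names, and C's float stands in for the FLOAT object schema type.

```c
#include <assert.h>
#include <stddef.h>

/* nStar = 0: the data area holds the FLOATs themselves. */
float readNStar0(const float *area, size_t i)
{
    assert(area != NULL);
    return area[i];
}

/* nStar = 1: the data area holds pointers to FLOATs stored elsewhere. */
float readNStar1(float **area, size_t i)
{
    assert(area != NULL && area[i] != NULL);
    return *area[i];
}

/* nStar = 2: the data area holds pointers to pointers to FLOATs. */
float readNStar2(float ***area, size_t i)
{
    assert(area != NULL && area[i] != NULL && *area[i] != NULL);
    return **area[i];
}
```

In each case only the first argument corresponds to the ARRAY data area; everything reached by dereferencing its elements lives outside the ARRAY.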
If no data is available, no hierarchy of array pointers is allocated, nor is any space for data allocated. Both ARRAY structure members, arrayPtr and data.dataPtr, are set to a zero (0) address (null).
subArrCnt contains the count of array elements at each "level" within the array. The array can be considered to be a tree, where the leaves are the array elements. The intermediate nodes form the hierarchy of array pointers. This is where the concept of "levels" within the array comes from. The sizes of subtrees at different depths within the array can be used to compute array element locations.
In general, object schema types are self-explanatory. However, one common data type may seem to be used in an unconventional manner: STR, the (null-terminated) character string. Consider an example where nStar is 0, that is, all data (the character strings per se) are contained in the ARRAY data area. In that case, the fastest varying (rightmost) dimension is the number of characters in the string. The intention is to indicate that the ARRAY is an array of characters, not binary-valued bytes. The use of the STR object schema type to describe the ARRAY contents achieves this differentiation between UCHAR and SCHAR, which are numeric data types, and characters themselves. Because the ARRAY is still an array of characters, the datum size and increment (data.size and data.incr respectively) reflect the size of a single character and the spacing between single characters, rather than the complete character string. (The alignment, data.align, is the same for a single character and a character string, so the user does not need to make any distinction there.)
Now consider an example where nStar is 1 and data.schemaType is STR. In this case, the ARRAY data area contains pointers to character strings. The datum size, data.size, reflects the size of a pointer, not of a character or character string.
[Figure: an ARRAY with data.schemaType = STR and nStar = 1. Within the realm of the ARRAY proper, arrayPtr and data.dataPtr lead to the ARRAY data area, which holds pointers to char (an entry may be null). dimCnt and dim control indexing into the data area, but not into the final data (such as a character from a string). The final data, the strings themselves (for example "abc\000" and "yz\000"), exist outside the ARRAY data area.]
Data is accessed through the arrayPtr and data fields of the ARRAY structure. These two fields describe arbitrary n-dimensional arrays and allow access to them.
The notation used throughout follows C's row-major ordering of array elements, that is, the last subscript varies most rapidly. The analogy to C arrays applies only to notation (other limitations with respect to the notation also apply). Since the ARRAY format is generic, with array dimensions not known until run time, arrays are stored in such a manner that regular C syntax can be used to access them without knowing their dimensions at compile time (thus, they're not stored the same as C arrays).
The C concept of an array name representing a pointer to the array is used to accomplish this type of addressing. For n-dimensional arrays, n sets of pointers are used. These array pointers are arranged in a hierarchical fashion, with n-1 levels of pointers. The nth level contains the array data. The top level (first set) of pointers is associated with the array name. For a 1-dimensional array, there is only one pointer, to the start of the data itself. For an n-dimensional array, the first set of pointers points to the second set of pointers. For a 2-dimensional array, this second set of pointers points to the starts of data for 1-dimensional arrays. Otherwise, the second set of pointers (next to top level set) points to the third set of pointers, and so forth.
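To make the hierarchy concrete, here is a minimal sketch that builds such a pointer tree over an existing contiguous data block. It is not the Dervish implementation: buildHierarchy and freeHierarchy are hypothetical names, malloc is used where Dervish would use shMalloc, each node's pointers are allocated separately rather than as one set per level, and the code assumes that all object pointers share one representation (as accessing the result through C array notation already requires).

```c
#include <assert.h>
#include <stdlib.h>

/* Release the pointer levels only; the data block belongs to the caller. */
void freeHierarchy(void *root, const int *dim, int dimCnt)
{
    int i;

    if (root == NULL || dimCnt <= 1)
        return;                        /* 1-D: root is the data itself */
    for (i = 0; i < dim[0]; i++)
        freeHierarchy(((void **)root)[i], dim + 1, dimCnt - 1);
    free(root);
}

/* Build the hierarchy; returns the value to keep in arrayPtr, or NULL. */
void *buildHierarchy(char *data, const int *dim, int dimCnt, size_t incr)
{
    size_t subBytes = incr;            /* bytes under one slowest index */
    void **ptrs;
    int i;

    assert(data != NULL && dim != NULL && dimCnt >= 1 && dim[0] > 0);
    if (dimCnt == 1)
        return data;                   /* single pointer to the data start */

    for (i = 1; i < dimCnt; i++)
        subBytes *= (size_t)dim[i];

    ptrs = malloc((size_t)dim[0] * sizeof *ptrs);
    if (ptrs == NULL)
        return NULL;
    for (i = 0; i < dim[0]; i++) {
        ptrs[i] = buildHierarchy(data + (size_t)i * subBytes,
                                 dim + 1, dimCnt - 1, incr);
        if (ptrs[i] == NULL) {         /* undo partial work on failure */
            while (i-- > 0)
                freeHierarchy(ptrs[i], dim + 1, dimCnt - 1);
            free(ptrs);
            return NULL;
        }
    }
    return ptrs;
}
```

For a 3-dimensional array of int, the returned root can be cast to int *** and indexed as root[z][y][x], even though the dimensions were supplied only at run time.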
Consider a general example of an n-dimensional array with dimensions (in row-major order)
$$(d_0, d_1, \ldots, d_{n-1})$$
The hierarchy of array pointers would be linked as shown below. The bottom level contains the actual data.
It's possible to access array elements by "bouncing" down the hierarchy of array pointers. The C code example below shows how this can be done. array is a pointer to the ARRAY. idx is an array of all indices.
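A minimal sketch of such a walk, not the original Dervish listing, is given here. To stay self-contained it takes the relevant ARRAY fields as separate parameters; the function name elemByPointers is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Follow the pointer hierarchy down to one element.
 *   arrayPtr : root of the hierarchy (array->arrayPtr)
 *   dimCnt   : number of dimensions  (array->dimCnt)
 *   idx      : dimCnt indices, slowest varying first
 *   incr     : byte stride between elements (array->data.incr)
 */
void *elemByPointers(void *arrayPtr, int dimCnt, const int *idx, size_t incr)
{
    void *p = arrayPtr;
    int level;

    assert(arrayPtr != NULL && idx != NULL && dimCnt >= 1);

    /* Each of the first dimCnt-1 indices selects a pointer one level down. */
    for (level = 0; level < dimCnt - 1; level++)
        p = ((void **)p)[idx[level]];

    /* The last index steps through the contiguous data. */
    return (char *)p + (size_t)idx[dimCnt - 1] * incr;
}
```

No bounds checking against dim is done here; a caller that cannot trust idx would need to add it.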
Because of the physical layout of ARRAY data, it's also possible to compute the location of an ARRAY element given a set of indices.
At a minimum, valid ARRAYs must be 1-dimensional. The n-dimensional examples in subsequent sections show the user declaring myArray in order to reference array data, but never setting its value (pointing it off to data). Consider the following example, in C, where array points to an ARRAY structure:
array->arrayPtr should be used to initialize myArray in all cases. array->data.dataPtr, the pointer to the first byte of data in the array, should not be used. If C's array referencing notation is to be used, array->data.dataPtr will work properly only if the data is 1-dimensional. array->arrayPtr will work for any n-dimensional array.
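The rule above can be sketched as follows. The ARRAY_PTRS_SKETCH type is a hypothetical stand-in that carries only the two fields discussed, and elem2D and elem1D are illustrative names; float is an assumed element type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two fields used here. */
typedef struct {
    void *arrayPtr;
    struct { void *dataPtr; } data;
} ARRAY_PTRS_SKETCH;

/* 2-D access with C notation: myArray must come from arrayPtr. */
float elem2D(const ARRAY_PTRS_SKETCH *array, int y, int x)
{
    float **myArray;

    assert(array != NULL && array->arrayPtr != NULL);
    myArray = (float **)array->arrayPtr;
    return myArray[y][x];
}

/* 1-D access: the only case where data.dataPtr would work equally well. */
float elem1D(const ARRAY_PTRS_SKETCH *array, int x)
{
    float *myArray;

    assert(array != NULL && array->arrayPtr != NULL);
    myArray = (float *)array->arrayPtr;
    return myArray[x];
}
```

Casting data.dataPtr to float ** in the 2-dimensional case would treat the float values themselves as pointers, which is why arrayPtr is the safe choice.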
A 1-dimensional array of i elements is stored as follows:
A 2-dimensional array of j by i elements is stored as follows:
If myArray were declared as char **, an array of character strings, the above example would access one character from a character string. The whole string is referenced by omitting the last subscript.
A 3-dimensional array of k by j by i elements is stored as follows:
A 4-dimensional array of m by k by j by i elements is stored as follows:
As mentioned above, the C notation for accessing arrays used here only reflects the access of data, not the layout of data in memory. This notation has some other restrictions that apply:
The maximum number of ARRAY dimensions is limited to 34. This value should be large enough for most applications. The value was chosen based on the practical number of dimensions that a FITS Binary Table could support. Because of the FITS header line size (80 bytes) and the format of the TDIMn keyword, FITS is practically limited to 33 dimensions. The additional dimension permits the slowest varying (first) index to reference the Table row.
Now that the limitations in using the C array notation have been described, they can be relaxed a bit. Data is stored in a C fashion (row-major) with the fastest varying indexed elements being stored adjacent to each other, then the next fastest varying indexed array is stored, etc. Consider an example of a 3-dimensional array of k by j by i elements (addresses increase left to right and top to bottom):
Rather than bouncing down the hierarchy of array pointers, it is possible to compute the location of an array element, given its indices (0-indexed for this description). ARRAY's subArrCnt contains the count of array elements at each "level" within the array. There are dimCnt valid elements in subArrCnt (just as for dim).
The level (0-indexed) refers to the depth within the hierarchy of array pointers, where level n-1 contains the leaves, namely the array data itself. The level also refers to the number of array indices provided, less 1. Consider a 3-dimensional array with dimensions (2, 3, 2) (in row-major order):
level
-----
 -1             ___.___
               /       \
              /         \
  0         _0_         _1_       <-----+- slowest varying indices
           /   \       /   \            |
          /     \     /     \           | hierarchy of array
  1      0---1---2   0---1---2          | pointers (arrayPtr)
        /_\ /_\ /_\ /_\ /_\ /_\         |
        0 1 0 1 0 1 0 1 0 1 0 1       <-+- fastest varying indices
       +-+-+-+-+-+-+-+-+-+-+-+-+ --.
  2    | | | | | | | | | | | | |   | data (data.dataPtr)
       +-+-+-+-+-+-+-+-+-+-+-+-+ --'
At level l (0 <= l < dimCnt), there are dim[0] * dim[1] * ... * dim[l] subarrays in total, dim[l] of them under each node of the level above. In the above example, at level 0, there are two subarrays. If one index is provided, it refers to exactly one subarray at level 0 (as the index references only the slowest varying index). Passing two indices references one of six subarrays at level 1. Passing three indices references one of 12 array elements at level 2.
Level -1 does not exist in ARRAY's subArrCnt. However, providing no indices, to reference an entire array, is equivalent to accessing a subarray at level -1.
For an n-dimensional array with dimensions (in row-major order)
$$(d_0, d_1, \ldots, d_{n-1})$$
each subarray size as an array element count, a, at level l (0 <= l < n-1) is
$$a_l = \prod_{j=l+1}^{n-1} d_j$$
By definition,
$$a_{n-1} = 1$$
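The subarray sizes defined above can be filled in with a single backward pass over the dimensions. This is a sketch of the arithmetic only; computeSubArrCnt is a hypothetical name and Dervish may compute subArrCnt differently.

```c
#include <assert.h>
#include <stddef.h>

/* Fill subArrCnt[l] with the product of dim[l+1] .. dim[dimCnt-1]. */
void computeSubArrCnt(const int *dim, int dimCnt, int *subArrCnt)
{
    int l;

    assert(dim != NULL && subArrCnt != NULL && dimCnt >= 1);

    subArrCnt[dimCnt - 1] = 1;         /* deepest level: one element each */
    for (l = dimCnt - 2; l >= 0; l--)
        subArrCnt[l] = subArrCnt[l + 1] * dim[l + 1];
}
```

For the (2, 3, 2) array used in the diagram above, this yields (6, 2, 1).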
Because array elements are laid out sequentially, the array may be treated 1-dimensionally. Given a set of n indices, x, into the n-dimensional array
$$(x_0, x_1, \ldots, x_{n-1})$$
a 1-dimensional index, i, can be computed
$$i = x_{n-1} + \sum_{k=0}^{n-2} a_k x_k$$
If fewer than n indices were passed for x, the remaining indices should be treated as zeros (as it's 0-indexed) to locate the starting array element of a subarray. The above expression degenerates by changing the upper bound on the sum to the number of passed indices, j, less 1 and dropping the initial addend:
$$i = \sum_{k=0}^{j-1} a_k x_k$$
In either case, multiplying i by data.incr results in the array offset (from data.dataPtr) for the element or subarray (respectively) indexed by x.
These computations can be done using subArrCnt. For example, the following C code returns elemOff, the 0-indexed offset (expressed in array element units) to the start of the referenced subarray. idxCnt is the number of indices (in array idx) passed by the caller (which must be less than the number of array dimensions) and array points to the desired ARRAY:
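A minimal sketch of this computation, not the original Dervish listing, follows. To stay self-contained it takes subArrCnt and dimCnt as parameters instead of an ARRAY pointer; the name subArrayOffset is hypothetical. It also accepts idxCnt equal to dimCnt, in which case the result is the offset of a single element.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in array element units, of the subarray selected by idx. */
long subArrayOffset(const int *subArrCnt, int dimCnt,
                    const int *idx, int idxCnt)
{
    long elemOff = 0;
    int k;

    assert(subArrCnt != NULL && idxCnt >= 0 && idxCnt <= dimCnt);
    assert(idxCnt == 0 || idx != NULL);

    /* Indices that were not passed count as zero and add nothing. */
    for (k = 0; k < idxCnt; k++)
        elemOff += (long)subArrCnt[k] * idx[k];
    return elemOff;
}
```

Multiplying the result by data.incr and adding it to data.dataPtr (as a byte pointer) gives the address of the subarray's first element.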
The number of elements within a subarray, subArrSize, referenced by idxCnt indices can be obtained with:
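One way to sketch this lookup, again not the original listing, is shown here. The name subArraySize is hypothetical, and the idxCnt = 0 case (the whole array, "level -1") is handled separately because subArrCnt has no entry for it.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the subarray selected by idxCnt indices. */
long subArraySize(const int *dim, const int *subArrCnt,
                  int dimCnt, int idxCnt)
{
    assert(dim != NULL && subArrCnt != NULL && dimCnt >= 1);
    assert(idxCnt >= 0 && idxCnt <= dimCnt);

    if (idxCnt == 0)                   /* no indices: the entire array */
        return (long)dim[0] * subArrCnt[0];
    return subArrCnt[idxCnt - 1];      /* level = indices passed, less 1 */
}
```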
Given a 0-indexed 1-dimensional offset, elemOff, into an n-dimensional array, the corresponding n indices can be computed. Using subArrCnt, the following C code illustrates how this can be accomplished:
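A minimal sketch of the inverse mapping, not the original Dervish listing, is given below. It peels off one index per level by integer division; offsetToIndices is a hypothetical name, and the fields are passed individually to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a flat element offset into dimCnt row-major indices. */
void offsetToIndices(const int *subArrCnt, int dimCnt,
                     long elemOff, int *idx)
{
    int k;

    assert(subArrCnt != NULL && idx != NULL && dimCnt >= 1 && elemOff >= 0);

    for (k = 0; k < dimCnt; k++) {
        idx[k] = (int)(elemOff / subArrCnt[k]);   /* whole subarrays skipped */
        elemOff %= subArrCnt[k];                  /* remainder within one    */
    }
}
```

The caller is assumed to pass an offset that lies inside the array; an offset past the end would produce a first index greater than or equal to dim[0].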
As mentioned before, STR object schema types are handled somewhat unusually. For STRs, the subarray count represents the total number of characters (including null terminators) in the subarray rather than the number of character strings.
For an n-dimensional array with dimensions (in row-major order)
$$(d_0, d_1, \ldots, d_{n-1})$$
the number of pointers used in the hierarchy of array pointers is
$$\sum_{j=0}^{n-2} \prod_{i=0}^{j} d_i$$
The number of data elements is
$$\prod_{i=0}^{n-1} d_i$$
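Both counts can be accumulated in one pass over the dimensions by keeping a running product. This is a sketch of the arithmetic only; hierarchyCounts is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count the pointers in the hierarchy and the data elements below it. */
void hierarchyCounts(const int *dim, int dimCnt,
                     long *ptrCnt, long *elemCnt)
{
    long prod = 1;                     /* d_0 * d_1 * ... * d_j so far */
    int j;

    assert(dim != NULL && ptrCnt != NULL && elemCnt != NULL && dimCnt >= 1);

    *ptrCnt = 0;
    for (j = 0; j < dimCnt; j++) {
        prod *= dim[j];
        if (j < dimCnt - 1)            /* levels 0 .. n-2 hold pointers */
            *ptrCnt += prod;
    }
    *elemCnt = prod;                   /* level n-1 holds the data */
}
```

For dimensions (2, 3, 2) this gives 2 + 6 = 8 pointers and 12 data elements, matching the tree drawn earlier.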
As mentioned before, STR object schema types are handled somewhat unusually. For STRs, the data element count represents the total number of characters (including null terminators) in the array rather than the number of character strings.