The columnar (TBLCOL) format provides two benefits for the Dervish user:
The fields of a TBLCOL Table are described with a CHAIN of ARRAY and TBLFLD type object schemas. Generally, a TBLCOL Table looks like:
The TBLCOL structure mainly heads the CHAIN of ARRAYs, one per Table field. Each ARRAY describes the field data and points off to it. The TBLFLD structure is optional. It provides additional information about a Table field.
At first glance, having a CHAIN of ARRAYs point off to the data while additional Table information is kept in TBLFLD object schemas may appear illogical. But because the ARRAYs already describe the field data, the "non-essential" information in the TBLFLDs can be optional. This simplifies the construction of Tables by the user, as there are fewer mandatory structures to fill in. Another benefit is that the ARRAY objects used to describe field data are not Table-specific; they can describe other arrays in the Dervish environment, reducing the proliferation of similar, application-specific structures.
The above drawing is not all-encompassing; the concept of heap storage is not shown. A Table field consisting of heap allows variable length data to be stored efficiently.
TBLCOL data is maintained by ARRAYs. When reading about ARRAYs, keep in mind that when a FITS Table is read into a Dervish Table, an additional array dimension, namely the Table row, is added to the Dervish Table. This is the slowest varying (first) index. A Binary Table scalar field becomes a 1-dimensional array in a Dervish Table. A Binary Table field containing an n-dimensional array becomes an (n+1)-dimensional array in the Dervish Table.
FITS ASCII and Binary Tables differ when it comes to arrays as data within a field. ASCII Tables (TABLE HDUs) do not permit arrays; a field can only hold a scalar value (or a character string). Binary Tables (BINTABLE HDUs), on the other hand, permit a field to contain either a scalar or an array of values.
Because ASCII Tables are the more restrictive of the two, they can still be discussed as if they allowed 1-dimensional arrays (to permit character strings to be handled). In the discussion below, for an ASCII Table, TFORMn's element count would always be one (1). Additionally, even though ASCII Tables do not permit the TDIMn keyword, allowing its limited use (only the row index and character string index are allowed) permits TBLCOL Tables to remain generic.
Although a Table field form (TFORMn keyword) can indicate a scalar value (an element count of one (1) without a corresponding TDIMn specification) or a multidimensional array, all field values are stored as an array in the TBLCOL format. The ARRAY object schema describes how data is stored. At a minimum, the Table row number is the slowest varying index, as in this C code example:
The notation used throughout follows C's row-major ordering of array elements, that is, the last subscript varies most rapidly. FITS' column-major ordering of array dimensions (where the first subscript varies most rapidly) specified by TDIMn is reversed under Dervish. In the example above:
where i and TDIMn's subscripts are 1-indexed: d_i is TDIMn's (c - i + 1)th subscript.
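The reversal of TDIMn's subscripts can be sketched in C as below. This is only an illustration under stated assumptions: c is taken to be the number of TDIMn subscripts, and the function name tdimToDervishDim and its signature are hypothetical, not part of the Dervish API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill dervishDim[0..c] from a FITS TDIMn list held in tdim[0..c-1].
 *
 * dervishDim[0] is the Table row count (the slowest varying index).
 * dervishDim[i], for 1 <= i <= c, is TDIMn's (c - i + 1)th subscript,
 * which sits at tdim[c - i] in 0-indexed C storage.
 *
 * Returns the Dervish dimension count, c + 1.
 * (Hypothetical helper; name and signature are assumptions.)
 */
static int tdimToDervishDim(int rowCnt, const int *tdim, int c, int *dervishDim)
{
    int i;

    assert(c >= 0 && dervishDim != NULL);
    dervishDim[0] = rowCnt;
    for (i = 1; i <= c; i++) {
        dervishDim[i] = tdim[c - i];
    }
    return c + 1;
}
```

For instance, a 10-row Table with a field whose TDIMn is '(4,3,2)' would, under this sketch, be described with the dimensions [10][2][3][4].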
The analogy to C arrays applies only to notation (other limitations with respect to the notation also apply). Since the TBLCOL format is generic, array dimensions are unknown until the file is read. Therefore, multidimensional arrays are stored in such a manner that regular C syntax can be used to access them without knowing their dimensions at compile time (thus, they are not stored the same as C arrays). The C concept of an array name representing a pointer to the array is used to accomplish this type of addressing. For n-dimensional arrays, n sets of pointers are used. The first pointer is associated with the array name. It points to the data in a 1-dimensional array or to the second set of pointers. These pointers in turn point to the data in a 2-dimensional array or to the third set of pointers; and so forth.
Besides information maintained by the ARRAY and TBLFLD structures, additional information can be retrieved from the Table's Dervish header. Dervish headers are not FITS headers, though they look remarkably similar. The similarity extends to some restrictions Dervish places on the format of Dervish headers (for example, the length of character string values is limited). These restrictions allow Dervish headers to be readily converted to FITS headers, without any information loss, when a Table is written out to a FITS file.
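The sets of pointers described above can be illustrated for the 3-dimensional case. The sketch below is not Dervish code: the Ptr3D structure and the ptr3dAlloc and ptr3dFree helpers are hypothetical, and plain malloc is used where Dervish would use its own allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the pieces of a 3-dimensional array. */
typedef struct {
    double   *data;   /* contiguous data, last subscript varies fastest */
    double  **lvl2;   /* second set of pointers, each into data         */
    double ***lvl1;   /* first set of pointers, each into lvl2          */
} Ptr3D;

/* Build the pointer sets for dimensions [d0][d1][d2]; 0 on success. */
static int ptr3dAlloc(Ptr3D *a, size_t d0, size_t d1, size_t d2)
{
    size_t i;

    assert(d0 > 0 && d1 > 0 && d2 > 0);
    a->data = calloc(d0 * d1 * d2, sizeof *a->data);
    a->lvl2 = malloc(d0 * d1 * sizeof *a->lvl2);
    a->lvl1 = malloc(d0 * sizeof *a->lvl1);
    if (a->data == NULL || a->lvl2 == NULL || a->lvl1 == NULL) {
        free(a->data);
        free(a->lvl2);
        free(a->lvl1);
        return -1;
    }
    for (i = 0; i < d0 * d1; i++) {
        a->lvl2[i] = a->data + i * d2;   /* points at one run of d2 data */
    }
    for (i = 0; i < d0; i++) {
        a->lvl1[i] = a->lvl2 + i * d1;   /* points at d1 second-level ptrs */
    }
    return 0;
}

static void ptr3dFree(Ptr3D *a)
{
    free(a->lvl1);
    free(a->lvl2);
    free(a->data);
}
```

After a successful call, a.lvl1 plays the role of the array name: a.lvl1[i][j][k] reaches an element with ordinary C subscripts even though the dimensions were only known at run time.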
The Dervish header is accessed through a Dervish handle. For example, in Tcl, if `h3' is the handle for a TBLCOL Table, the Dervish header entries can be accessed with
dervish> hdrGetLine h3.hdr AUTHOR
Information about the Table is maintained in several places, but never duplicated. Therefore, many FITS header keywords from the original FITS ASCII or Binary Table (if the Dervish Table was created by reading it from a FITS file) are not accessible from the Dervish header:
Information in the TBLCOL and TBLFLD structures that was originally filled from a FITS header will accurately represent the value from the FITS header keyword. But it will not retain any comment associated with the FITS header keyword. That information is lost.
The Dervish header fields listed above are a superset of those found in ASCII and Binary Tables. If the Table was read from a FITS file (extension HDU), the values for these fields came from the FITS header. These keywords, indexed by a field position (e.g., TTYPE5), are removed from the FITS header. Some of these keywords are recognized only for a particular Table type. For example, TDIMn is recognized in Binary Table headers only, while TBCOLn is recognized only in ASCII Table headers. Yet, these keywords are still removed from the FITS header for other Table types, without reading and saving their values in a TBLFLD handle. The resulting Dervish header has very little information categorizing it as coming from an ASCII or Binary Table, achieving the goal of a generic Table.
Unknown keywords (non-standard and not Dervish extension keywords) which are indexed by a field position (for example, TMINEn) are not removed from the Dervish header. This may cause problems if fields are inserted into or removed from the Table. In that case, the field positions (ordinals) might not match the user-defined keywords in the Dervish header. This does not happen with known keywords that have been removed from the Dervish header and are maintained in the TBLFLD handle, without any reference to a positional index (n).
Users may (but should not) add keywords that were removed from the header (for example, NAXIS). When a Table is written out to a FITS file, these "forbidden" keywords in the header are not used.
Additional object schemas have been defined to handle Dervish Table needs.
The object schema type LOGICAL describes a logical value, either zero (0) value to represent "false" or a non-zero value (which should be one (1)) to represent "true." LOGICAL is provided to differentiate between a byte and a boolean value. This is needed to support the Binary Table `L' data type (which takes on the values `T' and `F' rather than a binary number as a byte does).
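As a rough illustration of the mapping between the Binary Table `L' characters and a logical value, consider the sketch below. It assumes a logical value can be modelled as an unsigned char; the type name LOGICAL_SKETCH and both helper functions are hypothetical and are not the Dervish definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a logical value is modelled as an unsigned char here. */
typedef unsigned char LOGICAL_SKETCH;

/*
 * Convert a Binary Table `L' character to a logical value.
 * Returns 0 on success, -1 if the character is neither 'T' nor 'F'.
 */
static int logicalFromFits(char c, LOGICAL_SKETCH *out)
{
    assert(out != NULL);
    if (c == 'T') {
        *out = 1;
        return 0;
    }
    if (c == 'F') {
        *out = 0;
        return 0;
    }
    return -1;
}

/* Convert a logical value back; any non-zero value counts as "true". */
static char logicalToFits(LOGICAL_SKETCH v)
{
    return v ? 'T' : 'F';
}
```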
ARRAYs also use the object schema type STR in a manner that may not be intuitive at first (but it is self-consistent). STR's use is necessary to distinguish an array of bytes representing a character string from an array of numeric valued bytes.
Finally, the TBLHEAPDSC object schema type is used to implement heap storage (from Binary Tables).
Dervish Tables are headed by the TBLCOL object schema. TBLCOL provides access to the Dervish header associated with the Table and the Table fields.
The TBLCOL structure contains very little information. It does allow access to the Dervish header associated with the Table. It also heads the CHAIN of ARRAYs, one per Table field. The description is done from a C view, but its use can be easily extended to Tcl and object schemas. The C header file, shCTbl.h, contains the TBLCOL declaration.
+-----------+         Number of rows in Table. Must match each
|  rowCnt   |         ARRAY's first dimension, dim[0].
+-----------+------+
|                  |
/       hdr        /  Dervish header.
|                  |
+------------------+
|                  |  CHAIN head to fields (ARRAYs). This
|      -----------+   allows arrays to be added and removed
/  fld    type     /  easily. type is makeio's type for the
|      -----------+   ARRAY object schema (return from
|                  |  shTypeGetFromName ("ARRAY")).
+------------------+
For all Tables, information about a Table field's data is maintained in the ARRAY structure. Optional information about a Table field is maintained in the TBLFLD structure, accessible from both a C structure and a Tcl handle and object schema. Access to a TBLFLD structure for a TBLCOL format Table is dependent upon the language in use (C or Tcl):
Under Tcl:
The TBLFLD structure contains mostly information retrieved from the FITS header when the ASCII or Binary Table was read (if the Table was created, the creator needs to fill in this information). The description is done from a C view, but its use can be easily extended to Tcl and object schemas. The C header file, shCTbl.h, contains the TBLFLD declaration.
+-----------+            Bitmask indicating Txxx keyword (TBLFLD
|   Tpres   |            member) presence (a shCUtils.h set).
+--------+--//--+--+--+
| TTYPE  | ...  |  |  |  Field label.
+--+--+--+--//--+--+--+
| TUNIT  | ...  |  |  |  Physical units of field.
+--+--+--+--//--+--+--+
| TDISP  | ...  |  |  |  Suggested FORTRAN 90 format to display
+--+--+--+--//--+--+--+  field.
|   TSCAL   |            Scale factor (not applied to data).
+-----------+
|   TZERO   |            Zero offset (not applied to data).
+-----------+
| TNULLSTR  |            Undefined value for integers (not applied
+-----------+            to data) as a character string.
| TNULLINT  |            Undefined value for integers (not applied
+-----------+            to data) as a signed 32-bit integer.
|   array  o+-->         Permits access to data through ARRAY.
+-----------+------+
|                  |     Description of data stored in heap, along
/       heap       /     with ARRAY's TBLHEAPDSC data. Tpres
|                  |     has SH_TBL_HEAP set to indicate heap use.
+------------------+
|                  |     Private data used when reading/writing
/       prvt       /     FITS files. None of the fields in prvt
|                  |     should be used/relied upon by the user.
+------------------+
TTYPE, TUNIT, and TDISP have preallocated space of shFitsHdrStrSize characters, plus one additional character for null termination of the string. Thus, the values of these fields can (must) be changed in situ.
The TBLFLD structure contains information that is the union of ASCII Table and Binary Table information. When writing the Table out to a FITS file, only the information appropriate to the HDU extension type (TABLE or BINTABLE) will be written to the FITS header. Also, when writing the Table, TSCALn/TZEROn are treated specially if it is necessary to generate a FITS-compliant HDU.
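Because TSCAL and TZERO are stored but not applied to the data, an application that wants physical values has to apply them itself. The sketch below uses the usual FITS convention (physical value = TZERO + TSCAL * stored value); the function name tblPhysicalValue is hypothetical and is not a Dervish routine.

```c
#include <assert.h>

/*
 * Apply a field's scale factor and zero offset to one stored value,
 * following the FITS convention: physical = TZERO + TSCAL * stored.
 * (Hypothetical helper; Dervish itself leaves the data unscaled.)
 */
static double tblPhysicalValue(double stored, double tscal, double tzero)
{
    return tzero + tscal * stored;
}
```

With TSCAL of 1 and TZERO of 32768, for example, a stored signed 16-bit value of -32768 corresponds to a physical value of 0, which is the common way of representing unsigned 16-bit integers in FITS.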
Data is accessed through TBLFLD's array field. ARRAY object schemas describe arbitrary n-dimensional arrays and allow access to them.
Consider a Dervish Table specific example, in C, involving arrays. The example reflects Dervish Table's use of the slowest varying (first) index as the Table row index. myField is the name of the "array" indexed by the Table row number. array points to the ARRAY structure of the field we're interested in and tblFld points to the associated TBLFLD structure:
array->arrayPtr should be used to initialize myField in all cases. array->data.dataPtr, the pointer to the first byte of data in the field, should not be used for reference through myField. But, array->data.dataPtr can be used when filling in array data in a linear fashion.
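The difference between the two pointers can be shown with a small stand-in structure. ARRAY_SKETCH below is a deliberately reduced, hypothetical model that borrows only the member names mentioned in the text (arrayPtr, data.dataPtr, dimCnt, dim); it is not the real ARRAY declaration, and the three helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for an ARRAY describing int data. */
typedef struct {
    void *arrayPtr;                   /* top of the pointer hierarchy */
    struct { void *dataPtr; } data;   /* first byte of the field data */
    int   dimCnt;
    int   dim[2];                     /* dim[0] is the Table row count */
} ARRAY_SKETCH;

/* Wire up a rows x cols field using caller-supplied storage. */
static void sketchBuild(ARRAY_SKETCH *a, int **rowPtrs, int *storage,
                        int rows, int cols)
{
    int r;

    assert(rows > 0 && cols > 0);
    for (r = 0; r < rows; r++) {
        rowPtrs[r] = storage + r * cols;
    }
    a->arrayPtr     = rowPtrs;
    a->data.dataPtr = storage;
    a->dimCnt       = 2;
    a->dim[0]       = rows;
    a->dim[1]       = cols;
}

/* Linear fill: walking data.dataPtr directly is acceptable here. */
static void sketchFillLinear(ARRAY_SKETCH *a)
{
    int *p = (int *)a->data.dataPtr;
    int  n = a->dim[0] * a->dim[1];
    int  i;

    for (i = 0; i < n; i++) {
        p[i] = i;
    }
}

/* Subscripted access: myField is initialized from arrayPtr. */
static int sketchGet(const ARRAY_SKETCH *a, int row, int col)
{
    int **myField = (int **)a->arrayPtr;

    return myField[row][col];
}
```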
Heap storage is a concept from FITS Binary Tables. It allows variable length array data to be stored efficiently with respect to space. The ARRAY's data consists of TBLHEAPDSC object schemas which describe the variable length data. The actual data is owned by the TBLFLD. The general structure of a TBLCOL Table now expands a bit for fields with heap data:
The TBLHEAPDSC data type is used to describe the length, in datum units, and position of heap data. The ARRAY structure will have a TBLFLD hung off its info member. Dimension information maintained by the ARRAY (dimCnt and dim) reflects the array dimensions of TBLHEAPDSCs (owned by the ARRAY), not the heap data. The ARRAY's hierarchy of array pointers also applies only to the TBLHEAPDSCs.
+------------------+
|       cnt        |  Number of elements in heap datum.
|      -----------+
|       ptr        |  Address of first element of heap datum.
+------------------+
NOTE: There may be some controversy as to what the FITS keyword TDIMn really means for variable length data (it should be noted that the FITS Binary Table Standard lists TDIMn as a convention for multidimensional arrays, not as part of the Standard). Under Dervish, the FITS TDIMn keyword is not supported with a Binary Table TFORMn data type of `P'.
TBLHEAPDSC data in the ARRAY only describes the amount and location of the data. It does not provide any information about the data type. As data in heap is owned by the TBLFLD structure, TBLFLD's heap member describes that data. That description is the same as the ARRAY's data member.
+------------------+ Points to 1st heap data element (all are
| dataPtr o+--> contiguous). Heap must be shMalloced.
| -----------+
| schemaType | makeio's type for the heap.
| -----------+
| heap size | Size (bytes) of object schema in heap.
| -----------+
| align | Alignment factor of 1st object schema.
| -----------+
| incr | Address increment between object schemas.
+------------------+
If available, heap.schemaType describes the object schema type of the data stored in heap (if it's not available, heap.schemaType is set to UNKNOWN).
Heap data types are handled in the same manner as the data types in the ARRAY data area. For example, character strings (STR object schema type) must have an additional character used to guarantee null-termination of character strings.
Heap data types are not restricted to primitive data types. But, if the data type does not match a Binary Table data type that is mapped to an object schema, heap data will be written as an unknown type to a FITS file HDU.
The location of heap data is maintained in two places:
The TBLHEAPDSC ptr field should be used when referencing heap data.
heap.dataPtr can be used to point to a heap data area which "belongs" to the TBLFLD. This structure member provides information used for memory handling (namely deallocation). heap.dataPtr can be set to point to a zero (0) address (null) while ARRAY TBLHEAPDSCs still point off to heap data.
Consider an example of 3 rows of TBLHEAPDSC data in the ARRAY. The first row points to heap data outside of the heap area owned by the Table field's TBLFLD. The remaining 2 rows point to data contained within the TBLFLD's heap data area.
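The three-row case can be sketched in C as follows. HEAPDSC_SKETCH is a hypothetical stand-in that carries only the cnt and ptr members of a TBLHEAPDSC, the heap elements are assumed to be shorts, and both helper functions are assumptions for illustration, not Dervish routines. The ownership test compares addresses as integers, which is a pragmatic check rather than something the C standard guarantees for unrelated objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TBLHEAPDSC. */
typedef struct {
    int   cnt;   /* number of elements in the heap datum     */
    void *ptr;   /* address of the first element of the datum */
} HEAPDSC_SKETCH;

/*
 * Return non-zero if the datum described by dsc lies entirely inside
 * the heap area starting at heapStart and spanning heapBytes bytes.
 */
static int heapOwnsDatum(const void *heapStart, size_t heapBytes,
                         const HEAPDSC_SKETCH *dsc, size_t elemSize)
{
    uintptr_t lo, hi, p;

    if (heapStart == NULL || dsc->ptr == NULL) {
        return 0;
    }
    lo = (uintptr_t)heapStart;
    hi = lo + heapBytes;
    p  = (uintptr_t)dsc->ptr;
    return p >= lo && p + (size_t)dsc->cnt * elemSize <= hi;
}

/* Sum one row's variable length data, always going through ptr. */
static long heapRowSum(const HEAPDSC_SKETCH *dsc)
{
    const short *elem = (const short *)dsc->ptr;
    long         sum  = 0;
    int          i;

    for (i = 0; i < dsc->cnt; i++) {
        sum += elem[i];
    }
    return sum;
}
```

A row whose datum fails the ownership test is one whose memory would have to be released separately from the field's own heap area.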
For TBLFLD heap information to be considered valid, SH_TBL_HEAP must be set in TBLFLD's Tpres. SH_TBL_HEAP does not necessarily indicate that the TBLFLD owns the heap data, rather it indicates that TBLFLD's heap member contains valid information. If SH_TBL_HEAP is not set and the ARRAY data type is TBLHEAPDSC, the heap data should be considered of unknown type.
Heap data not owned by the TBLFLD should be freed prior to deleting the field or Table (shTblfldDel, shTblcolDel, or tblColDel); otherwise, that memory may become "lost" (a memory leak).
Heap data is accessed through the TBLFLD's array field, just as main ARRAY data is. But, the ARRAY describes a 1-dimensional array of heap descriptors, TBLHEAPDSC (see the NOTE above about why TBLHEAPDSCs are limited to 1 dimension, namely the row index). The use of the ARRAY indices will only reference the TBLHEAPDSC.
Since not all applications will use a TBLCOL Table that was read in from a FITS file, it's possible to create one from scratch and fill in the data. The steps required are shown as an example in C. Much of the work is done in a table-driven fashion. Also, the label rtn_return is a convenient place to jump to when something fails; it avoids nesting tests so deeply that the code becomes unreadable.
Since shTblColCreate will fill in the dimension count (array->dimCnt) and set the first dimension to the number of rows in the Table (ROWCNT), the dimension count for each field is bumped up rather than set explicitly.
The loop setting dimensions for a particular field counts down. This is done to improve performance, since the iteration comparison is done against a constant rather than an array reference. But, the setting of array->dim uses the more expensive means of referencing array->dimCnt each time; this protects against any changes to shTblColCreate that might fill in more than just the first array dimension.
___________________________________________________________
shTblFldHeapAlloc will allocate a TBLFLD structure if the ARRAY does not have one (which it shouldn't, as shTblColCreate was instructed not to allocate TBLFLDs).
___________________________________________________________
shTblTBLFLDsetWithAscii will allocate a TBLFLD if necessary. The above example decided that truncation of the TBLFLD TTYPE member value (indicated by a SH_TRUNC return value) should not abort the Table creation.
___________________________________________________________
Character string data (array->data.schemaType of STR) is guaranteed space for a null terminator. That last character will be lost if the Table is written out to a FITS HDU.
NOTE: For the sake of simplicity, an unsanctioned way of creating a handle is shown below. The sub-example is self-contained; the header file includes and variable declarations are out of place with respect to the rest of the example. The sub-example is also incomplete.
rtn_return
appears).
There are very few limitations placed on Tables when used in the Dervish environment. But, if a Table is to be written out as a FITS ASCII or Binary Table Header and Data Unit (HDU), some restrictions need to be adhered to.
nStar must be zero.
NOTE: There may be some controversy as to what the FITS keyword TDIMn really means for variable length data (it should be noted that the FITS Binary Table Standard lists TDIMn as a convention for multidimensional arrays, not as part of the Standard). Under Dervish, the FITS TDIMn keyword is not supported with a Binary Table TFORMn data type of `P'.
cnt) must be a non-negative value (0 or greater).