TBLCOL/Schema Conversion and Its Tcl Interface
TBLCOL/Schema conversion is done through a translation table; its C
implementation is documented separately. This
documentation describes the Tcl routines for the following purposes:
Building Translation Tables
Translation Table Syntax
TBLCOL/Schema Conversion
Examples
A high-level description of translation tables and object conversion
is available separately.
There are two ways to create or edit a translation table. The first,
which reads an ascii file and creates the entries from it, is much faster and
thus preferred. The second, calling the entry-addition routines directly,
is better suited to tables with only a few entries and to small-scale
in-memory changes.
Both are described in this document: first creating a table from
ascii files, then creating a table directly in Tcl.
Two Tcl verbs are provided to create a table from an ascii file and to write
a table out to an ascii file. The ascii format is described later in
this document.
schemaTransWriteToFile
schemaTransEntryAddFromFile
schemaTransWriteToFile
Purpose: writes all entries in the table out to an ascii file
Arguments: tableHandle fileName
Return: none
Example: dervish> schemaTransWriteToFile h1 junkfile
dervish>
schemaTransEntryAddFromFile
Purpose: reads the entries in an ascii file into a translation table
Arguments: tableHandle fileName
Return: none
Example: dervish> schemaTransNew
h1
dervish> schemaTransEntryAddFromFile h1 junkfile
dervish>
The following Tcl routines are for use with translation tables.
schemaTransNew
schemaTransDel
schemaTransEntryAdd
schemaTransEntryDel
schemaTransEntryClearAll
schemaTransEntryImport
schemaTransEntryShow
schemaTransNew
Purpose: allocates a table of MAX_SCHEMATRANS_SIZE entries
Arguments: none
Return : a handle, if successful.
Example: dervish> schemaTransNew
h1
dervish>
schemaTransDel
Purpose: deletes a table and frees all memory associated with its entries
Arguments: tbl_handle
Return : none.
Example: dervish> schemaTransDel h1
dervish>
schemaTransEntryAdd
Purpose: adds an entry to an existing table
Arguments: tblHandle convType fitsName fldName data_type -proc -dimen -ratio -position
Return : none if successful.
Example: dervish> schemaTransEntryAdd h1 name GSC_ID id int
Entry added
dervish> schemaTransEntryAdd h1 name NROW mask struct -proc maskNew
Entry added
dervish> schemaTransEntryAdd h1 cont mask nrow int
dervish>
Here tblHandle is a handle to an existing translation table, and convType
specifies how the fields that follow are interpreted. If convType is "name",
the next two fields are taken as the field name in the FITS file and the
field name in the schema, respectively. If it is "ignore", the field named by
fldName in the given schema is ignored (fitsName must still be specified,
although it is not used). This allows maximum flexibility, especially when FITS files
don't have all the fields one wants. Lastly, convType can also
be "cont", indicating that the line is a continuation of the
previous line. See the translation table syntax
for more explanation.
Briefly, the following values are allowed for convType:
name
cont or continue
ignore
For data_type, the following are allowed:
char, or unsigned char
short, or unsigned short
int, or unsigned int
long, or unsigned long
float
double
enum
string
struct
In addition, schemaTransEntryAdd can deal with heap, a variable-length storage
available in FITS, and can make enum types portable.
See Portable Enum and Heap
Capabilities for details.
The optional parameters are as follows. -proc provides a custom Tcl constructor
which, if specified, is used to construct the objects. -dimen gives the
dimension information: with "5x10" (i.e., two dimensions of size 5 and 10),
the constructor is called 50 times. If the corresponding field is of an
elementary type (which needs no constructor), memory of this size
is allocated instead. Lastly, -position optionally specifies where to add
the entry: -position n adds it at the n-th line.
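The arithmetic behind a -dimen string can be sketched in plain Tcl. The proc below is a hypothetical helper written for illustration only; it is not a dervish verb and does not reflect how dervish parses the option internally. It splits a string such as "5x10" on "x" and multiplies the sizes, giving the number of constructor calls (or elements) described above.

```tcl
# Hypothetical helper (not part of dervish): split a -dimen string such as
# "5x10" into its sizes and return the total number of elements.
proc dimenCount {dimen} {
    set total 1
    foreach size [split $dimen x] {
        set size [string trim $size]
        if {![string is integer -strict $size] || $size <= 0} {
            error "bad dimension \"$size\" in \"$dimen\""
        }
        set total [expr {$total * $size}]
    }
    return $total
}

puts [dimenCount 5x10]   ;# 50
puts [dimenCount 3]      ;# 3
```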
schemaTransEntryDel
Purpose: deletes a table entry and frees all memory associated with the entry
Arguments: tbl_handle entry_number -only
Return : message.
Example: dervish> schemaTransEntryDel h1 1 -only
dervish>
The routine normally will delete the given entry and its associated
continuation lines. But if -only is given, only that entry is deleted.
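The difference between the default behaviour and -only can be modelled on an ordinary Tcl list. This is only a sketch: the proc name, the list-of-lists representation, and the 0-based index are assumptions made for illustration, and the entry numbering used by dervish itself may differ.

```tcl
# Hypothetical model (not a dervish verb): each entry is a list whose first
# element is the conversion type. Index n is a 0-based list index here.
proc entryDel {entries n {only 0}} {
    set last $n
    if {!$only} {
        # Extend the range over the continuation lines that follow entry n.
        while {$last + 1 < [llength $entries] &&
               [lindex $entries [expr {$last + 1}] 0] in {cont continue}} {
            incr last
        }
    }
    return [lreplace $entries $n $last]
}

set entries {
    {name A a int}
    {name B mask struct}
    {cont mask nrow int}
    {name C c int}
}
puts [entryDel $entries 1]     ;# main line and its continuation removed
puts [entryDel $entries 1 1]   ;# only the main line removed
```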
schemaTransEntryClearAll
Purpose: empties the table by clearing all the entries
Arguments: tbl_handle
Return : none
Example: dervish> schemaTransEntryClearAll h1
dervish>
schemaTransEntryImport
Purpose: imports selected entries from another table into a table
Arguments: src_tblHandle dst_tableHandle -from -to -at
Return : none
Example: dervish> schemaTransEntryImport h1 h2
dervish> schemaTransEntryImport h1 h3 -from 1 -to 3 -at 1
where -from/-to designate the beginning and end of the range to copy from the source table,
and -at chooses where to insert in the destination table (the default is at the end).
The first example appends the entire source table to the destination table.
The second example imports entries 1 to 3 (inclusive) into h3 at entry number 1.
schemaTransEntryShow
Purpose: prints a translation table on the screen
Arguments: tblHandle
Return : none
Before proceeding, I assume you're already familiar with
translation table syntax.
The ascii format of a translation table is a list of entries in
plain ascii. Each entry is expected to have at least 4 ascii
strings (separated by spaces) giving the conversion type, the FITS-side
name, the object-side name, and the object data type, in that order.
Optional parameters appear after the 4 basic units, e.g., -proc = regNew.
An entry must end with a semicolon ';'. In short, an ascii entry resembles
the command line of schemaTransEntryAdd.
Comments are allowed in the ascii file. A comment starts with a
pound sign '#' and ends at the end of the line.
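As a rough illustration of these rules, the plain-Tcl sketch below strips comments, splits the text at semicolons, and separates the 4 mandatory fields from the trailing options. The proc name and the dict keys are invented for this example, and the parsing is deliberately simplified (a '#' or ';' inside an option value would confuse it); it is not the parser used by schemaTransEntryAddFromFile.

```tcl
# Hypothetical, simplified reader for the ascii format (not the dervish parser).
proc parseTransTable {text} {
    # Drop everything from a pound sign to the end of each line.
    set lines {}
    foreach line [split $text \n] {
        set hash [string first "#" $line]
        if {$hash >= 0} {
            set line [string range $line 0 [expr {$hash - 1}]]
        }
        lappend lines $line
    }
    # Entries are terminated by ';' and may span several lines.
    set entries {}
    foreach chunk [split [join $lines " "] ";"] {
        set chunk [string trim $chunk]
        if {$chunk eq ""} continue
        if {![regexp {^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*(.*)$} $chunk -> \
                conv fits obj type opts]} {
            error "entry needs at least 4 fields: $chunk"
        }
        lappend entries [dict create convType $conv fitsName $fits \
            fldName $obj dataType $type options $opts]
    }
    return $entries
}

set sample {
# sample table
name OBJID id int ;
name ID[0] color[0] struct -dimen=3 ; # option parameters seen
cont COLOR[0] id int ;
}
foreach e [parseTransTable $sample] {
    puts "[dict get $e convType]: [dict get $e fitsName] -> [dict get $e fldName]"
}
```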
Here are some example ascii files.
#
# following entries are machine-generated. Editing is welcome.
#
# All comments start with pound sign.
#
# Entries are in following order:
# ConversionType FitsSideName ObjSideName ObjDataType others ;
#
# An entry always ends with a semicolon ';'
name OBJID id int ;
name NCOLOR ncolor int ;
name ID[0] color[0] struct -dimen=3 ; # option parameters seen
cont COLOR[0] id int ;
name REGPIX0 color[0] struct -dimen=3 ;
cont COLOR[0] region struct -proc="{regNew} {-mask}" ;
cont REGION rows_s16 heap -dimen=color<0>->region->nrow -heaptype=SHORT
-heaplength=color<0>->region->ncol ;
name NSPIX1 color[1] struct -dimen=3 ;
cont COLOR[1] noise struct -proc={regNew} ;
cont NOISE rows_u16 heap -dimen=color<1>->noise->nrow
-heaptype=SHORT -heaplength=color<1>->noise->ncol ;
#
# end of table
#
Single-line entries in a translation table are self-explanatory. They
match a specific FITS field name to a field in a schema, and the conversion
routines fill one field with the values found in the other. Note that, when
adding entries, the 1st field is always the FITS side and the 2nd is always
the schema side. This way, a single translation table can be used for
conversion in either direction.
The following example shows how to add an entry for converting the field called "id"
in a schema to the FITS field "MY_ID".
dervish> schemaTransNew
h1
dervish> schemaTransEntryAdd h1 name MY_ID id int
Entry added
dervish>
Note that capitalizing MY_ID is optional, as the routine converts all FITS field names
to upper case.
Single-line entries are very useful when the fields of interest are all
elementary fields. That is not always the case, however. If a user-defined
structure is present in a schema and the user wants to convert it, a multiple-line
entry is used. A multiple-line entry has a single main line followed by one or
more continuation lines.
Let's see an example. Suppose we want to match a FITS
field called GSC_ID to nrow, i.e., region.mask.nrow (or even region->mask->nrow),
where region is a REGION and mask is a MASK. The translation table looks like:
ConvType  FITS-side  SCHEMA-side  FieldType  ratio  constructor  size
name      GSC_ID     mask         struct     1.000  none         none
cont      mask       nrow         int        1.000  none         none
This is achieved by:
dervish> schemaTransEntryAdd $xtbl name GSC_ID mask struct
dervish> schemaTransEntryAdd $xtbl cont mask nrow int
For each add, we may specify options such as -proc, -dimen and -ratio to
refine the conversion at that level. In this case, mask has a Tcl
constructor called maskNew, so we write:
dervish> schemaTransEntryAdd $xtbl name GSC_ID mask struct -proc maskNew
dervish> schemaTransEntryAdd $xtbl cont mask nrow int
The table now looks like
ConvType  FITS-side  SCHEMA-side  FieldType  ratio  constructor  size
name      GSC_ID     mask         struct     1.000  maskNew      none
cont      mask       nrow         int        1.000  none         none
So each time a mask is found, maskNew is used to construct that object.
(If one doesn't give a constructor, the conversion routines will try
other ways. See Object State Initialization
for more details.)
Array specifications are allowed in a translation
table. Before explaining them, we first go through some basics.
A schema field can be a combination of the following three types:
C elementary types (e.g., char, int) or custom-defined types (e.g., struct)
Pointer type with a number of indirections (stars), e.g., char ***s;
Array, e.g., int i[10][20];
Although C generally allows users to mix pointers with arrays, these two
types have important differences. Array memory is static and contiguous,
whereas pointers
require initialization and/or run-time memory allocation, and may point to
discrete memory blocks. Loosely speaking, the former needs no
dereferencing (indirect addressing), while the latter does.
For a field that is already an array (static, contiguous memory), say, int
c[5], the translation table allows specifications like c[3] to get the
4th element of the array. In other words, you can do
dervish> schemaTransEntryAdd $xtbl name GSC_ID {c\[3\]} int
where the escape \ is used. The translation table now looks like:
ConvType  FITS-side  SCHEMA-side  FieldType  ratio  constructor  size
name      GSC_ID     c[3]         int        1.000  none         none
Therefore the value of GSC_ID is copied to c[3], but not to the other
elements of the array.
Multidimensional arrays have a similar syntax. Say c is now defined as c[5][10].
You may then specify c[3][2] in the table to access the 3rd element of the 4th array,
whose value will be set to that of GSC_ID.
You may also specify just c[3] for the above multi-D array.
In this case, c[3] refers to the row starting at c[3][0], and GSC_ID has to be a 1-D array of
size 10, because c[3] is an array of size 10 whose first element is at &c[3][0].
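Because C arrays are stored contiguously in row-major order, the element an array specifier refers to can be expressed as a flat offset from the start of the array. The plain-Tcl proc below is a hypothetical illustration of that arithmetic (it is not a dervish verb); missing trailing indices are treated as 0, matching the reading of c[3] as the row that starts at c[3][0].

```tcl
# Hypothetical helper: row-major offset of an element in a C array.
# dims    - declared sizes, e.g. {5 10} for c[5][10]
# indices - indices given in the table; missing trailing ones count as 0
proc flatOffset {dims indices} {
    set offset 0
    foreach dim $dims idx $indices {
        if {$idx eq ""} { set idx 0 }
        set offset [expr {$offset * $dim + $idx}]
    }
    return $offset
}

puts [flatOffset {5 10} {3 2}]   ;# 32: c[3][2]
puts [flatOffset {5 10} {3}]     ;# 30: start of row c[3], i.e. &c[3][0]
```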
A field might also be just a pointer to pointer to ... pointer, say,
REGION ****region;
int ****i;
The first declaration could mean either a 3-D array of REGION pointers or
a 4-D array of REGIONs. In practice, however, no such ambiguity
exists because, in the single-star case, REGION *region
indicates one object and not a 1-D array of REGION objects. This is
so partly because region (of type REGION*) points to only ONE object
at a time, and objects like REGION are meant to have their own
memory blocks, which may not be contiguous.
When elementary types are associated with pointers (i in the above example),
the concept of an object is much weaker. Such fields have no
constructors and their memory can be (and in fact is) contiguous. So,
in the above case, i can be treated as a 4-D array.
We therefore adopt the following conventions:
Pointer to pointer ... to pointer to a complex type is not meant to
be an array of that many dimensions. Instead, it is an array of pointers to
the complex type, with a dimension equal to the number of indirections minus 1.
Pointer to pointer ... to pointer to an elementary type is meant to
be an array of that many dimensions. It points to an array of the elementary
type whose dimension equals the number of
indirections.
Briefly, a pointer to a complex type is NOT an array but an indirection to
a single object, whereas a pointer to an elementary type is an array.
When indirection is present, the dimension/size specification is given as
one single string, i.e., the option -dimen of schemaTransEntryAdd,
with x separating the dimensions. For example, -dimen
5x3 means a 2-D array of size 5 and 3 in each dimension, just like C's [5][3],
and indicates that the field has 3 indirections if it is a complex type, and
2 indirections if it is an elementary type.
When giving dimension information, one must fully resolve the indirections
for elementary types and leave one indirection for complex types.
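The rule relating -dimen to the number of indirections can be written down as a small check. The two procs below are hypothetical, plain-Tcl helpers for illustration (dervish performs its own validation, whose details are not shown here): an elementary type needs as many -dimen components as it has stars, a complex type one fewer.

```tcl
# Hypothetical helpers (not dervish verbs).
# Number of -dimen components a field needs, given its number of stars.
proc expectedDimenCount {indirections isElementary} {
    if {$isElementary} { return $indirections }
    return [expr {$indirections - 1}]
}

# 1 if the -dimen string has the right number of "x"-separated components.
proc checkDimen {dimen indirections isElementary} {
    set given [llength [split $dimen x]]
    return [expr {$given == [expectedDimenCount $indirections $isElementary]}]
}

puts [checkDimen 5x3 3 0]   ;# 1: complex type with 3 stars
puts [checkDimen 5x3 2 1]   ;# 1: elementary type with 2 stars
puts [checkDimen 5x3 2 0]   ;# 0: complex type with 2 stars needs one size only
```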
Array access, when indirection is present, is the same as
for ordinary arrays: the dimensions added by -dimen are treated
as a regular part of the array. For example, MyType ****mytype[3][4]
is just like a 6-D array when MyType is an elementary type, but is a 5-D
array of pointers to MyType when it is a complex type.
Array specification is also allowed for FITS-side fields and
follows a syntax similar to that for schema-side fields.
dervish> schemaTransEntryAdd h1 name {NROW\[0\]} {subRegs\[0\]} struct -proc regNew -dimen 3
dervish> schemaTransEntryAdd h1 cont {subRegs\[0\]} nrow int
dervish> schemaTransEntryAdd h1 name {NROW\[1\]} {subRegs\[1\]} struct
dervish> schemaTransEntryAdd h1 cont {subRegs\[1\]} nrow int
dervish> schemaTransEntryAdd h1 name {NROW\[2\]} {subRegs\[2\]} struct
dervish> schemaTransEntryAdd h1 cont {subRegs\[2\]} nrow int
In the above example, the field "subRegs" is of type REGION**. A 1-D array of size 3
is specified, with the Tcl verb "regNew" used to construct all 3 objects. Each object's
"nrow" will be copied to NROW, which is a 1-D array of size 3.
Angle brackets (< and >) are also legitimate array specifiers.
Portable Enumeration
Enumerated types are integral types. While they are distinct from integers,
C allows conversion (or casting) between an enum value and an integer. When written to
(or read from) a FITS file, however, an enum value becomes meaningless because it loses
its context. To avoid this, the code was written such that, when writing
a FITS file, a composite ascii string that includes both the type and the enum member
associated with the value is written out; when reading a FITS file, both the type
and the value string are read in, and the integer value associated with this
string in this type is assigned to the field. The context
is therefore maintained throughout the FITS I/O process.
dervish> schemaTransEntryAdd h1 name MY_ENUM type enum
Note that one can still use "int" instead of enum.
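The round trip described above can be mimicked in plain Tcl. Everything in this sketch is an assumption made for illustration: the enum type COLOR and its members are invented, and the "TYPE.MEMBER" layout is only a stand-in, since the exact composite string written by dervish is not specified here.

```tcl
# Hypothetical enum table: type name -> {member value ...}
set enumDefs {
    COLOR {RED 0 GREEN 1 BLUE 2}
}

# Writing: turn an integer value into a string that carries its context.
proc enumToString {defs type value} {
    dict for {member v} [dict get $defs $type] {
        if {$v == $value} { return "$type.$member" }
    }
    error "no member of $type has value $value"
}

# Reading: look the member name up again in the named type.
proc enumFromString {defs text} {
    lassign [split $text .] type member
    return [dict get $defs $type $member]
}

set written [enumToString $enumDefs COLOR 2]
puts $written                                ;# COLOR.BLUE
puts [enumFromString $enumDefs $written]     ;# 2
```

Because the member name rather than the raw integer is stored, the value read back stays correct even if the numeric values of the enum differ between the writer and the reader.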
Heap Capability
Heap is a variable-length storage available in FITS. If a particular field has
a variable size, use heap. Specifying heap is not much
different from specifying other types. In place of "int", "enum", "struct",
or whatever else one may use as the 4th parameter, one simply says "heap".
In addition, heap requires at least one more parameter, "-heaptype",
and, if you are converting schema objects to FITS (TBLCOL), "-heaplength".
Heaptype is the heap base type. It is usually one of the C elementary types,
but one can use customized structures. Heaplength is an expression whose
result gives the length of the heap, i.e., the number of base blocks. When
reading a FITS file, this information is not needed because the length
is stored in the FITS file.
dervish> schemaTransEntryAdd h1 name HP test heap -heaptype FLOAT
This example indicates that "test" is a float heap. Data from HP in the FITS file is
copied to test.
When writing FITS, heaplength is required:
dervish> schemaTransEntryAdd h1 name HP contents heap -heaptype FLOAT \
-heaplength ncol
This example shows how a heap in a REGION is handled. Contents is a heap
of type FLOAT. The heap length is given by the value of ncol in the same object.
If heaplength is not a numerical string (such as 10, for when one wants
a fixed length for all objects), it should be a field specified with respect to
the object. In this example, ncol is a field in REGION. If the desired length is
not given directly in REGION but in an object pointed to by some field in REGION,
one should use normal C syntax, e.g., mask->row0. The expression is
evaluated for each object converted, and hence the length may vary between
objects.
There are cases where a heap field is multi-dimensional. The "rows_u16" field
in a REGION, for example, has two indirections (i.e., two stars) and
hence acts like a 2-D array. A "-dimen" option has to be specified, just
as for any other multi-indirection field. Like "-heaplength", this
"-dimen" parameter also takes expressions. Evaluating the
expressions yields the dimension information, which may vary from object
to object.
dervish> schemaTransEntryAdd h1 name HP rows_u16 heap -heaptype FLOAT \
-heaplength ncol -dimen nrow
The result is that rows_u16 will be a field with a size of nrow x ncol
x sizeof(float) bytes.
Multiplication is allowed when specifying -dimen or -heaplength. For example,
one may use -heaplength 2 x ncol, or -dimen nrow x ncol, etc., where "x"
is interpreted as the multiplication operation. For -heaplength, only the
result of this product is significant. For -dimen, however, "x" also conveys
dimension information, which is matched against the number of
indirections of the field minus 1. For example, "rows_u16" has two
indirections, and therefore "-dimen" should consist of a single number only;
no "x" is allowed (the fastest-changing dimension is always given by
-heaplength). If "rows_u16" were "float ***", then "-dimen" would require
one "x" to be present, making it something like "nrow x ncol".
In short, for heap types, the total number of blocks is given by the product
of dimen and heaplength, with each block having sizeof(heaptype) bytes.
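The size rule can be checked with a few lines of plain Tcl. The proc below is a hypothetical helper, not a dervish verb; it assumes the -dimen and -heaplength expressions have already been evaluated to numbers for one object, and that the caller supplies the size of the base type in bytes (4 is used for float in the example, which holds on common platforms but is an assumption).

```tcl
# Hypothetical helper: bytes occupied by one object's heap field.
# dimen        - evaluated -dimen, e.g. "4" or "4x3"
# heaplength   - evaluated -heaplength (fastest-changing dimension)
# heaptypeSize - sizeof(heaptype) in bytes, supplied by the caller
proc heapBytes {dimen heaplength heaptypeSize} {
    set blocks $heaplength
    foreach size [split $dimen x] {
        set blocks [expr {$blocks * [string trim $size]}]
    }
    return [expr {$blocks * $heaptypeSize}]
}

# nrow = 4, ncol = 6, 4-byte base type: 4 x 6 x 4 bytes
puts [heapBytes 4 6 4]     ;# 96
```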
Both portable enum and heap can be used in multi-line table entries.
Two routines are provided, each for one direction.
tblToSchema
This routine converts TBLCOL format to one instance or
an array of instances. Parameters/options, in order, are:
TBLCOL handle   a handle of the TBLCOL that contains the FITS file
handle          a schema handle or a container (ARRAY, LIST) handle
schemaTrans     a translation table handle
-proc           Tcl constructor to use when a container is given.
                When NULL (default), the schema constructor and
                malloc are tried, in that order.
-schemaName     specifies the schema name when a container is used.
-row            row number in FITS at which to start when only a subset
                of instances is needed. Default is 0.
-stopRow        row number before which conversion stops. If
                specified, a total of stopRow - row rows
                (beginning at row) are converted to objects.
                Default value is the ending row number.
-objectReuse    if set, objects that are already in the given
                container are used, and the conversion routine
                does not create any new objects. This is very
                useful when data from multiple FITS files are
                needed for the objects. However, when a field
                is found uninitialized, the routine still tries
                hard to initialize it. Default is FALSE.
-handleRetain   if set, the handles associated with objects
                created with Tcl commands are retained.
                Default value is FALSE.
On success, it returns the container/schema handle given on the command
line.
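The half-open row range selected by -row and -stopRow can be illustrated in plain Tcl. The proc below is a hypothetical helper for illustration only, taking both values explicitly; it lists the rows that would be converted, starting at row and stopping before stopRow.

```tcl
# Hypothetical helper (not a dervish verb): rows converted for a given
# -row and -stopRow, i.e. row, row+1, ..., stopRow-1.
proc rowsToConvert {row stopRow} {
    set rows {}
    for {set r $row} {$r < $stopRow} {incr r} {
        lappend rows $r
    }
    return $rows
}

puts [rowsToConvert 2 5]             ;# 2 3 4
puts [llength [rowsToConvert 2 5]]   ;# 3, i.e. stopRow - row
```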
Example:
dervish> set array [handleNewFromType ARRAY]
dervish> tblToSchema $tblcol $array $table -schemaName REGION -proc regNew
In this case, array will be filled with REGION objects constructed by regNew
and filled with the values from $tblcol.
schemaToTbl
This verb converts one or more instances to TBLCOL. Parameters/options, in order, are:
handle          a schema handle or a container (ARRAY, LIST) handle
schemaTrans     a translation table handle
-schemaName     specifies the schema name when a container is used.
-autoConvert    if TRUE, elementary types in the given schema that are
                not listed in the translation table are converted.
                Default is FALSE.
On success, it returns a TBLCOL handle.
Converting TBLCOL to one instance of schema
Suppose we want to convert a FITS file to a schema HG. Specifically,
the FITS file we use has the fields RA_DEG, GSC_ID, PLATE_ID and
MULTIPLE, which are double, int, char*4 and char respectively. We
want to convert them to sum, id, xLabel and yLabel, which in schema HG
are double, int, char*, and char* respectively.
Build the translation table
dervish> set xtbl [schemaTransNew]
dervish> schemaTransEntryAdd $xtbl name RA_DEG sum double -ratio 2
dervish> schemaTransEntryAdd $xtbl name GSC_ID id int -ratio 20
dervish> schemaTransEntryAdd $xtbl name PLATE_ID xLabel string -dimen 5
dervish> schemaTransEntryAdd $xtbl name MULTIPLE yLabel string -dimen 2
Note that the conversion matches the other fields in HG not listed here
directly to FITS fields and ignores those it fails to locate in the FITS file.
Get FITS file into TBLCOL
dervish> set tblcol [handleNewFromType TBLCOL]
dervish> fitsRead $tblcol /data/sdss2/GuideStars/v1_1/gsc/n0000/0001.gsc -hdu 1
Create an instance of HG
dervish> set schema [hgNew]
Now convert
dervish> tblToSchema $tblcol $schema $xtbl
Note that it is at the user's discretion whether to delete the TBLCOL after
the conversion is done.
Write back to FITS just for fun
dervish> set ntblcol [schemaToTbl $schema $xtbl]
dervish> fitsWrite $ntblcol junkFits -ascii
Converting all in TBLCOL to array of instances of a schema
Suppose we now want to convert a FITS file to a schema REGION. We convert
a FITS file whose GSC_ID interests us to nrow, mask.nrow and type,
which in schema REGION are int, int (in MASK) and int respectively.
To make it more interesting, we specify different conversion ratios for
these fields. Note that, for type MASK, we have to specify a Tcl constructor,
maskNew.
Build a translation table...
dervish> set xtbl [schemaTransNew]
dervish> schemaTransEntryAdd $xtbl name GSC_ID nrow int -ratio 2
dervish> schemaTransEntryAdd $xtbl name GSC_ID mask struct -proc maskNew
dervish> schemaTransEntryAdd $xtbl cont mask nrow int -ratio 20
dervish> schemaTransEntryAdd $xtbl name GSC_ID type int -ratio 20
Create a TBLCOL and get the FITS...
dervish> set tblcol [handleNewFromType TBLCOL]
dervish> fitsRead $tblcol /data/sdss2/GuideStars/v1_1/gsc/s8230/9537.gsc -hdu 1
Create an empty container...
dervish> set container [handleNewFromType ARRAY]
Convert TBLCOL to array...
dervish> tblToSchema $tblcol $container $xtbl -proc regNew -schemaName REGION
Note that on the command line we used -proc regNew to construct the REGION objects put
in the array. If no -proc is specified, the default constructor or manual memory allocation
will be used.
Write back for fun...
dervish> set ntblcol [schemaToTbl $container $xtbl -schemaName REGION]
dervish> fitsWrite $ntblcol junkFits -ascii -pdu MINIMAL
When one or more FITS-side fields are specified as arrays, one has to write
to a binary FITS file.
dervish> fitsWrite $ntblcol junkFits -binary -pdu MINIMAL
Wei Peng