Schema and Disk Dumps from C
Changes to the Schema package in V3.3
Examining Schema from C
Introduction to Diskio
How to Write Data to Disk
Dumping Data, the Whole Story
C Functions for Diskio
Details of Internals
Changes to the Schema package in V3.3
In version 3.3 of Dervish, schema definitions are local to a package
rather than being global. The definition of a `package' is up to you:
it is all the types that you decided to lump together when you
ran make_io. The name of the package is specified on the make_io command line;
for example, dervish/src/Makefile contains the lines
DISKIO_FILES = $(INC)/region.h $(INC)/shCList.h
#
diskio_gen.c : make_io $(DISKIO_FILES)
	rm -f diskio_gen.c
	$(DERVISH_DIR)/bin/make_io -v1 -m Dervish diskio_gen.c $(DISKIO_FILES)
	chmod 444 diskio_gen.c
which defines a package called Dervish (the -m flag) containing the schema
defined in shCList.h and region.h. Usually all generated dump functions
start with the first two letters of the module name (e.g. shDumpIntRead),
but you can specify a two-character prefix explicitly using the -p flag
(e.g. -p xx) if the fancy takes you.
You load the schema from a package into your program by calling the function
shSchemaLoadFrom<Package>(),
where <Package>
is the name of your package. For example,
shMainTcl_Declare calls shSchemaLoadFromDervish, and the test program
dervish_foo, which declares a package called test, calls
shSchemaLoadFromTest. There is currently no check that type names
are unique; the first one will be used (but note that the name of
a schema type is the C name of the type, so if you actually use two
types of the same name in the same file the C compiler will get very
upset). This will be fixed in a future release.
Things that you'll have to change in your code:
- Makefiles that use make_io need a number of changes:
  - Remove all references to diskio_gen.h (or whatever you called it; the
    name diskio_gen.c is not hard-coded into make_io but rather specified on the
    command line).
  - Add a -m package_name flag to the make_io command.
  - Remove all include files from the list used by make_io that don't belong
    in your package.
- The function shSchemaInit() should be removed; it has been replaced
  by one or more calls to shSchemaLoadFrom<Package>. In particular, you'll
  need to load your new schema (with the name specified to make_io with -m).
- You must remove lines that include types.h; the file is no longer needed,
  and it no longer exists.
- Any reference in your code to things like TYPE_REGION must be
  replaced by calls like shTypeGetFromName("REGION"); as this is not a
  constant, you cannot use a switch statement to select code based on a type.
  Probably the easiest way to proceed is to use strcmp and the names of
  types; if the inefficiency of this approach makes you unhappy, and
  you are prepared to do a little more work, you can still use types directly,
  but remember that they are no longer compile-time constants.
- The way that you annotate include files to change the behaviour of make_io
  has changed. Whereas you used to use commands like DUMP_SCHEMA, we now
  use an explicit pragma comment; for example, the MASK definition looks like
typedef struct mask{
char *name; /* identifying name */
int nrow; /* number of rows in mask */
int ncol; /* number of columns in mask */
char **rows; /* pointer to pointers to rows */
int row0,col0; /* location of LLH corner of child in parent */
struct mask_p *prvt; /* information private to the pipeline */
} MASK; /*
pragma NOCONSTRUCTOR
pragma USER
*/
The possible pragmas are:
- AUTO: Use generic diskio (dump) code. Make handles automatically; that is,
  there is no need to write the code to simulate the TCL verbs typeNew and
  typeDel. Has precedence over USER.
- CONSTRUCTOR: Make handles automatically; that is, there is no need to write the
  code to generate the TCL verbs typeNew and typeDel. NO dumping of the
  associated structure is allowed.
- IGNORE: Ignore this type/structure completely. Do not create schema for it;
  no handle making or dumping. Overrides any other pragmas.
- SCHEMA: Instructs the schema generation code to generate schema for this
  type/structure. No dumping or handle making. Overrides AUTO, CONSTRUCTOR
  and USER.
- USER: The user will provide read/write code (used for dumping to and reading
  back from disk) for the associated structure. This is necessary for
  complex types like REGIONs. No handle making. If not specified, AUTO
  is assumed.
Any of these may be preceded by NO to request the opposite effect. The default
values are AUTO CONSTRUCTOR NOIGNORE SCHEMA NOUSER.
- As implied by the above, you do not need to write a TCL interface to
  simple types; it is done for you.
- The functions to access SCHEMAs and SCHEMA_ELEMs now return a pointer to
  const; you may have to modify your code.
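Because TYPE values are no longer compile-time constants, code that used to switch on TYPE_REGION has to compare at run time instead. The sketch below is self-contained and does not call Dervish: the TYPE typedef, the little name table and the functions type_get_from_name, is_region_by_name and describe are hypothetical stand-ins (type_get_from_name plays the role of shTypeGetFromName). It shows the two approaches suggested above: comparing names with strcmp, and looking the TYPEs up once and then comparing with ==.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Dervish's TYPE: an id handed out at run time. */
typedef int TYPE;

/* Hypothetical type registry, as if filled when a package's schema is loaded. */
static const char *type_names[] = { "UNKNOWN", "REGION", "MASK", "FOO" };
#define NTYPES ((int)(sizeof(type_names) / sizeof(type_names[0])))

/* Analogue of shTypeGetFromName: returns 0 ("UNKNOWN") if the name is not found. */
static TYPE type_get_from_name(const char *name)
{
    for (int i = 0; i < NTYPES; i++) {
        if (strcmp(type_names[i], name) == 0) {
            return i;
        }
    }
    return 0;
}

/* Approach 1: compare type names with strcmp (simple, a little slow). */
static int is_region_by_name(const char *type_name)
{
    return strcmp(type_name, "REGION") == 0;
}

/* Approach 2: look the TYPEs up once, then compare with ==.
 * A switch is impossible because the case labels would not be constants. */
static const char *describe(TYPE t)
{
    static TYPE region_type = -1, mask_type = -1;

    if (region_type < 0) {              /* first call: cache the lookups */
        region_type = type_get_from_name("REGION");
        mask_type = type_get_from_name("MASK");
        assert(region_type != 0 && mask_type != 0);
    }
    if (t == region_type) {
        return "a REGION";
    } else if (t == mask_type) {
        return "a MASK";
    }
    return "something else";
}
```

Caching the looked-up values, as describe does, reduces the per-call cost to a couple of integer comparisons; it assumes the schema has been loaded before the first call.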
The example dervish_foo has been updated to use all of these features; it
now supports a simple type, FOO, and a complicated one, BAR.
Examining Schema from C
Schema are described in terms of two structs, SCHEMA and SCHEMA_ELEM, defined in
shCSchema.h.
The C functions that can be used to work with these are:
- shSchemaGet
- shSchemaElemGet
- shElemGet
- shElemSet
- shPtrSprint
- shDumpSchemaElemRead
- shSchemaNew
Return a type's schema, given the name of the <type>.
const SCHEMA *shSchemaGet(char *type);
Return the schema of a member <elem> of a <type>.
const SCHEMA_ELEM *shSchemaElemGet(char *type, char *elem);
Return a pointer to the element described by <sch_el> of the
object <thing>.
The type is returned in <type> (if it isn't NULL).
void *shElemGet(void *thing, SCHEMA_ELEM *sch_el, TYPE *type);
Set the element described by <sch_el> of the
object <thing> to <value>.
RET_CODE shElemSet(void *thing, SCHEMA_ELEM *sch_el, char *value);
Return a string containing a printed representation of <ptr>, taken
to be of the given <type>. The string is stored in a buffer internal to
shPtrSprint, so copy it if you need it to survive a later call.
char *shPtrSprint(void *ptr, TYPE type);
Read the element described by <sch_el> of a dumped structure <thing>
from the dump file pointer <fil>.
RET_CODE shDumpSchemaElemRead(FILE *fil,void *thing,SCHEMA_ELEM *sch_el);
Allocate a new SCHEMA and <nelems> SCHEMA_ELEMs. The
SCHEMA struct is filled with zeros or NULLs, except for the type,
which is set to UNKNOWN, and the pointer to SCHEMA_ELEMs, which
is set to point to the allocated SCHEMA_ELEMs. The SCHEMA_ELEMs are
initialized to zero. Note that this leaves the nelem member of each
SCHEMA_ELEM set to zero; you must allocate memory, fill it with an appropriate
char string, and set nelem to point to this string. The
SCHEMA returned by shSchemaNew is guaranteed to be followed by a zero byte;
this allows the structure to be passed to p_shLoadSchema to load it into
the system SCHEMA tables.
SCHEMA *shSchemaNew(int nelems);
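To see why a schema element is enough to reach a member of an arbitrary object, here is a self-contained sketch of the idea behind shSchemaElemGet, shElemGet and shPtrSprint: each element records its name, its type and its byte offset within the struct. The ELEM_DESC struct, the FOO layout and the functions elem_find, elem_get and elem_sprint are hypothetical; the real SCHEMA_ELEM in shCSchema.h has its own fields and uses TYPE values rather than this toy KIND enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical element kinds; Dervish uses TYPE values instead. */
typedef enum { K_INT, K_FLOAT } KIND;

/* Hypothetical, much-simplified analogue of SCHEMA_ELEM. */
typedef struct {
    const char *name;   /* member name */
    KIND kind;          /* member type */
    size_t offset;      /* byte offset of the member within its struct */
} ELEM_DESC;

typedef struct {
    int nrow;
    float gain;
} FOO;

/* Schema for FOO, written by hand here; in Dervish, make_io generates
 * schema from the header files instead. */
static const ELEM_DESC foo_schema[] = {
    { "nrow", K_INT,   offsetof(FOO, nrow) },
    { "gain", K_FLOAT, offsetof(FOO, gain) },
};

/* Analogue of shSchemaElemGet: find a member's description by name. */
static const ELEM_DESC *elem_find(const char *name)
{
    for (size_t i = 0; i < sizeof(foo_schema) / sizeof(foo_schema[0]); i++) {
        if (strcmp(foo_schema[i].name, name) == 0) {
            return &foo_schema[i];
        }
    }
    return NULL;
}

/* Analogue of shElemGet: the address of the member inside <thing>. */
static void *elem_get(void *thing, const ELEM_DESC *el)
{
    return (char *)thing + el->offset;
}

/* Analogue of shPtrSprint: the result lives in a static buffer, which the
 * next call overwrites. */
static const char *elem_sprint(void *ptr, KIND kind)
{
    static char buf[64];

    assert(ptr != NULL);
    if (kind == K_INT) {
        snprintf(buf, sizeof(buf), "%d", *(int *)ptr);
    } else {
        snprintf(buf, sizeof(buf), "%g", *(float *)ptr);
    }
    return buf;
}
```

Generic code such as a TCL "print this member" command needs only the member name and the object's address; everything else comes from the table.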
Introduction to Diskio
The disk dump (`diskio') facility enables you to stop and restart Dervish,
saving the variables to disk. The format is binary and
complicated (you'd never want to write code to read it yourself), but
fortunately there is a programmer's interface;
all of these routines are also available from TCL.
If I may be permitted to boast for a few lines, these data dumps are
machine and compiler independent (provided that the
floating point format is IEEE and the type long is a 4-byte integer;
both of these restrictions could easily be lifted); in particular they
assume neither a byte order for integers nor the length of an int
(I have successfully written dumps on a 16-bit PC and read them on a
Sun). Any pointers in the data are tracked down, and reinstated when
the dumps are read. In some cases a request to dump a set of variables
may not specify all the required data (for example, an object
list may contain subREGIONs of undumped REGIONs); in this
case a warning is issued and the missing data is appended to the
dump as `anonymous' structures.
It is essential to realise that pointers are not written to the dump
file; instead, smallish integer ids are used. This means that until and
unless a file has been fully processed (i.e. closed without error) it
is dangerous to dereference pointers in your newly read data structures.
How to Write Data to Disk
To use dump files you need to include the proper include files, namely
photo.h and shCDiskio.h, in that order.
You will need the standard C header files too, or
at least <stdio.h>.
Dump files are opened with
FILE *shDumpOpen(char *name, char *mode);
which returns a FILE pointer rather than a return code such as SH_SUCCESS;
name is the name of the file, and the permitted modes are "a", "r", and "w"
for append, read, and write. If you want to use "a" you'll have to
read the next section too.
Once a file is opened for append or write, you can write data structures
to it with the functions defined in
shCDiskio.h, for example
shDumpRegWrite. When you have written all
that you want, close the file with shDumpClose. Please
note that you should not be lazy and use fclose, as
shDumpClose cleans up various internal structures. If the file was
opened for read, it also initialises pointers within your data structures;
if it was opened for write (or append), it checks that all the data structures
referenced by the things that you wrote have actually been written,
and writes any that you missed. If these activities fail, it returns
SH_GENERIC_ERROR.
How should you read back a dump? The simplest way is to use
LIST *shDumpRead(FILE *fil, int shallow) which returns a
list of the contents of the file. If shallow
is 1 (true) the data in the file
isn't actually read into your program; only as much as is needed
to correctly parse the file is read so you should not attempt
to dereference pointers inside the returned data structures (the
only exceptions being name, nrow, and ncol in
MASKs and REGIONs, and testing first against NULL
for lists). If shallow is 0 (false) the whole dump is read into memory.
The returned LIST is of THINGs (defined in
shCDiskio.h):
typedef struct struct_thing {
TYPE ltype; /* used by LIST stuff */
struct struct_thing *next, *prev;
void *ptr;
TYPE type;
} THING;
TYPE is defined in shCSchema.h. You can
then go through the list examining whatever interests you:
FILE *fil;
LIST *list;
THING *thing;
TYPE mask_type = shTypeGetFromName("MASK");  /* not a compile-time constant */

fil = shDumpOpen(file,"r");
list = shDumpRead(fil,1);                    /* shallow read */
shDumpClose(fil);
thing = (THING *)list->first;                /* the first THING is the date */
printf("Date: %s\n",(char *)thing->ptr);
for(thing = thing->next;thing != NULL;thing = thing->next) {
   printf("%s\n",shNameGetFromType(thing->type));
   if(thing->type == mask_type) {            /* a switch needs constants */
      /* ... */
   }
}
(I have removed some error checking.) You'll see that the first THING
is the date string (type TYPE_STR). The function shNameGetFromType is
used to convert a TYPE, such as TYPE_OBJ1, into its name,
such as "OBJ1".
There is a rather more complete example of dump-reading code in
$DERVISH_DIR/examples/dump_list.c (it also prints out the schemas).
(Some of the details in this section are out of date. Please ask
Robert Lupton for updated help.)
Dumping Data, the Whole Story
If you don't want to read the whole dump, you'll have to do a little more work.
Firstly, I didn't tell you the whole story about opening dumps: if the
mode letter is capitalised (e.g. "R"), no cleanup is done when
the file is closed. You'll need to remember this in a moment. If you
haven't disabled this cleanup, shDumpClose will return
SH_GENERIC_ERROR if unresolved pointers remain, and ignoring this return
value is a short cut to a segmentation violation. Nothing irreversible
is done when a dump file is closed, but when a file is opened
some internal data structures are freed; you can avoid this by using
shDumpReopen(), which is otherwise identical to shDumpOpen. If
you have been playing complicated games (e.g. appending with mode
"A"), you may need to use shDumpReopen.
For every Dervish datatype (and most C datatypes) there are two functions declared in
shCDiskio.h, for example:
int shDumpMaskRead(FILE *fil, MASK **thing)
int shDumpMaskWrite(FILE *fil, MASK *thing)
The C datatypes supported are
char, int, long, float, void * (called
ptr), and char * (called str). The type names are
capitalised in the function names (shDumpMaskRead, shDumpStrWrite).
There is a function int shDumpTypeGet(FILE *fil, TYPE *type) that can
be used to return the type of the next object written to the file (it
returns SH_GENERIC_ERROR at the end of the file); once you know what
the type is, you can call the proper read function, shDumpPtrRead or
whatever.
There is a function shDumpNextRead that does
this for you.
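To make the "find the type, then call the right reader" pattern concrete, here is a self-contained sketch over a toy in-memory format in which each record is a one-byte tag followed by its payload. The format, the STREAM struct and the functions stream_type_get, char_read, str_read and next_read are all hypothetical; they mirror the roles of shDumpTypeGet, the type-specific read functions and shDumpNextRead, not their real signatures or the real dump format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy format: each record is a one-byte tag followed by its
 * payload. (Real dumps store a TYPE and an id before each object.) */
enum { TAG_CHAR = 0, TAG_STR = 1, NTAGS = 2 };

typedef struct {
    const unsigned char *buf;   /* the whole "file", held in memory */
    size_t len;                 /* number of bytes in buf */
    size_t pos;                 /* current read position */
} STREAM;

/* Analogue of shDumpTypeGet: peek at the next tag; -1 at the end of the
 * data or if the tag is not recognised. */
static int stream_type_get(const STREAM *s)
{
    if (s->pos >= s->len || s->buf[s->pos] >= NTAGS) {
        return -1;
    }
    return s->buf[s->pos];
}

/* Reader for a CHAR record: the tag plus one byte. Returns 0 on success. */
static int char_read(STREAM *s, char *out, size_t outlen)
{
    if (s->pos + 2 > s->len || outlen < 2) {
        return -1;
    }
    out[0] = (char)s->buf[s->pos + 1];
    out[1] = '\0';
    s->pos += 2;
    return 0;
}

/* Reader for a STR record: the tag plus a NUL-terminated string. */
static int str_read(STREAM *s, char *out, size_t outlen)
{
    size_t start = s->pos + 1;
    const unsigned char *nul;
    size_t n;

    if (start > s->len) {
        return -1;
    }
    nul = memchr(s->buf + start, '\0', s->len - start);
    if (nul == NULL) {
        return -1;                      /* unterminated string */
    }
    n = (size_t)(nul - (s->buf + start));
    if (n + 1 > outlen) {
        return -1;                      /* caller's buffer is too small */
    }
    memcpy(out, s->buf + start, n + 1);
    s->pos = start + n + 1;
    return 0;
}

/* One reader per tag, indexed by the tag value. */
typedef int (*READER)(STREAM *s, char *out, size_t outlen);
static const READER readers[NTAGS] = { char_read, str_read };

/* Analogue of shDumpNextRead: find the type, then call its reader. */
static int next_read(STREAM *s, int *tag, char *out, size_t outlen)
{
    assert(s != NULL && tag != NULL && out != NULL);
    *tag = stream_type_get(s);
    if (*tag < 0) {
        return -1;
    }
    return readers[*tag](s, out, outlen);
}
```

Indexing a table of readers by tag keeps the dispatch in one place; adding a type means adding one reader and one table entry.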
If you want to be sneaky and only read some of the dump, skipping over other
parts, there will in general be pointers in the data items you read that
point to objects that you haven't read. Usually shDumpClose tries to
read any remaining stuff in the dump file in an attempt to find them; you
can disable this by opening the file with a mode of "R" (if you
prefer dirty hacks, (void)fseek(fil,0L,SEEK_END) should work too). After
thus circumventing the checks you may be left with invalid pointers in your
data structures; caveat lector. If you know enough to be reading this
section, you may know where these bad pointers are (e.g. you didn't read
any REGIONs, so don't look at the REGION pointers in OBJ1s).
C Functions for Diskio
- shDumpOpen: Open a dump file
- shDumpReopen: Reopen a dump (don't init data structures)
- shDumpClose: Close a dump
- shDumpPtrsResolve: Resolve pointer ids
- shDumpDateDel: Replace a dump's date string with Xs
- shDumpDateGet: Return a dump's date string
- shDumpTypeGet: Return the TYPE of the next item in a dump
Functions to read/write something in a dump:
- shDumpCharRead, shDumpCharWrite: Chars
- shDumpFloatRead, shDumpFloatWrite: Floats
- shDumpIntRead, shDumpIntWrite: Ints
- shDumpLongRead, shDumpLongWrite: Longs
- shDumpMaskRead, shDumpMaskWrite: MASKs
- shDumpPtrRead, shDumpPtrWrite: Pointers
- shDumpRegRead, shDumpRegWrite: REGIONs
- shDumpStrRead, shDumpStrWrite: Strings
Extern functions that are really only for friends:
- p_shDumpStructsReset: Reset the structs in a dump file
- p_shPtrIdSave
Details of Internals
The internals of Dervish structures are liable to change;
in consequence, the dump package attempts
to extract almost all the information that it needs from the header
files defining them. There is a program, make_io,
which reads them and finds all structs that are typedef'd to something,
and all typedef'd enums. It then generates
two functions for each type, sh<Type>Read
and sh<Type>Write, in the file named on the make_io command line
(diskio_gen.c in the Makefile example above).
The prototypes for these i/o functions may be found in
shCDiskio.h. They could easily have been
machine generated, but they
change only when a new type is added, and being forced to copy
two lines from shCDiskio.h into your type's header should remind you
to check that there is nothing weird about the new type.
The i/o functions for MASKs and REGIONs are not machine generated,
as they are significantly different from the various list and object types;
at some point this may change, but they are still likely to be treated
specially by make_io. The way that this was achieved was by adding special
comments at the end of the structure definitions (version 3.3 replaced
these with the pragma comments described earlier); those supported were:
- NODUMP: Diskio code should entirely ignore this struct; an example
  would be LIST_ELEM.
- NODUMP-R/W: Diskio code should not generate Read/Write/Skip functions
  for this struct; an example is REGION.
- DUMP-SCHEMA: Just dump the schema, but don't generate any i/o code;
  for example REGINFO.
In addition to this struct i/o code, a few more functions
are generated automatically, basically those that need to deal with
every structure that can exist in a dump file (e.g. shDumpRead
and shTypeGetFromName).
Dump files have headers that include a version number for the format
of the dump and a date.
When you open a dump with shDumpOpen the header is written
or checked, as appropriate. Once the file is opened, the file pointer
is left just after the header information.
There are two main problems in writing dump files such as these: machine
independence and pointers. The first is solved reasonably easily: all
ints are written as longs, and all integral types are written in network
byte order (most significant byte first, the same as on a Sun or SGI).
Pointers are much more of a problem. The same program may well use
different addresses for the same objects after a trivial change to the
source, and certainly the addresses written from a hard-worked pipeline
will be different from those suitable for reading into a newly-started
one. The solution that I have adopted is to map each pointer to a
unique id number (of type IPTR, typically long), and to
write the id instead of the address. Each object is preceded by its
type (the enum TYPE) and its id. By keeping tabs on what has
been written, along with the type, it is possible to write all
referenced pointers to disk (these are the anonymous structures
referred to in the introduction).
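The write side of this scheme can be sketched as a table that maps each address to a small id, handed out in order of first appearance, with a flag recording whether the object itself has been written. Everything here (ID_TABLE, ptr_id_get, ptr_mark_written, count_unwritten, the fixed capacity and the linear search) is hypothetical and simplified; it is not a description of how the real code stores the mapping.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPTRS 100

typedef long IPTR;   /* id type; the text says IPTR is typically a long */

/* Hypothetical write-side bookkeeping: one entry per address seen. */
typedef struct {
    const void *addr[MAXPTRS];
    int written[MAXPTRS];      /* has the object itself been dumped? */
    int n;                     /* number of entries in use */
} ID_TABLE;

/* Return the id for <addr>, assigning the next id (1, 2, ...) the first
 * time the address is seen. NULL always maps to id 0. */
static IPTR ptr_id_get(ID_TABLE *t, const void *addr)
{
    if (addr == NULL) {
        return 0;
    }
    for (int i = 0; i < t->n; i++) {
        if (t->addr[i] == addr) {
            return i + 1;
        }
    }
    assert(t->n < MAXPTRS);
    t->addr[t->n] = addr;
    t->written[t->n] = 0;
    return ++t->n;
}

/* Record that the object at <addr> has been written; return its id. */
static IPTR ptr_mark_written(ID_TABLE *t, const void *addr)
{
    IPTR id = ptr_id_get(t, addr);

    if (id > 0) {
        t->written[id - 1] = 1;
    }
    return id;
}

/* Count objects that were referenced but never written: these would have
 * to be appended as anonymous structures when the dump is closed. */
static int count_unwritten(const ID_TABLE *t)
{
    int count = 0;

    for (int i = 0; i < t->n; i++) {
        if (!t->written[i]) {
            count++;
        }
    }
    return count;
}
```

A writer would call ptr_mark_written for each object it dumps and ptr_id_get for each pointer field, writing the returned id in place of the address.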
Reading a dump is a little more complex. As we come to
each data object, we first read the type, and then its original id.
Knowing its type we can allocate space for it, and store the pair
(address, id) in a safe place, currently an AVL tree. As each pointer
is read its id is looked up in this tree, and if it has already been
seen it's replaced by the proper address and all is well. If it hasn't
been seen, we store the address of the desired pointer
along with the id. When we've read the entire file (or more precisely,
when shDumpClose is called) we go through this list and make
another attempt to find the correct address; if we find it we can
insert it into the proper place (that was why we stored the pointer's
address). If this goes well we are almost done; all that remains is to
deal with row pointers in submasks and subregions (those that we were
unable to process as we read the file), and return SH_SUCCESS.
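The read side can be sketched in the same spirit. Each object read registers its (id, address) pair; each pointer field read holds an id, which is resolved at once if the object has already been seen, or else the address of the pointer field is queued so that it can be patched when the dump is closed. A fixed-size array indexed by id stands in for the AVL tree described above, and all the names (READ_STATE, obj_register, ptr_read, ptrs_resolve) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIDS 100

typedef long IPTR;

/* Hypothetical read-side state. An array indexed by id stands in for the
 * AVL tree, so ids must lie in 1..MAXIDS-1. */
typedef struct {
    void *addr_of_id[MAXIDS];       /* address of the object with this id */
    void **pending_slot[MAXIDS];    /* pointer fields still to be patched */
    IPTR pending_id[MAXIDS];        /* ...and the ids they refer to */
    int npending;
} READ_STATE;

/* An object with id <id> has just been allocated at <addr>. */
static void obj_register(READ_STATE *st, IPTR id, void *addr)
{
    assert(id > 0 && id < MAXIDS);
    st->addr_of_id[id] = addr;
}

/* A pointer field <slot> was read from the dump as <id>: fill it in now
 * if possible, otherwise remember where the field lives. */
static void ptr_read(READ_STATE *st, void **slot, IPTR id)
{
    assert(id >= 0 && id < MAXIDS);
    if (id == 0) {
        *slot = NULL;
    } else if (st->addr_of_id[id] != NULL) {
        *slot = st->addr_of_id[id];
    } else {
        assert(st->npending < MAXIDS);
        *slot = NULL;               /* never leave a bogus address in place */
        st->pending_slot[st->npending] = slot;
        st->pending_id[st->npending] = id;
        st->npending++;
    }
}

/* Done at close time: patch the queued pointer fields. Returns the number
 * that still could not be resolved (0 means success). */
static int ptrs_resolve(READ_STATE *st)
{
    int unresolved = 0;

    for (int i = 0; i < st->npending; i++) {
        void *addr = st->addr_of_id[st->pending_id[i]];
        if (addr != NULL) {
            *st->pending_slot[i] = addr;
        } else {
            unresolved++;
        }
    }
    st->npending = 0;
    return unresolved;
}
```

Storing the address of the pointer field, rather than the field's value, is what makes the late patch possible: ptrs_resolve writes straight into the structure that was read earlier.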