sds/FITS Interface Description
SOI TN 97-137
R S Bogart
1998.01.14
Introduction
This document specifies the interface between sds, the internal
structure for representation and manipulation of data sets commonly used
by analysis procedures in the SOI programming environment, and FITS,
the primary format in which data are maintained for storage and distribution.
The document describes the goals, not the reality; not all the features
described have been implemented or implemented correctly. Elements of the
interface yet to be successfully implemented are noted.
sds
sds is a data structure strictly internal to those programs making
use of it for data manipulation. It is intended to provide, inter alia,
a mechanism for handling data independent of their external representation
by isolating input and output functions specific to those representations.
sds can be used to easily translate data formats from one representation
or organizational structure to another.
The design of the sds structure was strongly guided by the
FITS format described below. Nevertheless, the
sds structure is independent of, and more extensive than, the simple
representation of a FITS file, so the interface is not trivial.
Data Representations Supported by sds
Although designed to be representation-independent, sds presently has
only one well-supported representation: FITS. There is
very limited support in the form of a few functions for I/O using GIF,
rdb files (ASCII tables), binary files, special formats
representing MDI telemetry streams, and a special format
designed to represent the full sds structure. Other representations
contemplated but not yet implemented are CDF and TIFF.
In any case, FITS is and
will remain for the foreseeable future the principal format for data storage
and distribution at the SSSC. Careful definition and adherence to the I/O
interface for FITS is thus essential.
FITS
FITS, the Flexible Image Transport System,
is a standard for data representation and transport in widespread use by the
astronomical and particularly the Solar Physics research community.
Information about FITS may be found at NRAO
(http://fits.cv.nrao.edu) and
at the FITS support office at NASA-Goddard
(http://fits.gsfc.nasa.gov;
earlier version at
http://ssdoo.gsfc.nasa.gov/astro/fits/fits_home.html).
sds Data Types and Type Conversions
sds Data Types
The following sds data types are defined (C-language type
correspondences and default fill values are given when appropriate):
- SDS_VOID
- SDS_BYTE - signed char : -128
- SDS_UBYTE - unsigned char : 255
- SDS_SHORT - signed short : -32768
- SDS_USHORT - unsigned short : 65535
- SDS_INT - signed int : -2147483648
- SDS_UINT - unsigned int : 4294967295
- SDS_LONG - signed long : -9223372036854775808
- SDS_ULONG - unsigned long : 18446744073709551615
- SDS_FLOAT - float : fNaN
- SDS_DOUBLE - double : dNaN
- SDS_COMPLEX
- SDS_STRING - char *
- SDS_TIME - double : 0.0
- SDS_LOGICAL
- SDS_ANY
- SDS_ASIS [NEW]
Most of the types are clearly intended to correspond to C-language types
for internal representation, and the rules for data types, sizes, operations,
and conversions appropriate to the C-language implementation in use govern
the data elements of the array sds.data. For example, if
sds_datatype (sds) == SDS_LONG, then sds_numbytes (sds) will be
numerically equal to sizeof (long) on the machine architecture in use.
Not all of these types actually correspond to data storage. Types SDS_VOID,
SDS_ANY, and SDS_RAW are for use as arguments or status: for example, the
data function sds_read() has an argument specifying the type of
conversion to be performed implicitly on input; SDS_ANY and SDS_RAW are
used to specify retention of the native type in senses described below.
Similarly, newly created sds
structs may have their datatype set to SDS_VOID. SDS_TIME data are
equivalent to SDS_DOUBLE data, but the type value is to be used to
signal different behaviour by routines that convert numeric data
from or into character strings. SDS_STRING, SDS_COMPLEX,
and SDS_LOGICAL types exist for possible future implementation, but are not
now supported.
The sds struct includes a member void *fillvalue; if
this pointer is non-null, its contents are interpreted as a numeric value of
the same type as the data to be used to represent non-numeric (e.g. missing)
data. Default fill values exist for the major data types. For
floating-point types, the default fill values are fNaN = 0x7fff0000 and
dNaN = 0x7fff000000000000, obtained by calls to A_Quiet_fNaN() and
A_Quiet_dNaN(), respectively.
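The bit patterns quoted above can be checked directly. The following is a minimal sketch assuming IEEE-754 float and double; quiet_fnan_sketch() and quiet_dnan_sketch() are hypothetical stand-ins that merely reproduce the stated patterns, not the library's A_Quiet_fNaN() and A_Quiet_dNaN(). Because a NaN compares unequal to everything, itself included, invalid floating-point data must be detected with isnan() rather than by comparison with a fill value.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for A_Quiet_fNaN(): build the single-precision
   quiet NaN from the bit pattern given in the text (IEEE-754 assumed). */
float quiet_fnan_sketch (void)
{
  uint32_t bits = 0x7fff0000u;
  float f;
  memcpy (&f, &bits, sizeof f);     /* reinterpret the bits safely */
  return f;
}

/* Hypothetical stand-in for A_Quiet_dNaN(). */
double quiet_dnan_sketch (void)
{
  uint64_t bits = 0x7fff000000000000ull;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Floating-point fill is detected with isnan(), never with ==. */
int is_float_fill (float x)
{
  return isnan (x);
}
```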
Internal Type Conversion Rules
Conversions of data type within a program are implemented by the function
sds_data_convert(); use of the original function for this purpose,
sds_convert(), is discouraged, and at some point is to be at least
temporarily disabled for purposes of migration. sds_data_convert()
converts data according to the following rules:
- If the fillvalue is set and equal to the default fill value
for the old data type, the new fillvalue will be set to the default
fill value for the new data type, and all data "values" equal to the
old fill value will be set to the new fill value.
- If the fillvalue is set to a different value from the default
fill value for the old data type, the new fillvalue will be set to
the old fillvalue provided it is in the representable range of the
new data type; otherwise the new fillvalue will be set to the default
fill value for the new data type; in either case all data "values"
equal to the old fill value will be set to the new fill value. In
no case will the fillvalue for a floating point type be set to a
numeric value, however.
- If no fillvalue is set and any data values are outside the range
representable by the new data type, the new fillvalue is set to the
default fill value for the new data type; NaN's are by definition
outside the range representable by any data type.
- All values outside the range representable by the new data type
will be set to the new fillvalue.
- All remaining values will be converted to the values nearest them
in the new data type: fixed-point data are converted to floating-point
data by the assignment conversions of the C language; floating-point
data are converted to fixed-point data using a nearest integer function.
The direction of rounding of half-integer values is unspecified: the
behaviour is implementation-dependent.
- If the datatype is floating point, sds->fillvalue is always assumed to
be NULL. Only NaN's may be used to represent invalid floating-point data.
Single precision floating-point values of Infinity are converted to
double-precision Infinity. Double precision floating-point values of
Infinity are considered out of range when converting to single-precision
and are thus converted to NaN.
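The floating-point rules can be sketched as two small C helpers. This is an illustration assuming IEEE arithmetic and C99 <math.h>, not the code of sds_data_convert(); the function names are hypothetical.

```c
#include <float.h>
#include <math.h>

/* DOUBLE -> FLOAT (hypothetical helper). NaN, +/-Infinity and finite
   values beyond the float range all lie outside the range representable
   by FLOAT, so they become the float fill, NaN. */
float double_to_float_sketch (double x)
{
  if (isnan (x) || x > FLT_MAX || x < -FLT_MAX)
    return NAN;
  return (float)x;              /* ordinary C conversion for in-range values */
}

/* FLOAT -> DOUBLE (hypothetical helper). Every float, Infinity included,
   is exactly representable as a double, so plain conversion suffices. */
double float_to_double_sketch (float x)
{
  return (double)x;
}
```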
Note that these conversion rules involve no rescaling or biasing of
the original data values. In certain conversions it may be necessary
to first pre-scale or bias the data to avoid loss of precision: examples
are conversion of INTs to FLOATs when the values exceed 2**24, and
conversions of floating-point to fixed-point when the data range
is less than or of the order of unity. In these cases it is the
responsibility of the user to perform appropriate re-scaling prior to
data conversion.
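Both hazards are easy to demonstrate. In this sketch (hypothetical helper names, 32-bit int assumed), int_loses_precision_as_float() reports whether an integer survives conversion to float, and scaled_round() shows how dividing by a suitable bscale before rounding keeps values of order unity distinct.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if converting i to float changes its value.
   A float has a 24-bit significand, so not all integers above 2**24 in
   magnitude are representable. The comparison is made in double, which
   holds every int32_t and every float exactly. */
int int_loses_precision_as_float (int32_t i)
{
  float f = (float)i;
  return (double)f != (double)i;
}

/* Hypothetical helper: divide by bscale, then round half away from zero.
   With bscale = 1, physical values 0.3 and 0.7 become 0 and 1;
   with bscale = 0.001 they become 300 and 700. */
long scaled_round (double physical, double bscale)
{
  double a = physical / bscale;
  return (a >= 0.0) ? (long)(a + 0.5) : -(long)(-a + 0.5);
}
```

For instance, 16777217 (2**24 + 1) does not survive conversion to float, while 16777216 does.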
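A few code fragments illustrate the implementation of the above rules.
They assume that <stdlib.h> and <math.h> have been included, and they omit
error checking and the final installation of the converted array in the sds.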
- conversion of SHORT to FLOAT:
    unsigned int n = sds_data_length (sds);
    float new_fillval = A_Quiet_fNaN ();
    signed short *old = (signed short *)sds_data (sds);
    float *buf = (float *)malloc (n * sizeof (float));
    float *new = buf;               /* buf keeps the start of the new array */
    if (sds->fillvalue) {
      signed short old_fillval = *(signed short *)sds->fillvalue;
      while (n--) {
        if (*old == old_fillval)
          *new++ = new_fillval;     /* missing data become NaN */
        else
          *new++ = *old;            /* every SHORT is exact as a FLOAT */
        old++;
      }
    } else {
      while (n--)
        *new++ = *old++;
    }
    sds->fillvalue = NULL;          /* NaN marks invalid floating-point data */
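- conversion of FLOAT to USHORT:
    unsigned int n = sds_data_length (sds);
    unsigned short new_fillval = 65535;
    unsigned int out_of_range = 0;
    float *old = (float *)sds_data (sds);
    unsigned short *buf = (unsigned short *)malloc (n * sizeof (unsigned short));
    unsigned short *new = buf;
    while (n--) {
      if (isnan (*old)) {
        *new++ = new_fillval;
        out_of_range++;
      } else if ((*old < -0.5) || (*old >= 65534.5)) {
        /* 65535 is the fill value, so valid data must round to 0..65534 */
        *new++ = new_fillval;
        out_of_range++;
      } else
        *new++ = *old + 0.5;          /* round to nearest (operand >= 0 here) */
      old++;
    }
    if (out_of_range) {               /* the fill value must outlive this code */
      sds->fillvalue = malloc (sizeof (unsigned short));
      *(unsigned short *)sds->fillvalue = new_fillval;
    } else
      sds->fillvalue = NULL;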
- conversion of SHORT to BYTE:
    unsigned int n = sds_data_length (sds);
    signed char new_fillval = -128;
    unsigned int out_of_range = 0;
    signed short *old = (signed short *)sds_data (sds);
    signed char *buf = (signed char *)malloc (n * sizeof (signed char));
    signed char *new = buf;
    if (sds->fillvalue) {
      signed short old_fillval = *(signed short *)sds->fillvalue;
      out_of_range = 1;               /* an existing fill is always carried over */
      if (abs (old_fillval) < 128)
        new_fillval = old_fillval;    /* keep a representable non-default fill */
      while (n--) {
        if ((*old == old_fillval) || (abs (*old) > 127))
          *new++ = new_fillval;
        else
          *new++ = *old;
        old++;
      }
    } else {
      while (n--) {
        if (abs (*old) > 127) {       /* -128 is excluded: it is the default fill */
          *new++ = new_fillval;
          out_of_range++;
        } else
          *new++ = *old;
        old++;
      }
    }
    if (out_of_range) {               /* the fill value must outlive this code */
      sds->fillvalue = malloc (sizeof (signed char));
      *(signed char *)sds->fillvalue = new_fillval;
    } else
      sds->fillvalue = NULL;
FITS Representations
The FITS standard recognizes the following machine-independent
numeric data types, with reference to the ANSI/IEEE-754 standard for
numeric data representation:
- 8-bit unsigned fixed-point (BITPIX = 8)
- 16-bit two's-complement fixed-point (BITPIX = 16)
- 32-bit two's-complement fixed-point (BITPIX = 32)
- 32-bit IEEE floating-point (BITPIX = -32)
- 64-bit IEEE double-precision floating-point (BITPIX = -64)
Which of these data types applies in a particular FITS file is
governed exclusively by the value of the reserved keyword BITPIX.
All bit combinations in data are supported, including all IEEE special values
of NaN, Infinity, etc. For fixed-point data types only, one particular
bit-pattern corresponding to the numeric value specified by the keyword
BLANK may be reserved for non-numeric values equivalent to NaN
to specify missing or suppressed data.
The FITS reserved keywords BSCALE and BZERO may be
used to specify scaling of the data to physical (``brightness'') units
when data compression is required or for other purposes. The FITS
standard explicitly discourages their use with floating-point data types,
for which they are redundant. Likewise, the reserved keyword pairs
PSCALn/PZEROn and TSCALn/TZEROn
may be used similarly in data for Random Groups and ASCII Tables, respectively.
The FITS standard formally recognizes two data types corresponding
to a BITPIX value of 8: Character and Unsigned eight-bit integers.
The description of Character data requires that the high-order bit be 0;
however, no mechanism is provided for distinguishing between the two data
types.
FITS Input & Output
Attributes and Headers
With the exception of certain reserved FITS keywords, there is a
general correspondence between sds attributes and FITS header
records. In both cases, alphabetic characters in key-names must be
upper-case; however, sds attribute key-names are otherwise unrestricted,
both in the other characters they may contain and in their length, whereas
FITS keywords are limited to eight characters. Key-names longer than eight
characters that are identical in their first eight characters remain
distinct in the sds, but on output they produce multiple FITS header
records with the same keyword. Such records cannot be distinguished on
input.
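Whether two attribute key-names will collide on output can be tested by comparing their first eight characters. This is a minimal sketch, assuming that output simply truncates key-names to the eight-character FITS limit; the helper and macro names are hypothetical.

```c
#include <string.h>

#define FITS_KEY_LEN 8   /* maximum length of a FITS keyword */

/* Hypothetical helper: nonzero if two sds attribute key-names would be
   written with the same FITS keyword, i.e. they agree in their first
   eight characters. strncmp stops early if either name is shorter. */
int fits_keywords_collide (const char *a, const char *b)
{
  return strncmp (a, b, FITS_KEY_LEN) == 0;
}
```

For example, OBS_DATE_START and OBS_DATE_END are distinct attributes in the sds but would both be written as OBS_DATE.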
Data
In converting between FITS data representations and internal storage
formats it is assumed that (a) the machine supports fixed-point data types
of length 8, 16, and 32 bits; and (b) the machine supports two distinct levels
of precision in floating-point numbers. If the machine does not use IEEE
representation of floating-point numbers, then it will not necessarily
be possible to preserve data precision when reading from and writing to
FITS files; but the sds library has not been implemented on any
non-IEEE architectures, so the ramifications of this problem have not
been addressed.
The general rules for correspondence between FITS data-representation
types and sds data-storage types on input/output are as follows in
the absence of data scaling:
- BITPIX = 8 <=> sds_datatype = UBYTE
- BITPIX = 16 <=> sds_datatype = SHORT
- BITPIX = 32 <=> sds_datatype = INT
- BITPIX = -32 <=> sds_datatype = FLOAT
- BITPIX = -64 <=> sds_datatype = DOUBLE
(This assumes that sizeof(int) = 2*sizeof(short) = 4*sizeof(char). If
the C-language implementation assigns different sizes then the above
equivalences are to be correspondingly modified.) If the data are scaled,
then:
- BITPIX = 8 => sds_datatype = FLOAT
- BITPIX = 16 => sds_datatype = FLOAT
- BITPIX = 32 => sds_datatype = DOUBLE
Since the number of recognized sds data types is larger than
the number of supported FITS data types, the following
conversions are to be made on output only:
- sds_datatype = BYTE => BITPIX = 16
- sds_datatype = USHORT => BITPIX = 32
- sds_datatype = UINT => BITPIX = -64
- sds_datatype = ULONG => BITPIX = -64
- sds_datatype = TIME => BITPIX = -64
The following extension is supported, but not encouraged until
recognized by the FITS standard, provided that sizeof(long) = 2*sizeof(int):
- sds_datatype = LONG <=> BITPIX = 64
No other sds datatypes are supported by FITS I/O. Their
use must remain internal to the programs using them or be confined to
other external representations.
The sds structure contains an integer member scale_on_write
which is used to preserve or modify information about scaling of
floating-point to fixed-point values on output, and double-precision
members bscale and bzero used in conjunction with
sds.scale_on_write. The value of sds.scale_on_write is
numerically equal to the value of BITPIX in the output FITS
file; a zero-value of sds.scale_on_write implies
that the data are not to be scaled from floating-point to fixed point.
The values of sds.bscale and sds.bzero correspond to the
values of BSCALE and BZERO (or their equivalents) in the
output FITS file. Default values of sds.bscale and
sds.bzero are 1.0 and 0.0, respectively.
Pre-scaling of data to accommodate the range and precision of the output
data type is accomplished by the function sds_scale_data(). This
function sets the desired values of sds.scale_on_write,
sds.bscale, and sds.bzero, and adjusts the data (and the
fill value as necessary) accordingly. The rule for interpretation of
the scaling parameters in a FITS file is:
Physical Value = BZERO + BSCALE * Array Value
Hence, pre-scaling adjusts the values by first subtracting the desired
sds.bzero from the original valid value and then dividing by the
desired sds.bscale. Invalid values are unchanged. In the case
of pre-scaling floats this is not an issue, as the invalid value must
be a NaN. When pre-scaling fixed-point numbers, for conversion between
signed and unsigned types, for example, or when the values are too large
to retain proper precision when converting to floating-point, due caution
must be exercised that (a) values equal to the fill value are not rescaled,
and (b) the values resulting from re-scaling do not convert valid data
to invalid data. Pre-scaling does not affect the fill value.
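The pre-scaling arithmetic, and its inverse on reading, can be sketched as follows for floating-point data. This is an illustration only, not sds_scale_data(); it assumes NaN is the sole invalid value, as required for floating-point types, and the function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical pre-scaling in place, per the FITS rule
   physical = bzero + bscale * array, i.e. array = (physical - bzero) / bscale.
   Invalid (NaN) values are left untouched. */
void prescale_sketch (double *data, size_t n, double bscale, double bzero)
{
  for (size_t i = 0; i < n; i++)
    if (!isnan (data[i]))
      data[i] = (data[i] - bzero) / bscale;
}

/* Inverse: recover the physical value from a stored array value. */
double physical_value (double array, double bscale, double bzero)
{
  return bzero + bscale * array;
}
```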
The function sds_scale_data() is used to actually re-scale the
data to change their range. When it is only desired to set the scaling
parameters for output without affecting the data, then the function
sds_set_scaling() should be used. This function only modifies the
values of sds.scale_on_write, sds.bscale, and sds.bzero
without changing the data values. For example, if the data are known to
contain a positive bias of +2.512 from the physical value, setting
sds.bzero to -2.512 and sds.scale_on_write to a non-zero
value ensures that when the data are written to a FITS file
they will be interpreted correctly on subsequent reads.
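As a worked instance of the FITS rule for this bias example, with BSCALE left at its default of 1.0 and an illustrative stored array value of 12.512:

```latex
\text{Physical} = \mathrm{BZERO} + \mathrm{BSCALE}\times\text{Array}
               = -2.512 + 1.0 \times 12.512 = 10.0
```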
Any conversions which take place in conjunction with writing FITS
files must be private to the functions implementing the write; the sds
should be returned unchanged.
Reading FITS Files
Consistent with the above guidelines, the following rules are implemented
for reading FITS files into sds data structures (``scaling''
means that either BSCALE or BZERO, or their appropriate
equivalents, is present in the FITS header):
- BITPIX = 8:
- Scaling:
- sds_datatype = FLOAT; sds.scale_on_write = 8
- No Scaling:
- sds_datatype = UBYTE; sds.scale_on_write = 0
- BITPIX = 16:
- Scaling:
- sds_datatype = FLOAT; sds.scale_on_write = 16
- No Scaling:
- sds_datatype = SHORT; sds.scale_on_write = 0
- BITPIX = 32:
- Scaling:
- sds_datatype = DOUBLE; sds.scale_on_write = 32
- No Scaling:
- sds_datatype = INT; sds.scale_on_write = 0
- BITPIX = 64:
- Scaling*:
- sds_datatype = DOUBLE; sds.scale_on_write = 64
- No Scaling:
- sds_datatype = LONG; sds.scale_on_write = 0
- BITPIX = -32:
- sds_datatype = FLOAT; sds.scale_on_write = 0
- BITPIX = -64:
- sds_datatype = DOUBLE; sds.scale_on_write = 0
*This case should be avoided.
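The default (SDS_ANY) reader mapping tabulated above can be expressed as a single switch. The enumeration below is a hypothetical stand-in for the sds type codes, whose actual names and values in the library may differ.

```c
/* Hypothetical stand-in for the sds data type codes. */
typedef enum { T_UBYTE, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE,
               T_INVALID } sds_type_sketch;

/* Map BITPIX and the presence of BSCALE/BZERO ("scaled") to the sds data
   type, setting *scale_on_write as in the reading table. */
sds_type_sketch fits_read_type (int bitpix, int scaled, int *scale_on_write)
{
  *scale_on_write = 0;
  switch (bitpix) {
    case 8:
      if (scaled) { *scale_on_write = 8;  return T_FLOAT; }
      return T_UBYTE;
    case 16:
      if (scaled) { *scale_on_write = 16; return T_FLOAT; }
      return T_SHORT;
    case 32:
      if (scaled) { *scale_on_write = 32; return T_DOUBLE; }
      return T_INT;
    case 64:                          /* scaled case should be avoided */
      if (scaled) { *scale_on_write = 64; return T_DOUBLE; }
      return T_LONG;
    case -32: return T_FLOAT;         /* floating point: never scaled */
    case -64: return T_DOUBLE;
    default:  return T_INVALID;       /* not a recognized BITPIX */
  }
}
```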
Whenever sds.scale_on_write is non-zero, the sds.bscale and
sds.bzero members will be set to the corresponding values of the
appropriate FITS keywords if present, or to their defaults otherwise;
if sds.scale_on_write is zero they will be unset. The FITS
reader sets the sds.scale_on_write and other parameters via a call
to the sds_set_scaling() function; data scaling is performed directly
on input as indicated, according to the FITS rule, with (unscaled)
fill values being converted to NaN's.
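For BITPIX = 16 with scaling, the input step might look like the sketch below. It assumes the raw values have already been converted to native int16_t and that has_blank is zero when the header carries no BLANK keyword; the function name is hypothetical.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scaled read for BITPIX = 16: stored values equal to BLANK
   become NaN; all others are converted by the FITS rule
   physical = bzero + bscale * array. */
void read_scaled_short_sketch (const int16_t *in, float *out, size_t n,
                               double bscale, double bzero,
                               int has_blank, int16_t blank)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (has_blank && in[i] == blank)
               ? NAN
               : (float)(bzero + bscale * in[i]);
}
```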
The values of BITPIX, together with BSCALE and/or BZERO
if present, are used to set the sds members sds.datatype,
sds.scale_on_write, sds.bscale, and sds.bzero
according to the rules above. sds.rank is set to the value of
NAXIS and the vector sds.length to the values of NAXIS1 through NAXISn.
The value of BLANK is either propagated into sds.fillvalue
or used to filter values prior to floating-point conversion with scaling.
All comments associated with the records for any of these keywords are
ignored. Comments on other records are propagated in the comment member
of the corresponding attribute.
Strings in all COMMENT and HISTORY records should be
concatenated into appropriate sds members; however, they are
currently treated like all other keywords and stored as entries in
the sds.attributes member, and there is no checking for multiple
occurrences of other keywords.
The rules outlined here describe the default behaviour of the function
implementing the sds FITS reader. In normal use it is often desired
to specify the datatype of the sds into which the data will be placed.
This is achieved by an internal data type conversion as necessary. If the
requested data type is SDS_ANY, then the returned data type will
obey the above rules. If the requested data type is SDS_RAW, then
the data type will be converted to that appropriate to the value of
BITPIX if scaling were not in effect, while actually setting the
appropriate scaling parameters in the sds. In order to tightly
bind the scaling to the read function, it is recommended that this be
achieved by a secondary conversion. This would cause erroneous values
to be propagated only in cases in which scaling is in use, the BLANK
value is something other than its default value for the data type, and
some valid values attain the default value for fill. For example, a
FITS file with BITPIX = 16, BSCALE = 1.0, and
BLANK = 32767 when read as SDS_RAW would wind up having
all values of -32768 marked as missing in addition to those entries
originally marked as missing. This would of course also happen if the
data were explicitly converted to SDS_SHORT on input or any time
thereafter. On the other hand, if the same FITS file were read
in as SDS_ANY and subsequently written according to the rules below,
the two FITS files should be bit-for-bit identical (with the possible
exception of write time stamps in the headers) despite the internal
conversions.
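The SDS_RAW example can be reproduced numerically. In this sketch (hypothetical names; BITPIX = 16, BSCALE = 1.0, BZERO = 0 and BLANK = 32767 as in the text), each value is first read with scaling, so BLANK becomes NaN, and then converted back to SHORT with no fill value set, so that NaN and any value outside the valid range (which excludes the default fill -32768) become -32768.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

#define SHORT_FILL (-32768)   /* default SDS_SHORT fill value */

/* Secondary FLOAT -> SHORT conversion with no fill value set. */
int16_t float_to_short_sketch (float x)
{
  if (isnan (x) || x <= -32767.5f || x >= 32767.5f)
    return SHORT_FILL;
  return (int16_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);   /* round to nearest */
}

/* Count entries marked missing after the two-step SDS_RAW route. */
size_t raw_missing_count (const int16_t *in, size_t n)
{
  size_t missing = 0;
  for (size_t i = 0; i < n; i++) {
    float f = (in[i] == 32767) ? NAN : (float)in[i];    /* scaled read */
    if (float_to_short_sketch (f) == SHORT_FILL)         /* back to SHORT */
      missing++;
  }
  return missing;
}
```

Of the four stored values 32767, -32768, 0 and 100, two end up marked missing: the original BLANK and the valid -32768.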
Writing FITS Files
The following rules are implemented for writing FITS files from
sds data structures:
- sds_datatype = BYTE:
- BITPIX = 16; BSCALE and BZERO unset
- sds_datatype = UBYTE:
- BITPIX = 8; BSCALE and BZERO unset
- sds_datatype = SHORT:
- BITPIX = 16; BSCALE and BZERO unset
- sds_datatype = USHORT:
- BITPIX = 32; BSCALE and BZERO unset
- sds_datatype = INT:
- BITPIX = 32; BSCALE and BZERO unset
- sds_datatype = UINT:
- BITPIX = -64; BSCALE and BZERO unset
- sds_datatype = LONG*:
- BITPIX = 64; BSCALE and BZERO unset
- sds_datatype = ULONG*:
- BITPIX = -64; BSCALE and BZERO unset
- sds_datatype = FLOAT:
- sds.scale_on_write = 0:
- BITPIX = -32; BSCALE and BZERO unset
- sds.scale_on_write = 8:
- BITPIX = 8; BSCALE and BZERO set
- sds.scale_on_write = 16:
- BITPIX = 16; BSCALE and BZERO set
- sds.scale_on_write = 32*:
- BITPIX = 32; BSCALE and BZERO set
- sds_datatype = DOUBLE:
- sds.scale_on_write = 0:
- BITPIX = -64; BSCALE and BZERO unset
- sds.scale_on_write = 8:
- BITPIX = 8; BSCALE and BZERO set
- sds.scale_on_write = 16:
- BITPIX = 16; BSCALE and BZERO set
- sds.scale_on_write = 32:
- BITPIX = 32; BSCALE and BZERO set
- sds_datatype = TIME:
- BITPIX = -64; BSCALE and BZERO unset
- other:
- not supported by FITS I/O
*This case should be avoided.
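The write table reduces to the following mapping, sketched with a hypothetical enumeration of sds types. Returning 0 for unsupported types, and for scale_on_write values the table does not list, is an assumption of the sketch.

```c
/* Hypothetical stand-in for the sds data type codes. */
typedef enum { O_BYTE, O_UBYTE, O_SHORT, O_USHORT, O_INT, O_UINT,
               O_LONG, O_ULONG, O_FLOAT, O_DOUBLE, O_TIME,
               O_OTHER } sds_out_type;

/* BITPIX to write for an sds type and scale_on_write; 0 if unsupported. */
int fits_write_bitpix (sds_out_type t, int scale_on_write)
{
  switch (t) {
    case O_UBYTE:  return 8;
    case O_BYTE:
    case O_SHORT:  return 16;
    case O_USHORT:
    case O_INT:    return 32;
    case O_LONG:   return 64;         /* non-standard extension; avoid */
    case O_UINT:
    case O_ULONG:                     /* ULONG should be avoided */
    case O_TIME:   return -64;
    case O_FLOAT:
    case O_DOUBLE:
      if (scale_on_write == 0)
        return (t == O_FLOAT) ? -32 : -64;
      if (scale_on_write == 8 || scale_on_write == 16 || scale_on_write == 32)
        return scale_on_write;        /* scaled to fixed point on write */
      return 0;                       /* combination not in the table */
    default:       return 0;          /* not supported by FITS I/O */
  }
}
```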
Required data conversions from floating-point to fixed-point type are
accomplished by the function sds_data_convert() immediately prior
to the write. Previous setting of the sds.scale_on_write and
other parameters is sufficient to trigger this conversion on write.
The appropriate FITS keywords are written into the FITS
header, as outlined above. If sds.scale_on_write = 0 or the
output (FITS) data type is floating-point, neither a BSCALE nor a BZERO
record is created. No BLANK record is created if the output data
type is floating-point, in accord with the FITS standard,
nor is one created if sds.fillvalue is NULL. A comment providing
the current compiled SOI software version number is included in the
SIMPLE record.
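The record-emission rules can be sketched as below, reading "floating-point" as the FITS output type (negative BITPIX), which is what the write table requires for scaled FLOAT and DOUBLE data. The struct and function names are hypothetical, and has_fillvalue is assumed to describe the data after any conversion for output.

```c
/* Hypothetical summary of which optional records a writer emits. */
typedef struct {
  int bscale_bzero;   /* write BSCALE and BZERO records */
  int blank;          /* write a BLANK record */
} fits_records_sketch;

fits_records_sketch fits_records_to_write (int bitpix, int scale_on_write,
                                           int has_fillvalue)
{
  fits_records_sketch r;
  int integer_output = bitpix > 0;    /* negative BITPIX = floating point */
  r.bscale_bzero = integer_output && scale_on_write != 0;
  r.blank = integer_output && has_fillvalue;
  return r;
}
```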
Appendix I: Special FITS Keywords
The following FITS keywords are reserved for special use by the
FITS standard and do not correspond to any sds attributes,
although some do correspond to other elements of a sds:
- SIMPLE
- BITPIX
- NAXIS
- NAXISn
- BSCALE
- BZERO
- BLANK
- COMMENT
- HISTORY
- END
Appendix II: Functions for FITS I/O
See the man pages for individual descriptions.
- sds_read_fits()
- read an entire FITS file into a sds
- sds_get_one_fits()
- read an entire FITS file into a sds
- sds_get_fits_header()
- read a FITS header into a sds
- sds_read_FITS_header()
- read a FITS header into a sds
- sds_get_fits_data()
- read the data section of a FITS file into a sds
- sds_write_fits()
- write a sds into a FITS file pointer; as of release
2.8, this function conforms with the specifications herein.
- sds_put_fits() [Does not yet exist]
- write a sds into the named FITS file; conforms with the
specifications herein.
- sds_put_one_fits()
- write a sds into the named FITS file; does not conform
with these specifications; to be retained for compatibility only and not
as a supported library function.
- sds_put_fits_header()
- write a FITS file header from an sds; does not conform
with these specifications; to be removed as supported library function as
of release 2.8 (with functionality included privately in module
track_region).
- sds_put_fits_data()
- write a FITS file data section from an sds; does not
conform with these specifications; to be removed as supported library
function as of release 2.8.
Migration Notes
Currently (as of release 2.8) it is easy to verify (through references to
the special strings "SIMPLE" and "BITPIX") that all I/O associated with
FITS headers in controlled software occurs exclusively through the
functions libsds.d/sds_rfits.c/sds_get_fits_header() and
libsds.d/sds_wfits.c/sds_put_fits_header(). The corresponding
functions sds_get_fits_data() and sds_put_fits_data()
can be and sometimes are bypassed, but this is rare; as far as I know,
only module track_region writes directly into a FITS file.
sds_get_fits_header() is called in:
- libsds.d/sds_rfits.c/sds_get_one_fits()*
- libsds.d/sds_rfits.c/sds_get_fits_data()
- libsds.d/sds_rfits.c/sds_read_FITS_header()
- libsds.d/sds_fits.c/sds_read_fits()*
- libsds.d/sds_slice.c/sds_slice_file()
- libids.d/ids_series.c/ids_get_series_wrapper()*
- libvds.d/vds_open.c/find_vdsinfo()
- modules/gongimport - does not read data section
- modules/spole_import - does not read data section
(functions marked with an asterisk also call sds_get_fits_data()).
The equivalent front-end function sds_read_FITS_header() is called in:
- libids.d/ids_series.c/ids_get_FITS_series_wrapper()*
- libvds.d/vds_open.c/vds_open()
- pipe/ovr2name - does not read data section
- pipe/pe - does not read data section
sds_put_fits_header() is called in:
- libsds.d/sds_wfits.c/sds_put_one_fits()*
- libsds.d/sds_fits.c/sds_write_fits()*
- modules/track_region
The functions marked with an asterisk also call sds_put_fits_data().
sds_put_fits_data() is not called by any other functions or programs.
If the function sds_write_fits() were moved into the same file as
sds_put_one_fits(), sds_put_fits_header() and
sds_put_fits_data() could be made private and full control exercised
over scaling and conversion for virtually every program that writes FITS
files, since all except track_region do so via direct or indirect
calls to sds_write_fits() or sds_put_one_fits().
sds_write_fits() is referenced only by
libsds.d/sds_out.c/sds_write(), which in turn is used only in an
example module (dtr). It would thus be safe to change
sds_write_fits() to conform to this standard, but calls to
sds_put_one_fits() will eventually need to be replaced by calls to
a conforming FITS output function.
Appendix III: Functions for sds data conversion and scaling
See the man pages for individual descriptions.
- sds_data_convert()
- convert the data in sds->data to a new type according to the
specifications described herein.
- sds_scale_data()
- apply a linear scaling to the data values in sds->data.
- sds_set_scaling()
- set the desired output scaling parameters sds->scale_on_write,
sds->bscale, and sds->bzero.
- sds_set_fillval()
- set the bit pattern to be used to represent missing or invalid data
- sds_set_fillvalue()
- superseded by sds_set_fillval(); for compatibility only
- sds_set_datatype()
- set the data type of the sds