sds/FITS Interface Description

SOI TN 97-137
R S Bogart
1998.01.14

Introduction

This document specifies the interface between sds, the internal structure for representation and manipulation of data sets commonly used by analysis procedures in the SOI programming environment, and FITS, the primary format in which data are maintained for storage and distribution. The document describes the goals, not the reality; not all the features described have been implemented or implemented correctly. Elements of the interface yet to be successfully implemented are noted.

sds

sds is a data structure strictly internal to those programs making use of it for data manipulation. It is intended to provide, inter alia, a mechanism for handling data independent of their external representation by isolating input and output functions specific to those representations. sds can be used to easily translate data formats from one representation or organizational structure to another.

The design of the sds structure was strongly guided by the FITS format described below. Nevertheless, the sds structure is independent of, and more extensive than, the simple representation of a FITS file, so the interface is not trivial.

Data Representations Supported by sds

Although designed to be representation-independent, sds presently has only one well-supported representation: FITS. There is very limited support in the form of a few functions for I/O using GIF, rdb files (ASCII tables), binary files, special formats representing MDI telemetry streams, and a special format designed to represent the full sds structure. Other representations contemplated but not yet implemented are CDF and TIFF. In any case, FITS is and will remain for the foreseeable future the principal format for data storage and distribution at the SSSC. Careful definition of, and adherence to, the I/O interface for FITS is thus essential.

FITS

FITS, the Flexible Image Transport System, is a standard for data representation and transport in widespread use by the astronomical and particularly the Solar Physics research community. Information about FITS may be found at NRAO (http://fits.cv.nrao.edu) and at the FITS support office at NASA-Goddard (http://fits.gsfc.nasa.gov; earlier version at http://ssdoo.gsfc.nasa.gov/astro/fits/fits_home.html).

sds Data Types and Type Conversions

sds Data Types

The following sds data types are defined (C-language type correspondences and default fill values are given when appropriate):

SDS_VOID
SDS_BYTE - signed char : -128
SDS_UBYTE - unsigned char : 255
SDS_SHORT - signed short : -32768
SDS_USHORT - unsigned short : 65535
SDS_INT - signed int : -2147483648
SDS_UINT - unsigned int : 4294967295
SDS_LONG - signed long : -9223372036854775808
SDS_ULONG - unsigned long : 18446744073709551615
SDS_FLOAT - float : fNaN
SDS_DOUBLE - double : dNaN
SDS_COMPLEX
SDS_STRING - char *
SDS_TIME - double : 0.0
SDS_LOGICAL
SDS_ANY
SDS_RAW [NEW]

Most of the types are clearly intended to correspond to C-language types for internal representation, and the rules for data types, sizes, operations, and conversions appropriate to the C-language implementation in use govern the data elements of the array sds.data. For example, if (sds_datatype (SDS *sds) == SDS_LONG), then sds_numbytes (SDS *sds) will be numerically equal to sizeof (long) on the machine architecture in use. Not all of these types actually correspond to data storage. Types SDS_VOID, SDS_ANY, and SDS_RAW are for use as arguments or status: for example, the data function sds_read() has an argument specifying the type of conversion to be performed implicitly on input; SDS_ANY and SDS_RAW are used to specify retention of the native type in senses described below. Similarly, newly created sds structs may have their datatype set to SDS_VOID. SDS_TIME data are equivalent to SDS_DOUBLE data, but the type value is to be used to signal different behaviour by routines that convert numeric data from or into character strings. SDS_STRING, SDS_COMPLEX, and SDS_LOGICAL types exist for possible future implementation, but are not now supported.

The sds struct includes a member (void *)fillvalue; if this pointer is non-null, its contents are interpreted as a numeric value of the same type as the data to be used to represent non-numeric (e.g. missing) data. Default fill values exist for the major data types. For floating-point types, the default fill values are fNaN = 0x7fff0000 and dNaN = 0x7fff000000000000, obtained by calls to A_Quiet_fNaN() and A_Quiet_dNaN(), respectively.
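
As an illustration of the default fill values for the fixed-point types, the following minimal sketch looks them up from the limits of the C implementation, on the assumption (suggested by the SHORT, USHORT, INT, and UINT entries in the list above) that the default is the most negative value for signed types and the largest value for unsigned types. The type codes and the function ex_default_fill() are hypothetical stand-ins for illustration and are not part of the sds library.

```c
#include <limits.h>

/* Hypothetical type codes for this sketch only; the real SDS_* constants
   are defined in the sds headers. */
typedef enum {
  EX_BYTE, EX_UBYTE, EX_SHORT, EX_USHORT, EX_INT, EX_UINT, EX_FLOAT
} ex_type;

/* Store the assumed default fill value for a fixed-point type in *fill and
   return 1; return 0 for types whose default fill is not an integer
   (floating-point types use NaN instead). */
int ex_default_fill (ex_type type, long long *fill) {
  switch (type) {
    case EX_BYTE:   *fill = SCHAR_MIN; return 1;
    case EX_UBYTE:  *fill = UCHAR_MAX; return 1;
    case EX_SHORT:  *fill = SHRT_MIN;  return 1;
    case EX_USHORT: *fill = USHRT_MAX; return 1;
    case EX_INT:    *fill = INT_MIN;   return 1;
    case EX_UINT:   *fill = UINT_MAX;  return 1;
    default:        return 0;
  }
}
```

Taking the values from limits.h rather than from literals keeps the sketch consistent with the statement that the C implementation in use governs sizes and ranges.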

Internal Type Conversion Rules

Conversions of data type within a program are implemented by the function sds_data_convert(); use of the original function for this purpose, sds_convert(), is discouraged, and at some point is to be at least temporarily disabled for purposes of migration. sds_data_convert() converts data according to the following rules:

If the fillvalue is set and equal to the default fill value for the old data type, the new fillvalue will be set to the default fill value for the new data type, and all data "values" equal to the old fill value will be set to the new fill value.
If the fillvalue is set to a different value from the default fill value for the old data type, the new fillvalue will be set to the old fillvalue provided it is in the representable range of the new data type; otherwise the new fillvalue will be set to the default fill value for the new data type; in either case all data "values" equal to the old fill value will be set to the new fill value. In no case will the fillvalue for a floating point type be set to a numeric value, however.
If no fillvalue is set and any data values are outside the range representable by the new data type, the new fillvalue is set to the default fill value for the new data type; NaN's are by definition outside the range representable by any data type.
All values outside the range representable by the new data type will be set to the new fillvalue.
All remaining values will be converted to the values nearest them in the new data type: fixed-point data are converted to floating-point data by the assignment conversions of the C language; floating-point data are converted to fixed-point data using a nearest integer function. The direction of rounding of half-integer values is unspecified: the behaviour is implementation-dependent.
If the datatype is floating point, sds->fillvalue is always assumed to be NULL. Only NaN's may be used to represent invalid floating-point data. Single precision floating-point values of Infinity are converted to double-precision Infinity. Double precision floating-point values of Infinity are considered out of range when converting to single-precision and are thus converted to NaN.

Note that these conversion rules involve no rescaling or biasing of the original data values. In certain conversions it may be necessary to first pre-scale or bias the data to avoid loss of precision: examples are conversion of INTs to FLOATs when the values exceed 2**24, and conversions of floating-point to fixed-point when the data range is less than or of the order of unity. In these cases it is the responsibility of the user to perform appropriate re-scaling prior to data conversion.
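
The 2**24 limit follows from the 24-bit significand of an IEEE single-precision number. A minimal sketch of a check for whether a given integer survives conversion to float unchanged is given below; the function name ex_int_survives_float() is hypothetical and is not part of the sds library.

```c
#include <stdint.h>

/* Return 1 if the 32-bit integer v is exactly representable as a float,
   0 if conversion to float would alter its value. The comparison is made
   in double precision, which represents every 32-bit integer exactly. */
int ex_int_survives_float (int32_t v) {
  volatile float f = (float)v;   /* volatile forces storage as a true float */
  return (double)f == (double)v;
}
```

For instance, 16777216 (2**24) survives the conversion, while 16777217 does not. A user who needs such values in single precision would first remove a bias or rescale so that the remaining range fits in 24 bits.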

A few examples of code fragments should suffice to illustrate the implementation of the above rules.

conversion of SHORT to FLOAT:

unsigned int n = sds_data_length (sds);
float new_fillval = A_Quiet_fNaN ();
signed short *old = (signed short *)sds_data (sds);
float *new = (float *)malloc (n * sizeof (float));

if (sds->fillvalue) {
  signed short old_fillval = *(short *)sds->fillvalue;
  while (n--) {
    if (*old == old_fillval)
      *new++ = new_fillval;
    else
      *new++ = *old;
    old++;
  }
} else {
  while (n--)
    *new++ = *old++;
}
sds->fillvalue = NULL;

conversion of FLOAT to USHORT:

unsigned int n = sds_data_length (sds);
unsigned short new_fillval = 65535;
unsigned int out_of_range = 0;
float *old = (float *)sds_data (sds);
unsigned short *new = (unsigned short *)malloc (n * sizeof (short));

while (n--) {
  if (isnan (*old)) {
    *new++ = new_fillval;
    out_of_range++;
  } else if ((*old < -0.5) || (*old >= 65535.5)) {
    *new++ = new_fillval;
    out_of_range++;
  } else
    *new++ = *old + 0.5;
  old++;
}
sds->fillvalue = (out_of_range) ? &new_fillval: NULL;

conversion of SHORT to BYTE:

unsigned int n = sds_data_length (sds);
signed char new_fillval = -128;
unsigned int out_of_range = 0;
signed short *old = (signed short *)sds_data (sds);
signed char *new = (signed char *)malloc (n * sizeof (char));

if (sds->fillvalue) {
  out_of_range = 1;
  signed short old_fillval = *(short *)sds->fillvalue;
  if (abs (old_fillval) < 128)
    new_fillval = old_fillval;
  while (n--) {
    if ((*old == old_fillval) || (abs (*old) > 127))
      *new++ = new_fillval;
    else
      *new++ = *old;
    old++;
  }
} else {
  while (n--) {
    if (abs (*old) > 127) {
      *new++ = new_fillval;
      out_of_range++;
    } else
      *new++ = *old;
    old++;
  }
}
sds->fillvalue = (out_of_range) ? &new_fillval: NULL;
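
The rule for Infinity in double- to single-precision conversion can be sketched in the same spirit. The following is an illustration under stated assumptions, not the library implementation: the function ex_double_to_float() is hypothetical, it takes plain arrays rather than an sds, and it treats any magnitude above FLT_MAX (including Infinity) as out of range.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Convert n doubles in src to floats in dst. NaN's stay NaN's; values whose
   magnitude exceeds FLT_MAX, including +/-Infinity, are out of range and
   become NaN. Returns the number of out-of-range values encountered. */
size_t ex_double_to_float (const double *src, float *dst, size_t n) {
  size_t out_of_range = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    if (isnan (src[i])) {
      dst[i] = NAN;                 /* already invalid */
    } else if (fabs (src[i]) > FLT_MAX) {
      dst[i] = NAN;                 /* Infinity or too large for a float */
      out_of_range++;
    } else {
      dst[i] = (float)src[i];       /* ordinary C assignment conversion */
    }
  }
  return out_of_range;
}
```

No fill value is involved, since only NaN's may represent invalid floating-point data.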

FITS Representations

The FITS standard recognizes the following machine-independent numeric data types, with reference to the ANSI/IEEE-754 standard for numeric data representation:

8-bit unsigned fixed-point (BITPIX = 8)
16-bit twos-complement fixed-point (BITPIX = 16)
32-bit twos-complement fixed-point (BITPIX = 32)
32-bit IEEE floating-point (BITPIX = -32)
64-bit IEEE double-precision floating-point (BITPIX = -64)

Which of these data types applies in a particular FITS file is governed exclusively by the value of the reserved keyword BITPIX. All bit combinations in data are supported, including all IEEE special values of NaN, Infinity, etc. For fixed-point data types only, one particular bit-pattern corresponding to the numeric value specified by the keyword BLANK may be reserved for non-numeric values equivalent to NaN to specify missing or suppressed data.

The FITS reserved keywords BSCALE and BZERO may be used to specify scaling of the data to physical (``brightness'') units when data compression is required or for other purposes. The FITS standard explicitly discourages their use with floating-point data types, for which they are redundant. Likewise reserved keyword pairs PSCALn and PZEROn and TSCALn and TZEROn may be used similarly in data for Random Groups and ASCII Tables, respectively.

The FITS standard formally recognizes two data types corresponding to a BITPIX value of 8: Character and Unsigned eight-bit integers. The description of Character data requires that the high-order bit be 0; however, no mechanism is provided for distinguishing between the two data types.

FITS Input & Output

Attributes and Headers

With the exception of certain reserved FITS keywords, there is a general correspondence between sds attributes and FITS header records. In both cases only upper-case alphabetic characters are allowed, but there is no restriction on the use of other characters nor on the length of sds attribute key-names. Key-names longer than eight characters that are identical up to the first eight characters are unique in the sds, but result on output in multiple FITS header records with the same keyword. Such records cannot be distinguished on input.
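
The collision between long key-names can be illustrated with a minimal sketch, assuming that a key-name is simply truncated to its first eight characters to form the FITS keyword. The helper names ex_fits_keyword() and ex_keys_collide() are hypothetical and are not part of the sds library.

```c
#include <string.h>

#define EX_FITS_KEYLEN 8   /* length of a FITS keyword field */

/* Form the FITS keyword for an sds key-name by keeping at most the first
   eight characters (assumed behaviour). keyword must hold 9 bytes. */
void ex_fits_keyword (const char *key, char keyword[EX_FITS_KEYLEN + 1]) {
  strncpy (keyword, key, EX_FITS_KEYLEN);
  keyword[EX_FITS_KEYLEN] = '\0';
}

/* Return 1 if two distinct key-names map to the same FITS keyword. */
int ex_keys_collide (const char *a, const char *b) {
  return strcmp (a, b) != 0 && strncmp (a, b, EX_FITS_KEYLEN) == 0;
}
```

Under this assumption the key-names OBSERVATION_START and OBSERVATION_END would both be written as OBSERVAT, and a reader of the file could not tell them apart.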

Data

In converting between FITS data representations and internal storage formats it is assumed that (a) the machine supports fixed-point data types of length 8, 16, and 32 bits; and (b) the machine supports two distinct levels of precision in floating-point numbers. If the machine does not use IEEE representation of floating-point numbers, then it will not necessarily be possible to preserve data precision when reading from and writing to FITS files; but the sds library has not been implemented on any non-IEEE architectures, so the ramifications of this problem have not been addressed.

The general rules for correspondence between FITS data-representation types and sds data-storage types on input/output are as follows in the absence of data scaling:

BITPIX = 8 <=> sds_datatype = UBYTE
BITPIX = 16 <=> sds_datatype = SHORT
BITPIX = 32 <=> sds_datatype = INT
BITPIX = -32 <=> sds_datatype = FLOAT
BITPIX = -64 <=> sds_datatype = DOUBLE

(This assumes that sizeof(int) = 2*sizeof(short) = 4*sizeof(char). If the C-language implementation assigns different sizes then the above equivalences are to be correspondingly modified.) If the data are scaled, then:

BITPIX = 8 => sds_datatype = FLOAT
BITPIX = 16 => sds_datatype = FLOAT
BITPIX = 32 => sds_datatype = DOUBLE

Since the number of recognized sds data types is larger than the number of supported FITS data types, the following conversions are to be made on output only:

sds_datatype = BYTE => BITPIX = 16
sds_datatype = USHORT => BITPIX = 32
sds_datatype = UINT => BITPIX = -64
sds_datatype = ULONG => BITPIX = -64
sds_datatype = TIME => BITPIX = -64

The following extension is supported, but not encouraged until recognized by the FITS standard, provided that sizeof(long) = 2*sizeof(int):

sds_datatype = LONG <=> BITPIX = 64

No other sds datatypes are supported by FITS I/O. Their use must remain internal to the programs using them or be confined to other external representations.

The sds structure contains an integer member scale_on_write which is used to preserve or modify information about scaling of floating-point to fixed-point values on output, and double-precision members bscale and bzero used in conjunction with sds.scale_on_write. The value of sds.scale_on_write is numerically equal to the value of BITPIX in the output FITS file; a zero-value of sds.scale_on_write implies that the data are not to be scaled from floating-point to fixed point. The values of sds.bscale and sds.bzero correspond to the values of BSCALE and BZERO (or their equivalents) in the output FITS file. Default values of sds.bscale and sds.bzero are 1.0 and 0.0, respectively.

Pre-scaling of data to accommodate the range and precision of the output data type is accomplished by the function sds_scale_data(). This function sets the desired values of sds.scale_on_write, sds.bscale, and sds.bzero, and adjusts the data (and the fill value as necessary) accordingly. The rule for interpretation of the scaling parameters in a FITS file is:

        Physical Value = BZERO + BSCALE * Array Value

Hence, pre-scaling adjusts the values by first subtracting the desired sds.bzero from the original valid value and then dividing by the desired sds.bscale. Invalid values are unchanged. In the case of pre-scaling floats this is not an issue, as the invalid value must be a NaN. When pre-scaling fixed-point numbers, for conversion between signed and unsigned types, for example, or when the values are too large to retain proper precision when converting to floating-point, due caution must be exercised that (a) values equal to the fill value are not rescaled, and (b) the values resulting from re-scaling do not convert valid data to invalid data. Pre-scaling does not affect the fill value.
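
The arithmetic of pre-scaling floating-point data, and its inverse on reading, can be sketched as follows. This is an illustration on a plain array of doubles, not the implementation of sds_scale_data(); the function names ex_prescale() and ex_physical() are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Pre-scale n values in place: subtract bzero, then divide by bscale.
   NaN's, which mark invalid data, are left unchanged. */
void ex_prescale (double *data, size_t n, double bscale, double bzero) {
  size_t i;

  for (i = 0; i < n; i++) {
    if (!isnan (data[i]))
      data[i] = (data[i] - bzero) / bscale;
  }
}

/* Recover the physical value from a stored array value (FITS rule). */
double ex_physical (double array_value, double bscale, double bzero) {
  return bzero + bscale * array_value;
}
```

With bscale = 2.0 and bzero = 10.0, a physical value of 30.0 is stored as 10.0, and applying the FITS rule to 10.0 returns 30.0.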

The function sds_scale_data() is used to actually re-scale the data to change their range. When it is only desired to set the scaling parameters for output without affecting the data, then the function sds_set_scaling() should be used. This function only modifies the values of sds.scale_on_write, sds.bscale, and sds.bzero without changing the data values. For example, if the data are known to contain a positive bias of +2.512 from the physical value, setting sds.bzero to -2.512 and sds.scale_on_write to a non-zero value will assure that when the data are written to a FITS file they will be interpreted correctly on subsequent reads.

Any conversions which take place in conjunction with writing FITS files must be private to the functions implementing the write; the sds should be returned unchanged.

Reading FITS Files

Consistent with the above guidelines, the following rules are implemented for reading FITS files into sds data structures (``scaling'' means that either BSCALE or BZERO, or their appropriate equivalents, is present in the FITS header):

BITPIX = 8:
- Scaling:
  - sds_datatype = FLOAT; sds.scale_on_write = 8
- No Scaling:
  - sds_datatype = UBYTE; sds.scale_on_write = 0
BITPIX = 16:
- Scaling:
  - sds_datatype = FLOAT; sds.scale_on_write = 16
- No Scaling:
  - sds_datatype = SHORT; sds.scale_on_write = 0
BITPIX = 32:
- Scaling:
  - sds_datatype = DOUBLE; sds.scale_on_write = 32
- No Scaling:
  - sds_datatype = INT; sds.scale_on_write = 0
BITPIX = 64:
- Scaling*:
  - sds_datatype = DOUBLE; sds.scale_on_write = 64
- No Scaling:
  - sds_datatype = LONG; sds.scale_on_write = 0
BITPIX = -32:
- sds_datatype = FLOAT; sds.scale_on_write = 0
BITPIX = -64:
- sds_datatype = DOUBLE; sds.scale_on_write = 0

*This case should be avoided.

Whenever sds.scale_on_write is non-zero, the sds.bscale and sds.bzero members will be set to the corresponding values of the appropriate FITS keywords if present, or to their defaults otherwise; if sds.scale_on_write is zero they will be unset. The FITS reader sets the sds.scale_on_write and other parameters via a call to the sds_set_scaling() function; data scaling is performed directly on input as indicated, according to the FITS rule, with (unscaled) fill values being converted to NaN's.
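
The default reading rules listed above amount to a small lookup, sketched here for illustration. The type codes, the struct ex_read_result, and the function ex_read_mapping() are hypothetical; the actual reader sets these values through sds_set_scaling() and the sds members themselves.

```c
/* Hypothetical type codes for this sketch only. */
typedef enum {
  EX_VOID, EX_UBYTE, EX_SHORT, EX_INT, EX_LONG, EX_FLOAT, EX_DOUBLE
} ex_type;

typedef struct {
  ex_type datatype;      /* resulting sds data type; EX_VOID if unsupported */
  int scale_on_write;    /* value to be recorded in sds.scale_on_write */
} ex_read_result;

/* Map BITPIX and the presence of scaling keywords (BSCALE or BZERO) to the
   default sds data type and scale_on_write value on input. */
ex_read_result ex_read_mapping (int bitpix, int scaled) {
  ex_read_result r = { EX_VOID, 0 };

  switch (bitpix) {
    case 8:   r.datatype = scaled ? EX_FLOAT  : EX_UBYTE; break;
    case 16:  r.datatype = scaled ? EX_FLOAT  : EX_SHORT; break;
    case 32:  r.datatype = scaled ? EX_DOUBLE : EX_INT;   break;
    case 64:  r.datatype = scaled ? EX_DOUBLE : EX_LONG;  break;
    case -32: r.datatype = EX_FLOAT;  break;
    case -64: r.datatype = EX_DOUBLE; break;
    default:  return r;              /* not a supported BITPIX value */
  }
  /* Only scaled fixed-point input records its original BITPIX. */
  if (scaled && bitpix > 0)
    r.scale_on_write = bitpix;
  return r;
}
```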

The values of BITPIX, together with BSCALE and/or BZERO if present, are used to set the sds members sds.datatype, sds.scale_on_write, sds.bscale, and sds.bzero according to the rules above. sds.rank is set to the value of NAXIS and the vector sds.length to the set NAXISn. The value of BLANK is either propagated into sds.fillvalue or used to filter values prior to floating-point conversion with scaling. All comments associated with the records for any of these keywords are ignored. Comments on other records are propagated in the comment member of the corresponding attribute. Strings in all COMMENT and HISTORY records should be concatenated into appropriate sds members; however, they are currently treated like all other keywords, corresponding to the sds.attributes members, so there is no checking for multiple occurrences of other keywords.

The rules outlined here describe the default behaviour of the function implementing the sds FITS reader. In normal use it is often desired to specify the datatype of the sds into which the data will be placed. This is achieved by an internal data type conversion as necessary. If the requested data type is SDS_ANY, then the returned data type will obey the above rules. If the requested data type is SDS_RAW, then the data type will be converted to that appropriate to the value of BITPIX if scaling were not in effect, while actually setting the appropriate scaling parameters in the sds. In order to tightly bind the scaling to the read function, it is recommended that this be achieved by a secondary conversion. This would cause erroneous values to be propagated only in cases in which scaling is in use, the BLANK value is something other than its default value for the data type, and some valid values attain the default value for fill. For example, a FITS file with BITPIX = 16, BSCALE = 1.0, and BLANK = 32767 when read as SDS_RAW would wind up having all values of -32768 marked as missing in addition to those entries originally marked as missing. This would of course also happen if the data were explicitly converted to SDS_SHORT on input or any time thereafter. On the other hand, if the same FITS file were read in as SDS_ANY and subsequently written according to the rules below, the two FITS files should be bit-for-bit identical (with the possible exception of write time stamps in the headers) despite the internal conversions.

Writing FITS Files

The following rules are implemented for writing FITS files from sds data structures:

sds_datatype = BYTE:
- BITPIX = 16; BSCALE and BZERO unset
sds_datatype = UBYTE:
- BITPIX = 8; BSCALE and BZERO unset
sds_datatype = SHORT:
- BITPIX = 16; BSCALE and BZERO unset
sds_datatype = USHORT:
- BITPIX = 32; BSCALE and BZERO unset
sds_datatype = INT:
- BITPIX = 32; BSCALE and BZERO unset
sds_datatype = UINT:
- BITPIX = -64; BSCALE and BZERO unset
sds_datatype = LONG*:
- BITPIX = 64; BSCALE and BZERO unset
sds_datatype = ULONG*:
- BITPIX = -64; BSCALE and BZERO unset
sds_datatype = FLOAT:
- sds.scale_on_write = 0:
  - BITPIX = -32; BSCALE and BZERO unset
- sds.scale_on_write = 8:
  - BITPIX = 8; BSCALE and BZERO set
- sds.scale_on_write = 16:
  - BITPIX = 16; BSCALE and BZERO set
- sds.scale_on_write = 32*:
  - BITPIX = 32; BSCALE and BZERO set
sds_datatype = DOUBLE:
- sds.scale_on_write = 0:
  - BITPIX = -64; BSCALE and BZERO unset
- sds.scale_on_write = 8:
  - BITPIX = 8; BSCALE and BZERO set
- sds.scale_on_write = 16:
  - BITPIX = 16; BSCALE and BZERO set
- sds.scale_on_write = 32:
  - BITPIX = 32; BSCALE and BZERO set
sds_datatype = TIME:
- BITPIX = -64; BSCALE and BZERO unset
other:
- not allowed

*This case should be avoided.

Required data conversions from floating-point to fixed-point type are accomplished by the function sds_data_convert() immediately prior to the write. Previous setting of the sds.scale_on_write and other parameters is sufficient to trigger this conversion on write.

The appropriate FITS keywords are written into the FITS header, as outlined above. If sds.scale_on_write = 0 or the sds data type is fixed-point, no BSCALE nor BZERO record is created. No BLANK record is created if the sds data type is floating-point, in accord with the FITS standard, nor is one created if sds.fillvalue is NULL. A comment providing the current compiled SOI software version number is included in the SIMPLE record.
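
The table of writing rules above can likewise be summarized as a lookup. The sketch below is illustrative only: the type codes and the function ex_write_bitpix() are hypothetical, and the function merely reports which BITPIX would be written and whether BSCALE and BZERO records accompany it.

```c
/* Hypothetical type codes for this sketch only. */
typedef enum {
  EX_BYTE, EX_UBYTE, EX_SHORT, EX_USHORT, EX_INT, EX_UINT,
  EX_LONG, EX_ULONG, EX_FLOAT, EX_DOUBLE, EX_TIME, EX_OTHER
} ex_type;

/* Return the BITPIX to be written for an sds of the given type and
   scale_on_write value, or 0 if the combination is not allowed.
   *scaled is set to 1 when BSCALE and BZERO records are to be written. */
int ex_write_bitpix (ex_type type, int scale_on_write, int *scaled) {
  *scaled = 0;
  switch (type) {
    case EX_UBYTE:  return 8;
    case EX_BYTE:
    case EX_SHORT:  return 16;
    case EX_USHORT:
    case EX_INT:    return 32;
    case EX_LONG:   return 64;       /* extension; to be avoided */
    case EX_UINT:
    case EX_ULONG:
    case EX_TIME:   return -64;
    case EX_FLOAT:
    case EX_DOUBLE:
      if (scale_on_write == 0)
        return (type == EX_FLOAT) ? -32 : -64;
      if (scale_on_write == 8 || scale_on_write == 16 ||
          scale_on_write == 32) {
        *scaled = 1;                 /* fixed-point output with scaling */
        return scale_on_write;
      }
      return 0;
    default:
      return 0;                      /* no other types may be written */
  }
}
```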

Appendix I: Special FITS Keywords

The following FITS keywords are reserved for special use by the FITS standard and do not correspond to any sds attributes, although some do correspond to other elements of a sds:

SIMPLE
BITPIX
NAXIS
NAXISn
BSCALE
BZERO
BLANK
COMMENT
HISTORY
END

Appendix II: Functions for FITS I/O

See the man pages for individual descriptions.

sds_read_fits(): read an entire FITS file into a sds
sds_get_one_fits(): read an entire FITS file into a sds
sds_get_fits_header(): read a FITS header into a sds
sds_read_FITS_header(): read a FITS header into a sds
sds_get_fits_data()
sds_write_fits(): write a sds into a FITS file pointer; as of release 2.8, this function conforms with the specifications herein.
sds_put_fits() [Does not yet exist]: write a sds into the named FITS file; conforms with the specifications herein.
sds_put_one_fits(): write a sds into the named FITS file; does not conform with these specifications; to be retained for compatibility only and not as a supported library function.
sds_put_fits_header(): write a FITS file header from an sds; does not conform with these specifications; to be removed as supported library function as of release 2.8 (with functionality included privately in module track_region).
sds_put_fits_data(): write a FITS file data section from an sds; does not conform with these specifications; to be removed as supported library function as of release 2.8.

Migration Notes

Currently (as of release 2.8) it is easy to verify (through references to the special strings "SIMPLE" and "BITPIX") that all I/O associated with FITS headers in controlled software occurs exclusively through the functions libsds.d/sds_rfits.c/sds_get_fits_header() and libsds.d/sds_wfits.c/sds_put_fits_header(). The corresponding functions sds_get_fits_data() and sds_put_fits_data() can be and sometimes are bypassed, but this is rare; as far as I know, only module track_region writes directly into a FITS file.

sds_get_fits_header() is called in:

libsds.d/sds_rfits.c/sds_get_one_fits()*
libsds.d/sds_rfits.c/sds_get_fits_data()
libsds.d/sds_rfits.c/sds_read_FITS_header()
libsds.d/sds_fits.c/sds_read_fits()*
libsds.d/sds_slice.c/sds_slice_file()
libids.d/ids_series.c/ids_get_series_wrapper()*
libvds.d/vds_open.c/find_vdsinfo()
modules/gongimport - does not read data section
modules/spole_import - does not read data section

(functions marked with an asterisk also call sds_get_fits_data()). The equivalent front-end function sds_read_FITS_header() is called in:

libids.d/ids_series.c/ids_get_FITS_series_wrapper()*
libvds.d/vds_open.c/vds_open()
pipe/ovr2name - does not read data section
pipe/pe - does not read data section

sds_put_fits_header() is called in:

libsds.d/sds_wfits.c/sds_put_one_fits()*
libsds.d/sds_fits.c/sds_write_fits()*
modules/track_region

The functions marked with an asterisk also call sds_put_fits_data(). sds_put_fits_data() is not called by any other functions or programs. If the function sds_write_fits() were moved into the same file as sds_put_one_fits(), sds_put_fits_header() and sds_put_fits_data() could be made private and full control exercised over scaling and conversion for virtually every program that writes FITS files, since all except track_region do so via direct or indirect calls to sds_write_fits() or sds_put_one_fits(). sds_write_fits() is referenced only by libsds.d/sds_out.c/sds_write(), which in turn is used only in an example module (dtr). It would thus be safe to change sds_write_fits() to conform to this standard, but calls to sds_put_one_fits() need to be eventually replaced by calls to a replacement function for FITS output to conform.

Appendix III: Functions for sds data conversion and scaling

See the man pages for individual descriptions.

sds_data_convert(): convert the data in sds->data to a new type according to the specifications described herein.
sds_scale_data(): apply a linear scaling to the data values in sds->data.
sds_set_scaling(): set the desired output scaling parameters sds->scale_on_write, sds->bscale, and sds->bzero.
sds_set_fillval(): set the bit pattern to be used to represent missing or invalid data
sds_set_fillvalue(): superseded by sds_set_fillval(); for compatibility only
sds_set_datatype(): set the data type of the sds