SDS Numeric Data Types, Conversion, and Missing Representation

SOI TN 94-115
R S Bogart
1994.05.31

1. Among the various types used by SDS for internal data representation, the numeric types are designated as follows:

    BYTE     i1   fixed-point 8-bit integers in the range [-128,127]
    SHORT    i2   16-bit integers in the range [-32768,32767]
    INT      i4   32-bit integers in the range [-1073741824,1073741823]
    UBYTE    u1   8-bit integers in the range [0,255]
    USHORT   u2   16-bit integers in the range [0,65535]
    UINT     u4   32-bit integers in the range [0,2147483647]
    FLOAT    r4   floating-point 32-bit IEEE real numbers
    DOUBLE   r8   floating-point 64-bit IEEE real numbers
    COMPLEX  c8   ordered pairs of r4's representing complex numbers

In the floating-point formats, the following special IEEE representations for data exist in addition to the representations for numbers:

    NaN         Not a Number
    Infinity    Larger than any real number
    -Infinity   Smaller than any real number

(+/- Infinity may only exist for doubles.)

In principle the supported data types could be implemented on another architecture, but the rules governing intertype conversions are based on the internal representations described here. These rules must be obeyed in any implementation, even though they might not make sense in another architecture.

Conversions between complex numbers and the other numeric types do not in general make sense, so complex numbers will not be discussed further.

2. Associated with an SDS data set there are possibly two attributes: SCALE and OFFSET. If the SDS is multivariate, there may be unique pairs of these attributes for each variable.

The SCALE and OFFSET attributes may be set or not. If they are set, they are real (double-precision) numbers representing a linear mapping from integers to real numbers. If one is set and the other is unset, the default value for the unset attribute is used. The default values are 1.0 for SCALE and 0.0 for OFFSET.
Setting SCALE to 0.0, or setting either attribute to NaN or +/- Infinity, causes both to be unset. The values of SCALE and OFFSET can be set, unset, or modified by appropriate functions.

The SCALE and OFFSET attributes have different meanings depending on whether the associated data variable type is fixed-point or floating-point.

If the data are floating-point, then the SCALE and OFFSET attributes are used to set corresponding attributes in an external data format (such as the BSCALE and BZERO keys in FITS) so that the data will be written to the external format as scaled integers, provided that the format supports such a feature; this conversion is automatic. Likewise, on input from an external data format providing scaling for integers, conversion and scaling of the data is automatic. Scaled integers in the external data representation should be thought of as simply a form of media compression of floating-point numbers below the access level of the SDS I/O functions.

For fixed-point data, the SCALE and OFFSET attributes imply an automatic type conversion to a floating-point representation on output. They evidently cannot be set on input, so they must be set internally. The purpose in setting them is for internal type conversion, as there can be no advantage in externally representing integers as floating-point numbers.

Scaling parameters provided for floating-point data in an external data representation cannot be meaningful, as the floating-point number set with +/-Infinity is closed under multiplication and addition, so any floating-point number can be represented directly. (This is not quite true: scaling could be used to represent double-precision numbers, which have a larger range in the IEEE format, with single-precision numbers, but that case is highly unlikely to arise in real life.) Any scaling parameters appearing in such an external data representation should be ignored.
If the present rules are followed such a dataset cannot arise from manipulations with the SDS library.

The net effect of these rules on input from and output to external data formats can be summarized as follows:

    Extern Data Rep   Extern Scaling         Intern Data Rep   Intern Scaling
    integer           no               <->   integer           unset
    integer           yes              <->   real              set
    real              no               <->   real              unset
    real              no               <-    integer           set
  ( real              yes               ->   real              unset )

Note that the system could have been closed by having external scaled reals correspond to internal scaled integers, but such a correspondence makes no sense.

3. The SCALE and OFFSET attributes govern internal data type conversions between integer and real types according to the following rules:

If the scaling attributes are set, they are used to convert between reals and integers and are preserved. The two data representations are equivalent and a conversion of integers with set attributes to reals and then back to integers will reproduce the original data exactly, except for limits incurred by range and precision.

If the scaling attributes are not set, integers are converted to reals using the default scaling values, but the scaling attributes remain unset.

When reals with unset scaling attributes are converted to integers, the scale and offset will be automatically determined to optimally use the integers to describe the data. Thus, when the scaling attributes are unset, a conversion of integers to reals and back again to integers will not in general reproduce the same set of values. It is not possible to produce an integer dataset without scaling attributes by direct conversion of a real dataset.

    Datatype   Scaling          Datatype   Scaling
    integer    set        <->   real       set
    integer    unset       ->   real       unset (default scaling)
    integer    autoset    <-    real       unset

4. When casting on input or output takes place (datatype not declared SDS_ANY), the rule is that the implied I/O conversion takes place closer to the I/O than the internal conversion.
On input, for example, if a dataset containing integers with scaling in the external format is read in cast to an internal integer datatype, the data will first be read in as reals with advisory internal scaling set, and then the scaling will be used to convert internally to integers. Practically, this ensures preservation of the external representation's scaling. If integer data are cast to reals on output, the actual external data representation will be either reals without scale parameters or scaled integers, according to whether the scaling attributes are unset or set.

5. Conversions within type (signed integer, unsigned integer, or real) from a smaller representation to a larger representation (e.g. i1 to i2, u2 to u4, r4 to r8) involve no changes in either the data values or the scaling attributes. Conversions from integer types of 1- or 2-byte length to reals and from i4 or u4 to r8 also involve no changes in data values or the scaling attributes. Likewise, conversions of unsigned integers to signed integers of a larger size (u1 to i2 or i4, u2 to i4) do not affect values or scaling attributes. In all these cases the original set of possible values is a proper subset of the target set of possible values.

Conversions within type from larger to smaller representations may invoke automatic scaling, but this should be avoided if possible. If the data values are contained in the range of numbers represented in the target type, there are no changes in values or in scaling attributes. If the data values are not all contained in the target range but the range they span is smaller than the range of numbers represented in the target type, then the OFFSET attribute is suitably modified or set to center the representations of the data values in the target range as closely as possible, while the SCALE attribute is unchanged, or set to its default value if it had been unset.
If the range of data values exceeds the range of representable numbers in the target type, then both the scaling attributes are modified (or set) to approximately center the data values in the target numeric range and to have the data range cover approximately 3/4 of the target numeric range.

In deciding whether scaling attributes need to be modified or set, the largest and smallest values in signed integer types and the largest value in unsigned integer types are to be excluded. Thus, a set of i4 data values in the range [-32767,32766] could be converted to type i2 without setting or modifying the scaling attributes. A set of i4 data values in the range [-32767,32767] would require the SCALE attribute to be set (to about 0.75) for conversion to type i2.

Strictly speaking, these remarks apply to conversions from type r8 to r4 as well, but they would hardly ever apply, as the likelihood of the data range exceeding the range representable within type r4 is very small; the likelihood of the data lying outside the representable range while spanning a smaller range is smaller still.

Note that any conversion requiring the setting of scaling attributes is in fact a conversion to reals, since that is how the data will be treated thenceforward.

Conversions between signed integers and unsigned integers are governed by the same rules that apply to conversions within type. If the conversion can be effected without setting or changing the scaling attributes, that is done. If the conversion can be effected by only changing or setting the OFFSET attribute, that is done. Otherwise, both the scaling attributes are set to leave the data centred within a range 3/4 the range of representable values in the target type. For example, a set of i4 data values in the range [0,65535] would be converted to type u2 without affecting the scaling attributes. A set of i4 data values in the range [-32767,32767] could be converted to type u2 by setting the OFFSET attribute to 32767.0.
A set of u2 values in the range [0,65535] could be converted to i2 by setting the OFFSET attribute to -32768.0, but would force the SCALE attribute to be set to about 3/1024 on conversion to u1.

In conversions of data from types i4 or u4 to type r4, care must be exercised because the integer types represent numbers with 8 bits greater precision. Integers with absolute value exceeding 16777216 (2^24) cannot necessarily be exactly represented in the r4 representation. The scaling attributes cannot be used to shift the data range in this case because the scaling attributes imply that the floating-point data represent the actual data values. It would be useful to have the conversion routine set a warning value in soi_errno in cases in which precision may have been lost in the conversion. Note that this applies regardless of whether the scaling attributes are preset. Because the scaling attributes imply that the real numbers represent the data values, the real number values cannot be changed.

Conversions from reals to integers are governed by the scaling attributes if they are set. (If only one is set, the default value of the other is used.) If the scaling attributes have not been set, autoscaling should always be applied: the data values should be centred in the representable range and occupy 3/4 of it. (If there is only a single data value, the SCALE attribute, if unset, should be set to its default value of 1.0.)

6. In general, any data value may be "missing". It may be missing because it is unavailable (clouds on that day), because there can be no data for a given array element (e.g. corresponding to locations off the solar disk), because the analyst has decided the datum is clearly invalid (e.g. bad pixels), or for other reasons. SOI has adopted the philosophy that there will always be at least one value that must be tested for prior to using data in a computation if there is any conceivable possibility that some data might be missing.
For the floating types, the IEEE NaN will be used to represent such data. For data stored as integer types there is no simple solution since any chosen value must come from the space of otherwise allowable data values. Therefore, when doing calculations using integer typed data which could be missing, the particular "fill_value" applicable to that data must be used.

The SOI vds and sds function libraries check the data for fill_values on input and output. The following rules are applied:

    External form is an integer type:
        External fill_value is specified:
            Internal type is the same as the external type:
                NO checking for missing is done and the internal fill_value is set to be the same as the external fill_value.
            Internal type is different from external type:
                ALL data is checked before conversion. The internal fill_value is set according to the internal type as one of:
                    BYTE     -128
                    SHORT    -32768
                    INT      -1073741824
                    UBYTE    255
                    USHORT   65535
                    UINT     2147483647
                    FLOAT    AQuietfNaN()
                    DOUBLE   AQuietdNaN()
        External fill_value not specified:
            NO checking for missing is done. The internal fill_value must be somehow left undefined to avoid later use. Method TBD.

    External form is a floating type:
        External fill_value is specified:
            Internal type and fill_value are the same as the external type:
                NO checking is done.
            Internal type or fill_value is different from external type or fill:
                ALL data is checked before conversion. The internal fill_value is set according to the internal type as one of:
                    BYTE     -128
                    SHORT    -32768
                    INT      -1073741824
                    UBYTE    255
                    USHORT   65535
                    UINT     2147483647
                    FLOAT    AQuietfNaN()
                    DOUBLE   AQuietdNaN()
        External fill_value not specified:
            This case can not happen for external floating types since the external fill_value is defined for all our supported external protocols.

Versions of library functions located in /usr/local/src rather than in ~soi/CM/src use the historic value MISSING defined as -(8388608.0*1E10). The keyword "MISSING" is reserved for that value and is not to be used in SOI code.
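
The rule above for an integer external form whose internal type differs ("ALL data is checked before conversion") can be sketched in C as follows. This is only an illustration under stated assumptions, not the library's implementation: the function name short_to_double and its parameters are hypothetical, the standard NAN macro from <math.h> stands in for AQuietdNaN(), and the convention real = SCALE * stored + OFFSET is assumed from the description of SCALE and OFFSET as a linear mapping from integers to real numbers.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an SDS library routine): convert SHORT data to
   DOUBLE, testing every value against the external fill_value first. */
void short_to_double(const short *in, double *out, size_t n,
                     short ext_fill, double scale, double offset)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (in[i] == ext_fill)
            out[i] = NAN;   /* missing maps to the DOUBLE fill; no scaling */
        else
            out[i] = scale * (double)in[i] + offset;   /* assumed convention */
    }
}
```

With scale 1.0 and offset 0.0 (the defaults for unset attributes), ordinary values pass through unchanged and only the fill_value is replaced by NaN.
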
The external fill_values for floating values in external protocols are:

    FITS     NaN
    CDF      NaN
    WSO_DS   MISSING

SDS library routines (really macros) are provided to test individual values for missing. They are:

    int sds_smissing(SDS *sds, short s);
    int sds_usmissing(SDS *sds, unsigned short u);
    int sds_imissing(SDS *sds, int i);
    int sds_uimissing(SDS *sds, unsigned int u);
    int sds_fmissing(float f);
    int sds_dmissing(double d);

They are implemented as:

    #define sds_smissing(sds,x)  ((x) == *(short*)(sds)->fillvalue)
    #define sds_usmissing(sds,x) ((x) == *(unsigned short*)(sds)->fillvalue)
    #define sds_imissing(sds,x)  ((x) == *(int*)(sds)->fillvalue)
    #define sds_uimissing(sds,x) ((x) == *(unsigned int*)(sds)->fillvalue)
    #define sds_fmissing Is_fNaN
    #define sds_dmissing Is_dNaN

7. Along with Missing attributes, INFINITE attributes should be provided for integer representations of converted real data with values of Infinity and -Infinity. The same remarks apply as to conversion, namely that data values corresponding to an INFINITE attribute should be mapped through to the INFINITE attribute in the target representation without undergoing scaling.

When INFINITE attributes must be set, they should be set to the largest representable integer in signed integer formats and the 2nd largest representable integer in unsigned integer formats for +INFINITY (in order to distinguish from the Missing value). Negative INFINITY should be set to the 2nd smallest representable integer for signed integer formats, which does correspond to the negative of positive INFINITY. Negative INFINITY does not make sense in unsigned formats; conversion of -Infinity in a real format to unsigned integer format will result in replacement with the Missing attribute.
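
The reserved integer values described above can be illustrated for the 16-bit types. The following is a minimal sketch, assuming fixed-width types from <stdint.h>; the macro and function names are hypothetical and are not part of the SDS library. The Missing values are the SHORT and USHORT fill_values from section 6, and the INFINITE values follow the rules of this section.

```c
#include <math.h>
#include <stdint.h>

/* Reserved values for SHORT (i2) and USHORT (u2); names are illustrative. */
#define I2_MISSING  INT16_MIN          /* -32768: smallest value         */
#define I2_POS_INF  INT16_MAX          /*  32767: largest value          */
#define I2_NEG_INF  (INT16_MIN + 1)    /* -32767: 2nd smallest value     */
#define U2_MISSING  UINT16_MAX         /*  65535: largest value          */
#define U2_POS_INF  (UINT16_MAX - 1)   /*  65534: 2nd largest value      */

/* Each function returns 1 and stores the reserved value if d is NaN or
   +/-Infinity. It returns 0 and leaves *out untouched for ordinary
   numbers, which would go through the usual scaled conversion instead. */
int special_to_i2(double d, int16_t *out)
{
    if (isnan(d)) { *out = I2_MISSING; return 1; }
    if (isinf(d)) { *out = (d > 0) ? I2_POS_INF : I2_NEG_INF; return 1; }
    return 0;
}

int special_to_u2(double d, uint16_t *out)
{
    if (isnan(d)) { *out = U2_MISSING; return 1; }
    /* -Infinity has no unsigned counterpart, so it becomes Missing. */
    if (isinf(d)) { *out = (d > 0) ? U2_POS_INF : U2_MISSING; return 1; }
    return 0;
}
```

Testing for the special values before any scaling is applied keeps them from being altered by SCALE and OFFSET, which is the "mapped through without undergoing scaling" behaviour described above.
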