GATHER_INFO(1)



NAME

     gather_info - gather per-record info into a conforming dataset


SYNOPSIS

     gather_info [-qNXf]
          [in={datacollection} (NOT SPECIFIED for default)]
          [keylist={datacollection} (NOT SPECIFIED for default)]
          out={datacollection}
          [D_VARS= (1 by default)]
          [MASKS= (NOT SPECIFIED for default)]
          [TIME_KEY= (T_OBS by default)]
          [TIME_MATCH= (T_REC by default)]
          [TIME_KIND= (SOI by default)]
          [ALT_KEY= (NOT SPECIFIED for default)]
          [CONFORMS= (TS_EQ by default)]
          [PROTOCOL= (RDB.FITS by default)]
          [T_EPOCH= (SOI_BASE_EPOCH by default)]
          [T_BLOCK= (1h by default)]
          [EXT_INFO= (1s by default)]
          [DATAKIND= (NOT SPECIFIED for default)]
          [T_START= (1601.01.01_00:00:00_UT by default)]
          [T_STOP= (2499.12.31_23:59:59_UT by default)]
          [T_FIRST= (NOT SPECIFIED for default)]
          [T_LAST= (NOT SPECIFIED for default)]



DESCRIPTION

     gather_info is a module to gather  per-record  info  into  a
     conforming dataset.

     gather_info is given a target  collection  of  datasets,  a
     collection  of  input datasets, and a list of keywords which
     point to information that needs to  be  collected  from  the
     input datasets and included in the output datasets.

     The output datasets must be SOI "conforming" datasets.

     If the output datasets  do  not  yet  exist,  they  will  be
     created and the overview file will be generated.  The struc-
     ture of the dataset is set when it is first created.

     The program logic is as follows:

         {
         Get and validate the list of keywords to be gathered into
         the output datasets.  If the keyword list is not explicitly
         specified, all keywords and data are included in the output.

         Get info about the target datasets, including conforms,
         t_epoch, t_start, t_stop, t_block, and t_step if needed.
         First look for "out_nsets" to see how many datasets are to
         be updated.  Then look for them in order; when one is found,
         use the overview file to derive the target global
         information.  If none of the overview files are present, use
         the command line arguments.

         The command line arguments "T_FIRST" and "T_LAST" may be used
         to limit the range of output series numbers generated.  In
         the case of TS_EQ and TS_BLOCKED datasets, only whole datasets
         are selected, so if T_FIRST and T_LAST specify a range smaller
         than a single dataset from a dataseries, that whole dataset
         will be gathered.  In the case of TS or MISC datasets, T_FIRST
         and T_LAST are used as the inclusive limits of the gathered
         data.

         The target times for the output dataset depend on the conform
         type and on the input datasets.  In the case of TS_EQ only,
         the output times are derived from the conform info.  In the
         cases of TS, TS_BLOCKED, and MISC, the output times are copies
         of the input times found, limited by the blocking in the case
         of TS_BLOCKED and by T_FIRST and T_LAST.  If the output
         dataset already exists, the target times will be taken from a
         time keyword already in the output dataset.  The time keyword
         used is specified in the "TIME_MATCH" argument, default
         "T_REC".

         Open all the input datasets and sort records by time,
         alternate key value, quality, and input dataset index.  If
         the "-q" flag is set and the matching MDI log file is found,
         the quality info is taken from the MDI log file; otherwise it
         is set to the same default value for all records.  In order
         to use the quality info, the log file must have the TIME_KEY
         time parameter.  Presently gather_qual produces the log file
         and includes both T_REF and T_OBS.

         Now, for each output dataset:
             {
             Compute t_start and t_stop.

             If the overview file is not present, make one.
             (should: If one is present, check for consistency.)

             Build up a list of output variables.  Currently only one
             variable is supported.

             Open the output record info files.  If not found,
             create empty AT tables.

             Build up a list of input datasets that will be needed,
             from their t_start, etc.

             Open the keyword description file.  If not present,
             then finish by closing the output dataset.
             Maybe only the overview file was wanted.

             for each keyword do:

                 Decide which variables are to be updated, or if it
                 is a global value for the overview file.

                 If global, look for the keyword in each input dataset
                 until it is found.  If never found, take the default.

                 If record keywords, for each t_obs in the target
                 dataset, choose the input data record which best
                 matches in quality, closeness in time, closeness in
                 alternate key value, and dataset index.  The record
                 must be within +/- half of the output dataset t_step
                 from t_obs.  The record quality must also test
                 successfully against reject masks which may be
                 specified in the parameter list.
                 Take the keyword value and add it to the appropriate
                 AT table.

                 KEYWORD LIST FORMAT for gather_info program.

                 The list of keywords will be passed to the program
                 as a dataset name using the parameter "keylist".
                 The list will be stored as an ".rdb" file.

                 The list will contain a list of keywords to be added
                 to the global (overview) file and to the per-record
                 information files.  For each keyword listed there are
                 five fields of information in the rdb file.

                 Scope:      A comma-delimited list of variable
                             numbers in the target dataset, or one of
                             the words "ALL", "GLOBAL", "DATA", or
                             "DEFAULT".  ALL means all variables.
                             GLOBAL means the attribute will be added
                             to the overview file rather than record
                             files.  DATA means to take the data
                             record rather than a keyword.  DEFAULT
                             is like GLOBAL but SourceVar will be
                             ignored.

                 Keyword:    Contains the attribute name in the
                             target output dataset.  A single "*"
                             means take all keywords from the source
                             variable.

                 Default:    Contains an optional default value to
                             use if the input keyword is not present
                             in the input dataset, or if there is no
                             input dataset.

                 SourceVar:  Contains the input variable number or
                             name.

                 SourceKey:  Contains the input attribute keyword
                             name.  Must be absent if Keyword
                             contains an "*".  If the SourceKey
                             begins with a "$" character, the value
                             will be generated by gather_info.  If
                             the SourceKey name begins with a "#"
                             character, the value will be looked for
                             in the "quality" dataset if the -q flag
                             is set and is otherwise an error.

             close the output dataset.
             }
         close the input datasets.
         done.
         }
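
     The record selection step in the logic above can be sketched
     roughly as follows.  This is an illustration only, not the
     gather_info source: the Record type, its field names, and
     pick_record are hypothetical; the ranking order (quality, then
     time distance, then dataset index) and the inclusive window
     test are assumptions; alternate-key closeness is left out.  The
     mask values are copied from the example table under QUALITY
     NOTES.

```python
from dataclasses import dataclass

# Mask values copied from the QUALITY NOTES example table.
COMPARE_MASKS = (0x01, 0x7C)
REJECT_MASKS = (0x80,)


@dataclass(frozen=True)
class Record:
    t_obs: float    # observation time in seconds (hypothetical field)
    quality: int    # unsigned quality word
    ds_index: int   # index of the input dataset the record came from


def pick_record(records, t_target, t_step):
    """Return the best candidate for t_target, or None if none qualifies."""
    half = t_step / 2.0
    candidates = [
        r for r in records
        if abs(r.t_obs - t_target) <= half                  # within +/- half t_step
        and not any(r.quality & m for m in REJECT_MASKS)    # passes reject masks
    ]
    if not candidates:
        return None
    # Lower masked quality is treated as better; ties fall through to
    # time distance and then to dataset index (assumed ordering).
    return min(
        candidates,
        key=lambda r: (
            tuple(r.quality & m for m in COMPARE_MASKS),
            abs(r.t_obs - t_target),
            r.ds_index,
        ),
    )
```

     With a one-second t_step, a record 0.4 s from the target with
     quality 0x00 would be preferred here over one 0.2 s away with
     bit 0 set, because quality is ranked before time distance in
     this sketch.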

     Gather_info must be called for each  input  dataseries  that
     will  be used to contribute global or per_record info to the
     output datasets.
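
     The Scope and SourceKey fields of the keyword list described
     under KEYWORD LIST FORMAT might be interpreted along the
     following lines.  This is a sketch only; parse_scope and
     classify_source_key are hypothetical helper names, and exact,
     case-sensitive matching of the Scope words is an assumption.

```python
SCOPE_WORDS = ("ALL", "GLOBAL", "DATA", "DEFAULT")


def parse_scope(scope):
    """Return a Scope word unchanged, or a list of variable numbers."""
    word = scope.strip()
    if word in SCOPE_WORDS:
        return word
    # Otherwise expect a comma-delimited list of variable numbers.
    return [int(item) for item in word.split(",")]


def classify_source_key(source_key, q_flag=False):
    """Say where the value for a SourceKey would come from."""
    if source_key.startswith("$"):
        return "generated"      # value produced by gather_info itself
    if source_key.startswith("#"):
        if not q_flag:
            # The page says this case is an error without the -q flag.
            raise ValueError("'#' SourceKey requires the -q flag")
        return "quality"        # value looked up in the quality dataset
    return "input"              # ordinary keyword from the input dataset
```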

     The default datasets are  TS_EQ,  starting  at  the  default
     SOI_BASE_EPOCH which is 1993.01.01_00:00:00_TAI. The default
     is hourly datasets with one second records.
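
     Under these defaults each dataset would hold 3600 one-second
     record slots.  The sketch below shows how whole datasets might
     be selected from a T_FIRST/T_LAST range and how TS_EQ target
     times might be laid out.  It assumes that series number n
     covers the interval from t_epoch + n*t_block up to, but not
     including, t_epoch + (n+1)*t_block; this page does not state
     that mapping.  The function names are hypothetical and times
     are plain seconds since the epoch.

```python
T_BLOCK = 3600   # seconds per dataset (the 1h default)
T_STEP = 1       # seconds per record (one-second records)


def series_range(t_first, t_last, t_block=T_BLOCK):
    """Series numbers of the whole datasets overlapping [t_first, t_last].

    Assumes series n starts at n * t_block seconds after the epoch.
    """
    return range(int(t_first // t_block), int(t_last // t_block) + 1)


def target_times(series, t_block=T_BLOCK, t_step=T_STEP):
    """Offsets from the epoch, in seconds, of each record slot in a dataset."""
    start = series * t_block
    return range(start, start + t_block, t_step)
```

     For instance, a T_FIRST/T_LAST range of 3700 to 3800 seconds
     after the epoch falls inside a single hour, so the whole of
     that one dataset would be selected in this sketch.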

     If there is a pre-existing dataset but the  existing  struc-
     ture  is  to be replaced, the "-N" flag must be set.  Other-
     wise the record time structure will be taken from the exist-
     ing  dataset.   Also,  alt_key  sorts  are only allowed when
     creating the output dataset from scratch.

     If NO data records are found in the input datasets the  pro-
     gram  exits  with  GATHER_INFO_NO_INPUT_DATA unless the "-X"
     flag is set.

     If the "f=1" or "-f" flag is set any DATA  records  gathered
     will be forced to floating point.

     There are several virtual SourceKeys that may be used as the
     source  of  keyword values to "gather" from for record info.
     These are:

         $DSNAME     The 4-part dataset name,
                     prog,lev,series[#series], of the current source
                     with the record spec added as a
                     "sel:[recnumber]".
         $RUNTIME    The UT time at which this run of gather_info
                     was made.

     These may be placed in the SourceKey position and placed  in
     the global or record info as desired.

     If the target SourceKey is not found the  Default  value  is
     used. If the Default value is the reserved value "$REQUIRED"
     gather_info exits with the  REQUIRED_KEYVALUE_MISSING  error
     code set.
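
     The Default and "$REQUIRED" rule above can be sketched as
     follows.  This is an illustration only: resolve_value and the
     exception class are hypothetical, and the exception merely
     stands in for exiting with the REQUIRED_KEYVALUE_MISSING error
     code.

```python
class RequiredKeyValueMissing(Exception):
    """Stand-in for the REQUIRED_KEYVALUE_MISSING error code."""


def resolve_value(record_keys, source_key, default):
    """Return the SourceKey value, falling back to the Default field."""
    if source_key in record_keys:
        return record_keys[source_key]
    if default == "$REQUIRED":
        # No usable default: the real program exits with an error code.
        raise RequiredKeyValueMissing(source_key)
    return default
```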


     QUALITY NOTES

     Rejections and quality comparisons are based on a .rdb  file
     whose  name  can  be specified in the argument list with the
     keyword MASKS.  If a filename is not explicitly specified in
     the arguments, a configured file corresponding to the output
     series name is used.  The file should have the columns
     "COMPARE" and "REJECT".  The entries should be strings which
     can be converted to unsigned ints by strtoul.

     Example:

     COMPARE REJECT
     ------- ------
     0x01    0x80
     0x7C

     No more than MAX_MASKS rows of the table will be used.   The
     number  of  entries in each column may differ.  The order of
     the COMPARE entries is significant.
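
     Reading such a table might look like the sketch below.  It
     assumes a tab-separated layout with a header line followed by
     a rule line, and uses Python's int(s, 0) as a rough stand-in
     for strtoul with base 0; the base actually passed to strtoul
     is not stated on this page.  parse_masks is a hypothetical
     name and the MAX_MASKS value is a placeholder.

```python
MAX_MASKS = 8  # placeholder; the real limit is defined by gather_info


def parse_masks(text, max_masks=MAX_MASKS):
    """Return (compare_masks, reject_masks) from a two-column table."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    col = {name.strip(): i for i, name in enumerate(header)}
    compare, reject = [], []
    # lines[1] is the rule line under the header; data rows follow it.
    for row in lines[2:2 + max_masks]:
        fields = row.split("\t")
        for name, out in (("COMPARE", compare), ("REJECT", reject)):
            i = col[name]
            cell = fields[i].strip() if i < len(fields) else ""
            if cell:                        # columns may differ in length
                out.append(int(cell, 0))    # accepts 0x.. and decimal
    return compare, reject
```

     The COMPARE list keeps the row order of the file, since the
     order of those entries is significant.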

     The function compare_quality compares two unsigned int qual-
     ities  by  testing  them successively against the comparison
     masks.  The two qualities are  bitwise  ANDed  with  a  com-
     parison  mask  and the two results are compared NUMERICALLY.
     Qualities are compared against the masks in order until  the
     results  are unequal or until all masks have been used.  The
     return value from compare_quality is the result of the first
     non-zero  NUMERICAL  comparison or 0 if all comparisons were
     equal.
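
     A minimal sketch of this comparison, written in Python rather
     than the module's own language.  Returning -1 or 1 is an
     assumption; the page only says that the result of the first
     non-zero numerical comparison is returned.

```python
def compare_quality(q1, q2, compare_masks):
    """Compare two quality words against the masks, in order."""
    for mask in compare_masks:
        a, b = q1 & mask, q2 & mask
        if a != b:
            # First mask under which the masked values differ decides.
            return -1 if a < b else 1
    return 0  # equal under every mask
```

     With the example masks 0x01 and 0x7C, qualities 0x04 and 0x08
     tie on the first mask and are decided by the second, while
     0x80 and 0x00 compare equal because bit 7 is in neither mask.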

     The function reject_quality "rejects" an unsigned int  qual-
     ity if the bitwise AND of the quality with any of the reject
     masks is non-zero.
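
     The rejection test is small enough to sketch in one line.
     This is an illustration in Python, not the module's code.

```python
def reject_quality(quality, reject_masks):
    """True if the quality shares any set bit with any reject mask."""
    return any(quality & mask for mask in reject_masks)
```

     An empty reject list rejects nothing, which matches the
     described default masks where nothing is rejected.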

     FOR EXAMPLE, if the mask table above is  used,  any  quality
     without  bit 0 (MDI_TELEM_ERRS) set is better than any qual-
     ity with bit 0 set.  And if two qualities have the same  bit
     0,  the quality with the smallest NUMERICAL bitwise AND with
     bits 2-6 (MDI_MISSVALS) is better.   Qualities  with  bit  7
     (MDI_ONBRD_PROC_ERR) set are rejected.  Note that the numer-
     ical comparison makes it possible to test sets of bits.

     If the masks table cannot be accessed  or  contains  errors,
     default  masks  which are equivalent to the previous version
     of gather_info are used.  These  are  set  by  the  function
     get_masks  to compare as in the example above but nothing is
     rejected.



FILES

     /home/soi/CM/include/soi_at.h
     /home/soi/CM/include/data_qual.h
     /home/soi/CM/include/soi_error.h
     /home/soi/CM/tables/observables/masks/{series}_masks.rdb
     /soidata/info/mdi_log/{level}/{series}/{#%06d#series}.record.rdb


SEE ALSO

     vds(3)


DIAGNOSTICS

     Writes informational messages to the history log.
     Writes warning and error messages to the error log.
     Returns non-zero error code on failure.


BUGS