Archiving Data from the Mt. Wilson 60-ft Tower
SOI TN 98-139
R S Bogart & A Jones
1998.10.08
Introduction
This document provides a design and interface specification for the
archiving and processing of full-disc, high-resolution solar Doppler
data from the Magneto-Optical Filter (MOF) on the Mt. Wilson 60-ft Solar
Tower. The aim of the project is to provide an ongoing, publicly accessible
archive of original and processed data from the instrument, ultimately
covering the duration of the observing series (beginning 1988?).
Data Flow
The overall data flow is summarized in the following steps:
- Production of raw data tapes in original (sunio) format:
this will continue to be carried out as at present by the Mt. Wilson
Observatory staff, with the tapes permanently archived at USC.
- Creation of Level 0 data: the raw data will be converted to FITS
format, temporarily stored on disc at USC, transferred to Stanford by ftp
or other network transfer, and permanently archived at Stanford.
- Calibration: the Level 0 data will be calibrated to Dopplergrams and
archived in that format at Stanford, following procedures designed jointly
by the scientists and programmers at USC and Stanford.
- Helioseismology processing: the Level 1 data will be processed through
the same pipeline modules in use at Stanford for processing MDI data to
such products as spherical-harmonic amplitudes, mode frequencies, ring-diagram
and time-distance data sets, and archived at Stanford.
Data flow notes
- The processing of Mt. Wilson data from native format takes about 4 hours
per tape, which contains one day's worth of data, primarily due to tape read
time; apart from observation, this is the principal limitation on overall data
throughput.
- Determination of the type of the filtergram (red or blue wing) for
insertion into the FITS header requires crude processing into Dopplergrams
to determine the order of observations (which changes from time to time);
a simple difference without registration may suffice.
- Tests conducted on 8 Oct. 1998 suggest that under good conditions the
transfer rate from USC to Stanford is on the order of 13.5 files (2 MB each)
per minute; it should thus require about 1.5 hours to transfer one day's worth
of data. The data may be either pushed to Stanford by a job run in
conjunction with their production, or pulled from USC by a process watching
for a signal. Provision must be made for verification of successful receipt
of the Level 0 data and recreation of the data at USC for re-transfer in
case of failure.
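The receipt-verification requirement above might be met by comparing
checksums computed at both ends of the transfer. A minimal Python sketch
(the function names and the choice of MD5 are illustrative assumptions,
not part of the specification):

```python
import hashlib


def file_md5(path):
    """Compute an MD5 checksum of a file, reading in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(local_path, sender_checksum):
    """Return True if the received file matches the sender's checksum.

    On a mismatch, the Level 0 file would be recreated at USC and
    re-transferred, as described above.
    """
    return file_md5(local_path) == sender_checksum
```

The sender would compute and transmit the checksum alongside each file,
whether the data are pushed from USC or pulled from Stanford.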
- The Level 1 data are expected to be functionally equivalent to MDI
Level 1.5 data, i.e. capable of being processed directly by the
level 2 pipeline modules; this requires that they have all required
ancillary data described under appropriate keywords, and that they be
organized in sets of equal time steps, with gaps noted in the descriptor
file. Creation of the Level 1 data will require a wholly original module,
based on algorithms developed at USC. The equivalent data set is currently
produced only in transient form at USC, as raw data are processed directly
to spherical harmonic amplitudes.
- It is our aim and expectation that minimal changes will be required to
the post-Level 1 processing modules: the only modifications that should be
needed are fixes to bugs uncovered in the processing of ground-based data
exhibiting problems not previously encountered or successfully solved.
Estimated data volume
The Mt. Wilson observations have been proceeding for over ten years, with
an average of 200+ days of observing per year. Daily observations typically
last 10 - 14 hours, producing 60 filtergram pairs per hour. The filtergrams
(and processed Dopplergrams) are 16 bits deep. The overall data volume is thus:
- Level 0: 240 MB / hr (dataset); 3 GB / day; 0.6 TB / yr; 6+ TB to date
- Level 1: 120 MB / hr (dataset); 1.5 GB / day; 0.3 TB / yr; 3+ TB to date
- Level 2: mode amplitudes comparable to Level 1?
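The volume estimates above follow from the observing parameters. A quick
arithmetic check in Python, assuming 2 MB per 1024 x 1024 16-bit image, a
12.5-hour observing day, and 200 observing days per year:

```python
# Level 0 keeps both red- and blue-wing filtergrams: 60 pairs/hr
# means 120 images/hr at ~2 MB per 16-bit 1024x1024 image.
lev0_mb_per_hr = 120 * 2                          # 240 MB / hr
lev0_gb_per_day = lev0_mb_per_hr * 12.5 / 1000    # ~3 GB / day
lev0_tb_per_yr = lev0_gb_per_day * 200 / 1000     # ~0.6 TB / yr

# Level 1 keeps one Dopplergram per filtergram pair: 60 images/hr.
lev1_mb_per_hr = 60 * 2                           # 120 MB / hr

print(lev0_mb_per_hr, lev0_gb_per_day, lev0_tb_per_yr, lev1_mb_per_hr)
```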
Data Product Descriptions
Raw data
format described elsewhere?
Level 0 data
The Level 0 data consist of FITS files organized in directories
containing all images observed within a given clock hour. These will
include both red-wing and blue-wing filtergrams, darks, and Ronchi images.
The file naming convention is YYMMDD:hhmmss.fits, where the file
name reflects the nominal observation time (PST?) of the image. In
addition to the required FITS keywords SIMPLE, BITPIX,
NAXIS, and NAXISn, the data should contain the following
keywords:
- T_OBS: observation time (string), in format
YYYY.MM.DD_hh:mm:ss_TYP, where TYP is UT, UTC, TAI, PST, or PDT.
- TYPE: type of filtergram (string); valid values are "red", "blue",
"dark", or "ronchi".
- ORIENT: image orientation (string); "SESW" if image is direct,
"NENW" if inverted
- STATUS: observation status (int)
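A T_OBS string in the format above can be split into its timestamp and
time-system parts; a minimal Python sketch (the function name is an
assumption for illustration):

```python
from datetime import datetime


def parse_t_obs(s):
    """Split a T_OBS string of the form YYYY.MM.DD_hh:mm:ss_TYP into a
    datetime plus its time-system suffix (UT, UTC, TAI, PST, or PDT).
    """
    stamp, typ = s.rsplit("_", 1)
    return datetime.strptime(stamp, "%Y.%m.%d_%H:%M:%S"), typ
```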
The Level 0 data are 1024*1024 uncropped images. They should provide a
primary data source from which any modifications to the analysis procedure
(e.g. changes in the registration and calibration algorithms) can
proceed; the Raw data effectively supply a backup archive.
The archived Level 0 data sets will be organized under dataset names
prog:mwo60,level:lev0,series:fgram[hour-number], with an epoch
for the hour number early enough to assure non-negative values in the archive;
1986.01.01_00:00:00_TAI may suffice. The data directories will also
require overview.fits and record.rdb files with the header info
to make them "conforming" data sets; whether these files will be supplied
by USC in the process of creation or at Stanford prior to archiving is TBD.
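Given the suggested epoch, the hour-number index in these dataset names
is just the truncated count of hours elapsed since 1986.01.01. A sketch
in Python (the TAI-UTC offset is ignored here, and the function names are
illustrative):

```python
from datetime import datetime

# Assumed epoch from the text: 1986.01.01_00:00:00_TAI (offset ignored).
EPOCH = datetime(1986, 1, 1, 0, 0, 0)


def hour_number(t_obs):
    """Hours elapsed from the epoch to the observation time, truncated."""
    return int((t_obs - EPOCH).total_seconds() // 3600)


def lev0_dsname(t_obs):
    """Format the Level 0 dataset name described in the text."""
    return "prog:mwo60,level:lev0,series:fgram[%d]" % hour_number(t_obs)

# An observation in the first hour of 1988.01.01 falls 730 days
# (two non-leap years) = 17520 hours after the epoch.
print(lev0_dsname(datetime(1988, 1, 1, 0, 30)))
```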
Level 1 data
One Level 1 data product is planned, Dopplergrams in the same FITS
format and directory arrangement as the Level 0 data. If additional Level 1
data products such as line intensity are produced, they should follow the
same organization, differing only in series name.
The Dopplergrams will be produced by registering pairs of neighboring
filtergrams to the same image center, radius, and orientation, differencing
them (in the sense blue - red), and calibrating by fitting running averages
to a model for solar rotation corrected for limb shift and orbital velocity.
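The production recipe above can be sketched schematically. The sketch
below assumes the filtergram pair is already co-registered and that the
rotation-model fit (not shown) has reduced to a linear scale and offset;
the actual registration and calibration algorithms are being developed
at USC:

```python
import numpy as np


def make_dopplergram(blue, red, cal_scale, cal_offset):
    """Schematic Level 1 step: difference a co-registered filtergram
    pair (in the sense blue - red) and apply a linear velocity
    calibration.

    cal_scale and cal_offset stand in for the result of fitting running
    averages of the difference to a solar-rotation model corrected for
    limb shift and orbital velocity.
    """
    diff = blue.astype(np.float64) - red.astype(np.float64)
    return cal_scale * diff + cal_offset
```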
The following keywords must be supplied in the Level 1 data (for detailed
discussion of their meaning, see Technical
Note 95.122):
- DSNAME:
- PROTOCOL:
- CONFORMS:
- T_BLOCK:
- T_START:
- T_STOP:
- T_EPOCH:
- DATAFILE:
- BLDVER10:
- SOURCE:
- I_DREC:
- T_REC:
- T_OBS:
- INTERVAL:
- DATASIGN:
- S_MAJOR:
- S_MINOR:
- S_ANGLE:
- ORIENT:
- IM_SCALE:
- XSCALE:
- YSCALE:
- XCEN:
- YCEN:
- X0:
- Y0:
- DATAMIN:
- DATAMAX:
- DATAMEAN:
- DATA_RMS:
- DATASKEW:
- DATAKURT:
- DATAMEDN:
- DATAVALS:
- MISSVALS:
- OBS_B0:
- OBS_L0:
- OBS_DIST:
- OBS_VR:
- OBS_VW:
- OBS_VN:
- OBS_R0:
- ORIGIN:
- TELESCOP:
- INSTRUME:
- DATE_OBS:
- SOLAR_P0:
- R_SUN:
Additional keywords may be required.
The archived Level 1 data sets will be organized under dataset names
prog:mwo60,level:lev1,series:V[hour-number], with the same epoch
for the hour number as for the Level 0 data. The data sets will be conforming
FITS_RDB and TS_EQ (image files numbered by the minute of the hour, with
mainly blank records in the RDB file for minutes in which no data image
exists). Individual files will be named
V_01h.hhhhhh.mm.fits, where hhhhhh is the 6-digit index
number of the data set (hours from the epoch), and mm is the 2-digit
minute of the hour corresponding to the image observing time. Because some
observations have been made centered near HH:MM:00 and others centered near
HH:MM:30, 15 seconds should be added to the observing time before truncating
to the minute number, or 15 seconds subtracted from the observing time
before rounding to the closest minute number.
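The file-naming and minute-slot conventions above can be sketched in
Python (the function name is an assumption; the 15-second shift before
truncation is as specified in the text):

```python
from datetime import datetime, timedelta


def lev1_filename(hour_index, t_obs):
    """Build the V_01h.hhhhhh.mm.fits file name for an observation.

    Adding 15 seconds before truncating to the minute maps observations
    centered near HH:MM:00 (even with small jitter across the minute
    boundary) and those centered at HH:MM:30 to the same minute slot.
    """
    shifted = t_obs + timedelta(seconds=15)
    return "V_01h.%06d.%02d.fits" % (hour_index, shifted.minute)

# Observations near 17:42:00 and centered at 17:42:30 both land in
# minute slot 42 of the data set:
print(lev1_filename(17520, datetime(1988, 1, 1, 17, 41, 58)))
print(lev1_filename(17520, datetime(1988, 1, 1, 17, 42, 30)))
```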
Level 2 data products
The Level 2 data products to be produced should be in the same format and
organization as the corresponding products generated from MDI full-disc
data, and must include at a minimum spherical harmonic mode amplitudes,
which would be archived under dataset names
prog:mwo60,level:lev2_shc,series:V_l0-l1_01d[day-number].