Archiving Data from the Mt. Wilson 60-ft Tower
SOI TN 98-139
R S Bogart & A Jones
1998.10.08
Introduction
This document provides a design and interface specification for the
archiving and processing of full-disc, high-resolution solar Doppler
data from the Magneto-Optical Filter (MOF) on the Mt. Wilson 60-ft Solar
Tower. The aim of the project is to provide an ongoing, publicly accessible
archive of original and processed data from the instrument, ultimately
covering the duration of the observing series (beginning 1988?).
Data Flow
The overall data flow is summarized in the following steps:
- Production of raw data tapes in original (sunio) format:
this will continue to be carried out as at present by the Mt. Wilson
Observatory staff, with the tapes permanently archived at USC.
- Creation of Level 0 data: the raw data will be converted to FITS
format, temporarily stored on disc at USC, transferred to Stanford by ftp
or other network transfer, and permanently archived at Stanford.
- Calibration: the Level 0 data will be calibrated to Dopplergrams and
archived in that format at Stanford, following procedures designed jointly
by the scientists and programmers at USC and Stanford.
- Helioseismology processing: the Level 1 data will be processed through
the same pipeline modules in use at Stanford for processing MDI data to
such products as spherical-harmonic amplitudes, mode frequencies, ring-diagram
and time-distance data sets, and archived at Stanford.
Data flow notes
- The processing of Mt. Wilson data from native format takes about 4 hours
per tape, which contains one day's worth of data, primarily due to tape read
time; apart from observation, this is the principal limitation on overall data
throughput.
- Determination of the type of the filtergram (red or blue wing) for
insertion into the FITS header requires crude processing into Dopplergrams
to determine the order of observations (which changes from time to time);
a simple difference without registration may suffice.
- Tests conducted on 8 Oct. 1998 suggest that under good conditions the
transfer rate from USC to Stanford is on the order of 13.5 files (2 MB each)
per minute; it should thus require about 1.5 hours to transfer one day's worth
of data. The data may be either pushed to Stanford by a job run in
conjunction with their production, or pulled from USC by a process watching
for a signal. Provision must be made for verification of successful receipt
of the Level 0 data and recreation of the data at USC for re-transfer in
case of failure.
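The receipt-verification requirement above might be met by comparing
checksums computed at both ends of the transfer. A minimal Python sketch
(the function names and the choice of MD5 are illustrative assumptions,
not part of the specification):

```python
import hashlib


def file_md5(path):
    """Compute an MD5 checksum of a file, reading in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(local_path, sender_checksum):
    """Return True if the received file matches the sender's checksum.

    On a mismatch, the Level 0 file would be recreated at USC and
    re-transferred, as described above.
    """
    return file_md5(local_path) == sender_checksum
```

The sender would compute and transmit the checksum alongside each file,
whether the data are pushed from USC or pulled from Stanford.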
- The Level 1 data are expected to be functionally equivalent to MDI
Level 1.5 data, i.e. capable of being processed directly by the
level 2 pipeline modules; this requires that they have all required
ancillary data described under appropriate keywords, and that they be
organized in sets of equal time steps, with gaps noted in the descriptor
file. Creation of the Level 1 data will require a wholly original module,
based on algorithms developed at USC. The equivalent data set is currently
produced only in transient form at USC, as raw data are processed directly
to spherical harmonic amplitudes.
- It is our aim and expectation that minimal changes will be required to
the post-Level 1 processing modules: the only modifications that should be
needed are fixes to bugs uncovered in the processing of ground-based data
exhibiting problems not previously encountered or successfully solved.
Estimated data volume
The Mt. Wilson observations have been proceeding for over ten years, with
an average of 200+ days of observing per year. Daily observations typically
last 10 - 14 hours, producing 60 filtergram pairs per hour. The filtergrams
(and processed Dopplergrams) are 16 bits deep. The overall data volume is thus:
- Level 0: 240 MB / hr (dataset); 3 GB / day; 0.6 TB / yr; 6+ TB to date
- Level 1: 120 MB / hr (dataset); 1.5 GB / day; 0.3 TB / yr; 3+ TB to date
- Level 2: mode amplitudes comparable to Level 1?
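The volume estimates above follow from the observing parameters. A quick
arithmetic check in Python, assuming 2 MB per 1024 x 1024 16-bit image, a
12.5-hour observing day, and 200 observing days per year:

```python
# Level 0 keeps both red- and blue-wing filtergrams: 60 pairs/hr
# means 120 images/hr at ~2 MB per 16-bit 1024x1024 image.
lev0_mb_per_hr = 120 * 2                          # 240 MB / hr
lev0_gb_per_day = lev0_mb_per_hr * 12.5 / 1000    # ~3 GB / day
lev0_tb_per_yr = lev0_gb_per_day * 200 / 1000     # ~0.6 TB / yr

# Level 1 keeps one Dopplergram per filtergram pair: 60 images/hr.
lev1_mb_per_hr = 60 * 2                           # 120 MB / hr

print(lev0_mb_per_hr, lev0_gb_per_day, lev0_tb_per_yr, lev1_mb_per_hr)
```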
Data Product Descriptions
Raw data
format described elsewhere?
Level 0 data
The Level 0 data consist of FITS files organized in directories
containing all images observed within a given clock hour. These will
include both red-wing and blue-wing filtergrams, darks, and Ronchi images.
The file naming convention is YYMMDD:hhmmss.fits, where the file
name reflects the nominal observation time (PST?) of the image. In
addition to the required FITS keywords SIMPLE, BITPIX,
NAXIS, and NAXISn, the data should contain the following
keywords:
- T_OBS: observation time (string), in format
YYYY.MM.DD_hh:mm:ss_TYP, where TYP is UT, UTC, TAI, PST, or PDT.
- TYPE: type of filtergram (string); valid values are "red", "blue",
"dark", or "ronchi".
- ORIENT: image orientation (string); "SESW" if image is direct,
"NENW" if inverted
- STATUS: observation status (int)
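A T_OBS string in the format above can be split into its timestamp and
time-system parts; a minimal Python sketch (the function name is an
assumption for illustration):

```python
from datetime import datetime


def parse_t_obs(s):
    """Split a T_OBS string of the form YYYY.MM.DD_hh:mm:ss_TYP into a
    datetime plus its time-system suffix (UT, UTC, TAI, PST, or PDT).
    """
    stamp, typ = s.rsplit("_", 1)
    return datetime.strptime(stamp, "%Y.%m.%d_%H:%M:%S"), typ
```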
The Level 0 data are 1024*1024 uncropped images. They should provide a
primary data source from which any modifications to the analysis procedure
(e.g. changes in the registration and calibration algorithms) can
proceed; the Raw data effectively supply a backup archive.
The archived Level 0 data sets will be organized under dataset names
prog:mwo60,level:lev0,series:fgram[hour-number], with an epoch
for the hour number early enough to assure non-negative values in the archive;
1986.01.01_00:00:00_TAI may suffice. The data directories will also
require overview.fits and record.rdb files with the header info
to make them "conforming" data sets; whether these files will be supplied
by USC in the process of creation or at Stanford prior to archiving is TBD.
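Given the suggested epoch, the hour-number index in these dataset names
is just the truncated count of hours elapsed since 1986.01.01. A sketch
in Python (the TAI-UTC offset is ignored here, and the function names are
illustrative):

```python
from datetime import datetime

# Assumed epoch from the text: 1986.01.01_00:00:00_TAI (offset ignored).
EPOCH = datetime(1986, 1, 1, 0, 0, 0)


def hour_number(t_obs):
    """Hours elapsed from the epoch to the observation time, truncated."""
    return int((t_obs - EPOCH).total_seconds() // 3600)


def lev0_dsname(t_obs):
    """Format the Level 0 dataset name described in the text."""
    return "prog:mwo60,level:lev0,series:fgram[%d]" % hour_number(t_obs)

# An observation in the first hour of 1988.01.01 falls 730 days
# (two non-leap years) = 17520 hours after the epoch.
print(lev0_dsname(datetime(1988, 1, 1, 0, 30)))
```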
Level 1 data
One Level 1 data product is planned, Dopplergrams in the same FITS
format and directory arrangement as the Level 0 data. If additional Level 1
data products such as line intensity are produced, they should follow the
same organization, differing only in series name.
The Dopplergrams will be produced by registering pairs of neighboring
filtergrams to the same image center, radius, and orientation, differencing
them (in the sense blue - red), and calibrating by fitting running averages
to a model for solar rotation corrected for limb shift and orbital velocity.
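The production recipe above can be sketched schematically. The sketch
below assumes the filtergram pair is already co-registered and that the
rotation-model fit (not shown) has reduced to a linear scale and offset;
the actual registration and calibration algorithms are being developed
at USC:

```python
import numpy as np


def make_dopplergram(blue, red, cal_scale, cal_offset):
    """Schematic Level 1 step: difference a co-registered filtergram
    pair (in the sense blue - red) and apply a linear velocity
    calibration.

    cal_scale and cal_offset stand in for the result of fitting running
    averages of the difference to a solar-rotation model corrected for
    limb shift and orbital velocity.
    """
    diff = blue.astype(np.float64) - red.astype(np.float64)
    return cal_scale * diff + cal_offset
```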
The following keywords must be supplied in the Level 1 data (for detailed
discussion of their meaning, see Technical
Note 95.122):
- DSNAME:
- PROTOCOL:
- CONFORMS:
- T_BLOCK:
- T_START:
- T_STOP:
- T_EPOCH:
- DATAFILE:
- BLDVER10:
- SOURCE:
- I_DREC:
- T_REC:
- T_OBS:
- INTERVAL:
- DATASIGN:
- S_MAJOR:
- S_MINOR:
- S_ANGLE:
- ORIENT:
- IM_SCALE:
- XSCALE:
- YSCALE:
- XCEN:
- YCEN:
- X0:
- Y0:
- DATAMIN:
- DATAMAX:
- DATAMEAN:
- DATA_RMS:
- DATASKEW:
- DATAKURT:
- DATAMEDN:
- DATAVALS:
- MISSVALS:
- OBS_B0:
- OBS_L0:
- OBS_DIST:
- OBS_VR:
- OBS_VW:
- OBS_VN:
- OBS_R0:
- ORIGIN:
- TELESCOP:
- INSTRUME:
- DATE_OBS:
- SOLAR_P0:
- R_SUN:
Additional keywords may be required.
The archived Level 1 data sets will be organized under dataset names
prog:mwo60,level:lev1,series:V[hour-number], with the same epoch
for the hour number as for the Level 0 data. The data sets will be conforming
FITS_RDB and TS_EQ (image files numbered by the minute of the hour, with
mainly blank records in the RDB file for minutes in which no data image
exists). Individual files will be named
V_01h.hhhhhh.mm.fits, where hhhhhh is the 6-digit index
number of the data set (hours from the epoch), and mm is the 2-digit
minute of the hour corresponding to the image observing time. Because some
observations have been made centered near HH:MM:00 and others centered near
HH:MM:30, 15 seconds should be added to the observing time before truncating
to the minute number, or 15 seconds subtracted from the observing time
before rounding to the closest minute number.
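The file-naming and minute-slot conventions above can be sketched in
Python (the function name is an assumption; the 15-second shift before
truncation is as specified in the text):

```python
from datetime import datetime, timedelta


def lev1_filename(hour_index, t_obs):
    """Build the V_01h.hhhhhh.mm.fits file name for an observation.

    Adding 15 seconds before truncating to the minute maps observations
    centered near HH:MM:00 (even with small jitter across the minute
    boundary) and those centered at HH:MM:30 to the same minute slot.
    """
    shifted = t_obs + timedelta(seconds=15)
    return "V_01h.%06d.%02d.fits" % (hour_index, shifted.minute)

# Observations near 17:42:00 and centered at 17:42:30 both land in
# minute slot 42 of the data set:
print(lev1_filename(17520, datetime(1988, 1, 1, 17, 41, 58)))
print(lev1_filename(17520, datetime(1988, 1, 1, 17, 42, 30)))
```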
Level 2 data products
The Level 2 data products to be produced should be in the same format and
organization as the corresponding products generated from MDI full-disc
data, and must include at a minimum spherical harmonic mode amplitudes,
which would be archived under dataset names
prog:mwo60,level:lev2_shc,series:V_l0-l1_01d[day-number].