Archiving Data from the Taiwan Oscillations Network
SOI TN 98-140
R S Bogart & D Y Chou
1998.12.08
Introduction
This document provides a design and interface specification for the
archiving and processing of full-disc, high-resolution Ca-K line intensity
data from the Taiwan Oscillations Network (TON).
The aim of the project is to provide an ongoing, publicly accessible
archive of original and processed data from the network.
Data Flow
The overall data flow is summarized in the following steps:
- Production of raw data tapes in original format:
this will continue to be carried out as at present by the TON staff, with
the raw data permanently archived at Tsing Hua University on Exabyte tapes.
Copies of these tapes will be produced by the TON staff and mailed to
the Stanford Helioseismology Archive as they become available. Upon
receipt they will also be archived at Stanford, from which the data will
be publicly distributed.
- Creation of Level 0 data: the raw data will be converted to FITS
format at Stanford and permanently archived.
- Calibration: the Level 0 data will be calibrated to produce Level 1 data,
which will be archived in FITS format at Stanford, following procedures
designed jointly by the scientists and programmers at Tsing Hua University,
Stanford and other TON member sites.
- Merging: the Level 1 data from the different TON sites may eventually
be merged into a single data set, following procedures to be designed by
the TON team.
- Helioseismology processing: the Level 1 data will be available for
processing through the same pipeline modules used at Stanford to reduce
MDI data to such products as spherical-harmonic amplitudes, mode
frequencies, and ring-diagram and time-distance data sets; the resulting
products will be archived at Stanford. Details of this processing are to
be negotiated and specified in the future.
Data flow notes
- The Exabyte tapes are written with a block size that is not yet known (TBD).
Reading them at Stanford with a variable block length takes about 20 seconds
per file, or about 100 kB/sec. At this rate, keeping pace with raw data input
at 100% coverage (one image per minute around the clock) would require
8 hours of tape read time per day.
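The read-time estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 20-second read time and the rate of 60 images per hour come from this note, the file size is the padded raw file size given under "Raw data", and the function names are made up for illustration.

```python
# Figures quoted in this note.
IMAGES_PER_HOUR = 60        # up to one image per minute
SECONDS_PER_FILE = 20       # approximate tape read time per file
FILE_BYTES = 2334720        # padded size of one raw image file


def tape_read_hours_per_day(coverage=1.0):
    """Hours of tape reading needed per day of data at the given coverage."""
    images_per_day = IMAGES_PER_HOUR * 24 * coverage
    return images_per_day * SECONDS_PER_FILE / 3600.0


def read_rate_kb_per_sec():
    """Effective read rate implied by the per-file read time."""
    return FILE_BYTES / SECONDS_PER_FILE / 1000.0


print(tape_read_hours_per_day())        # 8.0
print(round(read_rate_kb_per_sec()))    # 117
```

The computed rate of roughly 117 kB/sec is consistent with the "about 100 kB/sec" quoted above.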
Estimated data volume
The TON observations began ???. They currently proceed at four sites
with an average of ??? site-days per year. Daily observations at each
site typically last ??? hours, producing up to 60 photograms per hour,
each 1080*1080 pixels and 16 bits deep.
The calibrated (Level 1) images are 1051*1051 pixels. The average coverage
for the entire network is ??%.
The overall data volume is thus:
- Raw: ??? MB / site-day (dataset); ??? GB / day; ??? TB / yr; ??? TB to date
- Level 0: ??? MB / site-day (dataset); ??? GB / day; ??? TB / yr; ??? TB to date
- Level 1: ??? MB / site-day (dataset); ??? GB / day; ??? TB / yr; ??? TB to date
- Level 1 Merged: ??? MB / hour (dataset); ??? GB / day; ??? TB / yr; ??? TB to date
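Until the missing figures above are filled in, the raw volume can at least be bounded from the numbers that are known. The sketch below assumes the maximum rate of 60 images per hour and the padded raw file size of 2334720 bytes given under "Raw data"; the function name is hypothetical, and the number of observing hours is left as a parameter because it is not yet specified here.

```python
RAW_FILE_BYTES = 2334720    # padded size of one raw image file
IMAGES_PER_HOUR = 60        # maximum image rate quoted in this note


def raw_bytes(hours_observed):
    """Upper bound on raw data volume for the given hours of observation."""
    n_images = int(round(hours_observed * IMAGES_PER_HOUR))
    return n_images * RAW_FILE_BYTES


print(raw_bytes(1))     # 140083200 bytes, about 140 MB per observing hour
print(raw_bytes(24))    # 3361996800 bytes, about 3.4 GB per day at 100% coverage
```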
Data Product Descriptions
Raw data
The raw data tapes each contain data from a particular site for one to a few
days. Each tape contains several tar files, one per day of observations at
the site. The tar file includes a single directory containing a set of
identically formatted image files, one per image (minute). The directories
are named by the day of observation (e.g. 960801) and the files by the day
and UT minute number of the observation (e.g. 960801.433, 960801.1129).
(Observations from sites spanning multiple UT days within the observing
day are split into separate directories; those from the same UT day but
different observing days are combined in the same directory. This remains
to be confirmed.) The raw file format is a 1028-byte header followed by
1080*1080 2-byte little-endian unsigned shorts representing the data
values. The file is null-padded with 892 bytes to a total of 2334720
bytes.
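As an illustration of the file layout just described, the following sketch extracts the pixel values from the bytes of one raw file. It is not the archive's conversion code; the function name is hypothetical, and because the note does not state the pixel ordering (row-major or otherwise), the values are returned as a flat sequence.

```python
import sys
from array import array

HEADER_BYTES = 1028
NCOLS = NROWS = 1080
PIXEL_BYTES = 2
PAD_BYTES = 892
FILE_BYTES = 2334720    # 1028 + 1080*1080*2 + 892


def read_raw_pixels(buf):
    """Return the 1080*1080 pixel values from the bytes of one raw file."""
    if len(buf) != FILE_BYTES:
        raise ValueError("expected %d bytes, got %d" % (FILE_BYTES, len(buf)))
    start = HEADER_BYTES
    stop = start + NCOLS * NROWS * PIXEL_BYTES
    pixels = array("H")                 # unsigned short
    if pixels.itemsize != PIXEL_BYTES:
        raise RuntimeError("unsigned short is not 2 bytes on this platform")
    pixels.frombytes(buf[start:stop])
    if sys.byteorder == "big":
        pixels.byteswap()               # file data are little-endian
    return pixels                       # trailing padding is ignored
```

A file read with `open(path, "rb").read()` can be passed directly as `buf`.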
The header structure is a 1024 byte ASCII string followed by two 2-byte
little-endian short integers representing the number of columns and rows
in the data; these should always both be 1080 (0x0438). The ASCII string
consists of the following:
- A 25-character null-terminated identification string, e.g.
260_016:40:00W_028:18:00N
- A null-terminated string of the following format:
N No. 1, 617 GMT1996/08/01_18:43:55670 O(538,542) M( 2, -2) B= 4736 R=496 T=226 E=0 size=1080,1080 expousre=1500 ms
- A newline character
- A null-terminated string of the following format:
; O(537.6,541.7) B= 4736 R=496.0 E= 0 Sx=0.01011 Sy=0.01301 Av= 5197.9 No= 1
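A minimal sketch of how the header described above might be unpacked follows. It assumes that the three strings occur contiguously in the order listed, that the remainder of the 1024-byte ASCII area is padding, and that the column and row counts are stored in that order; the function and dictionary key names are hypothetical.

```python
import struct

ASCII_BYTES = 1024
HEADER_BYTES = 1028


def parse_raw_header(header):
    """Split a raw-file header into its strings and image dimensions."""
    if len(header) < HEADER_BYTES:
        raise ValueError("header must be at least %d bytes" % HEADER_BYTES)
    # Null-terminated strings in the ASCII area, in order of appearance.
    fields = header[:ASCII_BYTES].split(b"\0")
    ident = fields[0].decode("ascii")
    info = fields[1].decode("ascii") if len(fields) > 1 else ""
    # The third string is preceded by a newline character.
    comment = fields[2].lstrip(b"\n").decode("ascii") if len(fields) > 2 else ""
    # Two 2-byte little-endian short integers follow the ASCII area.
    ncols, nrows = struct.unpack("<2h", header[ASCII_BYTES:HEADER_BYTES])
    return {"ident": ident, "info": info, "comment": comment,
            "ncols": ncols, "nrows": nrows}
```

A reader would normally check that both `ncols` and `nrows` equal 1080 before decoding the image data.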
The archived Raw data sets will be organized under dataset names
prog:ton,level:raw,series:site[day-number], where the day number is counted
from an epoch early enough to ensure non-negative values in the archive;
1993.01.01_00:00:00_TAI, the MDI epoch, will suffice. The site name can
take on one of the following values (more may be added):
- bb : Big Bear Observatory
- hr : Huairou Observatory
- tf : Tenerife
- ub : Uzbekistan
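The naming scheme can be sketched as below. This reflects one reading of the convention rather than a confirmed implementation: it assumes that the site code replaces "site" in the series name, that the epoch day itself is day number 0, and that the small offset between TAI and UT can be ignored when counting whole days. The function name is hypothetical.

```python
from datetime import date

EPOCH = date(1993, 1, 1)    # the MDI epoch, 1993.01.01_00:00:00_TAI

SITES = {
    "bb": "Big Bear Observatory",
    "hr": "Huairou Observatory",
    "tf": "Tenerife",
    "ub": "Uzbekistan",
}


def dataset_name(level, site, day):
    """Build a dataset name such as prog:ton,level:raw,series:bb[1308]."""
    if site not in SITES:
        raise ValueError("unknown site code: %r" % site)
    day_number = (day - EPOCH).days     # assumes the epoch day is day 0
    if day_number < 0:
        raise ValueError("date precedes the epoch")
    return "prog:ton,level:%s,series:%s[%d]" % (level, site, day_number)


print(dataset_name("raw", "bb", date(1996, 8, 1)))
# prog:ton,level:raw,series:bb[1308]
```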
Level 0 data
The Level 0 data consist of FITS files organized in directories
containing all images observed within a given UT day at a given site.
These will include all filtergrams, darks, and calibration (diffuser) images.
The file naming convention is YYMMDD.mmmm.fits, where the file
name reflects the nominal observation time (UT minute) of the image. In
addition to the required FITS keywords SIMPLE, BITPIX,
NAXIS, and NAXISn, the data should contain the following
keywords:
- T_OBS: observation time (string), in format
YYYY.MM.DD_hh:mm:ss_TYP, where TYP is UT, UTC, TAI, PST, or PDT.
- TYPE: type of filtergram (string); valid values are "obs",
"dark", or "cal".
- ORIENT: image orientation (string); "SESW" if image is direct,
"NENW" if inverted
The Level 0 data are 1080*1080 uncropped images. They should provide a
primary data source from which any modifications to the analysis procedure
(e.g. changes in the registration and calibration algorithms) can
proceed; the Raw data effectively supply a backup archive.
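The YYMMDD.mmmm.fits naming convention and the T_OBS format described above could be derived from a raw file name roughly as follows. Several points are assumptions, not part of this specification: that the minute field is zero-padded to four digits, that two-digit years of 70 and above belong to the 1900s, and that a nominal time built from the file name is an acceptable stand-in for the actual observation time recorded in the raw header. All function names are hypothetical.

```python
from datetime import datetime, timedelta


def nominal_time(raw_name):
    """Nominal observation time from a raw file name such as 960801.433."""
    day, minute = raw_name.split(".")
    minute = int(minute)
    if not 0 <= minute < 1440:
        raise ValueError("UT minute out of range: %d" % minute)
    yy = int(day[:2])
    year = 1900 + yy if yy >= 70 else 2000 + yy    # assumed century pivot
    midnight = datetime(year, int(day[2:4]), int(day[4:6]))
    return midnight + timedelta(minutes=minute)


def level0_name(raw_name):
    """Level 0 file name, assuming a zero-padded four-digit minute field."""
    day, minute = raw_name.split(".")
    return "%s.%04d.fits" % (day, int(minute))


def t_obs(t, typ="UT"):
    """Format a time as YYYY.MM.DD_hh:mm:ss_TYP."""
    return t.strftime("%Y.%m.%d_%H:%M:%S") + "_" + typ


print(level0_name("960801.433"))            # 960801.0433.fits
print(t_obs(nominal_time("960801.433")))    # 1996.08.01_07:13:00_UT
```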
The archived Level 0 data sets will be organized under dataset names
prog:ton,level:lev0,series:site[day-number], with the
same epoch for the day number as for the raw data. The data directories
will also require overview.fits and record.rdb files with the
header info to make them "conforming" data sets.
Level 1 data
One Level 1 data product is planned: calibrated intensity, in the same
FITS format and directory arrangement as the Level 0 data.
The calibrated photograms will be produced by techniques TBD.
Calibration will involve flat-fielding, normalization of the intensity
values, and registration of the images to a fixed location on a 1051*1051
pixel grid.
The following keywords must be supplied in the Level 1 data (for detailed
discussion of their meaning, see Technical Note 95.122):
- DSNAME:
- PROTOCOL:
- CONFORMS:
- T_BLOCK:
- T_START:
- T_STOP:
- T_EPOCH:
- DATAFILE:
- BLDVER10:
- SOURCE:
- I_DREC:
- T_REC:
- T_OBS:
- INTERVAL:
- DATASIGN:
- S_MAJOR:
- S_MINOR:
- S_ANGLE:
- ORIENT:
- IM_SCALE:
- XSCALE:
- YSCALE:
- XCEN:
- YCEN:
- X0:
- Y0:
- DATAMIN:
- DATAMAX:
- DATAMEAN:
- DATA_RMS:
- DATASKEW:
- DATAKURT:
- DATAMEDN:
- DATAVALS:
- MISSVALS:
- OBS_B0:
- OBS_L0:
- OBS_DIST:
- OBS_VR:
- OBS_VW:
- OBS_VN:
- OBS_R0:
- ORIGIN:
- TELESCOP:
- INSTRUME:
- DATE_OBS:
- SOLAR_P0:
- R_SUN:
Additional keywords may be required.
The archived Level 1 data sets will be organized under dataset names
prog:ton,level:lev1,series:site[day-number], with the
same epoch for the day number as for the Level 0 data. The data sets will
be conforming FITS_RDB, and TS_EQ (image files numbered corresponding to
the minute of the hour and mainly blank records in the RDB file corresponding
to minutes for which no data image exists). Individual files will be named
???.
Level 1 Merged data
TBD
Level 2 data products
The Level 2 data products to be produced should be in the same format and
organization as the corresponding products generated from MDI full-disc
data, and should include at a minimum spherical harmonic mode amplitudes,
which would be archived under dataset names
prog:ton,level:lev2_shc,series:V_l0-l1_01d[day-number].
This page last revised
Thursday, 09-Aug-2001 15:57:35 PDT