Designs for Enhanced Browse/Export Pages

SOI TN 99-141
R S Bogart, J A Aloise, R I Bush, K Leibrand, J Schou, & J Sommers
1999.11.03

Introduction

This document provides a design specification for a set of enhancements to the existing web-browser-based tools for searching and requesting data from the SSSC database. These enhancements are specifically aimed at making it easier for users to obtain Level 1.5 data from the MDI program, although the supporting tools should be designed and built with sufficient flexibility to support similar interfaces for other types of data.

Design

The existing pls program searches the database for all records matching a given prog name, level name, and series name. It and its child programs (plsm etc.) use this information to construct, on the fly, hierarchical sets of forms that apply time-based restrictions to the query, preparatory to generating a specific set of dataset names. This approach is appropriate for extracting long time-series of individual data products, an essential requirement for helioseismology. It is not well suited, however, to finding data for other kinds of science such as correlative studies.

To solve the orthogonal problem of finding data of multiple types for a given time, there should ideally be a set of similar programs that query the database to find all dataset names, possibly matching certain selection criteria, within a specified time interval. Unfortunately the existing database is not well structured to support such queries: time comparisons (based on the T_START and T_STOP parameters, for example) are very slow, and there is no guarantee that these time intervals can be converted to valid record number ranges.

In the restricted case of data in prog:mdi,level:lev1.5 (or lev1.4), the solution is fairly simple, since all such datasets are organized hourly, six-hourly, or daily, and furthermore all obey the naming convention that the series names are *_01h, *_06h, or *_01d depending on the organization. Nevertheless, because using the DS naming table to construct correspondences between record numbers and time intervals is inherently more flexible and trustworthy than relying on naming conventions, the naming-table approach should be used. It is the same approach currently used by the pls family.

A general-purpose tool should query the DS naming table (and epoch table) for all datasets matching a given set of Block values; in this specific implementation those should be 1hr, 6hr, and 1day, and the Prog and Level values should be further restricted to mdi and lev1.5 respectively. The DS naming query tool should also support (but not require!) restriction of the Series value. The result of the query to the DS naming table would be sets of prog:level:series triads for each Block value (15 for prog:mdi,level:lev1.5; block:1day and about 65 for prog:mdi,level:lev1.5; block:1hr).

A separate program should combine these triads with the record ranges corresponding to a selected time interval, and use them for direct queries to the database. Those queries generate the lists of found datasets from which request tables similar to the existing ones are prepared. Presumably data should be organized in this scheme by record number: for example, all data series for a given hour (or day) should be listed together, rather than all hours for a given series. The query tool described could support either method, however; the choice depends only on the design of the program that generates the web page. Tools are obviously required to translate times to record numbers based on blocking and epoch information, but the existing tools probably suffice.
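The translation from a time to a record number can be sketched as follows. This is only an illustration, not the existing tools: it assumes that a record number is the count of whole blocks elapsed since the series epoch, an assumption consistent with the sample values in this note (epoch 1993.01.01 and daily record 2472 for 1999.10.09). The names record_number, parse_time, and BLOCK_LENGTHS are hypothetical, and TAI is treated as a plain label with no leap-second handling.

```python
from datetime import datetime, timedelta

# Block lengths for the three Block values named in the text.
BLOCK_LENGTHS = {
    "1hr": timedelta(hours=1),
    "6hr": timedelta(hours=6),
    "1day": timedelta(days=1),
}

TIME_FORMAT = "%Y.%m.%d_%H:%M:%S"


def parse_time(text):
    """Parse a time such as '1999.10.09_00:00:00_TAI'; the suffix is ignored."""
    if text.endswith("_TAI"):
        text = text[: -len("_TAI")]
    return datetime.strptime(text, TIME_FORMAT)


def record_number(time_text, epoch_text, block):
    """Return the number of whole blocks between the epoch and the given time.

    Assumption: record numbers count blocks from the epoch, starting at 0.
    """
    offset = parse_time(time_text) - parse_time(epoch_text)
    # Floor division of two timedeltas yields an integer.
    return offset // BLOCK_LENGTHS[block]


if __name__ == "__main__":
    epoch = "1993.01.01_00:00:00_TAI"
    for block in ("1day", "6hr", "1hr"):
        print(block, record_number("1999.10.09_00:00:00_TAI", epoch, block))
```

Under that assumption a time interval maps to a record range by converting its two end points, and the same function serves every blocking.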

Note that the results of the query to the DS naming table could well be cached in semi-permanent files, since the table is very seldom changed and the number of records is quite small. An example of such a cache file might look like the following:

sun> cat $QRY_CACHE/prog:mdi,lev:lev1.5,block:1day
mdi^lev1.5^fd2loi64_Ic_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^fd_Ic_6h_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^fd_M_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^fd_M_96m_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^fd_Mag_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^fd_V_6h_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^loi64_Ic_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^loi64_V_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^loi_Ld_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^loi_V_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^rwbin_Ic_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^rwbin_Ld_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^vw_Ld_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
mdi^lev1.5^vw_V_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI
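Reading such a cache file back could be sketched as follows. The record layout is inferred from the example only: six caret-separated fields, of which the first three are prog, level, and series and the last is the epoch. The field names index_type and index_key for the TS_EQ and t_obs columns are guesses made for this illustration, as are NamingRecord and parse_cache.

```python
from typing import NamedTuple


class NamingRecord(NamedTuple):
    prog: str
    level: str
    series: str
    index_type: str  # e.g. TS_EQ; field name is an assumption
    index_key: str   # e.g. t_obs; field name is an assumption
    epoch: str       # e.g. 1993.01.01_00:00:00_TAI


def parse_cache(text):
    """Parse whitespace-separated, caret-delimited naming-table records."""
    records = []
    for token in text.split():
        fields = token.split("^")
        if len(fields) != 6:
            continue  # ignore anything that is not a six-field record
        records.append(NamingRecord(*fields))
    return records


if __name__ == "__main__":
    sample = (
        "mdi^lev1.5^fd_M_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI\n"
        "mdi^lev1.5^vw_V_01d^TS_EQ^t_obs^1993.01.01_00:00:00_TAI\n"
    )
    for record in parse_cache(sample):
        print(record.series, record.epoch)
```

Splitting on whitespace rather than on newlines lets the same reader accept records stored one per line or run together on a single line.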

Form Interface

Evidently the programs described above will require a CGI form interface that provides them with:

The name of the appropriate program to execute
A list of 0 or more prog names to restrict the search (0 implying no restriction)
A list of 0 or more level names for restriction
A list of 0 or more blocking names for restriction
A list of 0 or more series names for restriction
A start time and a stop time

The entry-level form could be open-ended, or it could use radio, menu, or checkbox selectors, depending on how many such entry-level forms we wish to provide and how specifically we want to target them. For a page designed to target requestors of synoptic correlative data, for example, the prog, level, and blocking names could be hardwired, and the series names selected from various menu-driven combinations: an example might be a selector named "Low-Resolution Magnetograms" that would send the list of series names [fd_M_01h, fd_M_1024x750_01h, fd_M_1024x500_01h, fd_M_96m_01d]. Note that our use of the term "Full-Disc" to describe all data taken at nominal resolution of 2"/pixel is misleading. "Low-Resolution" may also be misleading, but the important point is that accurate and readily understandable descriptions are preferable to faithful adherence to the actual dataset naming conventions, which need not be visible to the users.
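The mapping from a descriptive selector to the series names it stands for, and the information the form would hand to the query program, might be sketched as follows. The parameter names (prog, level, block, series, start, stop) and the functions build_query and parse_query are hypothetical; only the selector label and its four series names come from the text. An absent parameter is read as an empty list, meaning no restriction.

```python
from urllib.parse import parse_qs, urlencode

# Descriptive selector labels mapped to the series names they represent.
SELECTORS = {
    "Low-Resolution Magnetograms": [
        "fd_M_01h",
        "fd_M_1024x750_01h",
        "fd_M_1024x500_01h",
        "fd_M_96m_01d",
    ],
}


def build_query(selector, start, stop, prog="mdi", level="lev1.5"):
    """Encode the form fields as a query string; prog and level are hardwired."""
    params = [("prog", prog), ("level", level), ("start", start), ("stop", stop)]
    params += [("series", name) for name in SELECTORS[selector]]
    return urlencode(params)


def parse_query(query):
    """Decode a query string into restriction lists and a time interval."""
    fields = parse_qs(query)
    return {
        "prog": fields.get("prog", []),      # empty list: no restriction
        "level": fields.get("level", []),
        "block": fields.get("block", []),
        "series": fields.get("series", []),
        "start": fields.get("start", [None])[0],
        "stop": fields.get("stop", [None])[0],
    }


if __name__ == "__main__":
    query = build_query(
        "Low-Resolution Magnetograms", "1999.10.09_00:00", "1999.10.10_00:00"
    )
    print(parse_query(query))
```

Keeping the mapping in one table means the labels shown to users can be reworded without touching the dataset names behind them.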

An example of such an entry level form might look like the following (checkboxes may be preferable to some of the menus):

Start Date/Time:
Stop Date/Time: (default: 24 hours after Start)

Program: MDI MDI_EOF
Observable Type:
Resolution:
Spatial Coverage:

Lower-level forms, generated on the fly by the query programs, would send the same information back to those programs. Because the first-level program would generate a complete list of available datasets matching the selection criteria, only one additional selection level would be required. An example of such a page, generated by a request for all MDI level 1.5 Doppler data between 1999.10.09_00:00 and 1999.10.09_17:59, might look like the following:

    Select all or
Select Series            : Time              (#)  (Size)
     fd_V_6h_01d       : 1999.10.09    ( 2472)  (3.131 MB)
     loi64_V_01d       : 1999.10.09    ( 2472)  (46.376 MB)
     vw_V_06h          : 1999.10.09.00 ( 9888)  (51.646 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_00 (59328)  (23.399 MB)
     fd_V_01h          : 1999.10.09_01 (59329)  (4.160 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_01 (59329)  (23.399 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_02 (59330)  (30.113 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_03 (59331)  (28.564 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_04 (59332)  (28.564 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_05 (59333)  (30.630 MB)
     vw_V_06h          : 1999.10.09.25 ( 9889)  (51.646 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_06 (59334)  (29.597 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_07 (59335)  (30.113 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_08 (59336)  (27.014 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_09 (59337)  (29.597 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_10 (59338)  (30.630 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_11 (59339)  (28.047 MB)
     vw_V_06h          : 1999.10.09.50 ( 9890)  (51.647 MB)
     fd_V_01h          : 1999.10.09_12 (59340)  (4.160 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_12 (59340)  (22.883 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_13 (59341)  (31.146 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_14 (59342)  (28.047 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_15 (59343)  (29.597 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_16 (59344)  (30.630 MB)
     hr_Vm_bin2x2_01h  : 1999.10.09_17 (59345)  (30.630 MB)

Up to 25 datasets selected

Key:

Series            Observable  Cadence     Image Size  Nominal Resolution ["/pxl]
fd_V_6h_01d       Doppler     4 / day     1024×1024   2
loi64_V_01d       Doppler     1440 / day  64×64       32
vw_V_06h          Doppler     360 / 6 hr  192×192     10
hr_Vm_bin2x2_01h  Doppler     60 / hour   512×512     1.3
fd_V_01h          Doppler     60 / hour   1024×1024   2
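The time-ordered layout of the sample page, in which every series for a given interval is listed together and longer blocks precede shorter ones, could be produced along the following lines. This is a sketch under assumptions: record numbers are taken to be whole blocks counted from the 1993.01.01 epoch, the series and block lengths are hardcoded here rather than read from the DS naming table, and candidate_datasets is a hypothetical name. The rows are only candidates; the database query would still decide which of them exist.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1993, 1, 1)  # epoch shown in the cached naming-table example

# (series, block length) pairs; a real tool would take these from the
# DS naming table rather than hardcode them.
SERIES_BLOCKS = [
    ("fd_V_6h_01d", timedelta(days=1)),
    ("vw_V_06h", timedelta(hours=6)),
    ("hr_Vm_bin2x2_01h", timedelta(hours=1)),
]


def candidate_datasets(start, stop, series_blocks=SERIES_BLOCKS, epoch=EPOCH):
    """List (block start, series, record number) for blocks overlapping the interval."""
    rows = []
    for series, block in series_blocks:
        first = (start - epoch) // block
        last = (stop - epoch) // block
        for record in range(first, last + 1):
            # Sort key: block start time, then longer blocks first, then name.
            rows.append((epoch + record * block, -block, series, record))
    rows.sort()
    return [(begin, series, record) for begin, _, series, record in rows]


if __name__ == "__main__":
    listing = candidate_datasets(
        datetime(1999, 10, 9, 0, 0), datetime(1999, 10, 9, 17, 59)
    )
    for begin, series, record in listing:
        print(f"{series:18s}: {begin:%Y.%m.%d_%H} ({record})")
```

Listing all intervals for one series instead would only require sorting on the series name first, which is why the choice can be left to the page-generating program.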

Image Selection

There is a separate problem involving the ability of users to extract single images rather than entire datasets for export from our data archive. This may be soluble in a straightforward way if the peq program for staging data accepts (or is taught to accept) selector information in the dataset name. Since the actual data transfer from tape to staging disk is accomplished via the gtar command, it is possible that individual filename information could be extracted from the online dataset headers (or conventions?) and passed on to gtar. Whether this is feasible or efficient remains to be seen, and in any case it would not be acceptable to mark partial datasets staged to disk as being online as far as the DSDS is concerned. The simpler solution is of course to stage the data twice, first to the disk cache as entire datasets and then to a separate space reserved for export, much like the mechanism for ftp by requestor. This might even be possible with the module no-op, and is certainly not difficult.
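The second step of the double staging, copying only the requested files from a fully staged dataset into the export space, might look like the sketch below. It says nothing about how peq, gtar, or the DSDS actually work; the function name stage_for_export and the directory layout (one directory per staged dataset, one file per image) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def stage_for_export(dataset_dir, selected, export_dir):
    """Copy the selected files of a staged dataset into the export area.

    The dataset in the disk cache is left untouched, so it can remain
    marked as fully online. Returns the names actually copied.
    """
    source = Path(dataset_dir)
    target = Path(export_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in selected:
        path = source / name
        if not path.is_file():
            continue  # skip names that are not present in the staged dataset
        shutil.copy2(path, target / path.name)
        copied.append(path.name)
    return copied
```

Because the export copy is separate from the cache, a partial selection never has to be recorded as an online dataset.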

The issue of extracting single images, however, is independent of cuts through the database, and ought to be addressed separately.
