A Prototype for SSSC Data Reduction and Analysis
R. S. Bogart
1991.IX.17
SOI Technical Note 075

This note is an attempt to define a prototype for the SOI-MDI Science Support Center that will provide a) a core of essential utilities for the production system; b) a model of data operations from which reasonable scaling to the full system can be made; and c) a testbed in which prototype instrument data may be analyzed. The plan assumes that such a prototype will be completed in approximately one year, i.e. by 9/30/92, using predominantly existing hardware.

There are two fundamentally different approaches that can be taken to building the overall data system, characterized as "vertical" and "horizontal". In the vertical approach, we design entire "systems" carrying a particular kind of data through all processing steps up to the points of scientific analysis (Level 3). In a horizontal approach, individual subsystems corresponding to levels and sub-levels of data processing (e.g. velocity calibration, 2-d spatial Fourier transforms) are designed as modules from which the systems can be constructed. It has been proposed that our approach should be vertical, and this makes sense for a prototype, particularly in meeting goals (b) and (c). There is a substantial degree of overlap in the subsystems required for many of the systems, so with careful design the modularity advantages of horizontal construction need not be sacrificed.

The SOI Data Management Plan identifies 27 systems categorized by the nature of the raw data to be processed (see the appendix to this note for the list). Many of these systems are, however, nearly identical in structure with one another.
For example, Systems 1 and 2, processing of the continuous full-disc Dopplergrams during the Dynamics and Extended Campaign periods respectively, differ only in the total length of the input data streams and consequently in the back-end combination of temporal power spectra (which is not even called out as a sub-system). Conversely, however, there are certain to be additional systems required for the processing of certain types of Campaign data that are not yet identified in the Plan.

Broadly speaking, the systems fall into the following 5 categories:

A. Full-disc helioseismic data          1, 2, 4, 5
   All these data require traditional Dynamics Program type processing, i.e. analysis into spherical harmonic oscillatory modes. Front-end processing differs according to observable; back-end processing differs according to length of observation.

B. Other full-disc data                 3, 6, 11
   May or may not be analyzed as oscillations, but early processing is similar to class A for the same observable. Back-end processing may be similar to either class A or class C. Synoptic analyses are a unique requirement.

C. High-resolution data                 7 - 10, 12
   Front-end processing as with A. Back-end processing unknown in many cases, but examples are known: ring-and-trumpet and Hankel-function analyses for seismology, correlation tracking for surface analysis.

D. Structure Program data               13 - 23
   Minimal front-end processing except for telemetry and distribution pre-processing common to all classes. Unique calibration problems poorly understood at present. Back-end processing similar to A.

E. Ancillary data                       24 - 27
   Unique requirements specific to instrument design and experiment operations structure. Minimal impact on overall throughput.

It is my judgement that development of class E systems is premature; in any case they are comparatively useless in providing subsystems of wider application.
It will, however, be necessary at some point to develop a prototype for system 24, the Velocity Calibration, in the process of instrument prototype analysis. This properly belongs within the scope of instrument development; some careful coordination of the two programs will be required, though, if the system is to be easily absorbed into operations.

Likewise, because the subsystems unique to class D are poorly defined at present, there is little sense in developing any of these systems as part of a prototype. Again, their contribution to the overall throughput is modest, so they would not be of much help in scaling. Since the back-end data sets for class A in particular are rather similar to these data sets, their places and special problems are not likely to be overlooked. In particular, many of the observables will be recreated as part of the analysis and calibration procedures of systems 1 and 4, the Dynamics Program data, so the prototyping of these systems would assure sufficient attention to class D systems at this time.

In deciding which of the remaining 12 systems should be prototyped, it is well to bear in mind that there are only two observables, velocity and line depth, common to most of them. Magnetic flux and continuum intensity measurements must also be made, of course, and we can expect more complex filtergram analyses, but they are of limited application and do not present any unique problems from the standpoint of data processing. Thus, I suggest that the prototype should include at least one system based on velocity and one on line depth. A third system might profitably be built using the magnetic flux as the observable.

As far as the back-end processing of Classes A-C is concerned, there are three distinct categories currently recognized: full-disc seismology, high-resolution seismology, and correlation tracking.
All of these are complex and essential, have unique problems and requirements, and at least in the latter two cases are not well-understood and present interesting problems and challenges. The full-disc helioseismology is essential if the Structure Program is not to be completely ignored in the prototype, and represents the case in which we can learn the most from the experience of others.

Thus I suggest as a strawman for the prototype that we construct the following systems:

1. Continuous Full Disk Velocity, Dynamics or Extended Campaign or some intermediate length TBD. (System 1 or 2)
2. High Resolution Line Intensity, Campaign without transverse velocities but with a spatial transform TBD suitable for seismology or tomography. (System 10)
3. High Resolution Magnetogram, Campaign, including transverse velocities and synoptic mapping. (System 12)

An alternative prototype of somewhat more limited but still sufficient scope would be to remove the third, magnetogram-based system and include the transverse velocity calculation in the second, line-depth based system.

It is still necessary to define the scope of the prototype in terms of the quantity, rate, and fidelity of the data to be processed, but those requirements must be analyzed in the context of the particular systems being designed.


APPENDIX

List of SOI Data Systems in Data Management Plan

 1. Continuous Full Disk Velocity, Dynamics (2 months)
 2. Continuous Full Disk Velocity, Extended Campaign (3 days repeated)
 3. Full Disk Velocity, Campaign (< 3 days)
 4. Continuous Full Disk Line Intensity, Dynamics (1 month)
 5. Continuous Full Disk Line Intensity, Extended Campaign (3 days repeated)
 6. Full Disk Line Intensity, Campaign (< 3 days)
 7. Continuous High Resolution Velocity, Dynamics (1 month)
 8. Continuous High Resolution Velocity, Extended Campaign (3 days repeated)
 9. High Resolution Velocity, Campaign (< 3 days)
10. High Resolution Line Intensity, Campaign (< 3 days)
11. Full Disk Magnetograms
12. High Resolution Magnetograms
13. Structure Program Mode Amplitudes
14. Structure Program 90*90 Velocity, Mapped
15. Structure Program 128*128 Continuum Intensity
16. Structure Program LOI Resolution Velocity
17. Structure Program LOI Resolution Line Depth
18. Structure Program LOI Resolution Velocity, Time Averages
19. Structure Program LOI Resolution Line Depth, Time Averages
20. Structure Program Limb Figure
21. Structure Program 128*128 Magnetic Proxy
22. Structure Program 256*256 Velocity, Time Averages
23. Structure Program 128*128 Tracking Subraster Velocity, Time Averages
24. Velocity Calibration
25. Ephemeris and Ancillary Data
26. SOI-MDI Housekeeping
27. SOHO Housekeeping
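
As a cross-check on the classification used in the body of this note, the following Python sketch encodes the five classes with the system numbers given in the text. It confirms that every one of the 27 systems falls in exactly one class and that setting aside classes D and E leaves 12 systems. The names CLASSES, class_of, remaining_after and STRAWMAN are illustrative only and are not part of the Data Management Plan; taking System 1 as the representative of "System 1 or 2" in the strawman is likewise an assumption made for the example.

```python
# Class membership as stated in the body of the note; system numbers
# refer to the appendix list. All names here are hypothetical.
CLASSES = {
    "A": [1, 2, 4, 5],          # full-disc helioseismic data
    "B": [3, 6, 11],            # other full-disc data
    "C": [7, 8, 9, 10, 12],     # high-resolution data
    "D": list(range(13, 24)),   # Structure Program data (13 - 23)
    "E": list(range(24, 28)),   # ancillary data (24 - 27)
}


def class_of(system):
    """Return the class letter for a system number."""
    for letter, members in CLASSES.items():
        if system in members:
            return letter
    raise ValueError(f"system {system} is not in any class")


def remaining_after(excluded):
    """Return the sorted system numbers left once the given classes are set aside."""
    return sorted(
        n
        for letter, members in CLASSES.items()
        if letter not in excluded
        for n in members
    )


# Strawman prototype systems; System 1 stands in for "System 1 or 2" (assumption).
STRAWMAN = [1, 10, 12]

if __name__ == "__main__":
    every_system = sorted(n for members in CLASSES.values() for n in members)
    print("all 27 systems classified once:", every_system == list(range(1, 28)))
    print("systems left without D and E:", len(remaining_after({"D", "E"})))
    print("classes covered by strawman:", sorted({class_of(n) for n in STRAWMAN}))
```

The last line of output shows which classes the strawman draws on directly; class B is represented only through the front-end and back-end processing it shares with classes A and C, as described in the class list.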