A SSSC User's Guide

All About Strategy Modules

Introduction

Strategy modules provide a mechanism for isolating programs from the calling interface, and in particular from input and output mechanisms. The reason for designing and using strategy modules is that the SSSC data processing pipeline does not operate in an ordinary shell environment. In order to assume full control of the resource allocations for input and output data availability, processing programs are executed in the pipeline by a special process known as a pe. Strategy modules are programs that can be run by a pe. Other interfaces to strategy modules are possible. For purposes of testing and development, there is a shell command-line interface to strategy modules. The command-line interface makes it possible for anyone to run any pipeline module outside the pipeline. There is also a CGI forms-based interface, developed solely for demonstration and testing purposes, which nonetheless works satisfactorily within certain limits. These interfaces are described in more detail in the section Running a Module below.

There are only a few key distinctions between a module and an ordinary main program. (The module interfaces are written in C and it is assumed that all modules are also written in C, at least at the top level.) Briefly, these are as follows:

There is no main() routine. In its place, the body of a module is declared as a subroutine with a fixed collection of arguments:
```
  int module_name (KEY *params, KEY **results,
    int (*history)(const char *fmt, ...),
    int (*errlog)(const char *fmt, ...)) {module body}
```
Because the external interface to all modules is identical, the module name must be connected to a function pointer contained in the external driver by the following statement external to any routine:
```
  int (*DoIt)() = module_name;
```
A global reference name of the module needs to be declared as strategy_name[]; it need not be the routine name itself. Example:
```
  char strategy_name[] = "My Module";
```
In place of the list char **args by which a shell communicates with a program as its child process, a module requires a declared argument list. This list, which is external to the module or any routine, declares the name, type, and default values of any parameters which may be passed to the module as arguments. A module cannot be exercised unless all its required arguments are somehow satisfied; extraneous arguments are ignored (at least by the existing interfaces). A sample arguments list:
```
  argument arguments[] = {
    {ARG_DATA_IN, "in", "", "", ""},
    {ARG_DATA_OUT, "out", "", "", ""},
    {ARG_STRING, "name", "Not Specified", "", ""},
    {ARG_FLOAT, "value", "0.0", "", ""},
    {ARG_END, "", "", "", ""}
  };
```
The first line and the last two lines in the above list are required; an indefinite number of optional arguments may be supplied.
The argument values are made available to the module through the keylist params in the module declaration. The module may communicate results to the calling routine through the keylist results. Appropriate functions are available for manipulating these keylists.
Input and output via the pointers stdin, stdout, and stderr should be avoided within a module, since there is no guaranteed connection, depending on the interface. Consequently, the functions printf() and scanf() should be replaced by generalized extensions which can be bound to appropriate I/O functions via the interface. There is no mechanism provided for stdin; input must be via the arguments list or by reads from declared file pointers. The function printf() is replaced by the two declared function pointers (*history)() and (*errlog)(), which offer the possibility of distinguishing the output channel (analogously to the use of stdout and stderr).

The following simple strategy module (from the CM directory src/examples) illustrates all the essential features of a module.

Example 1

					    /*  required .h inclusion  */
#include <module.h>

/*  required module name (comparable to argv[0] in shell environment)  */
  char strategy_name[] = "Hello World";

			       /*  list of required arguments, if any  */
  argument arguments[] = {
    {ARG_STRING,   "msg", "Hello, world!", "", ""}, 
    {ARG_END,      "", "", "", ""}
  };

  int hello_world (KEY *params, KEY **results,
    int (*history)(const char*, ...), int (*errlog)(const char*, ...)) {
			   /*  Put the message on the history channel  */
    (*history) ("%s\n", getkey_str (params, "msg"));
			      /*  The module is declared int: report success  */
    return 0;
  }

	  /*  required binding to external driver from user interface  */
  int (*DoIt)() = hello_world;
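
The way an external driver reaches a module through DoIt can be illustrated with a small stand-in. The following is only a sketch under stated assumptions, not the actual SSSC driver: KEY is left as an opaque placeholder type, and the names to_stdout, to_stderr, demo_module, and run_in_shell are hypothetical. It binds the history channel to stdout and the errlog channel to stderr, as the shell interface is described as doing by default. DoIt is given a full prototype here, rather than the empty parentheses used above, so that the sketch also compiles under newer C standards.

```c
#include <stdarg.h>
#include <stdio.h>

/* Opaque placeholder; the real KEY type is declared in module.h. */
typedef struct key KEY;

/* Hypothetical history channel: formatted output to stdout. */
int to_stdout(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vfprintf(stdout, fmt, ap);
  va_end(ap);
  return n;
}

/* Hypothetical errlog channel: formatted output to stderr. */
int to_stderr(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vfprintf(stderr, fmt, ap);
  va_end(ap);
  return n;
}

/* A toy module body with the fixed module signature. */
int demo_module(KEY *params, KEY **results,
    int (*history)(const char *fmt, ...),
    int (*errlog)(const char *fmt, ...)) {
  (void)params;   /* unused in this sketch */
  (void)results;  /* unused in this sketch */
  (*history)("%s\n", "Hello, world!");
  (*errlog)("note: %d message written\n", 1);
  return 0;
}

/* Binding of the module to the pointer the driver calls. */
int (*DoIt)(KEY *, KEY **,
    int (*)(const char *, ...),
    int (*)(const char *, ...)) = demo_module;

/* What a shell-style driver might do once arguments are parsed. */
int run_in_shell(void) {
  return (*DoIt)(NULL, NULL, to_stdout, to_stderr);
}
```

Because the module only ever sees the two function pointers, the same object code could be linked with a driver that sends the history channel to a file or a CGI response instead.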

The Arguments List

Every module must have an arguments list declared globally as in the example above. The arguments list is an array of structures defined as:

typedef struct ARG {
   int   kind;
   char *key;
   char *default_value;
   char *range;
   char *description;
} argument;

The first member of the structure, kind, declares the type of the associated parameter. The second member, key, is a key name associated with the parameter. Individual arguments are parsed by the driver program into a keylist of name/value associations according to rules appropriate for their kinds. The argument kind must be declared for all arguments, and unique key names must be declared for all arguments except those of type ARG_END. The remaining three members are optional, although allocations must be provided of course, even if they are only empty strings.
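
Since the array is terminated by an ARG_END element rather than carrying an explicit length, code that examines it walks the list until that marker is reached. The following sketch shows the idea; the numeric values given to the ARG_ constants are placeholders (the real ones come from module.h), and count_arguments and default_for are hypothetical helper names, not part of the SSSC libraries.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder values only; use the constants from module.h in real code. */
enum { ARG_END = 0, ARG_STRING, ARG_FLOAT, ARG_DATA_IN, ARG_DATA_OUT };

typedef struct ARG {
  int   kind;
  char *key;
  char *default_value;
  char *range;
  char *description;
} argument;

argument arguments[] = {
  {ARG_DATA_IN,  "in",    "",              "", ""},
  {ARG_DATA_OUT, "out",   "",              "", ""},
  {ARG_STRING,   "name",  "Not Specified", "", ""},
  {ARG_FLOAT,    "value", "0.0",           "", ""},
  {ARG_END,      "",      "",              "", ""}
};

/* Number of declared arguments, not counting the ARG_END marker. */
int count_arguments(const argument *list) {
  int n = 0;
  while (list[n].kind != ARG_END) n++;
  return n;
}

/* Default value declared for key, or NULL if key is not in the list. */
const char *default_for(const argument *list, const char *key) {
  for (; list->kind != ARG_END; list++) {
    if (strcmp(list->key, key) == 0) return list->default_value;
  }
  return NULL;
}
```

With the list above, count_arguments(arguments) gives 4 and default_for(arguments, "value") gives the string "0.0".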

Twelve different values for argument.kind are recognized. Rather than using their numerical values, it is better to use their associated defined constants.

ARG_INT ARG_INTS ARG_FLAG ARG_END
ARG_FLOAT ARG_FLOATS ARG_DATA_IN ARG_NUME
ARG_STRING ARG_TIME ARG_DATA_OUT ARG_FILEPTR

ARG_INT, ARG_FLOAT, ARG_STRING: These types should be self-explanatory. Arguments of type ARG_FLOAT are associated with double-precision floating-point values. If an invalid string is supplied as a value for an argument of type ARG_INT or ARG_FLOAT, the returned value will be the result of its parsing by the functions strtol() or strtod(), namely -2147483648 or NaN, respectively. This is actually a useful feature, since it allows the use of values forcing default actions. For example, if a parameter provides a way of manually overriding a normal data-dependent value, its default value would normally be specified as a string like "Not Specified". That way, any valid value could be used as the manual override; only with the default would the data-dependent action be taken.
ARG_INTS, ARG_FLOATS: These types declare arguments whose types are arrays of integers or double-precision floats rather than single values. The lengths of the arrays are arbitrary, determined by the number of comma-separated values of the appropriate type within the delimiter pair, which can be brackets [], braces {}, or parentheses (). An additional key named key_nvals is supplied to the module, giving the number of values in the array for the parameter key.
ARG_TIME: This is really a special case of ARG_FLOAT, in which the supplied value string is parsed according to the conventions of date-time representations discussed in SOI TN 94-116. The value returned is of type SOI_TIME (a double representing seconds since 1977.0 TAI, as described in that note).
ARG_FLAG: Arguments of this type are integers with a very restricted range, either two-valued (0, -1) or three-valued (0, ±1). The idea is that they can be represented on a shell command line as flag-type arguments, i.e. a single character preceded by a "-" (or "+") sign, and that further the flags can be concatenated, so that for example "-pdf" could be used to set the argument values corresponding to the flag keys p, d, and f all "true". (The pe, however, does not distinguish between arguments of type ARG_FLAG and ARG_INT.)
ARG_DATA_IN, ARG_DATA_OUT: These are special strings representing the names of collections of input or output data. They are parsed according to special rules that provide a large number of additional keywords that can be used in the selection of data sets or records. The distinction between input and output data descriptors is only meaningful to the pe, which must assure availability of input data sets and must allocate space for output data sets before processing the module. The pe will also copy the history file produced by the module and the generating map file into all output data sets. (There is an older general type ARG_DATASET, but it is not allowed by the pe and its use elsewhere is consequently to be discouraged.)
ARG_END: The ARG_END type is a special marker used to denote the last element of the arguments array, whose length is otherwise arbitrary. One element of its type is required. The values of the remaining structure members for this argument are ignored.
ARG_NUME: This is a new argument type not yet fully implemented (nor documented). It is intended to represent an enumerated class (corresponding to the C type enum) of arbitrary values.
ARG_FILEPTR: This is an argument type which can refer either to a named file (to be opened as a file pointer) or a standard file pointer stdin or stdout, represented as "<" or ">". It was intended for constructing shell-style pipelines of modules, but it has not been implemented outside the shell. It is deprecated.
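
The array syntax described for ARG_INTS can be made concrete with a small parser. This is a sketch of the stated rule only (comma-separated integers inside a matching pair of brackets, braces, or parentheses), not the driver's own code; parse_ints is a hypothetical name, and the treatment of an empty list as malformed is an assumption of the sketch. The returned count corresponds to what a module would find under the key_nvals key.

```c
#include <stdlib.h>

/* Parse text such as "[1, 2, 3]" into out[], holding at most max values.
   Returns the number of values found, or -1 if the text is malformed. */
int parse_ints(const char *text, long *out, int max) {
  char close;
  switch (text[0]) {
    case '[': close = ']'; break;
    case '{': close = '}'; break;
    case '(': close = ')'; break;
    default:  return -1;            /* no opening delimiter */
  }
  const char *p = text + 1;
  int n = 0;
  for (;;) {
    char *end;
    long v = strtol(p, &end, 10);   /* skips leading white space */
    if (end == p) return -1;        /* no digits where a value was expected */
    if (n == max) return -1;        /* more values than the caller allows */
    out[n++] = v;
    p = end;
    while (*p == ' ') p++;
    if (*p == ',') { p++; continue; }
    if (*p == close && p[1] == '\0') return n;
    return -1;                      /* wrong or missing closing delimiter */
  }
}
```

For the value string "[1, 2,3]" the function stores 1, 2, and 3 and returns 3; a mismatched pair such as "(1,2]" is rejected.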

A simple module that merely echoes the parsed values of its arguments whose types cover most of the supported ones can be run here: ecco
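
The default-action idiom described under ARG_FLOAT (a non-numeric default such as "Not Specified" that parses to NaN) might be used inside a module along the following lines. This is a sketch under assumptions: parse_or_nan stands in for whatever the driver actually does with an invalid string, and both function names are hypothetical. Note that strtod() by itself returns 0 when no conversion is possible, so the stand-in checks the end pointer to produce NaN.

```c
#include <math.h>
#include <stdlib.h>

/* Stand-in for the driver's parsing: NaN unless the whole string is a number. */
double parse_or_nan(const char *text) {
  char *end;
  double v = strtod(text, &end);
  if (end == text || *end != '\0') return NAN;
  return v;
}

/* Use the manual override if one was given, else the data-dependent value. */
double effective_value(const char *text, double data_dependent) {
  double v = parse_or_nan(text);
  return isnan(v) ? data_dependent : v;
}
```

With this arrangement every valid number, including 0.0, remains available as a manual override, and only the unparsable default selects the data-dependent value.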

Compilation

As should be evident from the introduction, simply compiling a strategy module source file will not produce an executable image, merely an object module. That object module must be linked with at least one driver program (the "main" program), and may in fact be linked with several to provide different target executables appropriate for the different environments.

To compile a module it is generally sufficient to include the standard SSSC headers contained in the directory /CM/include in the path, along with the system-supplied headers. These include module.h which has all the declarations necessary for a general module, plus many others useful for specific analysis applications as described in the relevant man pages.

$ cc -c mymodule.c -I/CM/include -I/usr/include

N.B. On Solaris, you must compile with gcc rather than with /usr/ucb/cc, because the function declarations use (const char *).

To load an executable, you must link the resulting object module mymodule.o with the appropriate driver and libraries. If you have MACHINE defined to one of the appropriate names for supported architectures (e.g. sgi, sgi4, sol, linux; see the environment guide), then the required libraries are found in /CM/lib/_$MACHINE, and the various compiled driver programs reside in /CM/src/cmd/_$MACHINE.b (with a couple of exceptions discussed below). To create a module that can be run in the shell environment, the driver program is called main.o, and the simplest required load command is:

$ cc -o $BIN/mymodule mymodule.o /CM/src/cmd/_$MACHINE.b/main.o \
  -L/CM/lib/_$MACHINE -lsoi

All of the essential libraries for processing modules, including the handling of key lists and data structures in a generalized form, are contained in the libsoi.a library. Additional libraries such as the system math library libm.a and instrument-specific functions such as those in libMDI.a might also be needed of course, depending on the application and the functions called.

$ cc -o $CGIBIN/mymodule mymodule.o /CM/src/cmd/_$MACHINE.b/wpgen.o \
  -L/CM/lib/_$MACHINE -lsoi
$ cc -o $CGIBIN/mymodule_cgi mymodule.o /CM/src/cmd/_$MACHINE.b/main_cgi.o \
  -L/CM/lib/_$MACHINE -lsoi

$ cc -o $BIN/mymodule_svc mymodule.o /CM/src/cmd/_$MACHINE.b/main_svc.o \
  /CM/src/pipe/_$MACHINE.b/logkey.o -L/CM/pvm3/lib/$ARCH -lpvm3 \
  -L/CM/lib/_$MACHINE -lpe -lsoi

Running a Module

As mentioned at the outset there are, by design, several ways to actually invoke a strategy module. One mechanism, designed for operation in a processing pipeline, is the pe process. A command-line interface allows both interactive and scripted invocation of a strategy module, and an experimental CGI forms interface allows interactive remote invocation of modules.

The shell command-line interface to strategy modules is straightforward. If the executable image has been named myprog, it is invoked from the command line by typing [PATH/]myprog with a list of flags and arguments as appropriate. Argument values (except flags) are given by the key name immediately followed by an "=" sign (no white space), followed by the argument value, which may be optionally separated by white space from the "=" sign. (Strings with embedded white space must be quoted of course, and special symbols of lexical significance to the shell, such as "*" and "[", must be escaped.) In this environment the history channel defaults to stdout and the errlog channel to stderr unless otherwise declared through the special history and/or errlog arguments. Example module 1 would be run as follows, with the output shown:

$ hello_world
Hello, world!
$ hello_world msg= "hi there"
hi there

There is a full description in the man page module.
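
The key=value rule stated above (no white space before the "=" sign, optional white space after it) can be sketched as a small tokenizer over the shell's argument vector. This illustrates the rule as described here and is not the driver's own parser; next_keyval is a hypothetical name, and flag arguments are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Examine argv[i] as "key=value" or as "key=" followed by a separate value.
   On success, copies the key into key[] and points *value at the value.
   Returns the number of argv entries consumed (1 or 2), or 0 if argv[i]
   is not of the form key=... or the key does not fit in key[]. */
int next_keyval(int argc, char **argv, int i,
    char *key, size_t keysize, const char **value) {
  const char *eq = strchr(argv[i], '=');
  if (eq == NULL || eq == argv[i]) return 0;    /* no "=" or empty key */
  size_t n = (size_t)(eq - argv[i]);
  if (n >= keysize) return 0;                   /* key too long for buffer */
  memcpy(key, argv[i], n);
  key[n] = '\0';
  if (eq[1] != '\0') {                          /* value follows "=" directly */
    *value = eq + 1;
    return 1;
  }
  if (i + 1 < argc) {                           /* value is the next token */
    *value = argv[i + 1];
    return 2;
  }
  *value = "";                                  /* trailing "key=" with no value */
  return 1;
}
```

For the second invocation shown above, the shell delivers msg= and hi there as two separate entries; the sketch yields the key msg with the value hi there and reports that two entries were consumed.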

A pe process is itself invoked interactively or from a script. It reads information from a designated map file of a specific format containing the name of the module to be run, values for the arguments, and certain other information needed to parse dataset names and to control interaction with the associated services of the DSDS.

history=history \
logfile=logfile \
msg=hi there \
p=hello_world d=0 a=t5

For a fuller description, see the man page pe.

Page last revised Thursday, 29-Nov-2001 10:33:29 PST
Please address comments and questions to Rick Bogart, Jima Aloise, or Phil Scherrer.

