A SSSC User's Guide
All About Strategy Modules
Contents
- Introduction
- The Arguments List
- Compilation
- Running a Module
Introduction
Strategy modules provide a mechanism for isolating programs from the
calling interface, and in particular from input and output mechanisms.
The reason for designing and using strategy modules is that the SSSC
data processing pipeline does not operate in an ordinary shell environment.
In order to assume full control of the resource allocations for input
and output data availability, processing programs are executed in the
pipeline by a special process known as a pe. Strategy modules are
programs that can be run by a pe. Other interfaces to strategy
modules are possible. For purposes of testing and development, there is
a shell command-line interface to strategy modules. The command-line
interface makes it possible for anyone to run any pipeline
module outside the pipeline. There is also a CGI forms-based interface,
developed solely for demonstration and testing purposes, which nonetheless
works satisfactorily within certain limits. These interfaces are described
in more detail in the section
Running a Module below.
There are only a few key distinctions between a module and an ordinary
main program. (The module interfaces are written in C and it is
assumed that all modules are also written in C, at least at the
top level.) Briefly, these are as follows:
- There is no main() routine. In its place, the body of a module
is declared as a subroutine with a fixed collection of arguments:
int module_name (KEY *params, KEY **results,
int (*history)(const char *fmt, ...),
int (*errlog)(const char *fmt, ...)) {module body}
Because the external interface to all modules is identical, the module name
must be connected to a function pointer contained in the external driver by
the following statement external to any routine:
int (*DoIt)() = module_name;
- A global reference name of the module needs to be declared as
strategy_name[]; it need not be the routine name itself. Example:
char strategy_name[] = "My Module";
- In place of the list char **args by which a shell communicates
with a program as its child process, a module requires a declared argument
list. This list, which is external to the module or any routine, declares
the name, type, and default values of any parameters which may be passed
to the module as arguments. A module cannot be exercised unless all its
required arguments are somehow satisfied; extraneous arguments are ignored
(at least by the existing interfaces). A sample arguments list:
argument arguments[] = {
{ARG_DATA_IN, "in", "", "", ""},
{ARG_DATA_OUT, "out", "", "", ""},
{ARG_STRING, "name", "Not Specified", "", ""},
{ARG_FLOAT, "value", "0.0", "", ""},
{ARG_END, "", "", "", ""}
};
The first line and the last two lines in the above list are required; an
indefinite number of optional arguments may be supplied.
The argument values are made available to the module through the keylist
params in the module declaration. The module may communicate
results to the calling routine through the keylist results.
Appropriate functions are available for manipulating these keylists.
- Input and output via the pointers stdin, stdout, and
stderr should be avoided within a module, since there is no
guaranteed connection, depending on the interface. Consequently, the
functions printf() and scanf() should be replaced by generalized
extensions which can be bound to appropriate I/O functions via the
interface. There is no mechanism provided for stdin; input
must be via the arguments list or by reads from declared file pointers.
The function printf() is replaced by the two declared function
pointers (*history)() and (*errlog)(), which offer the
possibility of distinguishing the output channel (analogously to the
use of stdout and stderr).
The following simple strategy module (from the CM directory src/examples)
illustrates all the essential features of a module.
Example 1
/* required .h inclusion */
#include <module.h>
/* required module name (comparable to argv[0] in shell environment) */
char strategy_name[] = "Hello World";
/* list of required arguments, if any */
argument arguments[] = {
{ARG_STRING, "msg", "Hello, world!", "", ""},
{ARG_END, "", "", "", ""}
};
int hello_world (KEY *params, KEY **results,
int (*history)(const char*, ...), int (*errlog)(const char*, ...)) {
/* Put the message on the history channel */
(*history) ("%s\n", getkey_str (params, "msg"));
/* the module is declared int, so return a status (0 assumed to mean success) */
return 0;
}
/* required binding to external driver from user interface */
int (*DoIt)() = hello_world;
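To make the role of the DoIt binding and the two output channels more concrete, the following is a minimal sketch of what a shell-style driver might do after parsing its arguments. It is not the actual SSSC driver: the KEY type is an opaque stand-in for the one in module.h, and history_to_stdout, errlog_to_stderr, demo_module, and run_module are hypothetical names introduced only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the KEY type normally supplied by module.h. */
typedef struct KEY KEY;

/* History channel bound to stdout, as described for the shell interface. */
int history_to_stdout(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* Error-log channel bound to stderr. */
int errlog_to_stderr(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}

/* A module body in the documented form; it ignores its keylists here. */
int demo_module(KEY *params, KEY **results,
                int (*history)(const char *fmt, ...),
                int (*errlog)(const char *fmt, ...)) {
    (void)params;
    (void)results;
    (*history)("module running\n");
    (*errlog)("no problems to report\n");
    return 0;   /* assumed convention: zero means success */
}

/* The binding the driver looks for. The guide declares DoIt with an empty
   parameter list; the full prototype is written out here so that the sketch
   also compiles under newer C standards. */
int (*DoIt)(KEY *, KEY **, int (*)(const char *, ...),
            int (*)(const char *, ...)) = demo_module;

/* What a driver might do once the arguments have been parsed:
   bind the channels and call the module through the pointer. */
int run_module(void) {
    KEY *results = NULL;
    return (*DoIt)(NULL, &results, history_to_stdout, errlog_to_stderr);
}
```

Because the module only ever sees the two function pointers, a different driver could bind the same channels to log files or to an HTML page without any change to the module source.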
The Arguments List
Every module must have an arguments list declared globally as in the
example above.
The arguments list is an array of structures defined as:
typedef struct ARG {
int kind;
char *key;
char *default_value;
char *range;
char *description;
} argument;
The first member of the structure, kind, declares the type of
the associated parameter.
The second member, key, is a key name associated with the parameter.
Individual arguments are parsed by the driver program into a keylist
of name/value associations according to rules appropriate for their kinds.
The argument kind must be declared for all arguments, and unique key names
must be declared for all arguments except those of type ARG_END.
The remaining three members are optional, although allocations must be
provided of course, even if they are only empty strings.
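The rules above (a terminating ARG_END entry, and a unique key for every other entry) can be checked mechanically. The sketch below repeats the structure definition from this guide and walks a sample list. The numeric codes in the enum are placeholders chosen for the example (the real values come from module.h), and count_arguments, keys_are_unique, and default_for are hypothetical helpers, not part of the SSSC libraries.

```c
#include <stddef.h>
#include <string.h>

/* Structure as given in this guide. */
typedef struct ARG {
    int kind;
    char *key;
    char *default_value;
    char *range;
    char *description;
} argument;

/* Placeholder numeric codes; the real ones are defined in module.h. */
enum { ARG_END = 0, ARG_STRING = 1, ARG_FLOAT = 2 };

/* A sample list in the documented form. */
argument sample_arguments[] = {
    {ARG_STRING, "name",  "Not Specified", "", ""},
    {ARG_FLOAT,  "value", "0.0",           "", ""},
    {ARG_END,    "",      "",              "", ""}
};

/* Count the entries that precede the ARG_END marker. */
size_t count_arguments(const argument *list) {
    size_t n = 0;
    while (list[n].kind != ARG_END)
        n++;
    return n;
}

/* Return 1 if every key before ARG_END is non-empty and unique, else 0. */
int keys_are_unique(const argument *list) {
    size_t n = count_arguments(list);
    for (size_t i = 0; i < n; i++) {
        if (list[i].key[0] == '\0')
            return 0;
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(list[i].key, list[j].key) == 0)
                return 0;
    }
    return 1;
}

/* Look up the default value string declared for a key, or NULL if absent. */
const char *default_for(const argument *list, const char *key) {
    for (size_t i = 0; list[i].kind != ARG_END; i++)
        if (strcmp(list[i].key, key) == 0)
            return list[i].default_value;
    return NULL;
}
```

For the sample list, count_arguments reports two entries and default_for(sample_arguments, "name") yields the string "Not Specified".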
Twelve different values for argument.kind are recognized. Rather
than using their numerical values, it is better to use their associated
defined constants.
ARG_INT    | ARG_INTS   | ARG_FLAG     | ARG_END
ARG_FLOAT  | ARG_FLOATS | ARG_DATA_IN  | ARG_NUME
ARG_STRING | ARG_TIME   | ARG_DATA_OUT | ARG_FILEPTR
- ARG_INT, ARG_FLOAT, ARG_STRING
- These types should be self-explanatory. Arguments of type ARG_FLOAT
are associated with double-precision floating-point values. If an invalid
string is supplied as a value for an argument of type ARG_INT or ARG_FLOAT,
the returned value will be the result of its parsing by the functions
strtol() or strtod(), namely -2147483648 or NaN, respectively.
This is actually a useful feature, since it allows the use of values
forcing default actions. For example, if a parameter provides a way
of manually overriding a normal data-dependent value, its default value
would normally be specified as a string like "Not Specified". That way,
any valid value could be used as the manual override; only with the
default would the data-dependent action be taken.
- ARG_INTS, ARG_FLOATS
- These types declare arguments whose types are arrays of integers
or double-precision floats rather than single values. The lengths of
the arrays are arbitrary, determined by the number of comma separated
values of the appropriate type within the delimiter pair, which can be
brackets [], braces {}, or parentheses (). An
additional key named key_nvals is supplied to the module, giving
the number of values in the array for the parameter key.
- ARG_TIME
- This is really a special case of ARG_FLOAT, in which the supplied
value string is parsed according to the conventions of date-time
representations discussed in SOI TN 94-116. The value returned is of
type SOI_TIME (a double representing seconds since 1977.0 TAI, as
described in that note).
- ARG_FLAG
- Arguments of this type are integers with a very restricted range, either
two-valued (0, -1) or three-valued (0, ±1). The idea is that they
can be represented on a shell command line as flag-type arguments, i.e.
a single character preceded by a "-" (or "+") sign, and that further the
flags can be concatenated, so that for example "-pdf" could be used to
set the argument values corresponding to the flag keys p, d, and
f all "true". (The pe, however, does not distinguish
between arguments of type ARG_FLAG and ARG_INT.)
- ARG_DATA_IN, ARG_DATA_OUT
- These are special strings representing the names of collections of
input or output data. They are parsed according to special rules that
provide a large number of additional keywords that can be used in the
selection of data sets or records.
The distinction between input and output data descriptors is only
meaningful to the pe, which must assure availability of input
data sets and must allocate space for output data sets before processing
the module. The pe will also copy the history file
produced by the module and the generating map file into all output
data sets. (There is an older general type ARG_DATASET, but it is
not allowed by the pe and its use elsewhere is consequently
to be discouraged.)
- ARG_END
- The ARG_END type is a special marker used to denote the last element
of the arguments array, whose length is otherwise arbitrary. One element
of its type is required. The values of the remaining structure members for
this argument are ignored.
- ARG_NUME
- This is a new argument type not yet fully implemented (nor documented).
It is intended to represent an enumerated class (corresponding to the C
type enum) of arbitrary values.
- ARG_FILEPTR
- This is an argument type which can refer either to a named file
(to be opened as a file pointer) or a standard file pointer stdin
or stdout, represented as "<" or ">". It was intended for
constructing shell-style pipelines of modules, but it has not been
implemented outside the shell. It is deprecated.
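The array syntax described for ARG_INTS and ARG_FLOATS (comma-separated values inside a matching pair of brackets, braces, or parentheses) can be illustrated with a small parser. This is only a sketch of the stated rules, not the driver's actual code: parse_float_array is a hypothetical function, its return value plays the role of the key_nvals entry, and its treatment of malformed or empty lists (returning -1) is an assumption made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Return the closing delimiter that matches an opening one, or 0. */
static char closer_for(char open) {
    switch (open) {
    case '[': return ']';
    case '{': return '}';
    case '(': return ')';
    default:  return 0;
    }
}

/*
 * Parse text such as "[1.5, 2, -3e2]" into out[], which has room for max
 * values. Returns the number of values stored (the role of key_nvals),
 * or -1 if the text is malformed or holds more than max values.
 */
int parse_float_array(const char *text, double *out, int max) {
    char close = closer_for(text[0]);
    if (close == 0)
        return -1;                 /* no opening delimiter */
    const char *p = text + 1;
    int n = 0;
    for (;;) {
        char *end;
        double v = strtod(p, &end);   /* skips leading white space */
        if (end == p)
            return -1;             /* no number where one was expected */
        if (n == max)
            return -1;             /* caller's buffer is too small */
        out[n++] = v;
        p = end;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == ',') {           /* another value follows */
            p++;
            continue;
        }
        if (*p == close && p[1] == '\0')
            return n;              /* matching delimiter ends the list */
        return -1;                 /* wrong or missing closing delimiter */
    }
}
```

With this sketch, "[1.5, 2, -3e2]" yields three values, while "[1,2)" is rejected because the delimiters do not match.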
A simple module, ecco, merely echoes the parsed values of its arguments,
whose types cover most of the supported ones; it is available to run
for experimentation.
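The "Not Specified" idiom described under ARG_FLOAT above, in which an unparseable default signals that the data-dependent value should be used, might look like the following inside a module. This is a sketch under stated assumptions: parse_float_or_nan and effective_value are hypothetical helpers, and the conversion to NaN is done explicitly here rather than relying on the driver, so the example stands on its own.

```c
#include <math.h>
#include <stdlib.h>

/*
 * Convert a value string to double. A string that is not entirely a valid
 * number yields NaN, mirroring the behavior this guide describes for
 * arguments of type ARG_FLOAT.
 */
double parse_float_or_nan(const char *text) {
    char *end;
    double v = strtod(text, &end);
    if (end == text || *end != '\0')
        return NAN;
    return v;
}

/*
 * Choose between a manual override and a data-dependent value: the
 * data-dependent value is used only when the override did not parse.
 */
double effective_value(const char *override_text, double from_data) {
    double v = parse_float_or_nan(override_text);
    return isnan(v) ? from_data : v;
}
```

Because NaN is not a valid override, every ordinary number (including 0.0) remains available as a manual setting, which is the point of using a non-numeric default string.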
Compilation
As should be evident from the introduction, simply compiling a strategy
module source file will not produce an executable image, merely an
object module. That object module must be linked with at least one
driver program (the "main" program), and may in fact be
linked with several to provide different target executables appropriate
for the different environments.
To compile a module it is generally sufficient to include the standard
SSSC headers contained in the directory /CM/include in the path,
along with the system-supplied headers. These include module.h
which has all the declarations necessary for a general module, plus
many others useful for specific analysis applications as described
in the relevant man pages.
$ cc -c mymodule.c -I/CM/include -I/usr/include
N.B. On Solaris, you must compile with gcc rather than
with /usr/ucb/cc, because the function declarations use
(const char *).
To load an executable, you must link the resulting object module
mymodule.o with the appropriate driver and libraries. If
you have MACHINE defined to one of the appropriate
names for supported architectures (e.g. sgi, sgi4, sol, linux;
see the environment guide),
then the required libraries are found in /CM/lib/_$MACHINE,
and the various compiled driver programs reside in
/CM/src/cmd/_$MACHINE.b
(with a couple of exceptions discussed below). To create a module
that can be run in the shell environment, the driver program is
called main.o, and the simplest required load command is:
$ cc -o $BIN/mymodule mymodule.o /CM/src/cmd/_$MACHINE.b/main.o \
-L/CM/lib/_$MACHINE -lsoi
All of the essential libraries for processing modules, including the
handling of key lists and data structures in a generalized form,
are contained in the libsoi.a library. Additional libraries such as
the system math library libm.a and instrument-specific
functions such as those in libMDI.a might also be needed of course,
depending on the application and the functions called.
$ cc -o $CGIBIN/mymodule mymodule.o /CM/src/cmd/_$MACHINE.b/wpgen.o \
-L/CM/lib/_$MACHINE -lsoi
$ cc -o $CGIBIN/mymodule_cgi mymodule.o /CM/src/cmd/_$MACHINE.b/main_cgi.o \
-L/CM/lib/_$MACHINE -lsoi
$ cc -o $BIN/mymodule_svc mymodule.o /CM/src/cmd/_$MACHINE.b/main_svc.o \
/CM/src/pipe/_$MACHINE.b/logkey.o -L/CM/pvm3/lib/$ARCH -lpvm3 \
-L/CM/lib/_$MACHINE -lpe -lsoi
Running a Module
As mentioned at the outset there are, by design, several ways to actually
invoke a strategy module. One mechanism, designed for operation in a
processing pipeline, is the pe process. A command-line
interface allows both interactive and scripted invocation of a strategy
module, and an experimental CGI forms interface allows interactive remote
invocation of modules.
The shell command-line interface to strategy modules is straightforward.
If the executable image has been named myprog, it is invoked from
the command line by typing [PATH/]myprog with a list of flags
and arguments as appropriate. Argument values (except flags) are given
by the key name immediately followed by an "=" sign (no white space),
followed by the argument value, which may be optionally separated by
white space from the "=" sign. (Strings with embedded white space must
be quoted of course, and special symbols of lexical significance to the
shell, such as "*" and "[", must be escaped.) In this environment the
history channel defaults to stdout and the errlog channel to
stderr unless otherwise declared through the special
history and/or errlog arguments.
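The key=value convention just described, including the case where the value is separated from the "=" sign by white space and so arrives as the next shell token, can be sketched as follows. This is an illustration of the stated rule only, not the parser used by the real driver: key_value, MAX_KEY, and collect_pairs are hypothetical names, and tokens without an "=" sign (such as flags) are simply skipped here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEY 64   /* arbitrary limit on key length for this sketch */

typedef struct {
    char key[MAX_KEY];
    const char *value;
} key_value;

/*
 * Collect key=value pairs from argv[1..argc-1] into out[], which has room
 * for max pairs. A token ending in "=" takes its value from the following
 * token. Tokens without "=" (flags, for example) are skipped.
 * Returns the number of pairs stored, or -1 on a malformed command line.
 */
int collect_pairs(int argc, char **argv, key_value *out, int max) {
    int n = 0;
    for (int i = 1; i < argc; i++) {
        const char *eq = strchr(argv[i], '=');
        if (eq == NULL)
            continue;                      /* not a key=value token */
        size_t len = (size_t)(eq - argv[i]);
        if (len == 0 || len >= MAX_KEY || n == max)
            return -1;
        memcpy(out[n].key, argv[i], len);
        out[n].key[len] = '\0';
        if (eq[1] != '\0') {
            out[n].value = eq + 1;         /* key=value in one token */
        } else if (i + 1 < argc) {
            out[n].value = argv[++i];      /* key= value in two tokens */
        } else {
            return -1;                     /* key= with nothing after it */
        }
        n++;
    }
    return n;
}
```

Given the tokens of the command line `hello_world msg= "hi there"`, the sketch produces one pair with key "msg" and value "hi there", since the shell has already removed the quotes.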
Example module 1 would be run as follows, with the output shown:
$ hello_world
Hello, world!
$ hello_world msg= "hi there"
hi there
There is a full description in the man page
module.
A pe process is itself invoked interactively or from a script.
It reads information from a designated map file of a specific format
containing the name of the module to be run, values for the arguments,
and certain other information needed to parse dataset names and to control
interaction with the associated services of the DSDS.
history=history \
logfile=logfile \
msg=hi there \
p=hello_world d=0 a=t5
For a fuller description, see the man page
pe.
Page last revised
Thursday, 29-Nov-2001 10:33:29 PST
Please address comments and questions to Rick Bogart, Jima Aloise, or
Phil Scherrer.