OPEN directive 

Opens files.

 

No options

 

Parameters

NAME = texts External names of the files

CHANNEL = scalars Channel number to be used to refer to each file in other statements (numbers for each type of file are independent); if this is set to a scalar containing a missing value, the first available channel of the specified type is opened and the scalar is set to the channel number

FILETYPE = strings Type of each file (input, output, unformatted, backingstore, procedurelibrary, graphics); default input

WIDTH = scalars Maximum width of a record in each file; if omitted, 80 is assumed for input files, the full line-printer width (usually 132) for output files

INDENTATION = scalars Number of spaces to leave at the start of each line; default 0

PAGE = scalars Number of lines per page (relevant only for output files)

ACCESS = strings Allowed type of access (readonly, writeonly, both); default both

 

Description

Genstat makes use of various types of file. These are classified according to the information that they store. The files are accessed via channels. For each type there is a set of numbered channels that can be used to reference different files in the relevant directives. For example, there are five input channels, numbered 1 up to 5. Likewise, there are five output channels. Genstat distinguishes between the different types of channel, so you can have one file attached to output channel 3 and a different file simultaneously attached to backing store channel 3. Then, setting the option CHANNEL=3 in PRINT and STORE statements will send the different kinds of output to the appropriate files. With backing-store files, there are six channels, numbered 0 to 5, but channel 0 is reserved for the backing-store workfile. Similarly, there are six channels, numbered 0 to 5, for unformatted files. For procedure libraries there are three channels, numbered 1 to 3. For graphics files, each channel is used for output in a particular graphics format, corresponding to the number of the device selected by the DEVICE directive.

When you run Genstat it starts taking input from input channel 1 and produces output on output channel 1. In an interactive run, these will be keyboard and screen, while in a batch run they will be files on the computer. Another file that is attached automatically is the start-up file of instructions that are executed at the outset of each job; this is attached to input channel 5. The start-up file may attach other files. For example, if you are working interactively, the standard start-up file arranges for output channel 5 to store a transcript of your output. (This is done using the COPY directive.) The command that you use to run Genstat may allow you to arrange for other files to be attached when Genstat starts running. Alternatively, within Genstat, you can use the OPEN directive.

Usually you need specify only the name of the file, the channel number and type of file, and leave the other parameters to take their default settings. For example, the following statements attach a file called WEATHER.DAT to the second input channel, and then read data from it.

OPEN 'WEATHER.DAT'; CHANNEL=2; FILETYPE=input

READ [CHANNEL=2] Rain,Temperature,Sunshine

The filename can be anything that is acceptable to your computer system. You should, however, check for any constraints: for example, plotting software may require HPGL graphics files to have the extension .HPGL. You should check in your local documentation for information regarding any features that are specific to your computer or version of Genstat. For example, logical or symbolic names may be automatically translated by Genstat before files are accessed; upper and lower case characters may be significant, as on Unix systems. The filename may involve characters that have special meaning within Genstat. For example, the character \ may be required to specify directories and sub-directories on a PC. This character needs to be duplicated in a string to avoid Genstat interpreting it as the continuation symbol: for example

OPEN 'C:\\RES\\WEATHER.DAT'; CHANNEL=2; FILETYPE=input

to open the file 'C:\RES\WEATHER.DAT'. As a more convenient alternative, the PC version of Genstat allows you to use / instead.

You are free to choose which channels you want to use (within the range available for the specified type of file), apart from input and output channel 1 which are "reserved" for use by the files specified on the command line. As already mentioned, input channel 5 is used for the start-up file, and this may arrange for output channel 5 to store a transcript of your output. However, you can use the CLOSE directive to disconnect these files if you want to use the channels for some other purpose. The backing-store and unformatted work files are attached to channel 0, and this channel cannot be used in OPEN or CLOSE. Graphics files must be opened on the channel corresponding to the device number.

Obviously you cannot open more than one file on a channel, so if you wish to open a file on a channel that is currently in use you must first close that channel. Sometimes, in general programs or procedures, you may not know which channels are available. You can then let OPEN find a free channel: if CHANNEL is set to a scalar containing a missing value, the file is opened on the next available channel of the appropriate type, and the scalar is set to the number of the channel. The scalar need not be declared in advance; if CHANNEL is set to an undeclared structure, this will be defined as a scalar automatically.

SCALAR FreeChan

OPEN 'WEATHER.DAT'; CHANNEL=FreeChan; FILETYPE=input

READ [CHANNEL=FreeChan] Rain,Temperature,Sunshine
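
The channel-finding rule described above can be modelled outside Genstat. The following Python sketch is an illustration only, not part of Genstat: the function name open_file, the use of a dictionary as the channel table, and the choice of the lowest-numbered free channel as the "next available" one are all assumptions made for the example.

```python
import math

MISSING = math.nan  # stand-in for a Genstat missing value (assumption)


def open_file(table, name, channel=MISSING, lo=1, hi=5):
    """Attach `name` to a channel in `table` (dict: channel number -> file name).

    If `channel` is missing, the lowest free channel in lo..hi is used and
    returned, mimicking the way OPEN sets the CHANNEL scalar.
    """
    if name in table.values():
        raise ValueError(f"{name} is already open on another channel")
    if isinstance(channel, float) and math.isnan(channel):
        free = [c for c in range(lo, hi + 1) if c not in table]
        if not free:
            raise RuntimeError("no free channel of this type")
        channel = free[0]
    elif channel in table:
        raise ValueError(f"channel {channel} is in use; close it first")
    table[channel] = name
    return channel


# Input channels 1 and 5 are already taken, as in a default interactive run.
input_channels = {1: "<keyboard>", 5: "<start-up file>"}
free_chan = open_file(input_channels, "WEATHER.DAT")
```

Here free_chan ends up as 2, the first input channel not already in use; a second attempt to open WEATHER.DAT is rejected, in line with the constraint that a file cannot be open on two channels at once.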

Another constraint is that you cannot open the same file on more than one channel at once.

Input files must already exist when they are opened, whereas output files will be created by Genstat. If an output file with the specified name exists already, Genstat may create an extra "version" of the file, or report a fault, or cause the file to be overwritten, depending on the usual conventions on your type of computer. Your local documentation will describe what rules apply in this situation, and should also explain if there are any system variables you can set to control this action.

When you open a file for use by backing store or unformatted input and output, you can both read from it and send output to it, unless you set the ACCESS parameter (see below). Procedure libraries are a special type of backing-store file.

The WIDTH parameter sets the maximum number of characters per line for input and output files. It is ignored for other types of file. The default values for WIDTH are designed to be appropriate for each implementation of Genstat and may differ between input and output; details will be found in your local documentation. For input and output with screen displays that use windows, WIDTH may be set automatically from the size of the appropriate window. For input files the default is normally 80, reflecting the size of most screen displays. You can change this if necessary, to read either fewer characters from each line, or longer lines. If WIDTH is set too small, any extra characters will be lost, which may cause unexpected action or syntax errors. Remember that if you use READ with LAYOUT=fixed to read fixed-format data, short lines are extended with spaces up to the WIDTH setting. If you want to read data from a file with, say, 64 characters per line, setting WIDTH=64 when you open the file may make the format specification easier (rather than taking the default width of 80 and having to remember to skip 16 characters at the end of each line).

For output files, the default is the largest number of characters that can usually be displayed in a single line. This number is typically 80 for terminals but for files it is likely to be either 80, 120 or 132, depending on the type of computer. You can use the WIDTH parameter to restrict the number of output characters to a smaller number, or to a larger number up to 200. The statement

HELP environment,channel

can be used to obtain information about current width settings for input and output channels.

The PAGE parameter specifies the size of page in output, affecting directives like GRAPH, QUESTION, and HELP. For output to files, the default value of PAGE is designed to be suitable for printers. For windowed displays Genstat will, if possible, detect the size of the window and set the page size appropriately. You can also set option OUTPRINT=page in either JOB or SET to ensure that graphs and statistical analyses each start on a new page.

The INDENTATION parameter can be used to leave a specified number of blank characters to the left of each line of an output file so that, for example, printed output can be bound. The indentation is subtracted from the WIDTH setting, so if you set WIDTH=80 and INDENTATION=10 then only 70 characters will be printed on each line of output.
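
The arithmetic relating WIDTH and INDENTATION can be checked with a small Python sketch. This is an illustration only: the function names are hypothetical, and rejecting an over-long line is a simplification made for the example, not a description of what Genstat does with long output.

```python
def printable_width(width, indentation=0):
    """Characters left for text once the indentation is taken from WIDTH."""
    if indentation < 0 or indentation >= width:
        raise ValueError("indentation must be at least 0 and less than width")
    return width - indentation


def indent_line(text, width, indentation=0):
    """Prefix `text` with blanks; refuse text that would exceed WIDTH."""
    usable = printable_width(width, indentation)
    if len(text) > usable:
        raise ValueError(f"text needs {len(text)} characters, only {usable} available")
    return " " * indentation + text


usable = printable_width(80, 10)
line = indent_line("x" * usable, 80, 10)
```

With WIDTH=80 and INDENTATION=10, usable is 70, and a full line of 70 characters plus its indentation occupies exactly 80 columns.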

The ACCESS parameter is used to control the way in which unformatted and backing-store files can be accessed, on computers that allow this.

 

OPTION directive

Defines the options of a Genstat procedure with information to allow them to be checked when the procedure is executed.

 

No options

 

Parameters

NAME = texts Names of the options

MODE = strings Mode of each option (e, f, p, t, v, as for unnamed structures); default p

NVALUES = scalars or variates Specifies allowed numbers of values

VALUES = variates or texts Defines the allowed values for a structure of type variate or text

DEFAULT = identifiers Default values for each option

SET = strings Indicates whether or not each option must be set (yes, no); default no

DECLARED = strings Indicates whether or not the setting of each option must have been declared (yes, no); default no

TYPE = texts Text for each option, whose values indicate the types allowed (ASAVE, datamatrix {i.e. pointer to variates of equal lengths as required in multivariate analysis}, diagonalmatrix, dummy, expression, factor, formula, LRV, matrix, pointer, RSAVE, scalar, SSPM, symmetricmatrix, table, text, TSAVE, TSM, variate); default * meaning no limitation

COMPATIBLE = texts Defines aspects to check for compatibility with the first parameter of the directive or procedure (nvalues, nlevels, nrows, ncolumns, type, levels, labels {of factors or pointers}, mode, rows, columns, classification, margins, associatedidentifier, suffixes {of pointers}, restriction)

PRESENT = strings Indicates whether or not each structure must have values (yes, no); default no

LIST = strings Whether to allow a list of identifiers (MODE=p) or of values (MODE=v or t) instead of just one (yes, no); default no

 

Description

The OPTION directive is used at the start of the definition of a Genstat procedure (initiated by the PROCEDURE directive) to define the options of the procedure. The NAME parameter defines the names of the options. Each name also defines the identifier of a data structure that will be used, within the procedure itself, to refer to the information transmitted by the relevant option. When you use the procedure, you have the choice of typing each name in capital letters, or in small letters, or in any mixture of the two; this corresponds to the rules for the names of options and parameters of directives. However, to avoid ambiguity, Genstat automatically converts the corresponding identifiers so that they are all in capital letters, and it is in this form that you must use them in the statements within the procedure.

The MODE parameter tells Genstat whether the setting of each option is to be a number (v), or an identifier of a data structure (p), or a string (t), or an expression (e), or a formula (f). These codes are exactly the same as those that indicate the mode of the values to appear within the brackets containing an unnamed structure.

The type of the structure used to represent an option of the procedure depends on the MODE and LIST parameters of the OPTION directive.

For anything other than mode p, the structure will be a dummy. This will point to an expression for mode e, a formula for mode f, and a text for mode t. With mode v, it will point to a scalar if the corresponding setting of the LIST parameter is no, and a variate if LIST=yes.

For mode p and LIST=no, the structure is a dummy, which will point to whichever structure is supplied for the option when the procedure is called; alternatively, when LIST=yes, it is a pointer which will store the list of structures that are supplied. For example, suppose that procedure ALLPOSS, which contains the option definitions

OPTION NAME='EXP','FORM','VLN','VLY','TLN','TLY','PLN','PLY'; \

MODE= e, f, v, v, t, t, p, p; \

LIST= no, no, no, yes, no, yes, no, yes

is called with these option settings:

ALLPOSS [EXP=LOG10(X+1); FORM=Variety*Nitrogen; VLN=2; \

VLY=1,3,5,7; TLN=oneval; TLY=one,two,three; PLN=A; PLY=B,C,D]

Inside the procedure it will be as though the identifiers had been defined as follows:

DUMMY [VALUE=!E(LOG10(X+1))] EXP

& [VALUE=!F(Variety*Nitrogen)] FORM

& [VALUE=2] VLN

& [VALUE=!(1,3,5,7)] VLY

& [VALUE='oneval'] TLN

& [VALUE=!T(one,two,three)] TLY

& [VALUE=A] PLN

POINTER [VALUE=B,C,D] PLY

The other parameters allow the settings that are supplied, when the procedure is called, to be checked automatically.

The NVALUES parameter indicates how many values the structures that are supplied for an option of mode p may contain. For example,

OPTION NAME='X','Y'; NVALUES=3,!(3,4); TYPE='variate'

indicates that the variates supplied for X must be of length 3, while those supplied for Y can be of length 3 or 4.

The VALUES parameter can be used with modes t and v to specify an allowed set of values against which those supplied for the option will be checked. In this example, the values allowed for METHOD are LOGIT, COMPLOGL, or ANGULAR.

OPTION NAME='METHOD'; MODE=t; \

VALUES=!t(LOGIT,COMPLOGL,ANGULAR); \

DEFAULT='LOGIT'

The allowed values for mode t can be up to eight characters in length; characters 9 onwards are ignored and the values are converted to upper case. When the procedure is used, Genstat will check the specified string against those in the VALUES list, using the same abbreviation rules as for options or parameters of the ordinary Genstat directives. Thus, for example, to request an angular transformation we need merely put METHOD=A, as the first letter A is sufficient to distinguish ANGULAR from LOGIT and COMPLOGL. Within the procedure, Genstat then sets METHOD to the full string in capitals, ANGULAR, and this greatly simplifies its subsequent use. As an example of mode v, this specification would ensure that the numbers supplied for an option NV were all odd integers between one and nine:

OPTION NAME='NV'; MODE=v; VALUES=!(1,3,5,7,9)
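
The checking of a mode t setting against the VALUES list can be pictured with the following Python sketch. It is a simplified model, not Genstat's own algorithm: the function name match_value is hypothetical, and "unique prefix" matching is assumed here as a stand-in for the full abbreviation rules of Genstat directives, which may resolve ambiguities differently.

```python
def match_value(supplied, allowed):
    """Return the full upper-case value that `supplied` abbreviates.

    Values are compared in upper case and only the first eight
    characters are significant, as described for mode t.
    """
    key = supplied.upper()[:8]
    names = [a.upper()[:8] for a in allowed]
    if key in names:  # an exact match always wins
        return key
    hits = [n for n in names if n.startswith(key)]
    if len(hits) != 1:  # no match, or an ambiguous abbreviation
        raise ValueError(f"{supplied!r} does not identify one of {names}")
    return hits[0]


METHODS = ["LOGIT", "COMPLOGL", "ANGULAR"]
full_name = match_value("a", METHODS)
```

Under these assumptions full_name is "ANGULAR", matching the METHOD=A example, while a string such as "X" that matches nothing is rejected.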

The DEFAULT parameter specifies default values to be used if the option is not set. In the example above, METHOD will be set by default to 'LOGIT'.

The SET parameter indicates whether or not an option must be set. The DECLARED parameter specifies whether or not the structures to which options of mode p are set must already have been declared. The TYPE parameter can be used to specify a text to indicate the allowed types of the structures to which an option of mode p is set. The COMPATIBLE parameter can be used to specify compatibility checks to be made for the setting of an option against the first parameter of the procedure. (The parameters are specified using the PARAMETER directive.) Finally, the PRESENT parameter allows you to indicate that the structure to which an option is set must have values.

For example, here the options DATA and RESULT can be scalars, variates, tables, or any type of matrix (rectangular, symmetric, or diagonal). Structures to which the DATA option is set must have been declared, but for the RESULT option they need not have been. Likewise the DATA option must have values, but the RESULT option need not.

OPTION NAME='DATA','RESULT'; \

MODE=p; SET=yes; DECLARED=yes,no; \

TYPE=!t(scalar,variate,matrix,symmetric,diagonal,table); \

PRESENT=yes,no

 

OR directive

Introduces a set of alternative statements in a "multiple-selection" control structure.

 

No options or parameters

 

Description

A multiple-selection control structure consists of several alternative blocks of statements. The first of these is introduced by a CASE statement. This has a single parameter, which is an expression that must yield a single number. Subsequent blocks are each introduced by an OR statement. There can then be a final block, introduced by an ELSE statement, as in the block-if structure. The whole structure is terminated by an ENDCASE statement. Full details are given in the description of the CASE directive.
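
As a rough picture of how such a structure chooses a block, the Python sketch below selects one item from a list by number. It is only a model: the selection-by-position rule and the rounding to the nearest integer are assumptions taken from the general behaviour of CASE rather than from this page, and the function name select_block is hypothetical. The CASE directive description remains the authority.

```python
import math


def select_block(value, blocks, else_block=None):
    """Pick the k-th block, where k is `value` rounded to an integer.

    `blocks` holds the CASE block followed by each OR block, in order.
    If k is out of range, the ELSE block (if any) is chosen instead.
    """
    k = math.floor(value + 0.5)  # assumed rounding rule
    if 1 <= k <= len(blocks):
        return blocks[k - 1]
    return else_block


blocks = ["CASE block", "first OR block", "second OR block"]
chosen = select_block(2, blocks, else_block="ELSE block")
```

In this model a value of 2 selects the block introduced by the first OR statement, and a value outside the range 1 to 3 falls through to the ELSE block.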

 

OUTPUT directive

Defines where output is to be stored or displayed.

 

Options

PRINT = strings Additions to output (dots, page, unchanged); default dots,page

DIAGNOSTIC = strings What diagnostic printing is required (messages, warnings, faults, extra, unchanged); default faul,mess,warn

WIDTH = scalar Limit on number of characters per record; default width of output file

INDENTATION = scalar Number of spaces to leave at the start of each line; default 0

PAGE = scalar Number of lines per page

 

Parameter

scalar Channel number of output file

 

Description

The OUTPUT directive changes the current output channel and thus re-defines where the output will be sent by the subsequent statements in a program, until another OUTPUT statement is given (excluding any statements that use a CHANNEL option to redirect their output). Thus

OUTPUT 2

PRINT X

PRINT [CHANNEL=3] Y

ANOVA X

sends the values of X, and the analysis of X by the ANOVA statement, to the file on the second output channel, and the values of Y to the file on the third.
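
The routing in this example can be mimicked by a short Python model. It is an illustration only: the class OutputRouter and its methods are hypothetical names, and each "file" is simply a list of strings.

```python
class OutputRouter:
    """Toy model of a current output channel with per-statement override."""

    def __init__(self):
        self.current = 1  # output starts on channel 1
        self.files = {}   # channel number -> list of output items

    def output(self, channel):
        """Model of an OUTPUT statement: change the current channel."""
        self.current = channel

    def emit(self, text, channel=None):
        """Model of a statement that prints, optionally with CHANNEL set."""
        target = self.current if channel is None else channel
        self.files.setdefault(target, []).append(text)


router = OutputRouter()
router.output(2)                       # OUTPUT 2
router.emit("values of X")             # PRINT X
router.emit("values of Y", channel=3)  # PRINT [CHANNEL=3] Y
router.emit("analysis of X")           # ANOVA X
```

After these calls, channel 2 holds the values and the analysis of X, channel 3 holds the values of Y, and the current channel is still 2 because the CHANNEL setting applied to one statement only.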

The PRINT option controls two aspects of the output produced, for example, by statistical analyses: whether a line of dots is printed at the start, and whether the output begins on a new page. These can also be controlled by the OUTPRINT option of SET. Similarly, the DIAGNOSTIC option has exactly the same effect as the DIAGNOSTIC option of SET.

The WIDTH option specifies the maximum width to be used when producing output. The default value is the width specified when the file was opened, but you can subsequently decrease it; you cannot use OUTPUT to set the width to a greater value than that specified when the file was opened. The PAGE option allows you to reset the number of lines per page.

 

OWN directive

Does work specified in Fortran subprograms linked into Genstat by the user.

 

Option

SELECT = scalar Sets a switch, designed to allow OWN to be used for many applications; standard set-up assumes a scalar in the range 0-9; default 0

 

Parameters

IN = identifiers Supplies input structures, which must have values, needed by the auxiliary subprograms

OUT = identifiers Supplies output structures whose values or attributes are to be defined by the auxiliary subprograms

 

Description

To implement the OWN directive, you must get access to some of the Genstat source code. The relevant section of the code is named Module X, and is distributed with Genstat to all sites, probably in a file called X.FOR. The module consists of several Fortran 77 subprograms but to implement the OWN directive you need to modify only the subprogram called G5XZXO. This contains extensive comments that describe the way it works, and the straightforward changes that you would need to make in order to call your own subprograms. These comments are designed to be the complete documentation, and so the details are not repeated here.

The IN parameter allows you to pass values of data structures into your subprograms. Genstat will check these input structures before calling your subprograms, to ensure that they are of the right type and length for your program, and that they have been assigned values. The OUT parameter copies values calculated by your subprograms into Genstat data structures. You can arrange to define the type and length of these output structures either before or after calling your subprograms.

If the setting of the IN parameter is a list of identifiers, the OWN directive will call your subprograms more than once. Each time it will make available to your subprograms the values of one structure in the IN list, and will take information from the subprograms and put them into the corresponding structure in the OUT list. Therefore, to pass several structures at a time to your subprograms, you must put the structures into pointers. For example,

OWN IN=!p(A1,A2,A3),!p(B1,B2,B3); OUT=X,Y

will call your subprograms twice, passing information about A1, A2, A3, and X the first time, and about B1, B2, B3, and Y the second time. It does this because !p(A1,A2,A3), for example, is a single structure.

If you want to pass just one pointer to your subprograms, you must ensure that OWN does not treat the pointer as a set of structures each of which is to be passed. You can do this by constructing another pointer to hold just the identifier of the pointer that you want to pass; for example:

POINTER [VALUES=A,B,C] S1

OWN IN=!p(S1)

The SELECT option allows you to call any number of subprograms independently. Thus, you can set up OWN so that the statements

OWN [SELECT=1]

and

OWN [SELECT=2]

do totally unrelated tasks. The standard version of G5XZXO deals only with the default value, 0, of SELECT, and would need to be extended if you wanted to cater for alternative values. However, you should be able to use much of the Fortran that deals with the default setting.

The distributed version of Genstat contains a version of the G5XZXO subprogram that carries out a simple calculation, purely for illustration of how the subroutine works. In this version, the result of

OWN IN=!p(V,S,M); OUT=W

is to shift, square, and scale the values of V; that is, it does the calculation

W = M * (V + S)**2

The subprogram checks that precisely three structures are given in the pointer specified by the IN parameter, and that they are a variate and two scalars with values already present. It also checks that there is precisely one output structure, a variate; this is implicitly declared by OWN if necessary, based on the length of the input variate. Missing values in the input structures are also checked for and dealt with appropriately. The subprogram calls another, G5XZSQ, to carry out the transformation itself. To modify G5XZXO, you need to alter the details of the checks on the structures and replace the call to G5XZSQ with a call to your own subprogram.
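
The behaviour of the illustrative subprogram, and the way OWN pairs each IN structure with the corresponding OUT structure, can be sketched in Python. This is not the Fortran in module X: the function names are hypothetical, None stands in for a missing value, and propagating missing values to the result is an assumption about what "dealt with appropriately" means.

```python
def shift_square_scale(v, s, m):
    """Compute W = M * (V + S)**2 element by element; None means missing."""
    if s is None or m is None:
        return [None] * len(v)
    return [None if x is None else m * (x + s) ** 2 for x in v]


def own(in_sets, out_names):
    """Call the calculation once per IN item, filling the matching OUT item."""
    if len(in_sets) != len(out_names):
        raise ValueError("IN and OUT lists must have the same length")
    results = {}
    for structures, out in zip(in_sets, out_names):
        if len(structures) != 3:
            raise ValueError("precisely three input structures are required")
        v, s, m = structures
        results[out] = shift_square_scale(v, s, m)
    return results


# Two IN pointers, so the calculation runs twice, as with OUT=X,Y.
results = own([([1, 2, None], 2, 10), ([0, 1], 1, 3)], ["X", "Y"])
```

Each tuple plays the role of one pointer such as !p(V,S,M), so the two entries in the IN list produce two separate calls and two output variates.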

The standard version of the G5XZXO subprogram will produce Genstat diagnostics if the checks on the input or output structures fail, or if there is not enough workspace. These diagnostics are the standard ones with codes VA, SX, and SP, and are dealt with by a section at the end of the G5XZXO subprogram. You can define your own diagnostics, using the code ZZ. You are not allowed to edit the standard file of error messages that stores the one-line definitions of each diagnostic code. However, you can edit the G5XZPF subprogram which is in module X. This prints extra messages after a ZZ diagnostic; instructions for editing the subprogram are contained as comments in it.

Output from your subprograms is most easily arranged by storing the information that you want in data structures, and printing these with a PRINT statement after the OWN statement. Alternatively, you can give Fortran WRITE statements; there are standard routines in Genstat for outputting numbers and strings, but they are not described here. You should use the correct Fortran unit numbers for output; these vary between implementations of Genstat. You can find out the unit numbers by giving the command

HELP environment

in any Genstat program. Note that a Fortran unit number is not the same as a Genstat channel number.

 

PAGE directive

Moves to the top of the next page of an output file.

 

Option

CHANNEL = scalar Channel number of file; default * i.e. current output file

 

No parameters

 

Description

PAGE arranges for future output to start on a new page. By default, PAGE works on the current output channel, but you can use the CHANNEL option if you are sending output to another file. PAGE has no effect unless output is to a file, and it achieves its effect by printing a line consisting of just the control code for a form feed (ASCII character 12). The effect of PAGE is therefore independent of the page size set by the OPEN directive.

 

PARAMETER directive

Defines the parameters of a Genstat procedure with information to allow them to be checked when the procedure is executed.

 

No options

 

Parameters

NAME = texts Names of the parameters

MODE = strings Mode of each parameter (e, f, p, t, v, as for unnamed structures); default p

NVALUES = scalars or variates Specifies allowed numbers of values

VALUES = variates or texts Defines the allowed values for a structure of type variate or text

DEFAULT = identifiers Default values for each parameter

SET = strings Indicates whether or not each parameter must be set (yes, no); default no

DECLARED = strings Indicates whether or not the setting of each parameter must have been declared (yes, no); default no

TYPE = texts Text for each parameter, whose values indicate the types allowed (ASAVE, datamatrix {i.e. pointer to variates of equal lengths as required in multivariate analysis}, diagonalmatrix, dummy, expression, factor, formula, LRV, matrix, pointer, RSAVE, scalar, SSPM, symmetricmatrix, table, text, TSAVE, TSM, variate); default * meaning no limitation

COMPATIBLE = texts Defines aspects to check for compatibility with the first parameter of the directive or procedure (nvalues, nlevels, nrows, ncolumns, type, levels, labels {of factors or pointers}, mode, rows, columns, classification, margins, associatedidentifier, suffixes {of pointers}, restriction)

PRESENT = strings Indicates whether or not each structure must have values (yes, no); default no

 

Description

The PARAMETER directive is used at the start of the definition of a Genstat procedure (initiated by the PROCEDURE directive) to define the parameters of the procedure. The NAME parameter defines the names of the parameters. Each name also defines the identifier of a data structure that will be used, within the procedure itself, to refer to the information transmitted by the relevant parameter. When you use the procedure, you have the choice of typing each name in capital letters, or in small letters, or in any mixture of the two; this corresponds to the rules for the names of options and parameters of directives. However, to avoid ambiguity, Genstat automatically converts the corresponding identifiers so that they are all in capital letters, and it is in this form that you must use them in the statements within the procedure. The data structures within the procedure are either all dummies or all pointers, according to the setting of the PARAMETER option of the PROCEDURE directive. If they are pointers, they store all the settings, and the procedure is called only once; if they are dummies, the procedure is called once for every item in the lists.
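
The difference between the two settings can be pictured with a small Python model. It is a sketch under stated assumptions: the function call_procedure and the setting names "dummy" and "pointer" are chosen for the illustration, and the model requires all parameter lists to have the same length rather than reproducing Genstat's own handling of lists.

```python
def call_procedure(body, settings, parameter="dummy"):
    """Run `body` once with whole lists, or once per item in the lists."""
    if parameter == "pointer":
        return [body(settings)]  # single call, each name holds its whole list
    lengths = {len(items) for items in settings.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch requires lists of equal length")
    (n,) = lengths
    return [
        body({name: items[i] for name, items in settings.items()})
        for i in range(n)
    ]


def describe(args):
    """Stand-in for the statements inside the procedure."""
    return f"X={args['X']} Y={args['Y']}"


settings = {"X": ["A", "B"], "Y": ["C", "D"]}
per_item = call_procedure(describe, settings, parameter="dummy")
once = call_procedure(describe, settings, parameter="pointer")
```

With dummies the body runs twice, seeing A with C and then B with D; with pointers it runs once and sees both lists in full.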

The other parameters of PARAMETER allow the settings that are supplied, when the procedure is called, to be checked automatically. The MODE parameter tells Genstat whether the setting of each parameter is to be a number (v), or an identifier of a data structure (p), or a string (t), or an expression (e), or a formula (f). These codes are exactly the same as those that indicate the mode of the values to appear within the brackets containing an unnamed structure. The NVALUES parameter indicates how many values the structures that are supplied for a parameter of mode p may contain. The VALUES parameter can be used with modes t and v to specify an allowed set of values against which those supplied for the parameter will be checked. The DEFAULT parameter specifies default values to be used if the parameter is not set, and the SET parameter indicates whether or not a parameter must be set. The DECLARED parameter specifies whether or not the structures to which options or parameters of mode p are set must already have been declared. The TYPE parameter can be used to specify a text to indicate the allowed types of the structures to which an option or parameter of mode p is set. Finally, the PRESENT parameter allows you to indicate that the structure to which an option or parameter is set must have values.

 

PASS directive

Does work specified in subprograms supplied by the user, but not linked into Genstat. This directive may not be available on some computers.

 

Option

NAME = text Filename of external executable program; default 'GNPASS'

 

Parameter

pointers Structures whose values are to be passed to the external program, and returned

 

Description

On some computers, you can arrange that one program, such as Genstat, calls for another to be executed, passing information directly between the two. You can then cause Genstat to execute your own subprograms without having to modify Genstat in any way. This is done by the PASS directive.

To find out if the PASS directive has been implemented in your version, you can either look at the Users' Note or type

PASS

in any Genstat program. You will either get a message saying that the PASS directive has not been implemented, or you will get a Genstat diagnostic telling you that Genstat has failed to initiate a sub-process: this means that PASS has been implemented. If PASS has not been implemented, you could use the OWN directive. Alternatively, you may be able to use the SUSPEND directive.

To use the PASS directive when it is available, you must first get access to the GNPASS program which is distributed with Genstat. You then form an executable program consisting of GNPASS, slightly modified as detailed below, and your own subprograms. GNPASS, like Genstat, is written in Fortran 77; however, on most computers, it is possible to use equivalent programs in other computing languages. The GNPASS program deals with communication with Genstat, and passes information to and from your subprograms.

You can pass the values of any data structures except texts. All the structures needed by your subprograms must be combined in a pointer structure, unless only one structure is needed and it is not a pointer. The structures must have values before you include them in a PASS statement; if you want to use some of the structures to store results from your subprograms, you must initialize them to some arbitrary values, such as zero or missing. If you specify several pointers in a PASS statement, your subprograms will be invoked several times, to deal in turn with each set of structures stored by the pointers. However, the values of the structures in all the pointers are copied before any work is done by your subprograms. Thus, if you want to operate with PASS on the results of a previous operation by PASS, you must give two PASS statements with one pointer each rather than one statement with two pointers.

As an example, consider using PASS to carry out a simple transformation of a variate, as would be done by the statement

CALCULATE W = M*(V+S)**2

where V and W are variates, and M and S are scalars. You would need a Fortran subprogram to calculate the values of W from supplied values of M, V, and S. The distributed version of the GNPASS program is accompanied by just such a subprogram, called SQUARE, for the purpose of illustrating how to use PASS. So all you need to do is to compile and link the program and subprogram into an executable program, called GNPASS for convenience. Then you can run Genstat and give the following statements:

SCALAR S,M; VALUE=2,10

VARIATE [VALUES=1...10] V

& [VALUES=10(*)] W

PASS !p(V,S,M,W)

The PASS statement will cause the GNPASS program to run, and assign the calculated values to the variate W.

Numbers can be used in place of scalars, as usual in Genstat statements:

PASS !p(V,2,10,W)

To transform the values in both V, as above, and another variate X, with values 10...50 say, you could give the statements:

VARIATE [VALUES=41(*)] Y

PASS !p(V,2,10,W),!p(X,2,10,Y)

The NAME option is used to specify the filename of the executable program formed from the GNPASS program and your subprograms. By default, the name GNPASS is assumed.

The distributed form of the GNPASS program, if available on your computer, consists of Fortran statements that receive information from Genstat as supplied by a PASS statement, call the SQUARE subprogram, and then send back the information as modified by SQUARE. To make it do the task that you require, you need to edit the program to call your subprograms instead of SQUARE. The documentation for GNPASS is provided as comments within the GNPASS program, so the details are not included here as well. After preparing the Fortran, you need to form it into an executable program. This will require a Fortran compiler, and to be certain of communicating successfully with Genstat, the compiler should be the same as that used in preparing Genstat; this information is given in the Installers' Note that accompanies Genstat. It may also be possible to use other source languages, provided the input and output formats of their compilers are compatible with that used by Genstat.

 

PCO directive

Performs principal coordinates analysis, also principal components and canonical variates analysis (but with different weighting from that used in CVA) as special cases.

 

Options

PRINT = strings Printed output required (roots, scores, loadings, residuals, centroid, distances); default * i.e. no printing

NROOTS = scalar Number of latent roots for printed output; default * requests them all to be printed

SMALLEST = string Whether to print the smallest roots instead of the largest (yes, no); default no

 

Parameters

DATA = identifiers These can be specified either as a symmetric matrix of similarities or transformed distances or, for the canonical variate analysis, as an SSPM containing within-group sums of squares and products etc or, for principal components analysis, either as a pointer containing the variates of the data matrix or as a matrix storing the variates by columns

LRV = LRVs Latent vectors (i.e. coordinates or scores), roots, and trace from each analysis

CENTROID = diagonal matrices Squared distances of the units from their centroid

RESIDUALS = matrices or variates Distances of the units from the fitted space

LOADINGS = matrices Principal component loadings, or canonical variate loadings

DISTANCES = symmetric matrices Computed inter-unit distances calculated from the variates of a data matrix, or inter-group Mahalanobis distances calculated from a within-group SSPM

 

Description

The PCO directive is used for principal coordinates analysis. This method encompasses principal components analysis and a form of canonical variates analysis as special cases as explained above.

There are six sections of output from PCO, requested using the PRINT option:

roots prints the latent roots and trace;

scores prints the principal coordinate scores;

loadings when the directive is being used for principal components analysis or canonical variates analysis, this specifies that the loadings from the analysis are to be printed;

residuals prints the residuals; this is relevant only if results are to be printed corresponding to only some of the latent roots;

centroid prints the distances (not squared distances) of each unit from their overall centroid;

distances prints the matrix of inter-unit distances (not squared distances).

The NROOTS and SMALLEST options control the printed output of roots, scores, loadings, and residuals. By default, results are printed for all the roots, but you can set the NROOTS option to specify a lesser number. If option SMALLEST has the default setting no, these are taken to be the largest roots, but if you set SMALLEST=yes the results are for the smallest non-zero roots. The inter-unit distances are unaffected by the setting of the NROOTS option.

The DATA parameter supplies the data. In its simplest form, PCO works on a symmetric matrix, with values giving the associations amongst a set of objects. This could, for example, be a similarity matrix produced by FSIMILARITY.

Alternatively, the input to PCO can be a pointer whose values are the identifiers of a set of variates, or a matrix storing the variates by columns. Now the PCO directive will construct the matrix of inter-unit squared distances, and will base the analysis on associations derived from this. This is equivalent to a principal components analysis; however, the results are derived by analysing the distance matrix rather than an SSPM. When there are more units than variates, using PCO for principal components analysis is less efficient than using the PCP directive; however, if there are more variates than units the PCO directive is more efficient. When PCO is used for principal components analysis, all the variates must be of the same length and none of their values may be missing; any restrictions on the variates are ignored.

The third type of input to PCO is an SSPM structure. This must be a within-group SSPM: that is, you must have set the GROUP option of the SSPM directive when the SSPM was declared. Now the PCO directive will calculate the Mahalanobis distances amongst the group means, and base the analysis on them. This will give results similar to a canonical variates analysis. The representation of distances will be better than that of CVA, but CVA will be better if you are interested in loadings for discriminatory purposes.

The second and subsequent parameters of PCO allow you to save the results. The number of units that determine the sizes of the output structures differs according to the input to PCO. For a matrix or a symmetric matrix the number of units is the number of rows of the matrix, for a pointer it is the number of values in the variates that the pointer contains, while for an SSPM the number of units is the number of groups.

The latent roots, scores, and trace can be saved in an LRV structure using the LRV parameter. If you have declared the LRV already, its number of rows must equal the number of units.

If the input to PCO is a pointer, a matrix, or an SSPM, the principal component or canonical variate loadings can be saved in a matrix using the LOADINGS parameter. The number of rows of the matrix is equal to the number of variates (either those specified by an input pointer or those specified in the SSPM directive for an input SSPM structure), or the number of columns in an input matrix.

The number of columns of the LRV and of the LOADINGS matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, Genstat will take the larger of the numbers of columns declared for either, and declare (or redeclare) the other one to match. If neither has been declared and option SMALLEST retains the default setting no, Genstat takes the number of columns from the setting of the NROOTS option. Otherwise, Genstat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sums of all the latent roots, whether or not they have all been saved.

The distances of the units from their centroid can be saved in a diagonal matrix using the CENTROID parameter. The diagonal matrix has the same number of rows as the number of units, defined above. The RESIDUALS parameter allows you to save residuals, formed from the dimensions that have not been saved, in a matrix with one column and number of rows equal to the number of units. Finally, the inter-unit distances can be saved in a symmetric matrix using the DISTANCES parameter. The number of rows of the symmetric matrix is again the same as the number of units.
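The link between the inter-unit squared distances, the centroid distances, and the trace can be illustrated with the usual principal coordinates construction: the squared-distance matrix is centred by rows and columns and multiplied by -1/2, after which the diagonal holds the squared distances of the units from their centroid and the sum of the diagonal equals the sum of the latent roots. The Python sketch below assumes this standard construction; that it matches Genstat's internal calculation exactly is an assumption, and the function name is hypothetical.

```python
def centre_squared_distances(d2):
    """Return B = -1/2 * (d2 with row, column and grand means removed)."""
    n = len(d2)
    row_means = [sum(row) / n for row in d2]   # d2 is symmetric, so these are also column means
    grand_mean = sum(row_means) / n
    return [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


# Three units on a line at 0, 2 and 4; their centroid is at 2.
points = [0.0, 2.0, 4.0]
d2 = [[(a - b) ** 2 for b in points] for a in points]

b = centre_squared_distances(d2)
centroid_sq = [b[i][i] for i in range(len(points))]   # squared distances from centroid
trace = sum(centroid_sq)                              # equals the sum of the latent roots

print(centroid_sq, trace)
```

Here the squared distances from the centroid are 4, 0, and 4 (to rounding error), and the trace is 8.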

Having obtained an ordination, you may sometimes want to add points to the ordination for additional units. If you know the squared distances of the new units from the old, the technique of Gower (1968) can be used to add points to the ordination for the new units. You can do this in Genstat by using the ADDPOINTS directive.

 

Reference

Gower, J.C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika 55, 582-585.

 

PCP directive

Performs principal components analysis.

 

Options

PRINT = strings Printed output required (loadings, roots, residuals, scores, tests); default * i.e. no printing

NROOTS = scalar Number of latent roots for printed output; default * requests them all to be printed

SMALLEST = string Whether to print the smallest roots instead of the largest (yes, no); default no

METHOD = string Whether to use sums of squares, correlations or variances and covariances (ssp, correlation, variancecovariance); default ssp

 

Parameters

DATA = pointers or matrices or SSPMs

Pointer of variates forming the data matrix, or matrix storing the variate values by columns, or SSPM giving their sums of squares and products (or correlations) etc

LRV = LRVs To store the principal component loadings, roots, and trace from each analysis

SSPM = SSPMs To store the computed sum-of-squares-and-products or correlation matrix

SCORES = matrices To store the principal component scores

RESIDUALS = matrices or variates To store residuals from the dimensions fitted in the analysis (i.e. number of columns of the SCORES matrix, or as defined by the NROOTS option)

 

Description

Principal components analysis finds linear combinations of a set of variates that maximize the variation contained within them, thereby displaying most of the original variability in a smaller number of dimensions. Principal components analysis operates on sums of squares and products, or a correlation matrix, or a matrix of variances and covariances, formed from the variates.

You supply the input for PCP using the first parameter; this list may have more than one entry, in which case Genstat repeats the analysis for each of the input structures. Instead of supplying an SSPM, you can supply a pointer containing the set of variates, or a matrix storing the variate values by columns. Genstat will then calculate the sums of squares and products, or correlations, or variances and covariances for the analysis (see option METHOD below).

For example, these two forms of input are equivalent:

SSPM [TERMS=Height,Length,Width,Weight] S

FSSPM S

PCP [PRINT=roots] S

and

PCP [PRINT=roots] !P(Height,Length,Width,Weight)

But the first form does mean that you have the sums of squares and products available for later use, in the SSPM S. Here the pointer is unnamed but you may wish to use a named pointer. For example:

POINTER [VALUES=Height,Length,Width,Weight] Dmat

PCP [PRINT=roots] Dmat

By default the PCP directive does not print any results: you use the PRINT option to specify what output you require. The printed output is in five sections, each with a corresponding setting, as illustrated in the examples below.

The columns of the matrices of principal component loadings and scores correspond to the latent roots. Each latent root corresponds to a single dimension, and gives the variability of the scores in that dimension. The loadings give the linear coefficients of the variables that are used to construct the scores in each dimension.

The significance tests are for equality of the k smallest roots: l_i (i = 1, 2, ..., k). The test statistic is

 

where n is the number of units and p is the number of variables. Asymptotically, the statistics have a chi-squared distribution with (k+2)(k-1)/2 degrees of freedom. If any latent roots are zero, Genstat excludes them from the calculation of the test statistic; the effective value of p is reduced accordingly.
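A standard form for a test of equality of the k smallest roots, with (k+2)(k-1)/2 degrees of freedom, is Bartlett's approximation: (n - (2p+11)/6) {k log(sum(l_i)/k) - sum(log(l_i))}, the sums running over the k smallest roots. The Python sketch below evaluates that form. Treating it as the statistic that Genstat computes is an assumption, as are the function name and the exact way in which zero roots are dropped before the k smallest are selected.

```python
import math


def equal_roots_statistic(roots, k, n):
    """Bartlett-type statistic for equality of the k smallest non-zero roots.

    Assumed form: (n - (2p + 11)/6) * (k*log(mean) - sum(log(l_i))).
    """
    nonzero = sorted((r for r in roots if r > 0), reverse=True)
    p = len(nonzero)                      # effective p after dropping zero roots
    if not 2 <= k <= p:
        raise ValueError("k must lie between 2 and the number of non-zero roots")
    smallest = nonzero[-k:]
    mean = sum(smallest) / k
    log_sum = sum(math.log(r) for r in smallest)
    statistic = (n - (2 * p + 11) / 6) * (k * math.log(mean) - log_sum)
    df = (k + 2) * (k - 1) // 2           # degrees of freedom quoted in the text
    return statistic, df


stat, df = equal_roots_statistic([2.0, 1.0, 1.0, 0.0], k=2, n=20)
print(stat, df)
```

When the k smallest roots are exactly equal, the statistic is zero; the more they differ, the larger it becomes.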

If you omit the NROOTS option, Genstat prints by default the results corresponding to all the latent roots. The number of latent roots is the number of variates involved in the input to PCP. The NROOTS option allows you to print only part of the results, corresponding to the first or last r latent roots. You may then want to print the residuals formed from the remaining columns of scores. The residuals are all positive: this is because residuals from multivariate analyses generally occupy several dimensions, so they represent distances in multidimensional space and signs cannot be attached to them.
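The reason the residuals cannot be negative can be seen from a small Python sketch. It assumes that each residual is the Euclidean length of a unit's scores in the dimensions that are not retained, with the score columns ordered from largest to smallest root; this is an interpretation of the description above rather than a statement of Genstat's exact calculation, and the function name is hypothetical.

```python
import math


def residuals(scores, nroots, smallest=False):
    """Length of each unit's score vector in the dimensions not retained.

    scores: rows are units; columns are ordered from largest to smallest root.
    nroots: number of retained dimensions (1 <= nroots <= number of columns).
    """
    out = []
    for row in scores:
        dropped = row[:-nroots] if smallest else row[nroots:]
        out.append(math.sqrt(sum(x * x for x in dropped)))
    return out


scores = [[5.0, 3.0, 4.0],
          [1.0, 0.0, -2.0]]
largest_kept = residuals(scores, nroots=1)                  # drops columns 2 and 3
smallest_kept = residuals(scores, nroots=1, smallest=True)  # drops columns 1 and 2
print(largest_kept, smallest_kept)
```

With the largest root retained, the residuals are 5 and 2: the second unit's negative score of -2 contributes a distance of 2, not -2.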

To print results corresponding to the r smallest latent roots, you must set option NROOTS to r and option SMALLEST to yes. Now if residuals are printed they will be formed from the scores corresponding to the largest roots. The NROOTS and SMALLEST options apply to the latent roots and vectors, the principal component scores and the residuals. So you cannot print directly, for example, the first two columns of scores and the last three columns of loadings. This is rarely required but, if necessary, it can be done by saving the relevant results and printing them separately.

By default, the PCP directive operates on the SSPM but you can set the METHOD option to correlation to operate on a derived matrix of correlations, or to variancecovariance to use variances and covariances. Note that when correlations are analysed the significance-test statistics no longer have asymptotic chi-squared distributions.

The LRV parameter allows you to save the principal component loadings, the latent roots, and their sum (the trace) in an LRV structure, while the SCORES parameter saves the principal component scores in a matrix. If you have declared the LRV already, its number of rows must be the same as the number of variates supplied in an input pointer or implied by an input SSPM. The number of rows of the SCORES matrix, if previously declared, must be equal to the number of units.

The number of columns of the LRV and of the SCORES matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, Genstat will take the larger of the numbers of columns declared for either, and declare (or redeclare) the other one to match. If neither has been declared and option SMALLEST retains the default setting no, Genstat takes the number of columns from the setting of the NROOTS option. Otherwise, Genstat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sums of all the latent roots, whether or not they have all been saved. Procedure LRVSCREE can be used to produce a "scree" diagram which can be helpful in deciding how many dimensions to save.
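The rules for the number of saved dimensions can be summarized as a small decision function. The Python sketch below is a restatement of the paragraph above, not Genstat code; the function and argument names are hypothetical, and None stands for a structure that has not been declared or an option that has not been set.

```python
def saved_dimensions(full, lrv_cols=None, scores_cols=None, nroots=None, smallest="no"):
    """Number of dimensions saved, following the rules described in the text.

    full: the full number of dimensions available from the analysis.
    lrv_cols, scores_cols: declared numbers of columns, or None if undeclared.
    """
    declared = [c for c in (lrv_cols, scores_cols) if c is not None]
    if declared:
        return max(declared)          # the larger declaration wins; the other is redeclared
    if smallest == "no" and nroots is not None:
        return nroots                 # neither declared: follow NROOTS
    return full                       # otherwise the full set is saved


print(saved_dimensions(4, lrv_cols=2, scores_cols=3))   # 3
print(saved_dimensions(4, nroots=2))                    # 2
print(saved_dimensions(4, nroots=2, smallest="yes"))    # 4
```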

The SSPM parameter can save the SSPM structure used for the analysis. A particularly convenient instance is when you have supplied an SSPM structure as input but, for example, have set METHOD=correlation: the SSPM that is saved will then contain correlations instead of sums of squares and products.

The RESIDUALS parameter allows you to save the principal component residuals, in a matrix with number of rows equal to the number of units and one column. If the latent roots and vectors (loadings) are saved from the analysis, the residuals will correspond to the dimensions not saved; the same applies if you save scores. If neither the LRV nor scores are saved, the saved residuals will correspond to the smallest latent roots not printed.

If the variables used to form the SSPM structure are restricted, then the analysis will be subject to that restriction. Similarly, if a pointer to a set of variates is used as input to PCP, then any restriction on the variates will be taken into account by the analysis. If you want principal component scores or residuals to be printed or saved from the analysis, the original data must be available. The matrices to save such results must have been declared with as many rows as the variates have values, ignoring the restriction. You can calculate the analysis from one subset of units, but calculate the scores and residuals for all the units, by using as input to PCP an SSPM structure formed using a weight variate with zeros for the excluded sampling units and unity for those to be included. For example, to exclude a known set of outliers from an analysis, but to print scores for them, these statements could be used:

POINTER [NVALUES=5] V

FACTOR [LABELS=!T(No,Yes)] Outlier

READ [CHANNEL=2] Outlier,V[]

CALCULATE Wt = Outlier .IN. 'No'

SSPM [TERMS=V] S

FSSPM [WEIGHT=Wt] S

PCP [PRINT=scores] S

Principal component regression is provided by procedure RIDGE.

 

PEN directive

Defines the properties of "pens" for high-resolution graphics.

 

No options

 

Parameters

NUMBER = scalars Numbers associated with the pens

COLOUR = scalars Number of the colour used with each pen

LINESTYLE = scalars Style for line used by each pen when joining points

METHOD = strings Method for determining line (point, line, monotonic, closed, open, fill)

SYMBOLS = scalars, pointers, factors or texts

Plotting symbols: scalar for special symbols, pointer for user defined symbols, text or factor for character symbols

LABELS = texts or factors Define labels that will be printed alongside the plotting symbols, provided these consist of a single character

ROTATION = scalars or variates Rotation required for the plotting symbols (in degrees)

JOIN = strings Order in which points are to be joined by each pen (ascending, given)

BRUSH = scalars Number of the type of area filling used with each pen when drawing pie charts or histograms

FONT = scalars Font to be used for any text written by each pen

THICKNESS = scalars Thickness with which any lines are drawn by each pen

SIZE = scalars or variates Multiplier used in the calculation of the size in which to draw characters and symbols by each pen

SAVE = pointers Saves details of the current settings for the pen concerned

 

Description

Graphical displays are drawn using graphical pens. Certain pens are used by default, or you can specify other pens, as described in the preceding sections. The attributes of each pen, such as colour, font, and symbol-type, determine how they are used to generate output. The PEN directive can be used to change these attributes of the pens so that you can modify the resulting display. Different attributes are relevant for different types of output; for example, symbols and labels are used only within DGRAPH.

The NUMBER parameter lists the numbers of the pens, in the range 1 to 32, that you wish to redefine. For many of the directives that produce pictures, pens 1 to 32 are used in turn for the different structures being plotted, while the default pens used for axis annotation are 30, 31, and 32. Thus, if you modify the attributes of the pens you should be aware of possible side-effects, for example, when the axes are drawn.

The COLOUR parameter specifies a colour in the range 0 to 32 to be used by the pen. If fewer colours are available the colours are used in turn, and then recycled. Where possible, colour 0 is interpreted as the background colour, thus allowing points to be easily erased from a plot. The association of colour numbers with actual colours depends on the particular device. Using a colour graphics screen, the colours should be as defined by the COLOUR directive. However, for many plotters, the colour number relates only to the physical position in which the pen is mounted in the plotter, so the actual colours may vary between devices.

The SYMBOLS parameter determines what symbol is drawn at each point by DGRAPH. The numbers 1 to 9 correspond to various graphical markers. The initial default symbols are device specific. For colour displays, symbol 1 is used for all pens but in different colours. On monochrome displays, the pens all use colour 1 and symbols 1 to 9 are used in turn: symbol 1 for pen 1, symbol 2 for pen 2, and so on.

You can also use any standard character to mark the points (for example you could set SYMBOLS='+' to use the plus character), or you can request device-specific symbols. If you do not want to plot symbols at the data points, for example when drawing a line through the points, you can set SYMBOLS=0. You can also set SYMBOLS to a pointer containing a pair of variates, to define your own symbol. The variates contain the coordinates of a set of points to be joined by straight line segments; these points should be within a notional square with bounds -1.0 to 1.0 in each direction. The square is centred on the data point and scaled to the same size as the standard symbols. Missing values can be included in the definition so that separate pen strokes are used to draw line segments. You can mark different points with different symbols (for example to indicate groupings in the data) by setting the PEN parameter of DGRAPH to a variate or factor specifying a pen with the appropriate symbol for each point.
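The way missing values divide a user-defined symbol into separate pen strokes can be sketched in Python, with None standing for a missing value. This illustrates the rule only and is not how Genstat stores or draws symbols; the function name strokes is hypothetical.

```python
def strokes(xs, ys):
    """Split symbol coordinates into pen strokes at missing values (None)."""
    result, current = [], []
    for x, y in zip(xs, ys):
        if x is None or y is None:
            if current:               # a missing value ends the current stroke
                result.append(current)
            current = []
        else:
            current.append((x, y))
    if current:
        result.append(current)
    return result


# A diagonal cross within the notional square: two strokes separated by a missing value.
xs = [-1.0, 1.0, None, -1.0, 1.0]
ys = [-1.0, 1.0, None, 1.0, -1.0]
cross = strokes(xs, ys)
print(cross)
```

Without the missing value, the four points would be joined as one continuous line, adding an unwanted segment from (1, 1) to (-1, 1).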

You can also label each point with a string or number. The LABELS parameter is set to a text structure specifying the strings to be plotted at each point. You can specify a single string to be plotted at every point, otherwise the text must have the same number of values as the Y and X variates that are being plotted. LABELS can also be specified as a factor; the factor labels are then used, if available, otherwise the levels. This provides another means of representing grouped data.

The graphical symbols are drawn so that they are centred at the specified position. If LABELS are specified they are aligned to the left of the markers, unless you have set SYMBOLS=0 to suppress the markers, in which case the labels are drawn so that the bottom left point of the first character is at the specified (x,y) position. For compatibility with previous releases of Genstat you can also set SYMBOLS to a factor or text, which has the same effect as setting LABELS with SYMBOLS=0.

The METHOD parameter specifies the type of graph to be plotted: points, lines, or filled polygons. The initial default for every pen, METHOD=point, will result in points being plotted using the corresponding symbols, labels, colours, and fonts. Various types of line can be drawn through the plotted points: either straight lines (line) or smooth curves (monotonic, open, and closed).

The monotonic setting specifies that a smooth single-valued curve is to be drawn through the data points. The name is derived from the requirement that the x-values (rather than the fitted curve) must be strictly monotonic, so that there is only one y-value for each distinct x-value. To ensure this, a copy of the data is made and sorted before the curve is fitted. This setting is recommended for plotting curves fitted to data, for example with FITCURVE. You should ensure that the points are close enough for the plotted line to be a reasonable approximation. When you know the functional form of the curve, it may be advantageous to calculate extra points.

The open and closed settings specify that a smooth, possibly multi-valued, curve is to be drawn through the data points, using the method of McConalogue (1970); the resulting curve is rotationally invariant, although it is not invariant under scaling. The closed setting connects the last point to the first. McConalogue's method (open or closed) is more suited to the situation where the plotted curve is intended to represent the shape of an object.

The setting METHOD=fill joins the data points by straight lines to produce one or more polygons. Each polygon is then shaded in the style specified by BRUSH (see below). The plotting method also determines how contours will be drawn. Also, the combination of SYMBOLS=0 and METHOD=point will produce no plotting at all (and no warning) within DGRAPH.

If the requested plotting method produces a line through the points, the LINESTYLE parameter will specify what sort of line is drawn (for example a solid, dotted, or dashed line). The type of line style is denoted by a number in the range 1 up to 10. The exact appearance of the different line styles is device-specific, and there are not necessarily 10 different line styles available on a particular device, but line style 1 should always produce a solid line.

The JOIN parameter controls the order in which points are connected when lines are to be drawn or the points define a polygon to be shaded. The setting given requests that the data are to be plotted in the order in which they are stored, whereas ascending implies that the data are copied and sorted so that the x-values are in ascending order before plotting. This parameter is ignored when METHOD=monotonic, as this requires that the data must always be sorted.
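The difference between the two JOIN settings can be sketched in Python. This is an illustration of the ordering rule only; the function name is hypothetical, and how Genstat orders points with equal x-values is not stated, so the stable sort used here is an assumption.

```python
def join_order(x, y, join):
    """Return the (x, y) pairs in the order in which they would be joined."""
    pairs = list(zip(x, y))
    if join == "ascending":
        # Sort a copy by x-value; the original x and y lists are left unchanged.
        return sorted(pairs, key=lambda pair: pair[0])
    if join == "given":
        return pairs
    raise ValueError("join must be 'ascending' or 'given'")


x = [3.0, 1.0, 2.0]
y = [30.0, 10.0, 20.0]
print(join_order(x, y, "given"))       # [(3.0, 30.0), (1.0, 10.0), (2.0, 20.0)]
print(join_order(x, y, "ascending"))   # [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
```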

The BRUSH parameter controls how areas are shaded when METHOD is set to fill, or when plotting histograms and pie charts. There are 16 available patterns indicated by the integers 1 to 16. In general, the higher the number, the denser the hatching, and the longer such areas take to plot. The device-specific brush styles are generally faster, and produce smaller output files; however, results are not guaranteed to be the same on every type of device.

The THICKNESS parameter allows you to specify an amount by which the standard thickness of plotted lines is to be multiplied. This allows you to increase the thickness of lines, perhaps to highlight some feature of a plot. You can also use thickness to emphasize the axes, by redefining the appropriate pen. For some devices, it is not possible to control the thickness of plotted lines; the THICKNESS parameter is then ignored.

The default size of characters and symbols is determined from the dimensions of the current window. The SIZE parameter can be used to modify the size, by specifying a value by which this default size is to be multiplied. For example when plotting a graph in a small window you may wish to increase the size of annotation in order to make it legible. SIZE can be set to a scalar, or to a variate to allow the different points to be scaled in different ways.

The ROTATION parameter controls the angle (in degrees) at which to plot text or user-defined symbols. The initial setting of zero will produce text "conventionally" orientated. You can set ROTATION to a scalar value that will apply to all points, or to a variate that allows a different angle to be used at each point.

The FONT parameter can be set to an integer between 1 and 25 to select different fonts for text appearing as titles, axis annotation, plotting symbols, and key information. This allows you to control the appearance of textual information and also use other character sets; for example, the character 'a' will appear as α when one of the Greek fonts is selected. The available fonts are as follows.

01 Simplex Roman             14 Cyrillic
02 Duplex Roman              15 Triplex Roman
03 Complex Roman             16 Triplex Italic
04 Simplex Greek             17 Map Symbols
05 Complex Greek             18 Astronomical Symbols
06 Complex Italic            19 Music Symbols
07 Mathematical Symbols      20 Monospace Typewriter
08 Meteorological Symbols    21 Typewriter
09 Gothic English            22 Simplex
10 Simplex Script            23 Italic
11 Complex Script            24 Complex
12 Gothic Italian            25 Complex Cyrillic
13 Gothic German

Some of the symbols produced by fonts 7, 8, 17, and 19 are of non-standard size. Device-specific fonts can also be used where available.

You can also use in-line typesetting commands to change font or character size part-way through plotting text, when specified either as a title or label. This allows you to insert Greek characters in an equation, for example, and also to use subscripts, superscripts, and mathematical symbols. The escape character '!' is used to signal a change of font or character size and must be followed immediately by a code indicating the required action. For a simple change of font the code is just the new font number, for example '!07' will switch to the mathematical symbols font. For fonts 1 up to 9 the leading zero may be omitted, so that '!7' may be used instead, but you should be careful of ambiguities; for example '!021' will plot the character '1' in font 2, whereas '!21' will just switch to font 21. The mnemonics 'G' for (Simplex) Greek, 'M' for mathematical, and 'W' for simplex script can be used as well as the font numbers. The additional codes below specify other in-line commands.

A  shift above the fraction line     I  move to index level
B  shift below the fraction line     L  move to lower subscript level
U  move up to superscript level      N  move to normal base line
D  move down to subscript level      S  save current position and size on stack
E  move to exponent level            R  restore position and size from stack

In-line commands can be specified in upper or lower-case. To print the escape character, !, it should be entered twice; for example the string 'Outlier!!' could be used to label a point. If an invalid sequence of characters is specified the remainder of the string will not be plotted and a warning will be printed.
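The escape rules can be made concrete with a toy tokenizer in Python. It is a sketch of the rules as described above, not Genstat's parser. In particular, it assumes that two digits after '!' are read as a font number whenever they form a value from 1 to 25 and that otherwise a single non-zero digit is read; this reproduces the '!021' and '!21' examples but is otherwise a guess. The token names are hypothetical.

```python
DIGITS = set("0123456789")
MNEMONIC_FONTS = {"G": 4, "M": 7, "W": 10}   # Simplex Greek, mathematical, simplex script
COMMANDS = set("ABUDEILNSR")


def tokenise(text):
    """Split a string into text, font, command and invalid tokens."""
    tokens, buffer, i = [], [], 0

    def flush():
        if buffer:
            tokens.append(("text", "".join(buffer)))
            buffer.clear()

    while i < len(text):
        if text[i] != "!":
            buffer.append(text[i])
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if nxt == "!":                       # '!!' stands for a literal '!'
            buffer.append("!")
            i += 2
            continue
        flush()
        two = text[i + 1:i + 3]
        code = nxt.upper()                   # commands may be in either case
        if len(two) == 2 and set(two) <= DIGITS and 1 <= int(two) <= 25:
            tokens.append(("font", int(two)))
            i += 3
        elif nxt in DIGITS and nxt != "0":
            tokens.append(("font", int(nxt)))
            i += 2
        elif code in MNEMONIC_FONTS:
            tokens.append(("font", MNEMONIC_FONTS[code]))
            i += 2
        elif code in COMMANDS:
            tokens.append(("command", code))
            i += 2
        else:                                # remainder of the string is not plotted
            tokens.append(("invalid", text[i:]))
            break
    flush()
    return tokens


print(tokenise("!021"))        # [('font', 2), ('text', '1')]
print(tokenise("!21"))         # [('font', 21)]
print(tokenise("Outlier!!"))   # [('text', 'Outlier!')]
```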

The current settings of each pen can be saved in a pointer supplied by the SAVE parameter. The elements of the pointer are labelled to identify the components. Initial default settings are represented by missing values; the actual values used for these attributes when plotting will depend on the output device.

The standard text fonts, graphical symbols, and brush styles are software generated. However, you can set negative values for these parameters of the PEN directive to select device-specific alternatives. For each parameter, the device-specific settings have the same range as the standard settings; thus you can select symbols -1 to -9, fonts -1 to -25, and brush styles -1 to -16. If fewer device-specific settings are actually available, the settings are taken in turn, and then recycled. Where a feature has no device-specific settings on a particular device, the standard form is used instead (for example, font -3 appearing as font 3). Device-specific font numbers cannot be used within the in-line typesetting system; Genstat will use either the standard fonts or the corresponding device-specific fonts depending on the base font originally specified by the PEN directive. In some cases, device-specific symbols or fonts may be of fixed size; the SIZE parameter will then have no effect, and some of the typesetting commands may not function correctly. Although the device-specific settings are likely to be different from device to device, they are arranged to be consistent where possible, so that for example brush style -1 will select solid fill, if available.

By default, Genstat uses software generated symbols and fonts. For colour displays, by default symbol 1 is used for all pens but in different colours. On monochrome displays, the pens all use colour 1 and symbols 1 to 9 are used in turn: symbol 1 for pen 1, symbol 2 for pen 2, and so on. When solid fill and colour are available, the default brush style is -1, in different colours for each pen; otherwise software-generated brushes are used by default.

 

Reference

McConalogue, D.J. (1970). A quasi-intrinsic scheme for passing a smooth curve through a discrete set of points. Computer Journal 13, 392-396.

 

POINTER directive

Declares one or more pointer data structures.

 

Options

NVALUES = scalar or text Number of values, or labels for values; default *

VALUES = identifiers Values for all the pointers; default *

SUFFIXES = variate Defines an integer number for each of the suffixes; default * indicates that the numbers 1,2,... are to be used

CASE = string Whether to distinguish upper- and lower-case in the labels of the pointers (significant, ignored); default sign

ABBREVIATE = string Whether or not to allow the labels to be abbreviated (yes, no); default no

FIXNVALUES = string Whether or not to prohibit automatic extension of the pointers (yes, no); default no

RENAME = string Whether to reset the default names of elements of the pointer if they do not have their own identifiers (yes, no); default no

MODIFY = string Whether to modify (instead of redefining) existing structures (yes, no); default no

 

Parameters

IDENTIFIER = identifiers Identifiers of the pointers

VALUES = pointers Values for each pointer

EXTRA = texts Extra text associated with each identifier

 

Description

Lists of data structures can be stored in a Genstat pointer structure to save having to type the list in full every time it is used. For example

POINTER [VALUES=Rain,Temp,Windspeed] Vars

VARIATE #Vars

READ [CHANNEL=2] #Vars

PRINT #Vars; DECIMALS=2,1,2

defines Rain, Temp, and Windspeed to be variates, and then reads and prints their values. When none of the structures in the list is itself a pointer, the substitution symbol (#) simply replaces the pointer by its values. If, however, there are pointers in the list, they too are substituted, as are any pointers to which they point. An example is given below.

The individual elements of a pointer can also be referred to by the use of suffixes. We can refer to Rain above either using its own identifier, or as the first element of Vars by using the suffix [1]: so

Vars[1] is Rain

Vars[2] is Temp

Vars[3] is Windspeed

Furthermore, we can put a list within the brackets:

Vars[3,1] is Windspeed,Rain.

Also, you can put a null list to mean all the available suffixes of the pointer:

Vars[] is Rain,Temp,Windspeed.

Identifiers like Vars[1], Vars[2], and Vars[3] are called suffixed identifiers and, in fact, you can use these even without defining the identifier of the pointer explicitly. Whenever a suffixed identifier is used, Genstat automatically sets up a pointer for the unsuffixed part of the identifier if it does not already exist. Furthermore the pointer will automatically be extended (whether it has been set up by you or by Genstat) if you later use a new suffix, like Vars[93] for example. Notice that the suffixes do not need to be a contiguous list, nor need they run from one upwards, although they must be integers; if you give a decimal number it will be rounded to the nearest integer (for example, -27.2 becomes -27).
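
As a minimal sketch of this automatic set-up, the statements below never declare a pointer called Yield; the identifier and the data values are hypothetical. The first statement should cause Genstat to set up the pointer, and the second should extend it with the non-contiguous suffix 7.

```genstat
"Yield is not declared anywhere: it is set up automatically as a pointer"
VARIATE [VALUES=12,15,11] Yield[1]
"Using a new suffix extends the pointer"
VARIATE [VALUES=14,13,16] Yield[7]
"The null list refers to all the available suffixes, here 1 and 7"
PRINT Yield[]
```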

The SUFFIXES option of the POINTER directive allows you to specify the required suffixes for pointers that are defined explicitly. For example

VARIATE [VALUES=1990,1991,1992,1993] Suffs

POINTER [NVALUES=4; SUFFIXES=Suffs] Profit

defines Profit to be a pointer of length four, with suffixes 1990 to 1993. If you are setting the suffixes explicitly, you might want to forbid Genstat to extend the pointer if another suffix is encountered later in the program; this can be done by setting option FIXNVALUES=yes.
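
A sketch of how FIXNVALUES might be combined with explicit suffixes is shown below; the values given to the scalars are invented for illustration.

```genstat
VARIATE [VALUES=1990,1991,1992,1993] Suffs
"FIXNVALUES=yes prohibits automatic extension of the pointer"
POINTER [NVALUES=4; SUFFIXES=Suffs; FIXNVALUES=yes] Profit
"Declare the four elements as scalars and give them hypothetical values"
SCALAR Profit[]; VALUE=10.2,11.5,9.8,12.1
PRINT Profit[1990,1993]
```

With this setting, a later reference to a suffix outside the list, such as Profit[1994], should no longer extend the pointer.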

We could actually omit the NVALUES option in the definition of the pointer Profit above, as Genstat can determine the length of the pointer by counting the number of values. However, by supplying a text instead of a scalar for NVALUES you can define labels for the suffixes of the pointer. The length of the text defines the number of values of the pointer, and its values give the labels. For example

TEXT [VALUES=name,salary,grade] Labs

POINTER [NVALUES=Labs] Employee

would allow you to refer to Employee['name'], Employee['salary'], and so on.

Usually, when the pointer is later used, Genstat requires the labels to be given exactly as in the definition. However, you can set option CASE=ignored to indicate that case is unimportant, so they can be specified in capitals, or lower-case, or in any mixture. You can also set option ABBREVIATE=yes to allow each one to be abbreviated to the minimum number of characters required to distinguish it from the labels of earlier elements of the pointer.
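
The following sketch shows how the two options might be used together. The names and salaries are hypothetical, and the example assumes that the abbreviation Sal is enough to distinguish salary from the earlier label name.

```genstat
TEXT [VALUES=name,salary,grade] Labs
POINTER [NVALUES=Labs; CASE=ignored; ABBREVIATE=yes] Employee
TEXT [VALUES=Smith,Jones] Employee['name']
VARIATE [VALUES=21000,25500] Employee['salary']
"Case is ignored, and labels may be abbreviated"
PRINT Employee['NAME','Sal']
```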

The identifiers in a suffix list can be of scalars, variates, or texts; this of course includes numbers and strings as unnamed scalars and texts respectively. If one of these structures contains several values, it defines a sub-pointer: for example Vars[!(3,2)] is a pointer with two elements, Windspeed and Temp. You must thus be careful not to confuse a sub-pointer with a list of some of the elements of a pointer: for example Vars[!(3,2)] is a single pointer with two elements, whereas Vars[3,2] is a list of the two structures Windspeed and Temp.

Elements of pointers can themselves be pointers, allowing you to construct trees of structures. For example

VARIATE A,B,C,D,E

POINTER R; VALUES=!P(D,E)

& S; VALUES=!P(B,C)

& Q; VALUES=!P(A,S)

& P; VALUES=!P(Q,R)

You can refer to elements within the tree by giving several levels of suffixes: for example P[2][1] is R[1] which is D; P[2,1][1,2] is (R,Q)[1,2] or D,E,A,S. The special symbol # allows you to list all the structures at the ends of the branches of the tree: #P replaces P by the identifiers of the structures to which it points (Q and R); then, if any of these is a pointer, it replaces it by its own values, and so on. Thus #P is the list A,B,C,D,E.

As you have seen, structures need not have an identifier of their own, but may simply be identifiable as a member of a pointer using the suffix notation. Where a structure like this is a member of more than one pointer, Genstat will refer to it in output using the pointer with which it was first associated. So, for example, in

POINTER [NVALUES=2] P

& [VALUES=P[1,2],C] Q

VARIATE [VALUES=1,2,3,4] Q[]

PRINT Q[]

the output will be labelled as P[1], P[2], and C. However, we can set option RENAME=yes when Q is defined

POINTER [VALUES=P[1,2],C; RENAME=yes] Q

to request that the pointer Q takes precedence over earlier definitions, so the labels become Q[1], Q[2], and C.

 

PREDICT directive

Forms predictions from a linear or generalized linear model.

 

Options

PRINT = string What to print (description, predictions, se); default desc,pred,se

CHANNEL = scalar Channel number for output; default * i.e. current output channel

COMBINATIONS = string Which combinations of factors in the current model to include (full, present, estimable); default full

ADJUSTMENT = string Type of adjustment (marginal, equal); default marg

WEIGHTS = table Weights classified by some or all of the factors in the model; default *

OFFSET = scalar Value of offset on which to base predictions; default mean of offset variate

METHOD = string Method of forming margin (mean, total); default mean

ALIASING = string How to deal with aliased parameters (fault, ignore); default faul

BACKTRANSFORM = string What back-transformation to apply to the values on the linear scale, before calculating the predicted means (link, none); default link

SCOPE = string Controls whether the variance of predictions is calculated on the basis of forecasting new observations rather than summarizing the data to which the model has been fitted (data, new); default data

NOMESSAGE = strings Which warning messages to suppress (dispersion, nonlinear); default *

DISPERSION = scalar Value of dispersion parameter in calculation of s.e.s; default is as set in the MODEL statement

DMETHOD = string Basis of estimate of dispersion, if not fixed by DISPERSION option (deviance, Pearson); default is as set in the MODEL statement

PREDICTIONS = tables or scalars To save tables of predictions for each y variate; default *

SE = tables or scalars To save tables of standard errors of predictions for each y variate; default *

VCOVARIANCE = symmetric matrices To save variance-covariance matrices of predictions for each y variate; default *

SAVE = identifier Specifies save structure of model to display; default * i.e. that from latest model fitted

 

Parameters

CLASSIFY = vectors Variates and/or factors to classify table of predictions

LEVELS = variates or scalars To specify values of variates, levels of factors

 

Description

The PREDICT directive can be used after the FIT directive to summarize the results of the regression, by using the fitted relationship to predict the values of the response variate at particular values of the explanatory variables. CLASSIFY, the first parameter of PREDICT, specifies those variates or factors in the current regression model whose effects you want to summarize. Any variate or factor in the current model that you do not include will be standardized in some way, as described below.

The LEVELS parameter specifies values at which the summaries are to be calculated, for each of the structures in the CLASSIFY list. For factors, you can select some or all of the levels, while for variates you can specify any set of values. A single level or value is represented by a scalar; several levels or values must be combined into a variate (which may of course be unnamed). A missing value in the LEVELS parameter is taken by Genstat to stand for all the levels of a factor, or for the mean value of a variate.

You can best understand how Genstat forms predictions by regarding its calculations as consisting of two steps. The first step, referred to below as Step A, is to calculate the full table of predictions, classified by every factor in the current model. For any variate in the model, the predictions are formed at its mean, unless you have specified some other values using the LEVELS parameter; if so, these are then taken as a further classification of the table of predictions. The second step, referred to as Step B, is to average the full table of predictions over the classifications that do not appear in the CLASSIFY parameter: you can control the type of averaging using the COMBINATIONS, ADJUSTMENT, and WEIGHTS options. By default, the predictions are made at the mean of any offset variate, but option OFFSET can be used to specify another value at which the predictions should be made instead.
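
As an illustration of the two steps, consider the sketch below. It assumes a hypothetical response variate Yield, factors Variety and Block, and a variate Nitrogen; none of these names comes from a real data set.

```genstat
MODEL Yield
FIT Variety + Block + Nitrogen
"Predictions for each variety, averaged over blocks, at the mean of Nitrogen"
PREDICT Variety
"Predictions for all the varieties at three chosen values of Nitrogen"
PREDICT Variety,Nitrogen; LEVELS=*,!(0,50,100)
```

In the second PREDICT statement the missing value (*) stands for all the levels of Variety, and the unnamed variate supplies the nitrogen values; Block is not in the CLASSIFY list, so the predictions are averaged over its levels in Step B.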

Printed output is controlled by the PRINT option. The description setting produces a summary of what standardization policies are used when forming the predictions, the predictions setting prints the predictions, and se produces predictions and standard errors; by default all these components of output are printed. The standard errors are relevant for the predictions when considered as means of those data that have been analysed (with the means formed according to the averaging policy defined by the options of PREDICT). The word prediction is used because these are predictions of what the means would have been if the factor levels had been replicated differently in the data; see Lane and Nelder (1982) for more details.

By default, the standard errors are not augmented by any component corresponding to the estimated variability of a new observation. However, you can set option SCOPE=new to request that the variance of predictions should be calculated on the basis of forecasting new observations rather than of summarizing the data to which the model has been fitted. This setting cannot be used if the predictions are to be standardized for the effects of any factors in the model; in other words, all factors in the current model must be listed in the CLASSIFY parameter of the PREDICT statement. In addition, it cannot be used when making predictions from generalized linear models with option BACKTRANSFORM=none. The effect of SCOPE=new is to form variances for each predicted value by combining the variance of the estimated mean value of the prediction (as produced for SCOPE=data) together with the estimated variance of a new observation with the same values of explanatory variates and factors:

"new" variance = "data" variance + (dispersion × variance function)

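A minimal sketch of a statement that satisfies the conditions for SCOPE=new is given below, assuming a hypothetical model with one factor (Variety) and one variate (Nitrogen).

```genstat
MODEL Yield
FIT Variety + Nitrogen
"Every factor in the model (here only Variety) appears in the CLASSIFY list"
PREDICT [SCOPE=new] Variety,Nitrogen; LEVELS=*,!(0,50,100)
```
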
The DISPERSION and DMETHOD options allow you to change the method by which the variance of the distribution of the response values is obtained for calculating the standard errors. These options operate like the corresponding options of MODEL (except that they apply only to the current statement). The default is to use the method as originally defined by the MODEL statement.

You can send the output to another channel, or to a text structure, by setting the CHANNEL option.

The COMBINATIONS option specifies which cells of the full table in Step A are to be filled for averaging in Step B. By default all the cells are used. Alternatively, you can set COMBINATIONS=present to exclude cells for factor combinations that do not occur in the data, or COMBINATIONS=estimable to exclude combinations that involve parameters that cannot be estimated, for example because of aliasing. Setting COMBINATIONS=present or COMBINATIONS=estimable overrules the LEVELS parameter. Any subsets of factor levels in the LEVELS parameter are ignored, and predictions are formed for all the factor levels that occur in the data or are estimable. Likewise, the full table cannot then be classified by any sets of values of variates; the LEVELS parameter must then supply only single values for variates.

The ADJUSTMENT and WEIGHTS options define how the averaging is done in Step B. Values in the full table produced in Step A are averaged with respect to all those factors that you have not included in the settings of the CLASSIFY parameter. By default, the levels of any such factor are combined with what we call marginal weights: that is, by the number of occurrences of each of its levels in the whole dataset. The ADJUSTMENT and WEIGHTS options allow you to change the weights. The setting ADJUSTMENT=equal specifies that the levels are to be weighted equally. The WEIGHTS option is more powerful than the ADJUSTMENT option, allowing you to specify an explicit table of weights. This table can be classified by any, or all, of the factors over whose levels the predictions are to be averaged; the levels of remaining factors will be weighted according to the ADJUSTMENT option. Moreover, you can classify the weights by the factors in the CLASSIFY parameter as well, to provide different weightings for different combinations of levels of these factors. If you supply explicit weights in the WEIGHTS option, any setting of the COMBINATIONS option is ignored. You will find explicit weights useful in particular when you have population estimates of the proportions of each level of a factor - proportions which may not be matched well in the available data.
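
The sketch below contrasts the three ways of weighting. It assumes a hypothetical model in which Block is a factor with three levels; the weights 0.5, 0.3, and 0.2 are invented to represent population proportions.

```genstat
MODEL Yield
FIT Variety + Block
"Default: levels of Block combined with marginal weights"
PREDICT Variety
"Levels of Block weighted equally"
PREDICT [ADJUSTMENT=equal] Variety
"Explicit table of weights classified by Block"
TABLE [CLASSIFICATION=Block; VALUES=0.5,0.3,0.2] Wblock
PREDICT [WEIGHTS=Wblock] Variety
```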

If a model contains any aliased parameters, predicted values cannot be formed for some cells of the full table without assuming a value for the aliased parameters. If the aliased parameters simply represent effects of variates that are correlated with other explanatory variables in the model, it may be sufficient just to ignore them. This can be done by setting the ALIASING option to ignore. The aliased parameters are then taken to be zero, and fitted values are calculated for all cells of the table from the remaining parameters in the model. Alternatively, you can set COMBINATIONS=estimable to form predictions only for the cells where all the parameters are estimable. Aliasing can also occur if there are some combinations of factors that do not occur in the data, and here it may be more sensible to set option COMBINATIONS=present so that these cells are all excluded from the calculation of predictions. The fourth way to overcome aliasing is to supply explicit weights using the WEIGHTS option.

Averaging is usually the appropriate way of combining predicted values over levels of a factor. But sometimes summation is needed, for example in the analysis of counts by log-linear models. You can achieve this by setting the METHOD option to total. The rules about weights and so on still apply. In a generalized linear model, averaging is done by default on the scale of the original response variable, not on the scale transformed by the link function. In other words, linear predictors are formed for all the combinations of factor levels and variate values specified by PREDICT, and then transformed by the link function back to the natural scale. This back-transformation may be useful when you are reporting results, since the tables from PREDICT can then be interpreted as natural averages of means predicted by the fitted model. You can set option BACKTRANSFORM=none if you want the averaging to be done on the scale of the linear predictor; PREDICT will then form averages and report predictions on the transformed scale.
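
For example, with a log-linear model for counts, the options might be set as in the sketch below; Count, Region, and Year are hypothetical identifiers.

```genstat
MODEL [DISTRIBUTION=poisson; LINK=logarithm] Count
FIT Region + Year
"Sum the predicted counts over the levels of Year instead of averaging"
PREDICT [METHOD=total] Region
"Average and report on the scale of the linear predictor"
PREDICT [BACKTRANSFORM=none] Region
```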

PREDICT calculates the standard errors of predictions from iterative models by using first-order approximations that allow for the effect of the link function. Thus you should interpret them only as a rough guide to the variability of individual predictions.

The PREDICTIONS, SE, and VCOVARIANCE options let you save the results of PREDICT as well as, or instead of, printing them.
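
A sketch of saving the results without printing them is shown below; the identifiers Pmean, Pse, and Pvcov are hypothetical, and a model containing the factor Variety is assumed to have been fitted already.

```genstat
"PRINT=* suppresses the printed output from PREDICT"
PREDICT [PRINT=*; PREDICTIONS=Pmean; SE=Pse; VCOVARIANCE=Pvcov] Variety
"The saved tables can then be printed or used in later calculations"
PRINT Pmean,Pse; DECIMALS=2
```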

The SAVE option allows you to specify the regression save structure of the analysis on which the predictions are based. If SAVE is not set, the most recent regression model is used.

The NOMESSAGE option controls printing of messages. The nonlinear setting suppresses messages about the approximate nature of standard errors of predictions in generalized linear models, and the dispersion setting prevents reminders appearing about the basis of the standard errors.

 

Reference

Lane, P.W. and Nelder, J.A. (1982). Analysis of covariance and standardization as instances of prediction. Biometrics 38, 613-621.

 

PRINT directive

Prints data in tabular format in an output file, unformatted file, or text.

 

Options

CHANNEL = identifier Channel number of file, or identifier of a text to store output; default current output file

SERIAL = string Whether structures are to be printed in serial order, i.e. all values of the first structure, then all of the second, and so on (yes, no); default no, i.e. values in parallel

IPRINT = strings What identifier and/or text to print for the structure (identifier, extra, associatedidentifier); for a table, associatedidentifier prints the identifier of the variate from which the table was formed (e.g. by TABULATE); IPRINT=* suppresses the identifier altogether; default iden

RLPRINT = strings What row labels to print (labels, integers), RLPRINT=* suppresses row labels altogether; default labe

CLPRINT = strings What column labels to print (labels, integers), CLPRINT=* suppresses column labels altogether; default labe

RLWIDTH = scalar Field width for row labels; default 13

INDENTATION = scalar Number of spaces to leave before the first character in the line; default 0

WIDTH = scalar Last allowed position for characters in the line; default width of current output file

SQUASH = string Whether to omit blank lines in the layout of values (yes, no); default no

MISSING = text What to print for missing value; default '*'

ORIENTATION = string How to print vectors or pointers (down, across); default down, i.e. down the page

ACROSS = scalar or factors Number of factors or list of factors to be printed across the page when printing tables; default for a table with two or more classifying factors prints the final factor in the classifying set and the notional factor indexing a parallel list of tables across the page, for a one-way table only the notional factor is printed across the page

DOWN = scalar or factors Number of factors or list of factors to be printed down the page when printing tables; default is to print all other factors down the page

WAFER = scalar or factors Number of factors or list of factors to classify the separate "wafers" (or slices) used to print the tables; default 0

PUNKNOWN = string When to print unknown cells of tables (present, always, zero, missing, never); default pres

UNFORMATTED = string Whether file is unformatted (yes, no); default no

REWIND = string Whether to rewind unformatted file before printing (no, yes); default no

WRAP = string Whether to wrap output that is too long for one line onto subsequent lines, rather than putting it into a subsequent "block" (yes, no); default no

 

Parameters

STRUCTURE = identifiers Structures to be printed

FIELDWIDTH = scalars Field width in which to print the values of each structure (a negative value -n prints numbers in E-format in width n); if omitted, a default is determined (for numbers, this is usually 12; for text, the width is one more character than the longest line)

DECIMALS = scalars Number of decimal places for numbers; if omitted, a default is determined which prints the mean absolute value to 4 significant figures

CHARACTERS = scalars Number of characters to print in strings

SKIP = scalars or variates Number of spaces to leave before each value of a structure (* means newline before structure)

FREPRESENTATION = strings How to represent factor values (labels, levels, ordinals); default is to use labels if available, otherwise levels

JUSTIFICATION = strings How to position values within the field (right, left); if omitted, right is assumed

MNAME = strings Name to print for table margins (margin, total, nobservd, mean, minimum, maximum, variance, count, median, quantile); if omitted, "Margin" is printed

 

Description

The contents of Genstat data structures can be displayed, with appropriate labelling, using the PRINT directive. Output can be printed in the current output channel, or sent to other channels, or put into a text structure. PRINT has many options and parameters to allow you to control the style and format of the output but, in most cases, these can be left with their default settings.

For a quick display of the contents of a list of data structures, you need only give the name of the directive, PRINT, and then list their identifiers. For example

PRINT Source,Amount,Gain

The output is fully annotated with the identifiers, and with row and column labels or numbers, where appropriate. Factors are represented by their labels if available, and otherwise by their levels. The layout of the values is determined automatically by the size and shape of the structures to be printed, and by the space needed to print individual values. The output is arranged in columns; the structures are split if the page is not wide enough, so that one set of columns is completed before the next is printed.

With vectors that all contain the same number of values, the default is to print their values in parallel. Alternatively, you can request that structures are printed in series, one below another, by setting option SERIAL=yes. Of course, if the structures to be printed have different shapes or sizes, their values can be printed only in series. The setting SERIAL=no is then ignored except that, to save space, any vectors or pointers are then printed across the page (that is, as though you had set ORIENTATION=across).
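
Using the structures from the example above, the two layouts might be requested as follows; this is only a sketch of the option settings.

```genstat
"Default: values printed in parallel, one column per structure"
PRINT Source,Amount,Gain
"All values of Source, then all of Amount, then all of Gain"
PRINT [SERIAL=yes] Source,Amount,Gain
```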

You can use the RESTRICT directive to specify that only a subset of the units of a vector should be printed. When printing in series the vectors can be restricted to different subsets; but with parallel printing any restriction is applied to all the vectors (and any pointers) so, if more than one vector is restricted, they must all have been restricted in the same way.

Genstat annotates each set of values by the identifier of the structure (but this can be controlled by option IPRINT described below) and automatically chooses a suitable format. For a numerical structure, the default is to use a field of 12 characters. If the DECIMALS parameter was set when the structure was declared, this will define the number of decimal places in the output; otherwise, the number of decimal places is determined by calculating the number that would be required to print its mean absolute value to at least four significant figures. Texts (and labels of factors) are usually printed in a field of 12 characters but this is extended if any of the strings in the text requires a wider field. You can define your own formats using the parameters FIELDWIDTH, DECIMALS, CHARACTERS, SKIP, and JUSTIFICATION.

FIELDWIDTH and DECIMALS both operate in a straightforward way. The only potential complication is that a negative FIELDWIDTH can be used to print numbers in scientific format (for example 7.3 E1 instead of 73), with DECIMALS significant places. The DECIMALS parameter is ignored for strings, like the labels of the factors Source and Amount.

In the same way, the CHARACTERS parameter is ignored for numbers; for strings, it allows you to control the number of characters that are printed. By default, Genstat prints all the characters in each string of a text or factor label, unless the CHARACTERS parameter was set to a lesser number when the text or factor was declared.

The SKIP parameter allows you to place extra spaces between the values of each structure. By default, no extra spaces are inserted unless a value fills the field completely, when a single space will be inserted; there is also a blank line before the first printed line. SKIP can be set to either a scalar or a variate in which a positive integer n requests that n spaces are left and a missing value can be used to request a blank line.

The values can be left-justified by setting the JUSTIFICATION parameter to left.
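
The formatting parameters might be combined as in the sketch below, which assumes that Source and Amount are factors and Gain is a variate, as in the earlier example; the particular widths are arbitrary.

```genstat
"One setting per structure; DECIMALS is ignored for the factor labels"
PRINT Source,Amount,Gain; FIELDWIDTH=10,8,12; DECIMALS=0,0,2
"A negative field width requests E-format for Gain"
PRINT Gain; FIELDWIDTH=-12; DECIMALS=3
"Left-justify the labels and leave two extra spaces before each value"
PRINT Source,Gain; FIELDWIDTH=10,12; SKIP=2; JUSTIFICATION=left,right
```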

The FREPRESENTATION parameter controls the printing of the factor values. By default Genstat will print labels if there are any; if there are none, it prints the levels. The ordinals setting represents the values by the integers 1 upwards.

The ORIENTATION option is relevant only when you are printing vectors or pointers. By setting ORIENTATION=across, the values are printed in alternate lines, across the page. To ensure that these line up correctly, the fieldwidth is taken as the maximum of those specified for the printed structures, while the field used to print their identifiers is given by the RLWIDTH option (by default 13).

When there is too much output to fit across the page, Genstat will print the output in more than one block unless option WRAP is set to yes. Then Genstat simply wraps each line onto subsequent lines. This is likely to be useful mainly if you are printing the contents of the structures to be read by another program. You might then also wish to suppress the identifiers by setting option IPRINT=* and remove blank lines by setting option SQUASH=yes.
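
A sketch of writing values for another program to read is given below. The file name gain.dat and the choice of channel 3 are assumptions made for illustration.

```genstat
OPEN 'gain.dat'; CHANNEL=3; FILETYPE=output
"Wrap long lines, suppress identifiers, and omit blank lines"
PRINT [CHANNEL=3; WRAP=yes; IPRINT=*; SQUASH=yes] Source,Amount,Gain
```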

By default, IPRINT=identifier will label the output with the identifier of the structure. Putting IPRINT=identifier,extra will also include any text that has been associated with the structure by the EXTRA parameter when it was declared, while the setting associatedidentifier can be used when a table has been produced by the TABULATE and AKEEP directives, to request that the output be labelled with the identifier of the variate from which the table was formed.

The width of each line can be controlled by the WIDTH option; the default is to take the full available width. The INDENTATION option specifies the number of spaces to leave before each line; by default there are none.

The CHANNEL option determines where the output appears. By default, the output is placed in the current output channel, but CHANNEL can be set to a scalar to send it to another output channel; the correspondence between channels and files on the computer is explained in the description of the OPEN directive. Alternatively, you can set CHANNEL to the identifier of a text to store the output. The text need not be declared in advance; any undeclared structure that is specified by CHANNEL will be defined automatically as a text. Each line of output becomes one value of the text and if the text already has values they will be replaced. You are most likely to want to do this in order to manipulate the text further. Remember, however, that if you print the text later on, its strings will be right-justified by default, so you will need to set JUSTIFICATION=left in the later PRINT statement to achieve the normal appearance of your output. The maximum (and default) line length of this text is the length of what is called the output buffer. This is likely to be 200 on most computers. If you intend to print it to an output file, you should set the WIDTH option as appropriate.
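
For instance, output might be captured in a text and printed again later, as in this sketch; Lines is a hypothetical identifier that is not declared in advance.

```genstat
"Lines is defined automatically as a text, one value per line of output"
PRINT [CHANNEL=Lines] Source,Amount,Gain
"Left-justify to recover the normal appearance of the stored output"
PRINT Lines; JUSTIFICATION=left
```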

The MISSING option allows you to specify a string to be used instead of the default asterisk symbol to represent missing values. For example, you could set MISSING='unknown' or MISSING=' '.

PRINT can easily be used to print matrices and tables, by taking the default layout and labelling. For tables with more than one dimension, the usual layout has one factor across the page and the others down the page; tables with only one dimension are printed down the page. Several tables can be printed in parallel, provided they all have the same classifying factors. The tables are then printed in alternate columns, as though they formed a larger table with an extra factor (called the table-factor) representing the list of tables. This extra factor thus becomes another (in fact, the final) factor to be printed across the page.

This default layout can be changed using the ACROSS, DOWN, and WAFER options. You may wish to do this simply by changing the factors which appear down and across the page. The ACROSS option can be set to a scalar to specify how many factors should be printed across the page, or to a list of factors to say which ones they should be. DOWN similarly specifies the factors to be printed down the page. However, you cannot specify a list of factors for one of these options and a scalar for any of the others. The table-factor can be represented in these lists by inserting a * in the required position; if you do not mention the table-factor in either list it remains as the last factor in the ACROSS list.

The WAFER option allows you to split the output up into subtables or "wafers". This is particularly useful if the tables have many classifying factors, or if the factors have very long labels. The setting can again be either a scalar or a list of factors (possibly including the table-factor). Each subtable has a heading indicating its position in the full table. If the table-factor is included in the wafer, the identifier of the appropriate table will be printed at the beginning of the label for that wafer; this does not mean that the table-factor itself has been moved, simply that the labelling has been rearranged to make it easier to read.

You need not specify all the options DOWN, ACROSS and WAFER. If you leave any of them out PRINT will deduce the missing information.
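
The sketch below shows some possible layouts for a three-way table. It assumes that Source, Amount, and a further hypothetical factor Year classify the variate Gain, and that Mgain is the table of means formed by TABULATE.

```genstat
TABULATE [CLASSIFICATION=Source,Amount,Year] Gain; MEANS=Mgain
"Default layout: the final factor across the page, the others down"
PRINT Mgain
"Choose the factors explicitly, with a separate wafer for each year"
PRINT [ACROSS=Amount; DOWN=Source; WAFER=Year] Mgain
"Give only the number of factors to print across the page"
PRINT [ACROSS=2] Mgain
```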

You can control the space allowed for labels of the DOWN factors by using the RLWIDTH option. By default this is set to 13, but you might want something else if the labels are very short. If the width provided (by you, or implicitly) is inadequate, PRINT automatically resets it to accommodate the longest row label. You can suppress the labelling by the down factors by setting option RLPRINT=*, and the labelling of the across factors by setting CLPRINT=*.

When tables are produced by TABULATE, Genstat sets an internal indicator for use by PRINT to indicate the appropriate label for any margins. When a single table is printed this name will be used by default. When printing tables in parallel, if they all have the same setting of the margin name indicator, the appropriate name is used. If they have different settings, or none at all (as with tables from sources other than TABULATE), the margins will be labelled Margin by default. You can change the label by setting the MNAME parameter. Tables printed in parallel must have the same label throughout, and Genstat will take the one specified for the first table in the list. But in serial printing, you can use a different margin name for each table.

The TABULATE and AKEEP directives also record the identifier of the variate from which the table was formed, and you can request that this be used to label the output, instead of the identifier of the table itself, by setting the IPRINT option to associatedidentifier.

The PUNKNOWN option controls the printing of the "unknown" cell of a table. The default action is to print this cell, labelled with the table identifier, but only if it contains a value other than missing value or zero. You can select one of five settings:

present (default) print value if not missing or zero

always print the unknown cell regardless of value

zero print unless the value is zero

missing print unless the value is missing

never do not print the unknown cell whatever its value
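
As a sketch, the settings might be used as follows with a hypothetical table of means Mgain formed by TABULATE.

```genstat
TABULATE [CLASSIFICATION=Source,Amount] Gain; MEANS=Mgain
"Always show the unknown cell, even if it is zero or missing"
PRINT [PUNKNOWN=always] Mgain
"Never show the unknown cell"
PRINT [PUNKNOWN=never] Mgain
```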

Options ACROSS, DOWN, WAFER, RLPRINT, and CLPRINT also apply to matrices. By default, though, if you have several matrices they will be printed one after another on the page.

With symmetric matrices the only options of these that are relevant are RLPRINT and CLPRINT; a further setting, integers, is available for these to request that the rows or columns be labelled by the integers 1 onwards, as well as, or instead of, the labels provided with the symmetric matrix. For example, setting RLPRINT=integers and CLPRINT=integers,labels would identify the rows by integers and the columns by both integers and labels.

The UNFORMATTED option can be used to send output to unformatted files. These can store values of data structures, so that they can later be input again using READ. This provides a convenient way to free some space temporarily. It can also save computing time if you have a large data set that may need to be read several times: input from character files is slow, so after vetting a large data set, it will be read more efficiently on future occasions if you transfer its contents to an unformatted file. As an alternative you could use backing store, but this stores the attributes of the structures as well as their values, and so access will take longer. You can also use these facilities to transfer data between Genstat and other programs.

The only other options that are relevant to unformatted files are CHANNEL, REWIND, and SERIAL. Genstat automatically creates an unformatted workfile, on channel 0, to which unformatted output is sent by default (by PRINT), and from which unformatted input is taken by default (by READ). This file is deleted automatically at the end of a Genstat run.

It is usually quicker to read and write structures in series. Also, the values of structures transferred in parallel must all be of the same mode: neither texts nor factors can be stored in parallel with values of the other, numerical, structures (scalars, variates, matrices or tables). As an example, we first open a file, and declare some variates, matrices, and factors.

OPEN 'BDAT'; CHANNEL=3; FILETYPE=unformatted

VARIATE X,Y,Z; VALUES=!(11...19),!(21...29),!(31...39)

MATRIX [ROWS=2; COLUMNS=3; VALUES=11,12,13,21,22,23] M

FACTOR [LEVELS=3; VALUES=1,3,2,3,1,2,2,2,1,3] F

The next two statements store data for M and F (in series) on the file named BDAT, and data for X, Y and Z (in parallel) on the workfile.

PRINT [CHANNEL=3; SERIAL=yes; UNFORMATTED=yes] M,F

PRINT [UNFORMATTED=yes] X,Y,Z

You can now free the space for numerical data for other purposes, by putting

DELETE X,Y,Z,F,M

By rewinding the files we can read the data back into Genstat.

READ [UNFORMATTED=yes; REWIND=yes] X,Y,Z

READ [CHANNEL=3; SERIAL=yes; UNFORMATTED=yes; REWIND=yes] M,F

You can also re-use the external file BDAT in a later job. If you change the lengths of structures, you must remember to reset them to their original values before you use unformatted READ to recover the data values from the file. Only the data values are stored in unformatted files, and not the attributes (such as lengths) as in backing-store files.
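
A later job that recovers M and F from BDAT might look like the sketch below. It assumes that the file is unchanged from the example above, and that the structures must be declared again with their original sizes (a 2 by 3 matrix and a factor with 10 values) before the unformatted READ. No REWIND is requested here, on the assumption that a newly opened file is read from its start.

OPEN 'BDAT'; CHANNEL=3; FILETYPE=unformatted

MATRIX [ROWS=2; COLUMNS=3] M

FACTOR [LEVELS=3; NVALUES=10] F

READ [CHANNEL=3; SERIAL=yes; UNFORMATTED=yes] M,F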

 

PROCEDURE directive

Introduces a Genstat procedure.

 

Options

PARAMETER = string Whether to process the structures in each parameter list of the procedure sequentially using a dummy to store each one in turn, or whether to put them all into a pointer so that the procedure is called only once (dummy, pointer); default dumm

RESTORE = strings Which aspects of the Genstat environment to store at the start of the procedure and restore at the end (inprint, outprint, diagnostic, errors, pause, prompt, newline, case, run, units, blockstructure, treatmentstructure, covariate, asave, dsave, rsave, tsave, vsave); default *

SAVE = text Text to save the contents of the procedure (omitting comments and some spaces)

 

Parameter

text Name of the procedure

 

Description

Once you start to write programs for complicated tasks, you may wish to keep them to use again in future. The most convenient way of doing this is to form them into procedures. You may also wish to use procedures written by other people. The use of a Genstat procedure looks exactly the same as the use of one of the standard Genstat directives. The only difference is that the name of the procedure must not be abbreviated beyond eight letters, whereas directive names can be abbreviated to four. Thus, you simply give the name of the procedure, and then specify options and parameters as required.

When Genstat meets a statement with a name that it does not recognize as one of the standard Genstat directives, it first looks to see whether you have a procedure of that name already stored in your program. Then it looks in any procedure library that you may have attached explicitly to your program, taking these in order of their channel number (see the OPEN directive). The people that manage your computer can define a special site library and arrange for this to be attached to Genstat automatically when it is run. If they have done so, this library will be examined next. Finally Genstat looks in the official Genstat procedure library, which is also attached automatically to your program. After locating the required procedure, Genstat reads it in, if necessary, and then executes it. So you do not have to do any more than you would to use a Genstat directive.

The official library thus allows new facilities to be offered to all users. Or your computer manager can make procedures available that cover the special needs of the users at your site, and these will over-ride any procedures of the same name in the official library. Or you can form your own libraries of the procedures that you find particularly useful, and these will always be taken in preference to procedures in the site or the official library. Note however that a procedure cannot have the same name as any of the Genstat directives.
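
For example, to attach a library of your own so that it is searched before the site and official libraries, you could open it on one of the three procedure-library channels. The file name MYPROCS is hypothetical, and the sketch assumes that the file already contains a procedure library.

OPEN 'MYPROCS'; CHANNEL=1; FILETYPE=procedurelibrary

Any procedure stored in MYPROCS can then be used simply by giving its name, provided that your program does not already contain a procedure of the same name.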

Information is transferred to and from a procedure only by means of its options and parameters. Otherwise the procedure is completely self-contained. Anyone who uses it does not need to know how the program inside operates, what data structures it contains, nor what directives it uses. The data structures inside the procedure are local to the procedure and cannot be accessed from outside.

To write your own procedures, you start by giving a PROCEDURE statement. This has a single parameter which defines the name of the procedure. The name can be up to eight characters with the same rules as for the identifiers of data structures: the first character must be a letter, the second to the eighth can be either letters or digits, and characters beyond the eighth are ignored. However the name cannot be suffixed, neither must it be the same as the name of any of the standard Genstat directives, nor any of their valid abbreviations. Thus you could have a procedure with the name CALCULUS but not CALC or CALCUL as these are abbreviations of the directive name CALCULATE. As already mentioned, when you use the procedure, you must give the name in full - all eight characters (or as many as were defined for the name if that was less than eight).

The PARAMETER option indicates whether the settings in any list specified for the parameters of the procedure are to be taken one at a time, or whether they need to be processed together. The difference between these alternatives can be illustrated by considering some of the Genstat directives. For example, with

ANOVA Height,Weight; RESIDUALS=Hres,Wres

Genstat will first analyse the values in the Height variate and store the resulting residuals in the variate Hres; it then analyses Weight and stores the residuals in Wres. This action corresponds to the default setting PARAMETER=dummy; inside the procedure, each parameter will then be a dummy data structure which will point to each item of the list in turn, in the same way as the parameters of a FOR loop. Conversely, in the statement

PRINT Height,Hres

the values of Height and Hres are printed together down the page, and this is possible only if PRINT is able to access both variates simultaneously. In a procedure this would require the setting PARAMETER=pointer; each parameter is then a pointer, storing the whole list.
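
As a minimal sketch of the second alternative, the procedure below receives its whole parameter list as a pointer and prints all the structures together. The procedure name SHOWALL and the parameter name DATA are hypothetical; the parameter is defined with the PARAMETER directive, described later in this section, and the sketch assumes that the null suffix list in DATA[] expands to every structure stored in the pointer.

PROCEDURE [PARAMETER=pointer] 'SHOWALL'

PARAMETER NAME='DATA'; MODE=p

PRINT DATA[]

ENDPROCEDURE

SHOWALL Height,Hres

With the default PARAMETER=dummy, the same call would instead execute the procedure twice, first with DATA pointing to Height and then with DATA pointing to Hres.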

You may change some aspects of the Genstat environment within a procedure. This may be the intended purpose of the procedure; but if it is an unwanted side effect, you should reset them afterwards. The RESTORE option allows you to list aspects that you would like Genstat to reset automatically when it finishes executing the procedure. Alternatively, you can save and restore the environment explicitly using the SET and GET directives, but this is usually less efficient.
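
The sketch below illustrates RESTORE with a hypothetical procedure ONEWAY, whose parameters Y and GROUPS are names invented for this example. The procedure defines its own treatment structure for the analysis; because of the RESTORE setting, any treatment structure that was in force before the call should be reinstated when the procedure ends. Note that any block structure defined outside the procedure is left untouched here and so would still apply to the analysis.

PROCEDURE [RESTORE=treatmentstructure] 'ONEWAY'

PARAMETER NAME='Y','GROUPS'; MODE=p

TREATMENTSTRUCTURE GROUPS

ANOVA Y

ENDPROCEDURE

ONEWAY Y=Height; GROUPS=Sex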

Finally, the SAVE option allows you to store the contents of the procedure, up to and including ENDPROCEDURE, in a text so that you can edit and redefine it or, for example, print it to a file or save it on backing store. The saved version is a modified form of the original input. Each line of the text contains a single statement; thus, where a statement spans several lines of input, these are concatenated into a single line in the text (deleting the continuation characters). Any line that contains several statements is split. Comments are removed, and any occurrence of several contiguous spaces is replaced by a single space. Also, a colon is placed at the end of each line.
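
For example, the following sketch saves the contents of a small procedure in a text and then prints the text. The names GREET and Source are hypothetical, and the text is declared beforehand as a precaution, since it is assumed rather than known that SAVE would declare it automatically.

TEXT Source

PROCEDURE [SAVE=Source] 'GREET'

PRINT [IPRINT=*] 'Hello'

ENDPROCEDURE

PRINT Source

The printed text should contain one statement per line, each ending with a colon, as described above.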

After the PROCEDURE statement, you must define what options and parameters the procedure is to have; this is done by the directives OPTION and PARAMETER respectively. Only one of each of these should be given, and they must appear immediately after the PROCEDURE statement, but it does not matter which of the two you give first. They have very similar syntaxes, except that OPTION has an extra parameter which allows you to indicate whether a list of values or of identifiers is allowed. If you do not wish to define options or parameters for a procedure you can simply omit these directives; alternatively you can use OPTION or PARAMETER but with none of their parameters set, which has precisely the same effect.

After the OPTION and PARAMETER statements, you then list the statements that are to be executed when the procedure is called: these statements are the sub-program that makes up the procedure. Any data structures defined within the procedure are local to the procedure and cannot be accessed from outside. So you can use any identifiers for the structures, without having to worry about whether they may also be used outside by someone who may later use the procedure. You end these statements making up the procedure by an ENDPROCEDURE statement.
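
Putting these pieces together, a complete procedure might look like the following sketch. The procedure name PERCENT and the parameter names DATA and RESULT are hypothetical, and only the NAME and MODE parameters of the PARAMETER directive are set; the procedure has no options, so the OPTION statement is omitted.

PROCEDURE 'PERCENT'

PARAMETER NAME='DATA','RESULT'; MODE=p

CALCULATE RESULT = 100 * DATA / SUM(DATA)

ENDPROCEDURE

The procedure can then be used like a directive. Here the variate Pct is declared before the call, as a precaution, so that the dummy RESULT points to an existing structure.

VARIATE [VALUES=1,2,3,4] Counts

VARIATE Pct

PERCENT DATA=Counts; RESULT=Pct

PRINT Pct

Since the values of Counts sum to 10, Pct should contain the percentages 10, 20, 30 and 40. The identifiers DATA and RESULT exist only inside the procedure, so they cannot clash with structures of the same names elsewhere in your program.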

You are allowed to redefine an existing procedure if you wish to change any of the statements that it contains. To do this you specify the PROCEDURE statement, as usual, followed by the statements making up the new version of the procedure, and then an ENDPROCEDURE statement. However, you are not allowed to change the option or parameter definitions, and if there are any changes in the OPTION or PARAMETER statements, Genstat will give an error diagnostic.