SCALAR directive
Declares one or more scalar data structures.
Options
VALUE
= scalar Value for all the scalars; default is a missing value
MODIFY
= string Whether to modify (instead of redefining) existing structures (yes, no); default no
Parameters
IDENTIFIER
= identifiers Identifiers of the scalars
VALUE
= scalars Value for each scalar
DECIMALS
= scalars Number of decimal places for printing
EXTRA
= texts Extra text associated with each identifier
MINIMUM
= scalars Minimum value for the contents of each structure
MAXIMUM
= scalars Maximum value for the contents of each structure
Description
A scalar data structure stores a single number. The IDENTIFIER parameter lists the identifiers of the scalars that are to be declared.
Values can be assigned to the scalars by either the VALUE option or the VALUE parameter. The option defines a common value for all the structures in the declaration, while the parameter allows each one to be given a different value. If both the option and the parameter are specified, the parameter takes precedence. If you do not define a value explicitly for a scalar, Genstat gives it a missing value.
The DECIMALS parameter allows you to define the number of decimal places to be used by default when each scalar is printed. You can associate a text of extra annotation with each scalar using the EXTRA parameter. The MINIMUM and MAXIMUM parameters allow you to define lower and upper limits on the value of each scalar. Genstat then prints warnings if any values outside that range are allocated to the scalar.
If the MODIFY option is set to yes, any existing attributes and values of the scalars are retained; otherwise these are lost.
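The rules for assigning values can be summarized as a small Python model (this is not Genstat code). The function name, the use of NaN as a stand-in for Genstat's missing value, and the choice to store an out-of-range value after warning about it are all assumptions of the sketch. For brevity it also applies one MINIMUM/MAXIMUM pair to every scalar, whereas SCALAR allows a separate pair for each.

```python
import math
import warnings

# Hypothetical stand-in for Genstat's missing value; NaN is an assumption of this sketch.
MISSING = math.nan


def declare_scalars(identifiers, option_value=None, values=None,
                    minimum=None, maximum=None):
    """Model of how SCALAR assigns values to each identifier.

    option_value plays the role of the VALUE option (common to all scalars),
    values the VALUE parameter (one entry per scalar, None = not given).
    """
    if values is None:
        values = [None] * len(identifiers)
    scalars = {}
    for ident, value in zip(identifiers, values):
        if value is not None:            # VALUE parameter takes precedence
            v = value
        elif option_value is not None:   # otherwise the common VALUE option
            v = option_value
        else:                            # otherwise the scalar is missing
            v = MISSING
        # Out-of-range values only produce a warning; storing them anyway is an assumption.
        if not math.isnan(v):
            too_low = minimum is not None and v < minimum
            too_high = maximum is not None and v > maximum
            if too_low or too_high:
                warnings.warn(f"{ident}: value {v} outside [{minimum}, {maximum}]")
        scalars[ident] = v
    return scalars


scalars = declare_scalars(["A", "B", "C"], option_value=1.0, values=[None, 5.0, None])
```

Here B takes its own parameter value, while A and C fall back to the common option value.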
SET directive
Sets details of the "environment" of a Genstat job.
Options
INPRINT
= strings Printing of input as in PRINT option of INPUT (statements, macros, procedures, unchanged); default unch
OUTPRINT
= strings Additions to output as in PRINT option of OUTPUT (dots, page, unchanged); default unch
DIAGNOSTIC
= strings Defines the least serious class of Genstat diagnostic which should still be generated (messages, warnings, faults, extra, unchanged); default unch
ERRORS
= scalar Number of errors that a job may contain before it is abandoned (0 implies no limit); default is to leave unchanged
FAULT
= text Sets the Genstat fault indicator (for example, FAULT=* clears the last fault); default is to leave the indicator unchanged
PAUSE
= scalar Number of lines to output before pausing (interactive use only; 0 implies no pausing); default is no change
PROMPT
= text Characters to be printed for the input prompt; default is to leave unchanged
NEWLINE
= string How to treat newline (significant, ignored); default is no change
CASE
= string Whether lower- and upper-case (small and capital) letters are to be regarded as identical in identifiers (significant, ignored); default is no change
RUN
= string Whether or not the run is interactive (interactive, batch); by default the current setting is left unchanged
UNITS
= identifier To (re)set the current units structure; default is to leave unchanged
BLOCKSTRUCTURE
= identifier To (re)set the internal record of the most recent BLOCKSTRUCTURE statement; default is to leave unchanged
TREATMENTSTRUCTURE
= identifier To (re)set the internal record of the most recent TREATMENTSTRUCTURE statement; default is to leave unchanged
COVARIATE
= identifier To (re)set the internal record of the most recent COVARIATE statement; default is to leave unchanged
ASAVE
= identifier To (re)set the current ANOVA save structure; default is to leave unchanged
DSAVE
= identifier To (re)set the current save structure for the high-resolution graphics environment; default is to leave unchanged
RSAVE
= identifier To (re)set the current regression save structure; default is to leave unchanged
TSAVE
= identifier To (re)set the current time-series save structure; default is to leave unchanged
VSAVE
= identifier To (re)set the current REML save structure; default is to leave unchanged
No parameters
Description
The default of SET is to do nothing: that is, each option by default leaves the corresponding attribute of the environment unchanged. Of course you have to start somewhere, so an initial environment is defined at the start of any Genstat program; the corresponding initial settings of the options of SET, known as the initial defaults, are described below.
The INPRINT option controls what parts of a Genstat job supplied in the current input channel are recorded in the current output file; the input channel can be either an input file or the keyboard. Three parts are distinguished: explicit statements; statements, or parts of statements, that you have supplied in macros using either the ## notation or the EXECUTE directive; and statements that you have supplied in procedures. The initial default is to record nothing if the output is to the screen, and otherwise to record the statements. This aspect of the environment can also be modified by the PRINT option of the INPUT directive and by the INPRINT option of JOB.
The OUTPRINT option controls how the output from many Genstat directives starts: the output can be preceded by a move to the top of a new page, by a line of dots beginning with the line number of the statement producing the analysis, or by both. If output is directly to the screen, no new pages are given. The initial default is to give neither if output is to the screen, and otherwise to give a new page and a line of dots. This aspect can also be modified by the PRINT option of the OUTPUT directive or by the OUTPRINT option of JOB. The lines of dots are produced by the directives for regression analysis, analysis of designed experiments, REML analysis, multivariate analysis, and time series, and also by the FLRV, FSSPM, and SVD directives. If you give an analysis statement within a FOR loop, the line number preceding the line of dots is that of the ENDFOR statement rather than that of the analysis statement. New pages are produced with any of the above, and with the GRAPH, HISTOGRAM, and CONTOUR directives.
The DIAGNOSTIC option lets you control the level of diagnostic reporting. You might want to do this within a procedure, to prevent faults being reported to a user who does not need to know in detail what is going on inside the procedure. By initial default, all diagnostics (messages, warnings, and faults) are printed. You can switch off messages by setting DIAGNOSTIC=warning, or switch off both messages and warnings by setting DIAGNOSTIC=fault. If you set DIAGNOSTIC=*, no diagnostics will appear. The extra setting gives you extra information, in the form of a dump of the current state of the job, but this is likely to be useful only for developers of Genstat. Printing of diagnostics can also be controlled by the DIAGNOSTIC option of JOB.
The ERRORS option controls what Genstat does when many faults happen within a single job while in batch mode. By initial default, up to five errors per job are reported, and successive faults will not generate diagnostic messages. This ensures, for example, that input intended to be read by a READ statement will not generate many lines of diagnostics if execution halts because of a fault before the READ statement. Note, however, that this option does not affect the detailed error messages printed by the READ directive itself: these are controlled separately by the corresponding ERRORS option of READ. In interactive mode, the count of errors is restarted after each successful statement, though the option is unlikely to be useful in this mode.
The FAULT option is provided primarily to allow procedure writers to modify the internal record that is kept of the most recent fault indicator. Setting FAULT=* clears the record; you can then use the GET directive to find out whether a fault has occurred since the record was cleared. You can also set the fault indicator to a particular diagnostic, for example
SET [FAULT='VA4']
A subsequent DISPLAY statement will then report the chosen fault in the standard way. The fault indicator is automatically cleared at the start of each job.
The PAUSE option lets you specify how many lines of output are produced at a time; you might, for example, want to read the output on a terminal screen before more output replaces it. Obviously this is relevant only in interactive mode, and it may not be needed in implementations of Genstat that provide a scrollable output window. By initial default, all output is sent to the current output channel as soon as it is available. Some computers can store the output, irrespective of whether Genstat itself has a scrollable window, and let you scroll forward and back to read it at leisure; others just provide keys to freeze the output while you are reading a section, and then to continue to the next segment. If you set PAUSE=n, then after every n lines of output Genstat gives a prompt:
*Press RETURN to continue*
After you have read the displayed section of output, you can press the <RETURN> key to get the next n lines. The counting of lines is restarted each time you give a statement from the keyboard; it is not restarted between separate statements in a macro, procedure, or auxiliary input channel. If you have specified that Genstat should echo input lines, these are included among the n. Once all the output has been displayed, Genstat prompts for further statements.
The PROMPT option specifies the characters used to prompt for interactive input. The initial default is the greater-than character followed by a space ("> "). The prompt can also be modified by the PROMPT option of JOB. Other prompts are used by READ, EDIT, HELP, and QUESTION, and these cannot be altered.
The NEWLINE option allows you to cancel the initial default whereby a newline (<RETURN>) is a terminator both for strings within a string list (1.6.2) and for a statement (1.8). Thus, for example, if you specify
SET [NEWLINE=ignored]
you no longer need to use a backslash (\) to continue a statement onto a new line, since <RETURN> is no longer interpreted as the end of a statement. But you will then have to terminate each statement explicitly with a colon.
The CASE option specifies whether upper-case and lower-case letters are to be treated as the same in identifiers. The initial default is that upper and lower case are not the same; thus, an identifier X is distinct from an identifier x. If CASE is set to ignored, then in later statements both x and X are treated as the same identifier, X. Thus the structure with identifier x cannot be referenced unless CASE is later reset to significant.
The RUN option controls whether Genstat interprets the program as being in batch or in interactive mode; this assumed mode is independent of whether the program really is being run in batch or interactively. Initially, a program is taken to be in interactive mode only if the first input channel and the first output channel are both connected to a terminal. The setting of the assumed mode has two effects: on recovery from faults, and on how HELP and EDIT operate.
The UNITS option provides another way of setting the units structure, in addition to the UNITS directive. The setting can be the identifier of a variate or text structure; this will become the default labelling structure of other variates, texts, or factors with the same length, in those directives that use such labels. The setting can also be a scalar to specify the default number of units. The setting of the UNITS option is lost at the end of each job within a program.
The last eight options of the SET directive specify special save structures for graphical and analysis directives. You can set these options only to an identifier that you have previously established by the SPECIAL option of the GET directive or by the SAVE options of the various analysis directives themselves. For example, if two sets of regression analyses are in progress in one job, the SET directive can be used to switch between them:
MODEL [SAVE=S1] Y1
FIT X1
MODEL [SAVE=S2] Y2
FIT X1
SET [RSAVE=S1]
FIT X1,X2
This program fits the regression of Y1 on X1, using save structure S1, then the regression of Y2 on X1 with save structure S2. Finally, the regression of Y1 on X1 and X2 is fitted, because the current regression save structure is changed to S1 before the last FIT statement.
The settings of these last eight options are all lost at the end of a job.
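The idea of a "current" save structure can be mimicked by a toy Python class (not Genstat code). The class and its methods `model`, `fit` and `set_rsave` are hypothetical names that only imitate MODEL, FIT and SET [RSAVE=...]. The statements at the bottom replay the example program above.

```python
class RegressionSession:
    """Toy model of the 'current regression save structure' in one job."""

    def __init__(self):
        self.saves = {}      # save-structure identifier -> response variate
        self.current = None  # identifier of the current save structure
        self.fits = []       # (save, response, terms) recorded for each FIT

    def model(self, response, save):
        # MODEL [SAVE=save] response: defines a model and makes `save` current
        self.saves[save] = response
        self.current = save

    def fit(self, *terms):
        # FIT always works on whichever save structure is current
        response = self.saves[self.current]
        self.fits.append((self.current, response, terms))

    def set_rsave(self, save):
        # SET [RSAVE=save]: only an already established save structure is allowed
        if save not in self.saves:
            raise KeyError(f"{save} is not an established save structure")
        self.current = save


session = RegressionSession()
session.model("Y1", "S1")
session.fit("X1")
session.model("Y2", "S2")
session.fit("X1")
session.set_rsave("S1")
session.fit("X1", "X2")
```

The last recorded fit is Y1 on X1 and X2, matching the explanation of the example.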
SETOPTION directive
Sets or modifies defaults of options of Genstat directives or procedures.
Option
DIRECTIVE
= string Directive (or procedure) to be modified
Parameters
NAME
= strings Option names
DEFAULT
= identifiers New default values
Description
The SETOPTION directive changes the default settings of options of a directive or procedure for the remainder of the current job. If you use this directive in your start-up file, you can make the changed default apply in all your use of Genstat.
To achieve any effect, the option and both parameters of the directive must be set. The DIRECTIVE option specifies the name of the directive or procedure that is affected, and the NAME parameter indicates the option whose default is to be changed. The settings are strings, but they need not be quoted because all directive and procedure names are valid as unquoted strings. The DEFAULT parameter is then set to a data structure that provides the new default you want to be assumed. For example, the following statement modifies the PRINT option of the FIT directive.
SETOPTION [DIRECTIVE=FIT] PRINT; DEFAULT='deviance'
The usual default of the PRINT option in FIT is to print a statement of the model, a summary of the analysis, and the parameter estimates; this corresponds to the setting PRINT=model,summary,estimates. This SETOPTION statement therefore redefines the default so that any subsequent FIT statement in the job will report only the residual deviance, unless you explicitly set the PRINT option.
The defined mode of the PRINT option of FIT is "strings" (8.1.2). However, the DEFAULT parameter of SETOPTION expects a data structure (to allow for all the other modes that might occur), so it must be set to a text structure containing the string (or strings) that you want to be the default. Similarly, if the defined mode of the option is "numbers", "expression", or "formula", you must supply a variate, an expression structure, or a formula structure containing the new default. If the defined mode is "identifier", the setting of DEFAULT is simply an identifier, which must be of the required type if this is specified in the definition of the directive or procedure.
To reset the PRINT option of FIT back to its usual default, you would need to give the statement
SETOPTION [DIRECTIVE=FIT] PRINT; DEFAULT=!t(model,summary,\
estimates)
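The way a SETOPTION default interacts with explicit settings can be modelled in Python as a per-job table of defaults (not Genstat code). The class `JobDefaults` is hypothetical, and a tuple of strings stands in for the text structure given to DEFAULT. Each new job starts again from the built-in defaults, because SETOPTION lasts only for the current job.

```python
# Built-in default taken from the text: FIT prints model, summary and estimates.
BUILTIN_DEFAULTS = {"FIT": {"PRINT": ("model", "summary", "estimates")}}


class JobDefaults:
    """Per-job table of option defaults (hypothetical model, not Genstat code)."""

    def __init__(self, builtin):
        # Copy the built-in table: SETOPTION changes last only for this job.
        self.table = {d: dict(opts) for d, opts in builtin.items()}

    def setoption(self, directive, name, default):
        # SETOPTION [DIRECTIVE=directive] name; DEFAULT=default
        self.table[directive][name] = default

    def resolve(self, directive, name, explicit=None):
        # An option set explicitly in a statement overrides any default.
        return explicit if explicit is not None else self.table[directive][name]


job = JobDefaults(BUILTIN_DEFAULTS)
job.setoption("FIT", "PRINT", ("deviance",))   # like DEFAULT='deviance'
```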
The SETOPTION directive can also be used to change defaults of any procedure: this may be a procedure in the standard Procedure Library, the Site Library, or a personal library that you have already opened in the current program, or it may be a procedure that you have defined explicitly in the job.
SETPARAMETER directive
Sets or modifies defaults of parameters of Genstat directives or procedures.
Option
DIRECTIVE
= string Directive (or procedure) to be modified
Parameters
NAME
= strings Parameter names
DEFAULT
= identifiers New default values
Description
The SETPARAMETER directive changes the default settings of parameters of a directive or procedure for the remainder of the current job. If you use this directive in your start-up file, you can make the changed default apply in all your use of Genstat. The option and parameters are used in exactly the same way as in the SETOPTION directive, which changes the defaults of options of directives and procedures. For full details see the description of SETOPTION.
SKIP directive
Skips lines in input or output files.
Options
CHANNEL
= scalar Channel number of file; default current channel of the specified type
FILETYPE
= string Type of the file concerned (input, output); default inpu
Parameter
identifiers How many lines to skip; for input files, a text means skip until the contents of the text have been found, further input is then taken from the following line
Description
SKIP can be used with either input or output files. The FILETYPE and CHANNEL options indicate which file is to be skipped; by default this is the current input channel.
For input files you can skip over unwanted lines, which might be comments describing the data that follow, or statements that you do not want to use in your current job. You can skip a specified number of lines, n say, by setting the parameter to a scalar containing the value n. Alternatively, you can skip everything up to and including a particular string of characters by setting the parameter to a text containing that string. For example,
SKIP [CHANNEL=2] 'Section 2'
will skip the contents of the input file on channel 2 from the current position until the string Section 2 is found. The next line to be read from channel 2 will then be the one immediately after the line containing Section 2.
For output files you can use SKIP to print blank lines to separate one section of output from another. You might want to do this if you had set the PRINT option SQUASH=yes to suppress the automatic blank lines within a section of output. For example,
PRINT [CHANNEL=2; IPRINT=*; SQUASH=yes] Heading
SKIP [CHANNEL=2; FILETYPE=output] 2
PRINT [CHANNEL=2; IPRINT=*; SQUASH=yes] Table
places two blank lines between Heading and Table when printing their values to channel 2.
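The two input behaviours of SKIP (skip n lines, or skip up to and including the line containing a text), plus the output case, can be sketched in Python with files held as lists of lines. The function names are hypothetical. The behaviour when the text is never found is an assumption, because the description above does not say what happens in that case.

```python
def skip_input(lines, pos, what):
    """Return the index of the next line to be read after SKIP.

    lines: the input file as a list of lines; pos: index of the next unread line.
    what: an int skips that many lines; a str skips up to and including
    the first line that contains it.
    """
    if isinstance(what, int):
        return min(pos + what, len(lines))
    for i in range(pos, len(lines)):
        if what in lines[i]:
            return i + 1  # reading resumes on the following line
    return len(lines)     # assumption: if the text is never found, stop at end of file


def skip_output(output, n):
    """Append n blank lines to an output 'file' held as a list of lines."""
    output.extend([""] * n)


data = ["notes", "Section 1", "1 2 3", "Section 2", "4 5 6"]
next_line = skip_input(data, 0, "Section 2")
```

After the call, `data[next_line]` is the line immediately after the one containing Section 2, as in the channel 2 example.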
SORT directive
Sorts units of vectors according to an index vector.
Options
INDEX
= vectors Variates, texts or factors whose values are to define the ordering; default is to use the first vector in the OLDVECTOR list
DIRECTION
= string Order in which to sort (ascending, descending); default asce
DECIMALS
= scalar Number of decimal places to which to round before sorting numbers; default * i.e. no rounding
Parameters
OLDVECTOR
= vectors or pointers Factors, pointers, texts, or variates whose values are to be sorted
NEWVECTOR
= vectors or pointers Structure to receive each set of sorted values; if any are omitted, the values are placed in the corresponding OLDVECTOR
Description
The SORT directive allows you to reorder the units of a list of vectors or pointers according to one or more "index" vectors. These can be specified explicitly using the INDEX option (and they need not be among the vectors actually sorted). If you omit the INDEX option, Genstat uses the first vector in the OLDVECTOR list. The DECIMALS option allows you to define the number of decimal places that are taken into account for an index variate: for example, DECIMALS=0 would round each value to the nearest integer. If you do not set this, there is no rounding. The DIRECTION option controls whether the ordering is into ascending or descending order; by default DIRECTION=ascending.
The vectors or pointers whose values are to be sorted are listed by the OLDVECTOR parameter. The units of each structure are permuted in exactly the same way, into an ordering determined from the index vectors. The NEWVECTOR parameter allows you to specify new vectors to contain the sorted values, and thus keep the unsorted values in the original vectors. For example
SORT [INDEX=Name] Age,Income,Name,Sex; NEWVECTOR=A,*,N,S
would place the sorted values of Age, Name, and Sex into A, N, and S; as there is a null entry (*) corresponding to Income in the NEWVECTOR list, the sorted incomes would replace the original values of Income. Any undeclared vector in the NEWVECTOR list is declared implicitly to match the corresponding OLDVECTOR.
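The permutation logic can be sketched in Python, with vectors held in a dict of lists (not Genstat code). The function `sort_vectors` is hypothetical, and `None` in `newvectors` plays the role of `*`. Three points are assumptions: ties keep their original order, several index vectors compare lexicographically, and Python's `round()` sends exact halves to the even neighbour, which may differ from Genstat's rounding. The call at the bottom replays the example above with made-up data.

```python
def sort_vectors(data, oldvectors, newvectors=None, index=None,
                 decimals=None, direction="ascending"):
    """Rough model of SORT on vectors held in a dict (name -> list).

    index: list of names of index vectors (default: first OLDVECTOR).
    newvectors: one name per OLDVECTOR, None meaning '*' (sort in place).
    """
    if newvectors is None:
        newvectors = [None] * len(oldvectors)
    index_names = index if index is not None else [oldvectors[0]]
    columns = [list(data[name]) for name in index_names]
    if decimals is not None:
        # Round numeric index values only (like DECIMALS for index variates).
        columns = [[round(x, decimals) if isinstance(x, (int, float)) else x
                    for x in col] for col in columns]
    keys = list(zip(*columns))  # several index vectors compare lexicographically
    # sorted() is stable, also with reverse=True, so ties keep their order.
    order = sorted(range(len(keys)), key=lambda i: keys[i],
                   reverse=(direction == "descending"))
    # Permute every vector before writing, since an index vector
    # may itself be among the vectors being sorted.
    permuted = [[data[name][i] for i in order] for name in oldvectors]
    for old, new, values in zip(oldvectors, newvectors, permuted):
        data[new if new is not None else old] = values
    return data


people = {"Name": ["Cy", "Al", "Bo"], "Age": [30, 25, 40],
          "Income": [300, 100, 200], "Sex": ["M", "F", "M"]}
sort_vectors(people, ["Age", "Income", "Name", "Sex"],
             newvectors=["A", None, "N", "S"], index=["Name"])
```

As in the SORT example, Age and Name keep their original order, while Income is overwritten with the sorted values.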
SPREADSHEET directive
Allows interactive entry or editing of data (available in only some implementations).
No options
Parameters
STRUCTURE
= identifiers Structures into which to read the data; this can be left unset, and identifiers supplied interactively as required
FIELDWIDTH
= scalars Field width in which to display values of each structure
DECIMALS
= scalars Number of decimal places to display for numerical data
MINIMUM
= scalars Minimum value for numerical data
MAXIMUM
= scalars Maximum value for numerical data
FREPRESENTATION
= string How to display factor values (labels, levels, ordinals); default labe if set, otherwise leve
Description
The SPREADSHEET directive is available in some implementations of Genstat. It opens a specially designed spreadsheet that allows the vectors specified by the STRUCTURE parameter to be edited. The FIELDWIDTH and DECIMALS parameters can be used to control the formats used to represent the data within the spreadsheet. The MINIMUM and MAXIMUM parameters allow limits to be set on numerical values, and the FREPRESENTATION parameter controls the way in which factor values are displayed.
SSPM directive
Declares one or more SSPM data structures.
Options
TERMS
= formula Terms for which sums of squares and products are to be calculated; default *
FACTORIAL
= scalar Maximum number of vectors in a term; default 3
FULL
= string Full factor parameterization (yes, no); default no
GROUPS
= factor Groups for within-group SSPMs; default *
DF
= scalar Number of degrees of freedom for sums of squares; default *
Parameters
IDENTIFIER
= identifiers Identifiers of the SSPMs
SSP
= symmetric matrices Symmetric matrix to contain the sums of squares and products for each SSPM
MEANS
= variates Variate to contain the means for each SSPM
NUNITS
= scalars Number of units or sum of weights for each SSPM
WMEANS
= pointers Pointers to variates of group means for each SSPM
Description
The SSPM structure stores a matrix of corrected sums of squares and products, and associated information, as used for regression and some multivariate analyses. You can form values for SSPM structures by the FSSPM directive. However, most multivariate and regression analyses can be done without declaring and forming an SSPM explicitly.
An SSPM comprises three structures (identified by their suffixes).
[1] or ['SUMS'] is a symmetric matrix containing the sums of squares and products. The number of rows and columns of this matrix will equal the number of parameters defined by the expanded terms list: that is, the number of variates plus the number of dummy variates generated by the model formula. (See the TERMS directive.)
[2] or ['MEANS'] is a variate containing the mean for each variate or dummy variate.
[3] or ['NUNITS'] is a scalar holding the total number of units used in constructing the sums of squares and products matrix. If the SSPM is weighted, this scalar will hold the sum of the weights.
A within-group SSPM has one additional element:
[4] or ['WMEANS'] is a pointer to variates holding the within-group means. There is one variate for each row of the 'SUMS' matrix, plus one extra. They are all of the same length, namely the number of levels of the GROUPS factor. The extra variate holds the number of units in each group.
The TERMS option of the SSPM directive defines the model for whose components the sums of squares and products are to be calculated. In the simplest case the model is just a list of variates, but you can use more complex model formulae, involving variates and factors; this is done in conjunction with the FACTORIAL and FULL options.
You can form a within-group matrix of sums of squares and products by specifying the relevant factor with the GROUPS option.
Sometimes you may already have calculated values for the matrix of sums of squares and products. You can then assign them to the component structures of the SSPM, for example by READ. You would still, however, need to set the number of degrees of freedom associated with the matrix, and for that you use the DF option.
The parameter lists let you specify identifiers for the four components of an SSPM. You can have declared them previously (and you can have given them values), but if so they must be of the correct type.
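To show what the SUMS, MEANS and NUNITS components contain, here is a minimal Python computation for the simplest case, where TERMS is just a list of variates (not Genstat code). The function `form_sspm` is hypothetical. It handles no factors or dummy variates, no missing values and no groups, and its weighted form, with weighted means and weighted corrected cross-products, is the standard definition and is assumed rather than taken from this text.

```python
def form_sspm(columns, weights=None):
    """Corrected sums of squares and products for a list of variates.

    columns: equal-length lists of numbers, one per variate.
    Returns SUMS, MEANS and NUNITS (the sum of the weights if weighted).
    """
    n = len(columns[0])
    w = weights if weights is not None else [1.0] * n
    nunits = sum(w)
    means = [sum(wi * x for wi, x in zip(w, col)) / nunits for col in columns]
    p = len(columns)
    sums = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a, p):
            sp = sum(wi * (xa - means[a]) * (xb - means[b])
                     for wi, xa, xb in zip(w, columns[a], columns[b]))
            sums[a][b] = sums[b][a] = sp  # fill both halves: the matrix is symmetric
    return {"SUMS": sums, "MEANS": means, "NUNITS": nunits}


sspm = form_sspm([[1, 2, 3], [2, 4, 6]])
```

For these two variates the means are 2 and 4, and SUMS is the 2 by 2 matrix with sums of squares 2 and 8 and cross-product 4.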
STEP directive
Selects terms to include in or exclude from a linear, generalized linear, or generalized additive model according to the ratio of residual mean squares.
Options
FACTORIAL
= scalar Limit for expansion of model terms; default * i.e. that in previous TERMS statement
POOL
= string Whether to pool ss in accumulated summary between all terms fitted in a linear model (yes, no); default no
DENOMINATOR
= string Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (ss, ms); default ss
NOMESSAGE
= strings Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality, vertical, df, inflation); default *
FPROBABILITY
= string Printing of probabilities for variance and deviance ratios (yes, no); default no
TPROBABILITY
= string Printing of probabilities for t-statistics (yes, no); default no
SELECTION
= strings Statistics to be displayed in the summary of analysis produced by PRINT=summary; the first four are relevant only for a Normally distributed response, and the last only for a gamma-distributed response (%variance, %ss, adjustedr2, r2, seobservations, dispersion, %cv); default %var,seob if DIST=normal, %cv if DIST=gamma, and disp for other distributions
INRATIO
= scalar Criterion for inclusion of terms; default 1.0
OUTRATIO
= scalar Criterion for exclusion of terms; default 1.0
MAXCYCLE
= scalar Limit on number of times to repeat stepwise selection, unless no change is made; default 1
Parameter
formula List of explanatory variates and factors, or model formula
Description
STEP modifies the current regression model, which may be linear, generalized linear, or generalized additive, in order to achieve the biggest "improvement". Terms in the specified formula are dropped from the current model if they are already there, or are added to it if they are not. For each term, the residual sum of squares (or deviance) and the residual degrees of freedom are recorded; then Genstat reverts to the original model before trying the next term.
The current model is finally modified by the best term, according to a criterion based on the variance (or deviance) ratios. In a linear model, suppose that the residual sum of squares and residual degrees of freedom of the current model are $s_0$ and $d_0$, and of the model after making a one-term change are $s_1$ and $d_1$. If the variance ratio for any term that is dropped is less than the setting of the OUTRATIO option, then the term that most reduces the residual mean square is dropped. That is, a term will be dropped only if at least one term has
$$\frac{(s_1 - s_0)/(d_1 - d_0)}{s_0/d_0} < \mathrm{OUTRATIO}$$
If you have set OUTRATIO=*, no term is dropped. Note that, although the criteria are ratios of variances, you should not interpret them as F-statistics with the usual interpretation of significance. The probability levels would need to be adjusted to take account of correlations between the explanatory variables concerned, and of the number of changes being considered.
If no term satisfies the criterion for dropping, then the term that most reduces the residual mean square will be added to the model if its variance ratio is greater than the setting of the INRATIO option. That is, if
$$\frac{(s_0 - s_1)/(d_0 - d_1)}{s_1/d_1} > \mathrm{INRATIO}$$
Likewise, if you have set INRATIO=*, no term will be added.
If neither criterion is met, the current model is left unchanged.
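The decision rule for one change can be checked numerically with a small Python function (not Genstat code). The function `step_choice` and its `candidates` layout are hypothetical, and `None` plays the role of INRATIO=* or OUTRATIO=*. The rule is read literally from the text: if any dropped term has a ratio below OUTRATIO, the drop that gives the smallest residual mean square is made. Otherwise the addition with the smallest residual mean square is made, but only if its own ratio exceeds INRATIO.

```python
def step_choice(s0, d0, candidates, inratio=1.0, outratio=1.0):
    """Decide one STEP change from the variance-ratio criteria.

    candidates maps each term to (s1, d1, in_model): the residual SS and df
    after the one-term change, and whether the term is currently in the
    model (so the change drops it) or not (so the change adds it).
    Returns ("drop", term), ("add", term) or None.
    """
    drops, adds = [], []
    for term, (s1, d1, in_model) in candidates.items():
        if in_model:
            ratio = ((s1 - s0) / (d1 - d0)) / (s0 / d0)
            drops.append((s1 / d1, ratio, term))
        else:
            ratio = ((s0 - s1) / (d0 - d1)) / (s1 / d1)
            adds.append((s1 / d1, ratio, term))
    # Dropping is considered first.
    if outratio is not None and any(r < outratio for _, r, _ in drops):
        return ("drop", min(drops)[2])  # smallest residual mean square
    # Otherwise try the addition giving the smallest residual mean square.
    if inratio is not None and adds:
        rms, ratio, term = min(adds)
        if ratio > inratio:
            return ("add", term)
    return None
```

For example, with $s_0 = 100$ and $d_0 = 20$, dropping a term that raises the residual sum of squares to 102 on 21 df gives a ratio of $(2/1)/(100/20) = 0.4$, which is below the default OUTRATIO of 1.0, so that term would be dropped.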
Usually, the effect of the STEP directive is to make one change of a stepwise regression search. You can make STEP do forward selection by setting the MAXCYCLE option to define a maximum number of changes; STEP will stop at this limit, or earlier if no further changes can be made.
The changes setting of the PRINT option produces a list of terms with the corresponding residual mean squares and residual degrees of freedom, ordered according to the sizes of the residual mean squares; this list is not available for display later by the RDISPLAY directive. The INRATIO and OUTRATIO options are explained above. The rest of the options are as in the FIT directive, except that there is no CONSTANT option.
STOP directive
Ends a Genstat program.
No options or parameters
Description
The STOP directive indicates the end of a Genstat program, thus telling the computer that you have finished using Genstat. It also ends the existing job, so there is no need to give an ENDJOB statement beforehand. Any input that follows a STOP statement is ignored.
STORE directive
To store structures in a subfile of a backing-store file.
Options
CHANNEL
= scalar Channel number of the backing-store file where the subfile is to be stored; default 0, i.e. the workfile
SUBFILE
= identifier Identifier of the subfile; default SUBFILE
LIST
= string How to interpret the list of structures (inclusive, exclusive, all); default incl
METHOD
= string How to append the subfile to the file (add, overwrite, replace, update); default add, i.e. clashes in subfile identifiers cause a fault (note: replace overwrites the complete file)
PASSWORD
= text Password to be stored with the file; default *
PROCEDURE
= string Whether subfile contains procedures only (yes, no); default no
UNNAMED
= string Whether to list unnamed structures (yes, no); default no
MERGE
= string Whether or not to merge the structures with the existing contents of the subfile (yes, no); default no
Parameters
IDENTIFIER
= identifiers Identifiers of the structures to be stored
STOREDIDENTIFIER
= identifiers Identifier to be used for each structure when it is stored
Description
The STORE directive allows you to store data structures or procedures in a backing-store file. The file can be opened using the OPEN directive; there is also a backing-store workfile attached to channel 0, which is automatically deleted at the end of a Genstat run.
Each backing-store file contains a number of subfiles. Each subfile starts with a catalogue recording which structures it stores. Then come the attributes and the values of each data structure. A subfile name can be either an unsuffixed identifier or a suffixed identifier with a numerical suffix. The identifiers of subfiles are kept in a catalogue separate from the identifiers of data structures, so you do not need to worry about keeping the identifiers of data structures and subfiles distinct. However, if you use a suffixed identifier for a subfile, Sub[1] say, you cannot also use the identifier Sub. There are two types of subfile: ordinary subfiles can hold any type of structure except procedures, while procedure subfiles hold only procedures (and their dependent structures).
Whenever you store a structure in a subfile, Genstat automatically also stores all the associated structures to which it points. If these latter also point to further structures, then they are stored too, and so on. Some of the structures may be unnamed and some may be system structures. For example
TEXT [VALUES=A,B,C] T
FACTOR [LABELS=T; VALUES=1...3] F
STORE F
creates a subfile containing factor F. The complete definition of factor F depends on text T to supply level names, so T is stored too. The text T depends on a system structure (indicating the length of each line), which is therefore also stored. Hence to save factor F, Genstat has actually saved three structures. However, this is all automatic, so you do not need to worry about any of the details of the system structures.
When you store a structure with a suffixed identifier, Genstat may have to set up a series of pointer structures if they are not already present. An example is:
VARIATE [VALUES=1,2] V[1,2]
STORE [PRINT=catalogue] V[1]
The first line sets up a pointer structure V, pointing to V[1] and V[2]. To store variate V[1], a pointer structure V has to be set up in the subfile, pointing to V[1] only. Thus two structures are saved on backing store, namely V and V[1]. The original pointer V in the program is left unchanged. (If the example had stored the whole of V, no such complications would have arisen.)
The structures to be stored are specified by the IDENTIFIER parameter. The CHANNEL option indicates the backing-store file to use, and the SUBFILE option specifies the subfile that is created. Both these options can be omitted; by default the file will be the workfile, and the subfile will be called SUBFILE. The structures that are stored in the subfile are merely copies of the structures in the job, so the original structures remain available for further use within the job.
The STOREDIDENTIFIER parameter allows you to give a structure a different name within the subfile. For example,
VARIATE [VALUES=10.2,15.3,21.4,16.8,22.3] Weight
STORE Weight; STOREDIDENTIFIER=WtWeek2
stores a structure with identifier Weight within Genstat as a structure with identifier WtWeek2 in the backing-store file. If you want to rename only some of the structures, you can either respecify the existing identifier, or insert * at the appropriate point in the list. For example, you could store X and Y, renaming only Y as Yy, by
STORE X,Y; STOREDIDENTIFIER=X,Yy
or by
STORE X,Y; STOREDIDENTIFIER=*,Yy
You can give an unnamed structure in the list of either parameter. For example
STORE !(10.2,15.3,21.4,16.8,22.3); STOREDIDENTIFIER=WtWeek2
But of course you will not be able to retrieve any structure that has been stored as an unnamed structure (except perhaps as a dependent structure of another structure).
All the structures in a subfile must have distinct identifiers, and Genstat will report a fault if you try to give two the same name. You thus need to be careful if you are storing structures inside a procedure, as the same identifier can be used for one structure within the procedure, and for another one outside; you cannot store both in the same subfile.
Procedures that have been retrieved automatically from libraries cannot be stored by STORE.

You can set option PRINT=catalogue to obtain a catalogue of the subfiles in the backing-store file, and of the structures in the subfile just created. If you also set option UNNAMED=yes, Genstat will also list any unnamed structures, with details of how they depend on each other.

The LIST option controls how the IDENTIFIER list is interpreted. The default setting, inclusive, simply stores the structures that have been listed.

Alternatively, if you set LIST=all, Genstat will store all the structures in the current job that have identifiers and whose types have been defined. If the statement is inside a procedure, then only the structures defined within the procedure are stored. If you are storing procedures, then this setting will store all procedures that you have created explicitly in this job, by PROCEDURE or RETRIEVE statements.

Finally, you can set LIST=exclusive to store everything that you have not included in the IDENTIFIER parameter: that is, all the other named structures that are currently accessible, or all the other procedures that have been created in this job. Note, though, that some of the structures in the IDENTIFIER list may be stored if they are needed to complete the set of structures to be stored. If you use this setting, the STOREDIDENTIFIER parameter is ignored. For example
TEXT [VALUES=a,b] T
FACTOR [LABELS=T] F
TEXT [VALUES='variate text'] Vt
VARIATE V; EXTRA=Vt
creates four named structures, T, F, V and Vt. The statement
STORE [LIST=inclusive] T
stores the text T;
STORE [LIST=all]
stores all four structures that have identifiers;
STORE [LIST=exclusive] F,T
stores Vt and V; and
STORE [LIST=exclusive] Vt,T
results in all four structures being saved, because V points to Vt, and F points to T.

If a subfile of the specified name already exists on the backing-store file, the storing operation will usually fail. You can then set option METHOD=overwrite to overwrite the old subfile, that is, to replace the old subfile with a new subfile; alternatively, you can put METHOD=replace to form a new backing-store file containing only the new subfile. Setting METHOD=update adds new structures to an existing subfile. The MERGE option then controls what happens if a data structure that is being added to the file is already present; by default it overwrites the previous version but, if you put MERGE=yes, only new structures are added to the file.

To make your files secure, you can specify a password using the PASSWORD option. Once you have done this, you must include the same password in any future use of STORE or MERGE with this same userfile; spaces, case, and newlines are significant in the password. You cannot change the password in a userfile once you have set it, but you can use the MERGE directive to create a new userfile with no password or with a new password. If you set the password to be a text whose values have been restricted, the restriction is ignored.

The PROCEDURE option indicates whether the subfile is to store procedures (PROCEDURE=yes) or ordinary data structures.
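The rules for what STORE actually saves (every structure reached through pointers, combined with the three LIST settings) amount to a reachability calculation. The Python sketch below illustrates that behaviour on the T, F, V, Vt example above; it is not Genstat's implementation, the names `closure`, `structures_to_store` and `points_to` are hypothetical, and system structures (such as the line-length structure of a text) are deliberately left out.

```python
def closure(roots, points_to):
    """Return the roots plus every structure reachable through points_to."""
    stored, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name not in stored:
            stored.add(name)
            stack.extend(points_to.get(name, ()))  # dependent structures
    return stored


def structures_to_store(identifiers, named, points_to, list_setting="inclusive"):
    """Model the inclusive, all and exclusive settings of the LIST option."""
    if list_setting == "inclusive":
        roots = set(identifiers)
    elif list_setting == "all":
        roots = set(named)
    elif list_setting == "exclusive":
        roots = set(named) - set(identifiers)
    else:
        raise ValueError(f"unknown LIST setting: {list_setting}")
    # Excluded structures come back if something being stored points to them.
    return closure(roots, points_to)


# The example from the text: F points to T, and V points to Vt.
named = ["T", "F", "Vt", "V"]
points_to = {"F": ["T"], "V": ["Vt"]}

print(sorted(structures_to_store(["T"], named, points_to, "inclusive")))       # ['T']
print(sorted(structures_to_store([], named, points_to, "all")))                # ['F', 'T', 'V', 'Vt']
print(sorted(structures_to_store(["F", "T"], named, points_to, "exclusive")))  # ['V', 'Vt']
print(sorted(structures_to_store(["Vt", "T"], named, points_to, "exclusive"))) # ['F', 'T', 'V', 'Vt']
```

The last case shows why LIST=exclusive with Vt,T still saves all four structures: V and F are stored, and they drag Vt and T back in.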
STRUCTURE directive
Defines a compound data structure.
Options
NAME
= text Single-valued text defining a name for the type of structure, which must not clash with the name of any existing type of structure
STRUCTURELIST
= string Whether or not the structure consists of a list (of any length) of structures of the same type or types (yes, no); default no
Parameters
LABEL
= texts Single-valued texts defining the labels of the elements of the structure
SUFFIX
= scalars Suffix numbers for the elements; default assumes the numbers 1, 2, ...
TYPE
= texts Texts defining the allowed types for each element
COMPATIBLE
= texts Defines aspects to check for compatibility with the first element
Description
The STRUCTURE directive allows you to define customized compound data structures for use, for example, in procedures. The NAME option supplies a single-valued text to define the name to be used for the new "type" of data structure. This can then be used as a setting for the TYPE parameter in either the OPTION or PARAMETER directive within a procedure, to indicate that the option or parameter concerned must be supplied with this type of structure. The case of the letters in the name is not significant, so it can be specified in capitals, in lower case, or in any mixture.

The parameters of the directive define the contents of the structure. The LABEL parameter lists the labels to be used with each element of the structure, and the SUFFIX parameter lists the corresponding suffix numbers (by default the numbers 1, 2, etc.). The TYPE parameter allows you to define the types of structure that are allowed in each element (which may be any of the standard Genstat data structures, or other customized types), and the COMPATIBLE parameter allows you to define aspects that must be compatible with the first element of the structure, similarly to the COMPATIBLE parameter of the OPTION and PARAMETER directives. These are checked when the structure is declared, and when it is used as an option or parameter setting of a procedure that requests that type.

For example, we could define a complex matrix structure by
STRUCTURE [NAME='complex_matrix'] 'real','imaginary'; \
TYPE='matrix'; COMPATIBLE=!t(rows,columns)
A particular complex matrix, Cmat say, could then be declared using the DECLARE directive:
DECLARE [TYPE='complex_matrix'] Cmat
The elements of the compound structure can be referred to like those of an ordinary pointer declared using the POINTER directive with options CASE=ignored, ABBREVIATE=yes and FIXNVALUES=yes. So the labels can be given in upper or lower case or in any mixture, and each can be abbreviated to the minimum number of characters required to distinguish it from the previous labels. Thus the imaginary part of the complex matrix above could, for example, be referred to as Cmat['imaginary'] or Cmat['IMAGINARY'] or simply Cmat['i'].
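One way to read the abbreviation rule is that a key selects the first label, in declaration order, that it abbreviates when case is ignored; a label then needs only enough characters to avoid matching an earlier label. The Python sketch below encodes that reading. It is an assumption about the matching rule, not Genstat code, and `resolve_label` is a hypothetical name.

```python
def resolve_label(key, labels):
    """Return the first label that key abbreviates, ignoring case.

    Earlier labels take precedence, so a label only needs enough
    characters to distinguish it from the labels before it.
    """
    k = key.lower()
    for label in labels:
        if label.lower().startswith(k):
            return label
    raise KeyError(f"no element label matches {key!r}")


labels = ["real", "imaginary"]  # the complex_matrix example
for key in ["imaginary", "IMAGINARY", "i", "Re"]:
    print(key, "->", resolve_label(key, labels))
```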
SUSPEND directive
Suspends execution of Genstat to carry out commands in the operating system. This directive may not be available on some computers.
Options
SYSTEM
= text Commands for the operating system; default: prompt for commands (interactive mode only)
CONTINUE
= string Whether to continue execution of Genstat without waiting for commands to complete (yes, no); default no
No parameters
Description
The SUSPEND directive may not be implemented on all the types of computers for which Genstat is available. However, you can find out simply by typing the command
SUSPEND
This will produce a message saying either that Genstat has been suspended, or that SUSPEND is not implemented. In the latter case, it is not possible to communicate between Genstat and other programs.

If SUSPEND is implemented, after the message you will get a prompt 'SUSPEND>' for an operating-system command. You will also be told what command in the operating system will return you to Genstat. For example, with the VMS system on VAX computers you type
LOGOUT

The SYSTEM option of SUSPEND allows you to give operating-system commands without explicitly returning to the operating system and, if you do not want to wait for an operating-system command to be executed before returning to Genstat, you can set the CONTINUE option together with the SYSTEM option.
SVD directive
Calculates singular value decompositions of matrices, i.e. (LEFT *+ SINGULAR *+ TRANSPOSE(RIGHT)).
Option
Parameters
INMATRIX
= matrices Matrices to be decomposed
LEFT
= matrices Left-hand matrix of each decomposition
SINGULAR
= diagonal matrices Singular values (middle) matrix
RIGHT
= matrices Right-hand matrix of each decomposition
Description
Suppose that we have a rectangular matrix A with m rows and n columns, and that p is the minimum of m and n. The singular value decomposition can be defined as
$$ {}_{m}A_{n} = {}_{m}U_{p}\; {}_{p}S_{p}\; {}_{p}V'_{n} $$
where the subscripts give the numbers of rows and columns of each matrix. The diagonal matrix S contains the p singular values of A, ordered such that
$$ s_1 \ge s_2 \ge \cdots \ge s_p \ge 0 $$
The matrices U and V contain the left and right singular vectors of A, and are orthonormal:
$$ U'U = V'V = I_p $$
The smaller of U and V will be orthogonal. So, if A has more rows than columns, m > n, then p = n, V is square, and also $VV' = I_p$.

The least-squares approximation of rank r to A can be formed as
$$ A_r = U_r S_r V_r' $$
where $U_r$ and $V_r$ are the first r columns of U and V, and $S_r$ contains the first r singular values of A (Eckart and Young 1936).
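For r = 1 the approximation needs only the leading singular triple, which can be found by power iteration on A'A. The Python sketch below uses that swapped-in method purely for illustration (Genstat uses the Householder and QR approach described next); the function names and the all-ones start vector are assumptions, and the start vector must not be orthogonal to the first right singular vector.

```python
import math


def matvec(a, x):
    """Multiply matrix a (a list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rank1_approximation(a, iterations=500):
    """Leading singular value s1, vectors u1 and v1, and A1 = s1 u1 v1'.

    Power iteration on A'A converges to v1 (when s1 > s2); then A v1 = s1 u1.
    """
    at = transpose(a)
    v = [1.0] * len(a[0])  # assumed start vector
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))  # w = A'A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = matvec(a, v)
    s1 = math.sqrt(sum(x * x for x in av))
    u = [x / s1 for x in av]
    a1 = [[s1 * ui * vj for vj in v] for ui in u]
    return s1, u, v, a1


a = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # m = 3, n = 2, singular values 3 and 1
s1, u1, v1, a1 = rank1_approximation(a)
residual = sum((x - y) ** 2 for ra, r1 in zip(a, a1) for x, y in zip(ra, r1))
print(round(s1, 6), round(residual, 6))  # 3.0 1.0
```

The squared residual equals the square of the discarded singular value (here 1), which is the Eckart and Young result for r = 1.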
The INMATRIX parameter specifies the matrices to be decomposed. The algorithm uses Householder transformations to reduce A to bi-diagonal form, followed by a QR algorithm to find the singular values of the bi-diagonal matrix (Golub and Reinsch 1971). The other parameters allow you to save the component parts of the decomposition: LEFT, SINGULAR, and RIGHT for U, S, and V respectively.

The PRINT option allows you to print any of the components of the decomposition; by default, nothing is printed. If any of the matrices is to be printed, all p columns are shown, even if you are storing only the first r columns.

Genstat will decide how many columns and singular values, r, to store, and will store that number for any of the components that you specify. If none of the matrices in the LEFT, SINGULAR, and RIGHT lists has been declared in advance, the full number of singular values (r = p) is stored; otherwise Genstat sets r to the maximum number of columns contained in any of the matrices. If r < p, the first r singular values will be saved, along with the corresponding columns of singular vectors.

One practical application of the singular value decomposition is to form generalized inverses of matrices. If you use the singular value decomposition you obtain the Moore-Penrose generalized inverse, sometimes called the pseudo-inverse, and this is the method used by the GINVERSE procedure.
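For reference, the standard construction of the Moore-Penrose inverse from the components saved by LEFT, SINGULAR and RIGHT is shown below. It is the textbook formula, not a description of the internals of GINVERSE; in floating point the test $s_i > 0$ is normally replaced by comparison with a small tolerance, whose choice in GINVERSE is not covered here.

```latex
{}_{n}A^{+}_{m} = {}_{n}V_{p}\; {}_{p}S^{+}_{p}\; {}_{p}U'_{m},
\qquad
S^{+} = \operatorname{diag}\bigl(s_1^{+}, \dots, s_p^{+}\bigr),
\qquad
s_i^{+} =
\begin{cases}
  1/s_i, & s_i > 0,\\
  0,     & s_i = 0.
\end{cases}
```

This $A^{+}$ satisfies $AA^{+}A = A$ and $A^{+}AA^{+} = A^{+}$, with $AA^{+}$ and $A^{+}A$ symmetric. When $m \ge n$ and all p singular values are positive, it reduces to $(A'A)^{-1}A'$.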
References
Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika 1, 211-218.
Golub, G.H. and Reinsch, C. (1971). Singular value decomposition and least squares solutions. Numerische Mathematik 14, 403-420.
SWITCH directive
Adds terms to, or drops them from, a linear, generalized linear, generalized additive, or nonlinear model.
Options
NONLINEAR
= string How to treat nonlinear parameters between groups (common, separate, unchanged); default unch
CONSTANT
= string How to treat the constant (estimate, omit, unchanged); default unch
FACTORIAL
= scalar Limit for expansion of model terms; default * i.e. that in previous TERMS statement
POOL
= string Whether to pool ss in accumulated summary between all terms fitted in a linear model (yes, no); default no
DENOMINATOR
= string Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (ss, ms); default ss
NOMESSAGE
= strings Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality, vertical, df, inflation); default *
FPROBABILITY
= string Printing of probabilities for variance and deviance ratios (yes, no); default no
TPROBABILITY
= string Printing of probabilities for t-statistics (yes, no); default no
SELECTION
= strings Statistics to be displayed in the summary of analysis produced by PRINT=summary, the first four are relevant only for a Normally distributed response, and the last only for a gamma-distributed response (%variance, %ss, adjustedr2, r2, seobservations, dispersion, %cv); default %var,seob if DIST=normal, %cv if DIST=gamma, and disp for other distributions
Parameter
formula List of explanatory variates and factors, or model formula
Description
SWITCH modifies the current regression model, which may be linear, generalized linear, generalized additive, standard curve, or nonlinear. Terms in the specified formula are dropped from the current model if they are already there, or are added to it if they are not. It is best to give a TERMS statement before investigating sequences of models using SWITCH, in order to define a common set of units for the models to be explored. If no model has been fitted after the TERMS statement, the current model is taken to be the null model.

The model fitted by SWITCH will include a constant term if the previous model included one, and will not include one if the previous model did not. You can, however, change this using the CONSTANT option.

The options of SWITCH are the same as those of the FIT directive, but with the extra NONLINEAR option, which controls whether separate nonlinear parameters are fitted to different groups when fitting curves, as in FITCURVE.
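The toggling rule of SWITCH (drop a term if it is present, add it if it is not) is a symmetric difference on the set of model terms. The Python sketch below shows just that bookkeeping; `switch` is a hypothetical helper, terms are plain strings that are assumed to be already expanded, and neither formula expansion (FACTORIAL) nor the constant term is modelled.

```python
def switch(current_terms, formula_terms):
    """Drop formula terms already in the model; add those that are not."""
    new_model = [t for t in current_terms if t not in formula_terms]
    new_model += [t for t in formula_terms if t not in current_terms]
    return new_model


model = ["A", "B"]
model = switch(model, ["B", "C"])  # B is dropped, C is added
print(model)                       # ['A', 'C']
model = switch(model, ["C"])       # switching C again removes it
print(model)                       # ['A']
```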
SYMMETRICMATRIX directive
Declares one or more symmetric matrix data structures.
Options
ROWS
= scalar, vector, or pointer Number of rows, or labels for rows (and columns); default *
VALUES
= numbers Values for all the symmetric matrices; default *
MODIFY
= string Whether to modify (instead of redefining) existing structures (yes, no); default no
Parameters
IDENTIFIER
= identifiers Identifiers of the symmetric matrices
VALUES
= identifiers Values for each symmetric matrix
DECIMALS
= scalars Number of decimal places for printing
EXTRA
= texts Extra text associated with each identifier
MINIMUM
= scalars Minimum value for the contents of each structure
MAXIMUM
= scalars Maximum value for the contents of each structure
Description
A symmetric square matrix is symmetric about its leading diagonal: that is, the value in column i of row j is the same as that in column j of row i. For example:
1 2 3
2 1 4
3 4 1
Symmetric matrices often occur in statistics. Suppose, for example, that we have n random variables X1 ... Xn. Then the covariance of Xi with Xj is the same as the covariance of Xj with Xi. The covariance matrix of the random variables is therefore symmetric: the off-diagonal elements of the matrix are the covariances (and the diagonal elements are the variances).
Because of this symmetry, Genstat stores only the diagonal elements and those below the diagonal; this is called the lower triangle. You therefore specify only these values, whether in the declaration by SYMMETRICMATRIX or in a READ statement. (As always, you give them in row order: so if there are n rows, then for the first you supply one value, for the second two, and so on, giving n(n+1)/2 values in all.) Likewise, Genstat prints only the lower triangle in output, for example with PRINT.

The ROWS option defines both the number of rows and the number of columns. The simplest way of doing this is to use a scalar to define the number of rows and columns explicitly. Alternatively, you can set ROWS to a variate, text, or pointer, whose length then defines the number of rows and whose values will then be used as labels, for example when the symmetric matrix is printed. Finally, if you specify a factor, the number of levels defines the number of rows, and the labels if available, or otherwise the levels, are used for labelling.

Values can be assigned to the symmetric matrices by either the VALUES option or the VALUES parameter. The option defines a common value for all the matrices in the declaration, while the parameter allows them each to be given a different value. If both the option and the parameter are specified, the parameter takes precedence.

If the MODIFY option is set to yes, any existing attributes and values of the symmetric matrices are retained (if still appropriate); otherwise these are lost.

The DECIMALS parameter allows you to define a number of decimal places to be used by default when each symmetric matrix is printed. You can associate a text of extra annotation with each symmetric matrix using the EXTRA parameter. The MINIMUM and MAXIMUM parameters allow you to define lower and upper limits on the values in each symmetric matrix. Genstat then prints warnings if any values outside that range are allocated to the matrix.
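The row-ordered lower-triangle convention for supplying values can be illustrated with a short Python sketch that packs the 3 × 3 example matrix into the six values you would supply, and unpacks them again. The helper names `pack_lower` and `unpack_lower` are hypothetical; this only models the ordering, not how Genstat stores the matrix internally.

```python
def pack_lower(full):
    """Lower triangle of a symmetric matrix in row order: 1, 2, ..., n values per row."""
    n = len(full)
    for i in range(n):
        for j in range(i):
            if full[i][j] != full[j][i]:
                raise ValueError("matrix is not symmetric")
    return [full[i][j] for i in range(n) for j in range(i + 1)]


def unpack_lower(values, n):
    """Rebuild the full n x n symmetric matrix from row-ordered lower-triangle values."""
    if len(values) != n * (n + 1) // 2:
        raise ValueError("need n(n+1)/2 values")
    full = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            full[i][j] = full[j][i] = values[k]  # mirror into the upper triangle
            k += 1
    return full


m = [[1, 2, 3],
     [2, 1, 4],
     [3, 4, 1]]
packed = pack_lower(m)
print(packed)  # [1, 2, 1, 3, 4, 1]
```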
TABLE directive
Declares one or more table data structures.
Options
CLASSIFICATION
= factors Factors classifying the tables; default *
MARGINS
= string Whether to add margins (yes, no); default no
VALUES
= numbers Values for all the tables; default *
MODIFY
= string Whether to modify (instead of redefining) existing structures (yes, no); default no
Parameters
IDENTIFIER
= identifiers Identifiers of the tables
VALUES
= identifiers Values for each table
DECIMALS
= scalars Number of decimal places for printing
EXTRA
= texts Extra text associated with each identifier
UNKNOWN
= identifiers Identifier for scalar to hold summary of unclassified data associated with each table
MINIMUM
= scalars Minimum value for the contents of each structure
MAXIMUM
= scalars Maximum value for the contents of each structure
Description
Tables are used to store numerical summaries of data that are classified into groups. With Genstat, the classification into groups is specified by a set of factors. The table contains an element, called a cell, for each combination of the levels of the factors that classify it.
Tables are declared using the TABLE directive. The CLASSIFICATION option specifies the factors classifying the table.

Values can be assigned to the tables by either the VALUES option or the VALUES parameter. The option defines a common value for all the tables in the declaration, while the parameter allows them each to be given a different value. If both the option and the parameter are specified, the parameter takes precedence.

If the MODIFY option is set to yes, any existing attributes and values of the tables are retained (if still appropriate); otherwise these are lost.

The DECIMALS parameter allows you to define a number of decimal places to be used by default when each table is printed. You can associate a text of extra annotation with each table using the EXTRA parameter. The MINIMUM and MAXIMUM parameters allow you to define lower and upper limits on the values in each table. Genstat then prints warnings if any values outside that range are allocated to the table.

A table can also have margins. There is then a margin for each classifying factor; this contains some sort of summary over the levels of that factor. For example, if you have a table in which the cells contain totals of the observations, you would want the marginal cells to contain totals across the levels of the factor. You can define a table to have margins when you declare it, by setting the MARGINS option of the TABLE directive to yes. Alternatively, you can add margins later by the MARGIN directive.

Tables also have an associated scalar which collects a summary of all the observations for which any of the classifying factors has a missing value; these observations cannot be assigned to any cell of the table itself. This scalar can be given an identifier, so that you can refer to it, using the UNKNOWN parameter of the TABLE directive.
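For a table of totals, the cells, the margins and the unknown scalar fit together as in the Python sketch below: one cell per combination of factor levels, margins that total over the other factor, and a separate total for units with a missing factor value. It illustrates the concepts only; `table_of_totals` is a hypothetical helper, None stands for a missing factor value, and the data values themselves are assumed to be non-missing.

```python
def table_of_totals(levels_a, levels_b, factor_a, factor_b, data):
    """Two-way table of totals, its margins, and the 'unknown' total.

    A unit whose value of either factor is missing (None) cannot be
    assigned to any cell, so its data value goes to the unknown total.
    """
    cells = {(a, b): 0.0 for a in levels_a for b in levels_b}
    unknown = 0.0
    for a, b, x in zip(factor_a, factor_b, data):
        if a is None or b is None:
            unknown += x
        else:
            cells[(a, b)] += x
    # For a table of totals, each margin holds totals over the other factor.
    margin_a = {a: sum(cells[(a, b)] for b in levels_b) for a in levels_a}
    margin_b = {b: sum(cells[(a, b)] for a in levels_a) for b in levels_b}
    grand_total = sum(cells.values())
    return cells, margin_a, margin_b, grand_total, unknown


cells, by_a, by_b, grand, unknown = table_of_totals(
    [1, 2], [1, 2, 3],
    factor_a=[1, 1, 2, 2, None],
    factor_b=[1, 3, 2, 3, 1],
    data=[4.0, 5.0, 6.0, 7.0, 10.0],
)
print(by_a, by_b, grand, unknown)  # {1: 9.0, 2: 13.0} {1: 4.0, 2: 6.0, 3: 12.0} 22.0 10.0
```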
TABULATE directive
Forms summary tables of variate values.
Options
CLASSIFICATION
= factors Factors classifying the tables; default * i.e. these are taken from the tables in the parameter lists
COUNTS
= tables Saves a table counting the number of units with each factor combination; default *
SEQUENTIAL
= scalar Used for sequential formation of tables; a positive value indicates that formation is not yet complete (see READ); default *
MARGINS
= string Whether the tables should be given margins if not already declared (yes, no); default no
IPRINT
= string Whether to print the identifier of the table or the identifier of the (associated) variate that was used to form it (identifier, extra, associatedidentifier); default iden
WEIGHTS
= variate Weights to be used in the tabulations; default * indicates that all units have weight 1
PERCENTQUANTILES
= scalar or variate Percentage points for which quantiles are required; default 50 (i.e. median)
OWN
= scalar or variate Specifies option settings for the OWNTAB subroutine and indicates that this is to supply the data values instead of the variates in the DATA list; default *
OWNFACTORS
= factors Factors whose values are to be read by OWNTAB (must include the factors of the classification set); default *
OWNVARIATES
= variates Variates whose values are to be read by OWNTAB (must include the DATA variates); default *
INCHANNEL
= scalar Channel number of the file from which the OWNTAB subroutine is to read the data (previously opened by an OPEN statement)
INFILETYPE
= string Type of the OWN data file (input, unformatted); default inpu
Parameters
DATA
= variates Data values to be tabulated
TOTALS
= tables Tables to contain totals
NOBSERVATIONS
= tables Tables containing the numbers of non-missing values in each cell
MEANS
= tables Tables of means
MINIMA
= tables Tables of minimum values in each cell
MAXIMA
= tables Tables of maximum values in each cell
VARIANCES
= tables Tables of cell variances
QUANTILES
= tables or pointers Table to contain quantiles at a single PERCENTQUANTILES setting, or pointer of tables for several PERCENTQUANTILES (not available for sequential or OWN tabulation)
Description
TABULATE allows you to produce the various types of tabular summary listed in the settings of its PRINT option. The variates whose values are to be summarized are listed with the DATA parameter. If you want to save the summaries in tables, for manipulating or for printing later on, you should list identifiers of the tables in the appropriate parameter list: for example, you would save the totals in a table T by including T in the list for the TOTALS parameter. The other parameters similarly give the other kinds of summary: numbers of non-missing values, means, minima, maxima, variances, and quantiles.

The simplest quantile, and the one produced by default, is the median (50% quantile), but the PERCENTQUANTILES option allows you to request any percentage point (between 0 and 100, of course). Moreover, by specifying a variate as the setting for PERCENTQUANTILES, you can obtain several quantiles at the same time. However, if you then want to save the results, the setting of the QUANTILES parameter must be a pointer with length equal to the required number of quantiles, instead of a single table.

If you merely want to print the summaries, you do not usually need to list any tables; you need only specify the PRINT option. The only exception to this is with sequential tabulation, described below.

Any table that you have not declared in advance will be declared implicitly. If you have not declared any of the tables, the classifying factors are taken from the CLASSIFICATION option, which in that case you must have set; likewise, the MARGINS option determines whether or not the tables will have margins. Otherwise these two options are ignored, and the undeclared tables are defined to have the same classifying factors and status for margins as the tables that you have declared previously; all these previously declared tables must have the same set of classifying factors, and must be all with margins or all without margins.

In the tables that correspond to the parameters of TABULATE, missing values of the data variates are ignored. So the NOBSERVATIONS parameter and the nobservations setting of the PRINT option provide the numbers of non-missing units of the data variates for each factor combination. You can, however, obtain a count of the numbers of units that would have contributed to each group if no values had been missing: you use the COUNTS option if you want to save the table, or put PRINT=counts if you want to print it. If any of the factor values are missing, Genstat ascribes the corresponding units to the unknown cell associated with the table (see the TABLE directive).

If there are no observations in one of the groups, the corresponding cell will be zero in a table of numbers of observations or counts; in a table of totals, means, minima, maxima, or variances, the cell will contain a missing value.
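The difference between counts and numbers of observations, and the treatment of empty cells, can be seen in a one-way Python sketch. It is an illustration of the rules just described, not Genstat's code: `tabulate` is a hypothetical helper, None marks a missing value, and units with a missing factor value are only counted towards the unknown cell here.

```python
def tabulate(levels, factor, data):
    """One-way COUNTS, NOBSERVATIONS, TOTALS and MEANS; None marks a missing value."""
    counts = {lev: 0 for lev in levels}
    nobs = {lev: 0 for lev in levels}
    sums = {lev: 0.0 for lev in levels}
    unknown_count = 0
    for f, x in zip(factor, data):
        if f is None:
            unknown_count += 1  # factor missing: unit goes to the unknown cell
            continue
        counts[f] += 1          # counted whether or not x is missing
        if x is not None:
            nobs[f] += 1
            sums[f] += x
    # Cells with no observations get a missing value (None) for totals and means.
    totals = {lev: (sums[lev] if nobs[lev] else None) for lev in levels}
    means = {lev: (sums[lev] / nobs[lev] if nobs[lev] else None) for lev in levels}
    return counts, nobs, totals, means, unknown_count


counts, nobs, totals, means, unknown = tabulate(
    [1, 2, 3], factor=[1, 1, 2, None, 2], data=[2.0, None, 5.0, 9.0, 7.0])
print(counts)  # {1: 2, 2: 2, 3: 0}
print(nobs)    # {1: 1, 2: 2, 3: 0}
print(means)   # {1: 2.0, 2: 6.0, 3: None}
```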
Weighted tables can be obtained by setting the WEIGHTS option to a variate of weights. You can, in general, think of weights as a set of multipliers which are applied to the data before any operations are performed. Thus, for most aspects of weighted tabulation you can replace x by wx and 1 by w (that is, n by Σw) in the standard formulae, as summarized below. This is not quite what happens in the case of variances, but it is certainly true for all other functions (including counts).

Statistic       Unweighted               Weighted
Count           n                        Σw
Total           Σx                       Σwx
Nobservations   n                        Σw  (x not missing)
Mean            Σx/n                     Σwx/Σw
Minimum         Min(x)                   Min(wx)
Maximum         Max(x)                   Max(wx)
Variance        Σ(x - Σx/n)²/(n - 1)     Σw(x - Σwx/Σw)²/(Σw - 1)

A quick look at the formula used for weighted variance will show that it breaks down for Σw < 1 and, in fact, is valid only for integer values of w. If an invalid weight is found during the calculation of a variance it will be reported and the tabulation halted. Temporary tables will be deleted, but named tables may contain partial results. Non-integer weights are allowed in contexts other than variances.
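Why the weighted variance needs integer weights becomes clear if w is read as a frequency: the formula then gives exactly the ordinary variance of the data with each x repeated w times. The Python sketch below checks this on a small example; the helper names are hypothetical, it assumes the weighted mean Σwx/Σw inside the variance, and the integer check mirrors the restriction described in the text rather than Genstat's actual error handling.

```python
def present(data, weights):
    """Pair each non-missing value (None marks missing) with its weight."""
    return [(x, w) for x, w in zip(data, weights) if x is not None]


def weighted_mean(data, weights):
    pairs = present(data, weights)
    return sum(w * x for x, w in pairs) / sum(w for _, w in pairs)


def weighted_variance(data, weights):
    """sum(w (x - weighted mean)^2) / (sum(w) - 1), treating w as a frequency."""
    pairs = present(data, weights)
    sum_w = sum(w for _, w in pairs)
    if any(w != int(w) for _, w in pairs) or sum_w <= 1:
        raise ValueError("invalid weights for a weighted variance")
    mean = weighted_mean(data, weights)
    return sum(w * (x - mean) ** 2 for x, w in pairs) / (sum_w - 1)


data, weights = [2.0, 4.0, 7.0], [1, 2, 1]
replicated = [2.0, 4.0, 4.0, 7.0]  # each x repeated w times
print(weighted_mean(data, weights), weighted_variance(data, weights))  # 4.25 4.25
```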
If you have many observations to summarize, there may be insufficient space within Genstat for you to read them all and then form the tables. To cater for such situations, Genstat allows you to process the data in sections, using the SEQUENTIAL option of TABULATE in conjunction with the SEQUENTIAL option of READ. After each READ, the absolute value of the scalar specified by the SEQUENTIAL option indicates the number of units that have been read in this particular section; the value is positive during interim sections, and negative or zero once the terminator at the end of the data is reached. TABULATE will not print any tables until the final section has been processed. If you want to see the intermediate tables, you can include a PRINT statement after the TABULATE statement. To allow Genstat to keep contact with the working tables in which the results are accumulating, you must save at least one of the various types of table for every DATA variate. Genstat can then link the working tables to this named table during the course of the sequential tabulation, so that the information is not lost between the successive uses of TABULATE.
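The idea behind sequential tabulation is that only running summaries per cell are kept between sections, never the raw data. The Python sketch below shows this with Welford's online update for means and variances, a standard technique chosen here for illustration; it is not claimed to be how Genstat accumulates its working tables, and `accumulate` and `finish` are hypothetical names.

```python
def accumulate(state, factor, data):
    """Fold one section of data into running per-level summaries.

    state maps each factor level to (n, mean, m2), updated with Welford's
    method; None marks a missing factor or data value, which is skipped here.
    """
    for f, x in zip(factor, data):
        if f is None or x is None:
            continue
        n, mean, m2 = state.get(f, (0, 0.0, 0.0))
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        state[f] = (n, mean, m2)
    return state


def finish(state):
    """Nobservations, mean and variance per level after the final section."""
    return {f: (n, mean, m2 / (n - 1) if n > 1 else None)
            for f, (n, mean, m2) in state.items()}


sections = [
    (["a", "a", "b"], [1.0, 3.0, 10.0]),  # first section of data
    (["a", "b"], [5.0, 14.0]),            # final section
]
state = {}
for factor, data in sections:
    state = accumulate(state, factor, data)  # only the running summaries are kept
print(finish(state))  # {'a': (3, 3.0, 4.0), 'b': (2, 12.0, 8.0)}
```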
The final five options of TABULATE (OWN, OWNFACTORS, OWNVARIATES, INCHANNEL, and INFILETYPE) allow you to link your own Fortran subroutine, G5XZIT, to Genstat to allow you to handle complicated arrangements of data, as can occur for example in hierarchical surveys. To implement this, you must get access to some of the Genstat source code. The relevant section of the code is named Module X, and is distributed with Genstat to all sites, probably in a file called X.FOR. The documentation of G5XZIT is included with the Fortran and so is not repeated here. G5XZIT is thus a Fortran subprogram, to be modified by you, which is called from within TABULATE for each unit to be tabulated. It contains switches to tell TABULATE when a data error occurs or when all the data have been read. To use it you have to link your own version of Genstat, as when using the OWN directive. Then your version of G5XZIT will be used instead of the standard version supplied as part of Genstat.
The subprogram can be as simple or as complicated as you like (or need), provided it obeys a few simple rules. A very simple version, reading two variates and two factors, is supplied with Genstat. This should provide sufficient information for you to write your own version, and link it into your own private version of Genstat.
The OWN option should be set to a variate allowing you to communicate between your Genstat code and your G5XZIT subprogram. The OWNFACTORS option provides the list of factors to be read by G5XZIT. It must include the classifying factors needed in the current TABULATE instruction, but it may contain others as well. The OWNVARIATES option should provide a similar list of variates. The INCHANNEL option should be set to the Genstat channel number of the data file, as specified in a previous OPEN statement or in the Genstat command line. The INFILETYPE option specifies whether the data file is character (input) or binary (unformatted).
TABULATE allows only one classification set to be used at a time. If the data set is complicated enough to require G5XZIT, then several tabulations with different classifying sets are likely to be needed. Rather than have a separate branch in G5XZIT for each tabulation, you can put all the factors and all the variates that you will need into the settings of the OWNFACTORS and OWNVARIATES options, and leave TABULATE to extract the ones it needs each time. If you have several TABULATE statements as suggested, you will have to close the data file and re-open it between them.
TDISPLAY directive
Displays further output after an analysis by ESTIMATE.
Options
CHANNEL
= scalar Channel number for output; default * i.e. current output channel
SAVE
= identifier Save structure to supply fitted model; default * i.e. that from the last model fitted
No parameters
Description
You can use TDISPLAY to print further output from an ESTIMATE statement. The PRINT option has the same interpretation as in ESTIMATE, except that information is not available to monitor convergence. Also, if the ESTIMATE statement used the setting METHOD=initialize, you will not be able to print the standard errors or correlations between the parameter estimates.

The CHANNEL option allows you to send the output to another output channel.

You can use the SAVE option to specify the time-series save structure (from ESTIMATE) from which the output is to be taken. By default TDISPLAY uses the structure from the most recent ESTIMATE statement.
TERMS directive
Specifies a maximal model, containing all terms to be used in subsequent linear, generalized linear, generalized additive, and nonlinear models.
Options
FACTORIAL
= scalar Limit for expansion of model terms; default 3
FULL
= string Whether to assign all possible parameters to factors and interactions (yes, no); default no
SSPM
= SSPM or DSSP Gives sums of squares and products on which to base calculations; default *
TOLERANCE
= scalar Criterion for testing for linear dependence; default is $10^{7}\varepsilon$, where $\varepsilon$ is the smallest real value such that $1+\varepsilon$ is greater than 1 on the computer
DESIGNMATRIX
= matrix Saves the design matrix for the maximal model
Parameter
formula List of explanatory variates and factors, or model formula
Description
You can use the
TERMS directive before starting to explore different subsets of explanatory variables, to allow Genstat to define a common set of units for the regression and to carry out some initial calculations. TERMS thus initializes Genstat ready for an exploration using the directives ADD, DROP, SWITCH, TRY, or STEP. It overrules any model that has already been fitted with FIT, FITCURVE, or FITNONLINEAR and resets the current model to be the null model containing only the constant term.

TERMS need not be specified before exploring a linear, generalized linear or generalized additive model, that is one that is fitted initially using FIT with its CALCULATION option unset. However, it is essential before exploring a nonlinear model, that is one fitted initially by FIT with CALCULATION set, or by FITCURVE or FITNONLINEAR. Furthermore, if some of the explanatory variables to be used in a linear, generalized linear or generalized additive model contain missing values or have restrictions, the use of TERMS ensures that the sequence of models is fitted using a common set of units. Otherwise, if a variate or factor which is introduced into the model has a missing value where previous explanatory variates or factors did not, or is restricted whereas previous ones were not, the set of units has to be changed. The previous model is automatically refitted with the new set of units before the new model is fitted, but the accumulated summary will then show only these two fits.

The formula specified by the parameter of TERMS should contain all the explanatory variables and model terms that you may wish to use in the subsets. The model containing all the terms specified in the formula, excluding the response variates, is called the maximal model.

The PRINT option allows you to display the sums of squares and products between the variates in the model (including the response variates and dummy variates set up to represent any factors and their interactions), the means of the variates, and the degrees of freedom. It can also display the corresponding matrix of correlations between variables, and group means if the regression is within groups. The calculations are weighted if you have specified weights in the MODEL statement, and they are made within groups if you have specified a grouping factor. All units of the variates are used unless there are restrictions or missing values. You are not allowed to have different restrictions on the different vectors. Thus you can define the set of units that Genstat uses for the regressions by putting a restriction on any one of: a response variate, an explanatory variate, the weight variate, the offset variate, or the grouping factor. A missing value in any of these structures except a response variate will also exclude the corresponding unit. You should not alter the restriction applied to the vectors between the TERMS statement and subsequent fitting statements.

The FACTORIAL option controls the inclusion of interaction terms in the model. All terms involving more than the specified number of factors and variates are omitted. By default FACTORIAL is set to three. The FULL option can be set to yes to ensure that Genstat allocates a parameter to each level of any factor in a linear, generalized linear or generalized additive model; otherwise it will include parameters only for levels 2 upwards (and their estimates will represent the differences between the estimated parameters for their levels and the estimate for level 1).

The SSPM option lets you use values that you have already calculated for an SSPM or DSSP structure. This is feasible only with ordinary linear regression, but it can be useful when you are analysing very large sets of data: you can accumulate a DSSP sequentially with the FSSPM directive to avoid storing all the data at one time. Later regression calculations will be based on the supplied values of the DSSP, though no fitted values, residuals, or leverages will be available. Note, however, that the values of a supplied SSPM or DSSP are accepted without checking by the TERMS directive: Genstat simply assumes you are giving it something sensible.

The TOLERANCE option controls the detection of aliasing in subsequent model fitting. By default, a parameter in a linear or generalized linear model will be deemed to be aliased if the ratio between the original diagonal value of the SSPM corresponding to this parameter and the current diagonal value of the partially inverted SSPM is less than $10^7 e$. The quantity $e$ depends on the computer and is defined to be the smallest number such that the computer recognizes $1.0 + e$ as greater than 1.0 in double precision. Any positive value can be supplied by the TOLERANCE option to replace this default criterion in subsequent linear regression and generalized linear regression.

The
DESIGNMATRIX option can be set to a matrix to save the design matrix corresponding to the maximal model.
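The default TOLERANCE criterion can be made concrete with a small Python sketch; it is an illustration, not Genstat's own code. It computes the conventional machine epsilon (the smallest power of two $e$ with $1.0 + e > 1.0$ in double precision), sweeps an uncorrected SSPM column by column, and flags a column as aliased when its current diagonal element divided by its original diagonal element falls below $10^7 e$. Reading the ratio as current over original is our interpretation, being the reading under which a small value signals near-linear dependence; the names `machine_epsilon`, `sspm` and `aliased_terms` are hypothetical.

```python
def machine_epsilon() -> float:
    """Smallest power of two e for which 1.0 + e > 1.0 in double precision."""
    e = 1.0
    while 1.0 + e / 2.0 > 1.0:
        e /= 2.0
    return e


def sspm(columns):
    """Uncorrected sums of squares and products between the given columns."""
    return [[sum(u_i * v_i for u_i, v_i in zip(u, v)) for v in columns] for u in columns]


def aliased_terms(columns, tolerance=None):
    """Indices of columns judged aliased with the columns before them.

    Columns are swept in turn (Gauss-Jordan elimination on the SSPM). Before
    column k is swept, its current diagonal element is the residual sum of
    squares of column k after regression on the earlier, non-aliased columns;
    column k is flagged when that value divided by its original diagonal
    element is below the tolerance (our reading of the criterion).
    """
    if tolerance is None:
        tolerance = 1e7 * machine_epsilon()  # default criterion: 10^7 e
    a = sspm(columns)
    n = len(a)
    original = [a[i][i] for i in range(n)]
    aliased = []
    for k in range(n):
        d = a[k][k]
        if original[k] == 0.0 or d / original[k] < tolerance:
            aliased.append(k)  # nothing new in this column: do not pivot on it
            continue
        pivot_row = [v / d for v in a[k]]
        for i in range(n):
            if i != k:
                f = a[i][k]
                a[i] = [x - f * p for x, p in zip(a[i], pivot_row)]
        a[k] = pivot_row
    return aliased


x = [1.0, 2.0, 3.0, 4.0, 5.0]
ones = [1.0] * 5
print(aliased_terms([ones, x, [2 * v for v in x]]))  # [2]
```

With columns for the constant, $x$ and $2x$, the third column is flagged because, once the first two have been swept, its diagonal element is exactly zero.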
TEXT directive
Declares one or more text data structures.
Options
NVALUES
= scalar or vector Number of strings, or vector of labels; default * takes the setting from the preceding UNITS statement, if any
VALUES
= strings Values for all the texts; default *
MODIFY
= string Whether to modify (instead of redefining) existing structures (yes, no); default no
Parameters
IDENTIFIER
= identifiers Identifiers of the texts
VALUES
= texts Values for each text
CHARACTERS
= scalars Numbers of characters of the lines of each text to be printed by default
EXTRA
= texts Extra text associated with each identifier
Description
Each unit of a Genstat text structure is a
string which you can regard as a line of textual description. Texts can be used to label vectors and pointers, for captions or pieces of explanation within output, to store Genstat statements, and to store output.

The IDENTIFIER parameter lists the texts that are to be declared. Values can be assigned to the texts by either the VALUES option or the VALUES parameter. The option defines a common value for all the texts in the declaration, while the parameter allows them each to be given a different value. If both the option and the parameter are specified, the parameter takes precedence.

The NVALUES option allows the number of values in the texts to be defined. If this is not set, the lengths of the texts are defined by the numbers of strings supplied by the VALUES option or parameter. If these too are unset, Genstat takes the length specified by the preceding UNITS statement, if any.

The CHARACTERS parameter allows you to define the number of characters to be printed by default when the strings of each text are printed. You can associate a text of extra annotation with each text using the EXTRA parameter.

If the
MODIFY option is set to yes any existing attributes and values of the texts are retained (if still appropriate); otherwise these are lost.
TKEEP directive
Saves results after an analysis by
ESTIMATE.
Option
SAVE
= identifier Save structure to supply fitted model; default * i.e. that from last model fitted
Parameters
OUTPUTSERIES
= variate Output series to which model was fitted
RESIDUALS
= variate Residual series
ESTIMATES
= variate Estimates of parameters
SE
= variate Standard errors of estimates
INVERSE
= symmetric matrix Inverse matrix
VCOVARIANCE
= symmetric matrix Variance-covariance matrix of parameters
DEVIANCE
= scalar Residual deviance
DF
= scalar Residual degrees of freedom
MVESTIMATES
= variate Estimates of missing values in series
SEMV
= variate Standard errors of estimates of missing values
COMPONENTS
= pointer Variates to save components of output series
SCORES
= variate To save scores (derivatives of the log-likelihood with respect to the parameters)
Description
An
ESTIMATE statement produces many quantities that you may want to use to assess, interpret, and apply the fitted model. The TKEEP directive allows you to copy these quantities into Genstat data structures. If the METHOD option of the ESTIMATE statement was set to initialize, then the results saved by the parameters SE, INVERSE, VCOVARIANCE, and SCORES are unavailable. However, you can save the estimates of the missing values and their standard errors. The residual degrees of freedom in this case do not make allowance for the number of parameters in the model, but do allow for the missing values that have been estimated.

The OUTPUTSERIES parameter specifies the variate that was supplied by the SERIES parameter of the ESTIMATE statement; this can be omitted.

You can use the RESIDUALS parameter to save the residuals in a variate, exactly as in the ESTIMATE directive.

The ESTIMATES parameter can supply a variate to store the estimated parameters of the TSM. Each estimated parameter is represented once, but the innovation variance is omitted entirely. Genstat includes only the first of any set of parameters constrained to be equal using the FIX option of ESTIMATE. The order of the parameters otherwise corresponds to their order in the variate of parameters in the TSM, and is unaffected by any numbering used in the FIX option.

The SE parameter allows you to specify a variate to save the standard errors of the estimated parameters of the TSM. The values correspond exactly to those in the ESTIMATES variate. Parameters in a time series model may be aliased. This is detected when the equations for the estimates are being solved, and the message ALIASED is printed instead of the standard error when the PRINT option of ESTIMATE or TDISPLAY includes the setting estimates. The corresponding units of the SE variate are set to missing values.

The INVERSE parameter can provide a symmetric matrix to save the product $(X'X)^{-1}$, where $X$ is the most recent design matrix derived from the linearized least-squares regressions that were used to minimize the deviance. The ordering of the rows and columns corresponds exactly to that used for the ESTIMATES variate. The row of this matrix corresponding to any aliased parameter is set to zero, except that the diagonal element is set to the missing value.

The VCOVARIANCE parameter allows you to supply a symmetric matrix for the estimated variance-covariance matrix, $\sigma_a^2 (X'X)^{-1}$, of the TSM parameters. The ordering of the rows and columns and the treatment of aliased parameters correspond exactly to those used for the INVERSE matrix.

The DEVIANCE parameter specifies a scalar to hold the final value of the deviance criterion defined by the LIKELIHOOD option of ESTIMATE.

The DF parameter saves the residual number of degrees of freedom, defined for a simple ARIMA model by $N - d -$ (number of estimated parameters). If a seasonal model is used, this number is further reduced by $D \times s$.

The MVESTIMATES parameter specifies a variate to hold estimates of the missing values of the series, in the order they appear in the series. You can thereby obtain forecasts of the series, by extending the SERIES in ESTIMATE with a set of missing values. This is less efficient than using the FORECAST directive, but it does have the advantage that the standard errors of the estimates take into account the finite extent of the data, and also the fact that the model parameters are estimated.

The SEMV parameter can supply a variate to hold the estimated standard errors of the missing values of the series, in the order they appear in the series.

The COMPONENTS parameter can be used after a multi-input model has been fitted using ESTIMATE to access the components of the output series that are due to the various input series; you can also access the output noise. In simple regression, the input components are proportional to the input series, but the component resulting from a transfer-function model may be quite different from this. You can examine these components separately, or sum them to show the total fit to the output series that is explained by the input series. Note that the fitted values may appear to be offset from the output series, because the constant term is part of the noise component, and so is not included. You may want to examine the output noise component. For example, if you thought that the ARIMA model for the output noise was inadequate, you could investigate the noise component with univariate ARIMA modelling.

The SCORES parameter can specify a variate to hold the model scores. The scores are usually defined as the first derivatives of the log likelihood with respect to the model parameters. To get these, the scores supplied by TKEEP should be scaled by dividing by the estimated residual variance and reversing their sign. The elements of the SCORES variate correspond exactly to the parameters as they appear in the ESTIMATES variate. After using ESTIMATE to fit a time series model, the scores should in theory be zero provided the model parameters do not lie on the boundary of their allowed range. The scores are used within ESTIMATE to calculate the parameter changes at each iteration.

You can use the
SAVE option to specify the time-series save structure from which the output is to be taken. By default TKEEP uses the structure from the most recent ESTIMATE statement.
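The relationships stated above between the saved quantities can be sketched in Python. This is an illustrative sketch with hypothetical function names, not how TKEEP is implemented. It assumes the INVERSE matrix is held as a list of rows with NaN standing in for Genstat's missing value, that the estimate of $\sigma_a^2$ is available as a plain number and is the residual variance used to rescale the scores, and that the standard errors are the square roots of the diagonal of VCOVARIANCE (the usual relationship).

```python
import math


def vcov_from_inverse(inverse, sigma2):
    """VCOVARIANCE = sigma_a^2 * INVERSE; missing (NaN) diagonal elements of
    aliased parameters stay missing, and their zero rows stay zero."""
    return [[sigma2 * v for v in row] for row in inverse]


def standard_errors(vcov):
    """Square roots of the diagonal; aliased (NaN) entries stay missing."""
    return [math.nan if math.isnan(vcov[i][i]) else math.sqrt(vcov[i][i])
            for i in range(len(vcov))]


def loglik_derivatives(scores, sigma2):
    """Convert TKEEP scores to derivatives of the log likelihood:
    divide by the estimated residual variance and reverse the sign."""
    return [-s / sigma2 for s in scores]


def residual_df(n_obs, d, n_estimated, D=0, s=0):
    """N - d - (number of estimated parameters), less D*s for a seasonal model."""
    return n_obs - d - n_estimated - D * s


inverse = [[0.25, 0.0], [0.0, math.nan]]   # second parameter aliased
vcov = vcov_from_inverse(inverse, 4.0)
print(standard_errors(vcov))                # [1.0, nan]
print(residual_df(144, 1, 2, D=1, s=12))    # 129
```

For a monthly series of 144 values with $d = D = 1$, $s = 12$ and two estimated parameters, the residual degrees of freedom are $144 - 1 - 2 - 12 = 129$.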
TRANSFERFUNCTION directive
Specifies input series and transfer function models for subsequent estimation of a model for an output series.
Option
SAVE
= identifier To name time-series save structure; default *
Parameters
SERIES
= variates Input time series
TRANSFERFUNCTION
= TSMs Transfer-function models; if omitted, model with 1 moving-average parameter, lag 0
BOXCOXMETHOD
= strings How to treat transformation parameters (fix, estimate); default fix
PRIORMETHOD
= strings How to treat prior values (fix, estimate); default fix
ARIMA
= TSMs ARIMA models for input series
Description
TRANSFERFUNCTION
can be used to define input series and transfer-function models to be used by subsequent ESTIMATE statements.

In its simplest form, when the TRANSFERFUNCTION and PRIORMETHOD parameters are unset, TRANSFERFUNCTION can be used to specify the explanatory variables for a regression with autocorrelated errors.

The first parameter, SERIES, specifies a list of variates holding the time series of explanatory variables.

The BOXCOXMETHOD parameter allows you to estimate separate power transformations for the explanatory variables: the variable $x_t$ is transformed to
$$x_t^{(\lambda)} = (x_t^{\lambda} - 1)/\lambda, \quad \lambda \neq 0$$
$$x_t^{(0)} = \log(x_t)$$
The default is no transformation, corresponding to $x_t^{(\lambda)} = x_t$. You can choose whether the transformations are to be fixed or estimated, by specifying one string for each explanatory variable.

The ARIMA parameter allows you to associate with each explanatory variable a univariate ARIMA model for the time-series structure of that variable. If you think such a model is inappropriate, then you should give a missing value in place of the TSM identifier, or leave this parameter unset. You can use these models in any subsequent FORECAST statement to incorporate, into the error limits of the forecasts, an allowance for uncertainties in the predicted explanatory variables; the allowance assumes that the future values of the explanatory variables are forecasts obtained using these ARIMA models.

The TRANSFERFUNCTION and PRIORMETHOD parameters are used to define multi-input transfer-function models.

The TRANSFERFUNCTION parameter specifies the transfer-function TSMs that are to be associated with the input series. A missing value in place of a TSM identifier causes Genstat to treat the corresponding input series as a simple explanatory variable, equivalent to a transfer-function model with orders (0,0,0,0).

The PRIORMETHOD parameter specifies, for each input series, how Genstat is to treat the transients associated with the early values of the transfer-function response. In calculating the input component $z_t$ from the input $x_t$, Genstat has to make assumptions about the unknown values of $x_t$ which came before the observation period. The default is that $x_t$ (or generally $x_t^{(\lambda)}$) is assumed to be equal to the reference constant $c$ of the transfer-function model. The pattern of the transient can be controlled by introducing a number $\max(p+d, b+q)$ of nuisance parameters to represent the combined effects of all earlier input values on the observed output. Setting PRIORMETHOD=estimate specifies that these nuisance parameters are estimated so as to minimize the transients. You should, however, be careful in using this. Often all you will need to do is make a sensible choice of the reference constant $c$. Estimating the transients is best done as a final stage in refining the model; doing it earlier may give poor numerical conditioning.

The SAVE option allows you to name the time-series save structure created by TRANSFERFUNCTION. You can use this identifier in a later ESTIMATE statement, and eventually in a FORECAST statement. If you do not name the save structure, Genstat will use the most recent save structure, which will be overwritten each time a new TRANSFERFUNCTION statement is given.
TREATMENTSTRUCTURE directive
Specifies the treatment terms to be fitted by subsequent
ANOVA statements.
No options
Parameter
formula Treatment formula, specifies the treatment model terms to be fitted by subsequent ANOVA statements
Description
The
TREATMENTSTRUCTURE directive defines the treatment formula which specifies treatment, or systematic, terms to be fitted in subsequent ANOVA statements. For a simple one-way analysis of variance this has the form

TREATMENTSTRUCTURE Tfac

where
Tfac is a factor which indicates which treatment was received by each unit in the design. Most experiments, however, are devised to study several treatment factors. For these factorial experiments TREATMENTSTRUCTURE specifies a model formula to define the model terms to be fitted. Each model term will then have its own line in the analysis-of-variance table and, for example, will have a table of means.

Initially in a job, there is no treatment formula. This situation can be restored by a TREATMENTSTRUCTURE directive with a null setting:

TREATMENTSTRUCTURE

In its simplest form, a model formula is a list of model terms, linked by the operator "+". For example,

A + B

is a formula containing two terms,
A and B, representing the main effects of factors A and B respectively. Higher-order terms (like interactions) are specified as series of factors separated by dots, but their precise meaning depends on which other terms the formula contains, as we explain below. The other operators provide ways of specifying a formula more succinctly, and of representing its structure more clearly.

The crossing operator * is used to specify factorial structures. For example, the treatment formula

TREATMENTSTRUCTURE Nitrogen * Sulphur

is expanded automatically by Genstat to become the formula
Nitrogen + Sulphur + Nitrogen.Sulphur
which has three terms:
Nitrogen for the nitrogen main effect, Sulphur for the main effect of sulphur, and Nitrogen.Sulphur for the nitrogen by sulphur interaction. Higher-order terms like Nitrogen.Sulphur represent all the joint effects of the factors Nitrogen and Sulphur that have not been removed by earlier terms in the formula. Thus here it represents the interaction between nitrogen and sulphur, as both main effects have been removed.

The other most commonly used operator is the nesting operator (/). This occurs most often in block models (specified by the BLOCKSTRUCTURE directive). For example, the formula

Litter / Rat

is expanded to become the formula
Litter + Litter.Rat
This could define the block model for a design in which there are several litters of rats. As the formula contains no "main effect" for Rat, the term Litter.Rat represents rat-within-litter effects (that is, the differences between individual rats after removing any overall similarity between rats that belong to the same litter).

A formula can contain more than one of these operators. The three-factor factorial model
A * B * C
becomes
A + B + C + A.B + A.C + B.C + A.B.C
and the nested structure
Block / Wplot / Subplot
which specifies the block model of a split-plot design becomes
Block + Block.Wplot + Block.Wplot.Subplot
The operators can also be mixed in the same formula. In general, if l and m are two model formulae:

l * m = l + m + l.m
l / m = l + fac(l).m

(where l.m is the sum of all pairwise dot products of a term in l and a term in m, and fac(l) is the dot product of all factors in l). For example:

(A + B) * (C + D) = (A + B) + (C + D) + (A + B).(C + D)
                  = A + B + C + D + A.C + A.D + B.C + B.D
(A + B) / C = A + B + fac(A + B).C = A + B + A.B.C
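The expansion rules above are mechanical, so they can be mimicked in a few lines of Python. In this sketch (not Genstat's formula parser) a formula is a list of terms and a term is a tuple of factor names; `plus`, `dot`, `cross`, `fac`, `nest` and `show` are hypothetical helpers named after the operators, with `left` and `right` playing the roles of l and m. The `show` helper lists terms in order of increasing size, an assumption that reproduces the orderings in the examples above.

```python
from itertools import product


def _join(t1, t2):
    """Dot product of two terms: the union of their factors, in first-seen order."""
    out = list(t1)
    for f in t2:
        if f not in out:
            out.append(f)
    return tuple(out)


def _unique(terms):
    """Drop repeated terms; A.B and B.A count as the same term."""
    seen, out = set(), []
    for t in terms:
        key = frozenset(t)
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def plus(left, right):
    """left + right"""
    return _unique(left + right)


def dot(left, right):
    """left.right: all pairwise dot products of a term in left and a term in right."""
    return _unique([_join(a, b) for a, b in product(left, right)])


def cross(left, right):
    """left * right = left + right + left.right"""
    return plus(plus(left, right), dot(left, right))


def fac(formula):
    """fac(formula): the dot product of all factors in the formula, as one term."""
    out = ()
    for t in formula:
        out = _join(out, t)
    return [out]


def nest(left, right):
    """left / right = left + fac(left).right"""
    return plus(left, dot(fac(left), right))


def show(terms):
    """Readable formula, listing terms in order of increasing size."""
    return " + ".join(".".join(t) for t in sorted(terms, key=len))


A, B, C, D = [("A",)], [("B",)], [("C",)], [("D",)]
print(show(cross(plus(A, B), plus(C, D))))  # A + B + C + D + A.C + A.D + B.C + B.D
print(show(nest(plus(A, B), C)))            # A + B + A.B.C
```

The same helpers reproduce the other expansions in this section, for example `show(cross(cross(A, B), C))` gives the seven-term three-factor model.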
The other important operator for
ANOVA is the pseudo-factorial operator //. This allows you to partition an unbalanced treatment term into pseudo-terms, which are each balanced.

Contrasts can be fitted by putting a function of a factor into the treatment formula, instead of the factor itself. Polynomial contrasts can be specified using the POL or POLND functions. Other, user-defined regression models can be defined using the REG or REGND functions. COMPARISON, the other function relevant to ANOVA, allows comparisons to be calculated between levels of the factor.
TRY directive
Displays results of single-term changes to a linear, generalized linear, or generalized additive model.
Options
FACTORIAL
= scalar Limit for expansion of model terms; default * i.e. that in previous TERMS statement
POOL
= string Whether to pool ss in accumulated summary between all terms fitted in a linear model (yes, no); default no
DENOMINATOR
= string Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (ss, ms); default ss
NOMESSAGE
= strings Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality, vertical, df, inflation); default *
FPROBABILITY
= string Printing of probabilities for variance and deviance ratios (yes, no); default no
TPROBABILITY
= string Printing of probabilities for t-statistics (yes, no); default no
SELECTION
= strings Statistics to be displayed in the summary of analysis produced by PRINT=summary; the first four are relevant only for a Normally distributed response, and the last only for a gamma-distributed response (%variance, %ss, adjustedr2, r2, seobservations, dispersion, %cv); default %var,seob if DIST=normal, %cv if DIST=gamma, and disp for other distributions
Parameter
formula List of explanatory variates and factors, or model formula
Description
TRY
investigates modifications to the current regression model, which may be linear, generalized linear or generalized additive. Terms in the specified formula are dropped from the current model if they are already there, or are added to it if they are not. After each change, Genstat prints the requested output and then restores the original model before trying the next change. It is best to give a TERMS statement before using TRY to define a common set of units for the models to be investigated. If no model is fitted after the TERMS statement, the current model is taken to be the null model.

The options of
TRY are the same as those of the FIT directive, except that there is no CONSTANT option. The accumulated setting of the PRINT option will show only one change at a time. Accumulated summaries produced by later statements will not have any entries for a TRY statement.
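The drop-or-add-then-restore cycle that TRY performs can be outlined in Python. This is a hypothetical sketch: `rss` stands in for whatever refits a model and returns its residual sum of squares, and in the usage below it is a toy function rather than a real regression, so the numbers carry no statistical meaning.

```python
from typing import Callable, List, Tuple


def try_changes(current: List[str], candidates: List[str],
                rss: Callable[[List[str]], float]) -> List[Tuple[str, str, float]]:
    """For each candidate term, drop it if it is in the current model,
    otherwise add it, and report the change in residual sum of squares.
    `current` is never modified, so the original model is in effect
    restored before the next change is tried."""
    base = rss(current)
    results = []
    for term in candidates:
        if term in current:
            changed = [t for t in current if t != term]
            action = "drop"
        else:
            changed = current + [term]
            action = "add"
        results.append((action, term, rss(changed) - base))
    return results


# Toy stand-in for model fitting: each term removes a fixed amount of variation.
REDUCTION = {"A": 30.0, "B": 5.0, "C": 12.0}


def toy_rss(model: List[str]) -> float:
    return 100.0 - sum(REDUCTION[t] for t in model)


for row in try_changes(["A", "B"], ["A", "B", "C"], toy_rss):
    print(row)
# ('drop', 'A', 30.0)
# ('drop', 'B', 5.0)
# ('add', 'C', -12.0)
```

Because every change starts from the same current model, the results are single-term comparisons, which is why, as noted above, later accumulated summaries have no entries for a TRY statement.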
TSM directive
Declares one or more TSM data structures.
Option
MODEL
= string Type of model (arima, transfer); default arim
Parameters
IDENTIFIER
= identifiers Identifiers of the TSMs
ORDERS
= variates Orders of the autoregressive, integrated, and moving-average parts of each TSM
PARAMETERS
= variates Parameters of each TSM
LAGS
= variates Lags, if not default
Description
The TSM structure stores a time-series model which you can use with directives such as
ESTIMATE for Box-Jenkins modelling of time series. The information that you give to specify the model is stored in two variates, called the orders and the parameters; an optional third variate contains lags. The elements of a TSM are thus [1] or ['ORDERS']; [2] or ['PARAMETERS']; [3] or ['LAGS'].

To declare a TSM you use the
TSM directive. You set the type of model by the MODEL option. The default setting defines an ARIMA model. This is an equation relating the present value $y_t$ of an observed time series to past values. The equation includes lagged values not only of the series itself, but also of an unobserved series of innovations, $a_t$; you can interpret the innovations as the error in predicting $y_t$ from past values $y_{t-1}, y_{t-2}, \ldots$. The usual statistical model assumes that the innovations are a series of independent Normal deviates with mean zero and constant variance. The residuals obtained from fitting the model can be used to estimate the innovations.

Using the notation of Box and Jenkins (1970), the simple non-seasonal ARIMA model for the time series $y_t$ is
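$$\phi(B)\{\nabla^d y_t^{(\lambda)} - c\} = \theta(B)\, a_t$$
where $B$ is the backward shift operator, $B^k y_t = y_{t-k}$, $\nabla$ is the differencing operator, $\nabla y_t = y_t - y_{t-1}$, $\nabla^d y_t = \nabla^{d-1}(y_t - y_{t-1})$, and
$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$$
$$\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$$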
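As a worked check of the notation, take $\lambda = 1$ (no transformation) and $p = d = q = 1$, with $\phi_1$ the autoregressive and $\theta_1$ the moving-average parameter. Expanding the operators turns the model into an ordinary difference equation in the observed series:

```latex
\begin{aligned}
(1 - \phi_1 B)\{\nabla y_t - c\} &= (1 - \theta_1 B)\, a_t \\
(y_t - y_{t-1} - c) - \phi_1 (y_{t-1} - y_{t-2} - c) &= a_t - \theta_1 a_{t-1} \\
y_t &= (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + (1 - \phi_1)\, c + a_t - \theta_1 a_{t-1}
\end{aligned}
```

The constant enters as $(1 - \phi_1)c$ because the backward shift operator leaves a constant unchanged.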
The parameter $\lambda$ specifies a Box-Cox power transformation defined by
$$y_t^{(\lambda)} = (y_t^{\lambda} - 1)/\lambda, \quad \lambda \neq 0$$
$$y_t^{(0)} = \log(y_t)$$
However, in the default case when $\lambda$ is fixed and not estimated, the value $\lambda = 1$ implies no transformation, and then $y_t^{(1)} = y_t$ rather than $y_t - 1$. If $\lambda \neq 1$, or if $\lambda$ is to be estimated, Genstat will not let you have values $y_t \le 0$. The usual case, however, is that $\lambda = 1$ and is not to be estimated, so that $y_t$ may take any values.
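The transformation rules just described, including the special treatment of a fixed $\lambda = 1$, can be written as a short Python function. This is a sketch of the stated rules under a hypothetical name, `box_cox`, and not Genstat code; the `estimated` flag stands for whether $\lambda$ is to be estimated.

```python
import math


def box_cox(y: float, lam: float, estimated: bool = False) -> float:
    """Box-Cox transform y^(lam) following the rules described above.

    When lam is fixed at 1 (the default case) no transformation is made,
    so y^(1) = y rather than y - 1, and y may take any value. Otherwise
    y must be strictly positive.
    """
    if lam == 1.0 and not estimated:
        return y
    if y <= 0.0:
        raise ValueError("y must be positive unless lambda is fixed at 1")
    if lam == 0.0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


print(box_cox(-3.0, 1.0))                  # -3.0 (no transformation)
print(box_cox(4.0, 0.5))                   # 2.0
print(box_cox(3.0, 1.0, estimated=True))   # 2.0 (the general formula applies)
```

The last call shows why the fixed case is special: applying the general formula with $\lambda = 1$ gives $y_t - 1$, not $y_t$.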
The ORDERS parameter is a list of variates, one for each of the models. For each simple ARIMA model, the variate contains the three values p, d, and q.
The PARAMETERS parameter is a list of variates, one for each of the models. For each simple ARIMA model, the variate contains $3+p+q$ values: $\lambda, c, \sigma_a^2, \phi_1 \ldots \phi_p, \theta_1 \ldots \theta_q$. You must always include the first three parameters. The parameter $\sigma_a^2$ is the innovation variance.
Whenever a TSM is used, Genstat checks its values. The orders must all be non-negative. The parameters $\lambda$ and $c$ can take any values, but $\sigma_a^2$ must be non-negative. The next $p+q$ values specify the autoregressive and moving-average parameters: they must satisfy the stationarity and invertibility conditions for ARIMA models (see Box and Jenkins 1970). An exception is that before estimation the model parameters may be unset, in which case Genstat sets them to default values. You can omit the PARAMETERS parameter, in which case an unnamed structure is defined to contain the default values. However, you should usually specify the variate of parameters, and if possible assign good preliminary values before estimation (see FTSM), as this will speed up the model fitting process.
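One standard way to test the stationarity and invertibility conditions without computing polynomial roots is the step-down (reverse Levinson-Durbin) recursion: an operator $1 - c_1 B - \cdots - c_p B^p$ has all its roots outside the unit circle exactly when every partial autocorrelation produced by the recursion has modulus below 1. The Python sketch below applies it to a list of coefficients; the function name is hypothetical, and this is not necessarily the check Genstat performs.

```python
def roots_outside_unit_circle(coefs) -> bool:
    """True if 1 - c_1 B - ... - c_p B^p has all its roots outside the unit circle.

    Applied to (phi_1, ..., phi_p) this is the stationarity condition;
    applied to (theta_1, ..., theta_q) it is the invertibility condition.
    """
    a = [float(v) for v in coefs]
    while a:
        k = a[-1]  # partial autocorrelation at the current order
        if abs(k) >= 1.0:
            return False
        p = len(a)
        # Step down from order p to order p - 1.
        a = [(a[i] + k * a[p - 2 - i]) / (1.0 - k * k) for i in range(p - 1)]
    return True


print(roots_outside_unit_circle([0.5, 0.3]))  # True:  a stationary AR(2)
print(roots_outside_unit_circle([0.7, 0.4]))  # False: phi_1 + phi_2 > 1
```

For an AR(2) operator the result agrees with the familiar triangle conditions $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$ and $|\phi_2| < 1$.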
The LAGS parameter is a list of variates, one for each of the models. For each simple ARIMA model, this variate contains $p+q$ values, one corresponding to each of the autoregressive and moving-average parameters. Genstat then modifies the ARIMA model by defining
$$\phi(B) = 1 - \phi_1 B^{l_1} - \cdots - \phi_p B^{l_p}$$
$$\theta(B) = 1 - \theta_1 B^{m_1} - \cdots - \theta_q B^{m_q}$$
The LAGS parameter for this model contains $l_1 \ldots l_p, m_1 \ldots m_q$. The lags $l_1 \ldots l_p$ must be positive integers in strictly increasing order; the default values are $1 \ldots p$ if LAGS is not set. The same rule applies to $m_1 \ldots m_q$.
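The seasonal ARIMA model for the time series $y_t$ is an extension of the simple model, to the form
$$\phi(B)\,\Phi(B^s)\{\nabla^d \nabla_s^D y_t^{(\lambda)} - c\} = \theta(B)\,\Theta(B^s)\, a_t$$
where the extra, seasonal, operators associated with seasonal period $s$ are of three types:
$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$$
which is seasonal autoregression of order $P$;
$$\nabla_s^D = (1 - B^s)^D$$
which is seasonal differencing of order $D$; and
$$\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs}$$
which is seasonal moving average of order $Q$.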
When seasonal terms are to be included, you must extend the ORDERS parameter so that it contains $p, d, q, P, D, Q$, and $s$. Even if the non-seasonal part of the model has $p = d = q = 0$, these parameters must still be included at the beginning of the list. The seasonal orders must satisfy $P \ge 0$, $D \ge 0$, $Q \ge 0$ and $s \ge 1$.
You must also extend the PARAMETERS parameter to contain:
$\lambda, c, \sigma_a^2, \phi_1 \ldots \phi_p, \theta_1 \ldots \theta_q, \Phi_1 \ldots \Phi_P, \Theta_1 \ldots \Theta_Q$
You can modify the seasonal model to allow other lags:
$$\Phi(B^s) = 1 - \Phi_1 B^{L_1} - \cdots - \Phi_P B^{L_P}$$
$$\Theta(B^s) = 1 - \Theta_1 B^{M_1} - \cdots - \Theta_Q B^{M_Q}$$
The sequence of lags $L_1 \ldots L_P$ must be strictly increasing and must be positive-integer multiples of the period $s$; the default values are $s, 2s, \ldots, Ps$. The same rules apply to $M_1 \ldots M_Q$. For any seasonal model, you must extend the LAGS parameter, if supplied, so that it contains
$l_1 \ldots l_p, m_1 \ldots m_q, L_1 \ldots L_P, M_1 \ldots M_Q$.
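Multiplying the non-seasonal and seasonal operators gives a single polynomial in $B$, which makes the effect of the lags easy to see. The Python sketch below builds an operator of the form $1 - c_1 B^{\text{lag}_1} - \cdots$ from coefficients and lags, applies the lag rules stated above (positive, strictly increasing, seasonal lags multiples of $s$, with the stated defaults), and multiplies the two factors. The function names are hypothetical, and the result is only an illustration of an expanded form, not a claim about how Genstat stores it.

```python
def lag_polynomial(coefs, lags):
    """Coefficients, indexed by power of B, of 1 - c_1 B^lag_1 - ... - c_k B^lag_k."""
    if len(coefs) != len(lags):
        raise ValueError("need one lag per coefficient")
    if any(lag < 1 for lag in lags) or any(a >= b for a, b in zip(lags, lags[1:])):
        raise ValueError("lags must be positive and strictly increasing")
    poly = [0.0] * (max(lags, default=0) + 1)
    poly[0] = 1.0
    for c, lag in zip(coefs, lags):
        poly[lag] -= c
    return poly


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def expanded_operator(coefs, seasonal_coefs, s, lags=None, seasonal_lags=None):
    """phi(B) * Phi(B^s) as one polynomial in B; default lags 1..p and s, 2s, ..., Ps."""
    if lags is None:
        lags = list(range(1, len(coefs) + 1))
    if seasonal_lags is None:
        seasonal_lags = [s * k for k in range(1, len(seasonal_coefs) + 1)]
    if any(lag % s != 0 for lag in seasonal_lags):
        raise ValueError("seasonal lags must be multiples of the period s")
    return poly_mul(lag_polynomial(coefs, lags),
                    lag_polynomial(seasonal_coefs, seasonal_lags))


poly = expanded_operator([0.5], [0.3], s=12)
print({i: round(v, 10) for i, v in enumerate(poly) if v != 0.0})
# {0: 1.0, 1: -0.5, 12: -0.3, 13: 0.15}
```

So $(1 - 0.5B)(1 - 0.3B^{12}) = 1 - 0.5B - 0.3B^{12} + 0.15B^{13}$: the product introduces a cross term at lag $1 + 12$.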
You can use multiple seasonal periods, by extending the variate of ORDERS with further seasonal orders $P'$, $D'$, $Q'$, and $s'$. You must correspondingly extend the variates of PARAMETERS and LAGS. It is also possible to set the seasonal periods to 1, which means you can estimate non-seasonal models with factored operators.
You can declare an ORDERS variate to have more values than is necessary, provided that the extra values are filled with zeroes, and that the number of values is 3+4k, k being the number of seasonal periods. The same applies to PARAMETERS and LAGS variates, except that Genstat ignores the extra values whatever they may be. Thus you can extend a simple model to a seasonal model, simply by resetting the extra values.
Setting MODEL=transferfunction defines a transfer-function model. The simple non-seasonal transfer-function model relates a component $z_t$ of the output series to the corresponding input series $x_t$, by the equation
$$\delta(B)\,\nabla^d z_t = \omega(B)\, B^b \{x_t^{(\lambda)} - c\}$$
where
$$\delta(B) = 1 - \delta_1 B - \cdots - \delta_p B^p$$
$$\omega(B) = \omega_0 - \omega_1 B - \cdots - \omega_q B^q$$
The integer $b \ge 0$ defines a pure delay, and the integer $d \ge 0$ defines the order of differencing in the transfer function.
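To see how the transfer-function equation generates the input component, the Python sketch below computes $z_t$ recursively for the case $d = 0$ and $\lambda = 1$. In line with the default described for the PRIORMETHOD parameter of TRANSFERFUNCTION, input values before the start are taken to equal $c$; setting values of $z_t$ before the start to zero is an additional assumption of this sketch. The function name is hypothetical and this is not Genstat's estimation code.

```python
def transfer_component(x, c, b, delta, omega):
    """z_t for delta(B) z_t = omega(B) B^b (x_t - c), with d = 0 and lambda = 1.

    delta: [delta_1, ..., delta_p] for delta(B) = 1 - delta_1 B - ... - delta_p B^p
    omega: [omega_0, ..., omega_q] for omega(B) = omega_0 - omega_1 B - ... - omega_q B^q
    Inputs before the start equal c; earlier values of z are taken as zero.
    """
    u = [xt - c for xt in x]  # input deviations from the reference constant
    z = []
    for t in range(len(x)):
        # omega(B) B^b u_t; pre-sample deviations are zero because x equals c there
        rhs = 0.0
        for j, w in enumerate(omega):
            lag = t - b - j
            if lag >= 0:
                rhs += (w if j == 0 else -w) * u[lag]
        # delta(B) z_t = rhs  =>  z_t = rhs + delta_1 z_{t-1} + ... + delta_p z_{t-p}
        zt = rhs
        for i, d_i in enumerate(delta, start=1):
            if t - i >= 0:
                zt += d_i * z[t - i]
        z.append(zt)
    return z


# Step input at t = 2, delay b = 1, delta_1 = 0.5, omega_0 = 2.
print(transfer_component([0, 0, 1, 1, 1, 1], c=0.0, b=1, delta=[0.5], omega=[2.0]))
# [0.0, 0.0, 0.0, 2.0, 3.0, 3.5]
```

The response starts one period after the step because of the delay $b = 1$, and it rises towards the steady-state gain $\omega_0 / (1 - \delta_1) = 4$.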
The parameter $\lambda$ specifies a Box-Cox power transformation for the input series, and the parameter $c$ specifies a reference level for the transformed input. There is no mean correction of the input series when transfer-function models are estimated, and you should use a value of $c$ close to the series mean so as to improve the numerical conditioning of the estimation procedure. However, if the input series $x_t$ is trend-like rather than stationary, you could alternatively use a value for $c$ close to the early series values, because this reduces the transient errors that arise when the transfer function is applied. The PRIORMETHOD parameter of TRANSFERFUNCTION provides further means of handling these transients.
The parameters $\lambda$ and $c$ are not estimated unless you specify otherwise by the BOXCOXMETHOD parameter of TRANSFERFUNCTION or the FIX option of ESTIMATE. Often $c$ in the transfer-function model is aliased with the constant term in the ARIMA errors, and so they should not both be estimated. In some circumstances, however, they both could be estimated, for example in a differenced transfer-function model with stationary noise.
The ORDERS parameter for the simple transfer-function model described above specifies a variate containing the four values b, p, d, and q.
The PARAMETERS parameter specifies a variate containing $3+p+q$ values: $\lambda, c, \delta_1 \ldots \delta_p, \omega_0, \omega_1 \ldots \omega_q$. You must always include the parameters $\lambda$, $c$, and $\omega_0$. When you use a transfer-function model, Genstat will check its parameter values. In particular the operator $\delta(B)$ must satisfy the stability or stationarity condition.
The LAGS parameter is optional, and may be used to change the lags associated with the parameters, from the default values of $1 \ldots p$, $1 \ldots q$. The variate of lags contains values corresponding to the parameters $\delta_1 \ldots \delta_p, \omega_1 \ldots \omega_q$. They have the same interpretation as the lags in ARIMA models, and must satisfy the same conditions. Note that there is no lag associated with $\omega_0$, because the delay $b$ provides the necessary flexibility for this.
You can also have seasonal extensions of transfer-function models:
$$\delta(B)\,\Delta(B^s)\,\nabla^d \nabla_s^D z_t = \omega(B)\,\Omega(B^s)\, B^b \{x_t^{(\lambda)} - c\}$$
where
$$\Delta(B^s) = 1 - \Delta_1 B^s - \cdots - \Delta_P B^{Ps}$$
$$\Omega(B^s) = 1 - \Omega_1 B^s - \cdots - \Omega_Q B^{Qs}$$
Note that there is no $\Omega_0$ coefficient, because $\omega_0$ is always present in the model and provides sufficient flexibility.
The ORDERS parameter here contains $b, p, d, q, P, D, Q$, and $s$, and the PARAMETERS parameter contains $\lambda, c, \delta_1 \ldots \delta_p, \omega_0 \ldots \omega_q, \Delta_1 \ldots \Delta_P, \Omega_1 \ldots \Omega_Q$. You can analogously extend the LAGS parameter. You can also have extensions to multiple seasonal periods, as for ARIMA models.
Reference
Box, G.E.P. and Jenkins, G.M. (1970). Time series analysis, forecasting and control. Holden-Day, San Francisco.
TSUMMARIZE directive
Displays characteristics of time series models.
Options
GRAPH
= strings What to display with graphs (autocorrelations, impulse, piweight, psiweight); default *
MAXLAG
= scalar Maximum lag for results; default 0
Parameters
TSM
= TSMs Models to be displayed
AUTOCORRELATIONS
= variates To save theoretical autocorrelations
IMPULSERESPONSE
= variates To save impulse-response function
STEPFUNCTION
= variates To save step function from impulse
PIWEIGHTS
= variates To save pi-weights
PSIWEIGHTS
= variates To save psi-weights
EXPANSION
= TSMs To save expanded models
VARIANCE
= scalars To save variance of each TSM
Description
The TSUMMARIZE directive helps you investigate time-series models by displaying or saving various characteristics. These are:
- the theoretical autocorrelation function, pi-weights and psi-weights of an ARIMA model;
- the impulse-response function of a transfer-function model.

TSUMMARIZE can also derive the expanded form of a model, in which all seasonal terms are combined with the non-seasonal term.

For an ARIMA model in the TSM parameter, you can set only the AUTOCORRELATIONS, PSIWEIGHTS, and PIWEIGHTS parameters. Also, you can set the IMPULSERESPONSE parameter only for a transfer-function model. You can set the EXPANSION parameter for either type of model. The TSMs in any TSUMMARIZE statement must be completely defined; that is, you must have set the orders and parameters, and the lags if you are using them. There are only two exceptions. Genstat takes the transformation parameter to be 1.0 if it is missing, and the innovation variance of an ARIMA model need not be set.

The MAXLAG option specifies the maximum lag to which Genstat is to do calculations: this applies to autocorrelations, psi-weights, pi-weights, and impulse responses. If MAXLAG is unset, the maximum lag is defined implicitly as the length of the first variate supplied in the parameters. However, if the length of this variate is also undefined, the maximum lag cannot be determined and Genstat reports a fault.

The PRINT and GRAPH options display the various characteristics of the models. They can be set independently of the parameters, which store the results.

The AUTOCORRELATIONS parameter allows you to store the theoretical autocorrelation function of an ARIMA model. Such a model uniquely defines an autocorrelation function. Genstat assigns its values $r_0, \ldots, r_m$ to the variate supplied by the parameter, where $m$ is the maximum lag. If the model has differencing orders $d = D = 0$, the autocorrelation function is that of a series $y_t$ that follows the model. If either $d > 0$ or $D > 0$, the theoretical autocorrelations are calculated as if $d = D = 0$, and so they correspond to those of the differenced series. This is because the autocorrelations of $y_t$ itself are undefined for non-stationary models.
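The following Python sketch illustrates what the AUTOCORRELATIONS parameter stores. It is not Genstat code and not necessarily the algorithm Genstat uses. It approximates the theoretical autocorrelations of a stationary ARMA model from its psi-weights, using the standard result that the autocovariance at lag k is proportional to the sum of psi_j * psi_{j+k}.

Two assumptions are made:
- The operators are passed as coefficient lists in the backshift operator B, for example [1.0, -0.5] for 1 - 0.5B.
- The infinite sums are truncated, which makes the result an approximation.

The function names are hypothetical.

```python
def psi_weights(ar, ma, m):
    """Coefficients psi_0..psi_m of ma(B) / ar(B).

    ar and ma are coefficient lists in the backshift operator B,
    e.g. [1.0, -0.5] for 1 - 0.5B; ar[0] must be non-zero.
    """
    psi = []
    for j in range(m + 1):
        acc = ma[j] if j < len(ma) else 0.0
        # ar(B) * psi(B) = ma(B), so solve for psi_j one term at a time
        for k in range(1, min(j, len(ar) - 1) + 1):
            acc -= ar[k] * psi[j - k]
        psi.append(acc / ar[0])
    return psi


def theoretical_acf(ar, ma, max_lag, truncation=2000):
    """Approximate r_0..r_max_lag for a stationary ARMA model.

    Uses gamma_k proportional to sum_j psi_j * psi_{j+k}, truncated after
    `truncation` terms (an approximation made by this sketch).
    """
    psi = psi_weights(ar, ma, truncation + max_lag)
    gamma = [sum(psi[j] * psi[j + k] for j in range(truncation))
             for k in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


# AR(1) with coefficient 0.5: the autocorrelations should be close to 0.5 ** k
print(theoretical_acf([1.0, -0.5], [1.0], 3))
```

As the text above says, for a differenced model you would apply this to the operators with the differencing removed.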
The PSIWEIGHTS parameter allows you to store the theoretical psi-weights $\psi_0, \ldots, \psi_m$ of an ARIMA model. These are used internally by Genstat when error limits are calculated for forecasts obtained using the model. You will need them, for example, if you want to calculate the variance of the total of the forecast values up to some specified maximum lead time. They are defined for a non-seasonal model by:
$1 + \psi_1 B + \psi_2 B^2 + \cdots = \phi(B) / \{\theta(B)\nabla^d\}$
The PIWEIGHTS parameter allows you to store the theoretical pi-weights $\pi_0, \ldots, \pi_m$ of an ARIMA model: these show explicitly how past values contribute to a forecast. The weights are defined by:
$1 - \pi_1 B - \pi_2 B^2 - \cdots = \theta(B)\nabla^d / \phi(B)$
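Both sets of weights are the coefficients of a ratio of polynomials in B, so they can be obtained by long division of power series. The Python sketch below is illustrative only, not Genstat's implementation, and its function names are hypothetical.

It assumes the standard Box and Jenkins reading of the two definitions:
- The psi-weights are the moving-average operator divided by the product of the autoregressive and differencing operators.
- The pi-weights come from the inverse ratio, with the signs of all but the leading term reversed.

Operators are passed as coefficient lists in B with leading coefficient 1.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def series_ratio(num, den, m):
    """Coefficients c_0..c_m of the power series num(B) / den(B)."""
    c = []
    for j in range(m + 1):
        acc = num[j] if j < len(num) else 0.0
        for k in range(1, min(j, len(den) - 1) + 1):
            acc -= den[k] * c[j - k]
        c.append(acc / den[0])
    return c


def differencing(d):
    """The operator (1 - B)^d as a coefficient list."""
    op = [1.0]
    for _ in range(d):
        op = poly_mul(op, [1.0, -1.0])
    return op


def psi_weights(ar, ma, d, m):
    """psi_0..psi_m, where psi(B) = ma(B) / {ar(B) (1 - B)^d}."""
    return series_ratio(ma, poly_mul(ar, differencing(d)), m)


def pi_weights(ar, ma, d, m):
    """pi_1..pi_m, where 1 - pi_1 B - pi_2 B^2 - ... = ar(B) (1 - B)^d / ma(B).

    With both operators starting at 1 the leading term is 1, so only
    pi_1..pi_m are returned.
    """
    c = series_ratio(poly_mul(ar, differencing(d)), ma, m)
    return [-x for x in c[1:]]


# IMA(1,1): no AR terms, one difference, moving-average operator 1 - 0.4B
print(psi_weights([1.0], [1.0, -0.4], 1, 4))
print(pi_weights([1.0], [1.0, -0.4], 1, 4))
```

For this IMA(1,1) example, the psi-weights after the first are constant, and the pi-weights decay geometrically. This is the familiar exponentially weighted forecast.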
The IMPULSERESPONSE parameter allows you to store the theoretical impulse-response function, $v_0, \ldots, v_m$, of a transfer-function model. This function can help you interpret the model. The sequence is defined for a non-seasonal transfer-function model by:
$v_0 + v_1 B + v_2 B^2 + \cdots = \omega(B) B^b / \{\delta(B)\nabla^d\}$
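The impulse response is again a power-series ratio, this time with the numerator delayed by b lags. The Python sketch below uses hypothetical function names and is illustrative only. Its assumptions:
- The operators are given as coefficient lists in B.
- The sign convention inside omega(B) and delta(B) is whatever the caller encodes in those lists.
- The step function (cf. the STEPFUNCTION parameter) is the running total of the impulse-response weights, which is the usual definition of a step response.

```python
def series_ratio(num, den, m):
    """Coefficients c_0..c_m of the power series num(B) / den(B)."""
    c = []
    for j in range(m + 1):
        acc = num[j] if j < len(num) else 0.0
        for k in range(1, min(j, len(den) - 1) + 1):
            acc -= den[k] * c[j - k]
        c.append(acc / den[0])
    return c


def impulse_response(omega, delta, b, d, m):
    """v_0..v_m, where v(B) = omega(B) B^b / {delta(B) (1 - B)^d}."""
    num = [0.0] * b + list(omega)  # the factor B^b delays the response by b lags
    den = list(delta)
    for _ in range(d):
        # multiply den by (1 - B): new coefficient k is den[k] - den[k-1]
        den = [x - y for x, y in zip(den + [0.0], [0.0] + den)]
    return series_ratio(num, den, m)


def step_response(v):
    """Running totals of the impulse response (assumed step-function definition)."""
    total, out = 0.0, []
    for x in v:
        total += x
        out.append(total)
    return out


v = impulse_response([2.0], [1.0, -0.5], b=2, d=0, m=5)
print(v)
print(step_response(v))
```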
For an ARIMA model you can combine into one generalized autoregressive operator all the differencing operators, the non-seasonal autoregressive operators, and the seasonal autoregressive operators. The non-seasonal and seasonal moving-average operators may similarly be combined. This expanded model can be printed using the expansion setting of PRINT and saved using the EXPANSION parameter. It can be used to help you understand a series. But you might also want to re-estimate the parameters in the expanded model, to test whether the differencing operators or seasonal factors unnecessarily constrain the structure of the original model. If you have not previously defined one of the identifiers supplied by the EXPANSION parameter, Genstat will automatically define it to be a TSM, and its component variates will be set up to have the length defined by the corresponding model in the TSM parameter. The expansion does not change the transformation parameter of the model, nor the constant term, nor the innovation variance. If the model that you have supplied contains non-zero differencing orders, then the generalized model does not satisfy the stationarity constraint on the parameters; neither does the constant term have the same interpretation as it had in the supplied model. The expansion of transfer-function models exactly parallels that of ARIMA models.
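Expanding a model amounts to multiplying its operator polynomials together. The Python sketch below is illustrative only and uses hypothetical helper names.

It expands a seasonal model with period 12 that has:
- one non-seasonal and one seasonal difference;
- first-order non-seasonal and seasonal moving-average operators.

The coefficient values 0.4 and 0.6 are made up for the example. The expanded autoregressive side includes the differencing, so it has a unit root, which is why the text notes that it does not satisfy the stationarity constraint.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def seasonal_operator(coeffs, s):
    """Rewrite an operator 1 + c1 B + c2 B^2 + ... as the same operator in B^s."""
    out = [0.0] * ((len(coeffs) - 1) * s + 1)
    for k, c in enumerate(coeffs):
        out[k * s] = c
    return out


def expand(*operators):
    """Combine several operators into one generalized operator."""
    result = [1.0]
    for op in operators:
        result = poly_mul(result, op)
    return result


# (1 - B)(1 - B^12) and (1 - 0.4B)(1 - 0.6B^12)
ar_side = expand([1.0, -1.0], seasonal_operator([1.0, -1.0], 12))
ma_side = expand([1.0, -0.4], seasonal_operator([1.0, -0.6], 12))
print({k: c for k, c in enumerate(ar_side) if c != 0.0})
print({k: round(c, 4) for k, c in enumerate(ma_side) if c != 0.0})
```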
UNITS directive
Defines an auxiliary vector of labels and/or the length of any vector whose length is not defined when a statement needing it is executed.
Option
NVALUES = scalar Default length for vectors
Parameter
variate or text Vector of labels
Description
The UNITS directive can be used to define a default length, which will then be used, if necessary, for any new vectors encountered later in the job. For example, in the statements
UNITS [NVALUES=20]
TEXT Subject
VARIATE [VALUES=0,1,2,4,8] Dlev
FACTOR [LEVELS=Dlev] Drug
VARIATE Age,Response; DECIMALS=0,2
the text Subject, the factor Drug, and the variates Age and Response are all defined to be of length 20. However, the length of the variate Dlev is not taken from this default. It is deduced to be five from the number of values specified by the VALUES option.

The READ directive will use UNITS if values are to be read into a previously undeclared vector. So will the RESTRICT directive if you use it to restrict a structure that has not yet been declared. The UNITS setting is also used by the CALCULATE directive with the EXPAND and URAND functions if their secondary argument is not specified.

The parameter of the UNITS directive allows you to specify the units structure. This is a variate or a text whose values will then be used as labels for output from regression or time-series directives, provided that:
- the vectors in the analysis have the same length as the units structure;
- these vectors do not have labels associated with them already.

The length of the units structure must match the value set by the NVALUES option if both are set. However, either one can be used to define the other. Thus, either
TEXT [VALUES=Sun,Mon,Tue,Wed,Thur,Fri,Sat] Day
UNITS Day
or
TEXT Day
UNITS [NVALUES=7] Day
would specify the default length for vectors to be seven. In the second example this default would be applied to Day too. Its seven values would, however, need to be read or defined in some other way before it could be used for labelling. If the type of the units structure has not been declared, UNITS will define it as a variate.

You can cancel the effect of a UNITS statement by
UNITS [NVALUES=*]
This means that statements that require a units structure will fail, which is the situation at the start of each job in a program. Similarly, the statement
UNITS *
cancels any reference to a units structure, but retains the default length if that has already been defined.
VARIATE directive
Declares one or more variate data structures.
Options
NVALUES = scalar or vector Number of units, or vector of labels; default * takes the setting from the preceding UNITS statement, if any
VALUES = numbers Values for all the variates; default *
MODIFY = string Whether to modify (instead of redefining) existing structures (yes, no); default no
Parameters
IDENTIFIER = identifiers Identifiers of the variates
VALUES = identifiers Values for each variate
DECIMALS = scalars Number of decimal places for output
EXTRA = texts Extra text associated with each identifier
MINIMUM = scalars Minimum value for the contents of each structure
MAXIMUM = scalars Maximum value for the contents of each structure
Description
The variate is probably the structure that you will use most often in Genstat. You can think of this as being just a list of numbers - a vector, in mathematical language. Variates occur for example as the response and explanatory variables in regression, as covariates and y-variables in analysis of variance, and can be used to form the matrices of correlations, similarities, or sums of squares and products required for multivariate analyses.
The IDENTIFIER parameter lists the variates that are to be declared. Values can be assigned by either the VALUES option or the VALUES parameter. The option defines a common set of values for all the variates in the declaration, while the parameter allows each variate to be given different values. If both the option and the parameter are specified, the parameter takes precedence.

The NVALUES option allows the number of values in the variates to be defined. If this is not set, the lengths of the variates are defined from the numbers of values supplied by the VALUES option or parameter. If these too are unset, Genstat takes the length specified by the preceding UNITS statement, if any.

The DECIMALS parameter allows you to define a number of decimal places to be used by default when each variate is printed. You can associate a text of extra annotation with each variate using the EXTRA parameter. The MINIMUM and MAXIMUM parameters allow you to define lower and upper limits on the values in each variate. Genstat then prints warnings if any values outside that range are allocated to the variate.

If the MODIFY option is set to yes, any existing attributes and values of the variates are retained (if still appropriate); otherwise these are lost.
VCOMPONENTS directive
Defines the variance-components model for REML.
Options
FIXED = formula Fixed effects; default *
ABSORB = factor Defines the absorbing factor; default * i.e. none
CONSTANT = string How to treat the constant term (estimate, omit); default esti
FACTORIAL = scalar Limit on the number of factors or covariates in each fixed term; default 3
CADJUST = string What adjustment to make to covariates before analysis (mean, none); default mean
RELATIONSHIP = matrix Defines relationships constraining the values of the components; default *
SPLINE = formula Defines random cubic spline terms to be generated: each term must contain only one variate; if there is more than one factor in a term, separate splines are calculated for each combination of levels of the factors
SPLESTIMATES = string Whether to use common or separate smoothing parameters for splines corresponding to different factor levels within a term (common, separate); default comm
SPLDEVIATIONS = string Whether to estimate the deviations between the spline fit and the fitting of a separate effect for each distinct value of the variate (possible only when there are replicate observations for the values of the variate) (no, yes); default no
Parameters
RANDOM = formula Random effects
INITIAL = scalars Initial values for each component
CONSTRAINTS = strings How to constrain each component (none, positive, fixrelative, fixabsolute); default none
Description
The VCOMPONENTS directive specifies the linear mixed model to be fitted by subsequent REML statements.

Random effects are used to describe the effects of factors where the values present in the experiment represent a random selection of values from some large homogeneous population. Inference about this population can then be made, for example estimation of its variance. Predictions of random effects may also be of interest. Fixed effects are used to describe treatments imposed in an experiment where the effects of those specific choices of treatment are of interest.
For example, consider a split-plot experiment used to assess the effects on yield of three oat varieties with four levels of nitrogen application. Here specific levels of nitrogen application have been used and the aim is to estimate the effects of these levels, so they would be considered as fixed effects in the model, as would the three oat varieties. The effects of the actual blocks and plots in the experiment are not of interest in themselves. However, they provide a means of estimating the variability of the more general population of blocks and plots. This gives an estimate of background variation against which to compare the fixed effects. Blocks and plots would therefore be defined as random effects. In this case, the fixed effects correspond to the treatment terms in ANOVA, and the random effects correspond to the blocking factors in ANOVA.

In general, both the fixed and random parts of the model are constructed from several factors or variates. The structure of both parts is specified using model formulae, which can contain both factors and variates with the usual adding, crossing or nesting operators.
The fixed terms in the model are defined by a model formula supplied using the FIXED option, and the random model terms are defined by a model formula supplied by the RANDOM parameter. Thus, for example, the model for the split-plot experiment described above would be specified by
VCOMPONENTS [FIXED=Nitrogen*Variety] \
RANDOM=Block/Wplot/Subplot
where Nitrogen and Variety are factors indicating the treatments applied to each unit. Block, Wplot, and Subplot are factors indicating the block, whole-plot (within block) and subplot (within whole-plot) to which each unit belongs.

The default fixed model consists of just the constant term, which then becomes the grand mean. The constant term can be omitted by setting option CONSTANT=omit, provided a fixed model has been specified. If the random model is unset, only a single source of variation (the residual component called *units*) is used.

When covariates are included in the fixed or random models, by default they are automatically centred before analysis. However, you can set option CADJUST=none to specify that the uncentred covariates are to be used instead.

The FACTORIAL option sets a limit on the number of factors and variates allowed in each fixed term; any term containing more than that number is deleted from the model.

The SPLINE option can be used to generate cubic spline terms to be fitted as part of the random model. The smoothing parameter is estimated by REML, and the fitted spline is interpreted as a BLUP (best linear unbiased predictor).
- If a term consists of a single variate, for example SPLINE=X, a cubic spline will be generated using all distinct covariate values present as knots, with weighting for replicate points.
- If factors are included, for example SPLINE=N.V.X where N and V are factors, separate cubic splines will be generated from the distinct covariate values present at each level of the combined set of factors. In this case, a common smoothing parameter will be fitted to all levels unless option SPLESTIMATES=separate is used to generate a separate smoothing parameter for each factor level.
- When replicate observations are available at each covariate value, option SPLDEVIATIONS can be used to fit a separate effect at each knot point (i.e. a saturated model). This can be used to assess the fit of the spline.

For random terms, initial values for the ratios of the variance components to the error variance (the gamma ratios) are supplied using the INITIAL parameter. You can impose constraints on the variance components using either the RELATIONSHIP option or the CONSTRAINTS parameter. By default, all the gamma ratios have initial values of one. The CONSTRAINTS parameter can request that any variance component should be held positive or fixed at its initial value. The default setting, none, allows the variance components to become negative, provided the overall estimated variance-covariance matrix for the data remains positive definite. The RELATIONSHIP option can be used to define linear relationships between the variance components, for example that component A should be constrained to be twice component B. Covariance models for random terms, including unknown parameters to be estimated, can be specified using the VSTRUCTURE directive.

The ABSORB option allows you to specify a factor from either the fixed or the random model to act as an absorbing factor for the model. The absorbing factor is ignored for the AI algorithm with sparse matrix methods, that is, either when VSTRUCTURE is used to define covariance models or when the REML option METHOD=AI is set.

The absorbing factor divides the model terms into two groups. This partition is then used during the fitting process to reduce the size of the matrices that have to be inverted and stored, so an absorbing factor can save computing time and data space. Exactly the same model is fitted when an absorbing factor is used, but some of the standard errors are unavailable. A good choice of absorbing factor might be a factor with a large number of levels, or any factor whose effects and standard errors are not of interest.
VDISPLAY directive
Displays further output from a REML analysis.
Options
CHANNEL = identifier Channel number of file, or identifier of a text to store output; default current output file
PTERMS = formula Terms (fixed or random) for which effects or means are to be printed; default * implies all the fixed terms
PSE = string Standard errors to be printed with tables of effects and means (differences, estimates, alldifferences, allestimates, none); default diff
Parameter
pointers Save structure containing the details of each analysis; default is to take the save structure from the latest REML statement
Description
The VDISPLAY directive allows further output to be produced from one or more REML analyses without having to repeat all the calculations.

Information from a REML analysis can be stored using the SAVE parameter of the REML statement, for use in the SAVE parameter of VDISPLAY. Several save structures can be specified, corresponding to the analyses of several different variates. By default, the save structure for the last y-variate analysed is saved automatically and used by VDISPLAY.

The options of VDISPLAY are the same as those that control output from REML (PRINT, PTERMS and PSE), plus the CHANNEL option. CHANNEL allows output to be directed to another output channel or into a text structure. The available settings of PRINT are identical to those in REML. For example, the commands
VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot
REML [PRINT=model,wald,components] Yield
VDISPLAY [PRINT=effects]
print the effects for the fixed terms after the analysis, without having to re-run the algorithm.
VKEEP directive
Copies information from a REML analysis into Genstat data structures.
Options
RESIDUALS = variate Residuals from the analysis
FITTEDVALUES = variate Fitted values from the analysis
SIGMA2 = scalar Variance component for the lowest stratum
VCOVARIANCE = symmetric matrix Variance-covariance matrix for the estimates of the variance components
FULLVCOVARIANCE = symmetric matrix Variance-covariance matrix for the full set of fixed and random effects not associated with the absorbing factor
DEVIANCE = scalar Residual deviance from fitting the full fixed model
DF = scalar Residual degrees of freedom after fitting the full fixed model
SUBDEVIANCE = scalar Residual deviance after fitting the sub-model of the fixed model
SUBDF = scalar Residual degrees of freedom after fitting the sub-model of the fixed model
RSS = scalar Residual sum of squares from fitting the FIXED model by generalized least squares with a covariance matrix derived from the estimated variance components
RMETHOD = string Which random terms to use when calculating RESIDUALS (final, all, notspline); default uses the setting from the REML statement
SAVE = pointer Save structure from the required analysis; default * takes the save structure from the latest REML statement
Parameters
TERMS = formula Terms for which information is to be saved
COMPONENTS = scalars Estimated variance components
MEANS = tables Table of predicted means for each term
SEDMEANS = symmetric matrices Standard errors of differences between the predicted means
VARMEANS = symmetric matrices Variance-covariance matrix of the means
EFFECTS = tables Table of estimated regression coefficients for each term
SEDEFFECTS = symmetric matrices Standard errors of differences between the estimated parameters of each term
VAREFFECTS = symmetric matrices Variance-covariance matrix of the effects of a term
Description
The VKEEP directive is used to copy results from a REML analysis into Genstat data structures. Genstat automatically stores the save structure for the last y-variate that was analysed using REML, and by default this save structure provides the information for VKEEP. Alternatively, you can save the information from a REML analysis in a save structure using the SAVE parameter of the REML directive. You can then access the information by specifying the same structure in the SAVE option of VKEEP.

Overall information from the analysis is saved using the options of VKEEP, while the parameters are used to save information for specific model terms. The terms (fixed, random or a mixture) for which you require information are defined by a formula using the TERMS parameter. The other parameters can then be used to specify structures for saving information for each of the model terms.

The options save the following overall information:
- RESIDUALS and FITTEDVALUES specify variates to hold the residuals and fitted values. These are defined according to the setting of the RMETHOD option, as for the REML directive.
- SIGMA2 stores the residual variance in a scalar.
- VCOVARIANCE can supply a symmetric matrix to save the variance-covariance matrix for the estimates of the variance components.
- FULLVCOVARIANCE stores the variance-covariance matrix for the full set of fixed and random effects, excluding those in the absorbing factor model. This matrix will often be very large. It is useful only for looking at covariances between effects associated with different model terms, since the variance-covariance matrices for individual model terms can be stored using the VAREFFECTS parameter.
- DEVIANCE and SUBDEVIANCE save the residual deviance from fitting the full fixed model or the submodel respectively, and DF and SUBDF save the associated residual degrees of freedom.
- RSS saves the residual sum of squares from fitting the fixed model by generalized least squares.

For example, after a REML analysis, to save the residuals and fitted values into variates called Res and Fit respectively, you can use the command
VKEEP [RESIDUALS=Res; FITTED=Fit]
The formula given in the TERMS parameter is expanded to give a series of model terms, and the other parameters of VKEEP are taken in parallel with these terms. The string 'Constant' can be used within the formula to save structures associated with the constant term. The remaining parameters save the following for each term:
- COMPONENTS: the estimated variance component for each random term in the TERMS list.
- MEANS: tables of means.
- SEDMEANS: standard errors of differences between the means.
- VARMEANS: the estimated variance-covariance matrix for the means.
- EFFECTS: tables of estimated parameters.
- SEDEFFECTS: a symmetric matrix of the standard errors of differences between the effects.
- VAREFFECTS: the estimated variance-covariance matrix for the parameters.

For example, to save tables of means and their variance-covariance matrices for terms A and B fitted in a REML analysis, you can use the command
VKEEP A+B; MEANS=MeanA,MeanB; VARMEANS=VarmeanA,VarmeanB
Then MeanA and MeanB will be tables containing predicted means for factors A and B, and VarmeanA and VarmeanB will be symmetric matrices containing the variances and covariances between the table cells.
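The structures saved by SEDMEANS and VARMEANS are linked by the standard identity var(a - b) = var(a) + var(b) - 2 cov(a, b). The short Python sketch below derives a matrix of standard errors of differences from a variance-covariance matrix of means, such as the one VARMEANS saves. It is illustrative only, not a Genstat facility: the function name is hypothetical and the matrix values are made up.

```python
import math


def sed_matrix(vcov):
    """Standard errors of differences from a variance-covariance matrix.

    sed[i][j] = sqrt(var_i + var_j - 2 cov_ij); max() guards against tiny
    negative values caused by rounding. Diagonal entries come out as zero.
    """
    n = len(vcov)
    return [[math.sqrt(max(vcov[i][i] + vcov[j][j] - 2.0 * vcov[i][j], 0.0))
             for j in range(n)] for i in range(n)]


# Made-up variance-covariance matrix for three predicted means
varmeans = [[4.0, 1.0, 0.0],
            [1.0, 2.0, 0.5],
            [0.0, 0.5, 3.0]]
for row in sed_matrix(varmeans):
    print([round(x, 4) for x in row])
```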
VPEDIGREE directive
Generates an inverse relationship matrix for use when fitting animal or plant breeding models by REML.
Options
SEX = string Possible sex categories of parents (fixed, either); default fixe
UNKNOWN = scalar Value to be treated as unknown
Parameters
INDIVIDUALS = factors Individuals on which data has been measured
MALEPARENTS = factors Male parents of the progeny
FEMALEPARENTS = factors Female parents of the progeny
INVERSE = pointer Inverse relationship matrix in sparse matrix form
POPULATION = variates Full list of identifiers generated from the individuals and parents
Description
VPEDIGREE is used to generate a sparse inverse relationship matrix for use when fitting animal (or plant) breeding models by REML. The algorithm requires three parallel factors as input. The numerical levels of these factors must give:
- the identifiers for the individuals from which data is available (INDIVIDUALS);
- the identifiers for the male and female parents of each individual (MALEPARENTS and FEMALEPARENTS).

An individual may appear as both progeny and a parent, for example when data has been taken from several generations. Conversely, if an identifier appears in more than one list, it is assumed to refer to a single individual. Also, the algorithm does not take account of labels. Where textual labels are used, the labels vectors of the three factors should be identical in order to generate matching levels vectors and thus avoid errors. A complete list of all individuals in the three factors is compiled and can be saved using the POPULATION parameter. On output, the three factors will be redefined with this list as their levels vector.

The inverse relationship matrix that is generated is held in a special sparse matrix form (that is, only non-zero values are stored), using a pointer. This is usable in the VSTRUCTURE directive but not, currently, elsewhere in Genstat. The second element of the pointer is a variate storing the non-zero values of the inverse matrix in lower-triangular order. The first element of the pointer is an integer index vector. This vector is not a standard Genstat data structure, and so cannot be used except by VSTRUCTURE.

By default, it is assumed that an individual can act as either a male or a female parent but not both. Option SEX=either can be used to specify that individuals can act as both male and female parents. This may be useful, for example, in plant breeding analyses.

Missing values in any of the factors will be treated as coding for unknown individuals. Option UNKNOWN allows you to specify an additional scalar value used to represent unknown individuals.
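The Python sketch below shows what such an inverse contains. It builds an inverse relationship matrix with Henderson's well-known rules for the case where inbreeding is ignored, and returns its non-zero lower-triangular entries in row order. It is an illustration under stated assumptions, not Genstat's algorithm:
- Inbreeding is ignored.
- The two parents of an individual are assumed to be distinct, and each individual is listed at most once as progeny.
- None marks an unknown parent.
- Genstat's pointer and index-vector storage is not reproduced.

The function name is hypothetical.

```python
def inverse_relationship(pedigree):
    """Henderson's rules for the inverse relationship matrix, ignoring inbreeding.

    pedigree: list of (individual, male_parent, female_parent) tuples, with
    None for an unknown parent; identifiers must be mutually comparable.
    Returns (population, entries): population is the sorted list of all
    identifiers, and entries lists (row, col, value) for the non-zero
    lower-triangular elements in row order, indexing into population.
    """
    population = sorted({x for rec in pedigree for x in rec if x is not None})
    index = {ident: k for k, ident in enumerate(population)}
    lower = {}

    def add(i, j, value):
        key = (max(i, j), min(i, j))  # keep only the lower triangle
        lower[key] = lower.get(key, 0.0) + value

    listed = {rec[0] for rec in pedigree}
    # Identifiers that appear only as parents are treated as founders
    founders = [(p, None, None) for p in population if p not in listed]
    for ind, male, female in list(pedigree) + founders:
        i = index[ind]
        parents = [index[p] for p in (male, female) if p is not None]
        if len(parents) == 2:
            s, d = parents
            add(i, i, 2.0)
            add(i, s, -1.0)
            add(i, d, -1.0)
            add(s, s, 0.5)
            add(d, d, 0.5)
            add(s, d, 0.5)
        elif len(parents) == 1:
            p = parents[0]
            add(i, i, 4.0 / 3.0)
            add(i, p, -2.0 / 3.0)
            add(p, p, 1.0 / 3.0)
        else:
            add(i, i, 1.0)
    entries = [(r, c, v) for (r, c), v in sorted(lower.items()) if v != 0.0]
    return population, entries


# Individual 3 with unrelated, unknown-parent parents 1 and 2
population, entries = inverse_relationship([(3, 1, 2)])
print(population)
print(entries)
```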
VSTATUS directive
Prints the current model settings for REML.
Option
No parameters
Description
VSTATUS can be used to print, and hence check, the fixed and random models and covariance structures set up by the VCOMPONENTS and VSTRUCTURE directives, before using REML to run an analysis.
VSTRUCTURE directive
Defines a variance structure for random effects in a REML model.
Options
TERMS = formula Model terms for which the covariance structure is to be defined
FORMATION = string Whether the structure is formed by direct product construction or by definition of the whole matrix (direct, whole); default dire
CORRELATE = string Whether to impose correlation across the model terms if several are specified (none, positivedefinite, unrestricted); default none
CINITIAL = scalars Initial values for covariance matrix across terms
EQUAL = factors Factors whose parameters are to be equal across the model terms
COORDINATES = identifiers Coordinates of the data points to be used in calculating distance-based models (list of variates or matrix)
Parameters
MODEL = strings Type of covariance model associated with the term(s), or with individual factors in the term(s) if FORMATION=direct (identity, fixed, AR, MA, ARMA, power, banded, antedependence, unstructured, diagonal, uniform); default iden
ORDER = scalar Order of model
HETEROGENEITY = string Heterogeneity for correlation matrices (none, outside); default none
METRIC = string How to calculate distances when MODEL=power (cityblock, squared, euclidean); default city
FACTOR = factors Factors over which to form direct products
MATRIX = identifiers To define matrix values for a term or the factors when MODEL=fixed
INVERSE = identifiers To define values for matrix inverses (instead of the fixed matrices themselves) when MODEL=fixed
INITIAL = variates Initial parameter values for each correlation matrix
CONSTRAINTS = texts Texts containing strings none, fix or positive to define constraints for the parameters in each model
Description
VSTRUCTURE can be used to define the form of covariance structure for any term in the random model defined for REML by VCOMPONENTS. By default, the effects for each random term are assumed to be independent with common variance $\sigma_j^2$ for term $j$; that is, the random term has covariance matrix $\sigma_j^2 I$. VSTRUCTURE is used to:
- define correlation between random effects within terms;
- allow a changing variance within a term;
- define correlations between different random terms.

These models are particularly useful when fitting linear models to repeated measurements or spatial data, and for random coefficient regression.

VSTRUCTURE can only be used after VCOMPONENTS has been used to define the fixed and random models. It can be used more than once to define different structures for different random terms. The information is accumulated within Genstat, and it will all be used by subsequent REML commands. You can check the model and covariance structures defined at any time by using the VSTATUS directive. To cancel a covariance structure for a term, you simply need to use VSTRUCTURE to change the model back to the default $\sigma_j^2 I$. To cancel all covariance structures, you should give a new VCOMPONENTS command.

For a random term constructed from more than one factor, the covariance matrix can be formed either as a single matrix for the whole term, or as the direct product of several matrices corresponding to the factors. Consider an analysis of repeated measurements where data has been taken weekly from each subject, and one of several different treatments has been applied to each subject. Data taken from the same subject are likely to be correlated, with correlation decreasing over time, but the subjects themselves are likely to be independent. This corresponds to an $I \otimes C$ covariance structure. The identity matrix $I$ corresponds to the independent subjects, and the covariance matrix $C$ corresponds to the correlated measurements over time within subjects. If we take $C$ to be an auto-regressive process of order 1, this can be defined and fitted as follows:
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week
VSTRUCTURE [TERM=Subject.Week] MODEL=I,AR; ORDER=1; \
FACTOR=Subject,Week
REML Y
The TERMS option (abbreviated to TERM in these statements) is used to specify the term to which the covariance structure is to be applied. For each factor in the term you can then specify the covariance model to be applied (see below for the list of available models). However, it is not necessary to specify factors for which the default identity model is required, so the following is an equivalent specification:
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week
VSTRUCTURE [TERM=Subject.Week] MODEL=AR; ORDER=1; FACTOR=Week
To cancel the covariance structure for the term, a null setting is sufficient:
VSTRUCTURE [TERM=Subject.Week]
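A small numerical example makes the direct-product construction concrete. The Python sketch below forms $I \otimes C$ for two independent subjects, each measured in three weeks, with $C$ an order-1 auto-regressive correlation matrix. It is illustrative only and its helper names are hypothetical. It assumes the units are ordered by subject, with weeks varying fastest within each subject, and it uses an arbitrary parameter value of 0.5.

```python
def ar1_corr(n, theta):
    """Order-1 auto-regressive correlation matrix: C[i][j] = theta ** |i - j|."""
    return [[theta ** abs(i - j) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Direct (Kronecker) product of two matrices stored as nested lists."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


# Two subjects x three weeks: correlated within a subject, zero across subjects
g = kron(identity(2), ar1_corr(3, 0.5))
for row in g:
    print(row)
```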
It is instructive to compare the auto-regressive model fitted above with the standard ANOVA analysis:
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject/Week
REML Y
Although the covariance structure for each term here is of the form $G_j = I$, the variance matrix for the data is of the form
$V = \sigma^2 \left( \sum_j \gamma_j Z_j G_j Z_j' + I \right)$
where $Z_j$ is the design matrix of random term $j$ and $\gamma_j$ is its gamma ratio. In this case the random subject term generates correlations that are equal across all the times within subjects. Including a random term in the model generates uniform correlations between units with the same values of the random factor(s). It is therefore usually necessary to exclude these terms when the aim is to model the correlations explicitly.
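With a single random subject term and $G = I$, the formula above gives every pair of distinct units from the same subject the same correlation, $\gamma / (1 + \gamma)$. The pure-Python sketch below builds $V$ for five units in two subjects and checks this. It is illustrative only; the function names and data are hypothetical.

```python
def variance_matrix(groups, gamma, sigma2=1.0):
    """V = sigma2 * (gamma * Z Z' + I) for one random factor with G = I.

    groups gives the level of the random factor (e.g. subject) for each unit.
    """
    levels = sorted(set(groups))
    # design matrix Z: one indicator column per level of the random factor
    z = [[1.0 if g == lev else 0.0 for lev in levels] for g in groups]
    n = len(groups)
    v = []
    for i in range(n):
        row = []
        for j in range(n):
            zzt = sum(z[i][k] * z[j][k] for k in range(len(levels)))  # (Z Z')_ij
            row.append(sigma2 * (gamma * zzt + (1.0 if i == j else 0.0)))
        v.append(row)
    return v


def correlation(v, i, j):
    return v[i][j] / (v[i][i] * v[j][j]) ** 0.5


v = variance_matrix(["s1", "s1", "s1", "s2", "s2"], gamma=1.0, sigma2=2.0)
# Same subject: gamma / (1 + gamma); different subjects: zero
print(correlation(v, 0, 1), correlation(v, 0, 2), correlation(v, 0, 3))
```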
The possible settings for the MODEL parameter, which generate symmetric covariance matrices $C$ ($C_{i,j} = C_{j,i}$ for all $i, j$), are listed below. Where more than one model order can be used, the order can be changed by using the ORDER parameter. For the AR, MA, ARMA, power and banded models, the order is the same as the number of parameters to be fitted. For the banded, correlation, antedependence and unstructured models, the order is the number of non-zero off-diagonal bands in the matrix.
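identity: identity matrix; $C_{i,i} = 1$, $C_{i,j} = 0$ for $i \neq j$.
fixed: fixed matrix; $C_{i,j}$ specified.
AR: auto-regressive, order 1 or 2; $C_{i,i} = 1$, $C_{i+1,i} = \theta_1 / (1 - \theta_2)$ ($\theta_2 = 0$ for order 1), $C_{i,j} = \theta_1 C_{i-1,j} + \theta_2 C_{i-2,j}$ for $i > j+1$; $-1 < \theta_1, \theta_2 < 1$, $|\theta_1 \pm \theta_2| < 1$.
MA: moving average, order 1 or 2; $C_{i,i} = 1$, $C_{i+1,i} = -\phi_1 (1 - \phi_2) / (1 + \phi_1^2 + \phi_2^2)$ ($\phi_2 = 0$ for order 1), $C_{i+2,i} = -\phi_2 / (1 + \phi_1^2 + \phi_2^2)$, $C_{i,j} = 0$ for $i > j+2$; $-1 < \phi_1, \phi_2 < 1$, $\phi_2 \pm \phi_1 < 1$.
ARMA: auto-regressive moving-average, order 1; $C_{i,i} = 1$, $C_{i+1,i} = (\theta - \phi)(1 - \theta\phi) / (1 + \phi^2 - 2\theta\phi)$, $C_{i,j} = \theta C_{i-1,j}$ for $i > j+1$; $-1 < \theta, \phi < 1$.
power: based on distance, order 1 or 2; $C_{i,i} = 1$, $C_{i,j} = \theta_1^{d_1} \theta_2^{d_2}$ ($\theta_1 = \theta_2$ for order 1), where $d_1$ and $d_2$ are the distances in the 1st and 2nd dimensions; $0 < \theta_1, \theta_2 < 1$.
banded: equal bands, $1 \le$ order $\le$ nrows$-1$; $C_{i,i} = 1$, $C_{i+k,i} = \phi_k$ for $1 \le k \le$ order, $C_{i+k,i} = 0$ otherwise; $-1 < \phi_k < 1$.
correlation: general correlation matrix, $1 \le$ order $\le$ nrows$-1$; $C_{i,i} = 1$, $C_{i,j} = \phi_{ij}$ for $1 \le |i-j| \le$ order, $C_{i,j} = 0$ for $|i-j| >$ order; $-1 < \phi_{ij} < 1$.
uniform: uniform matrix; $C_{i,j} = \phi$ for all $i, j$.
diagonal: diagonal matrix; $C_{i,i} = \phi_i$, $C_{i,j} = 0$ for $i \neq j$.
antedependence: antedependence model, $1 \le$ order $\le$ nrows$-1$; $C^{-1} = U D U'$, where $D_{i,i} = d_i$, $D_{i,j} = 0$ for $i \neq j$, $U_{i,i} = 1$, $U_{i,j} = u_{ij}$ for $1 \le j - i \le$ order, and $U_{i,j} = 0$ for $i > j$.
unstructured: general covariance matrix, $1 \le$ order $\le$ nrows$-1$; $C_{i,j} = \phi_{ij}$ for $0 \le |i-j| \le$ order, $C_{i,j} = 0$ for $|i-j| >$ order.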
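To make the AR and MA definitions in the table concrete, the following plain-Python sketch (not Genstat code; the function names are hypothetical) builds the correlation matrices from the recursions listed above. For order 1 the AR recursion reduces to $C_{i,j} = \theta_1^{|i-j|}$.

```python
def ar_correlation(n, theta1, theta2=0.0):
    """AR(1) or AR(2) correlation matrix from the recursion in the table.

    C[i][i] = 1, C[i+1][i] = theta1 / (1 - theta2), and
    C[i][j] = theta1 * C[i-1][j] + theta2 * C[i-2][j] for i > j + 1.
    Use theta2 = 0 for order 1.
    """
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        c[j][j] = 1.0
        if j + 1 < n:
            c[j + 1][j] = theta1 / (1.0 - theta2)
        for i in range(j + 2, n):
            c[i][j] = theta1 * c[i - 1][j] + theta2 * c[i - 2][j]
    # C is symmetric: copy the lower triangle into the upper triangle.
    for i in range(n):
        for j in range(i):
            c[j][i] = c[i][j]
    return c


def ma_correlation(n, phi1, phi2=0.0):
    """MA(1) or MA(2) correlation matrix; zero beyond lag 2."""
    denom = 1.0 + phi1 ** 2 + phi2 ** 2
    lag = {0: 1.0, 1: -phi1 * (1.0 - phi2) / denom, 2: -phi2 / denom}
    return [[lag.get(abs(i - j), 0.0) for j in range(n)] for i in range(n)]


C = ar_correlation(4, 0.6)
print([round(x, 3) for x in C[3]])  # -> [0.216, 0.36, 0.6, 1.0]
print(ma_correlation(3, 0.5)[1])    # -> [-0.4, 1.0, -0.4]
```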
Initial parameter values can be specified using the INITIAL parameter. For most models the number of initial values required is the number of parameters, and default values are generated if none are supplied. However, for the antedependence and unstructured models a full covariance matrix of initial values must be given, and for the correlation model a full correlation matrix must be provided. Realistic starting values are required here because the algorithm may not converge when many parameters are fitted from poor starting values. Initial values might be generated from covariance matrices estimated by fitting simpler models, or from the residuals of a null variance model. A missing value in the initial values is taken to mean that the value is inestimable, and it will be fixed at a small value for the analysis.
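As a sketch of the last suggestion, a full covariance matrix of initial values could be estimated from residuals arranged with one row per subject and one column per time point. The layout, the example residuals and the function names are hypothetical, and the result is listed in lower-triangular row order on the assumption that this is how the values are to be supplied.

```python
def covariance_by_time(resid):
    """Sample covariance matrix of residuals between time points.

    resid[s][t] is the residual for subject s at time t; complete data
    (no missing values) is assumed.
    """
    m = len(resid)     # number of subjects
    t = len(resid[0])  # number of time points
    means = [sum(r[k] for r in resid) / m for k in range(t)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in resid) / (m - 1)
         for b in range(t)]
        for a in range(t)
    ]


def lower_triangle(mat):
    """Flatten a symmetric matrix row by row, keeping the lower triangle."""
    return [mat[i][j] for i in range(len(mat)) for j in range(i + 1)]


# Hypothetical residuals: three subjects by three time points.
resid = [
    [1.0, 0.5, 0.0],
    [-1.0, -0.5, 0.5],
    [0.0, 0.0, -0.5],
]
print(lower_triangle(covariance_by_time(resid)))
# -> [1.0, 0.5, 0.25, -0.25, -0.125, 0.25]
```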
For the models defined in terms of correlation matrices, that is, the AR, MA, ARMA, power, banded and correlation models, it may sometimes be desirable to allow for unequal variances. This can be done by setting option HETEROGENEITY=outside, so that a diagonal matrix D of standard errors is applied to the correlation matrix C to generate the matrix $DCD'$. In this case, a number of extra parameters (equal to the number of effects in the factor or term) should be added to the vector of initial values. These models allow a structured correlation pattern to be combined with changing variances, and are particularly useful in the analysis of repeated measurements data when the variance increases over time. For example, to allow for changing variance over time in our example above, we can specify
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week
VSTRUCTURE [TERM=Subject.Week] MODEL=AR; ORDER=1;\
FACTOR=Week; HETEROGENEITY=outside
REML Y
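The effect of HETEROGENEITY=outside can be illustrated numerically. This plain-Python sketch (hypothetical names and standard deviations) scales an AR(1) correlation matrix by $D = \mathrm{diag}(s_1, \dots, s_n)$ to form $DCD'$, so that entry $(i, j)$ becomes $s_i C_{i,j} s_j$. It shows the algebra only, not how Genstat parameterises the extra terms internally.

```python
def ar1_correlation(n, theta):
    """AR(1) correlation matrix: C[i][j] = theta ** |i - j|."""
    return [[theta ** abs(i - j) for j in range(n)] for i in range(n)]


def heterogeneous(corr, sd):
    """Return D C D' with D = diag(sd): entry (i, j) is sd[i] * C[i][j] * sd[j]."""
    n = len(corr)
    return [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]


# Hypothetical standard deviations increasing over four weeks,
# one extra parameter per level of Week.
sd = [1.0, 1.5, 2.0, 2.5]
V = heterogeneous(ar1_correlation(4, 0.5), sd)
print([V[i][i] for i in range(4)])  # variances -> [1.0, 2.25, 4.0, 6.25]
print(V[1][0])                      # 1.5 * 0.5 * 1.0 -> 0.75
```

The correlation between weeks is still the AR(1) pattern, since $V_{i,j} / (s_i s_j) = C_{i,j}$; only the variances change.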
In some circumstances, you may wish to define a single model to apply to the whole term, instead of using the direct product form illustrated above. In this case, you should set option FORM=whole. Note that, when a term consists of a single factor, it is not necessary to set the FACTOR option.
When MODEL=fixed is used, you must either give the values of the covariance matrix C using option MATRIX, or give the inverse matrix using option INVERSE. Values for the matrix or its inverse can be supplied as diagonal matrices or symmetric matrices. In addition, values for the inverse matrix can be supplied in sparse form as a pointer. The output from VPEDIGREE is designed for input here, but you can also define the inverse matrix explicitly. The second element of the pointer should then be a variate containing the non-zero values of the inverse in lower triangular order. The first element should be a factor, with number of levels equal to the number of rows n(n+1)/2 of the matrix. This contains first a block of n values giving the position in the variate of the first value stored for each row, followed by a block of values for each row in turn, giving the columns in which each non-zero value appears.
When MODEL=power is used to define a distance-based model, the coordinates of the data points must be specified by the COORDINATES option, using either a list of variates or a matrix. The number of units for the coordinates must equal the number of units in the model factors and variates (and in the data). Coordinates for groups of points are taken as the mean value over the group. In the current release, only one- or two-dimensional distances can be used. The distance calculation is defined by the METRIC option. For coordinates $\{r_i, c_i\}$, the distance $d_{ij}$ between points $i$ and $j$ is defined as
$d_{ij} = (r_i - r_j)^2 + (c_i - c_j)^2$ for METRIC=squared, and
$d_{ij} = \left[ (r_i - r_j)^2 + (c_i - c_j)^2 \right]^{1/2}$ for METRIC=euclidean.
When ORDER=1 for a power model, the correlation pattern is the same in both dimensions and distances are calculated simultaneously over all dimensions. When ORDER=2, distances are calculated within each dimension separately, and in this case euclidean and city-block distances are equivalent.
The CORRELATE option allows you to specify correlations between model terms which have equal numbers of effects. A common correlation will then be fitted between parallel effects. For example, consider a random coefficient regression model where the fixed model contains a common response to covariate X, and the random model allows for deviations in the intercept and slope about this line for each subject. The random intercept and slope for each subject may be correlated, but subjects are independent. This correlation across terms is defined using the CORRELATE option as follows:
VCOMPONENTS [FIXED=X] RANDOM=SUBJECT+SUBJECT.X
VSTRUCTURE [SUBJECT+SUBJECT.X; CORRELATE=positivedef;\
CINITIAL=!(1,0.1,0.3); FORM=whole]
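The three CINITIAL values in this example can be read as a 2×2 covariance matrix for the SUBJECT and SUBJECT.X terms. The sketch below assumes they are listed in lower-triangular row order (intercept gamma, covariance, slope gamma), consistent with the diagonal elements being the gammas for the correlated terms; the function names are hypothetical. It also checks positive definiteness with a Cholesky factorisation, which is the property the positivedefinite setting maintains.

```python
import math


def from_lower_triangle(values):
    """Rebuild a symmetric matrix from values listed in lower-triangular row order."""
    k = (math.isqrt(8 * len(values) + 1) - 1) // 2  # solve k(k+1)/2 = len(values)
    if k * (k + 1) // 2 != len(values):
        raise ValueError("length is not a triangular number")
    mat = [[0.0] * k for _ in range(k)]
    it = iter(values)
    for i in range(k):
        for j in range(i + 1):
            mat[i][j] = mat[j][i] = float(next(it))
    return mat


def is_positive_definite(mat):
    """True if a Cholesky factorisation of the symmetric matrix succeeds."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


cinit = from_lower_triangle([1, 0.1, 0.3])
print(cinit)                        # -> [[1.0, 0.1], [0.1, 0.3]]
print(is_positive_definite(cinit))  # -> True
# Implied intercept-slope correlation: 0.1 / sqrt(1 * 0.3)
print(round(cinit[1][0] / math.sqrt(cinit[0][0] * cinit[1][1]), 4))  # -> 0.1826
```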
The setting positivedefinite is used to ensure that the correlation matrix between the terms remains positive definite. This constraint can be relaxed using the setting unrestricted. The model fitting is done here in terms of a covariance matrix, where the diagonal elements are the gammas for the correlated terms. The CINITIAL option is used to give initial values for this matrix. If no initial values are given, they are taken from the initial gamma values given in VCOMPONENTS when the model is declared. A missing value in the initial values is taken to mean that the value is inestimable, and it will be fixed at a value close to zero during the analysis. When correlations are declared between terms, you must set FORMATION=whole. In the random coefficient regression model above, no correlation structure is declared within terms since the subjects are independent. However, it is possible to declare correlation/covariance models within terms as usual. For example, an animal breeding model might use VPEDIGREE to set up covariances within terms as follows:
VPEDIGREE PROGENY=animal; FEMALE=dam; MALE=sire; INVERSE=Ainv
VCOMPONENTS [FIXED=Trt] RANDOM=animal+dam+env
VSTRUCTURE [animal+dam; CORRELATE=unr; FORM=whole] \
MODEL=fixed; INVERSE=Ainv
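These declarations set up random terms with covariance structures of the form cov(animal) = $\sigma_a^2 A$, cov(dam) = $\sigma_d^2 A$ and cov(animal, dam) = $\sigma_{ad} A$, where A is the matrix whose inverse, Ainv, is formed by VPEDIGREE.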
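An inverse matrix such as Ainv can also be defined explicitly in the sparse pointer form described for MODEL=fixed. The sketch below encodes a small symmetric matrix in that layout and decodes it again. It follows one reading of the description (1-based positions and column numbers, non-zero lower-triangular values stored row by row), represents the factor and the variate as plain Python lists, and uses hypothetical function names; it is not a definitive specification of the structures VPEDIGREE produces.

```python
def to_sparse(mat):
    """Encode a symmetric matrix as (index, values) in the sparse layout.

    values: non-zero lower-triangular entries, stored row by row.
    index:  n 1-based positions in `values` of each row's first stored value,
            followed by the 1-based column of every stored value.
    """
    values, starts, columns = [], [], []
    for i, row in enumerate(mat):
        starts.append(len(values) + 1)
        for j in range(i + 1):
            if row[j] != 0:
                values.append(row[j])
                columns.append(j + 1)
    return starts + columns, values


def from_sparse(index, values, n):
    """Rebuild the full symmetric n x n matrix from (index, values)."""
    starts, columns = index[:n], index[n:]
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        end = starts[i + 1] - 1 if i + 1 < n else len(values)
        for pos in range(starts[i] - 1, end):
            j = columns[pos] - 1
            mat[i][j] = mat[j][i] = values[pos]
    return mat


# Hypothetical 3 x 3 inverse matrix.
ainv = [
    [2.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 2.0],
]
index, values = to_sparse(ainv)
print(index)   # -> [1, 2, 3, 1, 2, 1, 3]
print(values)  # -> [2.0, 1.0, -1.0, 2.0]
print(from_sparse(index, values, 3) == ainv)  # -> True
```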
Direct Products
Although the direct product construction used to build the covariance structures does not generally constrain the models that can be fitted to any data set, you should be aware of the implications that arise when defining covariance structures for the residual term. The REML algorithm used by Genstat detects the residual term in the model by searching for a term whose number of levels equals the number of data values, n. When no covariance structures are specified, the first term with number of levels ≥ n is used as the residual. However, when covariance structures are defined, the form of the variance model is
$$V = \sigma^2 \left( \sum_j \gamma_j Z_j G_j Z_j' + R \right)$$
where matrix R corresponds to the residual term and has n rows. For this reason, any term found with more than n rows will not be used as the residual if it has a covariance matrix. If no valid residual term is found, a residual term will automatically be added to the model, which may result in an extra error term being fitted unintentionally. This can happen, for example, with repeated measurements data where unequal numbers of measurements have been taken on the subjects. If direct product construction is used, the matrix generated will have more rows than the data and cannot be used as R. A workaround is to insert missing values in the data set to give equal replication, and use REML option MVINCLUDE=yvariate to retain the missing values in the analysis. Alternatively, you could fix the residual component at a small value.
Note that in the repeated measurements example above, if measurements are taken at different times for each subject, the direct product structure is not appropriate. In this case, a power model may be fitted over the whole term, constraining the between-subject correlation to zero:
VSTRUCTURE [TERM=Subject.Week; FORM=whole; \
COORD=subject,week] MODEL=power; ORDER=2; \
INITIAL=!(0,0.1); CONSTRAIN=!T(Fix,None)
Note that the parameters run in the order of the coordinate vectors (which are variates, not factors).
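The power-model example above can be checked numerically. In this plain-Python sketch (function names and coordinates are hypothetical), ORDER=2 computes a separate distance in each coordinate, so $C_{i,j} = \theta_1^{d_1}\theta_2^{d_2}$ with $d_1$ the difference in subject codes and $d_2$ the difference in weeks. Fixing $\theta_1 = 0$ gives zero correlation between different subjects, because $0^{d_1} = 0$ for $d_1 > 0$ while $0^0 = 1$ within a subject. The distance helper also shows the squared and euclidean metrics for two-dimensional coordinates.

```python
import math


def distance(p, q, metric):
    """Distance between 2-D points p and q for METRIC=squared or euclidean."""
    sq = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    if metric == "squared":
        return sq
    if metric == "euclidean":
        return math.sqrt(sq)
    raise ValueError(f"unsupported metric: {metric}")


def power_order2(coords, theta1, theta2):
    """ORDER=2 power model: distances are taken within each dimension separately."""
    n = len(coords)
    return [
        [theta1 ** abs(coords[i][0] - coords[j][0])
         * theta2 ** abs(coords[i][1] - coords[j][1])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical (subject, week) coordinates with different weeks per subject.
coords = [(1, 1), (1, 2), (1, 4), (2, 1), (2, 3)]
C = power_order2(coords, theta1=0.0, theta2=0.5)
print(C[0][1], C[0][2])  # same subject -> 0.5 0.125
print(C[0][3], C[3][4])  # across subjects, within subject 2 -> 0.0 0.25
print(distance((0, 0), (3, 4), "squared"), distance((0, 0), (3, 4), "euclidean"))  # -> 25 5.0
```

Because each distance in power_order2 is one-dimensional, the absolute difference serves as both the euclidean and the city-block distance, which is why the two metrics coincide when ORDER=2.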
WORKSPACE directive
Accesses private data structures for use in procedures.
No options
Parameters
NAME
= texts Texts, each containing a single line, to give the names used to identify the private data structures
DUMMY
= identifiers Dummy structure to be used to refer to each private data structure
Description
The WORKSPACE directive is intended particularly for writers of procedures. It allows data to be accessed within a number of procedures, and in the main program if needed. You merely need to decide how to label your workspace "area". Genstat reserves a data structure for each one, and WORKSPACE allows you to link this to a dummy (of your choice) within any procedure or in the outer program itself. For example
WORKSPACE 'AUNBALANCED work'; Wspace
TEXT [VALUES=Yvar,Factopt] Wlabels
POINTER [LABELS=Wlabels] Wspace
VARIATE Wspace['Yvar']
SCALAR Wspace['Factopt']
names the area 'AUNBALANCED work' and sets the dummy Wspace to the associated data structure. The data structure is then defined to be a pointer with two values, the variate Wspace['Yvar'] and the scalar Wspace['Factopt']. A similar WORKSPACE statement can then be used later on (in another procedure) to access the same information. For example
WORKSPACE 'AUNBALANCED work'; Abwork
links the dummy Abwork to the pointer, allowing us to refer to Abwork['Yvar'] and Abwork['Factopt']. This facility is used particularly within the procedure library to link suites of associated procedures so, for safety, you should avoid starting the name of any workspace of your own with G5PL.