Genstat Command Language Reference
Genstat is a powerful statistical system. You can use it either by selecting from menus or by typing commands. This version of Genstat is designed to be used in a windowed environment, providing many of the customary features such as multiple windows and pull-down menus. If you choose to type commands, they may be in the form of any of the standard directives in the Genstat language, or of any of the standard procedures in the Genstat Procedure Library. Most commands allow you to use expressions to represent calculations or formulae, and these may include any of the functions provided in the language.
Further information is available about:
the Genstat Language -
Glossary of terminology
List of directives
List of procedures
List of functions for expressions
List of functions for formulae
Genstat faults
Commands associated with -
Program control
Data handling
Input and output
Calculations and manipulation
Graphics
Basic statistics
Regression analysis
Design and analysis of experiments
Multivariate analysis and cluster analysis
Time series
Spatial statistics
Other statistical methods
Function types -
Transformations
Probability functions
Vector functions
Matrix functions
String functions
Table functions
Subset functions
Random functions
Treatment functions
Regression functions
Constant functions
Authors and copyright
Input to Genstat is known as a Genstat program. This is made up of statements, each of which may use one of the standard Genstat commands (known as directives); alternatively, it may use a Genstat procedure, that is, a subprogram of statements. You can write your own procedures, or use those in the Library distributed with Genstat, or in the library provided at your site.
Whether the statement uses a directive or a procedure, the syntax is identical. First you give the name of the directive (or procedure), then options, and then parameters. Finally, you indicate the end of the statement, either by typing a colon or by ending the line (by typing <RETURN>). Long statements can be continued onto succeeding lines by typing the continuation character (\) before <RETURN>.
Some statements will have neither options nor parameters: for example
PAGE
to start a new page in output. Others may have no options: for example
PRINT STRUCTURE=X,Y; DECIMALS=0,2
prints the contents of data structures X and Y with zero and two decimal places respectively. In this statement, there are two parameter settings defining two lists running in parallel. Parameter settings are always in parallel like this, and are separated from one another by semicolons. Options are enclosed in square brackets, and set aspects that apply to all the (parallel) parameter values. They are also separated from one another by semicolons. For example
PRINT [CHANNEL=2; INDENTATION=5] STRUCTURE=X,Y; DECIMALS=0,2
prints X and Y to output channel 2 with a five-character indentation at the start of each line. Nearly all options, and some parameters, have default values chosen to be those required most often, and so will usually not need to be set.
Settings of options and parameters can be lists (as above), expressions or formulae. Lists may be of numbers (as with DECIMALS above), or identifiers (as with STRUCTURE) or strings. An identifier is the name that you have given to a Genstat data structure (for example X or Y), and which will be used to refer to it in the program. Identifiers must start with a letter (for Genstat this means the alphabetic characters A to Z, in capitals or lower case, as well as the percent and underline characters) and then contain either letters or digits (the numerical characters 0 to 9); Genstat takes notice of only the first eight characters. Where a list of identifiers provides input to a directive or procedure, you can put an expression instead; this will then be evaluated (to give a list of identifiers containing the results) before the directive or procedure is used. A string is a list of characters. Strings occur within the Text data structure, or as the settings for some options and parameters. Usually the start and end of the string must be marked by a single quote ('). The separator between items in lists is the comma; spaces can be included anywhere between items but do not act as separators. Formal definitions of expressions, formulae, and all the other concepts of the Genstat language are in 1.2.
Names of directives, options and parameters are examples of Genstat system words. They can be given in capital or small letters (or in mixtures of both), and can always be abbreviated to four characters. In fact, names of options and parameters can often be abbreviated further, and there are also rules by which the option or parameter name, with its accompanying equals character, can be omitted altogether. The most useful of these is that, if the first parameter of the directive is the one that comes first in the statement, then the name of the parameter can be omitted: for example, the previous statement can be written as
PRINT [CHANNEL=2; INDENTATION=5] X,Y; DECIMALS=0,2
because STRUCTURE is the first parameter of PRINT. The same rule holds for options:
PRINT [2; INDENTATION=5] X,Y; DECIMALS=0,2
because CHANNEL is the first option of PRINT. Full details of the rules are in 1.2.
A final point about the first parameter is that its setting determines the length of the parallel lists. The lists for other parameters will be repeated (or recycled) if they are shorter. (If they are longer, Genstat gives an error diagnostic.) For example
PRINT A,B,C,D; DECIMALS=0,2
prints A with zero decimal places, B with two, and then (recycling the DECIMALS list), C with zero and D with two.
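The rules on parallel lists, recycling and omitted names can be tried out with a short program. The following is only a sketch: the variates A, B, C and D and their values are invented for illustration, and it assumes the VALUES option of the VARIATE directive for supplying values at declaration.

```genstat
"Hypothetical data, declared only so that PRINT has something to show"
VARIATE [VALUES=1,2,3] A
VARIATE [VALUES=1.5,2.5,3.5] B
VARIATE [VALUES=4,5,6] C
VARIATE [VALUES=4.25,5.25,6.25] D
"Full form: both parameter names given; DECIMALS is recycled over A,B,C,D"
PRINT STRUCTURE=A,B,C,D; DECIMALS=0,2
"Short form: the name of the first parameter is omitted, and the"
"statement is continued onto a second line with the backslash"
PRINT [INDENTATION=5] A,B,C,D; \
  DECIMALS=0,2
```

Both PRINT statements pair A and C with zero decimal places and B and D with two; the second also indents each output line by five characters.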
The glossary gives a brief explanation of the terminology of the Genstat language:
Bracket, Character, Comment, Data structure, Diagonal matrix, Digit, Directive, Expression, Factor, Formula, Function, Identifier, Item, Letter, List, LRV structure, Macro, Matrix, Missing value, Multiplier, Number, Operator, Option, Parameter, Pointer, Procedure, Procedure Library, Program, Progression, Punctuation symbol, Qualified identifier, Scalar, Special symbol, SSPM structure, Statement, String, Subset selection, Suffix, Symmetric matrix, System word, Table, Text, TSM structure, Unnamed structure, Variate, and Vector.
Round brackets ( ) enclose the arguments of a function, a list to which a multiplier applies, or the values of an unnamed structure.
Square brackets [ ] are used to enclose a list of option settings or to enclose the suffix list of a pointer; also, when preceded by $, they enclose lists of unit names or numbers for a qualified identifier.
Curly brackets { } are each synonymous with the corresponding square bracket.
The characters used to form Genstat statements are a subset of those available on most computers. For the Genstat language they are classified as brackets, digits, letters, punctuation symbols, simple operators, or special symbols.
A comment consists of any series of characters that the computer can represent, enclosed by double quotes ("); comments are ignored and can appear anywhere in a Genstat program.
Data structures are used to store information within Genstat, such as numbers, character strings or even identifiers of other data structures. Directives known as declarations are available to form each of the available types.
A diagonal matrix is a data structure that stores the diagonal elements of a square matrix whose other values are all zero. Diagonal matrices can be declared using the DIAGONALMATRIX directive.
The numerical characters 0 to 9 are known as digits in Genstat.
A directive is a standard form of instruction in the Genstat language requesting a particular action or analysis. All Genstat 5 directives have the same syntax.
Directive name is a system word.
An expression is an arithmetic expression consisting of lists and functions separated by operators. An expression data structure stores a Genstat expression, and can be declared using the EXPRESSION directive.
A factor is a data structure that specifies an allocation of the units into groups. It is thus a vector that, unlike the variate or the text, takes only a limited set of values, one for each group. The groups are referred to by numbers known as levels; you can also define textual labels. Factors can be declared using the FACTOR directive.
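As an illustration, a factor with three groups might be declared as below. This is a sketch only: the identifier Dose, its labels and its values are invented, and the option names LEVELS, LABELS and VALUES are assumed to be those of the FACTOR directive.

```genstat
"Six units allocated to three groups, referred to by levels 1, 2 and 3"
FACTOR [LEVELS=3; LABELS=!T(low,medium,high); VALUES=1,2,3,3,2,1] Dose
PRINT Dose
```

The values are given as levels; the labels are supplied as an unnamed text, so no separate text structure needs to be declared.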
A formula is a model formula of lists and operators defining the list of model terms involved in an analysis. A formula data structure stores a Genstat formula, and can be defined using the FORMULA directive.
A function denotes a standard operation in an expression or formula, with the form "function-name (sequence of lists and/or expressions separated by ;)". The function-name is a system word and may be abbreviated to four characters; if characters 5-8 are given, they must match the standard form. A wide range of functions is available, for operations ranging from transformations to the calculation of summary statistics.
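A function is typically used inside an expression given to CALCULATE. The sketch below assumes the transformation function LOG10 (logarithm to base 10); the variates X and Y are hypothetical.

```genstat
VARIATE [VALUES=1,10,100] X
"Apply the function to every value of X and store the results in Y"
CALCULATE Y = LOG10(X)
PRINT X,Y; DECIMALS=0,2
```

Here the function has a single argument; functions with several arguments separate them by semicolons, as described above.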
An identifier is the name given to a particular data structure within a Genstat program. The first character of an identifier must be a letter; any others can be either letters or digits. Only the first eight characters are significant; subsequent characters are ignored. The directive SET allows you to specify whether or not the case of the letters (small or capital) is to be significant; e.g. whether LENGTH is the same as Length.
An item is a number, a string, an identifier, a system word, a missing value, or an operator.
Letters in Genstat are the upper-case (capital) letters A to Z, the lower-case letters a to z, the underline symbol (_), and the percent character (%).
A list is a sequence of items separated by commas. In an identifier list, each item is an identifier or an unnamed structure, while number or string lists contain numbers or strings respectively. Lists can contain pre- or post-multipliers. Identifier and number lists can contain progressions.
An LRV structure is a compound data structure storing latent roots and vectors, mainly used in multivariate analysis. LRV structures can be declared using the LRV directive.
A macro is a Genstat text structure containing a section of a Genstat program. The text must have an unsuffixed identifier. It can be substituted into the program by giving its identifier, preceded by a contiguous pair of substitution symbols (##). The substitution takes place as soon as Genstat reads the pair of hashes. (However, Genstat also has the EXECUTE directive, which allows a text containing a list of statements to be executed, for example, within a loop or procedure.)
A matrix is a data structure that stores a rectangular array of numbers. Matrices can be declared using the MATRIX directive.
A missing value is denoted within a Genstat program by one asterisk (*). When reading data, a series of contiguous asterisks or an asterisk followed by letters or digits is treated as a missing value too, and other characters can also be defined to represent missing values.
A multiplier allows repetitive lists to be specified concisely. A multiplier may be a number, or the substitution symbol (#) followed by a single-valued numerical data structure.
Post-multiplier is given immediately after the second of a pair of round brackets enclosing a list of identifiers, numbers, or strings and has the effect of repeating the whole list the specified number of times.
Pre-multiplier occurs immediately before the initial (round) bracket of a pair enclosing a list of identifiers, numbers, or strings and has the effect of repeating each item, in turn, the specified number of times.
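The two kinds of multiplier can be compared side by side. This sketch uses invented identifiers (Vpre, Vpost) and assumes the usual Genstat behaviour in which a post-multiplier repeats the whole bracketed list, in contrast to the item-by-item repetition of a pre-multiplier.

```genstat
"Pre-multiplier: each item repeated in turn, i.e. 1,1,1,2,2,2"
VARIATE [VALUES=3(1,2)] Vpre
"Post-multiplier: the whole list repeated, i.e. 1,2,1,2,1,2"
VARIATE [VALUES=(1,2)3] Vpost
PRINT Vpre,Vpost; DECIMALS=0
```

Both variates have six values, so they can be printed in parallel columns to compare the two orderings.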
A number is a sequence of digits, optionally containing a decimal point (.). The sequence can be preceded by a sign (+ or -) and can be followed by an exponent: i.e. the letter E or D (in upper or lower case) optionally followed by spaces, then a sequence of digits optionally preceded by a sign.
An operator is a symbol or symbols denoting an operation in an expression or formula.
Simple operators consist of a single character, for example + - * / or the dot (.).
Compound operators are: ** (exponentiation), *+ (matrix multiplication), -* (crossed deletion), -/ (nested deletion), // (pseudo-term linkage), .EQ. or == (equality), .NE. or /= or <> (non-equality), .LE. or <= (less than or equal to), .GE. or >= (greater than or equal to), .LT. (less than), .GT. (greater than), .EQS. (string equality), .NES. (string non-equality), .IN. (set inclusion), .NI. (set non-inclusion), .IS. (identifier equivalence), .ISNT. (identifier non-equivalence), .AND. (logical and), .OR. (logical or), .EOR. (logical either or), .NOT. (logical not).
Only + - * / . -/ -* and // may occur in formulae, while . -* -/ and // cannot occur (as operators) in expressions.
Options specify arguments that are global within a Genstat statement: i.e. they apply to all the items in the parameter list(s). Often, but not always, options have default values and so need not be specified.
Option name is a system word.
Option sequence is a list of option settings separated by semi-colons (;).
Option setting has the form "option-name = list, expression or formula". The "option-name =" can be omitted if the settings are given in the prescribed order for the directive or procedure concerned: i.e. the name may be omitted for the first setting if this is for the first prescribed option, and for subsequent settings if the previous setting was for the option immediately before the current one in the prescribed order.
Parameters specify parallel lists of arguments for a statement: i.e. the statement (with its option settings) operates for the first item in each list, then the second, and so on. The number of times that this happens is determined by the length of the parameter list that is first in the prescribed order for the directive or procedure concerned. Subsequent lists are recycled if they are shorter than the first list.
Parameter name is a system word.
Parameter sequence is a list of parameter settings separated by semi-colons (;).
Parameter setting has the form "parameter-name = list, expression or formula". The "parameter-name =" can be omitted if the settings are given in the prescribed order for the directive or procedure concerned: i.e. the name may be omitted for the first setting if this is for the first prescribed parameter, and for subsequent settings if the previous setting was for the parameter immediately before the current one in the prescribed order. For directives or procedures with only a single parameter, no parameter name is defined.
A pointer is a data structure that stores a series of identifiers, pointing to other data structures. Pointers can be declared using the POINTER directive.
A procedure is a structure that contains Genstat statements, and fulfils the role of the subroutine in the Genstat language. The use of a procedure looks just like the use of a Genstat directive. All data structures within the procedure are local (i.e. they cannot be referenced from, or confused with, data structures outside the procedure); input and output structures for the procedure are defined by option and parameter settings in the procedure call.
Procedure name is a
The Genstat Procedure Library contains procedures contributed not only by the writers of Genstat but also by knowledgeable Genstat users from many application areas and countries. The Library is controlled by an Editorial Board, who check that the procedures are useful and reliable, and maintain standards for the documentation. It is regularly extended and updated, independently of the releases of Genstat itself, and these revised versions are distributed automatically to all supported Genstat sites. Information about the Library is available using procedures in the help module of the Library. Other modules cover, for example, manipulation, graphics and various types of statistical analysis. These procedures are all accessed automatically by Genstat, when required. Instructions for authors of procedures can be obtained using procedure NOTICE. You can also form your own procedure libraries using the STORE directive.
A program is a series of statements, ending with the statement STOP.
Lists of numbers ascending or descending with equal increments can be specified succinctly as progressions, using the form "number, number ... number" where the first two numbers define the first two elements in the list (and thus the increment) and the list ends with the value beyond which the third number would be passed. For lists with an increment of plus or minus one, the second number can be omitted, to give the form "number ... number".
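Two progressions, one in each form, are sketched below. The identifiers Up and Down are hypothetical, and the expansions shown in the comments follow from the rule just stated.

```genstat
"Increment of one, second number omitted: 1,2,3,4,5"
VARIATE [VALUES=1...5] Up
"Increment of -2; the list stops at 2 because the next value, 0, would pass 1"
VARIATE [VALUES=10,8...1] Down
PRINT Up,Down; DECIMALS=0
```

The second progression shows that the final number is a bound rather than a value that must itself appear in the list.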
The Genstat punctuation symbols are:
colon (:) marks the end of a statement;
comma (,) separates items;
double quote (") is used to show the beginning and end of a comment;
equals (=) separates an option name or parameter name from its setting;
newline is synonymous with colon, by default, but directive SET can request that it be ignored;
semi-colon (;) separates lists;
single quote (') is used to show the beginning and end of a string (left single quote (`) is synonymous with single quote);
space can appear between items or can be omitted altogether if the items are already separated by another punctuation symbol, a bracket, an operator, or an ampersand;
tab is treated as a synonym of space everywhere except within texts and comments or if reading in fixed format (when it is treated as a fault).
Qualified identifiers may occur in a list of identifiers to define subsets of the values of a data structure. The form is "identifier $ qualifier", where the qualifier is a sequence of identifier lists enclosed in square brackets. For factors, variates, and texts, the qualifier has a single list, each element of which defines a subset of the vector concerned. For matrices there are two lists running in parallel, one for each dimension. For a symmetric matrix, there can be either one or two lists, depending on whether or not its two dimensions are to be subsetted in the same way. For a diagonal matrix there is a single list. Tables cannot be qualified. The elements of the qualifier lists can be scalars, numbers, variates, quoted strings, or texts.
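The difference between several single-unit subsets and one multi-unit subset can be sketched as follows. The variate X is hypothetical, and the comments give a reading of the rule above (each element of the qualifier list defines its own subset) rather than verified output.

```genstat
VARIATE [VALUES=10,20,30,40,50] X
"Two elements in the qualifier list: two subsets, unit 2 and unit 4"
PRINT X$[2,4]
"One element (an unnamed variate): a single subset holding units 2 and 4"
PRINT X$[!(2,4)]
```

Using an unnamed variate inside the qualifier is a convenient way to select several units as one structure.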
A scalar is a data structure that stores a single number. Scalars can be declared using the SCALAR directive.
The special symbols in Genstat are as follows:
ampersand (&) can terminate a statement, in which case the statement name and option settings are carried over to the statement that follows;
asterisk (*) denotes a missing value (and is also used as an operator);
backslash (\) is the continuation symbol, typed at the end of a line to indicate that the current statement continues onto the next line (this is unnecessary when directive SET has been used to specify that newline is to be ignored);
dollar ($) precedes a list of unit names or numbers (enclosed in square brackets) that define subsets of a factor, variate, matrix, symmetric matrix, diagonal matrix, or text;
exclamation mark (!) indicates an unnamed structure (vertical bar (|) is synonymous with exclamation mark);
hash (#) is the substitution symbol; when used on its own (i.e. followed just by a punctuation symbol) it represents the default setting of an option; alternatively, it can be followed by the identifier of a data structure whose values are to be inserted at that point in a Genstat statement (the substitution takes place immediately before the statement is executed). A pair of contiguous substitution symbols (##) is used to introduce a macro.
An SSPM structure is a compound data structure storing sums of squares and products, means and ancillary information for use in regression and multivariate analysis. SSPMs can be declared using the SSPM directive.
A statement is an instruction in the Genstat language; it has the form
statement-name [option-sequence] parameter-sequence terminator
If no option settings are given, the square brackets can be omitted. The terminator is colon (:), ampersand (&) or newline (unless directive SET has indicated that this is to be ignored).
Statement name is the name of either a directive or a procedure.
A string is a sequence of characters forming one unit (or line) of a Genstat text structure. In most contexts, the string must be quoted: i.e. enclosed in single quotes ('). Quoted strings may contain any of the characters available on the computer. However, if single quote ('), double quote ("), or the continuation symbol (\) are required as characters within a quoted string, they must each be typed twice to distinguish this use from their action in, respectively, terminating the string, introducing a comment within the string, or indicating continuation. Newline within a quoted string is taken to terminate the current (quoted) string and begin another one, unless the newline is within a comment or preceded by an (unduplicated) continuation symbol (\), or unless directive SET has specified that newline is to be ignored. Unquoted strings can occur in unnamed texts, or in option or parameter settings where you have to specify a particular string from a prescribed set of alternatives; an unquoted string must have a letter as its first character and contain only letters or digits.
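The doubling rule for quotes inside a quoted string is sketched below. The text T and its contents are invented, and the VALUES option of the TEXT directive is assumed for supplying the strings.

```genstat
"First string contains a single quote, second contains double quotes;"
"the third is supplied as a separate quoted string"
TEXT [VALUES='It''s quoted', 'say ""hello""', 'plain'] T
PRINT T
```

Each doubled character is stored once, so the first two strings hold It's quoted and say "hello" respectively.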
An identifier list can contain qualified identifiers, each defining a list of subsets of the values of the data structure concerned.
Elements of pointers can be referred to by suffixes. Each suffix takes the form of an identifier list enclosed in square brackets; the list can contain numbers, scalars, or variates to reference an element or elements by number, or texts or quoted strings to reference by label. A null list within the brackets is taken to mean all the elements of the pointer in turn. Where a pointer has other pointers as its elements, their elements can be referred to in the same way, and so the original identifier may be followed by several suffix lists each contained in its own pair of square brackets; these define a list of elements, one for each combination of an element from each suffix list, taking the combinations in an order in which the last list cycles through its elements fastest, then the next to last list, and so on.
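A short sketch of suffixes in use follows. The structures A, B and P are hypothetical, and the VALUES option of the POINTER directive is assumed for listing the structures pointed to.

```genstat
VARIATE [VALUES=1,2,3] A
VARIATE [VALUES=4,5,6] B
POINTER [VALUES=A,B] P
"A single suffix refers to one element: here the variate A"
PRINT P[1]
"A suffix list refers to several elements in the order given: B then A"
PRINT P[2,1]
"A null list means all the elements of the pointer in turn"
PRINT P[]
```

Since the suffixed pointer stands for an identifier list, it can be used wherever that list could be typed out in full.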
A symmetric matrix is a data structure that stores the lower triangle (including the diagonal) of a symmetric square matrix.
A system word is a letter followed by letters and/or digits with a special meaning within the Genstat language, e.g. directive, option, parameter, or function names. The case of the letters (small/capital) is not significant; the abbreviation rules vary according to context.
A table is a data structure that stores a multi-dimensional array of numbers, each dimension classified by a factor. Thus a table can be used to hold a summary of data that are classified (by the factors) into groups. Tables can be declared using the TABLE directive.
A text is a data structure that stores a series of strings, each one representing a line of textual information. Texts can be declared using the TEXT directive.
A TSM structure is a compound data structure storing a model for use in Box-Jenkins modelling of time series. TSMs can be declared using the TSM directive.
An identifier list may contain unnamed variates, scalars, texts, pointers, expressions, or formulae. An unnamed structure consists of an exclamation mark, followed by the type code, and then the values contained in round brackets. The type code is E for expression, F for formula, P for pointer, S for scalar, T for text, or V for variate. If no code is given, variate is assumed by default.
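Unnamed structures save a declaration when values are needed only once. The statements below are a sketch: the identifier Total is hypothetical, and the function SUM is assumed to return the total of the values of a variate.

```genstat
"No type code: an unnamed variate"
PRINT !(1,2,3,4); DECIMALS=0
"Type code T: an unnamed text, mixing unquoted and quoted strings"
PRINT !T(apple,pear,'two words')
"Type code V given explicitly; the unnamed variate is an argument of SUM"
CALCULATE Total = SUM(!V(1,2,3,4))
PRINT Total
```

The unquoted strings apple and pear are allowed here because each starts with a letter and contains only letters; the string with a space must be quoted.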
A variate is a data structure that stores a series of numbers. Variates can be declared using the VARIATE directive.
A vector is a series of values, notionally arranged in a column. Genstat has three different types of vector: factors, texts, and variates.
ADD adds extra terms to a linear, generalized linear, generalized additive, or nonlinear model.
ADDPOINTS adds points for new objects to a principal coordinates analysis.
ADISPLAY displays further output from analyses produced by ANOVA.
AKEEP copies information from an ANOVA analysis into Genstat data structures.
ANOVA analyses y-variates by analysis of variance according to the model defined by earlier BLOCKSTRUCTURE, COVARIATE, and TREATMENTSTRUCTURE statements.
ASSIGN sets elements of pointers and dummies.
AXES defines the axes in each window for high-resolution graphics.
BLOCKSTRUCTURE defines the blocking structure of the design and hence the strata and the error terms.
BREAK suspends execution of the statements in the current channel or control structure and takes subsequent statements from the channel specified.
CALCULATE calculates numerical values for data structures.
CASE introduces a "multiple-selection" control structure.
CATALOGUE displays the contents of a backing-store file.
CLOSE closes files.
CLUSTER forms a non-hierarchical classification.
COLOUR defines the red, green and blue intensities to be used for the Genstat colours with certain graphics devices.
COMBINE combines or omits "slices" of a multi-way data structure (table, matrix, or variate).
CONCATENATE concatenates and truncates lines (units) of text structures; allows the case of letters to be changed.
CONTOUR produces contour maps of two-way arrays of numbers (on the terminal/printer).
COPY forms a transcript of a job.
CORRELATE forms correlations between variates, autocorrelations of variates, and lagged cross-correlations between variates.
COVARIATE specifies covariates for use in subsequent ANOVA statements.
CVA performs canonical variates analysis.
DCLEAR clears a graphics screen.
DCONTOUR draws contour plots on a plotter or graphics monitor.
DDISPLAY redraws the current graphical display.
DEBUG puts an implicit BREAK statement after the current statement and after every NSTATEMENTS subsequent statements, until an ENDDEBUG is reached.
DECLARE declares one or more customized data structures.
DELETE deletes the attributes and values of structures.
DEVICE switches between (high-resolution) graphics devices.
DGRAPH draws graphs on a plotter or graphics monitor.
DHISTOGRAM draws histograms on a plotter or graphics monitor.
DIAGONALMATRIX declares one or more diagonal matrix data structures.
DISPLAY prints, or reprints, diagnostic messages.
DISTRIBUTION estimates the parameters of continuous and discrete distributions.
DKEEP saves information from the last plot on a particular device.
DPIE draws a pie chart on a plotter or graphics monitor.
DREAD reads the locations of points from an interactive graphical device.
DROP drops terms from a linear, generalized linear, generalized additive, or nonlinear model.
DSURFACE produces perspective views of two-way arrays of numbers.
DUMMY declares one or more dummy data structures.
DUMP prints information about data structures, and internal system information.
DUPLICATE forms new data structures with attributes taken from an existing structure.
D3HISTOGRAM produces three-dimensional histograms.
EDIT edits text vectors.
ELSE introduces the default set of statements in block-if or in multiple-selection control structures.
ELSIF introduces a set of alternative statements in a block-if control structure.
ENDBREAK returns to the original channel or control structure and continues execution.
ENDCASE indicates the end of a "multiple-selection" control structure.
ENDDEBUG cancels a DEBUG statement.
ENDFOR indicates the end of the contents of a loop.
ENDIF indicates the end of a block-if control structure.
ENDJOB ends a Genstat job.
ENDPROCEDURE indicates the end of the contents of a Genstat procedure.
ENQUIRE provides details about files opened by Genstat.
EQUATE transfers data between structures of different sizes or types (but of the same mode, i.e. numerical or text), or where the transfer is not from a single structure to a single structure.
ESTIMATE estimates parameters in Box-Jenkins models for time series.
EXECUTE executes the statements contained within a text.
EXIT exits from a control structure.
EXPRESSION declares one or more expression data structures.
FACROTATE rotates factor loadings from a principal components or canonical variates analysis according to either the varimax or quartimax criterion.
FACTOR declares one or more factor data structures.
FCLASSIFICATION forms a classification set for each term in a formula, breaks a formula up into separate formulae (one for each term), and applies a limit to the number of factors and variates in the terms of a formula.
FILTER filters time series by time-series models.
FIT fits a linear, generalized linear, generalized additive, or generalized nonlinear model.
FITCURVE fits a standard nonlinear regression model.
FITNONLINEAR fits a nonlinear regression model or optimizes a scalar function.
FKEY forms design keys for multi-stratum experimental designs, allowing for confounded and aliased treatments.
FLRV forms the values of LRV structures.
FOR introduces a loop; subsequent statements define the contents of the loop, which is terminated by the directive ENDFOR.
FORECAST forecasts future values of a time series.
FORMULA declares one or more formula data structures.
FOURIER calculates cosine or Fourier transforms of real or complex series.
FPSEUDOFACTORS determines patterns of confounding and aliasing from design keys, and extends the treatment model to incorporate the necessary pseudo-factors.
FRAME defines the positions of windows within the frame of a high-resolution graph. The positions are defined in normalized device coordinates ([0,1] × [0,1]).
FSIMILARITY forms a similarity matrix or a between-group-elements similarity matrix or prints a similarity matrix.
FSSPM forms the values of SSPM structures.
FTSM forms preliminary estimates of parameters in time-series models.
FVARIOGRAM forms auto variograms for individual variates or cross variograms for pairs of variates.
GENERATE generates factor values for designed experiments: with no options set, factor values are generated in standard order; the options allow treatment factors to be generated using the design-key method, or pseudo-factors to be generated to describe the confounding in a partially balanced experimental design.
GET accesses details of the "environment" of a Genstat job.
GETATTRIBUTE accesses attributes of structures.
GRAPH produces scatter and line graphs on the terminal or line printer.
GROUPS forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur.
HCLUSTER performs hierarchical cluster analysis.
HDISPLAY displays results ancillary to hierarchical cluster analyses: matrix of mean similarities between and within groups, a set of nearest neighbours for each unit, a minimum spanning tree, and the most typical elements from each group.
HELP prints details about the Genstat language and environment.
HISTOGRAM produces histograms of data on the terminal or line printer.
HLIST lists the data matrix in abbreviated form.
HSUMMARIZE forms and prints a group by levels table for each test together with appropriate summary statistics for each group.
IF introduces a block-if control structure.
INPUT specifies the input file from which to take further statements.
INTERPOLATE interpolates values at intermediate points.
JOB starts a Genstat job.
KRIGE calculates kriged estimates using a model fitted to the sample variogram.
LIST lists details of the data structures currently available within Genstat.
LRV declares one or more LRV data structures.
MARGIN forms and calculates marginal values for tables.
MATRIX declares one or more matrix data structures.
MDS performs non-metric multidimensional scaling.
MERGE copies subfiles from backing-store files into a single file.
MODEL defines the response variate(s) and the type of model to be fitted for linear, generalized linear, generalized additive, and nonlinear models.
MONOTONIC fits an increasing monotonic regression of y on x.
OPEN opens files.
OPTION defines the options of a Genstat procedure with information to allow them to be checked when the procedure is executed.
OR introduces a set of alternative statements in a "multiple-selection" control structure.
OUTPUT defines where output is to be stored or displayed.
OWN does work specified in Fortran subprograms linked into Genstat by the user.
PAGE moves to the top of the next page of an output file.
PARAMETER defines the parameters of a Genstat procedure with information to allow them to be checked when the procedure is executed.
PASS does work specified in subprograms supplied by the user, but not linked into Genstat. This directive may not be available on some computers.
PCO performs principal coordinates analysis, also principal components and canonical variates analysis (but with different weighting from that used in CVA) as special cases.
PCP performs principal components analysis.
PEN defines the properties of "pens" for high-resolution graphics.
POINTER declares one or more pointer data structures.
PREDICT forms predictions from a linear or generalized linear model.
PRINT prints data in tabular format in an output file, unformatted file, or text.
PROCEDURE introduces a Genstat procedure.
QUESTION obtains a response using a Genstat menu.
RANDOMIZE randomizes the units of a designed experiment or the elements of a factor or variate.
RCYCLE controls iterative fitting of generalized linear, generalized additive, and nonlinear models, and specifies parameters, bounds etc for nonlinear models.
RDISPLAY displays the fit of a linear, generalized linear, generalized additive, or nonlinear model.
READ reads data from an input file, an unformatted file, or a text.
RECORD dumps a job so that it can later be restarted by a RESUME statement.
REDUCE forms a reduced similarity matrix (referring to the GROUPS instead of the original units).
RELATE relates the observed values on a set of variates to the results of a principal coordinates analysis.
REML fits a variance-components model by residual (or restricted) maximum likelihood.
RESTRICT defines a restricted set of units of vectors for subsequent statements.
RESUME restarts a recorded job.
RETRIEVE retrieves structures from a subfile.
RETURN returns to a previous input stream (text vector or input channel).
RFUNCTION estimates functions of parameters of a nonlinear model.
RKEEP stores results from a linear, generalized linear, generalized additive, or nonlinear model.
ROTATE does a Procrustes rotation of one configuration of points to fit another.
SCALAR declares one or more scalar data structures.
SET sets details of the "environment" of a Genstat job.
SETOPTION sets or modifies defaults of options of Genstat directives or procedures.
SETPARAMETER sets or modifies defaults of parameters of Genstat directives or procedures.
SKIP skips lines in input or output files.
SORT sorts units of vectors according to an index vector.
SPREADSHEET allows interactive entry or editing of data (available in only some implementations).
SSPM declares one or more SSPM data structures.
STEP selects terms to include in or exclude from a linear, generalized linear, or generalized additive model according to the ratio of residual mean squares.
STOP ends a Genstat program.
STORE stores structures in a subfile of a backing-store file.
STRUCTURE defines a compound data structure.
SUSPEND suspends execution of Genstat to carry out commands in the operating system. This directive may not be available on some computers.
SVD calculates singular value decompositions of matrices i.e. ( LEFT *+ SINGULAR *+ TRANSPOSE(RIGHT) ).
SWITCH adds terms to, or drops them from a linear, generalized linear, generalized additive, or nonlinear model.
SYMMETRICMATRIX declares one or more symmetric matrix data structures.
TABLE declares one or more table data structures.
TABULATE forms summary tables of variate values.
TDISPLAY displays further output after an analysis by ESTIMATE.
TERMS specifies a maximal model, containing all terms to be used in subsequent linear, generalized linear, generalized additive, and nonlinear models.
TEXT declares one or more text data structures.
TKEEP saves results after an analysis by ESTIMATE.
TRANSFERFUNCTION specifies input series and transfer function models for subsequent estimation of a model for an output series.
TREATMENTSTRUCTURE specifies the treatment terms to be fitted by subsequent ANOVA statements.
TRY displays results of single-term changes to a linear, generalized linear, or generalized additive model.
TSM declares one or more TSM data structures.
TSUMMARIZE displays characteristics of time series models.
UNITS defines an auxiliary vector of labels and/or the length of any vector whose length is not defined when a statement needing it is executed.
VARIATE declares one or more variate data structures.
VCOMPONENTS defines the variance-components model for REML.
VDISPLAY displays further output from a REML analysis.
VKEEP copies information from a REML analysis into Genstat data structures.
VPEDIGREE generates an inverse relationship matrix for use when fitting animal or plant breeding models by REML.
VSTATUS prints the current model settings for REML.
VSTRUCTURE defines a variance structure for random effects in a REML model.
WORKSPACE accesses private data structures for use in procedures.
ABIVARIATE produces graphs and statistics for bivariate analysis of variance.
AFALPHA generates alpha designs.
AFCYCLIC generates block and treatment factors for cyclic designs.
AFORMS prints data forms for an experimental design.
AFUNITS forms a factor to index the units of the final stratum of a design.
AGALPHA forms alpha designs by standard generators for up to 100 treatments.
AGBIB generates balanced incomplete block designs.
AGBOXBEHNKEN generates Box Behnken designs.
AGCENTRALCOMPOSITE generates central composite designs.
AGCYCLIC generates cyclic designs from standard generators.
AGDESIGN generates generally balanced designs.
AGFRACTION generates fractional factorial designs.
AGHIERARCHICAL generates orthogonal hierarchical designs.
AGMAINEFFECT generates designs to estimate main effects of two-level factors.
AGNEIGHBOUR generates neighbour-balanced designs.
AGRAPH plots one- or two-way tables of means from ANOVA.
AKAIKEHISTOGRAM prints histograms with improved definition of groups.
AKEY generates values for treatment factors using the design key method.
ALIAS finds out information about aliased model terms in analysis of variance.
AMERGE merges extra units into an experimental design.
ANTMVESTIMATE estimates missing values in repeated measurements.
ANTORDER assesses order of ante-dependence for repeated measures data.
ANTTEST calculates overall tests based on a specified order of ante-dependence.
AONEWAY provides one-way analysis of variance for inexperienced users.
APLOT plots residuals from an ANOVA analysis.
APPEND appends a list of vectors of the same type.
APRODUCT forms a new experimental design from the product of two designs.
ARANDOMIZE randomizes and prints an experimental design.
AREPMEASURES produces an analysis of variance for repeated measurements.
ASTATUS provides information about the settings of ANOVA models and variates.
ASWEEP performs sweeps for model terms in an analysis of variance.
AUDISPLAY produces further output for an unbalanced design (after AUNBALANCED).
AUNBALANCED performs analysis of variance for unbalanced designs.
A2PLOT plots effects from two-level designs with robust s.e. estimates.
BANK calculates the optimum aspect ratio for a graph.
BARCHART plots a bar chart using line-printer or high-resolution graphics.
BIPLOT produces a biplot from a set of variates.
BJESTIMATE fits an ARIMA model, with forecast and residual checks.
BJFORECAST plots forecasts of a time series using a previously fitted ARIMA.
BJIDENTIFY displays time series statistics useful for ARIMA model selection.
BOOTSTRAP produces bootstrapped estimates, standard errors and distributions.
BOXPLOT draws box-and-whisker diagrams or schematic plots.
CANCOR does canonical correlation analysis.
CENSOR pre-processes censored data before analysis by ANOVA.
CHECKARGUMENT checks the arguments of a procedure.
CHISQUARE calculates chi-square statistics for one- and two-way tables.
CINTERACTION clusters rows and columns of a two-way interaction table.
CLASSIFY obtains a starting classification for non-hierarchical clustering.
CONCORD calculates Kendall's Coefficient of Concordance.
CONVEXHULL finds the points of a single or a full peel of convex hulls.
CORRESP does correspondence analysis, or reciprocal averaging.
CUMDISTRIBUTION fits frequency distributions to accumulated counts.
CVAPLOT plots the mean and unit scores from a canonical variate analysis.
CVASCORES calculates scores for individual units in canonical variate analysis.
DAPLOT plots residuals from ANOVA with interactive identification of outliers.
DAYCOUNT converts a date to a daycount, or vice versa.
DAYLENGTH calculates daylengths at a given period of the year.
DBARCHART produces barcharts for one or two-way tables.
DDENDROGRAM draws dendrograms with control over structure and style.
DDESIGN plots the plan of an experimental design.
DECIMALS sets the number of decimals for a structure, using its round-off.
DESCRIBE saves and/or prints summary statistics for variates.
DESIGN helps to select and generate effective experimental designs.
DIALLEL analyses full and half diallel tables with parents.
DILUTION calculates Most Probable Numbers from dilution series data.
DISCRIMINATE performs discriminant analysis.
DMST gives a high resolution plot of an ordination with minimum spanning tree.
DOTPLOT produces a dot-plot using line-printer or high-resolution graphics.
DPARALLEL displays multivariate data using parallel coordinates.
DPOLYGON draws polygons using high-resolution graphics.
DPTMAP draws maps for spatial point patterns using high-resolution graphics.
DPTREAD adds points interactively to a spatial point pattern.
DREPMEASURES plots profiles and differences of profiles for repeated measures data.
DRPOLYGON reads a polygon interactively from the current graphics device.
DSCATTER produces a scatter-plot matrix using high-resolution graphics.
DSHADE produces a pictorial representation of a data matrix.
EXTRABINOMIAL fits the models of Williams (1982) to overdispersed proportions.
FACAMEND permutes the levels and labels of a factor.
FACPRODUCT forms a factor with a level for every combination of other factors.
FDESIGNFILE forms a backing-store file of information for AGDESIGN.
FEXACT2X2 does Fisher's exact test for 2 × 2 tables.
FIELLER calculates effective doses or relative potencies.
FILEREAD reads data from a file.
FITMULTIVARIATE performs multivariate linear regression with accumulated tests.
FITNONNEGATIVE fits a generalized linear model with nonnegativity constraints.
FITPARALLEL carries out analysis of parallelism for nonlinear functions.
FITSCHNUTE fits a general 4-parameter growth model to a non-decreasing Y-variate.
FLIBHELP forms a help information file for use by LIBHELP &c.
FRESTRICTEDSET forms vectors with the restricted subset of a list of vectors.
FTEXT forms a text structure from a variate.
GEE fits models to longitudinal data by generalized estimating equations.
GENPROC performs a generalized Procrustes analysis.
GETDATA recovers data and information previously stored by SAVEDATA.
GINVERSE calculates the generalized inverse of a matrix.
GLM analyses non-standard generalized linear models.
GLMM fits a generalized linear mixed model.
GRANDOM generates pseudo-random numbers from probability distributions.
GRLABEL randomly labels two or more spatial point patterns.
GRTHIN randomly thins a spatial point pattern.
GRTORSHIFT performs a random toroidal shift on a spatial point pattern.
HANOVA does hierarchical analysis of variance/covariance for unbalanced data.
HEATUNITS calculates accumulated heat units of a temperature dependent process.
IFUNCTION estimates implicit and/or explicit functions of parameters.
INSIDE determines whether points lie within a specified polygon.
JACKKNIFE produces Jackknife estimates and standard errors.
KAPLANMEIER calculates the Kaplan-Meier estimate of the survivor function.
KAPPA calculates a kappa coefficient of agreement for nominally scaled data.
KOLMOG2 performs a Kolmogorov-Smirnov two-sample test.
KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance.
LATTICE analyses square and rectangular lattice designs.
LIBEXAMPLE accesses examples and source code of library procedures.
LIBFILENAME supplies the names of information files for library procedures.
LIBHELP provides help information about library procedures.
LIBINFORM prints information about the contents of the Procedure Library.
LIBMANUAL prints a "Manual" containing information about library procedures.
LIBVERSION provides the name of the current Genstat 5 Procedure Library.
LINDEPENDENCE finds the linear relations associated with matrix singularities.
LRVSCREE prints a scree diagram and/or a difference table of latent roots.
LVARMODEL analyses a field trial using the Linear Variance Neighbour model.
MANNWHITNEY performs a Mann-Whitney U test.
MANOVA performs multivariate analysis of variance and covariance.
MENU initiates a menu system.
MPOWER forms integer powers of a square matrix.
MULTMISS estimates missing values for units in a multivariate data set.
MVARIOGRAM fits models to an experimental variogram.
NLCONTRASTS fits nonlinear contrasts to quantitative factors in ANOVA.
NORMTEST performs tests of univariate and/or multivariate normality.
NOTICE gives access to the Genstat Notice Board (news, errors &c).
ORTHPOL calculates orthogonal polynomials.
PAIRTEST performs t-tests for pairwise differences.
PCOPROC performs a multiple Procrustes analysis.
PDESIGN prints or stores treatment combinations tabulated by the block factors.
PERCENT expresses the body of a table as percentages of one of its margins.
PERIODTEST gives periodogram-based tests for white noise in time series.
PLS fits a partial least squares regression model.
PPAIR displays results of t-tests for pairwise differences in compact diagrams.
PREWHITEN filters a time series before spectral analysis.
PROBITANALYSIS fits probit models allowing for natural mortality and immunity.
PTBOX generates a bounding or surrounding box for a spatial point pattern.
PTCLOSEPOLYGON closes open polygons.
PTDESCRIBE gives summary and second order statistics for a point process.
PTREMOVE removes points interactively from a spatial point pattern.
QUANTILE calculates quantiles of the values in a variate.
RANK produces ranks, from the values in a variate, allowing for ties.
RCHECK checks the fit of a linear or generalized linear regression.
RGRAPH draws a graph to display the fit of a regression model.
RIDGE produces ridge regression and principal component regression analyses.
RJOINT does modified joint regression analysis for variety-by-environment data.
ROBSSPM forms robust estimates of sum-of-squares-and-products matrices.
RPAIR gives t-tests for all pairwise differences of means from a regression or GLM.
RPROPORTIONAL fits the proportional hazards model to survival data as a GLM.
RSURVIVAL models survival times of exponential, Weibull or extreme-value distributions.
RUGPLOT draws "rugplots" to display the distribution of one or more samples.
RUNTEST performs a test of randomness of a sequence of observations.
SAMPLE samples from a set of units, possibly stratified by factors.
SAVEDATA saves all the current data and information for use in a future run.
SIGNTEST performs a one or two sample sign test.
SKEWSYMM provides an analysis of skew-symmetry for an asymmetric matrix.
SMOOTHSPECTRUM forms smoothed spectrum estimates for univariate time series.
SPEARMAN calculates Spearman's Rank Correlation Coefficient.
SPLINE calculates a set of basis functions for M-, B- or I-splines.
STANDARDIZE standardizes columns of a data matrix to have mean zero and variance one.
STEM produces a simple stem-and-leaf chart.
SUBSET forms vectors containing subsets of the values in other vectors.
TTEST performs a one- or two-sample t-test.
VEQUATE equates across numerical structures.
VFUNCTION calculates functions of variance components from a REML analysis.
VHOMOGENEITY tests homogeneity of variances and variance-covariance matrices.
VINTERPOLATE performs linear & inverse linear interpolation between variates.
VORTHPOL calculates orthogonal polynomial time-contrasts for repeated measures.
VPLOT plots residuals from a REML analysis.
VREGRESS performs regression across variates.
VTABLE forms a variate and set of classifying factors from a table.
WADLEY fits models for Wadley's problem, allowing alternative links and errors.
WILCOXON performs a Wilcoxon Matched-Pairs (Signed-Rank) test.
XOCATEGORIES performs analyses of categorical data from crossover trials.
List of functions for expressions
These are the functions that can be used in expressions:
ABS, ANGULAR, ARCCOS, ARCSIN, AREA, CED, CHARACTERS, CHISQ, CHOLESKI, CIRCULATE, CLBETA, CLBINOMIAL, CLBVARIATENORMAL, CLCHISQUARE, CLF, CLGAMMA, CLHYPERGEOMETRIC, CLLOGNORMAL, CLNORMAL, CLOGLOG, CLPOISSON, CLT, CONSTANTS, CORRELATION, COS, COVARIANCE, CUBETA, CUBINOMIAL, CUBVARIATENORMAL, CUCHISQUARE, CUF, CUGAMMA, CUHYPERGEOMETRIC, CULOGNORMAL, CUMULATE, CUNORMAL, CUPOISSON, CUT, DETERMINANT, DIFFERENCE, EDBETA, EDCHISQUARE, EDF, EDGAMMA, EDLOGNORMAL, EDNORMAL, EDT, ELEMENTS, EXP, EXPAND, FED, FPROBABILITY, GETFIRST, GETLAST, GETPOSITION, IANGULAR, ICLOGLOG, ILOGIT, INTEGER, INVERSE, LLBINOMIAL, LLGAMMA, LLNORMAL, LLPOISSON, LOG, LOG10, LOGIT, LTPRODUCT, MAXIMUM, MEAN, MEDIAN, MINIMUM, MODULO, MVINSERT, MVREPLACE, NCOLUMNS, NED, NEWLEVELS, NLEVELS, NMV, NOBSERVATIONS, NORMAL, NROWS, NVALUES, POSITION, PRBETA, PRBINOMIAL, PRCHISQUARE, PRF, PRGAMMA, PRHYPERGEOMETRIC, PRLOGNORMAL, PRNORMAL, PRODUCT, PRPOISSON, PRT, QPRODUCT, RESTRICTION, REVERSE, ROUND, RTPRODUCT, SHIFT, SIN, SOLUTION, SORT, SQRT, SUBMAT, SUM, TMAXIMA, TMEANS, TMEDIANS, TMINIMA, TNMV, TNOBSERVATIONS, TNVALUES, TOTAL, TRACE, TRANSPOSE, TTOTALS, TVARIANCES, UNSET, URAND, VARIANCE, VCORRELATION, VCOVARIANCE, VMAXIMA, VMEANS, VMEDIANS, VMINIMA, VNMV, VNOBSERVATIONS, VNVALUES, VTOTALS, and VVARIANCES.
ABS(x)
the absolute value of x: |x|.
ANGULAR(p)
or ANG(p) the angular transformation: for a percentage p (0 < p < 100), forms x = (180/pi) × arcsin(sqrt(p/100)).
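As an illustration only, the formula can be checked outside Genstat. The following Python sketch evaluates x = (180/pi) × arcsin(sqrt(p/100)); the function name `angular` is hypothetical and is not a Genstat identifier.

```python
import math

def angular(p):
    """Angular transformation of a percentage p, with the result in degrees."""
    if not 0 < p < 100:
        raise ValueError("p must be a percentage with 0 < p < 100")
    # (180/pi) * arcsin(sqrt(p/100)) is arcsin(...) converted from radians to degrees
    return math.degrees(math.asin(math.sqrt(p / 100)))
```

Under this formula a percentage of 25 maps to about 30 degrees and a percentage of 50 to about 45 degrees.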
ARCCOS(x)
inverse cosine of x, where -1 <= x <= 1.
ARCSIN(x)
inverse sine of x, where -1 <= x <= 1.
AREA(y;x)
numerically integrates the curve running through the points specified by variates y and x.
CED(p;s)
the chi-square equivalent deviate for probability p (0 < p < 1) with s degrees of freedom (synonym of EDCHI).
CHARACTERS(g)
returns a variate giving the length of each line of the text g.
CHISQ(x;s)
the chi-square probability of t < x with s degrees of freedom (synonym of CLCHI).
CHOLESKI(x)
the Choleski decomposition of a symmetric matrix x: such that x = LL' where L is square with upper off-diagonal elements zero.
CIRCULATE(x;s)
shifts the values of x, treating x as a circular stack. If s is omitted, values are shifted one to the right, as for s=1.
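The circular shift can be pictured with a short Python sketch. This is not Genstat code: the name `circulate` and the list representation are illustrative, and the treatment of negative s (a shift to the left) is an assumption not stated above.

```python
def circulate(x, s=1):
    """Return a copy of the list x with its values rotated s places to the right."""
    n = len(x)
    if n == 0:
        return []
    k = s % n  # reduce the shift to the range 0..n-1 (assumed: negative s rotates left)
    return x[n - k:] + x[:n - k]

print(circulate([1, 2, 3, 4]))  # [4, 1, 2, 3]
```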
CLBETA(x;a;b)
cumulative lower probability for a beta distribution with parameters a and b.
CLBINOMIAL(j;n;p)
probability of j or fewer successes out of n binomial trials with probability of success p.
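For orientation, the cumulative lower binomial probability is the sum of the individual binomial probabilities up to the given count. The Python sketch below computes that sum directly; the names `cl_binomial` and `cu_binomial` are hypothetical, and the count is assumed to be a non-negative integer.

```python
import math

def cl_binomial(j, n, p):
    """P(X <= j) for X ~ Binomial(n, p), summed term by term."""
    top = min(j, n)  # counts above n have probability zero
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(top + 1))

def cu_binomial(j, n, p):
    """P(X > j): the complement of the cumulative lower probability."""
    return 1.0 - cl_binomial(j, n, p)
```

For example, with n = 2 and p = 0.5 the probability of one or fewer successes is 0.25 + 0.5 = 0.75.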
CLBVARIATENORMAL(x;y;r)
cumulative lower probability for a bivariate normal distribution with means 0, variances 1, and correlation r.
CLCHISQUARE(x;df)
cumulative lower probability for a chi-square distribution with df degrees of freedom.
CLF(x;df1;df2)
cumulative lower probability for an F distribution with df1 and df2 degrees of freedom.
CLGAMMA(x;a;b)
cumulative lower probability for a gamma distribution with index parameter a and shape parameter b.
CLHYPERGEOMETRIC(j;l;m;n)
probability of j or fewer positive samples out of a total sample of size m from a population of size n of which l are positive (hypergeometric distribution).
CLLOGNORMAL(x)
cumulative lower probability for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.
CLNORMAL(x)
cumulative lower probability for a normal distribution with mean 0 and variance 1.
CLOGLOG(p)
takes the complementary log-log transformation of the percentages p (0 < p < 100%).
CLPOISSON(j;m)
probability of a value of j or less for a Poisson distribution with mean m.
CLT(x;df)
cumulative lower probability for a t distribution with df degrees of freedom.
CONSTANTS(g)
or C(g) provides the value of various constants, according to the contents of g: e (for a string of 'e' or 'E'), pi ('pi' or 'PI'), or missing value ('*').
CORRELATION(x;y)
or CORRMAT(x;y) if both x and y are specified, returns a scalar giving the correlation between the values of x and y; if y is omitted, forms a correlation matrix from a symmetric matrix x.
COS(x)
cosine of x, for x in radians.
COVARIANCE(x;y)
returns a scalar giving the covariance between the values of x and y.
CUBETA(x;a;b)
cumulative upper probability for a beta distribution with parameters a and b.
CUBINOMIAL(j;n;p)
probability of more than j successes out of n binomial trials with probability of success p.
CUBVARIATENORMAL(x;y;r)
cumulative upper probability for a bivariate normal distribution with means 0, variances 1, and correlation r.
CUCHISQUARE(x;df)
cumulative upper probability for a chi-square distribution with df degrees of freedom.
CUF(x;df1;df2)
cumulative upper probability for an F distribution with df1 and df2 degrees of freedom.
CUGAMMA(x;a;b)
cumulative upper probability for a gamma distribution with index parameter a and shape parameter b.
CUHYPERGEOMETRIC(j;l;m;n)
probability of more than j positive samples out of a total sample of size m from a population of size n of which l are positive (hypergeometric distribution).
CULOGNORMAL(x)
cumulative upper probability for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.
CUMULATE(x)
or CUM(x) forms the cumulative sum of the values of x; i.e. x1, x1+x2, x1+x2+x3, and so on.
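The running-sum behaviour can be sketched in Python as follows. This is only an illustration of the arithmetic; the name `cumulate` is hypothetical and missing values are not handled.

```python
from itertools import accumulate

def cumulate(x):
    """Return the running totals x1, x1+x2, x1+x2+x3, ... as a list."""
    return list(accumulate(x))

print(cumulate([1, 2, 3, 4]))  # [1, 3, 6, 10]
```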
CUNORMAL(x)
cumulative upper probability for a normal distribution with mean 0 and variance 1.
CUPOISSON(j;m)
probability of a value greater than j for a Poisson distribution with mean m.
CUT(x;df)
cumulative upper probability for a t distribution with df degrees of freedom.
DETERMINANT(x)
or DET(x) or D(x) the determinant of a square or symmetric matrix.
DIFFERENCE(x;s)
forms the differences of x, i.e. x[i] - x[i-s]; if s is omitted, first differences are formed, as for s=1.
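A small Python sketch of lagged differencing is given below for illustration. The name `difference` is hypothetical, and marking the first s units as missing (None here) is an assumption, since those units have no value s places earlier.

```python
def difference(x, s=1):
    """Return x[i] - x[i-s] for each unit; the first s units are set to None."""
    if s < 1:
        raise ValueError("this sketch assumes a lag s of at least 1")
    return [None if i < s else x[i] - x[i - s] for i in range(len(x))]

print(difference([1, 4, 9, 16]))     # [None, 3, 5, 7]
print(difference([1, 4, 9, 16], 2))  # [None, None, 8, 12]
```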
EDBETA(p;a;b)
equivalent deviate corresponding to cumulative lower probability p for a beta distribution with parameters a and b.
EDCHISQUARE(p;df)
equivalent deviate corresponding to cumulative lower probability p for a chi-square distribution with df degrees of freedom.
EDF(p;df1;df2)
equivalent deviate corresponding to cumulative lower probability p for an F distribution with df1 and df2 degrees of freedom.
EDGAMMA(p;a;b)
equivalent deviate corresponding to cumulative lower probability p for a gamma distribution with index parameter a and shape parameter b.
EDLOGNORMAL(p)
equivalent deviate corresponding to cumulative lower probability p for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.
EDNORMAL(p)
equivalent deviate corresponding to cumulative lower probability p for a normal distribution with mean 0 and variance 1.
EDT(p;df)
equivalent deviate corresponding to cumulative lower probability p for a t distribution with df degrees of freedom.
ELEMENTS(x;e1;e2)
forms a sub-structure of x. If x is a vector or a diagonal matrix, then only e1 should be specified; this then indicates the selected elements of x. If x is a rectangular matrix, then both e1 and e2 should be given, to specify respectively the selected rows and columns of x. For a symmetric matrix x, if the same rows and columns are to be selected (giving a symmetric matrix) then only e1 should be specified; otherwise both e1 and e2 should be given (and the result is a matrix).
EXP(x)
exponential: e^x.
EXPAND(x;s)
forms a variate of length s, containing zeroes and ones; if s is omitted and the length cannot be determined from the context, the length of the current units structure, if any, is taken. The values in x specify the numbers of the units that are to contain the value 1.
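The construction of the zero-one variate can be sketched in Python. The name `expand` is illustrative only, units are assumed to be numbered from 1, and the length s must be given explicitly in this sketch because there is no units structure to fall back on.

```python
def expand(x, s):
    """Return a list of length s with 1 in the units listed in x and 0 elsewhere."""
    out = [0] * s
    for unit in x:
        if not 1 <= unit <= s:
            raise ValueError("unit numbers must lie between 1 and s")
        out[unit - 1] = 1  # unit numbers are 1-based, list indices are 0-based
    return out

print(expand([2, 4], 5))  # [0, 1, 0, 1, 0]
```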
FED(p;s1;s2)
the F-distribution equivalent deviate for probability p (0 < p < 1) and (s1,s2) degrees of freedom (synonym of EDF).
FPROBABILITY(x;s1;s2)
or FRATIO(x;s1;s2) the F-ratio probability of t < x with (s1,s2) degrees of freedom (synonym of CLF).
GETFIRST(g)
gives a variate containing the position of the first non-space character in each string of the text g.
GETLAST(g)
gives a variate containing the position of the last non-space character in each string of the text g.
GETPOSITION(g1;g2;x)
for each unit, if the string in the text g2 occurs as a substring of the string in the text g1, this returns the position at which the substring starts; otherwise it returns the value zero. The text g2 may contain a single string (to be checked against every string of g1). The structure x (scalar or variate) supplies a logical value to indicate whether to ignore the case of any letters; if x is omitted, the logical is assumed to be false (case not ignored).
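The substring search can be sketched in Python as shown below. This is an illustration rather than Genstat code: the name `get_position` is hypothetical, texts are represented as lists of strings, positions are assumed to count from 1, and only a scalar case flag is supported.

```python
def get_position(g1, g2, ignore_case=False):
    """For each string in g1, return the 1-based start of the matching g2 string, or 0."""
    if len(g2) == 1:
        g2 = g2 * len(g1)  # a single search string is checked against every unit
    positions = []
    for text, sub in zip(g1, g2):
        if ignore_case:
            text, sub = text.lower(), sub.lower()
        positions.append(text.find(sub) + 1)  # str.find gives -1 when absent, so +1 gives 0
    return positions

print(get_position(["apple", "Banana"], ["an"]))  # [0, 2]
```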
IANGULAR(x)
gives the inverse of the angular transformation (result in percentages).
ICLOGLOG(x)
gives the inverse of the complementary log-log transformation (result in percentages).
ILOGIT(x)
gives the inverse of the logit transformation (result in percentages).
INTEGER(x)
or INT(x) integer part of x: [x].
INVERSE(x)
or INV(x) or I(x) the inverse of a non-singular square or symmetric matrix x.
LLBINOMIAL(x;n;p)
or LLB(x;n;p) log-likelihood function for the Binomial distribution; n is the sample size and p the mean proportion (or the probability).
LLGAMMA(x;a;b)
or LLG(x;a;b) log-likelihood function for the Gamma distribution with index parameter a and shape parameter b.
LLNORMAL(x;m;v)
or LLN(x;m;v) log-likelihood function for the Normal distribution; m is the mean and v the variance.
LLPOISSON(x;m)
or LLP(x;m) log-likelihood function for the Poisson distribution; m is the mean.
LOG(x)
natural logarithm of x, for x > 0.
LOG10(x)
logarithm to base 10 of x, for x > 0.
LOGIT(p)
takes the logit transformation log(p/(100-p)) of the percentages p (0 < p < 100%).
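For illustration, the transformation and its inverse on the percentage scale can be written in Python. The names `logit` and `ilogit` below are plain Python functions, not Genstat calls; the inverse is obtained by solving x = log(p/(100-p)) for p.

```python
import math

def logit(p):
    """Logit of a percentage p, with 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be a percentage with 0 < p < 100")
    return math.log(p / (100 - p))

def ilogit(x):
    """Inverse logit, returning a percentage: p = 100 / (1 + exp(-x))."""
    return 100 / (1 + math.exp(-x))
```

A percentage of 50 gives a logit of zero, and applying `ilogit` to the result of `logit` recovers the original percentage up to rounding error.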
LTPRODUCT(x;y)
left transposed product of x and y: a more efficient way of calculating TRANSPOSE(x)*+y.
MAXIMUM(x)
or MAX(x) finds the maximum of the values in x.
MEAN(x)
forms the mean of the values of x.
MEDIAN(x)
or MED(x) finds the median of the values in x.
MINIMUM(x)
or MIN(x) finds the minimum of the values in x.
MODULO(x;y)
forms the modulus of x to base y.
MVINSERT(x;y)
replaces values in x by missing values wherever y stores a non-zero value (logical true).
MVREPLACE(x;y)
replaces missing values in x with the values in the corresponding units of y.
NCOLUMNS(x)
gives the number of columns of x.
NED(p)
gives the Normal equivalent deviate: that is the value x that leaves a proportion p (0 < p < 1) to the left of it under the standard Normal curve (synonym of EDNORMAL).
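The relationship between the Normal equivalent deviate and the Normal probability integral can be illustrated with Python's standard library. The names `ned` and `normal` are hypothetical wrappers, used here only to show that one function inverts the other.

```python
from statistics import NormalDist

_standard = NormalDist(mu=0.0, sigma=1.0)

def ned(p):
    """Value x leaving a proportion p (0 < p < 1) to its left under N(0,1)."""
    return _standard.inv_cdf(p)

def normal(x):
    """Probability that a standard Normal variable is less than x."""
    return _standard.cdf(x)
```

For instance, `ned(0.975)` is approximately 1.96, and `normal(ned(p))` returns p up to rounding error.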
NEWLEVELS(f;x)
forms a variate from the factor f; the variate x defines a value for each level and should be the same length as the number of levels of the factor.
NLEVELS(f)
gives the number of levels of factor f.
NMV(x)
counts the number of missing values in x.
NOBSERVATIONS(x)
counts the number of observations (that is non-missing values) in x.
NORMAL(x)
the Normal probability integral: gives the probability that a random variable with a standard Normal N(0,1) distribution is less than x (synonym of CLNORMAL).
NROWS(x)
gives the number of rows of x.
NVALUES(x)
gives the number of values, including missing values, of x (that is the length of x).
POSITION(x;y)
finds the position, within the vector y, of each value of x.
PRBETA(x;a;b)
probability density function for a beta distribution with parameters a and b.
PRBINOMIAL(x;n;p)
probability of x successes out of n binomial trials with probability of success p.
PRCHISQUARE(x;df)
probability density function for a chi-square distribution with df degrees of freedom.
PRF(x;df1;df2)
probability density function for an F distribution with df1 and df2 degrees of freedom.
PRGAMMA(x;a;b)
probability density function for a gamma distribution with index parameter a and shape parameter b.
PRHYPERGEOMETRIC(j;l;m;n)
probability of j successes out of a sample of m from a population of size n of which l are positive (hypergeometric distribution).
PRLOGNORMAL(x)
probability density function for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.
PRNORMAL(x)
probability density function for a normal distribution with mean 0 and variance 1.
PRODUCT(x;y)
forms the matrix product of x and y (that is x *+ y).
PRPOISSON(j;m)
probability of the value j for a Poisson distribution with mean m.
PRT(x;df)
probability density function for a t distribution with df degrees of freedom.
QPRODUCT(x;y)
forms the quadratic product of x and y (that is x *+ y *+ TRANSPOSE(x)), where x is a rectangular matrix or variate and y is a symmetric or diagonal matrix or a scalar.
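The quadratic product can be sketched with plain Python lists of rows. This is an illustration of the algebra only: `matmul`, `transpose` and `qproduct` are hypothetical helper names, and the variate and scalar cases mentioned above are not covered.

```python
def matmul(a, b):
    """Matrix product of a (r x k) and b (k x c), both given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def qproduct(x, y):
    """Quadratic product x * y * x', where y is a square (symmetric) matrix."""
    return matmul(matmul(x, y), transpose(x))

print(qproduct([[1, 2]], [[2, 0], [0, 3]]))  # [[14]]
```

When y is symmetric the result is symmetric as well, which is why the quadratic product is a natural operation on symmetric matrices.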
RESTRICTION(x)
forms a variate with the value 1 in the units to which x is currently restricted.
REVERSE(x)
reverses the values of x.
ROUND(x)
rounds the values of x to the nearest integer.
RTPRODUCT(x;y)
forms the right transposed product of x and y (that is x *+ TRANSPOSE(y)).
SHIFT(x;s)
shifts the values of x by s places (to the right or left according to the sign of s). This is not a circular shift, so some positions lose their values and are given missing values.
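The non-circular shift can be pictured with the following Python sketch. The name `shift` is illustrative, missing values are represented by None, and the convention that positive s shifts to the right is an assumption.

```python
def shift(x, s):
    """Shift the values of x by s places, filling vacated positions with None."""
    n = len(x)
    out = [None] * n
    for i in range(n):
        j = i - s  # position the value comes from (assumed: positive s shifts right)
        if 0 <= j < n:
            out[i] = x[j]
    return out

print(shift([1, 2, 3, 4], 1))   # [None, 1, 2, 3]
print(shift([1, 2, 3, 4], -1))  # [2, 3, 4, None]
```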
SIN(x)
sine of x, for x in radians.
SOLUTION(x;y)
finds the solution b of the set of simultaneous linear equations x *+ b = y.
SORT(x;y)
sorts the elements of x into the order that would put the values of y into ascending order; if y is omitted, the values of x are sorted.
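Sorting one vector by the order of another can be sketched in Python. The name `sort_by` is hypothetical; ties are resolved here by Python's stable sort, which may differ from the behaviour of the Genstat function.

```python
def sort_by(x, y=None):
    """Return the values of x in the order that puts y (or x itself) into ascending order."""
    keys = x if y is None else y
    order = sorted(range(len(x)), key=lambda i: keys[i])  # indices that sort the keys
    return [x[i] for i in order]

print(sort_by([10, 20, 30], [3, 1, 2]))  # [20, 30, 10]
print(sort_by([3, 1, 2]))                # [1, 2, 3]
```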
SQRT(x)
gives the square root of x (x >= 0).
SUBMAT(x)
forms sub-triangles or sub-rectangles of a rectangular or symmetric matrix. The rows and columns to be included are determined by matching the pointers indexing the resultant matrix with the pointers indexing x. (SUBMAT does not allow for indexing by variates or texts.)
SUM(x)
forms the sum of the values in x (synonym TOTAL).
TMAXIMA(t)
forms margins of maxima for table t.
TMEANS(t)
forms margins of means for table t.
TMEDIANS(t)
forms margins of medians for table t.
TMINIMA(t)
forms margins of minima for table t.
TNMV(t)
forms margins counting the numbers of missing values in table t.
TNOBSERVATIONS(t)
forms margins counting the numbers of observations (non-missing values) in table t.
TNVALUES(t)
forms margins counting the numbers of values (missing or non-missing) in table t.
TOTAL(x)
forms the total of the values in x (synonym SUM).
TRACE(x)
calculates the trace of the square, diagonal, or symmetric matrix x (that is the sum of all its diagonal elements).
TRANSPOSE(x)
or T(x) forms the transpose of a rectangular matrix x.
TTOTALS(t)
or TSUMS(t) forms margins of totals for table t.
TVARIANCES(t)
forms margins of between-cell variances for table t.
UNSET(d)
returns a scalar logical value according to whether or not the dummy d is set.
URAND(seed;s)
provides s uniform pseudo-random numbers in the range (0,1). If s is not supplied and URAND cannot determine the length of the result from the context of the expression, the length of the current units structure (if any) is taken. The scalar seed initializes the generator. If seed is zero in the first use of URAND in a job, the system clock is used to provide a seed; subsequent calls may use zero to continue the sequence of random numbers.
VARIANCE(x)
or VAR(x) gives the variance of the values in x.
VCORRELATION(p1;p2)
gives the correlation, at every unit, between the values of the corresponding structures in the pointers p1 and p2.
VCOVARIANCE(p1;p2)
gives the covariance, at every unit, between the values of the corresponding structures in the pointers p1 and p2.
VMAXIMA(p)
finds the maximum of the values in each unit over the variates in pointer p.
VMEANS(p)
gives the mean of the non-missing values in each unit over the variates in pointer p.
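The unit-by-unit calculation across several variates can be sketched in Python. Here a pointer is represented by a list of equal-length lists and missing values by None; the name `vmeans` is illustrative, and returning None for a unit with no observations is an assumption.

```python
def vmeans(variates):
    """Mean of the non-missing values in each unit, taken across the variates."""
    means = []
    for unit_values in zip(*variates):  # one tuple of values per unit
        present = [v for v in unit_values if v is not None]
        means.append(sum(present) / len(present) if present else None)
    return means

print(vmeans([[1, None, 3], [3, 4, None]]))  # [2.0, 4.0, 3.0]
```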
VMEDIANS(p)
finds the median of the values in each unit over the variates in pointer p.
VMINIMA(p)
finds the minimum of the values in each unit over the variates in pointer p.
VNMV(p)
counts the number of missing values in each unit over the variates in pointer p.
VNOBSERVATIONS(p)
counts the number of observations (non-missing values) in each unit over the variates in pointer p.
VNVALUES(p)
gives the number of values in each unit over the variates in pointer p (that is the number of values of p).
VTOTALS(p)
or VSUMS(p) gives the total of the non-missing values in each unit over the variates in pointer p.
VVARIANCES(p)
gives the variance of the non-missing values in each unit over the variates in pointer p.
List of functions for formulae
These are the functions that can be used in formulae:
COMPARISON, POL, POLND, REG, REGND, and SSPLINE.
COMPARISON(f;s;m)
calculates the comparisons amongst the levels of factor f specified by the first s rows of the matrix m (TREATMENT formulae only).
POL(f;s;v)
indicates that the effects of factor f are to be partitioned into polynomial components (linear, quadratic, etc.) up to order s, where s is a scalar containing an integer between 1 and 4. Variate v defines a numerical value for each level of the factor; if it is omitted, the factor levels themselves are used. In regression models, POL(v;s) can be used to fit simple polynomials of a variate v up to order s.
POLND(f;s;v)
has the same effect as POL, except that no Dev components are fitted for factor f in interactions (TREATMENT formulae only).
REG(f;s;m)
indicates that the effects of factor f are to be partitioned into the regression contrasts specified by the first s rows of the matrix m. In TREATMENT formulae scalar s must lie between 1 and 7. In regression models REG(v;s;m) can be used to fit a set of associated variates stored in the rows of a matrix; if m is omitted orthogonal polynomial contrasts are constructed for either f or v (in regression models contrasts are otherwise not orthogonalised).
REGND(f;s;m)
has the same effect as REG, except that no Dev components are fitted for factor f in interactions (TREATMENT formulae only).
SSPLINE(v;s;p)
or S(v;s;p) indicates that the effect of a variate v is to be fitted by a smoothing spline with approximately s degrees of freedom or using "smoothing parameter" p (regression models only).
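A brief sketch of POL in both contexts may help. The identifiers nitrogen (a factor), yield, x and y (variates) are hypothetical and are assumed to have been declared and given values already.

```genstat
"Analysis of variance: partition the effect of factor nitrogen"
"into linear and quadratic components"
TREATMENTSTRUCTURE POL(nitrogen; 2)
ANOVA yield

"Regression: fit a quadratic polynomial in the variate x"
MODEL y
FIT POL(x; 2)
```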
Data structures store the information on which a Genstat program operates. Structures can be defined, or declared, by a Genstat statement known as a declaration. The directive for declaring each type of structure has the same name as given to that type of structure, for example SCALAR to declare a scalar (or single-valued numerical structure), and so on. These are the directives, with details of their corresponding data structures:
SCALAR single number
VARIATE series of numbers
TEXT series of character strings (or lines of text)
FACTOR series of group allocations (using a pre-defined set of numbers or strings to indicate the groups)
MATRIX rectangular matrix
SYMMETRICMATRIX symmetric matrix
DIAGONALMATRIX diagonal matrix
TABLE table (to store tabular summaries like means, totals etc)
DUMMY single identifier
POINTER series of identifiers (e.g. to represent a set of structures)
EXPRESSION arithmetic expression
FORMULA model formula (to be fitted in a statistical analysis)
LRV latent roots and vectors
SSPM sums of squares and products with associated information such as means
TSM model for Box-Jenkins modelling of time series
It is possible to declare new structures with attributes the same as those of an existing structure.
DUPLICATE forms new data structures with attributes taken from an existing structure
You can also define data structures whose contents are customized for particular tasks.
STRUCTURE defines a customized data structure
DECLARE declares one or more customized data structures
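By way of illustration, a few typical declarations might look like the sketch below. All identifiers and values are invented, and only a handful of the structure types listed above are shown.

```genstat
"A single number"
SCALAR n; VALUE=4
"A series of numbers"
VARIATE [VALUES=1.5,2.0,3.5,4.0] height
"A series of character strings"
TEXT [VALUES=Ann,Bob,Cy,Dee] name
"Group allocations, using the levels 1 and 2"
FACTOR [LEVELS=2; VALUES=1,2,2,1] group
"A 2 by 2 rectangular matrix"
MATRIX [ROWS=2; COLUMNS=2; VALUES=1,2,3,4] m
"A series of identifiers"
POINTER [VALUES=height,group] p
PRINT name,height,group
```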
A Genstat program consists of a sequence of one or more jobs. The first job starts automatically at the start of the program. Subsequent jobs can be initialized by the JOB and ENDJOB directives:
JOB starts a Genstat job (ending the previous one if necessary)
ENDJOB ends a job
The whole program is terminated by a STOP directive:
STOP ends a Genstat program
Statements within a program can be repeated using a FOR loop. The loop is introduced by a FOR statement. This is followed by the series of statements that is to be repeated (that is, the contents of the loop), and the end of the loop is marked by an ENDFOR statement. Parameters of the FOR directive allow lists of data structures to be specified so that the statements in the loop operate on different structures each time that it is executed.
FOR indicates the start of a loop
ENDFOR marks the end of a loop
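The loop described above can be sketched as follows. The identifiers x, y, v and tot and the data values are invented for illustration.

```genstat
VARIATE [VALUES=1,2,3,4,5] x
VARIATE [VALUES=2,4,6,8,10] y
"The dummy v stands for x on the first pass and for y on the second"
FOR v=x,y
  CALCULATE tot = SUM(v)
  PRINT tot
ENDFOR
```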
Genstat has two ways of choosing between sets of statements. The block-if structure consists of one or more alternative sets of statements. The first set is introduced by an IF statement. There may then be further sets introduced by ELSIF statements. Then there may be a final set introduced by an ELSE statement, and the whole structure is terminated by an ENDIF statement. The IF statement, and each ELSIF statement, contains a single-valued logical expression. Genstat evaluates each one in turn and executes the statements following the first TRUE logical found; if none of them is true, Genstat executes the statements following the ELSE statement (if any).
IF introduces a block-if structure
ELSIF introduces an alternative set of statements in a block-if structure
ELSE introduces a default set of statements for a block-if structure
ENDIF marks the end of a block-if structure
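A minimal sketch of a block-if structure, using an invented scalar n:

```genstat
SCALAR n; VALUE=-3
"Each logical expression is tested in turn until one is true"
IF n .GT. 0
  PRINT 'n is positive'
ELSIF n .LT. 0
  PRINT 'n is negative'
ELSE
  PRINT 'n is zero'
ENDIF
```

With n set to -3, the first test is false and the second is true, so only the statement following ELSIF is executed.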
The multiple-selection structure consists of several sets of statements. The first is introduced by a CASE statement. Subsequent sets are introduced by OR statements. There can then be a final, default, set introduced by an ELSE statement, and the end of the structure is indicated by an ENDCASE statement. The parameter of the CASE statement is an expression which must produce a single number. Genstat rounds this to the nearest integer, n say, and then executes the nth set of statements. If there is no nth set, the statements following the ELSE statement are executed (if any).
CASE introduces a multiple-selection structure
OR introduces an alternative set of statements for a multiple-selection structure
ELSE introduces a default set of statements for a multiple-selection structure
ENDCASE marks the end of a multiple-selection structure
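A multiple-selection structure might be laid out as in this sketch, where the scalar choice is an invented name:

```genstat
SCALAR choice; VALUE=2
"The value of choice selects which set of statements is executed"
CASE choice
  PRINT 'first set'
OR
  PRINT 'second set'
OR
  PRINT 'third set'
ELSE
  PRINT 'default set'
ENDCASE
```

Here choice is 2, so the second set is executed; a value such as 7, for which there is no corresponding set, would lead to the statements following ELSE.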
Sequences of statements can be formed into Genstat procedures for convenient future use. The use of a procedure looks just like one of the Genstat directives, with its own options and parameters, which transfer information to and from the procedure. Otherwise the procedure is completely self-contained. The start of a procedure is indicated by a PROCEDURE statement. Then OPTION and PARAMETER statements can be given to define the arguments of the procedure. These are followed by the statements to be executed when the procedure is called, terminated by an ENDPROCEDURE statement.
PROCEDURE introduces a procedure, and defines its name
OPTION defines the options of a procedure
PARAMETER defines the parameters of a procedure
ENDPROCEDURE indicates the end of a procedure
WORKSPACE accesses "private" data structures for use in procedures
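As a sketch only, a small user-written procedure could be defined and called as below. The procedure name VSPREAD, its parameters DATA and RESULT, and the identifiers x and r are all hypothetical; the procedure has no options, so no OPTION statement is given.

```genstat
PROCEDURE 'VSPREAD'
  "DATA supplies the input variate; RESULT receives the answer"
  PARAMETER NAME='DATA','RESULT'; MODE=p
  CALCULATE RESULT = MAX(DATA) - MIN(DATA)
ENDPROCEDURE

"The procedure is then used just like a directive"
VARIATE [VALUES=4,9,2,7] x
VSPREAD DATA=x; RESULT=r
PRINT r
```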
Any control structure (job, block-if structure, loop, multiple-selection structure or procedure) can be abandoned using an EXIT statement. Also, execution of any of these structures can be interrupted explicitly with a BREAK statement, or implicitly by using DEBUG. Once DEBUG has been entered, Genstat will produce breaks automatically at regular intervals, until it meets an ENDDEBUG statement.
EXIT exits from a control structure
BREAK suspends the execution of a control structure
ENDBREAK continues execution of a control structure, following a break
DEBUG can cause a break to take place after the current statement (and at specified intervals thereafter), or immediately after the next fault
ENDDEBUG cancels DEBUG
Macros within a procedure are substituted as soon as they are met during the definition of the procedure. However, it is also possible to execute a set of statements (contained in a text) during execution of the procedure. This can also be useful within loops.
EXECUTE executes the statements contained within a text
In some implementations of Genstat, it is possible to suspend the execution of Genstat and return to the operating system of the computer to execute commands, for example to list or edit files on the computer. Likewise, it may be possible to halt the execution of Genstat to execute some other computer program. The OWN directive provides another way of running a user's program from within Genstat. The OWN subroutine, within the Fortran code of Genstat, needs to be modified to call the program. The new code must then be recompiled and linked into a new version of Genstat.
SUSPEND suspends the execution of Genstat to carry out operating-system commands
PASS runs another computer program, taking data from Genstat and transferring results back
OWN executes the user's own code linked into Genstat
Data can be read into Genstat data structures using the READ directive or the FILEREAD procedure:
READ reads data from an input file, an unformatted file or a text
FILEREAD reads data from a file, assumed to be in a rectangular array
Files can be connected to input, output or other channels during execution of a Genstat program. Channels can also be closed, terminating the connection, so that they can be attached to other files.
OPEN opens files and connects them to Genstat input/output channels
CLOSE closes files, freeing the channels to which they were attached
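The sequence of opening a file, reading from it and closing it can be sketched as follows. The file name growth.dat, the channel number and the identifiers height and weight are assumptions made for illustration, and the data in the file are assumed to be terminated by a colon.

```genstat
"Attach the (hypothetical) file to input channel 2"
OPEN 'growth.dat'; CHANNEL=2; FILETYPE=input
"Read two variates from that channel"
READ [CHANNEL=2] height,weight
"Release the channel so that it can be attached to another file"
CLOSE CHANNEL=2; FILETYPE=input
```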
The channel from which input statements are taken can be changed, as can the channel to which output is sent. It is also possible to send a transcript (or copy) of input and/or output to output files, to skip sections of input or output files, and to obtain information about the files connected to each channel.
INPUT specifies the channel from which subsequent statements should be read
RETURN returns to the previous input channel
OUTPUT specifies the channel to which future output should be sent
COPY requests a transcript of subsequent input and/or output
SKIP skips lines of input or output files
ENQUIRE provides details about files opened by Genstat
The contents of data structures can be "printed" into output files or into text structures, using the PRINT directive. Other directives allow system information or details of attributes of structures to be printed, or syntax details to be obtained. Directive SKIP, as mentioned above, allows blank lines to be inserted in output files; PAGE moves to the top of the next page.
PRINT prints data in tabular form to an output file or text
LIST lists details of the data structures that currently exist in your program
PAGE moves to the top of the next page of an output file
DISPLAY repeats the last Genstat diagnostic
DUMP prints attributes of data structures and other internal information
HELP prints details of the Genstat syntax and environment
Other information is available from the procedures in the help module of the Genstat Procedure Library:
LIBHELP provides help information for Library procedures
LIBEXAMPLE accesses examples and source code of Library procedures
LIBINFORM prints information about the contents of the Procedure Library
LIBMANUAL prints a "Manual" for the Procedure Library
LIBVERSION provides the name of the current Genstat 5 Procedure Library
NOTICE gives access to the Genstat Notice Board (news, errors, instructions for authors of procedures etc.)
Menu-driven interfaces can be defined using the QUESTION directive and invoked using the MENU procedure.
QUESTION obtains a response using a Genstat menu
MENU initiates a menu system
The values of a data structure, with all its defining information, can be stored in a sub-file of a "backing-store" file. It can then be retrieved in a later job, without the need to repeat the definitions. The current state of the whole job can also be dumped to an unformatted file, so that it can be picked up and continued on a later occasion.
STORE stores data structures in a backing-store file
RETRIEVE retrieves data structures from a backing-store file
CATALOGUE displays the contents of a backing-store file
MERGE copies sub-files of backing-store files into a single file
RECORD dumps the complete details of a job
RESUME reads and restarts a recorded job
The directive CALCULATE allows arithmetic calculations on the values of any numeric data structure; logical tests can also be done on numerical and textual values. Functions and operators are available for a very wide range of calculations on matrices and tables. Another general directive is EQUATE, which allows values to be copied from one set of data structures to another; the structures must store values of the same mode (for example, numbers or text), but need not be of the same type. Structure values can be deleted to save space within Genstat; attributes can also be deleted so that the structure can be redefined, for example as another type.
CALCULATE performs arithmetic and logical calculations
DELETE allows values and attributes of data structures to be deleted
EQUATE copies values between sets of data structures
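A short sketch of arithmetic and logical calculations with CALCULATE, using invented identifiers and values:

```genstat
VARIATE [VALUES=1,4,9,16] x
"Arithmetic: the square root of every value of x"
CALCULATE root = SQRT(x)
"Logical test: 1 where x exceeds 5, and 0 elsewhere"
CALCULATE big = x .GT. 5
PRINT x,root,big
```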
There are several general directives for manipulating vectors (variates, factors or texts). Units of vectors can be sorted into systematic order or into random order. A "restriction" can be associated with a vector, so that subsequent statements operate on only a subset of its units. A default length and labelling can be defined for vectors formed later in the job. Facilities for specific types of vector allow interpolation of values for variates, monotonic regression, generation of factor values, concatenation and editing of text.
RESTRICT defines a "restriction" on the units of a vector
SORT sorts units of vectors into alphabetic or numerical order of an index vector, or forms a factor from a variate or text
UNITS defines default length or labelling for vectors defined subsequently in the job
INTERPOLATE calculates variates of interpolated values
MONOTONIC fits an increasing monotonic regression of y on x
GROUPS forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur
CONCATENATE concatenates together lines of text vectors
EDIT line editor for units of text vectors
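Restriction and sorting might be used as in the following sketch; the variate y and its values are invented for illustration.

```genstat
VARIATE [VALUES=5,12,7,20,3] y
"Subsequent statements use only the units where y exceeds 6"
RESTRICT y; CONDITION=y.GT.6
PRINT y
"A RESTRICT statement with no condition removes the restriction"
RESTRICT y
"Sort the units of y into ascending order"
SORT y
PRINT y
```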
Other facilities for vectors are provided by the procedures in the manipulation module of the Genstat Procedure Library, including
APPEND appends a list of vectors of the same type
FACAMEND permutes the levels and labels of a factor
FACPRODUCT forms a factor with a level for every combination of other factors
FRESTRICTEDSET forms vectors with the restricted subset of a list of vectors
GRANDOM generates pseudo-random numbers from probability distributions
ORTHPOL calculates orthogonal polynomials
QUANTILE calculates quantiles of the values in a variate
RANK produces ranks, from the values in a variate, allowing for ties
SAMPLE samples from a set of units, possibly stratified by factors
SPLINE calculates a set of basis functions for M-, B- or I-splines
STANDARDIZE standardizes columns of a data matrix to have mean 0 and variance 1
SUBSET forms vectors containing subsets of the values in other vectors
VEQUATE equates across numerical structures
VINTERPOLATE performs linear and inverse linear interpolation between variates
Tables can be formed containing summaries of values in variates: totals, minimum and maximum values, quantiles, numbers of missing and non-missing values, means and variances. Manipulations of multi-way structures include the ability to add various types of marginal summaries to tables, and to combine "slices" of tables, of matrices or of variates. Directives are also available for eigenvalue and singular-value decompositions of matrices, and to form the values of SSPM structures.
TABULATE forms tables of summaries of the values of a variate
MARGIN calculates or deletes margins of tables
COMBINE combines or omits "slices" of tables, matrices or variates
FLRV calculates latent roots and vectors (that is eigenvalues and eigenvectors)
SVD calculates singular-value decompositions of matrices
FSSPM calculates values for SSPM structures (sums of squares and products, means, etc)
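As an illustrative sketch, a one-way table of summaries could be formed as below. The factor group, the variate y and their values are invented.

```genstat
FACTOR [LEVELS=2; VALUES=1,1,2,2,1,2] group
VARIATE [VALUES=10,12,20,22,14,18] y
"Print the mean and total of y for each level of group"
TABULATE [CLASSIFICATION=group; PRINT=means,totals] y
```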
Procedures in the Library for manipulating tables and matrices include
PERCENT expresses the body of a table as percentages of one of its margins
GINVERSE calculates the generalized inverse of a matrix
LINDEPENDENCE finds the linear relations associated with matrix singularities
MPOWER forms integer powers of a square matrix
Formulae can be interpreted using the FCLASSIFICATION directive.
FCLASSIFICATION forms classification sets for the terms in a formula or breaks a formula up into separate formulae (one for each term)
Values can be assigned to dummies and pointers:
ASSIGN sets values of dummies and pointers
Aspects of the "environment" of the current job can be modified, such as whether or not Genstat starts output from a statistical analysis at the top of a new page, or whether it should pause during interactive output. New defaults can be set for options and parameters. Details of the environmental settings can be copied into Genstat data structures. Attributes of data structures can also be accessed.
SET sets details of the "environment" of a Genstat job
SETOPTION sets or modifies defaults of options of Genstat directives or procedures
SETPARAMETER sets or modifies defaults of parameters of Genstat directives or procedures
GET gets details of the "environment" of a Genstat job
GETATTRIBUTE accesses attributes of data structures
Genstat can plot data on terminals or line-printers. Most Genstat implementations can also produce graphs on higher resolution devices like graphics monitors and plotters. The relevant directives for line-printers or terminals are:
CONTOUR produces contour maps of two-way arrays of numbers
GRAPH produces scatter plots and line graphs
HISTOGRAM plots histograms
For high-resolution graphics, the directives have two main purposes. There are those that define the "graphics environment" for subsequent plots, and those that do the plotting. Often the default environment, set up at the start of a program, will be satisfactory. To change the graphics environment, the following directives can be used:
AXES defines the axes in each graphical window
COLOUR defines the colour map for certain graphics devices
DEVICE switches between graphics devices
FRAME defines the positions of the windows within the frame
PEN defines the properties of the graphics "pens"
DKEEP saves information about the graphics environment
The directives for plotting high-resolution graphs are:
DCONTOUR produces contour maps
DGRAPH produces scatter plots and line graphs
DHISTOGRAM plots histograms
DPIE produces pie charts
DSURFACE draws a perspective plot of a two-way array of numbers
D3HISTOGRAM produces 3-dimensional histograms
DDISPLAY redraws the current graphical display
DCLEAR clears a graphics screen
With interactive graphics devices, information can be read from the screen:
DREAD reads locations of points from an interactive graphics device
Other facilities, provided by procedures in the graphics module of the Library, include:
BANK calculates the optimum aspect ratio for a graph
BARCHART plots a bar chart
BOXPLOT draws box-and-whisker diagrams (schematic plots)
DBARCHART plots bar charts for one- or two-way tables
DOTPLOT produces a dot-plot
DSCATTER produces a scatter-plot matrix
DSHADE produces a pictorial representation of a data matrix
INSIDE determines whether points lie within a specified polygon
RUGPLOT draws "rugplots" to display the distribution of one or more samples
STEM produces a simple stem-and-leaf chart
Many simple statistical operations, such as t-tests, one-way analysis of variance, non-parametric tests, and summary statistics, are provided by procedures in the basic and nonparametric modules of the Library:
AONEWAY provides one-way analysis of variance
CHISQUARE calculates chi-square statistics for one- and two-way tables
CONCORD calculates Kendall's Coefficient of Concordance
DESCRIBE saves and/or prints summary statistics for variates
KAPPA calculates a kappa coefficient of agreement for nominally scaled data
KOLMOG2 performs a Kolmogorov-Smirnov two-sample test
KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance
MANNWHITNEY performs a Mann-Whitney U test
RUNTEST performs a test of randomness of a sequence of observations
SIGNTEST performs a one- or two-sample sign test
SPEARMAN calculates Spearman's Rank Correlation Coefficient
TTEST performs a one- or two-sample t-test
WILCOXON performs a Wilcoxon Matched-Pairs (Signed-Rank) test
There is also a Genstat directive for fitting statistical distributions:
DISTRIBUTION estimates the parameters of continuous and discrete distributions
Genstat provides directives for carrying out linear and nonlinear regression, also generalized linear, generalized additive and generalized nonlinear models. They are designed to allow easy comparison between models, and comparison between groups of data (specified as factors). The directives for nonlinear regression can also be used for general optimization. There are three preliminary directives for defining the form of model to be fitted, of which the MODEL directive must always be given first:
MODEL defines the response variate(s) and the type of model to be fitted
TERMS specifies a maximal model, containing all terms to be used in subsequent regression models
RCYCLE controls iterative fitting of generalized linear models, generalized additive models and nonlinear models, and specifies parameters and bounds for nonlinear models
Separate directives carry out the fitting of the various types of model:
FIT fits a linear model, a generalized linear model, a generalized additive model, or a generalized nonlinear model
FITCURVE fits a standard nonlinear regression model
FITNONLINEAR fits a user-defined nonlinear regression model or optimizes a scalar function
Further directives are provided to allow sequential modification of the set of explanatory variables:
ADD adds extra terms to any type of regression model
DROP drops terms from any type of regression model
adds terms to, or drops them from, any type of regression model
TRY displays results of single-term changes to a linear or generalized linear model
STEP selects terms to include in or exclude from a linear or generalized linear model
The results of fitting the models can be displayed or stored in data structures:
RDISPLAY displays the fit of any type of regression model
RKEEP stores the results from any type of regression model
PREDICT forms predictions from a linear or generalized linear model
RFUNCTION estimates functions of parameters of a nonlinear model
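The way these directives fit together can be sketched for a simple linear regression. The identifiers x1, x2, y, fitted and resid and the data values are invented for illustration.

```genstat
VARIATE [VALUES=1,2,3,4,5,6] x1
VARIATE [VALUES=2,1,4,3,6,5] x2
VARIATE [VALUES=3.1,3.9,7.2,7.8,11.1,12.0] y
"Define the response variate; MODEL must come first"
MODEL y
"Declare the maximal model, then fit and extend it"
TERMS x1 + x2
FIT x1
ADD x2
"Store fitted values and residuals from the current model"
RKEEP FITTEDVALUES=fitted; RESIDUALS=resid
PRINT y,fitted,resid
```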
Procedures relevant to regression analysis, in the regression and glm modules of the Library, include:
RCHECK checks the fit of a regression model
RGRAPH draws a graph to display the fit of a regression model
FITNONNEGATIVE fits a generalized linear model with nonnegativity constraints
FITPARALLEL carries out analysis of parallelism for non-linear functions
FITSCHNUTE fits a general four-parameter growth model to a non-decreasing response variate
GEE fits models to longitudinal data by generalized estimating equations
GLM analyses non-standard generalized linear models
GLMM fits a generalized linear mixed model
IFUNCTION estimates implicit and/or explicit functions of parameters
PAIRTEST performs t-tests for pairwise differences
PPAIR displays results of t-tests for pairwise differences in compact diagrams
RJOINT does modified joint regression analysis for variety-by-environment data
RPAIR gives t-tests for all pairwise differences of means from linear or generalized linear models
XOCATEGORIES performs analyses of categorical data from crossover trials
PROBITANALYSIS fits probit models allowing for natural mortality and immunity
EXTRABINOMIAL fits models to overdispersed proportions
FIELLER calculates effective doses or relative potencies
DILUTION calculates Most Probable Numbers from dilution series data
WADLEY fits models for Wadley's problem, allowing alternative links and errors
Design and analysis of experiments
Genstat has a very general algorithm for analysis of variance of balanced experiments. There are several directives to define the various aspects of the model to be fitted:
BLOCKSTRUCTURE defines the blocking structure of the design, and hence the strata and error terms
COVARIATE specifies covariates for analysis of covariance
TREATMENTSTRUCTURE defines the treatment (or systematic) terms
For unstructured designs with a single error term, BLOCKSTRUCTURE need not be specified, and COVARIATE is needed only for analysis of covariance. Once the model has been defined, the y-variates can be analysed using the ANOVA directive:
ANOVA performs analysis of variance
Directives are available to save information in Genstat data structures, or to produce further output:
ADISPLAY displays further output from analyses produced by ANOVA
AKEEP copies information from an ANOVA analysis into Genstat data structures
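A minimal sketch of an analysis of a randomized-block design, with three blocks and two varieties. The identifiers block, variety, yield and vmeans and all data values are invented for illustration.

```genstat
FACTOR [LEVELS=3; VALUES=1,1,2,2,3,3] block
FACTOR [LEVELS=2; VALUES=1,2,1,2,1,2] variety
VARIATE [VALUES=4.1,5.0,3.8,4.9,4.4,5.3] yield
"Define the blocking and treatment terms, then analyse yield"
BLOCKSTRUCTURE block
TREATMENTSTRUCTURE variety
ANOVA [PRINT=aovtable,means] yield
"Copy the table of variety means into a data structure"
AKEEP TERMS=variety; MEANS=vmeans
PRINT vmeans
```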
Procedures relevant to analysis of variance, in the aov module of the Library, include:
AGRAPH plots one- or two-way tables of means from ANOVA
APLOT plots residuals from an ANOVA analysis
DAPLOT plots residuals from ANOVA in high-resolution, with interactive identification of outliers
ASTATUS provides information about the settings of ANOVA models and variates
A2PLOT plots effects from two-level designs with robust s.e. estimates
ABIVARIATE produces graphs and statistics for bivariate analysis of variance
ALIAS finds out information about aliased model terms in analysis of variance
AREPMEASURES produces an analysis of variance for repeated measurements
AUNBALANCED performs analysis of variance for unbalanced designs
AUDISPLAY produces further output for an unbalanced design (after AUNBALANCED)
CENSOR pre-processes censored data before analysis by ANOVA
CINTERACTION clusters rows and columns of a two-way interaction table
DIALLEL analyses full and half diallel tables with parents
NLCONTRASTS fits non-linear contrasts to quantitative factors in ANOVA
The REML algorithm is available for estimating variance components and for analysing unbalanced designs.
REML fits a variance-component model by residual (or restricted) maximum likelihood
VCOMPONENTS defines the model for REML
VDISPLAY displays further output from a REML analysis
VKEEP copies information from a REML analysis into Genstat data structures
VSTRUCTURE defines a variance structure for random effects in a REML model
VPEDIGREE generates an inverse relationship matrix for use when fitting animal or plant breeding models by REML
VSTATUS prints the current model settings for REML
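As a sketch, a simple variance-component model with one fixed and one random term might be specified as follows. The factors block and variety and the variate yield are hypothetical and are assumed to have been declared and given values already.

```genstat
"variety is a fixed term and block a random term"
VCOMPONENTS [FIXED=variety] RANDOM=block
"Fit the model and print variance components and predicted means"
REML [PRINT=components,means] yield
```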
Procedures relevant to REML include:
VFUNCTION calculates functions of variance components from a REML analysis
VHOMOGENEITY tests homogeneity of variances
VPLOT plots residuals from a REML analysis
Directives are available for generating the values of factors for experimental designs, and for randomization.
FKEY forms design keys for multi-stratum experimental designs, allowing for confounding and aliasing of treatments
FPSEUDOFACTORS determines patterns of confounding and aliasing from design keys, and extends the treatment formula to incorporate the necessary pseudo-factors
GENERATE generates values of factors in systematic order or as defined by a design key, or forms values of pseudo-factors
RANDOMIZE puts units of vectors into random order, or randomizes units of an experimental design
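The generation and randomization of a small design could be sketched as below, for three blocks of four plots with one treatment factor. The identifiers block, plot and treat and the seed value are arbitrary choices made for illustration.

```genstat
"12 units in total: 3 blocks of 4 plots"
UNITS [NVALUES=12]
FACTOR [LEVELS=3] block
FACTOR [LEVELS=4] plot,treat
"Form factor values in systematic order; the last factor changes fastest"
GENERATE block,plot
GENERATE block,treat
"Randomize the treatments over plots within blocks"
RANDOMIZE [BLOCKSTRUCTURE=block/plot; SEED=12345] treat
PRINT block,plot,treat
```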
Relevant procedures in the design module of the Library include:
DESIGN acts as a menu-driven interface to the Genstat design system, providing a convenient way of selecting and generating various types of factorial designs, also fractional factorial, lattice, alpha, balanced-incomplete-block, Box-Behnken, central composite, cyclic, neighbour-balanced and Plackett-Burman designs; for those who prefer a command-based interface, the procedures that it uses (AGDESIGN, AKEY, AGHIERARCHICAL, AGFRACTION, AGALPHA, AFALPHA, AGCYCLIC, AFCYCLIC, AGBIB, AGBOXBEHNKEN, AGCENTRALCOMPOSITE, AGMAINEFFECT and AGNEIGHBOUR) can also be called directly
AFORMS prints data forms for an experimental design
AFUNITS forms a factor to index the units of the final stratum of a design
AMERGE merges extra units into an experimental design
APRODUCT forms a new experimental design from the product of two designs
ARANDOMIZE randomizes and prints an experimental design
DDESIGN plots the plan of an experimental design
FACPRODUCT forms a factor with a level for every combination of other factors
PDESIGN prints or stores treatment combinations tabulated by the block factors
Multivariate analysis and cluster analysis
Several standard multivariate methods are provided by Genstat directives. These include methods that analyse data in the form of units-by-variates, and methods that use a similarity or distance matrix.
The following directives carry out standard multivariate analyses:
CVA canonical variates analysis
PCP principal components analysis
PCO principal coordinates analysis
ROTATE Procrustes rotation
MDS non-metric multidimensional scaling
Separate directives are available to process results from multivariate analyses:
FACROTATE rotates factor loadings from a PCP or CVA
ADDPOINTS adds points for new objects to a PCO
RELATE relates principal coordinates to original data variates
The following directives are used for hierarchical or non-hierarchical cluster analysis:
FSIMILARITY forms a similarity matrix or a between-group similarity matrix from a units-by-variates data matrix
REDUCE forms a reduced similarity matrix (by groups)
HCLUSTER hierarchical cluster analysis from a similarity matrix
CLUSTER non-hierarchical clustering from a data matrix
Separate directives that process the results from hierarchical cluster analyses are:
HDISPLAY displays results associated with hierarchical clustering
HLIST lists a data matrix in abbreviated form
HSUMMARIZE summarizes data variates by clusters
Other multivariate techniques are provided by procedures in the mva module of the Library:
BIPLOT produces a biplot from a set of variates
CANCOR does canonical correlation analysis
CINTERACTION clusters rows and columns of a two-way interaction table
CLASSIFY obtains a starting classification for non-hierarchical clustering
CONVEXHULL finds the points of a single or a full peel of convex-hulls
CORRESP does correspondence analysis, or reciprocal averaging
CVAPLOT plots the mean and unit scores from a canonical variate analysis
CVASCORES calculates scores for individual units in canonical variate analysis
DDENDROGRAM draws dendrograms with control over structure and style
DISCRIMINATE performs discriminant analysis
DMST gives a high-resolution plot of an ordination with minimum spanning tree
DPARALLEL displays multivariate data using parallel coordinates
FITMULTIVARIATE performs multivariate linear regression with accumulated testing of terms
GENPROC performs a generalized Procrustes analysis
LRVSCREE prints a scree diagram and/or a difference table of latent roots
MANOVA performs multivariate analysis of variance and covariance
MULTMISS estimates missing values for units in a multivariate data set
NORMTEST performs tests of univariate and/or multivariate normality
PCOPROC performs a multiple Procrustes analysis
PLS fits a partial least squares regression model
RIDGE produces ridge regression and principal component regression analyses
ROBSSPM forms robust estimates of sum-of-squares-and-products matrices
SKEWSYMM provides an analysis of skew-symmetry for an asymmetric matrix
Genstat provides several methods for examining and analysing time series. Sample correlation functions are produced by the directive CORRELATE:
CORRELATE forms correlations between variates, autocorrelations of variates, and lagged cross-correlations between variates
The analysis of Box-Jenkins models is specified by several directives:
FTSM forms preliminary estimates of parameters in time-series models
TRANSFERFUNCTION specifies input series and transfer-function models for subsequent estimation of a model for an output series
ESTIMATE estimates parameters in Box-Jenkins models for time series
Information can be saved in Genstat data structures, or further output can be produced:
TDISPLAY displays further output after an analysis by ESTIMATE
TKEEP saves results after an analysis by ESTIMATE
FORECAST forecasts future values of a time series
TSUMMARIZE displays characteristics of a time series model
It is also possible to filter a time series, or perform spectral analysis via the Fourier transform of a time series using the directives:
FILTER filters time series by time-series models
FOURIER calculates cosine or Fourier transforms of a real or complex series
Procedures in module timeseries of the Library include:
BJESTIMATE fits an ARIMA model, with forecasts and residual checks
BJFORECAST plots forecasts of a time series using a previously fitted ARIMA model
BJIDENTIFY displays time series statistics useful for ARIMA model selection
PERIODTEST gives periodogram-based tests for white noise in time series
PREWHITEN filters a time series before spectral analysis
SMOOTHSPECTRUM forms smoothed spectrum estimates for univariate time series
Directives are available for forming variograms and for producing kriged estimates:
FVARIOGRAM forms auto-variograms for individual variates or cross-variograms for pairs of variates
KRIGE calculates kriged estimates using a model fitted to a sample variogram
Procedures in the spatialstatistics module of the Library include:
MVARIOGRAM fits models to an experimental variogram
LVARMODEL analyses a field trial using the Linear Variance Neighbour model
DPOLYGON draws polygons using high-resolution graphics
DPTMAP draws maps for spatial point patterns using high-resolution graphics
DPTREAD adds points interactively to a spatial point pattern
DRPOLYGON reads a polygon interactively from the current graphics device
GRLABEL randomly labels two or more spatial point patterns
GRTHIN randomly thins a spatial point pattern
GRTORSHIFT performs a random toroidal shift on a spatial point pattern
PTBOX generates a box bounding or surrounding a spatial point pattern
PTCLOSEPOLYGON closes open polygons
PTDESCRIBE gives summary and second order statistics for a point process
PTREMOVE removes points interactively from a spatial point pattern
The Procedure Library covers many other areas of statistics, including analysis of repeated measurements, exact tests, sample re-use and survival analysis:
ANTORDER assesses order of ante-dependence for repeated measures data
ANTTEST calculates overall tests based on a specified order of ante-dependence
AREPMEASURES produces an analysis of variance for repeated measurements
CUMDISTRIBUTION fits frequency distributions to accumulated counts
DREPMEASURES plots profiles and differences of profiles for repeated measures data
VORTHPOL calculates orthogonal polynomial time-contrasts for repeated measures
FEXACT2X2 does Fisher's exact test for 2×2 tables
GEE fits models to longitudinal data by generalized estimating equations
BOOTSTRAP produces bootstrapped estimates, standard errors and distributions
JACKKNIFE produces jackknife estimates and standard errors
KAPLANMEIER calculates the Kaplan-Meier estimate of the survivor function
RPROPORTIONAL fits the proportional hazards model to survival data as a GLM
RSURVIVAL models survival times of exponential, Weibull or extreme-value distributions
These functions calculate scalar summaries of values in any numerical structure:
SUM Arithmetic sum
TOTAL Synonym for SUM
MEAN Average
MEDIAN Median value
MINIMUM Minimum value
MAXIMUM Maximum value
CORRELATION Correlation
COVARIANCE Covariance
VARIANCE Variance
NVALUES Number of values
NOBSERVATIONS Number of observations
NMV Number of missing values
This function gives the number of levels in a factor:
NLEVELS Number of levels of a factor
This function evaluates the area under a curve defined by two variates:
AREA Estimates the area under a curve
This function indicates whether a dummy structure has been set:
UNSET Returns 0 or 1 according as a dummy is set or unset
These functions transform each value of numerical structures:
EXP Exponential
LOG Natural logarithm
LOG10 Logarithm base 10
SQRT Square root
SIN Sine
ARCSIN Inverse sine
COS Cosine
ARCCOS Inverse cosine
ANGULAR Angular transform
IANGULAR Inverse angular
LOGIT Logit
ILOGIT Inverse logit
CLOGLOG Complementary log-log
ICLOGLOG Inverse complementary log-log
ABS Absolute value
MODULO Modulo
INTEGER Integer part
ROUND Nearest integer
CUMULATE Cumulative sums
DIFFERENCE Differences
SORT Ordered values
REVERSE Reversed series
SHIFT Shift a series
CIRCULATE Circulate a series
MVINSERT Missing values inserted at specified positions
MVREPLACE Missing values replaced by specified values
NEWLEVELS Factor levels replaced by specified values
POSITION Locate position within a vector
These functions provide cumulative lower probabilities from continuous or discrete probability distributions:
CLNORMAL Normal (Synonym NORMAL)
CLLOGNORMAL Log-normal
CLT t-distribution
CLCHISQUARE Chi-square (Synonym CHISQ)
CLF F-distribution (Synonyms FRATIO, FPROBABILITY)
CLBVARIATENORMAL Bivariate Normal
CLBETA Beta
CLGAMMA Gamma
CLBINOMIAL Binomial
CLPOISSON Poisson
CLHYPERGEOMETRIC Hypergeometric
These functions provide cumulative upper probabilities from continuous or discrete probability distributions:
CUNORMAL Normal
CULOGNORMAL Log-normal
CUT t-distribution
CUCHISQUARE Chi-square
CUF F-distribution
CUBVARIATENORMAL Bivariate Normal
CUBETA Beta
CUGAMMA Gamma
CUBINOMIAL Binomial
CUPOISSON Poisson
CUHYPERGEOMETRIC Hypergeometric
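For a continuous distribution the lower and upper cumulative probabilities at the same value add up to 1. The Python sketch below checks this for the Normal distribution using only the standard library; it illustrates the relationship and is not how Genstat computes these functions. The helper name `lower_upper_normal` is hypothetical. For discrete distributions, whether the boundary value belongs to the upper tail is not stated in this section, so only a continuous case is shown.

```python
from statistics import NormalDist


def lower_upper_normal(x, mean=0.0, sd=1.0):
    """Return (P(X <= x), P(X > x)) for a Normal distribution (illustrative only)."""
    dist = NormalDist(mean, sd)
    lower = dist.cdf(x)
    # By symmetry about the mean, P(X > x) equals P(X <= 2*mean - x).
    upper = dist.cdf(2.0 * mean - x)
    return lower, upper


lower, upper = lower_upper_normal(1.96)
```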
These functions provide the equivalent deviate (that is, the inverse of the cumulative probability transform) from continuous or discrete probability distributions:
EDNORMAL Normal (Synonym NED)
EDLOGNORMAL Log-normal
EDT t-distribution
EDCHISQUARE Chi-square (Synonym CED)
EDF F-distribution (Synonym FED)
EDBETA Beta
EDGAMMA Gamma
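An equivalent deviate is the value whose cumulative lower probability equals a given probability. The Python sketch below shows the idea for the standard Normal distribution with the standard library's `NormalDist`; the helper name `equivalent_deviate` is hypothetical and the code is an analogue of the concept, not Genstat's own implementation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def equivalent_deviate(p):
    """Value x with P(X <= x) = p for a standard Normal X; requires 0 < p < 1."""
    return std_normal.inv_cdf(p)


x = equivalent_deviate(0.975)   # close to the familiar 1.96
round_trip = std_normal.cdf(x)  # applying the cumulative probability recovers p
```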
These functions provide point probabilities from continuous or discrete probability distributions:
PRBETA Beta
PRBINOMIAL Binomial
PRCHISQUARE Chi-square
PRF F-distribution
PRGAMMA Gamma
PRHYPERGEOMETRIC Hypergeometric
PRLOGNORMAL Log-normal
PRNORMAL Normal
PRPOISSON Poisson
PRT t-distribution
These functions provide log-likelihoods from continuous or discrete probability distributions:
LLNORMAL Normal (Synonym LLN)
LLGAMMA Gamma (Synonym LLG)
LLBINOMIAL Binomial (Synonym LLB)
LLPOISSON Poisson (Synonym LLP)
A vector is a structure with a series of values: variate, text or factor. These functions form summaries for each set of corresponding values of a list of vectors:
VSUMS Arithmetic sums
VTOTALS Synonym for VSUMS
VMEANS Averages
VMEDIANS Median values
VMINIMA Minimum values
VMAXIMA Maximum values
VCOVARIANCE Covariances
VCORRELATION Correlation
VVARIANCES Variances
VNOBSERVATIONS Numbers of observations
VNVALUES Numbers of values
VNMV Numbers of missing values
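"Each set of corresponding values" means that the summary is taken across the vectors, unit by unit, so the result has one value per unit. A minimal Python sketch of that behaviour follows; the helpers `vsums` and `vmeans` are hypothetical analogues written for illustration, and missing values are not handled.

```python
def vsums(*vectors):
    """Unit-by-unit sums across several equal-length vectors (illustrative only)."""
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError("supply at least one vector, all of the same length")
    # zip(*vectors) groups the first values together, then the second values, and so on.
    return [sum(values) for values in zip(*vectors)]


def vmeans(*vectors):
    """Unit-by-unit means across several equal-length vectors."""
    return [total / len(vectors) for total in vsums(*vectors)]


a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]
sums = vsums(a, b, c)
means = vmeans(a, b, c)
```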
These functions perform matrix operations:
PRODUCT Matrix product (the same as the operator *+)
LTPRODUCT Product after transposing left matrix, i.e. L' *+ R
RTPRODUCT Product after transposing right matrix, i.e. L *+ R'
QPRODUCT Quadratic product, i.e. M *+ S *+ M'
DETERMINANT Determinant of a square matrix
INVERSE Inverse of a square matrix
TRANSPOSE Transpose of a matrix, i.e. M'
TRACE Trace of a square matrix
CHOLESKI Choleski decomposition of a matrix
CORRMAT Correlation matrix derived from a symmetric matrix
SUBMAT Forms sub-triangles or sub-rectangles
SOLUTION Solution of simultaneous linear equations
In addition, the following functions give information about matrices:
NCOLUMNS Gives the number of columns of a matrix
NROWS Gives the number of rows of a matrix
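The product formulae listed above (L' *+ R, L *+ R' and M *+ S *+ M') can be sketched in plain Python with matrices held as lists of rows. The helper names are hypothetical and simply mirror the formulae; the argument order shown for `qproduct` is an assumption, and none of this is Genstat code.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*m)]


def product(left, right):
    """Ordinary matrix product of two conformable matrices."""
    if any(len(row) != len(right) for row in left):
        raise ValueError("columns of left must equal rows of right")
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
            for row in left]


def ltproduct(left, right):
    """Product after transposing the left matrix: L' times R."""
    return product(transpose(left), right)


def rtproduct(left, right):
    """Product after transposing the right matrix: L times R'."""
    return product(left, transpose(right))


def qproduct(m, s):
    """Quadratic product: M times S times M'."""
    return product(product(m, s), transpose(m))


L = [[1, 2], [3, 4]]
lt = ltproduct(L, [[5], [6]])                   # L' times a 2-by-1 matrix
rt = rtproduct(L, [[5, 6]])                     # L times the transpose of a 1-by-2 matrix
q = qproduct([[1, 2]], [[2, 0], [0, 3]])        # 1-by-1 result
```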
These functions perform operations on text structures:
CHARACTERS Length of each line of a text
GETFIRST Position of first non-space character in each string
GETLAST Position of last non-space character in each string
GETPOSITION Position of a string in a text
These functions form marginal summaries of tables:
TSUMS Arithmetic sums
TTOTALS Synonym for TSUMS
TMEANS Averages
TMEDIANS Median values
TMINIMA Minimum values
TMAXIMA Maximum values
TVARIANCES Variances
TNOBSERVATIONS Numbers of observations
TNVALUES Numbers of values
TNMV Numbers of missing values
Subsets of elements can be specified with the following functions:
EXPAND Forms a logical variate indicating selected units from a variate of unit numbers
POSITION Finds the positions of values within any vector
RESTRICTION Forms a logical variate indicating currently restricted units of a vector
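The idea behind EXPAND, turning a list of unit numbers into a 0/1 indicator with one value per unit, can be sketched in Python as follows. The helper `expand`, its second argument giving the total number of units, and the numbering of units from 1 are all assumptions made for this illustration.

```python
def expand(unit_numbers, length):
    """0/1 list of the given length with 1 at each listed unit number (units counted from 1)."""
    flags = [0] * length
    for unit in unit_numbers:
        if not 1 <= unit <= length:
            raise ValueError(f"unit number {unit} is outside 1..{length}")
        flags[unit - 1] = 1
    return flags


selected = expand([2, 5], 6)
```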
In expressions, subsets of structure values can be referred to by qualified identifiers, or by the function:
ELEMENTS Selects values of any structure
Random numbers can be generated with the function:
URAND Generates uniform random numbers in the range (0, 1)
Random numbers from alternative probability distributions can be generated by the use of URAND in conjunction with the probability functions, e.g.
CALCULATE Z = EDNORMAL(URAND(0;100))
will give standard Normally distributed random numbers in Z.
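This works because applying the equivalent deviate (inverse cumulative) function of a distribution to uniform random numbers on (0, 1) yields random numbers from that distribution. A Python sketch of the same technique is given below, assuming that 100 values are wanted; Python's generator and seeding differ from URAND, so the numbers will not match Genstat's, and the helper name `normal_sample` is hypothetical.

```python
import random
from statistics import NormalDist


def normal_sample(seed, n):
    """Draw n standard Normal values by inverting the Normal cumulative probability."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    sample = []
    while len(sample) < n:
        u = rng.random()  # uniform on [0, 1)
        # inv_cdf needs 0 < u < 1, so an exact zero is discarded and redrawn.
        if u > 0.0:
            sample.append(std_normal.inv_cdf(u))
    return sample


z = normal_sample(12345, 100)
```

Replacing `NormalDist().inv_cdf` with the inverse cumulative function of another distribution gives random numbers from that distribution instead.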
Contrasts can be specified in treatment formulae with the following functions:
COMPARISON Comparisons amongst the levels of a factor
POL Orthogonal polynomial contrasts of factor levels
POLND As POL, assigning deviations to error
REG Contrasts specified by a matrix of coefficients
REGND As REG, assigning deviations to error
Contrasts can be specified in regression models with the following functions:
POL Polynomial contrasts of factor levels or of variate values
REG Contrasts specified by a matrix of coefficients for factors, or by a transposed data matrix for variates (also orthogonal polynomials)
Smoothing of explanatory variates in a linear or generalized linear model can be specified by the function:
SSPLINE Smoothing spline effect of a variate (synonym S)
CONSTANTS(g) (or C(g)) Provides the value of a constant, according to the contents of g: e (for a string of 'e' or 'E'), pi ('pi' or 'PI'), or the missing value ('*').