Genstat Command Language Reference 

 

 

Genstat is a powerful statistical system. You can use it either by selecting from menus or by typing commands. This version of Genstat is designed to be used in a windowed environment, providing many of the customary features such as multiple windows and pull-down menus. If you choose to type commands, they may be in the form of any of the standard directives in the Genstat language, or of any of the standard procedures in the Genstat Procedure Library. Most commands allow you to use expressions to represent calculations or formulae, and these may include any of the functions provided in the language.

 

Further information is available about:

 

the Genstat Language -

Syntax

Glossary of terminology

List of directives

List of procedures

List of functions for expressions

List of functions for formulae

Genstat faults

 

Commands associated with -

Data structures

Program control

Data handling

Input and output

Calculations and manipulation

Graphics

Basic statistics

Regression analysis

Design and analysis of experiments

Multivariate analysis and cluster analysis

Time series

Spatial statistics

Other statistical methods

 

Function types -

Summary functions

Transformations

Probability functions

Vector functions

Matrix functions

String functions

Table functions

Subset functions

Random functions

Treatment functions

Regression functions

Constant functions

 

Authors and copyright

 

Syntax

 

Input to Genstat is known as a Genstat program. This is made up of statements, each of which uses either one of the standard Genstat commands (known as directives) or a Genstat procedure, that is, a subprogram of statements. You can write your own procedures, or use those in the Library distributed with Genstat, or in the library provided at your site.

Whether the statement uses a directive or a procedure, the syntax is identical. First you give the name of the directive (or procedure), then options, and then parameters. Finally, you indicate the end of the statement, either by typing a colon or by ending the line (by typing <RETURN>). Long statements can be continued onto succeeding lines by typing the continuation character (\) before <RETURN>.

Some statements will have neither options nor parameters: for example

PAGE

to start a new page in output. Others may have no options: for example

PRINT STRUCTURE=X,Y; DECIMALS=0,2

prints the contents of data structures X and Y with zero and two decimal places respectively. In this statement, there are two parameter settings defining two lists running in parallel. Parameter settings are always in parallel like this, and are separated from one another by semicolons. Options are enclosed in square brackets, and set aspects that apply to all the (parallel) parameter values. They are also separated from one another by semicolons. For example

PRINT [CHANNEL=2; INDENTATION=5] STRUCTURE=X,Y; DECIMALS=0,2

prints X and Y to output channel 2 with a five-character indentation at the start of each line. Nearly all options, and some parameters, have default values chosen to be those required most often, and so will usually not need to be set.

Settings of options and parameters can be lists (as above), expressions or formulae. Lists may be of numbers (as with DECIMALS above), identifiers (as with STRUCTURE) or strings. An identifier is the name that you have given to a Genstat data structure (for example X or Y), and which will be used to refer to it in the program. Identifiers must start with a letter (for Genstat this means the alphabetic characters A to Z, in capitals or lower case, as well as the percent and underline characters) and can then contain letters or digits (the numerical characters 0 to 9); Genstat takes notice of only the first eight characters. Where a list of identifiers provides input to a directive or procedure, you can put an expression instead; this will then be evaluated (to give a list of identifiers containing the results) before the directive or procedure is used. A string is a list of characters. Strings occur within the Text data structure, or as the settings for some options and parameters. Usually the start and end of the string must each be marked by a single quote ('). The separator between items in lists is a comma; spaces can be included anywhere between items but do not act as separators. Formal definitions of expressions, formulae, and all the other concepts of the Genstat language are in 1.2.

Names of directives, options and parameters are examples of Genstat system words. They can be given in capital or small letters (or in mixtures of both), and can always be abbreviated to four characters. In fact, names of options and parameters can often be abbreviated further, and there are also rules by which the option or parameter name, with its accompanying equals character, can be omitted altogether. The most useful of these is that, if the first parameter of the directive is the one that comes first in the statement, then the name of the parameter can be omitted: for example

PRINT [CHANNEL=2; INDENTATION=5] X,Y; DECIMALS=0,2

as STRUCTURE is the first parameter of PRINT. The same rule holds for options:

PRINT [2; INDENTATION=5] X,Y; DECIMALS=0,2

as CHANNEL is the first option of PRINT. Full details of the rules are in 1.2.
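The abbreviation and omission rules can be modelled outside Genstat. The following is a minimal Python sketch of one reading of the rules as the glossary entries for Option and Parameter state them. Python is used only as a neutral notation: the function names and the prescribed order FIRST, FORMAT, SECOND are hypothetical and do not belong to any real directive.

```python
def shortest_abbreviation(prescribed, name):
    """Shortest prefix of `name` that no earlier name in `prescribed` starts with."""
    earlier = prescribed[:prescribed.index(name)]
    for length in range(1, len(name) + 1):
        prefix = name[:length]
        if not any(other.startswith(prefix) for other in earlier):
            return prefix
    return name  # an earlier name starts with the whole of `name`


def resolve_settings(prescribed, settings):
    """Attach names to settings given as (name or None, value) pairs.

    An unnamed first setting takes the first prescribed name; an unnamed
    later setting takes the name that follows the previous setting's name.
    """
    resolved = {}
    previous = -1  # position of the previous setting in the prescribed order
    for name, value in settings:
        position = previous + 1 if name is None else prescribed.index(name)
        if position >= len(prescribed):
            raise ValueError("no name left for an unnamed setting")
        resolved[prescribed[position]] = value
        previous = position
    return resolved


ORDER = ["FIRST", "FORMAT", "SECOND"]  # hypothetical prescribed order
print(shortest_abbreviation(ORDER, "FORMAT"))               # FO
print(resolve_settings(ORDER, [(None, 2), ("SECOND", 5)]))  # {'FIRST': 2, 'SECOND': 5}
```

The sketch expects names to be written in full in `settings`; it does not expand abbreviations, and it does not model the guarantee that four characters are always sufficient for directives.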

A final point about the first parameter is that its setting determines the length of the parallel lists. The lists for other parameters will be repeated (or recycled) if they are shorter. (If they are longer, Genstat gives an error diagnostic.) For example

PRINT A,B,C,D; DECIMALS=0,2

prints A with zero decimal places, B with two, and then (recycling the DECIMALS list), C with zero and D with two.
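The recycling rule can be illustrated with a short Python sketch. It is a model of the behaviour described above, not Genstat code, and the function name `pair_parameters` is hypothetical.

```python
def pair_parameters(first, *others):
    """Pair each item of the first parameter list with recycled items of the others."""
    for other in others:
        if not other:
            raise ValueError("this sketch needs non-empty lists")
        if len(other) > len(first):
            # Genstat reports an error diagnostic in this situation.
            raise ValueError("parameter list longer than the first parameter list")
    return [
        (item, *(other[i % len(other)] for other in others))
        for i, item in enumerate(first)
    ]


# Models PRINT A,B,C,D; DECIMALS=0,2
print(pair_parameters(["A", "B", "C", "D"], [0, 2]))
# [('A', 0), ('B', 2), ('C', 0), ('D', 2)]
```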

 

Glossary

 

 

The glossary gives a brief explanation of the terminology of the Genstat language: Bracket, Character, Comment, Data structure, Diagonal matrix, Digit, Directive, Expression, Factor, Formula, Function, Identifier, Item, Letter, List, LRV structure, Macro, Matrix, Missing value, Multiplier, Number, Operator, Option, Parameter, Pointer, Procedure, Procedure Library, Program, Progression, Punctuation symbol, Qualified identifier, Scalar, Special symbol, SSPM structure, Statement, String, Subset selection, Suffix, Symmetric matrix, System word, Table, Text, TSM structure, Unnamed structure, Variate, and Vector.

 

Bracket

 

Round brackets ( ) are used to enclose a list of numbers to be pre- or post-multiplied or to enclose the arguments of a function; they also occur in expressions.

Square brackets [ ] are used to enclose a list of option settings or to enclose the suffix list of a pointer; also, when preceded by $, they enclose lists of unit names or numbers for a qualified identifier.

Curly brackets { } are each synonymous with the corresponding square bracket.

 

Character

The characters used to form Genstat statements are a subset of those available on most computers. For the Genstat language they are classified as brackets, digits, letters, punctuation symbols, simple operators, or special symbols.

 

Comment

A comment consists of any series of characters that the computer can represent, enclosed by double quotes ("); comments are ignored and can appear anywhere in a Genstat program.

 

Data structure

These are used to store information within Genstat, such as numbers, character strings or even identifiers of other data structures. Directives known as declarations are available to form each of the available types.

 

Diagonal matrix

is a data structure that stores the diagonal elements of a square matrix whose other values are all zero. Diagonal matrices can be declared using the DIAGONALMATRIX directive.

 

Digit

The numerical characters 0 to 9 are known as digits in Genstat.

 

Directive

is a standard form of instruction in the Genstat language requesting a particular action or analysis. All Genstat 5 directives have the same syntax.

Directive name is a system word used to request a particular action or analysis from Genstat. Directive names may be abbreviated to four characters; if characters 5-8 are given, they must match the standard form, e.g. TREATMENTSTRUCTURE can be written as TREA, TREAT, TREATM, and so on, but not as TREATS. (Also see procedure.)
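As an illustration, the abbreviation rule for directive names might be checked as in the Python sketch below. The function name is hypothetical. The sketch assumes that characters after the eighth are not checked, and that a name shorter than four characters must be typed in full; neither point is stated in this entry.

```python
def matches_directive_name(typed, standard, minimum=4, checked=8):
    """Return True if `typed` is an acceptable abbreviation of `standard`."""
    typed, standard = typed.upper(), standard.upper()  # case is not significant
    if len(typed) < min(minimum, len(standard)):
        return False  # too short to identify the directive
    # Assumption: only the first eight characters are compared.
    return standard.startswith(typed[:checked])


for attempt in ("TREA", "TREAT", "TREATM", "TREATS"):
    print(attempt, matches_directive_name(attempt, "TREATMENTSTRUCTURE"))
# TREA True
# TREAT True
# TREATM True
# TREATS False
```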

 

Expression

is an arithmetic expression consisting of lists and functions separated by operators. An expression data structure stores a Genstat expression, and can be declared using the EXPRESSION directive.

 

Factor

is a data structure that specifies an allocation of the units into groups. It is thus a vector that, unlike the variate or the text, takes only a limited set of values, one for each group. The groups are referred to by numbers known as levels; you can also define textual labels. Factors can be declared using the FACTOR directive.

 

Formula

is a model formula of lists and operators defining the list of model terms involved in an analysis. A formula data structure stores a Genstat formula, and can be defined using the FORMULA directive.

 

Function

denotes a standard operation in an expression or formula, with the form "function-name (sequence of lists and/or expressions separated by ;)". The function-name is a system word and may be abbreviated to four characters; if characters 5-8 are given, they must match the standard form. A wide range of functions is available, for operations ranging from transformations to the calculation of summary statistics.

 

Identifier

is the name given to a particular data structure within a Genstat program. The first character of an identifier must be a letter; any others can be either letters or digits. Only the first eight characters are significant; subsequent characters are ignored. The directive SET allows you to specify whether or not the case of the letters (small or capital) is to be significant; e.g. whether LENGTH is the same as Length.
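The rules for identifiers can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, and suffixes and qualifiers are not handled. Because the SET directive controls whether case is significant, the sketch takes that choice as an explicit argument rather than assuming a default.

```python
import string

LETTERS = set(string.ascii_letters + "_%")  # Genstat counts _ and % as letters
DIGITS = set(string.digits)


def is_identifier(name):
    """A letter first, then letters or digits."""
    return (
        bool(name)
        and name[0] in LETTERS
        and all(ch in LETTERS or ch in DIGITS for ch in name[1:])
    )


def same_identifier(a, b, case_significant):
    """Compare two identifiers on their first eight characters only."""
    a, b = a[:8], b[:8]
    if not case_significant:
        a, b = a.upper(), b.upper()
    return a == b


print(is_identifier("%rate"), is_identifier("1X"))                      # True False
print(same_identifier("Temperature1", "Temperature2", True))           # True
print(same_identifier("LENGTH", "Length", case_significant=False))     # True
```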

 

Item

is a number, a string, an identifier, a system word, a missing value, or an operator.

 

Letter

Letters in Genstat are the upper-case (capital) letters A to Z, the lower-case letters a to z, the underline symbol (_), and the percent character (%).

 

List

is a sequence of items separated by commas. In an identifier list, each item is an identifier or an unnamed structure, while number or string lists contain numbers or strings respectively. Lists can contain pre- or post-multipliers. Identifier and number lists can contain progressions.

 

LRV structure

is a compound data structure storing latent roots and vectors, mainly used in multivariate analysis. They can be declared using the LRV directive.

 

Macro

is a Genstat text structure containing a section of a Genstat program. The text must have an unsuffixed identifier. It can be substituted into the program by giving its identifier, preceded by a contiguous pair of substitution symbols (##). The substitution takes place as soon as Genstat reads the pair of hashes. (However, Genstat also has the EXECUTE directive, which allows a text containing a list of statements to be executed, for example, within a loop or procedure.)

 

Matrix

is a data structure that stores a rectangular array of numbers. Matrices can be declared using the MATRIX directive.

 

Missing value

is denoted within a Genstat program by one asterisk (*). When reading data, a series of contiguous asterisks or an asterisk followed by letters or digits is treated as a missing value too, and other characters can also be defined to represent missing values.

 

Multiplier

allows repetitive lists to be specified concisely. A multiplier may be a number, or the substitution symbol (#) followed by a single-valued numerical data structure.

Post-multiplier is given immediately after the second of a pair of round brackets enclosing a list of identifiers, numbers, or strings, and has the effect of repeating the entire list, as a whole, the specified number of times.

Pre-multiplier occurs immediately before the initial (round) bracket of a pair enclosing a list of identifiers, numbers, or strings and has the effect of repeating each item, in turn, the specified number of times.
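The difference between the two kinds of multiplier can be shown with a small Python model. The function names are hypothetical; in Genstat notation the two calls below would correspond to 2(1,2,3) and (1,2,3)2 respectively.

```python
def pre_multiply(count, items):
    """Repeat each item in turn `count` times, as a pre-multiplier does."""
    return [item for item in items for _ in range(count)]


def post_multiply(items, count):
    """Repeat the whole list `count` times, as a post-multiplier does."""
    return list(items) * count


print(pre_multiply(2, [1, 2, 3]))   # [1, 1, 2, 2, 3, 3]
print(post_multiply([1, 2, 3], 2))  # [1, 2, 3, 1, 2, 3]
```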

 

Number

is a sequence of digits, optionally containing a decimal point (.). The sequence can be preceded by a sign (+ or -) and can be followed by an exponent: i.e. the letter E or D (in upper or lower case) optionally followed by spaces, then a sequence of digits optionally preceded by a sign.
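One possible reading of this definition is captured by the regular expression in the Python sketch below. The pattern and function name are illustrative assumptions, not part of Genstat. In particular, the sketch assumes that at least one digit is required, that the decimal point may lead or trail the digits, and that the exponent letter follows the digits directly.

```python
import re

NUMBER = re.compile(
    r"[+-]?"                   # optional sign
    r"(?:\d+\.?\d*|\.\d+)"     # digits with an optional decimal point
    r"(?:[EeDd] *[+-]?\d+)?"   # optional exponent: E or D, spaces, signed digits
)


def is_number(text):
    """Return True if the whole of `text` fits the pattern above."""
    return NUMBER.fullmatch(text) is not None


for sample in ("12", "-3.5", "1.5E 3", "2d-4", "E5", "1.2.3"):
    print(repr(sample), is_number(sample))
```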

 

Operator

is a symbol or symbols denoting an operation in an expression or formula:

Simple + (addition), - (subtraction), * (multiplication or product), /  (division), . (interaction), = (assignment), < (less than), >  (greater than)

Compound ** (exponentiation), *+ (matrix multiplication), -* (crossed deletion), -/ (nested deletion), // (pseudo-term linkage), .EQ. or == (equality), .NE. or /= or <> (non-equality), .LE. or <= (less than or equal to), .GE. or >= (greater than or equal to), .LT. (less than), .GT. (greater than), .EQS. (string equality), .NES. (string non-equality), .IN. (set inclusion), .NI. (set non-inclusion), .IS. (identifier equivalence), .ISNT. (identifier non-equivalence), .AND. (logical and), .OR. (logical or), .EOR. (logical either or), .NOT. (logical not).

Only + - * / . -/ -* and // may occur in formulae, while . -* -/ and // cannot occur (as operators) in expressions.

 

Option

Options specify arguments that are global within a Genstat statement: i.e. they apply to all the items in the parameter list(s). Often, but not always, options have default values and so need not be specified.

Option name is a system word that identifies a particular option setting. It can be abbreviated to the minimum number of characters required to distinguish it from the options that precede it in the prescribed order for the directive or procedure concerned; for directives, four characters are always sufficient.

Option sequence is a list of option settings separated by semi-colons (;).

Option setting has the form

option-name = list, expression or formula

"option-name =" can be omitted if the settings are given in the prescribed order for the directive or procedure concerned: i.e. the name may be omitted for the first setting if this is for the first prescribed option, and for subsequent settings if the previous setting was for the option immediately before the current one in the prescribed order.

 

Parameter

Parameters specify parallel lists of arguments for a statement: i.e. the statement (with its option settings) operates for the first item in each list, then the second, and so on. The number of times that this happens is determined by the length of the parameter list that is first in the prescribed order for the directive or procedure concerned. Subsequent lists are recycled if they are shorter than the first list.

Parameter name is a system word that identifies which parameter is being set. It may be abbreviated to the minimum number of characters required to distinguish it from the parameters that precede it in the prescribed order for the directive or procedure concerned; for directives, four characters are always sufficient.

Parameter sequence is a list of parameter settings separated by semi-colons (;).

Parameter setting has the form

parameter-name = list, expression or formula

"parameter-name =" can be omitted if the settings are given in the prescribed order for the directive or procedure concerned: i.e. the name may be omitted for the first setting if this is for the first prescribed parameter, and for subsequent settings if the previous setting was for the parameter immediately before the current one in the prescribed order. For directives or procedures with only a single parameter, no parameter name is defined.

 

Pointer

is a data structure that stores a series of identifiers, pointing to other data structures. Pointers can be declared using the POINTER directive.

 

Procedure

This is a structure that contains Genstat statements, and fulfils the role of the subroutine in the Genstat language. The use of a procedure looks just like the use of a Genstat directive. All data structures within the procedure are local (i.e. they cannot be referenced from outside the procedure, or confused with data structures outside it); input and output structures for the procedure are defined by option and parameter settings in the procedure call.

Procedure name is a letter followed by letters and/or digits. Only the first eight characters are significant; subsequent characters are ignored. The case of the letters (small or capital) is also ignored.

 

Procedure Library

The Genstat Procedure Library contains procedures contributed not only by the writers of Genstat but also by knowledgeable Genstat users from many application areas and countries. The Library is controlled by an Editorial Board, who check that the procedures are useful and reliable, and maintain standards for the documentation. It is regularly extended and updated, independently of the releases of Genstat itself, and these revised versions are distributed automatically to all supported Genstat sites. Information about the Library is available using procedures in the help module of the Library. Other modules cover, for example, manipulation, graphics and various types of statistical analysis. These procedures are all accessed automatically by Genstat, when required. Instructions for authors of procedures can be obtained using procedure NOTICE. You can also form your own procedure libraries using the STORE directive.

 

Program

is a series of statements, ending with the statement STOP.

 

Progression

Lists of numbers ascending or descending with equal increments can be specified succinctly using the form "number, number ... number" where the first two numbers define the first two elements in the list (and thus the increment) and the list ends with the value beyond which the third number would be passed. For lists with an increment of plus or minus one, the second number can be omitted, to give the form "number ... number".
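How a progression expands can be modelled with the Python sketch below. The function name and argument order are hypothetical. The sketch assumes that the increment moves from the first number towards the final one and rejects any other case, since that situation is not described here. Integer arguments keep the arithmetic exact.

```python
def progression(first, last, second=None):
    """Expand 'first, second ... last' (or 'first ... last') into a list."""
    if second is None:
        step = 1 if last >= first else -1  # increment of plus or minus one
    else:
        step = second - first
    if step == 0:
        raise ValueError("the first two numbers must differ")
    if (last - first) * step < 0:
        raise ValueError("the increment moves away from the final number")
    values = []
    value = first
    while (last - value) * step >= 0:  # stop before passing the final number
        values.append(value)
        value += step
    return values


print(progression(1, 10, second=3))  # [1, 3, 5, 7, 9]
print(progression(5, 1))             # [5, 4, 3, 2, 1]
```

In the first call the next value, 11, would pass 10, so the list stops at 9.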

 

Punctuation symbol

The Genstat punctuation symbols are:

colon (:) indicates the end of a statement;

comma (,) separates items;

double quote (") is used to show the beginning and end of a comment;

equals (=) separates an option name or parameter name from its setting;

newline is synonymous with colon, by default, but directive SET can request that it be ignored;

semi-colon (;) separates lists;

single quote (') is used to show the beginning and end of a string (left single quote (`) is synonymous with single quote);

space can appear between items or can be omitted altogether if the items are already separated by another punctuation symbol, a bracket, an operator, or an ampersand;

tab is treated as a synonym of space everywhere except within texts and comments, or when reading in fixed format (when it is treated as a fault).

 

Qualified identifier

These may occur in a list of identifiers to define subsets of the values of a data structure. The form is "identifier $ qualifier", where the qualifier is a sequence of identifier lists enclosed in square brackets. For factors, variates, and texts, the qualifier has a single list, each element of which defines a subset of the vector concerned. For matrices there are two lists running in parallel, one for each dimension. For a symmetric matrix, there can be either one or two lists, depending on whether or not its two dimensions are to be subsetted in the same way. For a diagonal matrix there is a single list. Tables cannot be qualified. The elements of the qualifier lists can be scalars, numbers, variates, quoted strings, or texts.

 

Scalar

is a data structure that stores a single number. Scalars can be declared using the SCALAR directive.

 

Special symbol

The special symbols in Genstat are as follows:

ampersand (&) repeats the previous statement name (unless that statement contained a syntax error) and any option settings that are not explicitly changed;

asterisk (*) denotes a missing value (and is also used as an operator);

backslash (\) is the continuation symbol, typed at the end of a line to indicate that the current statement continues onto the next line (this is unnecessary when directive SET has been used to specify that newline is to be ignored);

dollar ($) precedes a list of unit names or numbers (enclosed in square brackets) that define subsets of a factor, variate, matrix, symmetric matrix, diagonal matrix, or text;

exclamation mark (!) indicates an unnamed structure (vertical bar (|) is synonymous with exclamation mark);

hash (#) is the substitution symbol; when used on its own (i.e. followed just by a punctuation symbol) it represents the default setting of an option; alternatively, it can be followed by the identifier of a data structure whose values are to be inserted at that point in a Genstat statement (the substitution takes place immediately before the statement is executed). A pair of contiguous substitution symbols (##) is used to introduce a macro.

 

SSPM structure

is a compound data structure storing sums of squares and products, means and ancillary information for use in regression and multivariate analysis. SSPMs can be declared using the SSPM directive.

 

Statement

is an instruction in the Genstat language; it has the form

statement-name [option-sequence] parameter-sequence terminator

If no option settings are given, the square brackets can be omitted. The terminator is colon (:), ampersand (&) or newline (unless directive SET has indicated that this is to be ignored).

Statement name is the name of either a directive or a procedure.

 

String

is a sequence of characters forming one unit (or line) of a Genstat text structure. In most contexts, the string must be quoted: i.e. enclosed in single quotes ('). Quoted strings may contain any of the characters available on the computer. However, if single quote ('), double quote ("), or the continuation symbol (\) are required as characters within a quoted string, they must each be typed twice to distinguish this use from their action in, respectively, terminating the string, introducing a comment within the string, or indicating continuation. Newline within a quoted string is taken to terminate the current (quoted) string and begin another one, unless the newline is within a comment or preceded by an (unduplicated) continuation symbol (\), or unless directive SET has specified that newline is to be ignored. Unquoted strings can occur in unnamed texts, or in option or parameter settings where you have to specify a particular string from a prescribed set of alternatives; an unquoted string must have a letter as its first character and contain only letters or digits.
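The doubling rule for special characters inside a quoted string can be sketched in Python as below. The helper name `quote_string` is hypothetical, and the sketch handles a single line only; it does not model the treatment of newlines.

```python
def quote_string(text):
    """Write `text` as a Genstat quoted string, doubling ', " and \\."""
    for special in ("'", '"', "\\"):
        text = text.replace(special, special * 2)
    return "'" + text + "'"


print(quote_string("It's"))      # 'It''s'
print(quote_string('say "hi"'))  # 'say ""hi""'
```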

 

Subset selection

An identifier list can contain qualified identifiers, each defining a list of subsets of the values of the data structure concerned.

 

Suffix

Elements of pointers can be referred to by suffixes. Each suffix takes the form of an identifier list enclosed in square brackets; the list can contain numbers, scalars, or variates to reference an element or elements by number, or texts or quoted strings to reference by label. A null list within the brackets is taken to mean all the elements of the pointer in turn. Where a pointer has other pointers as its elements, their elements can be referred to in the same way, and so the original identifier may be followed by several suffix lists each contained in its own pair of square brackets; these define a list of elements, one for each combination of an element from each suffix list, taking the combinations in an order in which the last list cycles through its elements fastest, then the next to last list, and so on.
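The order in which several suffix lists are combined can be illustrated with Python's `itertools.product`, which also cycles through its last argument fastest. This is a model of the ordering only: the function name is hypothetical, and the sketch covers numeric suffixes, not labels or null lists.

```python
from itertools import product


def expand_suffixes(identifier, *suffix_lists):
    """List the elements referred to, with the last suffix list cycling fastest."""
    return [
        identifier + "".join(f"[{suffix}]" for suffix in combination)
        for combination in product(*suffix_lists)
    ]


print(expand_suffixes("P", [1, 2], [1, 2, 3]))
# ['P[1][1]', 'P[1][2]', 'P[1][3]', 'P[2][1]', 'P[2][2]', 'P[2][3]']
```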

 

Symmetric matrix

is a data structure that stores the lower triangle (including the diagonal) of a symmetric square matrix.

 

System word

is a letter followed by letters and/or digits with a special meaning within the Genstat language, e.g. directive, option, parameter, or function names. The case of the letters (small/capital) is not significant; the abbreviation rules vary according to context.

 

Table

is a data structure that stores a multi-dimensional array of numbers, each dimension classified by a factor. Thus a table can be used to hold a summary of data that are classified (by the factors) into groups. Tables can be declared using the TABLE directive.

 

Text

is a data structure that stores a series of strings, each one representing a line of textual information. Texts can be declared using the TEXT directive.

 

TSM structure

is a compound data structure storing a model for use in Box-Jenkins modelling of time series. TSMs can be declared using the TSM directive.

 

Unnamed structure

An identifier list may contain unnamed variates, scalars, texts, pointers, expressions, or formulae. An unnamed structure consists of an exclamation mark, followed by the type code, and then the values contained in round brackets. The type code is E for expression, F for formula, P for pointer, S for scalar, T for text, or V for variate. If no code is given, variate is assumed by default.

 

Variate

is a data structure that stores a series of numbers. Variates can be declared using the VARIATE directive.

 

Vector

is a series of values, notionally arranged in a column. Genstat has three different types of vector: factors, texts, and variates.

 

List of directives

 

ADD adds extra terms to a linear, generalized linear, generalized additive, or nonlinear model.

ADDPOINTS adds points for new objects to a principal coordinates analysis.

ADISPLAY displays further output from analyses produced by ANOVA.

AKEEP copies information from an ANOVA analysis into Genstat data structures.

ANOVA analyses y-variates by analysis of variance according to the model defined by earlier BLOCKSTRUCTURE, COVARIATE, and TREATMENTSTRUCTURE statements.

ASSIGN sets elements of pointers and dummies.

AXES defines the axes in each window for high-resolution graphics.

BLOCKSTRUCTURE defines the blocking structure of the design and hence the strata and the error terms.

BREAK suspends execution of the statements in the current channel or control structure and takes subsequent statements from the channel specified.

CALCULATE calculates numerical values for data structures.

CASE introduces a "multiple-selection" control structure.

CATALOGUE displays the contents of a backing-store file.

CLOSE closes files.

CLUSTER forms a non-hierarchical classification.

COLOUR defines the red, green and blue intensities to be used for the Genstat colours with certain graphics devices.

COMBINE combines or omits "slices" of a multi-way data structure (table, matrix, or variate).

CONCATENATE concatenates and truncates lines (units) of text structures; allows the case of letters to be changed.

CONTOUR produces contour maps of two-way arrays of numbers (on the terminal/printer).

COPY forms a transcript of a job.

CORRELATE forms correlations between variates, autocorrelations of variates, and lagged cross-correlations between variates.

COVARIATE specifies covariates for use in subsequent ANOVA statements.

CVA performs canonical variates analysis.

DCLEAR clears a graphics screen.

DCONTOUR draws contour plots on a plotter or graphics monitor.

DDISPLAY redraws the current graphical display.

DEBUG puts an implicit BREAK statement after the current statement and after every NSTATEMENTS subsequent statements, until an ENDDEBUG is reached.

DECLARE declares one or more customized data structures.

DELETE deletes the attributes and values of structures.

DEVICE switches between (high-resolution) graphics devices.

DGRAPH draws graphs on a plotter or graphics monitor.

DHISTOGRAM draws histograms on a plotter or graphics monitor.

DIAGONALMATRIX declares one or more diagonal matrix data structures.

DISPLAY prints, or reprints, diagnostic messages.

DISTRIBUTION estimates the parameters of continuous and discrete distributions.

DKEEP saves information from the last plot on a particular device.

DPIE draws a pie chart on a plotter or graphics monitor.

DREAD reads the locations of points from an interactive graphical device.

DROP drops terms from a linear, generalized linear, generalized additive, or nonlinear model.

DSURFACE produces perspective views of two-way arrays of numbers.

DUMMY declares one or more dummy data structures.

DUMP prints information about data structures, and internal system information.

DUPLICATE forms new data structures with attributes taken from an existing structure.

D3HISTOGRAM produces three-dimensional histograms.

EDIT edits text vectors.

ELSE introduces the default set of statements in block-if or in multiple-selection control structures.

ELSIF introduces a set of alternative statements in a block-if control structure.

ENDBREAK returns to the original channel or control structure and continues execution.

ENDCASE indicates the end of a "multiple-selection" control structure.

ENDDEBUG cancels a DEBUG statement.

ENDFOR indicates the end of the contents of a loop.

ENDIF indicates the end of a block-if control structure.

ENDJOB ends a Genstat job.

ENDPROCEDURE indicates the end of the contents of a Genstat procedure.

ENQUIRE provides details about files opened by Genstat.

EQUATE transfers data between structures of different sizes or types (but of the same mode, i.e. numerical or text), or where the transfer is not from a single structure to a single structure.

ESTIMATE estimates parameters in Box-Jenkins models for time series.

EXECUTE executes the statements contained within a text.

EXIT exits from a control structure.

EXPRESSION declares one or more expression data structures.

FACROTATE rotates factor loadings from a principal components or canonical variates analysis according to either the varimax or quartimax criterion.

FACTOR declares one or more factor data structures.

FCLASSIFICATION forms a classification set for each term in a formula, breaks a formula up into separate formulae (one for each term), and applies a limit to the number of factors and variates in the terms of a formula.

FILTER filters time series by time-series models.

FIT fits a linear, generalized linear, generalized additive, or generalized nonlinear model.

FITCURVE fits a standard nonlinear regression model.

FITNONLINEAR fits a nonlinear regression model or optimizes a scalar function.

FKEY forms design keys for multi-stratum experimental designs, allowing for confounded and aliased treatments.

FLRV forms the values of LRV structures.

FOR introduces a loop; subsequent statements define the contents of the loop, which is terminated by the directive ENDFOR.

FORECAST forecasts future values of a time series.

FORMULA declares one or more formula data structures.

FOURIER calculates cosine or Fourier transforms of real or complex series.

FPSEUDOFACTORS determines patterns of confounding and aliasing from design keys, and extends the treatment model to incorporate the necessary pseudo-factors.

FRAME defines the positions of windows within the frame of a high-resolution graph. The positions are defined in normalized device coordinates ([0,1] × [0,1]).

FSIMILARITY forms a similarity matrix or a between-group-elements similarity matrix or prints a similarity matrix.

FSSPM forms the values of SSPM structures.

FTSM forms preliminary estimates of parameters in time-series models.

FVARIOGRAM forms auto variograms for individual variates or cross variograms for pairs of variates.

GENERATE generates factor values for designed experiments: with no options set, factor values are generated in standard order; the options allow treatment factors to be generated using the design-key method, or pseudo-factors to be generated to describe the confounding in a partially balanced experimental design.

GET accesses details of the "environment" of a Genstat job.

GETATTRIBUTE accesses attributes of structures.

GRAPH produces scatter and line graphs on the terminal or line printer.

GROUPS forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur.

HCLUSTER performs hierarchical cluster analysis.

HDISPLAY displays results ancillary to hierarchical cluster analyses: matrix of mean similarities between and within groups, a set of nearest neighbours for each unit, a minimum spanning tree, and the most typical elements from each group.

HELP prints details about the Genstat language and environment.

HISTOGRAM produces histograms of data on the terminal or line printer.

HLIST lists the data matrix in abbreviated form.

HSUMMARIZE forms and prints a group by levels table for each test together with appropriate summary statistics for each group.

IF introduces a block-if control structure.

INPUT specifies the input file from which to take further statements.

INTERPOLATE interpolates values at intermediate points.

JOB starts a Genstat job.

KRIGE calculates kriged estimates using a model fitted to the sample variogram.

LIST lists details of the data structures currently available within Genstat.

LRV declares one or more LRV data structures.

MARGIN forms and calculates marginal values for tables.

MATRIX declares one or more matrix data structures.

MDS performs non-metric multidimensional scaling.

MERGE copies subfiles from backing-store files into a single file.

MODEL defines the response variate(s) and the type of model to be fitted for linear, generalized linear, generalized additive, and nonlinear models.

MONOTONIC fits an increasing monotonic regression of y on x.

OPEN opens files.

OPTION defines the options of a Genstat procedure with information to allow them to be checked when the procedure is executed.

OR introduces a set of alternative statements in a "multiple-selection" control structure.

OUTPUT defines where output is to be stored or displayed.

OWN does work specified in Fortran subprograms linked into Genstat by the user.

PAGE moves to the top of the next page of an output file.

PARAMETER defines the parameters of a Genstat procedure with information to allow them to be checked when the procedure is executed.

PASS does work specified in subprograms supplied by the user, but not linked into Genstat. This directive may not be available on some computers.

PCO performs principal coordinates analysis, also principal components and canonical variates analysis (but with different weighting from that used in CVA) as special cases.

PCP performs principal components analysis.

PEN defines the properties of "pens" for high-resolution graphics.

POINTER declares one or more pointer data structures.

PREDICT forms predictions from a linear or generalized linear model.

PRINT prints data in tabular format in an output file, unformatted file, or text.

PROCEDURE introduces a Genstat procedure.

QUESTION obtains a response using a Genstat menu.

RANDOMIZE randomizes the units of a designed experiment or the elements of a factor or variate.

RCYCLE controls iterative fitting of generalized linear, generalized additive, and nonlinear models, and specifies parameters, bounds etc for nonlinear models.

RDISPLAY displays the fit of a linear, generalized linear, generalized additive, or nonlinear model.

READ reads data from an input file, an unformatted file, or a text.

RECORD dumps a job so that it can later be restarted by a RESUME statement.

REDUCE forms a reduced similarity matrix (referring to the GROUPS instead of the original units).

RELATE relates the observed values on a set of variates to the results of a principal coordinates analysis.

REML fits a variance-components model by residual (or restricted) maximum likelihood.

RESTRICT defines a restricted set of units of vectors for subsequent statements.

RESUME restarts a recorded job.

RETRIEVE retrieves structures from a subfile.

RETURN returns to a previous input stream (text vector or input channel).

RFUNCTION estimates functions of parameters of a nonlinear model.

RKEEP stores results from a linear, generalized linear, generalized additive, or nonlinear model.

ROTATE does a Procrustes rotation of one configuration of points to fit another.

SCALAR declares one or more scalar data structures.

SET sets details of the "environment" of a Genstat job.

SETOPTION sets or modifies defaults of options of Genstat directives or procedures.

SETPARAMETER sets or modifies defaults of parameters of Genstat directives or procedures.

SKIP skips lines in input or output files.

SORT sorts units of vectors according to an index vector.

SPREADSHEET allows interactive entry or editing of data (available in only some implementations).

SSPM declares one or more SSPM data structures.

STEP selects terms to include in or exclude from a linear, generalized linear, or generalized additive model according to the ratio of residual mean squares.

STOP ends a Genstat program.

STORE stores structures in a subfile of a backing-store file.

STRUCTURE defines a compound data structure.

SUSPEND suspends execution of Genstat to carry out commands in the operating system. This directive may not be available on some computers.

SVD calculates singular value decompositions of matrices, i.e. (LEFT *+ SINGULAR *+ TRANSPOSE(RIGHT)).

SWITCH adds terms to, or drops them from, a linear, generalized linear, generalized additive, or nonlinear model.

SYMMETRICMATRIX declares one or more symmetric matrix data structures.

TABLE declares one or more table data structures.

TABULATE forms summary tables of variate values.

TDISPLAY displays further output after an analysis by ESTIMATE.

TERMS specifies a maximal model, containing all terms to be used in subsequent linear, generalized linear, generalized additive, and nonlinear models.

TEXT declares one or more text data structures.

TKEEP saves results after an analysis by ESTIMATE.

TRANSFERFUNCTION specifies input series and transfer function models for subsequent estimation of a model for an output series.

TREATMENTSTRUCTURE specifies the treatment terms to be fitted by subsequent ANOVA statements.

TRY displays results of single-term changes to a linear, generalized linear, or generalized additive model.

TSM declares one or more TSM data structures.

TSUMMARIZE displays characteristics of time series models.

UNITS defines an auxiliary vector of labels and/or the length of any vector whose length is not defined when a statement needing it is executed.

VARIATE declares one or more variate data structures.

VCOMPONENTS defines the variance-components model for REML.

VDISPLAY displays further output from a REML analysis.

VKEEP copies information from a REML analysis into Genstat data structures.

VPEDIGREE generates an inverse relationship matrix for use when fitting animal or plant breeding models by REML.

VSTATUS prints the current model settings for REML.

VSTRUCTURE defines a variance structure for random effects in a REML model.

WORKSPACE accesses private data structures for use in procedures.

 

List of procedures

 

ABIVARIATE produces graphs and statistics for bivariate analysis of variance.

AFALPHA generates alpha designs.

AFCYCLIC generates block and treatment factors for cyclic designs.

AFORMS prints data forms for an experimental design.

AFUNITS forms a factor to index the units of the final stratum of a design.

AGALPHA forms alpha designs by standard generators for up to 100 treatments.

AGBIB generates balanced incomplete block designs.

AGBOXBEHNKEN generates Box-Behnken designs.

AGCENTRALCOMPOSITE generates central composite designs.

AGCYCLIC generates cyclic designs from standard generators.

AGDESIGN generates generally balanced designs.

AGFRACTION generates fractional factorial designs.

AGHIERARCHICAL generates orthogonal hierarchical designs.

AGMAINEFFECT generates designs to estimate main effects of two-level factors.

AGNEIGHBOUR generates neighbour-balanced designs.

AGRAPH plots one- or two-way tables of means from ANOVA.

AKAIKEHISTOGRAM prints histograms with improved definition of groups.

AKEY generates values for treatment factors using the design key method.

ALIAS finds out information about aliased model terms in analysis of variance.

AMERGE merges extra units into an experimental design.

ANTMVESTIMATE estimates missing values in repeated measurements.

ANTORDER assesses order of ante-dependence for repeated measures data.

ANTTEST calculates overall tests based on a specified order of ante-dependence.

AONEWAY provides one-way analysis of variance for inexperienced users.

APLOT plots residuals from an ANOVA analysis.

APPEND appends a list of vectors of the same type.

APRODUCT forms a new experimental design from the product of two designs.

ARANDOMIZE randomizes and prints an experimental design.

AREPMEASURES produces an analysis of variance for repeated measurements.

ASTATUS provides information about the settings of ANOVA models and variates.

ASWEEP performs sweeps for model terms in an analysis of variance.

AUDISPLAY produces further output for an unbalanced design (after AUNBALANCED).

AUNBALANCED performs analysis of variance for unbalanced designs.

A2PLOT plots effects from two-level designs with robust s.e. estimates.

BANK calculates the optimum aspect ratio for a graph.

BARCHART plots a bar chart using line-printer or high-resolution graphics.

BIPLOT produces a biplot from a set of variates.

BJESTIMATE fits an ARIMA model, with forecast and residual checks.

BJFORECAST plots forecasts of a time series using a previously fitted ARIMA.

BJIDENTIFY displays time series statistics useful for ARIMA model selection.

BOOTSTRAP produces bootstrapped estimates, standard errors and distributions.

BOXPLOT draws box-and-whisker diagrams or schematic plots.

CANCOR does canonical correlation analysis.

CENSOR pre-processes censored data before analysis by ANOVA.

CHECKARGUMENT checks the arguments of a procedure.

CHISQUARE calculates chi-square statistics for one- and two-way tables.

CINTERACTION clusters rows and columns of a two-way interaction table.

CLASSIFY obtains a starting classification for non-hierarchical clustering.

CONCORD calculates Kendall's Coefficient of Concordance.

CONVEXHULL finds the points of a single or a full peel of convex hulls.

CORRESP does correspondence analysis, or reciprocal averaging.

CUMDISTRIBUTION fits frequency distributions to accumulated counts.

CVAPLOT plots the mean and unit scores from a canonical variate analysis.

CVASCORES calculates scores for individual units in canonical variate analysis.

DAPLOT plots residuals from ANOVA with interactive identification of outliers.

DAYCOUNT converts a date to a daycount, or vice versa.

DAYLENGTH calculates daylengths at a given period of the year.

DBARCHART produces barcharts for one or two-way tables.

DDENDROGRAM draws dendrograms with control over structure and style.

DDESIGN plots the plan of an experimental design.

DECIMALS sets the number of decimals for a structure, using its round-off.

DESCRIBE saves and/or prints summary statistics for variates.

DESIGN helps to select and generate effective experimental designs.

DIALLEL analyses full and half diallel tables with parents.

DILUTION calculates Most Probable Numbers from dilution series data.

DISCRIMINATE performs discriminant analysis.

DMST gives a high resolution plot of an ordination with minimum spanning tree.

DOTPLOT produces a dot-plot using line-printer or high-resolution graphics.

DPARALLEL displays multivariate data using parallel coordinates.

DPOLYGON draws polygons using high-resolution graphics.

DPTMAP draws maps for spatial point patterns using high-resolution graphics.

DPTREAD adds points interactively to a spatial point pattern.

DREPMEASURES plots profiles and differences of profiles for repeated measures data.

DRPOLYGON reads a polygon interactively from the current graphics device.

DSCATTER produces a scatter-plot matrix using high-resolution graphics.

DSHADE produces a pictorial representation of a data matrix.

EXTRABINOMIAL fits the models of Williams (1982) to overdispersed proportions.

FACAMEND permutes the levels and labels of a factor.

FACPRODUCT forms a factor with a level for every combination of other factors.

FDESIGNFILE forms a backing-store file of information for AGDESIGN.

FEXACT2X2 does Fisher's exact test for 2 × 2 tables.

FIELLER calculates effective doses or relative potencies.

FILEREAD reads data from a file.

FITMULTIVARIATE performs multivariate linear regression with accumulated tests.

FITNONNEGATIVE fits a generalized linear model with nonnegativity constraints.

FITPARALLEL carries out analysis of parallelism for nonlinear functions.

FITSCHNUTE fits a general 4 parameter growth model to a non-decreasing Y-variate.

FLIBHELP forms a help information file for use by LIBHELP &c.

FRESTRICTEDSET forms vectors with the restricted subset of a list of vectors.

FTEXT forms a text structure from a variate.

GEE fits models to longitudinal data by generalized estimating equations.

GENPROC performs a generalized Procrustes analysis.

GETDATA recovers data and information previously stored by SAVEDATA.

GINVERSE calculates the generalized inverse of a matrix.

GLM analyses non-standard generalized linear models.

GLMM fits a generalized linear mixed model.

GRANDOM generates pseudo-random numbers from probability distributions.

GRLABEL randomly labels two or more spatial point patterns.

GRTHIN randomly thins a spatial point pattern.

GRTORSHIFT performs a random toroidal shift on a spatial point pattern.

HANOVA does hierarchical analysis of variance/covariance for unbalanced data.

HEATUNITS calculates accumulated heat units of a temperature dependent process.

IFUNCTION estimates implicit and/or explicit functions of parameters.

INSIDE determines whether points lie within a specified polygon.

JACKKNIFE produces Jackknife estimates and standard errors.

KAPLANMEIER calculates the Kaplan-Meier estimate of the survivor function.

KAPPA calculates a kappa coefficient of agreement for nominally scaled data.

KOLMOG2 performs a Kolmogorov-Smirnov two-sample test.

KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance.

LATTICE analyses square and rectangular lattice designs.

LIBEXAMPLE accesses examples and source code of library procedures.

LIBFILENAME supplies the names of information files for library procedures.

LIBHELP provides help information about library procedures.

LIBINFORM prints information about the contents of the Procedure Library.

LIBMANUAL prints a "Manual" containing information about library procedures.

LIBVERSION provides the name of the current Genstat 5 Procedure Library.

LINDEPENDENCE finds the linear relations associated with matrix singularities.

LRVSCREE prints a scree diagram and/or a difference table of latent roots.

LVARMODEL analyses a field trial using the Linear Variance Neighbour model.

MANNWHITNEY performs a Mann-Whitney U test.

MANOVA performs multivariate analysis of variance and covariance.

MENU initiates a menu system.

MPOWER forms integer powers of a square matrix.

MULTMISS estimates missing values for units in a multivariate data set.

MVARIOGRAM fits models to an experimental variogram.

NLCONTRASTS fits nonlinear contrasts to quantitative factors in ANOVA.

NORMTEST performs tests of univariate and/or multivariate normality.

NOTICE gives access to the Genstat Notice Board (news, errors &c).

ORTHPOL calculates orthogonal polynomials.

PAIRTEST performs t-tests for pairwise differences.

PCOPROC performs a multiple Procrustes analysis.

PDESIGN prints or stores treatment combinations tabulated by the block factors.

PERCENT expresses the body of a table as percentages of one of its margins.

PERIODTEST gives periodogram-based tests for white noise in time series.

PLS fits a partial least squares regression model.

PPAIR displays results of t-tests for pairwise differences in compact diagrams.

PREWHITEN filters a time series before spectral analysis.

PROBITANALYSIS fits probit models allowing for natural mortality and immunity.

PTBOX generates a bounding or surrounding box for a spatial point pattern.

PTCLOSEPOLYGON closes open polygons.

PTDESCRIBE gives summary and second order statistics for a point process.

PTREMOVE removes points interactively from a spatial point pattern.

QUANTILE calculates quantiles of the values in a variate.

RANK produces ranks, from the values in a variate, allowing for ties.

RCHECK checks the fit of a linear or generalized linear regression.

RGRAPH draws a graph to display the fit of a regression model.

RIDGE produces ridge regression and principal component regression analyses.

RJOINT does modified joint regression analysis for variety-by-environment data.

ROBSSPM forms robust estimates of sum-of-squares-and-products matrices.

RPAIR gives t-tests for all pairwise differences of means from a regression or GLM.

RPROPORTIONAL fits the proportional hazards model to survival data as a GLM.

RSURVIVAL models survival times of exponential, Weibull or extreme-value distributions.

RUGPLOT draws "rugplots" to display the distribution of one or more samples.

RUNTEST performs a test of randomness of a sequence of observations.

SAMPLE samples from a set of units, possibly stratified by factors.

SAVEDATA saves all the current data and information for use in a future run.

SIGNTEST performs a one or two sample sign test.

SKEWSYMM provides an analysis of skew-symmetry for an asymmetric matrix.

SMOOTHSPECTRUM forms smoothed spectrum estimates for univariate time series.

SPEARMAN calculates Spearman's Rank Correlation Coefficient.

SPLINE calculates a set of basis functions for M-, B- or I-splines.

STANDARDIZE standardizes columns of a data matrix to have mean zero and variance one.

STEM produces a simple stem-and-leaf chart.

SUBSET forms vectors containing subsets of the values in other vectors.

TTEST performs a one- or two-sample t-test.

VEQUATE equates across numerical structures.

VFUNCTION calculates functions of variance components from a REML analysis.

VHOMOGENEITY tests homogeneity of variances and variance-covariance matrices.

VINTERPOLATE performs linear & inverse linear interpolation between variates.

VORTHPOL calculates orthogonal polynomial time-contrasts for repeated measures.

VPLOT plots residuals from a REML analysis.

VREGRESS performs regression across variates.

VTABLE forms a variate and set of classifying factors from a table.

WADLEY fits models for Wadley's problem, allowing alternative links and errors.

WILCOXON performs a Wilcoxon Matched-Pairs (Signed-Rank) test.

XOCATEGORIES performs analyses of categorical data from crossover trials.

 

List of functions for expressions

 

 

These are the functions that can be used in expressions: ABS, ANGULAR, ARCCOS, ARCSIN, AREA, CED, CHARACTERS, CHISQ, CHOLESKI, CIRCULATE, CLBETA, CLBINOMIAL, CLBVARIATENORMAL, CLCHISQUARE, CLF, CLGAMMA, CLHYPERGEOMETRIC, CLLOGNORMAL, CLNORMAL, CLOGLOG, CLPOISSON, CLT, CONSTANTS, CORRELATION, COS, COVARIANCE, CUBETA, CUBINOMIAL, CUBVARIATENORMAL, CUCHISQUARE, CUF, CUGAMMA, CUHYPERGEOMETRIC, CULOGNORMAL, CUMULATE, CUNORMAL, CUPOISSON, CUT, DETERMINANT, DIFFERENCE, EDBETA, EDCHISQUARE, EDF, EDGAMMA, EDLOGNORMAL, EDNORMAL, EDT, ELEMENTS, EXP, EXPAND, FED, FPROBABILITY, GETFIRST, GETLAST, GETPOSITION, IANGULAR, ICLOGLOG, ILOGIT, INTEGER, INVERSE, LLBINOMIAL, LLGAMMA, LLNORMAL, LLPOISSON, LOG, LOG10, LOGIT, LTPRODUCT, MAXIMUM, MEAN, MEDIAN, MINIMUM, MODULO, MVINSERT, MVREPLACE, NCOLUMNS, NED, NEWLEVELS, NLEVELS, NMV, NOBSERVATIONS, NORMAL, NROWS, NVALUES, POSITION, PRBETA, PRBINOMIAL, PRCHISQUARE, PRF, PRGAMMA, PRHYPERGEOMETRIC, PRLOGNORMAL, PRNORMAL, PRODUCT, PRPOISSON, PRT, QPRODUCT, RESTRICTION, REVERSE, ROUND, RTPRODUCT, SHIFT, SIN, SOLUTION, SORT, SQRT, SUBMAT, SUM, TMAXIMA, TMEANS, TMEDIANS, TMINIMA, TNMV, TNOBSERVATIONS, TNVALUES, TOTAL, TRACE, TRANSPOSE, TTOTALS, TVARIANCES, UNSET, URAND, VARIANCE, VCORRELATION, VCOVARIANCE, VMAXIMA, VMEANS, VMEDIANS, VMINIMA, VNMV, VNOBSERVATIONS, VNVALUES, VTOTALS, and VVARIANCES.

 

ABS

 

ABS(x) the absolute value of x: |x|.

 

ANGULAR

 

ANGULAR(p) or ANG(p) the angular transformation: for a percentage p (0 < p < 100), forms x = (180/pi) × arcsin(sqrt(p/100)).
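
As an illustration outside Genstat, the formula above can be sketched in Python. The function names angular and iangular are hypothetical helpers, and the inverse is obtained simply by inverting the stated formula; this is not Genstat's own implementation.

```python
import math

def angular(p):
    """Angular transformation of a percentage p (0 < p < 100), in degrees."""
    return (180.0 / math.pi) * math.asin(math.sqrt(p / 100.0))

def iangular(x):
    """Inverse of angular(): maps an angle in degrees back to a percentage."""
    return 100.0 * math.sin(math.radians(x)) ** 2

# A percentage of 50 maps to roughly 45 degrees, and back again.
print(angular(50.0), iangular(angular(50.0)))
```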

 

ARCCOS

 

ARCCOS(x) inverse cosine of x, where -1 <= x <= 1.

 

ARCSIN

 

ARCSIN(x) inverse sine of x, where -1 <= x <= 1.

 

AREA

 

AREA(y;x) numerically integrates the curve running through the points specified by variates y and x.

 

CED

 

CED(p;s) the chi-square equivalent deviate for probability p (0 < p < 1) with s degrees of freedom (synonym of EDCHI).

 

CHARACTERS

 

CHARACTERS(g) returns a variate giving the length of each line of the text g.

 

CHISQ

 

CHISQ(x;s) the chi-square probability of t < x with s degrees of freedom (synonym of CLCHI).

 

CHOLESKI

 

CHOLESKI(x) the Choleski decomposition of a symmetric matrix x: such that x = LL' where L is square with upper off-diagonal elements zero.

 

CIRCULATE

 

CIRCULATE(x;s) shifts the values of x, treating x as a circular stack. If s is omitted, values are shifted one to the right, as for s=1.
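
A minimal Python sketch of a circular shift, for illustration only. It assumes that a positive s moves values to the right (as stated for the default s=1) and that a negative s moves them to the left; the function name circulate is a hypothetical helper, not Genstat code.

```python
def circulate(x, s=1):
    """Rotate the list x by s places to the right, wrapping values round."""
    n = len(x)
    if n == 0:
        return []
    k = s % n  # reduce the shift to the range 0..n-1
    return x[n - k:] + x[:n - k]

print(circulate([1, 2, 3, 4]))      # [4, 1, 2, 3]
print(circulate([1, 2, 3, 4], -1))  # [2, 3, 4, 1]
```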

 

CLBETA

 

CLBETA(x;a;b) cumulative lower probability for a beta distribution with parameters a and b.

 

CLBINOMIAL

 

CLBINOMIAL(j;n;p) probability of j or fewer successes out of n binomial trials with probability of success p.

 

CLBVARIATENORMAL

 

CLBVARIATENORMAL(x;y;r) cumulative lower probability for a bivariate normal distribution with means 0, variances 1, and correlation r.

 

CLCHISQUARE

 

CLCHISQUARE(x;df) cumulative lower probability for a chi-square distribution with df degrees of freedom.

 

CLF

 

CLF(x;df1;df2) cumulative lower probability for an F distribution with df1 and df2 degrees of freedom.

 

CLGAMMA

 

CLGAMMA(x;a;b) cumulative lower probability for a gamma distribution with index parameter a and shape parameter b.

 

CLHYPERGEOMETRIC

 

CLHYPERGEOMETRIC(j;l;m;n) probability of j or fewer positive samples out of a total sample of size m from a population of size n of which l are positive (hypergeometric distribution).

 

CLLOGNORMAL

 

CLLOGNORMAL(x) cumulative lower probability for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.

 

CLNORMAL

 

CLNORMAL(x) cumulative lower probability for a normal distribution with mean 0 and variance 1.

 

CLOGLOG

 

CLOGLOG(p) takes the complementary log-log transformation of the percentages p (0 < p < 100%).

 

CLPOISSON

 

CLPOISSON(j;m) probability of a value of j or less for a Poisson distribution with mean m.

 

CLT

 

CLT(x;df) cumulative lower probability for a t distribution with df degrees of freedom.

 

CONSTANTS

 

CONSTANTS(g) or C(g) provides the value of various constants, according to the contents of g: e (for a string of 'e' or 'E'), π ('pi' or 'PI'), or missing value ('*').

 

CORRELATION

 

CORRELATION(x;y) or CORRMAT(x;y) if both x and y are specified, returns a scalar giving the correlation between the values of x and y; if y is omitted, forms a correlation matrix from a symmetric matrix x.

 

COS

 

COS(x) cosine of x, for x in radians.

 

COVARIANCE

 

COVARIANCE(x;y) returns a scalar giving the covariance between the values of x and y.

 

CUBETA

 

CUBETA(x;a;b) cumulative upper probability for a beta distribution with parameters a and b.

 

CUBINOMIAL

 

CUBINOMIAL(j;n;p) probability of more than j successes out of n binomial trials with probability of success p.

 

CUBVARIATENORMAL

 

CUBVARIATENORMAL(x;y;r) cumulative upper probability for a bivariate normal distribution with means 0, variances 1, and correlation r.

 

CUCHISQUARE

 

CUCHISQUARE(x;df) cumulative upper probability for a chi-square distribution with df degrees of freedom.

 

CUF

 

CUF(x;df1;df2) cumulative upper probability for an F distribution with df1 and df2 degrees of freedom.

 

CUGAMMA

 

CUGAMMA(x;a;b) cumulative upper probability for a gamma distribution with index parameter a and shape parameter b.

 

CUHYPERGEOMETRIC

 

CUHYPERGEOMETRIC(j;l;m;n) probability of more than j positive samples out of a total sample of size m from a population of size n of which l are positive (hypergeometric distribution).

 

CULOGNORMAL

 

CULOGNORMAL(x) cumulative upper probability for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.

 

CUMULATE

 

CUMULATE(x) or CUM(x) forms the cumulative sum of the values of x; i.e. x1, x1+x2, x1+x2+x3, and so on.
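
The running sum described above can be illustrated in Python with the standard library. The name cumulate is a hypothetical helper, and the sketch ignores missing values, which the description does not cover.

```python
from itertools import accumulate

def cumulate(x):
    """Return the cumulative sums x1, x1+x2, x1+x2+x3, ..."""
    return list(accumulate(x))

print(cumulate([1, 2, 3, 4]))  # [1, 3, 6, 10]
```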

 

CUNORMAL

 

CUNORMAL(x) cumulative upper probability for a normal distribution with mean 0 and variance 1.

 

CUPOISSON

 

CUPOISSON(j;m) probability of a value greater than j for a Poisson distribution with mean m.

 

CUT

 

CUT(x;df) cumulative upper probability for a t distribution with df degrees of freedom.

 

DETERMINANT


DETERMINANT(x), DET(x) or D(x) the determinant of a square or symmetric matrix.

 

DIFFERENCE

 

DIFFERENCE(x;s) forms the differences of x, i.e. x[i] - x[i-s]; if s is omitted, first differences are formed, as for s=1.
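
A short Python sketch of lagged differencing, for illustration only. It assumes s is a positive integer and that the first s positions, which have no earlier value to subtract, are returned as missing (represented here by None); the function name difference is a hypothetical helper.

```python
def difference(x, s=1):
    """Return x[i] - x[i-s]; positions with no earlier value are None."""
    return [None if i < s else x[i] - x[i - s] for i in range(len(x))]

print(difference([1, 4, 9, 16]))     # [None, 3, 5, 7]
print(difference([1, 4, 9, 16], 2))  # [None, None, 8, 12]
```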

 

EDBETA

 

EDBETA(p;a;b) equivalent deviate corresponding to cumulative lower probability p for a beta distribution with parameters a and b.

 

EDCHISQUARE

 

EDCHISQUARE(p;df) equivalent deviate corresponding to cumulative lower probability p for a chi-square distribution with df degrees of freedom.

 

EDF

 

EDF(p;df1;df2) equivalent deviate corresponding to cumulative lower probability p for an F distribution with df1 and df2 degrees of freedom.

 

EDGAMMA

 

EDGAMMA(p;a;b) equivalent deviate corresponding to cumulative lower probability p for a gamma distribution with index parameter a and shape parameter b.

 

EDLOGNORMAL

 

EDLOGNORMAL(p) equivalent deviate corresponding to cumulative lower probability p for a lognormal distribution corresponding to a normal distribution with mean 0 and variance 1.

 

EDNORMAL

 

EDNORMAL(p) equivalent deviate corresponding to cumulative lower probability p for a normal distribution with mean 0 and variance 1.

 

EDT

 

EDT(p;df) equivalent deviate corresponding to cumulative lower probability p for a t distribution with df degrees of freedom.

 

ELEMENTS

 

ELEMENTS(x;e1;e2) forms a sub-structure of x. If x is a vector or a diagonal matrix, then only e1 should be specified; this then indicates the selected elements of x. If x is a rectangular matrix, then both e1 and e2 should be given, to specify respectively the selected rows and columns of x. For a symmetric matrix x, if the same rows and columns are to be selected (giving a symmetric matrix) then only e1 should be specified; otherwise both e1 and e2 should be given (and the result is a matrix).

 

EXP

 

EXP(x) exponential: e^x.

 

EXPAND

 

EXPAND(x;s) forms a variate of length s, containing zeroes and ones; if s is omitted and the length cannot be determined from the context, the length of the current units structure, if any, is taken. The values in x specify the numbers of the units that are to contain the value 1.
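
The construction can be sketched in Python as follows. Units are numbered from 1, as in the description; the name expand is a hypothetical helper, and the sketch requires the length s to be given explicitly rather than taken from a units structure.

```python
def expand(x, s):
    """Return a list of length s with 1 in the unit numbers listed in x, else 0."""
    units = set(x)
    return [1 if i in units else 0 for i in range(1, s + 1)]

print(expand([2, 4], 5))  # [0, 1, 0, 1, 0]
```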

 

FED

 

FED(p;s1;s2) the F-distribution equivalent deviate for probability p (0 < p < 1) and (s1,s2) degrees of freedom (synonym of EDF).

 

FPROBABILITY

 

FPROBABILITY(x;s1;s2) or FRATIO(x;s1;s2) the F-ratio probability of t < x with (s1,s2) degrees of freedom (synonym of CLF).

 

GETFIRST

 

GETFIRST(g) gives a variate containing the position of the first non-space character in each string of the text g.

 

GETLAST

 

GETLAST(g) gives a variate containing the position of the last non-space character in each string of the text g.

 

GETPOSITION

 

GETPOSITION(g1;g2;x) for each unit, if the string in the text g2 occurs as a substring of the string in the text g1, this returns the position at which the substring starts; otherwise it returns the value zero. The text g2 may contain a single string (to be checked against every string of g1). The structure x (scalar or variate) supplies a logical value to indicate whether to ignore the case of any letters; if x is omitted, the logical is assumed to be false (case not ignored).
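
The substring search can be illustrated with a small Python sketch. Texts are represented as lists of strings, positions are counted from 1, and a single flag ignore_case stands in for the logical x; these representations, and the name getposition, are assumptions made for the example.

```python
def getposition(g1, g2, ignore_case=False):
    """Return, for each string of g1, the 1-based start of the matching g2 string, or 0."""
    if len(g2) == 1:
        g2 = g2 * len(g1)  # a single string is checked against every string of g1
    positions = []
    for s, sub in zip(g1, g2):
        if ignore_case:
            s, sub = s.lower(), sub.lower()
        positions.append(s.find(sub) + 1)  # find() gives -1 when absent, so this gives 0
    return positions

print(getposition(["barley", "Wheat", "oats"], ["at"]))  # [0, 4, 2]
```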

 

IANGULAR

 

IANGULAR(x) gives the inverse of the angular transformation (result in percentages).

 

ICLOGLOG

 

ICLOGLOG(x) gives the inverse of the complementary log-log transformation (result in percentages).

 

ILOGIT

 

ILOGIT(x) gives the inverse of the logit transformation (result in percentages).

 

INTEGER

 

INTEGER(x) or INT(x) integer part of x: [x].

 

INVERSE

 

INVERSE(x) or INV(x) or I(x) the inverse of a non-singular square or symmetric matrix x.

 

LLBINOMIAL

 

LLBINOMIAL(x;n;p) or LLB(x;n;p) log-likelihood function for the Binomial distribution; n is the sample size and p the mean proportion (or the probability).

 

LLGAMMA

 

LLGAMMA(x;a;b) or LLG(x;a;b) log-likelihood function for the Gamma distribution with index parameter a and shape parameter b.

 

LLNORMAL

 

LLNORMAL(x;m;v) or LLN(x;m;v) log-likelihood function for the Normal distribution; m is the mean and v the variance.

 

LLPOISSON

 

LLPOISSON(x;m) or LLP(x;m) log-likelihood function for the Poisson distribution; m is the mean.

 

LOG

 

LOG(x) natural logarithm of x, for x > 0.

 

LOG10

 

LOG10(x) logarithm to base 10 of x, for x > 0.

 

LOGIT

 

LOGIT(p) takes the logit transformation log(p/(100-p)) of the percentages p (0 < p < 100%).
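
For illustration, the logit formula above and its inverse can be written in Python. Both work on the percentage scale; the inverse is derived by solving x = log(p/(100-p)) for p, and the names logit and ilogit here are Python helpers, not the Genstat functions themselves.

```python
import math

def logit(p):
    """Logit of a percentage p (0 < p < 100)."""
    return math.log(p / (100.0 - p))

def ilogit(x):
    """Inverse logit, returning a percentage between 0 and 100."""
    return 100.0 / (1.0 + math.exp(-x))

# Transforming 20% and back recovers the original value (up to rounding).
print(ilogit(logit(20.0)))
```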

 

LTPRODUCT

 

LTPRODUCT(x;y) left transposed product of x and y: a more efficient way of calculating TRANSPOSE(x)*+y.

 

MAXIMUM

 

MAXIMUM(x) or MAX(x) finds the maximum of the values in x.

 

MEAN

 

MEAN(x) forms the mean of the values of x.

 

MEDIAN

 

MEDIAN(x) or MED(x) finds the median of the values in x.

 

MINIMUM

 

MINIMUM(x) or MIN(x) finds the minimum of the values in x.

 

MODULO

 

MODULO(x;y) forms the modulus of x to base y.

 

MVINSERT

 

MVINSERT(x;y) replaces values in x by missing values wherever y stores a non-zero value (logical true).

 

MVREPLACE

 

MVREPLACE(x;y) replaces missing values in x with the values in the corresponding units of y.
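
The two missing-value functions, MVINSERT and MVREPLACE, can be illustrated together in Python. Missing values are represented by None, which is an assumption of this sketch, and the function names are hypothetical helpers.

```python
def mvinsert(x, y):
    """Set x to missing (None) wherever the corresponding value of y is non-zero."""
    return [None if flag else value for value, flag in zip(x, y)]

def mvreplace(x, y):
    """Replace missing values (None) in x by the corresponding values of y."""
    return [yv if xv is None else xv for xv, yv in zip(x, y)]

print(mvinsert([1, 2, 3], [0, 1, 0]))      # [1, None, 3]
print(mvreplace([1, None, 3], [9, 8, 7]))  # [1, 8, 3]
```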

 

NCOLUMNS

 

NCOLUMNS(x) gives the number of columns of x.

 

NED

 

NED(p) gives the Normal equivalent deviate: that is the value x that leaves a proportion p (0 < p < 1) to the left of it under the standard Normal curve (synonym of EDNORMAL).
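
As an illustration, the Normal equivalent deviate and the Normal probability integral (NORMAL, described later in this list) are inverses of each other. The Python sketch below uses the standard library's NormalDist class; the names ned and normal are hypothetical helpers and the numerical method is Python's, not Genstat's.

```python
from statistics import NormalDist

_STANDARD = NormalDist()  # mean 0, standard deviation 1

def ned(p):
    """Value x leaving a proportion p (0 < p < 1) to its left under N(0,1)."""
    return _STANDARD.inv_cdf(p)

def normal(x):
    """Probability that a standard Normal variable is less than x."""
    return _STANDARD.cdf(x)

# The familiar two-sided 5% point, close to 1.96.
print(round(ned(0.975), 2))
```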

 

NEWLEVELS

 

NEWLEVELS(f;x) forms a variate from the factor f; the variate x defines a value for each level and should be the same length as the number of levels of the factor.

 

NLEVELS

 

NLEVELS(f) gives the number of levels of factor f.

 

NMV

 

NMV(x) counts the number of missing values in x.

 

NOBSERVATIONS

 

NOBSERVATIONS(x) counts the number of observations (that is non-missing values) in x.

 

NORMAL

 

NORMAL(x) the Normal probability integral: gives the probability that a random variable with a standard Normal N(0,1) distribution is less than x (synonym of CLNORMAL).

 

NROWS

 

NROWS(x) gives the number of rows of x.

 

NVALUES

 

NVALUES(x) gives the number of values, including missing values, of x (that is the length of x).

 

POSITION

 

POSITION(x;y) finds the position, within the vector y, of each value of x.

 

PRBETA

 

PRBETA(x;a;b) probability density function for a beta distribution with parameters a and b.

 

PRBINOMIAL

 

PRBINOMIAL(x;n;p) probability of x successes out of n binomial trials with probability of success p.

 

PRCHISQUARE

 

PRCHISQUARE(x;df) probability density function for a chi-square distribution with df degrees of freedom.

 

PRF

 

PRF(x;df1;df2) probability density function for an F distribution with df1 and df2 degrees of freedom.

 

PRGAMMA

 

PRGAMMA(x;a;b) probability density function for a gamma distribution with index parameter a and shape parameter b.

 

PRHYPERGEOMETRIC

 

PRHYPERGEOMETRIC(j;l;m;n) probability of j successes out of a sample of m from a population of size n of which l are positive (hypergeometric distribution).

 

PRLOGNORMAL

 

PRLOGNORMAL(x) probability density function for a lognormal distribution corresponding to a Normal distribution with mean 0 and variance 1.

 

PRNORMAL

 

PRNORMAL(x) probability density function for a Normal distribution with mean 0 and variance 1.

 

PRODUCT

 

PRODUCT(x;y) forms the matrix product of x and y (that is x *+ y).

 

PRPOISSON

 

PRPOISSON(j;m) probability of the value j for a Poisson distribution with mean m.

 

PRT

 

PRT(x;df) probability density function for a t distribution with df degrees of freedom.

 

QPRODUCT

 

QPRODUCT(x;y) forms the quadratic product of x and y (that is x *+ y *+ TRANSPOSE(x)), where x is a rectangular matrix or variate and y is a symmetric or diagonal matrix or a scalar.
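
The quadratic product is simply two matrix products in sequence. The Python sketch below spells this out with matrices held as lists of rows; the helper names transpose, product and qproduct are hypothetical, and the sketch handles only the rectangular-matrix case, not variates or scalars.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*m)]


def product(a, b):
    """Matrix product of a and b (the operation written *+ in the text)."""
    return [
        [sum(r * c for r, c in zip(row, column)) for column in zip(*b)]
        for row in a
    ]


def qproduct(x, y):
    """Quadratic product x *+ y *+ TRANSPOSE(x)."""
    return product(product(x, y), transpose(x))


# A 2x2 matrix x with a diagonal matrix y; the result is symmetric.
result = qproduct([[1, 2], [3, 4]], [[2, 0], [0, 1]])
```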

 

RESTRICTION

 

RESTRICTION(x) forms a variate with the value 1 in the units to which x is currently restricted.

 

REVERSE

 

REVERSE(x) reverses the values of x.

 

ROUND

 

ROUND(x) rounds the values of x to the nearest integer.

 

RTPRODUCT

 

RTPRODUCT(x;y) forms the right transposed product of x and y (that is x *+ TRANSPOSE(y)).

 

SHIFT

 

SHIFT(x;s) shifts the values of x by s places (to the right or left according to the sign of s). This is not a circular shift, so some positions lose their values and are given missing values.
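
A non-circular shift of this kind might be sketched in Python as follows. The sketch assumes that a positive s moves values to the right (towards later units) and a negative s moves them to the left; that direction convention, the use of None for the missing value, and the name shift are assumptions of the illustration.

```python
def shift(x, s):
    """Shift the values of x by s places without wrapping around.

    Positions that receive no value are filled with None (missing).
    Assumption: positive s shifts to the right, negative s to the left.
    """
    n = len(x)
    result = [None] * n
    for i in range(n):
        source = i - s  # unit of x that lands in position i
        if 0 <= source < n:
            result[i] = x[source]
    return result


right = shift([1, 2, 3, 4], 1)
left = shift([1, 2, 3, 4], -2)
```

In this sketch the value 4 is lost from right, and the values 1 and 2 are lost from left, because nothing wraps round to the other end.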

 

SIN

 

SIN(x) sine of x, for x in radians.

 

SOLUTION

 

SOLUTION(x;y) finds the solution b of the set of simultaneous linear equations x *+ b = y.

 

SORT

 

SORT(x;y) sorts the elements of x into the order that would put the values of y into ascending order; if y is omitted, the values of x are sorted.
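
The role of the second argument is perhaps easiest to see in a sketch: y supplies the sort keys and x supplies the values that are reordered. The Python below illustrates the idea; the name sort_by is hypothetical, and the treatment of ties (kept in their original order) is an assumption of the sketch.

```python
def sort_by(x, y=None):
    """Reorder x into the order that would put y into ascending order.

    If y is omitted, x is sorted by its own values.
    """
    keys = x if y is None else y
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [x[i] for i in order]


plain = sort_by([30, 10, 20])
keyed = sort_by(["a", "b", "c"], [3, 1, 2])
```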

 

SQRT

 

SQRT(x) gives the square root of x (x ≥ 0).

 

SUBMAT

 

SUBMAT(x) forms sub-triangles or sub-rectangles of a rectangular or symmetric matrix. The rows and columns to be included are determined by matching the pointers indexing the resultant matrix with the pointers indexing x. (SUBMAT does not allow for indexing by variates or texts.)

 

SUM

 

SUM(x) forms the sum of the values in x (synonym TOTAL).

 

TMAXIMA

 

TMAXIMA(t) forms margins of maxima for table t.

 

TMEANS

 

TMEANS(t) forms margins of means for table t.

 

TMEDIANS

 

TMEDIANS(t) forms margins of medians for table t.

 

TMINIMA

 

TMINIMA(t) forms margins of minima for table t.

 

TNMV

 

TNMV(t) forms margins counting the numbers of missing values in table t.

 

TNOBSERVATIONS

 

TNOBSERVATIONS(t) forms margins counting the numbers of observations (non-missing values) in table t.

 

TNVALUES

 

TNVALUES(t) forms margins counting the numbers of values (missing or non-missing) in table t.

 

TOTAL

 

TOTAL(x) forms the total of the values in x (synonym SUM).

 

TRACE

 

TRACE(x) calculates the trace of the square, diagonal, or symmetric matrix x (that is the sum of all its diagonal elements).

 

TRANSPOSE

 

TRANSPOSE(x) or T(x) forms the transpose of a rectangular matrix x.

 

TTOTALS

 

TTOTALS(t) or TSUMS(t) forms margins of totals for table t.

 

TVARIANCES

 

TVARIANCES(t) forms margins of between-cell variances for table t.

 

UNSET

 

UNSET(d) returns a scalar logical value according to whether or not the dummy d is set.

 

URAND

 

URAND(seed;s) provides s uniform pseudo-random numbers in the range (0,1). If s is not supplied and URAND cannot determine the length of the result from the context of the expression, the length of the current units structure (if any) is taken. The scalar seed initializes the generator. If seed is zero in the first use of URAND in a job, the system clock is used to provide a seed; subsequent calls may use zero to continue the sequence of random numbers.

 

VARIANCE

 

VARIANCE(x) or VAR(x) gives the variance of the values in x.

 

VCORRELATION

 

VCORRELATION(p1;p2) gives the correlation, at every unit, between the values of the corresponding structures in the pointers p1 and p2.

 

VCOVARIANCE

 

VCOVARIANCE(p1;p2) gives the covariance, at every unit, between the values of the corresponding structures in the pointers p1 and p2.

 

VMAXIMA

 

VMAXIMA(p) finds the maximum of the values in each unit over the variates in pointer p.

 

VMEANS

 

VMEANS(p) gives the mean of the non-missing values in each unit over the variates in pointer p.
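
The V functions work across the variates in a pointer, unit by unit, rather than along a single variate. A Python sketch of VMEANS under that reading is given below. The pointer is modelled as a list of equal-length lists, None stands for the missing value, and returning None for a unit in which every value is missing is an assumption of the sketch.

```python
def vmeans(variates):
    """Mean of the non-missing values in each unit, taken over a list of variates."""
    means = []
    for unit in zip(*variates):  # one tuple per unit, one entry per variate
        present = [value for value in unit if value is not None]
        # Assumption: a unit with no non-missing values yields a missing result.
        means.append(sum(present) / len(present) if present else None)
    return means


unit_means = vmeans([[1, 4, None], [3, None, None]])
```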

 

VMEDIANS

 

VMEDIANS(p) finds the median of the values in each unit over the variates in pointer p.

 

VMINIMA

 

VMINIMA(p) finds the minimum of the values in each unit over the variates in pointer p.

 

VNMV

 

VNMV(p) counts the number of missing values in each unit over the variates in pointer p.

 

VNOBSERVATIONS

 

VNOBSERVATIONS(p) counts the number of observations (non-missing values) in each unit over the variates in pointer p.

 

VNVALUES

 

VNVALUES(p) gives the number of values in each unit over the variates in pointer p (that is the number of values of p).

 

VTOTALS

 

VTOTALS(p) or VSUMS(p) gives the total of the non-missing values in each unit over the variates in pointer p.

 

VVARIANCES

 

VVARIANCES(p) gives the variance of the non-missing values in each unit over the variates in pointer p.

 

 

List of functions for formulae

 

 

These are the functions that can be used in formulae: COMPARISON, POL, POLND, REG, REGND, and SSPLINE.

 

COMPARISON

 

COMPARISON(f;s;m) calculates the comparisons amongst the levels of factor f specified by the first s rows of the matrix m (TREATMENT formulae only).

 

POL

 

POL(f;s;v) indicates that the effects of factor f are to be partitioned into polynomial components (linear, quadratic etc) up to order s, where s is a scalar containing an integer between 1 and 4; variate v defines a numerical value for each level of the factor; if omitted, the factor levels themselves are used; in regression models POL(v;s) can be used to fit simple polynomials of a variate v up to order s.

 

POLND

 

POLND(f;s;v) has the same effect as POL, except that no Dev components are fitted for factor f in interactions (TREATMENT formulae only).

 

REG

 

REG(f;s;m) indicates that the effects of factor f are to be partitioned into the regression contrasts specified by the first s rows of the matrix m. In TREATMENT formulae scalar s must lie between 1 and 7. In regression models REG(v;s;m) can be used to fit a set of associated variates stored in the rows of a matrix; if m is omitted orthogonal polynomial contrasts are constructed for either f or v (in regression models contrasts are otherwise not orthogonalised).

 

REGND

 

REGND(f;s;m) has the same effect as REG, except that no Dev components are fitted for factor f in interactions (TREATMENT formulae only).

 

SSPLINE

 

SSPLINE(v;s;p) or S(v;s;p) indicates that the effect of a variate v is to be fitted by a smoothing spline with approximately s degrees of freedom or using "smoothing parameter" p (regression models only).

 

 

Data structures

 

Data structures store the information on which a Genstat program operates. Structures can be defined, or declared, by a Genstat statement known as a declaration. The directive for declaring each type of structure has the same name as given to that type of structure, for example SCALAR to declare a scalar (or single-valued numerical structure), and so on. These are the directives, with details of their corresponding data structures:

 

SCALAR single number

VARIATE series of numbers

TEXT series of character strings (or lines of text)

FACTOR series of group allocations (using a pre-defined set of numbers or strings to indicate the groups)

MATRIX rectangular matrix

SYMMETRICMATRIX symmetric matrix

DIAGONALMATRIX diagonal matrix

TABLE table (to store tabular summaries like means, totals etc)

DUMMY single identifier

POINTER series of identifiers (e.g. to represent a set of structures)

EXPRESSION arithmetic expression

FORMULA model formula (to be fitted in a statistical analysis)

LRV latent roots and vectors

SSPM sums of squares and products with associated information such as means

TSM model for Box-Jenkins modelling of time series

 

It is possible to declare new structures with attributes the same as those of an existing structure.

 

DUPLICATE forms new data structures with attributes taken from an existing structure

 

You can also define data structures whose contents are customized for particular tasks.

 

STRUCTURE defines a customized data structure

DECLARE declares one or more customized data structures

 

Program control

 

A Genstat program consists of a sequence of one or more jobs. The first job starts automatically at the start of the program. Subsequent jobs can be started and ended using the JOB and ENDJOB directives:

 

JOB starts a Genstat job (ending the previous one if necessary)

ENDJOB ends a job

 

The whole program is terminated by a STOP directive:

 

STOP ends a Genstat program

 

Statements within a program can be repeated using a FOR loop. The loop is introduced by a FOR statement. This is followed by the series of statements that is to be repeated (that is, the contents of the loop), and the end of the loop is marked by an ENDFOR statement. Parameters of the FOR directive allow lists of data structures to be specified so that the statements in the loop operate on different structures each time that it is executed.

 

FOR indicates the start of a loop

ENDFOR marks the end of a loop

 

Genstat has two ways of choosing between sets of statements. The block-if structure consists of one or more alternative sets of statements. The first set is introduced by an IF statement. There may then be further sets introduced by ELSIF statements. Then there may be a final set introduced by an ELSE statement, and the whole structure is terminated by an ENDIF statement. The IF statement, and each ELSIF statement, contains a single-valued logical expression. Genstat evaluates each one in turn and executes the statements following the first TRUE logical found; if none of them is true, Genstat executes the statements following the ELSE statement (if any).

 

IF introduces a block-if structure

ELSIF introduces an alternative set of statements in a block-if structure

ELSE introduces a default set of statements for a block-if structure

ENDIF marks the end of a block-if structure
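
The selection rule of the block-if structure (test each condition in turn, take the first that is true, otherwise fall back to the ELSE set) can be sketched in Python. This is an analogy, not Genstat code: the function block_if is hypothetical, conditions are passed as callables so that later ones are not evaluated once a true one is found, and each set of statements is represented by a simple label.

```python
def block_if(branches, else_set=None):
    """Return the set of statements chosen by a block-if structure.

    branches is a list of (condition, statements) pairs in IF/ELSIF order;
    else_set is the optional ELSE set.
    """
    for condition, statements in branches:
        if condition():  # conditions are evaluated in turn
            return statements  # the first true condition wins
    return else_set  # no condition was true


n = 7
chosen = block_if(
    [
        (lambda: n < 5, "small"),
        (lambda: n < 10, "medium"),
        (lambda: n < 100, "large"),
    ],
    else_set="huge",
)
```

With n set to 7 the first condition fails and the second succeeds, so the sketch selects the second set and never evaluates the third condition.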

 

The multiple-selection structure consists of several sets of statements. The first is introduced by a CASE statement. Subsequent sets are introduced by OR statements. There can then be a final, default, set introduced by an ELSE statement, and the end of the structure is indicated by an ENDCASE statement. The parameter of the CASE statement is an expression which must produce a single number. Genstat rounds this to the nearest integer, n say, and then executes the nth set of statements. If there is no nth set, the statements following the ELSE statement are executed (if any).

 

CASE introduces a multiple-selection structure

OR introduces an alternative set of statements for a multiple-selection structure

ELSE introduces a default set of statements for a multiple-selection structure

ENDCASE marks the end of a multiple-selection structure
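
The way a multiple-selection structure picks its set of statements can be sketched in Python as below. The function case_select is hypothetical, each set of statements is represented by a label, and the treatment of exact halves (rounded upwards here) is an assumption, since the text says only that the value is rounded to the nearest integer.

```python
import math


def case_select(value, sets, else_set=None):
    """Return the set of statements chosen by a CASE structure.

    sets lists the CASE set followed by each OR set, in order;
    else_set is the optional ELSE set.
    """
    n = math.floor(value + 0.5)  # assumption: exact halves round upwards
    if 1 <= n <= len(sets):
        return sets[n - 1]  # the nth set, counting from 1
    return else_set  # there is no nth set


sets = ["first", "second", "third"]
picked = case_select(2.4, sets, else_set="default")
fallback = case_select(7, sets, else_set="default")
```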

 

Sequences of statements can be formed into Genstat procedures for convenient future use. The use of a procedure looks just like one of the Genstat directives, with its own options and parameters, which transfer information to and from the procedure. Otherwise the procedure is completely self-contained. The start of a procedure is indicated by a PROCEDURE statement. Then OPTION and PARAMETER statements can be given to define the arguments of the procedure. These are followed by the statements to be executed when the procedure is called, terminated by an ENDPROCEDURE statement.

 

PROCEDURE introduces a procedure, and defines its name

OPTION defines the options of a procedure

PARAMETER defines the parameters of a procedure

ENDPROCEDURE indicates the end of a procedure

WORKSPACE accesses "private" data structures for use in procedures

 

Any control structure (job, block-if structure, loop, multiple-selection structure or procedure) can be abandoned using an EXIT statement. Also, execution of any of these structures can be interrupted explicitly with a BREAK statement, or implicitly by using DEBUG. Once DEBUG has been entered, Genstat will produce breaks automatically at regular intervals, until it meets an ENDDEBUG statement.

 

EXIT exits from a control structure

BREAK suspends the execution of a control structure

ENDBREAK continues execution of a control structure, following a break

DEBUG can cause a break to take place after the current statement (and at specified intervals thereafter), or immediately after the next fault

ENDDEBUG cancels DEBUG

 

Macros within a procedure are substituted as soon as they are met during the definition of the procedure. However, it is also possible to execute a set of statements (contained in a text) during execution of the procedure. This can also be useful within loops.

 

EXECUTE executes the statements contained within a text

 

In some implementations of Genstat, it is possible to suspend the execution of Genstat and return to the operating system of the computer to execute commands, for example to list or edit files on the computer. Likewise, it may be possible to halt the execution of Genstat to execute some other computer program. The OWN directive provides another way of running a user's program from within Genstat. The OWN subroutine, within the Fortran code of Genstat, needs to be modified to call the program. The new code must then be recompiled and linked into a new version of Genstat.

 

SUSPEND suspends the execution of Genstat to carry out operating-system commands

PASS runs another computer program, taking data from Genstat and transferring results back

OWN executes the user's own code linked into Genstat

 

Input and output

 

Data can be read into Genstat data structures using the READ directive or the FILEREAD procedure:

 

READ reads data from an input file, an unformatted file or a text

FILEREAD reads data from a file, assumed to be in a rectangular array

 

Files can be connected to input, output or other channels during execution of a Genstat program. Channels can also be closed, terminating the connection, so that they can be attached to other files.

 

OPEN opens files and connects them to Genstat input/output channels

CLOSE closes files, freeing the channels to which they were attached

 

The channel from which input statements are taken can be changed, as can the channel to which output is sent. It is also possible to send a transcript (or copy) of input and/or output to output files, to skip sections of input or output files, and to obtain information about the files connected to each channel.

 

INPUT specifies the channel from which subsequent statements should be read

RETURN returns to the previous input channel

OUTPUT specifies the channel to which future output should be sent

COPY requests a transcript of subsequent input and/or output

SKIP skips lines of input or output files

ENQUIRE provides details about files opened by Genstat

 

The contents of data structures can be "printed" into output files or into text structures, using the PRINT directive. Other directives allow system information or details of attributes of structures to be printed, or syntax details to be obtained. Directive SKIP, as mentioned above, allows blank lines to be inserted in output files; PAGE moves to the top of the next page.

 

PRINT prints data in tabular form to an output file or text

LIST lists details of the data structures that currently exist in your program

PAGE moves to the top of the next page of an output file

DISPLAY repeats the last Genstat diagnostic

DUMP prints attributes of data structures and other internal information

HELP prints details of the Genstat syntax and environment

 

Other information is available from the procedures in the help module of the Genstat Procedure Library:

 

LIBHELP provides help information for Library procedures

LIBEXAMPLE accesses examples and source code of Library procedures

LIBINFORM prints information about the contents of the Procedure Library

LIBMANUAL prints a "Manual" for the Procedure Library

LIBVERSION provides the name of the current Genstat 5 Procedure Library

NOTICE gives access to the Genstat Notice Board (news, errors, instructions for authors of procedures etc.)

 

Menu-driven interfaces can be defined using the QUESTION directive and invoked using the MENU procedure.

 

QUESTION obtains a response using a Genstat menu

MENU initiates a menu system

 

The values of a data structure, with all its defining information, can be stored in a sub-file of a "backing-store" file. It can then be retrieved in a later job, without the need to repeat the definitions. The current state of the whole job can also be dumped to an unformatted file, so that it can be picked up and continued on a later occasion.

 

STORE stores data structures in a backing-store file

RETRIEVE retrieves data structures from a backing-store file

CATALOGUE displays the contents of a backing-store file

MERGE copies sub-files of backing-store files into a single file

RECORD dumps the complete details of a job

RESUME reads and restarts a recorded job

 

Calculations and manipulation

 

The directive CALCULATE allows arithmetic calculations on the values of any numeric data structure; logical tests can also be done on numerical and textual values. Functions and operators are available for a very wide range of calculations on matrices and tables. Another general directive is EQUATE, which allows values to be copied from one set of data structures to another; the structures must store values of the same mode (for example, numbers or text), but need not be of the same type. Structure values can be deleted to save space within Genstat; attributes can also be deleted so that the structure can be redefined, for example as another type.

 

CALCULATE performs arithmetic and logical calculations

DELETE allows values and attributes of data structures to be deleted

EQUATE copies values between sets of data structures

 

There are several general directives for manipulating vectors (variates, factors or texts). Units of vectors can be sorted into systematic order or into random order. A "restriction" can be associated with a vector, so that subsequent statements operate on only a subset of its units. A default length and labelling can be defined for vectors formed later in the job. Facilities for specific types of vector allow interpolation of values for variates, monotonic regression, generation of factor values, concatenation and editing of text.

 

RESTRICT defines a "restriction" on the units of a vector

SORT sorts units of vectors into alphabetic or numerical order of an index vector, or forms a factor from a variate or text

UNITS defines default length or labelling for vectors defined subsequently in the job

INTERPOLATE calculates variates of interpolated values

MONOTONIC fits an increasing monotonic regression of y on x

GROUPS forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur

CONCATENATE concatenates together lines of text vectors

EDIT line editor for units of text vectors

 

Other facilities for vectors are provided by the procedures in the manipulation module of the Genstat Procedure Library, including

 

APPEND appends a list of vectors of the same type

FACAMEND permutes the levels and labels of a factor

FACPRODUCT forms a factor with a level for every combination of other factors

FRESTRICTEDSET forms vectors with the restricted subset of a list of vectors

GRANDOM generates pseudo-random numbers from probability distributions

ORTHPOL calculates orthogonal polynomials

QUANTILE calculates quantiles of the values in a variate

RANK produces ranks, from the values in a variate, allowing for ties

SAMPLE samples from a set of units, possibly stratified by factors

SPLINE calculates a set of basis functions for M-, B- or I-splines

STANDARDIZE standardizes columns of a data matrix to have mean 0 and variance 1

SUBSET forms vectors containing subsets of the values in other vectors

VEQUATE equates across numerical structures

VINTERPOLATE performs linear and inverse linear interpolation between variates

 

Tables can be formed containing summaries of values in variates: totals, minimum and maximum values, quantiles, numbers of missing and non-missing values, means and variances. Manipulations of multi-way structures include the ability to add various types of marginal summaries to tables, and to combine "slices" of tables, of matrices or of variates. Directives are also available for eigenvalue and singular-value decompositions of matrices, and to form the values of SSPM structures.

 

TABULATE forms tables of summaries of the values of a variate

MARGIN calculates or deletes margins of tables

COMBINE combines or omits "slices" of tables, matrices or variates

FLRV calculates latent roots and vectors (that is eigenvalues and eigenvectors)

SVD calculates singular-value decompositions of matrices

FSSPM calculates values for SSPM structures (sums of squares and products, means, etc)

 

Procedures in the Library for manipulating tables and matrices include

 

PERCENT expresses the body of a table as percentages of one of its margins

GINVERSE calculates the generalized inverse of a matrix

LINDEPENDENCE finds the linear relations associated with matrix singularities

MPOWER forms integer powers of a square matrix

 

Formulae can be interpreted using the FCLASSIFICATION directive.

 

FCLASSIFICATION forms classification sets for the terms in a formula or breaks a formula up into separate formulae (one for each term)

 

Values can be assigned to dummies and pointers:

 

ASSIGN sets values of dummies and pointers

 

Aspects of the "environment" of the current job can be modified, such as whether or not Genstat starts output from a statistical analysis at the top of a new page, or whether it should pause during interactive output. New defaults can be set for options and parameters. Details of the environmental settings can be copied into Genstat data structures. Attributes of data structures can also be accessed.

 

SET sets details of the "environment" of a Genstat job

SETOPTION sets or modifies defaults of options of Genstat directives or procedures

SETPARAMETER sets or modifies defaults of parameters of Genstat directives or procedures

GET gets details of the "environment" of a Genstat job

GETATTRIBUTE accesses attributes of data structures

 

Graphics

 

Genstat can plot data on terminals or line-printers. Most Genstat implementations can also produce graphs on higher resolution devices like graphics monitors and plotters. The relevant directives for line-printers or terminals are:

 

CONTOUR produces contour maps of two-way arrays of numbers

GRAPH produces scatter plots and line graphs

HISTOGRAM plots histograms

 

For high-resolution graphics, the directives have two main purposes. There are those that define the "graphics environment" for subsequent plots, and those that do the plotting. Often the default environment, set up at the start of a program, will be satisfactory. To change the graphics environment, the following directives can be used:

 

AXES defines the axes in each graphical window

COLOUR defines the colour map for certain graphics devices

DEVICE switches between graphics devices

FRAME defines the positions of the windows within the frame

PEN defines the properties of the graphics "pens"

DKEEP saves information about the graphics environment

 

The directives for plotting high-resolution graphs are:

 

DCONTOUR produces contour maps

DGRAPH produces scatter plots and line graphs

DHISTOGRAM plots histograms

DPIE produces pie charts

DSURFACE draws a perspective plot of a two-way array of numbers

D3HISTOGRAM produces 3-dimensional histograms

DDISPLAY redraws the current graphical display

DCLEAR clears a graphics screen

 

With interactive graphics devices, information can be read from the screen:

 

DREAD reads locations of points from an interactive graphics device

 

Other facilities, provided by procedures in the graphics module of the Library, include:

 

BANK calculates the optimum aspect ratio for a graph

BARCHART plots a bar chart

BOXPLOT draws box-and-whisker diagrams (schematic plots)

DBARCHART plots bar charts for one- or two-way tables

DOTPLOT produces a dot-plot

DSCATTER produces a scatter-plot matrix

DSHADE produces a pictorial representation of a data matrix

INSIDE determines whether points lie within a specified polygon

RUGPLOT draws "rugplots" to display the distribution of one or more samples

STEM produces a simple stem-and-leaf chart

 

Basic statistics

 

Many simple statistical operations, such as t-tests, one-way analysis of variance, non-parametric tests, and summary statistics are provided by procedures in the basic and nonparametric modules of the Library:

 

AONEWAY provides one-way analysis of variance

CHISQUARE calculates chi-square statistics for one- and two-way tables

CONCORD calculates Kendall's Coefficient of Concordance

DESCRIBE saves and/or prints summary statistics for variates

KAPPA calculates a kappa coefficient of agreement for nominally scaled data

KOLMOG2 performs a Kolmogorov-Smirnov two-sample test

KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance

MANNWHITNEY performs a Mann-Whitney U test

RUNTEST performs a test of randomness of a sequence of observations

SIGNTEST performs a one- or two-sample sign test

SPEARMAN calculates Spearman's Rank Correlation Coefficient

TTEST performs a one- or two-sample t-test

WILCOXON performs a Wilcoxon Matched-Pairs (Signed-Rank) test

 

There is also a Genstat directive for fitting of statistical distributions:

 

DISTRIBUTION estimates the parameters of continuous and discrete distributions

 

Regression analysis

 

Genstat provides directives for carrying out linear and nonlinear regression, and for fitting generalized linear, generalized additive and generalized nonlinear models. They are designed to allow easy comparison between models, and comparison between groups of data (specified as factors). The directives for nonlinear regression can also be used for general optimization. There are three preliminary directives for defining the form of the model to be fitted, of which the MODEL directive must always be given first:

 

MODEL defines the response variate(s) and the type of model to be fitted

TERMS specifies a maximal model, containing all terms to be used in subsequent regression models

RCYCLE controls iterative fitting of generalized linear models, generalized additive models and nonlinear models, and specifies parameters and bounds for nonlinear models

 

Separate directives carry out the fitting of the various types of model:

 

FIT fits a linear model, a generalized linear model, a generalized additive model, or a generalized nonlinear model

FITCURVE fits a standard nonlinear regression model

FITNONLINEAR fits a user-defined nonlinear regression model or optimizes a scalar function

 

Further directives are provided to allow sequential modification of the set of explanatory variables:

 

ADD adds extra terms to any type of regression model

DROP drops terms from any type of regression model

SWITCH adds terms to, or drops them from, any type of regression model

TRY displays results of single-term changes to a linear or generalized linear model

STEP selects terms to include in or exclude from a linear or generalized linear model

 

The results of fitting the models can be displayed or stored in data structures:

 

RDISPLAY displays the fit of any type of regression model

RKEEP stores the results from any type of regression model

PREDICT forms predictions from a linear or generalized linear model

RFUNCTION estimates functions of parameters of a nonlinear model

 

Procedures relevant to regression analysis, in the regression and glm modules of the Library, include:

 

RCHECK checks the fit of a regression model

RGRAPH draws a graph to display the fit of a regression model

FITNONNEGATIVE fits a generalized linear model with nonnegativity constraints

FITPARALLEL carries out analysis of parallelism for non-linear functions

FITSCHNUTE fits a general four-parameter growth model to a non-decreasing response variate

GEE fits models to longitudinal data by generalized estimating equations

GLM analyses non-standard generalized linear models

GLMM fits a generalized linear mixed model

IFUNCTION estimates implicit and/or explicit functions of parameters

PAIRTEST performs t-tests for pairwise differences

PPAIR displays results of t-tests for pairwise differences in compact diagrams

RJOINT does modified joint regression analysis for variety-by-environment data

RPAIR gives t-tests for all pairwise differences of means from linear or generalized linear models

XOCATEGORIES performs analyses of categorical data from crossover trials

PROBITANALYSIS fits probit models allowing for natural mortality and immunity

EXTRABINOMIAL fits models to overdispersed proportions

FIELLER calculates effective doses or relative potencies

DILUTION calculates Most Probable Numbers from dilution series data

WADLEY fits models for Wadley's problem, allowing alternative links and errors

 

Design and analysis of experiments

 

Genstat has a very general algorithm for analysis of variance of balanced experiments. There are several directives to define the various aspects of the model to be fitted:

 

BLOCKSTRUCTURE defines the blocking structure of the design, and hence the strata and error terms

COVARIATE specifies covariates for analysis of covariance

TREATMENTSTRUCTURE defines the treatment (or systematic) terms

 

For unstructured designs with a single error term, BLOCKSTRUCTURE need not be specified, and COVARIATE is needed only for analysis of covariance. Once the model has been defined, the y-variates can be analysed using the ANOVA directive:

 

ANOVA performs analysis of variance

 

Directives are available to save information in Genstat data structures, or to produce further output:

 

ADISPLAY displays further output from analyses produced by ANOVA

AKEEP copies information from an ANOVA analysis into Genstat data structures

 

Procedures relevant to analysis of variance, in the aov module of the Library, include:

 

AGRAPH plots one- or two-way tables of means from ANOVA

APLOT plots residuals from an ANOVA analysis

DAPLOT plots residuals from ANOVA in high-resolution, with interactive identification of outliers

ASTATUS provides information about the settings of ANOVA models and variates

A2PLOT plots effects from two-level designs with robust s.e. estimates

ABIVARIATE produces graphs and statistics for bivariate analysis of variance

ALIAS finds out information about aliased model terms in analysis of variance

AREPMEASURES produces an analysis of variance for repeated measurements

AUNBALANCED performs analysis of variance for unbalanced designs

AUDISPLAY produces further output for an unbalanced design (after AUNBALANCED)

CENSOR pre-processes censored data before analysis by ANOVA

CINTERACTION clusters rows and columns of a two-way interaction table

DIALLEL analyses full and half diallel tables with parents

NLCONTRASTS fits non-linear contrasts to quantitative factors in ANOVA

 

The REML algorithm is available for estimating variance components and for analysing unbalanced designs.

 

REML fits a variance-component model by residual (or restricted) maximum likelihood

VCOMPONENTS defines the model for REML

VDISPLAY displays further output from a REML analysis

VKEEP copies information from a REML analysis into Genstat data structures

VSTRUCTURE defines a variance structure for random effects in a REML model

VPEDIGREE generates an inverse relationship matrix for use when fitting animal or plant breeding models by REML

VSTATUS prints the current model settings for REML
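
A minimal sketch of how these directives might be combined, assuming a hypothetical data set with a fixed treatment factor and a random block factor (the identifiers Block, Treat and Yield and the data values are invented for illustration):

```genstat
" Hypothetical data: 3 blocks, each containing 4 treatments "
FACTOR [LEVELS=3; VALUES=4(1,2,3)] Block
FACTOR [LEVELS=4; VALUES=(1,2,3,4)3] Treat
VARIATE [VALUES=12.1,13.4,11.8,14.2,11.5,12.9,11.1,13.6,12.8,13.9,12.2,14.9] Yield
" Fixed and random terms are defined first; REML then fits the model "
VCOMPONENTS [FIXED=Treat] RANDOM=Block
REML [PRINT=model,components] Yield
```

Further output could then be requested with VDISPLAY, or saved with VKEEP, without refitting the model.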

 

Procedures relevant to REML include:

 

VFUNCTION calculates functions of variance components from a REML analysis

VHOMOGENEITY tests homogeneity of variances

VPLOT plots residuals from a REML analysis

 

Directives are available for generating the values of factors for experimental designs, and for randomization.

 

FKEY forms design keys for multi-stratum experimental designs, allowing for confounding and aliasing of treatments

FPSEUDOFACTORS determines patterns of confounding and aliasing from design keys, and extends the treatment formula to incorporate the necessary pseudo-factors

GENERATE generates values of factors in systematic order or as defined by a design key, or forms values of pseudo-factors

RANDOMIZE puts units of vectors into random order, or randomizes units of an experimental design

 

Relevant procedures in the design module of the Library include:

 

DESIGN acts as a menu-driven interface to the Genstat design system, providing a convenient way of selecting and generating various types of factorial designs, as well as fractional factorial, lattice, alpha, balanced-incomplete-block, Box-Behnken, central composite, cyclic, neighbour-balanced and Plackett-Burman designs; for those who prefer a command-based interface, the procedures that it uses (AGDESIGN, AKEY, AGHIERARCHICAL, AGFRACTION, AGALPHA, AFALPHA, AGCYCLIC, AFCYCLIC, AGBIB, AGBOXBEHNKEN, AGCENTRALCOMPOSITE, AGMAINEFFECT and AGNEIGHBOUR) can also be called directly

AFORMS prints data forms for an experimental design

AFUNITS forms a factor to index the units of the final stratum of a design

AMERGE merges extra units into an experimental design

APRODUCT forms a new experimental design from the product of two designs

ARANDOMIZE randomizes and prints an experimental design

DDESIGN plots the plan of an experimental design

FACPRODUCT forms a factor with a level for every combination of other factors

PDESIGN prints or stores treatment combinations tabulated by the block factors

 

Multivariate analysis and cluster analysis

 

Several standard multivariate methods are provided by Genstat directives. These include methods that analyse data in the form of units-by-variates, and methods that use a similarity or distance matrix.

 

The following directives carry out standard multivariate analyses:

 

CVA canonical variates analysis

PCP principal components analysis

PCO principal coordinates analysis

ROTATE Procrustes rotation

MDS non-metric multidimensional scaling

 

Separate directives are available to process results from multivariate analyses:

 

FACROTATE rotates factor loadings from a PCP or CVA

ADDPOINTS adds points for new objects to a PCO

RELATE relates principal coordinates to original data variates

 

The following directives are used for hierarchical or non-hierarchical cluster analysis:

 

FSIMILARITY forms a similarity matrix or a between-group similarity matrix from a units-by-variates data matrix

REDUCE forms a reduced similarity matrix (by groups)

HCLUSTER hierarchical cluster analysis from a similarity matrix

CLUSTER non-hierarchical clustering from a data matrix

 

Separate directives that process the results from hierarchical cluster analyses are:

 

HDISPLAY displays results associated with hierarchical clustering

HLIST lists a data matrix in abbreviated form

HSUMMARIZE summarizes data variates by clusters

 

Other multivariate techniques are provided by procedures in the mva module of the Library:

 

BIPLOT produces a biplot from a set of variates

CANCOR does canonical correlation analysis

CINTERACTION clusters rows and columns of a two-way interaction table

CLASSIFY obtains a starting classification for non-hierarchical clustering

CONVEXHULL finds the points of a single or a full peel of convex hulls

CORRESP does correspondence analysis, or reciprocal averaging

CVAPLOT plots the mean and unit scores from a canonical variate analysis

CVASCORES calculates scores for individual units in canonical variate analysis

DDENDROGRAM draws dendrograms with control over structure and style

DISCRIMINATE performs discriminant analysis

DMST gives a high-resolution plot of an ordination with minimum spanning tree

DPARALLEL displays multivariate data using parallel coordinates

FITMULTIVARIATE performs multivariate linear regression with accumulated testing of terms

GENPROC performs a generalized Procrustes analysis

LRVSCREE prints a scree diagram and/or a difference table of latent roots

MANOVA performs multivariate analysis of variance and covariance

MULTMISS estimates missing values for units in a multivariate data set

NORMTEST performs tests of univariate and/or multivariate normality

PCOPROC performs a multiple Procrustes analysis

PLS fits a partial least squares regression model

RIDGE produces ridge regression and principal component regression analyses

ROBSSPM forms robust estimates of sum-of-squares-and-products matrices

SKEWSYMM provides an analysis of skew-symmetry for an asymmetric matrix

 

Time series

 

Genstat provides several methods for examining and analysing time series. Sample correlation functions are produced by the directive CORRELATE:

 

CORRELATE forms correlations between variates, autocorrelations of variates, and lagged cross-correlations between variates

 

The analysis of Box-Jenkins models is specified by several directives:

 

FTSM forms preliminary estimates of parameters in time-series models

TRANSFERFUNCTION specifies input series and transfer-function models for subsequent estimation of a model for an output series

ESTIMATE estimates parameters in Box-Jenkins models for time series

 

Information can be saved in Genstat data structures, or further output can be produced:

 

TDISPLAY displays further output after an analysis by ESTIMATE

TKEEP saves results after an analysis by ESTIMATE

FORECAST forecasts future values of a time series

TSUMMARIZE displays characteristics of a time series model

 

It is also possible to filter a time series, or perform spectral analysis via the Fourier transform of a time series using the directives:

 

FILTER filters time series by time-series models

FOURIER calculates cosine or Fourier transforms of a real or complex series

 

Procedures in the timeseries module of the Library include:

 

BJESTIMATE fits an ARIMA model, with forecasts and residual checks

BJFORECAST plots forecasts of a time series using a previously fitted ARIMA model

BJIDENTIFY displays time series statistics useful for ARIMA model selection

PERIODTEST gives periodogram-based tests for white noise in time series

PREWHITEN filters a time series before spectral analysis

SMOOTHSPECTRUM forms smoothed spectrum estimates for univariate time series

 

Spatial statistics

 

Directives are available for forming variograms and for producing kriged estimates.

 

FVARIOGRAM forms auto-variograms for individual variates or cross-variograms for pairs of variates

KRIGE calculates kriged estimates using a model fitted to a sample variogram

 

Procedures in the spatialstatistics module of the Library include:

 

MVARIOGRAM fits models to an experimental variogram

LVARMODEL analyses a field trial using the Linear Variance Neighbour model

DPOLYGON draws polygons using high-resolution graphics

DPTMAP draws maps for spatial point patterns using high-resolution graphics

DPTREAD adds points interactively to a spatial point pattern

DRPOLYGON reads a polygon interactively from the current graphics device

GRLABEL randomly labels two or more spatial point patterns

GRTHIN randomly thins a spatial point pattern

GRTORSHIFT performs a random toroidal shift on a spatial point pattern

PTBOX generates a box bounding or surrounding a spatial point pattern

PTCLOSEPOLYGON closes open polygons

PTDESCRIBE gives summary and second-order statistics for a point process

PTREMOVE removes points interactively from a spatial point pattern

 

Other statistical methods

 

The Procedure Library covers many other areas of statistics, including analysis of repeated measurements, exact tests, sample re-use and survival analysis:

 

ANTORDER assesses order of ante-dependence for repeated measures data

ANTTEST calculates overall tests based on a specified order of ante-dependence

AREPMEASURES produces an analysis of variance for repeated measurements

CUMDISTRIBUTION fits frequency distributions to accumulated counts

DREPMEASURES plots profiles and differences of profiles for repeated measures data

VORTHPOL calculates orthogonal polynomial time-contrasts for repeated measures

FEXACT2X2 does Fisher's exact test for 2×2 tables

GEE fits models to longitudinal data by generalized estimating equations

BOOTSTRAP produces bootstrapped estimates, standard errors and distributions

JACKKNIFE produces Jackknife estimates and standard errors

KAPLANMEIER calculates the Kaplan-Meier estimate of the survivor function

RPROPORTIONAL fits the proportional hazards model to survival data as a GLM

RSURVIVAL models survival times of exponential, Weibull or extreme-value distributions

 

Summary functions

 

 

These functions calculate scalar summaries of values in any numerical structure:

 

SUM Arithmetic sum

TOTAL Synonym for SUM

MEAN Average

MEDIAN Median value

MINIMUM Minimum value

MAXIMUM Maximum value

CORRELATION Correlation

COVARIANCE Covariance

VARIANCE Variance

NVALUES Number of values

NOBSERVATIONS Number of observations

NMV Number of missing values

 

This function gives the number of levels in a factor:

 

NLEVELS Number of levels of a factor

 

This function evaluates the area under a curve defined by two variates:

 

AREA Estimates the area under a curve

 

This function indicates whether a dummy structure has been set:

 

UNSET Returns 0 if a dummy is set, or 1 if it is unset

 

Transformations

 

 

These functions transform each value of numerical structures:

 

EXP Exponential

LOG Natural logarithm

LOG10 Logarithm base 10

SQRT Square root

 

SIN Sine

ARCSIN Inverse sine

COS Cosine

ARCCOS Inverse cosine

 

ANGULAR Angular transform

IANGULAR Inverse angular

LOGIT Logit

ILOGIT Inverse logit

CLOGLOG Complementary log-log

ICLOGLOG Inverse complementary log-log

 

ABS Absolute value

MODULO Modulo

INTEGER Integer part

ROUND Nearest integer

 

CUMULATE Cumulative sums

DIFFERENCE Differences

SORT Ordered values

REVERSE Reversed series

SHIFT Shift a series

CIRCULATE Circulate a series

 

MVINSERT Missing values inserted at specified positions

MVREPLACE Missing values replaced by specified values

NEWLEVELS Factor levels replaced by specified values

POSITION Locate position within a vector

 

Probability functions

 

 

These functions provide cumulative lower probabilities from continuous or discrete probability distributions:

 

CLNORMAL Normal (Synonym NORMAL)

CLLOGNORMAL Log-normal

CLT t-distribution

CLCHISQUARE Chi-square (Synonym CHISQ)

CLF F-distribution (Synonyms FRATIO, FPROBABILITY)

CLBVARIATENORMAL Bivariate Normal

CLBETA Beta

CLGAMMA Gamma

CLBINOMIAL Binomial

CLPOISSON Poisson

CLHYPERGEOMETRIC Hypergeometric

 

These functions provide cumulative upper probabilities from continuous or discrete probability distributions:

 

CUNORMAL Normal

CULOGNORMAL Log-normal

CUT t-distribution

CUCHISQUARE Chi-square

CUF F-distribution

CUBVARIATENORMAL Bivariate Normal

CUBETA Beta

CUGAMMA Gamma

CUBINOMIAL Binomial

CUPOISSON Poisson

CUHYPERGEOMETRIC Hypergeometric

 

These functions provide the equivalent deviate (that is, the inverse of the cumulative probability transform) from continuous probability distributions:

 

EDNORMAL Normal (Synonym NED)

EDLOGNORMAL Log-normal

EDT t-distribution

EDCHISQUARE Chi-square (Synonym CED)

EDF F-distribution (Synonym FED)

EDBETA Beta

EDGAMMA Gamma

 

These functions provide probability densities (for continuous distributions) or point probabilities (for discrete distributions):

 

PRBETA Beta

PRBINOMIAL Binomial

PRCHISQUARE Chi-square

PRF F-distribution

PRGAMMA Gamma

PRHYPERGEOMETRIC Hypergeometric

PRLOGNORMAL Log-normal

PRNORMAL Normal

PRPOISSON Poisson

PRT t-distribution

 

These functions provide log-likelihoods from continuous or discrete probability distributions:

 

LLNORMAL Normal (Synonym LLN)

LLGAMMA Gamma (Synonym LLG)

LLBINOMIAL Binomial (Synonym LLB)

LLPOISSON Poisson (Synonym LLP)

 

Vector functions

 

 

A vector is a structure with a series of values: variate, text or factor. These functions form summaries for each set of corresponding values of a list of vectors:

 

VSUMS Arithmetic sums

VTOTALS Synonym for VSUMS

VMEANS Averages

VMEDIANS Median values

VMINIMA Minimum values

VMAXIMA Maximum values

VCOVARIANCE Covariances

VCORRELATION Correlations

VVARIANCES Variances

VNOBSERVATIONS Numbers of observations

VNVALUES Numbers of values

VNMV Numbers of missing values

 

Matrix functions

 

 

These functions perform matrix operations:

 

PRODUCT Matrix product (the same as the operator *+)

LTPRODUCT Product after transposing left matrix, i.e. L' *+ R

RTPRODUCT Product after transposing right matrix, i.e. L *+ R'

QPRODUCT Quadratic product, i.e. M *+ S *+ M'

DETERMINANT Determinant of a square matrix

INVERSE Inverse of a square matrix

TRANSPOSE Transpose of a matrix, i.e. M'

TRACE Trace of a square matrix

CHOLESKI Choleski decomposition of a matrix

CORRMAT Correlation matrix derived from a symmetric matrix

SUBMAT Forms sub-triangles or sub-rectangles

SOLUTION Solution of simultaneous linear equations

 

In addition, the following functions give information about matrices:

 

NCOLUMNS Gives the number of columns of a matrix

NROWS Gives the number of rows of a matrix
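
As an illustrative sketch, several of these functions can be combined to compute least-squares coefficients as (X'X)^-1 X'Y. The matrices X and Y and their values are hypothetical, and the sketch assumes that CALCULATE declares the result structures implicitly:

```genstat
" Hypothetical design matrix (a column of ones and a covariate) and response "
MATRIX [ROWS=3; COLUMNS=2; VALUES=1,1, 1,2, 1,3] X
MATRIX [ROWS=3; COLUMNS=1; VALUES=2.1,3.9,6.2] Y
" LTPRODUCT transposes its left-hand argument, so these form X'X and X'Y "
CALCULATE XtX = LTPRODUCT(X; X)
CALCULATE XtY = LTPRODUCT(X; Y)
" Matrix inverse followed by the matrix-product operator *+ "
CALCULATE Beta = INVERSE(XtX) *+ XtY
PRINT Beta
```

In practice a regression would normally be fitted with the regression directives; the calculation above is only meant to show how the matrix functions nest inside an expression.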

 

String functions

 

 

These functions perform operations on strings:

 

CHARACTERS Length of each line of a text

GETFIRST Position of first non-space character in each string

GETLAST Position of last non-space character in each string

GETPOSITION Position of a string in a text

 

Table functions

 

 

These functions form marginal summaries of tables:

 

TSUMS Arithmetic sums

TTOTALS Synonym for TSUMS

TMEANS Averages

TMEDIANS Median values

TMINIMA Minimum values

TMAXIMA Maximum values

TVARIANCES Variances

TNOBSERVATIONS Numbers of observations

TNVALUES Numbers of values

TNMV Numbers of missing values

 

Subset functions

 

 

Subsets of elements can be specified with the RESTRICT directive; alternatively, subsets can be copied to new structures using the EQUATE directive. These functions deal with subsets of values.

 

EXPAND Forms a logical variate indicating selected units from a variate of unit numbers

POSITION Finds the positions of values within any vector

RESTRICTION Forms a logical variate indicating currently restricted units of a vector

 

In expressions, subsets of structure values can be referred to by qualified identifiers, or by the function:

 

ELEMENTS Selects values of any structure

 

Random functions

 

 

Random numbers can be generated with the procedure GRANDOM or with the function:

 

URAND Generates uniform pseudo-random numbers in the range (0, 1)

 

Random numbers from alternative probability distributions can be generated by the use of URAND in conjunction with the probability functions, e.g.

CALCULATE Z = EDNORMAL(URAND(0;100))

will give 100 random numbers from a standard Normal distribution in Z.
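
The same inverse-transform idea extends to other distributions. The sketch below uses hypothetical identifiers (U, E and C) and assumes that the second argument of the equivalent-deviate function for chi-square is the degrees of freedom:

```genstat
" 100 uniform random numbers on (0, 1) "
CALCULATE U = URAND(0; 100)
" Exponential with mean 1, using the inverse of its cumulative distribution "
CALCULATE E = -LOG(URAND(0; 100))
" Chi-square on 3 degrees of freedom, via the equivalent deviate "
CALCULATE C = EDCHISQUARE(URAND(0; 100); 3)
```

Each call to URAND here draws a fresh set of values, so U, E and C are generated independently of one another.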

 

Treatment functions

 

 

Contrasts can be specified in the TREATMENTSTRUCTURE directive with the following functions:

 

COMPARISON Comparisons amongst the levels of a factor

POL Orthogonal polynomial contrasts of factor levels

POLND As POL, assigning deviations to error

REG Contrasts specified by a matrix of coefficients

REGND As REG, assigning deviations to error

 

Regression functions

 

 

Contrasts can be specified in regression models with the following functions:

 

POL Polynomial contrasts of factor levels or of variate values

REG Contrasts specified by a matrix of coefficients for factors, or by a transposed data matrix for variates (also orthogonal polynomials)

 

Smoothing of explanatory variates in a linear or generalized linear model can be specified by the function:

 

SSPLINE Smoothing spline effect of a variate (synonym S)

 

Constant functions

 

 

CONSTANTS (g) (or C(g)) Provides the value of various constants, according to the contents of g: e (for a string of 'e' or 'E'), pi ('pi' or 'PI'), or missing value ('*').
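
For instance, a short sketch using the hypothetical scalars Radius, Area and E1:

```genstat
" Area of a circle of radius 2.5, using the constant pi "
SCALAR Radius; VALUE=2.5
CALCULATE Area = CONSTANTS('pi') * Radius**2
" The short form C() retrieves the base of natural logarithms "
CALCULATE E1 = C('e')
PRINT Area, E1
```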