HANOVA procedure 

Does hierarchical analysis of variance/covariance for unbalanced data

(P.W. Lane)

 

Options

PRINT = string Which analyses to print (all, some, none); default all

INCHANNEL = scalar Channel from which to read data; default * specifies that the data values are already stored in the factors and variates specified by the parameters of HANOVA

FORMAT = variate Format for reading data; default * requests free format

ANALYSIS = symmetric matrix For PRINT=some, this indicates which analyses to print

SSPM = SSPM Stores the corrected sums of squares and products; default *

COEFFICIENT = matrix Stores the estimated variance and co-variance components; default *

 

Parameters

VARIATES = pointers Variates to be analysed

FACTORS = pointers Factors defining the hierarchy, the first factor of the pointer defining the first stratum, and so on

 

Description

Procedure HANOVA performs hierarchical analysis of variance and covariance, estimating the components of variance corresponding to each level of a nested classification. It is designed for unbalanced classifications; balanced data are analysed more efficiently by the ANOVA directive.

Data are said to be classified hierarchically if the units have several groupings successively nested within each other. One way of representing such a classification is to identify the groupings in each stratum of the hierarchy by a single factor; two units with the same value for one of the factors are then required to have identical values for the factors representing the previous strata. An alternative method is to use not only the factor for the current stratum, but also the factors for previous strata, to indicate the groupings that occur there. For example, the following classifications are effectively equivalent:

 

                   (1)                          (2)
Unit    Factor 1     Factor 2        Factor 1     Factor 2
        (stratum 1)  (stratum 2)     (stratum 1)  (stratum 2)

 1         1            1               1            1
 2         1            1               1            1
 3         1            2               1            2
 4         2            3               2            1
 5         2            4               2            2

 

Thus, in the second form of representation, the second factor indicates the sub-divisions within each group in the first stratum, using the same levels each time. This more efficient method is the one required by HANOVA.
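The conversion from the first representation to the second can be sketched in a few lines. The Python below is an illustration only and is not part of HANOVA; the function name relabel_within_parent is hypothetical, and it assumes that levels should be renumbered from 1 in order of first appearance within each group of the first stratum.

```python
def relabel_within_parent(parent, child):
    """Convert representation (1) to representation (2).

    `parent` and `child` are equal-length lists of factor levels, where
    `child` uses a distinct level for every subgroup in the whole data set.
    The result renumbers the child levels from 1 within each parent group,
    in order of first appearance (an assumption made for this sketch).
    """
    seen = {}  # parent level -> {original child level: new level}
    relabelled = []
    for p, c in zip(parent, child):
        levels = seen.setdefault(p, {})
        if c not in levels:
            levels[c] = len(levels) + 1
        relabelled.append(levels[c])
    return relabelled


# The five units from the table above, in representation (1).
factor1 = [1, 1, 1, 2, 2]
factor2 = [1, 1, 2, 3, 4]
print(relabel_within_parent(factor1, factor2))  # [1, 1, 2, 1, 2]
```

The result reproduces the Factor 2 column of representation (2).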

The simplest way to use HANOVA is to set the VARIATES parameter to a single variate (or to a pointer if several variates are to be analysed), and set the FACTORS parameter to a pointer of factors. The factors must be in the order of the hierarchy, with the first factor defining the coarsest grouping of the units and each succeeding factor nested within those before it. The units of data stored in the variates and factors can be in any order.

Since hierarchical data can often be extensive, HANOVA can be requested to read the data sequentially, tabulating them with respect to the factors, so that the data need not all be held in memory at the same time. The INCHANNEL option defines the channel number of the file from which the data are to be read; if INCHANNEL is not set, the data are assumed to be present already, in the variates and factors supplied by the VARIATES and FACTORS parameters. The FORMAT option allows a variate to be specified for use in the FORMAT option of the READ directive within the procedure; if this is not set, the default (free) format of READ is assumed.

If a unit has a missing value for any of the variates or factors, it is omitted from all the analyses. The procedure carries out analyses of variance for specified variates, and of covariance for specified pairs of variates. Variance components are calculated for each stratum: that is, the proportion of the total variance per individual ascribable to the various strata of the classification.

Output is controlled by the PRINT option: by default, the matrix of coefficients of variance components is printed, followed by an analysis of variance of each variate and of covariance of each pair of variates. To obtain only some of the analyses, option PRINT should be set to some, and the ANALYSIS option to a symmetric matrix with numbers of rows and columns equal to the number of variates. A non-zero value in the matrix indicates that the corresponding analysis of variance or covariance is to be displayed. Printed output can be suppressed by setting PRINT=none.

The matrix of coefficients can be saved using the COEFFICIENT option, and the sums of squares and products of the variates using the SSPM option.

 

Options: PRINT, INCHANNEL, FORMAT, ANALYSIS, COEFFICIENT, SSPM.

Parameters: VARIATES, FACTORS.

 

Method

HANOVA uses the method described by Gower (1962).

 

Action with RESTRICT

Account is taken of restriction on any factor, or on the first variate in the VARIATES parameter: subsequent variates must either have the same restriction, or be unrestricted.

 

Reference

Gower, J.C. (1962). Variance component estimation for unbalanced hierarchical classifications, Biometrics 18, 537-542.

 

HEATUNITS procedure

Calculates accumulated heat units of a temperature dependent process

(R.J. Reader, R.A. Sutherland & K. Phelps)

 

Options

METHOD = string Temperature/time relationship to be used (sawtooth, cosine, linsine, expsine); default sawt

LATITUDE = scalar Latitude at which temperatures were measured; default 52.205 N {Wellesbourne, U.K.}

RATE = variate Value of rate relationship at cardinal temperatures

TEMPERATURE = variate Cardinal temperatures

PARAMETERS = variate Parameters a, b, c (a, c in hours) for the expsine method

 

Parameters

MINTEMPERATURE = variates Minimum temperature on each day

MAXTEMPERATURE = variates Maximum temperature on each day

FIRSTDAY = scalars Day of year of first temperature recorded

HEATUNITS = variates Development on each day

 

Description

HEATUNITS calculates heat units accumulated each day by a process whose rate depends on temperature. The temperature is assumed to vary diurnally. The rate function is defined as a linear spline so that any relationship can be approximated by specifying a set of cardinal temperatures and corresponding rates.

The METHOD option specifies the form of the diurnal temperature variation; this is derived from consecutive daily maximum and minimum temperatures according to methods compared by Reicosky et al. (1989). The LATITUDE option should be set to the latitude (degrees) at which the maxima and minima were recorded (positive for the northern hemisphere and negative for the southern hemisphere). The RATE and TEMPERATURE options define the rate/temperature relationship. They specify variates of equal length, RATE containing the rate of the process at the temperature of the corresponding unit of TEMPERATURE. The PARAMETERS option is a variate containing the values of the parameters a, b and c of the METHOD expsine.

The MAXTEMPERATURE and MINTEMPERATURE parameters contain the maximum and minimum temperatures on each day, respectively. The FIRSTDAY parameter specifies the day of the year of the first unit of the MAXTEMPERATURE and MINTEMPERATURE variates. The HEATUNITS parameter returns the heat units accumulated on each day.

 

Options: METHOD, LATITUDE, RATE, TEMPERATURE, PARAMETERS.

Parameters: MINTEMPERATURE, MAXTEMPERATURE, FIRSTDAY, HEATUNITS.

 

Method

The integral of each segment of the rate/temperature relationship on each day is evaluated. These integrals are then added together. Further details are given by Reader & Phelps (1991).
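The segment-by-segment integration can be illustrated for the simplest case. The Python below is a sketch and not the procedure's own code. It assumes that temperature changes linearly with time between two readings (as in a sawtooth pattern), that the rate is held constant outside the range of the cardinal temperatures, and that rates are expressed per hour. The function names rate_at and ramp_heat_units are hypothetical. For a linear ramp, the time integral of the rate equals the duration multiplied by the average of the rate over the temperature range, so the integral reduces to a sum of trapezium areas, one for each segment of the linear spline.

```python
def rate_at(temp, cardinal_temps, rates):
    """Rate at `temp` from the linear spline through the cardinal points.

    `cardinal_temps` must be strictly increasing. Outside their range the
    rate is held at the nearest end value (an assumption of this sketch).
    """
    if temp <= cardinal_temps[0]:
        return rates[0]
    if temp >= cardinal_temps[-1]:
        return rates[-1]
    for i in range(1, len(cardinal_temps)):
        t0, t1 = cardinal_temps[i - 1], cardinal_temps[i]
        if temp <= t1:
            return rates[i - 1] + (rates[i] - rates[i - 1]) * (temp - t0) / (t1 - t0)


def ramp_heat_units(temp_start, temp_end, hours, cardinal_temps, rates):
    """Heat units accumulated while temperature moves linearly over `hours`."""
    if temp_start == temp_end:
        return hours * rate_at(temp_start, cardinal_temps, rates)
    lo, hi = sorted((temp_start, temp_end))
    # Split the temperature range at every cardinal temperature inside it.
    knots = [lo] + [c for c in cardinal_temps if lo < c < hi] + [hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        ra = rate_at(a, cardinal_temps, rates)
        rb = rate_at(b, cardinal_temps, rates)
        area += (b - a) * (ra + rb) / 2.0  # trapezium for one spline segment
    return hours * area / (hi - lo)


# Hypothetical rate relationship: zero at 5, maximal at 25, zero again at 35.
temps, rates = [5.0, 25.0, 35.0], [0.0, 1.0, 0.0]
print(ramp_heat_units(15.0, 35.0, 12.0, temps, rates))  # 7.5
```

A full day under the sawtooth assumption would be the sum of two such ramps, one rising to the maximum and one falling to the next minimum; the other METHOD settings need curved temperature profiles that this sketch does not cover.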

 

Action with RESTRICT

None of the options or parameters of this procedure should be restricted as the maximum and minimum temperatures must be from consecutive days. Also they should not contain missing values, except for the first minimum and final maximum which are not used.

 

References

Reicosky, D.C., Winkelman, L.J., Baker, J.M. & Baker, D.G. (1989). Accuracy of hourly air temperatures calculated from daily minima and maxima. Agricultural and Forest Meteorology, 46, 193-209.

Reader, R.J. & Phelps, K. (1991). Modelling the development of temperature-dependent processes. Genstat Newsletter, 28, 27-32.

 

IFUNCTION procedure

Estimates implicit and/or explicit functions of parameters

(W.M. Patefield)

 

Options

PRINT = string What to print (estimates, correlations, monitoring); default esti

NOMESSAGE = string Which warning messages to suppress (parameter, convergence); default *

NPARAMETER = scalar Number of parameters; default zero

MAXCYCLE = scalar Maximum number of iterations; default 20

STRINGENCY = scalar Stringency of tests for convergence (0, 1, 2, ...); default 5

EXITCONTROL = string Control for exit on fault detection (job, procedure); default job for batch jobs, proc for interactive

ZCALCULATION = expressions Specify the calculation of ZERO and DZBIMPLICIT

DZPCALCULATION = expressions Specify the calculation of DZBPARAMETER

ECALCULATION = expressions Specify the calculation of EXPLICIT, DEBPARAMETER and DEBIMPLICIT

 

Parameters

IMPLICIT = variate or pointer to scalars Implicit functions

INITIAL = variate Initial values for IMPLICIT functions

LOWER = variate Lower bounds to IMPLICIT functions; default $-10^{10}$

UPPER = variate Upper bounds to IMPLICIT functions; default $+10^{10}$

VCOVARIANCE = symmetric matrix Variance-covariance matrix of parameter estimates

ZERO = variate Equations defining implicit functions (values calculated by ZCALCULATION)

DZBIMPLICIT = matrix First derivatives of equations ZERO with respect to implicit functions IMPLICIT (values calculated by ZCALCULATION); rows correspond to ZERO, columns correspond to IMPLICIT

DZBPARAMETER = matrix First derivatives of equations ZERO with respect to parameters (must not be set for NPARAMETER=0; values calculated by DZPCALCULATION); rows correspond to ZERO, columns to parameters

DIBPARAMETER = matrix First derivatives of IMPLICIT functions with respect to parameters (must not be set for NPARAMETER=0); rows correspond to IMPLICIT, columns correspond to parameters

EXPLICIT = variate or pointer to scalars Explicit functions of parameters and/or implicit functions (values calculated by ECALCULATION)

DEBPARAMETER = matrix First partial derivatives of EXPLICIT functions with respect to parameters (values calculated by ECALCULATION); rows correspond to EXPLICIT, columns correspond to parameters

DEBIMPLICIT = matrix First partial derivatives of EXPLICIT functions with respect to IMPLICIT functions (values calculated by ECALCULATION); rows correspond to EXPLICIT, columns correspond to IMPLICIT

DFBPARAMETER = matrix First derivatives of ESTIMATES with respect to parameters; rows correspond to ESTIMATES, columns correspond to parameters

ESTIMATES = variate Estimates of IMPLICIT and EXPLICIT functions

SE = variate Standard errors of ESTIMATES

CORRELATIONS = symmetric matrix Correlation matrix of ESTIMATES

FCOVARIANCE = symmetric matrix Variance-covariance matrix of ESTIMATES

 

Description

IFUNCTION solves implicit equations of functions of parameters. The equations are specified by the variate ZERO, the ith element defining the ith equation in terms of the IMPLICIT functions. The parameters ZERO and IMPLICIT must be of the same length (n), IMPLICIT being either a variate or a pointer to n scalars. The option ZCALCULATION supplies expressions for the calculation of both ZERO and the n by n matrix DZBIMPLICIT of first derivatives of ZERO with respect to the IMPLICIT functions. The element in the ith row and jth column of DZBIMPLICIT is the (partial) derivative of the ith element of ZERO with respect to the jth element of IMPLICIT. DZBIMPLICIT is initialized to zero and hence only non-zero elements need be calculated by ZCALCULATION.

The values of the IMPLICIT functions satisfying ZERO = 0 are obtained iteratively. Initial values may be given as a variate in the parameter INITIAL; if INITIAL is not set, any current values of IMPLICIT are used as initial values. Output is controlled by the PRINT option. The option NOMESSAGE allows warning messages to be suppressed. The option MAXCYCLE and the parameters LOWER and UPPER have effects similar to those of the corresponding settings of the RCYCLE directive. The option STRINGENCY controls the stringency with which tests for convergence are applied, higher values being more stringent. The option EXITCONTROL controls the action on fault detection.

IFUNCTION may be used to solve n simultaneous nonlinear equations in n unknowns (the IMPLICIT functions) by not setting the NPARAMETER option (or setting it to zero). More generally, the variate ZERO is a function of both the IMPLICIT functions and NPARAMETER parameter estimates from a model previously fitted using FIT, FITCURVE or FITNONLINEAR. The DZPCALCULATION option supplies expressions for calculation of the n by NPARAMETER matrix DZBPARAMETER of (partial) derivatives of ZERO with respect to the model parameters (only non-zero elements need be calculated).

In addition (or instead) m explicit functions of the model parameters and/or the IMPLICIT functions may be specified by the parameter EXPLICIT, a variate of length m or a pointer to m scalars. The (partial) derivatives of the EXPLICIT functions with respect to the model parameters are given by the m by NPARAMETER matrix DEBPARAMETER and the (partial) derivatives with respect to the IMPLICIT functions by the m by n matrix DEBIMPLICIT. If either of these matrices is not set, then it is taken to be zero (i.e. the EXPLICIT functions do not depend on the model parameters or the IMPLICIT functions respectively). Expressions for calculating EXPLICIT, DEBPARAMETER and DEBIMPLICIT are supplied by the option ECALCULATION, the two matrices being initialized to zero and hence only their non-zero elements need be calculated. For EXPLICIT functions dependent on model parameters only (i.e. not on any IMPLICIT functions), ECALCULATION need not be set, in which case their values must be supplied by EXPLICIT and their (partial) derivatives with respect to model parameters by DEBPARAMETER on entry to IFUNCTION.

The parameters ZERO, DZBIMPLICIT, DZBPARAMETER, DEBPARAMETER and DEBIMPLICIT entering into the calculations ZCALCULATION, DZPCALCULATION and ECALCULATION need not be declared before using IFUNCTION. If they are declared they must have the correct attributes. The only exception to this is when derivatives of the EXPLICIT functions are supplied directly in the matrix DEBPARAMETER rather than obtained by calculations using ECALCULATION.

It is essential that the expressions for calculating DZBIMPLICIT are formulated correctly. If they are not, faults such as divergence of the optimization algorithm or estimates becoming out of bounds may be detected and reported. Fault CA16 may also be caused by incorrectly calculating DZBIMPLICIT as a singular matrix.

The variance-covariance matrix of the fitted parameters is supplied by the parameter VCOVARIANCE containing the variance-covariance matrix from a previous FIT, FITCURVE or FITNONLINEAR.

Estimates of all n+m functions (n IMPLICIT and m EXPLICIT functions of parameters) are saved by the parameter ESTIMATES. Their derivatives with respect to the model parameters are saved by the parameter DFBPARAMETER. Their variance-covariance matrix is saved by the parameter FCOVARIANCE. The standard errors of, and correlations between, the ESTIMATES are saved by the parameters SE and CORRELATIONS.

 

Options: PRINT, NOMESSAGE, NPARAMETER, MAXCYCLE, STRINGENCY, EXITCONTROL, ZCALCULATION, DZPCALCULATION, ECALCULATION.

Parameters: IMPLICIT, INITIAL, LOWER, UPPER, VCOVARIANCE, ZERO, DZBIMPLICIT, DZBPARAMETER, DIBPARAMETER, EXPLICIT, DEBPARAMETER, DEBIMPLICIT, DFBPARAMETER, ESTIMATES, SE, CORRELATIONS, FCOVARIANCE.

 

Method

The implicit functions are calculated by solving the simultaneous equations ZERO = 0 iteratively using Newton-Raphson. It is assumed that a solution exists and that the initial values are sufficiently close to a solution for the optimization to converge. Poor initial values can lead to divergence. A warning message is given when divergence is detected. Reasonable initial values may be obtained by using FITNONLINEAR to minimize the function k × MAX( ABS(ZERO) ), with k equal to a large number such as $10^6$.

A maximum of three convergence criteria may be employed. They are:

(i) the Increment criterion defined as MAX( ABS(Inc) / MAX( ABS(IMPLICIT), 1 ) ), where Inc is the variate of implicit function increments in the iterative process,

(ii) the Zero criterion defined as MAX( ABS(ZERO) / Scaling-variate ) where the Scaling-variate is the greater of the maximum value of ZERO over all cycles of the iterative process and 0.0001, and

(iii) the Gradient criterion defined as ABS( T(Inc) *+ DZBIMPLICIT *+ Inc ).

The values of criterion (ii) may be highly dependent on the initial parameter values and criterion (iii) is of use primarily when the equations ZERO = 0 are derivatives of a scalar function and DZBIMPLICIT is the matrix of second derivatives of the function.

Convergence is completed when criterion (i) cannot be further reduced. However the iterative process continues searching for lower values until other criteria cannot be further reduced. The criteria involved are determined by the STRINGENCY option. For STRINGENCY = 0 or 1 only criterion (i) is used. For STRINGENCY = 2 or 3 criterion (ii) is also used. STRINGENCY = 1 or 3 requires convergence at two successive iterations. For STRINGENCY = 4 or 5 all criteria are used, STRINGENCY = 5 requiring convergence of both criteria (i) and (ii) at two successive iterations. Higher values of STRINGENCY require convergence of all three criteria at increasing numbers of successive iterations.

The default STRINGENCY value of 5 is recommended at least until the expressions for calculations are validated. Low values may give convergence at incorrect values of the implicit functions, particularly with poor INITIAL values when the equations ZERO are not approximately linear. High values will often result in an unnecessarily large number of iterations.

IFUNCTION calculates the matrix DIBPARAMETER of derivatives of the implicit functions with respect to the model parameters (Marsden, 1984, page 211). The matrices DEBPARAMETER and DEBIMPLICIT of partial derivatives of any explicit functions with respect to the model parameters and the implicit functions respectively are evaluated using expressions supplied in ECALCULATION. By the chain rule, the derivatives of the explicit functions with respect to the parameters are given by

DEBPARAMETER + ( DEBIMPLICIT *+ DIBPARAMETER ).

This matrix is appended to DIBPARAMETER to form the n+m by NPARAMETER matrix DFBPARAMETER of derivatives of the length n+m variate

ESTIMATES = !( #IMPLICIT, #EXPLICIT )

with respect to the model parameters.

The variance-covariance matrix of model parameters resulting from a previous FIT, FITCURVE or FITNONLINEAR is supplied by the parameter VCOVARIANCE, and the variance-covariance matrix of the ESTIMATES of both the implicit and explicit functions is computed as

FCOVARIANCE = QPRODUCT(DFBPARAMETER; VCOVARIANCE).
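The whole calculation can be followed in a small case with one implicit function and two model parameters. The Python below is an illustrative sketch, not the code of IFUNCTION. It assumes that DIBPARAMETER is obtained from the implicit function theorem as minus the inverse of DZBIMPLICIT multiplied by DZBPARAMETER, and that QPRODUCT(D; V) forms D V D'. The function names, the fitted line and the numerical values are all hypothetical.

```python
def solve_implicit(z, dz_dx, x0, max_cycle=20, tol=1e-10):
    """Newton-Raphson for a single implicit function x satisfying z(x) = 0."""
    x = x0
    for _ in range(max_cycle):
        inc = -z(x) / dz_dx(x)
        x += inc
        # Increment criterion, as in criterion (i) of the Method section.
        if abs(inc) / max(abs(x), 1.0) < tol:
            return x
    raise RuntimeError("no convergence within max_cycle iterations")


def implicit_variance(dz_dx, dz_db, vcov):
    """Derivatives of x with respect to the parameters, and the variance of x.

    dz_dx is the scalar derivative of z with respect to x (DZBIMPLICIT),
    dz_db the list of derivatives with respect to the parameters
    (DZBPARAMETER), and vcov the parameter variance-covariance matrix.
    """
    dx_db = [-d / dz_dx for d in dz_db]  # implicit function theorem
    n = len(dx_db)
    variance = sum(dx_db[i] * vcov[i][j] * dx_db[j]
                   for i in range(n) for j in range(n))
    return dx_db, variance


# Hypothetical fitted line y = b1 + b2*x; the implicit function is the x
# at which the line reaches y0, i.e. the solution of b1 + b2*x - y0 = 0.
b1, b2, y0 = 2.0, 0.5, 5.0
vcov = [[0.04, -0.005], [-0.005, 0.0025]]

x_hat = solve_implicit(lambda x: b1 + b2 * x - y0, lambda x: b2, x0=0.0)
dx_db, var_x = implicit_variance(b2, [1.0, x_hat], vcov)
print(x_hat, dx_db, var_x ** 0.5)
```

This example has the explicit solution x = (y0 - b1)/b2, whose derivatives with respect to b1 and b2 agree with the values obtained from the implicit calculation, which provides a check on the sketch.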

 

Action with RESTRICT

None of the parameters of IFUNCTION may be restricted.

 

Reference

Marsden, J.E. (1984). Elementary Classical Analysis. W.H. Freeman and Company, San Francisco.

 

INSIDE procedure

Determines whether points lie within a specified polygon

(S.A. Harding)

 

Option

TOLERANCE = scalar Value used for testing against zero; default $10^{-4}$

 

Parameters

Y = variates Y coordinates of points

X = variates X coordinates of points

YPOLYGON = variates Y coordinates of polygon

XPOLYGON = variates X coordinates of polygon

INSIDE = variates Indicate whether points are inside (1) the polygon, outside (-1) or on an edge (0)

 

Description

INSIDE takes a set of points whose x and y coordinates are specified by the X and Y parameters and determines which of these lie inside the polygon whose vertices are specified by the XPOLYGON and YPOLYGON parameters. This procedure is primarily intended for use with high-resolution graphics. It allows subsets of plotted points to be identified according to their spatial relationships so that they can be redrawn or deleted.

The output is in the form of a variate, specified by the INSIDE parameter. This will contain the value 1 for points that are located inside the polygon, 0 for those on an edge, and -1 for those outside the polygon. It can thus be used in RESTRICT, for example, to identify subsets of the values.

Usually the polygon will be defined by several points. Closure is assumed, so the last point need not be the same as the first. The polygon need not be convex. If only two points are given these are interpreted as diagonally opposite corners of a rectangle (thus maintaining compatibility with the "rubber-rectangle" type of input cursor of DREAD).

 

Options: TOLERANCE.

Parameters: Y, X, YPOLYGON, XPOLYGON, INSIDE.

 

Method

The method used is essentially that of Shimrat (1962). The algorithm counts the number of edges for which a point lies within the y-range and to the left. If this is an odd number the point must lie within the polygon. A separate check is made for points that lie on the boundary.
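The edge-counting rule can be sketched as follows. This Python illustration is not the procedure's own code: the function names are hypothetical, and the boundary check (distance from the point to an edge no greater than the tolerance) is an assumption about how TOLERANCE might be applied. The two-point case is expanded into a rectangle, as described above.

```python
import math


def _distance_to_segment(x, y, xa, ya, xb, yb):
    """Shortest distance from the point (x, y) to the segment a-b."""
    dx, dy = xb - xa, yb - ya
    length2 = dx * dx + dy * dy
    if length2 == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((x - xa) * dx + (y - ya) * dy) / length2))
    return math.hypot(x - (xa + t * dx), y - (ya + t * dy))


def inside(x, y, xpolygon, ypolygon, tolerance=1e-4):
    """Return 1 if (x, y) is inside the polygon, 0 if on an edge, -1 if outside."""
    if len(xpolygon) == 2:
        # Two points are taken as opposite corners of a rectangle.
        (x0, x1), (y0, y1) = xpolygon, ypolygon
        xpolygon, ypolygon = [x0, x1, x1, x0], [y0, y0, y1, y1]
    n = len(xpolygon)
    crossings = 0
    for i in range(n):
        xa, ya = xpolygon[i], ypolygon[i]
        xb, yb = xpolygon[(i + 1) % n], ypolygon[(i + 1) % n]  # closure assumed
        if _distance_to_segment(x, y, xa, ya, xb, yb) <= tolerance:
            return 0
        # Count the edge if y is within its y-range and the point is to its left.
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x_cross > x:
                crossings += 1
    return 1 if crossings % 2 == 1 else -1


# A non-convex (L-shaped) polygon.
xp = [0, 4, 4, 1, 1, 0]
yp = [0, 0, 1, 1, 4, 4]
print([inside(px, py, xp, yp) for px, py in [(0.5, 3), (2, 2), (1, 2)]])  # [1, -1, 0]
```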

 

Action with RESTRICT

If either Y or X variate is restricted, only the restricted set of points is checked for inclusion in the polygon. Any points omitted by a restriction will be identified as lying outside the polygon. Restrictions are removed from YPOLYGON and XPOLYGON.

 

Reference

Shimrat, M. (1962). Position of point relative to polygon, CACM Algorithm 112, Comm. ACM, Aug. 1962.

 

INVNORMAL procedure

Calculates probabilities from the inverse normal distribution

(A. Keen)

 

Options

PRINT = string Printed output required (cumprobability); default * i.e. no printing

MU = identifier Mean of the inverse normal distribution; no default

SIGMA = identifier Standard deviation of the inverse normal distribution; no default

 

Parameters

X = identifiers The constant(s) x for which probabilities are required; no default

CUMPROBABILITY = identifiers To save the cumulative probabilities

 

Description

INVNORMAL calculates the probability that a random variable with the inverse normal distribution is less than a constant x.

The x-values are specified using the parameter X. The mean and standard deviation of the inverse normal distribution must be specified by options MU and SIGMA respectively. Options MU and SIGMA and parameter X can be set to any numerical structure. Non-scalar structures must contain the same number of values.

Output is controlled by the PRINT option; the setting cumprobability prints the cumulative probabilities Pr(X ≤ x). These probabilities can be saved by setting the CUMPROBABILITY parameter. The default type of CUMPROBABILITY is the same as that of X unless X is scalar and MU or SIGMA is non-scalar, in which case the type will be that of MU or SIGMA.

 

Options: PRINT, MU, SIGMA.

Parameters: X, CUMPROBABILITY.

 

Method

The probabilities are calculated according to the method given in Johnson & Kotz (1970), formula 16, page 141, using the function NORMAL. If the shape parameter of the inverse normal distribution is large, then the second part of the formula is approximated by the expression given in Section 26.2.12 of Abramowitz & Stegun (1972), taking an appropriate number of terms in the series expansion.

Usually the inverse normal distribution is characterized by the parameters $\mu$ and $\gamma$ (see, for example, Johnson & Kotz 1970). The relation between $\gamma$ and $\sigma$ is $\sigma^2 = \mu^3 / \gamma$.
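The calculation can be sketched with the standard closed form of the inverse normal (inverse Gaussian) distribution function, which is assumed here to correspond to the formula cited above. The Python below is an illustration only: the function names are hypothetical, the shape parameter is derived from the mean and standard deviation using the relation just given, and the series approximation for large shape parameters is not implemented, so the exponential term can overflow in that case.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def invnormal_cdf(x, mu, sigma):
    """Pr(X <= x) for an inverse normal distribution with mean mu and s.d. sigma."""
    if x <= 0:
        return 0.0
    shape = mu ** 3 / sigma ** 2  # shape parameter from sigma^2 = mu^3 / shape
    root = math.sqrt(shape / x)
    first = normal_cdf(root * (x / mu - 1.0))
    # Second part: may overflow for large shape/mu (no approximation here).
    second = math.exp(2.0 * shape / mu) * normal_cdf(-root * (x / mu + 1.0))
    return first + second


print(round(invnormal_cdf(1.0, 1.0, 1.0), 4))  # 0.6681
```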

 

Action with RESTRICT

Restrictions are not allowed.

 

References

Johnson, N.L. & Kotz, S.K. (1970). Continuous Univariate Distributions - 1. Houghton Mifflin Company: Boston.

Abramowitz, M. & Stegun, I.A. (1972). Handbook of Mathematical Functions. Dover Publications: New York.

 

JACKKNIFE procedure

Produces Jackknife estimates and standard errors

(R.W. Payne)

 

Options

PRINT = string Controls printed output (estimates, vcovariance); default esti

DATA = variates, factors or texts Data vectors from which the statistics are to be calculated

ANCILLARY = any type Other relevant information needed to calculate the statistics

VCOVARIANCE = symmetric matrix Saves the variance-covariance matrix for the statistics

 

Parameters

LABEL = texts Texts, each containing a single line, to label the statistics

ESTIMATE = scalars Saves the Jackknife estimate for each statistic

SE = scalars Saves Jackknife estimates of the standard errors

PSEUDOVALUES = variates Saves the Jackknife pseudo-values

 

Description

The Jackknife provides a way of decreasing bias and obtaining standard errors in situations where the standard methods might be expected to be inappropriate. The basic form of the Jackknife method works by calculating the statistic (or statistics) of interest omitting each data value in turn. Thus, if there are n data values, n "partial estimates" $T_{-1}, \ldots, T_{-n}$ are obtained (where $T_{-j}$ is the estimate omitting value j). These are combined with the estimate T obtained from all the data, to produce n pseudo-values:

$P_j = n T - (n - 1) T_{-j}, \quad j = 1, \ldots, n$

The Jackknife estimate of the statistic is given by the mean of the pseudo-values, and the standard error by the standard error of the mean of the pseudo-values.
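The pseudo-value calculation is short enough to sketch directly. The Python below is an illustration for a single statistic computed from one list of values; it is not the code of JACKKNIFE, and the function names are hypothetical.

```python
import math


def jackknife(data, statistic):
    """Jackknife estimate, standard error and pseudo-values for one statistic."""
    n = len(data)
    full = statistic(data)  # estimate T from all the data
    pseudo = []
    for j in range(n):
        partial = statistic(data[:j] + data[j + 1:])  # estimate omitting value j
        pseudo.append(n * full - (n - 1) * partial)
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n - 1)
    return estimate, math.sqrt(variance / n), pseudo


def plugin_variance(values):
    """Variance with divisor n, which is biased downwards."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
estimate, se, pseudo = jackknife(data, plugin_variance)
print(estimate, se)
```

Applied to the variance with divisor n, the Jackknife estimate equals the usual unbiased variance with divisor n - 1, which shows the bias reduction at work. Applied to the mean, the pseudo-values are simply the data values.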

The Jackknife can be shown to eliminate the term proportional to 1/n from a bias of the form

$T = t + a/n + O(1/n^2)$

where t is the true value of the quantity being estimated and $O(1/n^2)$ is a term of order one divided by the square of the number of observations (Quenouille 1956). However, it is not appropriate in all situations. In particular the statistic needs to be "smooth" (small changes in the data set should cause only small changes in the statistic); it will not work for example with medians or order statistics. Further details and advice are given by Miller (1974), Bissell & Ferguson (1975), Hinkley (1983) and Efron & Tibshirani (1993).
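The removal of the 1/n term can be checked directly. The short derivation below reads the bias expression as a statement about expected values, and writes the second-order term explicitly with a coefficient b; both the expectation notation and b are introduced here only for illustration.

```latex
\begin{align*}
E(T)      &= t + \frac{a}{n} + \frac{b}{n^2} + \cdots, &
E(T_{-j}) &= t + \frac{a}{n-1} + \frac{b}{(n-1)^2} + \cdots \\[4pt]
E(P_j)    &= n\,E(T) - (n-1)\,E(T_{-j}) \\
          &= \bigl(nt + a + \tfrac{b}{n}\bigr) - \bigl((n-1)t + a + \tfrac{b}{n-1}\bigr) + \cdots \\
          &= t - \frac{b}{n(n-1)} + \cdots
\end{align*}
```

The terms in a cancel exactly, so the mean of the pseudo-values has a bias of order one divided by the square of n.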

The data for JACKKNIFE are provided as a list of vectors (variates, factors or texts) using the DATA option. From this, new vectors are formed omitting each unit of the original vectors in turn, and a subsidiary procedure RESAMPLE is called to calculate the statistics. Other relevant information can be provided for passing to RESAMPLE, in any type of data structure, using the ANCILLARY option. To use JACKKNIFE, you need to provide a version of RESAMPLE to calculate the particular statistics that you require. The default RESAMPLE procedure, which accompanies JACKKNIFE in the library, merely prints details of the syntax (also described in the Method section).

A label should be provided for each statistic, using the LABEL parameter; by default, there is assumed to be a single statistic labelled simply as Statistic. The estimates, their standard errors and variates of corresponding pseudo-values for each statistic can be saved by the ESTIMATE, SE and PSEUDOVALUES parameters, respectively. Also, if there is more than one statistic, a variance-covariance matrix can be saved for the estimates using the VCOVARIANCE option.

Printed output is controlled by the PRINT option, with settings estimates for the estimates and their standard errors, and vcovariance for the variance-covariance matrix; by default PRINT=estimates.

 

Options: PRINT, DATA, ANCILLARY, VCOVARIANCE.

Parameters: LABEL, ESTIMATE, SE, PSEUDOVALUES.

 

Method

The original papers describing the Jackknife technique are by Quenouille (1949, 1956) and by Tukey (1958). Good expository accounts are provided by Hinkley (1983) or Bissell & Ferguson (1975).

JACKKNIFE needs a subsidiary procedure RESAMPLE to calculate the statistics of interest. RESAMPLE has an option, DATA, which is used to supply the data vectors (variates, factors or texts) from which the statistics are to be calculated. (On the first occasion that RESAMPLE is called, these will be the original vectors as supplied to JACKKNIFE, in order to calculate the estimate T; subsequently, they will be new vectors containing all except one of the units.) Other relevant information can be supplied through the ANCILLARY option, which corresponds to the ANCILLARY option of JACKKNIFE itself. RESAMPLE can be called by the BOOTSTRAP procedure, and it then also has an AUXILIARY option, but this is not relevant to JACKKNIFE.

There are two parameters: STATISTIC supplies a list of scalars to store the estimates of each statistic, and EXIT a list of scalars which should be set to zero if the corresponding statistic could be estimated successfully with the supplied data vectors, or to one if it could not. If the value of EXIT is not calculated in RESAMPLE, JACKKNIFE assumes that the calculations succeeded. The following example shows a version of RESAMPLE which calculates the correlation between two variates.

PROCEDURE [PARAMETER=pointer] 'RESAMPLE'
OPTION    'DATA',      " (I: variates, factors or texts) data
                         vectors from which to calculate the
                         statistics; no default"\
          'ANCILLARY'; " (I: any type of structure) other relevant
                         information needed to calculate the
                         statistics "\
          MODE=p; TYPE=!t(variate,factor,text),*; SET=yes,no; \
          LIST=yes; DECLARED=yes; PRESENT=yes
PARAMETER 'STATISTIC', " (O: scalars) to save the calculated
                         statistics "\
          'EXIT';      " (O: scalars) to save an exit code to
                         indicate failure (EXIT[i]=1) or success
                         (EXIT[i]=0) when calculating each
                         STATISTIC[i]"\
          MODE=p; TYPE='scalar'; SET=yes

CALCULATE STATISTIC[1] = CORRELATION(DATA[1]; DATA[2])
&         EXIT[1] = STATISTIC[1]==C('missing')

ENDPROCEDURE

Action with RESTRICT

If any of the data vectors is restricted, JACKKNIFE will use only the units that are not restricted for any of the vectors.

 

References

Bissell, A.F. & Ferguson, R.A. (1975). The jackknife - toy, tool or two-edged weapon. The Statistician, 24, 79-100.

Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall, London.

Hinkley, D. (1983). Jackknife methods. In: Encyclopedia of Statistics, Volume 4 (ed: S. Kotz, N.L. Johnson & C.B. Read). Wiley, New York.

Miller, R.G. (1974). The jackknife - a review. Biometrika, 61, 1-15.

Quenouille, M.H. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society B, 11, 18-44.

Quenouille, M.H. (1956). Notes on bias in estimation. Biometrika, 43, 353-360.

 

KAPLANMEIER procedure

Calculates the Kaplan-Meier estimate of the survivor function

(J.T.N.M. Thissen)

 

Options

PRINT = strings Whether to print the estimates or to display the Kaplan-Meier estimate in a graph (estimate, graph); default esti, grap

GRAPHICS = string Type of graphics to use (lineprinter, highresolution); default high

TITLE = text General title for the graph; default *

WINDOW = scalar Window number for the high-resolution graph; default 1

KEYWINDOW = scalar Window number for the key (zero for no key); default 2

SCREEN = string Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep); default clea

PROBABILITY = scalar Probability level of the confidence interval for the Kaplan-Meier estimates; default 0.95

XLOWER = scalar Lower bound for x-axis; default 0

XUPPER = scalar Upper bound for x-axis; default * i.e. a value slightly larger than the maximum of the TIME parameter (or EVENT parameter if TIME is not set) is used

 

Parameters

TIME = variates Observed timepoints

CENSORED = variates Variate specifying whether the corresponding element of TIME is censored (1) or not (0); default is to assume no censoring

GROUPS = factors Factor specifying the different groups for which the survivor function is estimated

EVENT = variates Saves the distinct TIME values when TIME is set; otherwise supplies an input variate specifying the endpoint of each interval

NDEATH = variates Saves the number of deaths at each EVENT when TIME is set; otherwise supplies an input variate specifying the number of deaths in each interval

NATRISK = variates Saves the number of units at risk at each EVENT when TIME is set; otherwise supplies an input variate with the number of units at risk in each interval

ESTIMATE = variates Saves the Kaplan-Meier estimates of the survivor function

NEWGROUPS = factors Saves the grouping of the EVENT, NDEATH, NATRISK and ESTIMATE variates when TIME is set

 

Description

Survival data are data in which the response variate is the lifetime of a component or the survival time of a patient. Typically these are censored, i.e. the survival time of some units is unknown at the end of the study. The survivor function F(t) is a key element in the analysis of survival data. It is defined as the probability of an individual still surviving at time t. KAPLANMEIER calculates the Kaplan-Meier estimate of the survivor function for two different types of data.

The first type of data occurs when all timepoints are accurately observed. The observed timepoints or the timepoints at which censoring took place are then specified using the TIME parameter. The CENSORED variate contains values 0 and 1 to specify whether the corresponding element of TIME is censored (1) or not (0); if there was no censoring, this need not be set. The GROUPS parameter can be used to specify a factor to indicate different groups whose survivor functions are to be estimated separately. The distinct TIME values can be saved using the EVENT parameter, and the number of deaths and the number of units at risk at each individual EVENT can be saved using parameters NDEATH and NATRISK respectively. The Kaplan-Meier estimate can be saved with the ESTIMATE parameter. The NEWGROUPS parameter can save a factor indicating the group structure of the output variates.

The second type of data is relevant when the units are observed at the end of time-intervals. The exact times are then unknown and input should be specified using parameters EVENT, NDEATH and NATRISK. These specify the timepoints, the number of deaths and the number of units at risk at the end of each interval. The GROUPS parameter can again be used to request separate group estimates.

The PRINT option controls output. Setting PRINT=estimate prints the events, number of deaths, number of units at risk and the Kaplan-Meier estimate with a confidence interval. The probability level for the interval can be set using the PROBABILITY option; by default this is 0.95. PRINT=graph plots the Kaplan-Meier estimate against the time points. If GRAPHICS=highresolution, different lines are drawn for different groups, whereas GRAPHICS=lineprinter produces separate graphs for the different groups. Lower and upper bounds for the x-axis can be set by options XLOWER and XUPPER, and the TITLE option can specify a title for the plots. Options WINDOW and KEYWINDOW control the windows used for high-resolution graphs, and the SCREEN option controls whether the screen is cleared before plotting.

 

Options: PRINT, GRAPHICS, TITLE, WINDOW, KEYWINDOW, SCREEN, PROBABILITY, XLOWER, XUPPER.

Parameters: TIME, CENSORED, GROUPS, EVENT, NDEATH, NATRISK, ESTIMATE, NEWGROUPS.

 

Method

When TIME is set, the Kaplan-Meier estimate is calculated according to equation (1.10) in Kalbfleisch & Prentice (1980). When TIME is not set, the Kaplan-Meier estimate is directly calculated from the variates specified by EVENT, NDEATH and NATRISK.
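The calculation for the first type of data can be illustrated with a minimal Python sketch of the standard product-limit formula. This is not the Genstat source: the function name kaplan_meier and its return layout are hypothetical, chosen to mirror the EVENT, NDEATH, NATRISK and ESTIMATE parameters, and a single group with no confidence interval is assumed.

```python
from collections import Counter


def kaplan_meier(times, censored=None):
    """Product-limit estimate for one group (illustrative sketch only).

    times    -- observed timepoints (deaths or censoring times)
    censored -- 1 if the corresponding time is censored, 0 if not;
                None means no censoring, as with the CENSORED default
    Returns (events, n_death, n_at_risk, estimate), one entry per distinct time.
    """
    if censored is None:
        censored = [0] * len(times)
    # Count the uncensored observations (deaths) at each distinct time.
    deaths = Counter(t for t, c in zip(times, censored) if c == 0)

    events, n_death, n_at_risk, estimate = [], [], [], []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)  # units still under observation at t
        d = deaths.get(t, 0)
        survival *= 1.0 - d / at_risk              # product-limit update
        events.append(t)
        n_death.append(d)
        n_at_risk.append(at_risk)
        estimate.append(survival)
    return events, n_death, n_at_risk, estimate


# Hypothetical data: five units, the third censored at time 2.
events, n_death, n_at_risk, estimate = kaplan_meier([1, 2, 2, 3, 4], [0, 0, 1, 0, 0])
for row in zip(events, n_death, n_at_risk, estimate):
    print(row)
```

A censored unit contributes to the number at risk up to and including its censoring time, but never to the number of deaths.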

 

Action with RESTRICT

The input variates and factor GROUPS may be restricted identically. The Kaplan-Meier estimate is based only on the units not excluded by the restriction.

 

Reference

Kalbfleisch, J.D. & Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.

 

KAPPA procedure

Calculates a kappa coefficient of agreement for nominally scaled data

(A.J. Rook)

 

Option

PRINT = string Whether to print kappa and its associated information (test); default test

 

Parameters

DATA = tables Data sets, each consisting of an object × category table whose entries are the number of judges assigning the ith object to the jth category

STATISTIC = scalars Save the value of kappa for each data table

VARIANCE = scalars Save the corresponding variances

 

Description

The kappa coefficient provides a way of assessing the agreement between judges who have rated a set of N objects or subjects using a nominal scale: that is, each judge has allocated each object to one of M different categories. The data for KAPPA, specified by the DATA parameter, consist of an N × M table whose entries indicate the number of judges that have assigned the ith object to the jth category. This must not contain any missing values and all the row totals must be equal.

Kappa takes the value one when there is complete agreement and zero when there is none (except that expected by chance). The printing of the test statistic and its associated information is controlled by the PRINT option. With the default, test, the procedure prints the actual and expected proportion of times that the judges agree, the resulting value of kappa and its variance. When N is large, the sampling distribution of kappa is approximately normal. The procedure thus also prints the value of kappa divided by its standard error (the square root of the variance), and its probability assuming a normal distribution. A warning is printed if N is less than 20.

The STATISTIC and VARIANCE parameters allow kappa and its variance to be saved in scalars.

 

Option: PRINT. Parameters: DATA, STATISTIC, VARIANCE.

 

Method

The method used is that of Siegel & Castellan (1988, pages 284-291).
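As a rough illustration, the actual and expected proportions of agreement and the resulting kappa can be computed from an N × M table as in the Python sketch below. It assumes the usual multi-judge form of the coefficient, with the same number of judges for every object, and does not reproduce the variance calculation; the function name kappa_agreement is hypothetical and this is not the Genstat source.

```python
def kappa_agreement(table):
    """Agreement proportions and kappa for an objects-by-categories table of counts.

    table[i][j] is the number of judges assigning object i to category j.
    Returns (actual, expected, kappa). Illustrative sketch only.
    """
    n_objects = len(table)
    n_judges = sum(table[0])
    if any(sum(row) != n_judges for row in table):
        raise ValueError("all row totals must be equal")
    n_categories = len(table[0])
    total = n_objects * n_judges

    # Proportion of all assignments that fall in each category.
    p = [sum(row[j] for row in table) / total for j in range(n_categories)]
    expected = sum(pj * pj for pj in p)          # agreement expected by chance

    # Proportion of agreeing pairs of judges, averaged over objects.
    sum_sq = sum(nij * nij for row in table for nij in row)
    actual = sum_sq / (n_objects * n_judges * (n_judges - 1)) - 1.0 / (n_judges - 1)

    kappa = (actual - expected) / (1.0 - expected)
    return actual, expected, kappa


# Hypothetical table: 3 objects, 2 categories, 3 judges in complete agreement.
print(kappa_agreement([[3, 0], [0, 3], [3, 0]]))
```

The row-total check mirrors the requirement in the Description that all the row totals must be equal.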

 

Reference

Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioural Sciences, 2nd Edition. McGraw-Hill, Singapore.

 

KOLMOG2 procedure

Performs a Kolmogorov-Smirnoff two-sample test

(S.J. Welham, N.M. Maclaren & H.R. Simpson)

 

Options

PRINT = strings Output required (test, differences, ranks): test gives the test statistic, differences gives signed differences, and ranks produces the ranks for each sample; default test

GROUPS = factor Defines the groups for a two-sample test if only the Y1 parameter is specified

 

Parameters

Y1 = variates Identifier of the variate holding the first sample

Y2 = variates Identifier of the variate holding the second sample

R1 = variates Saves the ranks of the first sample

R2 = variates Saves the ranks of the second sample

STATISTIC = scalars Scalar to save the test statistic (the maximum absolute difference between the cumulative distribution functions)

CHISQUARE = scalars Scalar to save the chi-square approximation to the test statistic

DIFFERENCES = variates Variate to save the signed differences between the cumulative distribution functions

 

Description

The Kolmogorov-Smirnoff test assesses the similarity between the underlying distributions of the two samples by comparing their cumulative distribution functions; the test statistic is the maximum absolute difference between the cumulative distribution functions. The samples can be specified in two separate variates using the parameters Y1 and Y2. Alternatively, they can be given in a single variate, specified by Y1, with the GROUPS option set to a factor to identify the samples. The GROUPS option is ignored when the Y2 parameter is set.

Output from the procedure is controlled by the PRINT option: test prints the relevant test statistic, differences prints the signed differences, and ranks prints a vector of ranks for each of the samples.

The test statistic and its chi-square approximation can be saved using the parameters STATISTIC and CHISQUARE respectively. The parameter DIFFERENCES can be used to save the differences between the cumulative distributions. The R1 and R2 parameters allow the ranks of the samples to be saved.

 

Options: PRINT, GROUPS.

Parameters: Y1, Y2, R1, R2, STATISTIC, CHISQUARE, DIFFERENCES.

 

Method

The Kolmogorov-Smirnoff two sample test is a test of the null hypothesis that the two samples arise from the same distribution, against the alternative that the underlying distributions are different. The test compares the two empirical cumulative distribution functions in order to try and detect differences in shape of the underlying distributions. The cumulative distribution functions S1 and S2 are formed by

$$S_k(X) = \frac{\text{number of scores in sample } k \text{ that are} \le X}{\text{size of sample } k}$$

for k=1,2; and a suitable set of points X. The procedure uses the set of values taken by one or other of the samples, i.e. {X: X is in Y1 or Y2}. The maximum absolute difference

$$MD = \max_X \left| S_1(X) - S_2(X) \right|$$

is used as the basis for significance tests. The chi-square approximation (2 degrees of freedom) to this statistic is CH:

$$CH = 4 \times MD^2 \times \frac{n_1 n_2}{n_1 + n_2}$$

where n1, n2 are the sizes of the samples. (See for example Siegel 1956, pages 127-136.)
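The calculation can be sketched in Python as follows. This is an illustration rather than the Genstat source: the function name ks_two_sample is hypothetical, and the chi-square approximation is taken in the form 4 × MD² × n1 × n2 / (n1 + n2) given by Siegel (1956).

```python
def ks_two_sample(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic (illustrative sketch only).

    Returns (md, ch, differences): the maximum absolute difference between the
    empirical cumulative distribution functions, its chi-square approximation
    on 2 degrees of freedom, and the signed differences S1(X) - S2(X).
    """
    n1, n2 = len(sample1), len(sample2)
    # Evaluate both functions at every value taken by either sample.
    points = sorted(set(sample1) | set(sample2))
    differences = []
    for x in points:
        s1 = sum(1 for v in sample1 if v <= x) / n1
        s2 = sum(1 for v in sample2 if v <= x) / n2
        differences.append(s1 - s2)
    md = max(abs(d) for d in differences)
    ch = 4.0 * md * md * n1 * n2 / (n1 + n2)
    return md, ch, differences


# Hypothetical samples that do not overlap at all.
md, ch, diffs = ks_two_sample([1, 2, 3], [4, 5, 6])
print(md, ch)
```

Evaluating the functions only at the observed values is sufficient, because both are step functions that change only at those points.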

 

Action with RESTRICT

The Y1 and Y2 variates can be restricted, and in different ways. KOLMOG2 uses only those units of each variate that are not excluded by their respective restrictions.

 

Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, New York.

 

KRUSKAL procedure

Carries out a Kruskal-Wallis one-way analysis of variance

(S.J. Welham, N.M. Maclaren & H.R. Simpson)

 

Options

PRINT = strings Output required (test, ranks): test produces the relevant test statistics, ranks produces a vector of ranks for each sample relative to the whole data set; default test

GROUPS = factor Defines the sample membership if only one variate is specified by DATA

STATISTIC = scalar Scalar to save the Kruskal-Wallis test statistic

MEANRANKS = variate Variate to save the mean ranks of the samples

DF = scalar Scalar to save the degrees of freedom for the statistic

 

Parameters

DATA = variates List of variates containing the data for each sample, or a single variate containing the data from all the samples (the GROUPS option must then be set to indicate the sample to which each unit belongs)

RANKS = variates Allow the ranks to be saved (relative to the combined data)

 

Description

KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance on the ranks (relative to the whole data set) of a set of samples. The samples can be stored in different variates and supplied as a list using the DATA parameter. Alternatively, they can all be placed in a single variate, and the GROUPS option set to a factor to indicate the sample to which each unit belongs. Output from the procedure is controlled by the PRINT option: test (the default setting) prints the relevant test statistics, and ranks prints the vector of ranks for each sample.

The test statistic, vector of mean ranks and degrees of freedom can be saved using the STATISTIC, MEANRANKS and DF options, respectively. Parameter RANKS can be set to a variate, or variates, to store the ranks of the data relative to the whole data set.

 

Options: PRINT, GROUPS, STATISTIC, MEANRANKS, DF.

Parameters: DATA, RANKS.

 

Method

The Kruskal-Wallis One-Way Analysis of Variance is used to test the hypothesis that several (K) samples come from distributions with the same mean. The test statistic, H, is formed by ranking the combined data set, then considering the sum of these ranks within each sample:

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{R_j^2}{n_j} - 3(N+1)$$

where Rj is the sum of ranks for the jth sample,

nj is the size of the jth sample, and

N is the size of the combined data set.

If ties are present in the data, then an adjustment to the statistic H is required:

$$H_{\text{adjusted}} = \frac{H}{1 - \sum_k (t_k^3 - t_k) / (N^3 - N)}$$

where tk is the number of observations with rank k. (See for example Siegel 1956, pages 184-193.)

When there are at least five cases in each of the samples, H has approximately a Chi-square distribution on K-1 degrees of freedom. When this condition is not satisfied, and there are three samples, KRUSKAL uses a table of calculated values of the distribution of the statistic.
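A minimal Python sketch of the statistic and its tie adjustment is given below. It is an illustration only, not the Genstat source: it assumes that tied observations receive the average of the ranks they span, and it uses H = 12/(N(N+1)) × Σ Rj²/nj - 3(N+1), with the adjustment dividing H by 1 - Σ(t³ - t)/(N³ - N), where t runs over the sizes of the groups of tied values. The function names average_ranks and kruskal_wallis are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Ranks of values (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(samples):
    """Return (h, adjusted_h, df) for a list of samples (illustrative sketch only)."""
    combined = [v for s in samples for v in s]
    n_total = len(combined)
    ranks = average_ranks(combined)

    total = 0.0
    start = 0
    for s in samples:
        rank_sum = sum(ranks[start:start + len(s)])   # Rj for this sample
        total += rank_sum * rank_sum / len(s)
        start += len(s)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)

    # Tie adjustment: each group of t tied values contributes t**3 - t.
    tie_sizes = Counter(combined).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h, h / correction, len(samples) - 1


# Hypothetical data: three samples of three observations, no ties.
print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Untied values have t = 1 and contribute nothing to the correction, so the adjusted statistic equals H when there are no ties.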

 

Action with RESTRICT

The variates in DATA can be restricted, and in different ways. KRUSKAL uses only those units of each variate that are not excluded by their respective restrictions.

 

Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, New York.

 

LATTICE procedure

Analyses square and rectangular lattice designs

(K. Ryder, E.R. Williams & D. Ratcliff)

 

Options

PRINT = strings Controls output from the ANOVA analysis, values as for ANOVA; default aovt

TREATMENTS = factor Factor defining the treatments in the design; must be specified

REPLICATES = factor Factor defining the replicates within the repeats of the basic design; must be specified

BLOCKS = factor Factor defining the blocks within replicates; must be specified

REPEATS = factor Factor to specify the number of complete repeats of the basic design; this may be omitted if the design is repeated only once

NOTCOMBINED = string Can suppress printing of the combined means (no, yes); default no

 

Parameters

Y = variates Variates to be analysed

RESIDUALS = variates Variates to store the residuals from the analyses

FITTEDVALUES = variates Variates to store the fitted values from the analyses

MEANSCOMBINED = variates Variates to store the treatment means combining between- and within-block information

 

Description

Procedure LATTICE analyses either square or rectangular lattice designs with any number of replicates and repeats of the basic design (Cochran & Cox 1957, Chapter 10). The procedure first produces the standard ANOVA analysis (see Genstat 5 Release 3 Reference Manual, pages 520-522), and then forms estimates of the treatment means combining between- and within-block information, using the method of Williams & Ratcliff (1980).

The model and design are specified by the options of the procedure. The TREATMENTS option specifies the treatment factor. The REPEATS option specifies a factor to identify repeats of the basic design; if the design is repeated only once, this need not be specified. The structure of each repeat of the basic design is specified by options REPLICATES and BLOCKS, defining the replicates and the blocks within each replicate respectively.

The variate to be analysed is specified by the Y parameter. Other parameters allow the fitted values, the residuals and the combined estimates of the treatment means to be saved.

Output is controlled by options PRINT and NOTCOMBINED, for the standard ANOVA output and for the (suppression of the) printing of the combined means, respectively.

 

Options: PRINT, TREATMENTS, REPLICATES, BLOCKS, REPEATS, NOTCOMBINED.

Parameters: Y, RESIDUALS, FITTEDVALUES, MEANSCOMBINED.

 

Method

The procedure first analyses the design using pseudo-factors, as explained on pages 520-522 of the Genstat 5 Release 3 Reference Manual. It then uses the method of Williams & Ratcliff (1980) to produce estimates of the treatment means combining between- and within-block information.

 

Action with RESTRICT

If a Y variate is restricted, its analysis will be restricted accordingly.

 

References

Cochran, W.G. & Cox, G.M. (1957). Experimental Designs (2nd Edition). Wiley, New York.

Williams, E.R. & Ratcliff, D. (1980). A note on the analysis of lattice designs with repeats. Biometrika, 67, 706-708.

 

LIBEXAMPLE procedure

Accesses examples and source code of library procedures

(R.W. Payne)

 

No options

 

Parameters

PROCEDURE = texts Single-valued texts indicating the procedures about which the information is required

EXAMPLE = texts Identifiers of text structures to store the example for each procedure

SOURCE = texts Identifiers of text structures to store the source code of each procedure

 

Description

LIBEXAMPLE allows you to obtain an example of the use of any procedure in the Genstat 5 Procedure Library, and also to access the source code of any procedure, so that you can see how it works or modify it. The names of procedures for which examples or source code are required should be listed, in quotes, using the PROCEDURE parameter. The EXAMPLE parameter can be used to specify the identifier of a text to store each example, and the SOURCE parameter to specify texts to store the source code. The examples can then be run (as macros) using the operator ##. Thus,

LIBEXAMPLE 'PERCENT'; EXAMPLE=%Ex

##%Ex

would put an example of how to use PERCENT into the text %Ex, and then run it.

The examples and source are stored in backing-store files whose names are defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the files can be attached. A file can also be defined to supply information about procedures in a local library, and LIBEXAMPLE will then look there first so that any local examples are taken in preference to those for the main library. The file can be formed using procedure FLIBHELP.

 

Options: none. Parameters: PROCEDURE, EXAMPLE, SOURCE.

 

Method

The examples are held in the same backing-store file that holds the other help information about Library procedures; the name of the file is supplied by procedure LIBFILENAME. This file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. Within the help file, the information about each procedure is stored in a subfile whose identifier is the same as the name of the procedure itself. The example is stored in a text with identifier Help['example']. After the required information has been brought back from backing store, the help file is closed. The source code of the procedures is stored in a separate backing-store file, and accessed in a similar way.

 

LIBFILENAME procedure

Supplies the names of information files for library procedures

(R.W. Payne)

 

No options

 

Parameters

FILENAME = texts Text in which to store the name of the backing-store file containing the help information for the Procedure Library

CONTENTS = strings Indicates which file is required (help, procedure, adesign, afraction, acyclic, agenerator); default help

PROCEDURE = texts Name of the procedure for which information is required; default * assumes that it is a procedure in the Genstat library rather than the local library

 

Description

The help information for procedures in the Genstat 5 Procedure Library is stored in a backing-store file. Procedures such as LIBHELP and LIBINFORM open the file on the first free backing-store channel, read and print the required information, and then close the file again. For flexibility, these procedures all call LIBFILENAME to ascertain the name of the file. A help information file can also be formed, using procedure FLIBHELP, for the local procedure library. If the PROCEDURE parameter is set, LIBFILENAME returns the name of the first file which contains information about that procedure, looking first in the file for the local library and then in that for the Genstat library. (By default a null local file is supplied with Genstat, containing no information.) Also, LIBEXAMPLE has a file containing the source code of the library procedures, and the procedures of the Genstat Design System have files containing information for the designs that can be generated. Thus, if the location of any file needs to be changed at a particular site, only LIBFILENAME needs to be modified.

 

Options: none. Parameters: FILENAME, CONTENTS, PROCEDURE.

 

Method

The procedure contains a text structure containing the various filenames, and the POSITION function of CALCULATE is used to set FILENAME to the appropriate one.

 

Action with RESTRICT

Any restriction on the FILENAME text will be cancelled.

 

LIBHELP procedure

Provides help information about library procedures

(R.W. Payne)

 

Option

PRINT = strings Indicates what information is required about each procedure (index, description, options, parameters, method, restrict, calls, similar, authors, references, module, history, errors); default desc

 

Parameter

PROCEDURE = texts Single-valued texts indicating the procedures about which the information is required; if this is not set, information is given about LIBHELP itself

 

Description

LIBHELP provides information about procedures in the Genstat 5 Procedure Library. It has a parameter, called PROCEDURE, which you use to indicate the procedures for which you want information; if PROCEDURE is not specified, information is given about LIBHELP itself. The names of the procedures should be given in quotes: for example

LIBHELP 'LIBINFORM'

will obtain information about the procedure LIBINFORM (you can use LIBINFORM to find out what procedures and modules are in the Library).

LIBHELP has a single option, called PRINT, with which you specify a list of strings to indicate what information you want about each procedure. The possible values, with explanations in brackets, are as follows: index (one-line description), description (full description), options (syntax of the options), parameters (syntax of the parameters), method (description of the method used), restrict (action when arguments are restricted), calls (list of procedures called by this procedure), similar (procedures with similar facilities), authors (list of authors), references (relevant publications), module (the Library module to which the procedure belongs), history (when accepted, modified &c.), errors (details of any reported errors).

The information is stored in a backing-store file whose name is defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the file can be attached. A second file can also be defined to supply information about procedures in a local library, and LIBHELP will then look there first so that any local details are taken in preference to those of the main library. The file can be formed using procedure FLIBHELP.

 

Option: PRINT. Parameter: PROCEDURE.

 

Method

The description of LIBHELP is held within LIBHELP itself, and is printed as a caption. Other information is obtained from a backing-store file, whose name is supplied by procedure LIBFILENAME. This file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. Within the Help file, the information about each procedure is stored in a subfile whose identifier is the same as the name of the procedure. The sections of information are stored in separate text structures each with the suffixed identifier Help['section name']. After printing the requested sections, the file is closed.

 

LIBINFORM procedure

Prints information about the contents of the Procedure Library

(R.W. Payne)

 

Options

PRINT = strings Indicates what information is required about each module (contents, index, modules, errors); default cont, erro

LIBRARY = string Defines the library for which information is required (Genstat, local); default Gens

 

Parameter

MODULE = texts Single-valued texts indicating the modules about which the information is required; if this is not set, information is given about the whole library

 

Description

LIBINFORM provides information about the Genstat 5 Procedure Library or a local library. The MODULE parameter allows you to specify that information is required only for a specified list of modules of the library; if MODULE is not set, the information is given for the whole library. The name of each module should be given in a quoted string: for example

LIBINFORM [PRINT=index] 'AOV','MVA'

If the MODULE is not given in full, LIBINFORM identifies the first module (in alphabetic order) that matches the MODULE setting, up to the number of characters that has been specified.

The PRINT option specifies what information is required about each module. The possible values, with explanations in brackets, are as follows: contents (list of procedures in the module or in the Library; see MODULE), index (index lines for the procedures in the module/Library), errors (list of procedures in the module/Library for which errors have been reported), modules (list of modules in the Library; given only if MODULE is not set).

The information is stored in a backing-store file whose name is defined by library procedure LIBFILENAME; there must be a free backing-store channel to which the file can be attached. A second file may also have been defined to supply information about procedures in a local library, and you can then set option LIBRARY=local to print details about the local instead of the main library.

 

Options: PRINT, LIBRARY. Parameter: MODULE.

 

Method

The information is held in subfile _contents of the backing-store file that holds help for the library; the name of the file is supplied by procedure LIBFILENAME. This file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. After printing the requested sections, the help file is closed.

 

LIBMANUAL procedure

Prints a "Manual" containing information about library procedures

(R.W. Payne)

 

Options

CHANNEL = scalar Channel to which to print the manual; default is to use the current output channel

REFERENCE = string Whether to print just reference information (no, yes); default no

INDENTATION = scalar Number of spaces to leave before the first column in each line; default 0

LIBRARY = string Defines the library for which information is required (Genstat, local); default Gens

 

Parameter

MODULE = text Modules to be included in the manual; by default the manual is for the whole library

 

Description

LIBMANUAL prints a manual containing information about procedures in the Genstat 5 Procedure Library. There is first a header page, with title and list of index lines giving brief details about the procedures. Then Help information is printed about each of the procedures in turn. LIBMANUAL takes account of the current environment (as controlled by the OUTPRINT option of SET) to decide whether to start each procedure on a new page. The information is stored in a backing-store file whose name is defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the file can be attached. A second file may also have been defined to supply information about procedures in the local library, and you can then set option LIBRARY=local to print a manual for the local instead of the main library.

Unless otherwise specified, the manual will contain every procedure in the library. However, there is a parameter, MODULE, which can be set to a text to indicate that only procedures in a particular set of modules should be included. Details of the modules in the library can be obtained using procedure LIBINFORM, and some procedures may belong to more than one. In particular, there are modules called PLn (where n is a positive integer 1, 2...) to indicate the procedures that were added in release PLn of the Library, PLnHELP to indicate the procedures whose help information was last modified in release PLn of the library, and PLnPROCEDURE to indicate those where the procedure itself was last modified in release PLn. Thus, a manual for the procedures that were new in Release PL6 can be obtained by putting

LIBMANUAL 'PL6'

The CHANNEL option specifies the output channel to which the Manual is to be printed; by default it is printed to the current output channel.

The REFERENCE option allows just a reference summary to be obtained, instead of the full information for each procedure. Finally, the INDENTATION option can be used to indent the information by a specified number of columns, so that the manual can conveniently be put into a folder or binder.

 

Options: CHANNEL, REFERENCE, INDENTATION, LIBRARY. Parameter: MODULE.

 

Method

LIBMANUAL first prints a header, followed by a list of index lines describing the contents of the library. It then runs a loop over the procedures in the library, accessing and printing the Help information. This information is stored in the library Help file, the name of which is supplied by procedure LIBFILENAME. The file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. Afterwards the channel is closed.

OUTPUT directives are executed at the start and end of the procedure, to switch output channels as requested by the CHANNEL option.

 

LIBVERSION procedure

Provides the name of the current Genstat 5 Procedure Library

(R.W. Payne)

 

Option

PRINT = string Controls printed output (release); default rele

 

Parameter

RELEASENAME = text Text in which to store the name of the currently available release of the Genstat 5 Procedure Library

 

Description

The Genstat 5 Procedure Library is updated independently of releases of the main Genstat program and the current release thus may not be immediately apparent. Consequently LIBVERSION is provided to allow users to obtain the name of the currently available release. The name is printed by default, but you can set option PRINT=* to suppress this. The RELEASENAME parameter allows the name, 'Genstat 5 Procedure Library Release ...' to be saved.

 

Option: PRINT. Parameter: RELEASENAME.

 

Method

RELEASENAME is formed by an ordinary TEXT declaration.

 

Action with RESTRICT

Any restriction on the RELEASENAME text will be cancelled.

 

LINDEPENDENCE procedure

Finds the linear relations associated with matrix singularities

(J.H. Maindonald)

 

Option

PRINT = strings Printed output (dependent, coefficient); default depe

 

Parameters

DATA = symmetric matrices Specifies the positive semi-definite matrix for which the information is required

COEFFICIENTS = matrices Stores the coefficients of the linear dependencies

 

Description

Procedure LINDEPENDENCE takes a positive semi-definite matrix S (e.g. a matrix formed as X′X), and identifies any columns of S that are a linear combination of earlier columns. It determines the linear relations involved, and stores these in the columns of the matrix specified by the COEFFICIENTS parameter.

In more mathematical terms the output, stored as columns of COEFFICIENTS, is a basis for the null space of a positive semi-definite matrix S. If S = X′X, then this will also be a basis for the null space of X.

The first parameter, DATA, specifies the symmetric matrix S for which the information is required. The columns of the COEFFICIENTS matrix store the linear relations. This matrix will be defined automatically if it has not been declared earlier.

Printed output is controlled by the PRINT option: the setting dependent reports which columns are dependent, and coefficient prints the coefficients of the dependencies. By default the dependent columns are printed.

 

Option: PRINT. Parameters: DATA, COEFFICIENTS.

 

Method

The matrix function CHOLESKI is used to determine a lower triangular matrix L such that LL′ = S. Zeros on the diagonal of L identify columns of S that are a linear combination of earlier columns. The corresponding columns of L′ form a matrix H. The algorithm then replaces zeros on the diagonal of L′ by ones, to give the matrix T, and solves the equation T B = H. Finally it identifies in each column of H the element that was originally on the diagonal of L, and sets each such element to -1. For further details, see Maindonald (1984) page 105.
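The kind of output produced can be illustrated with a short Python sketch. Note that it does not follow the Cholesky-based algorithm described above: as a simpler stand-in it uses Gauss-Jordan elimination in exact rational arithmetic, which likewise identifies the columns that are linear combinations of earlier columns and gives each relation with -1 in the position of the dependent column. The function name linear_dependencies and the zero-based column numbering are choices made for the illustration only.

```python
from fractions import Fraction


def linear_dependencies(s):
    """Find columns of a square matrix that depend linearly on earlier columns.

    Returns (dependent, relations): zero-based indices of the dependent
    columns, and for each one a coefficient vector v with S v = 0 and
    v[dependent column] = -1. Illustrative sketch only.
    """
    n = len(s)
    a = [[Fraction(v) for v in row] for row in s]
    pivots = []                     # pivot column of each reduced row
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, n) if a[r][col] != 0), None)
        if pivot is None:
            continue                # no pivot: column depends on earlier ones
        a[row], a[pivot] = a[pivot], a[row]
        pivot_value = a[row][col]
        a[row] = [v / pivot_value for v in a[row]]
        for r in range(n):
            if r != row and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[row])]
        pivots.append(col)
        row += 1

    dependent = [c for c in range(n) if c not in pivots]
    relations = []
    for c in dependent:
        v = [Fraction(0)] * n
        for r, p in enumerate(pivots):
            v[p] = a[r][c]          # coefficient of earlier column p
        v[c] = Fraction(-1)
        relations.append(v)
    return dependent, relations


# Hypothetical S = X'X for X = [[1, 0, 1], [0, 1, 1]]: column 3 = column 1 + column 2.
s = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]
print(linear_dependencies(s))
```

Exact fractions avoid the rounding problem mentioned in the warning below, at the cost of speed; a floating-point version would need a tolerance when testing for zero pivots.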

Warning - if S is inaccurately formed, e.g. using single precision calculations, there is a risk that it will not be detected as singular, or that it will be detected as not positive semi-definite.

 

Reference

Maindonald, J.H. (1984). Statistical Computation. Wiley, New York.

 

LRVSCREE procedure

Prints a scree diagram and/or a difference table of latent roots

(P.G.N. Digby)

 

Option

PRINT = strings Printed output (scree, differences); default scre

 

Parameters

ROOTS = LRVs or any numerical structures Latent roots to be displayed; if an LRV is supplied the trace will also be extracted from it

TRACE = scalars Supplies or saves the total of the latent roots

DIFFERENCE = pointers Contains 3 variates to save the difference table

 

Description

Procedure LRVSCREE displays a set of latent roots in a convenient form. The input to the procedure is a set of latent roots (ROOTS), either as an LRV or any structure with numerical values. Optionally a scalar (TRACE) can be specified, either to supply or to save the total of the latent roots.

Printed output is controlled by the PRINT option. The setting scree produces a scree diagram, annotated with the latent roots on their original scale and expressed both as per-thousandths of the total and as cumulated per-thousandths. The setting differences prints these quantities as a table, together with the first three differences among the per-thousandth values; i.e. the first difference column gives the differences from each per-thousandth to the next, the second difference column gives differences among the first-difference values, and so on. Large first-difference values indicate latent roots occurring prior to large declines in the scree diagram. Large second and third differences mark the locations of series of two or more latent roots of similar magnitude, which can be thought of as plateaus on the scree diagram. Large positive, or negative, second differences indicate the first, or last, latent root of a plateau. Large negative third differences occur at the last latent root of one plateau that is followed by another plateau. See the example for illustration.

The DIFFERENCE parameter allows a pointer to be specified to contain three variates storing the columns of the difference table.

 

Option: PRINT. Parameters: ROOTS, TRACE, DIFFERENCE.

 

Method

Procedure LRVSCREE uses the HISTOGRAM directive to give the scree diagram.

 

Action with RESTRICT

Not relevant: LRVSCREE deals primarily with diagonal matrices or LRVs. If the latent roots are supplied in a variate, any restriction on the variate will be ignored.

 

LVARMODEL procedure

Analyses a field trial using the Linear Variance Neighbour model

(D.B. Baird)

 

Options

PRINT = strings Controls printed output (data, effects, sed, residuals, variances); default effe, sed, vari

METHOD = string Indicates which version of the LV model to use (full, reduced); default full

LAMBDA = scalar Number between 0 and 1 which defines the value for the variance parameter λ (if METHOD=full and LAMBDA=0, the value is estimated by REML); default 0

VARMETHOD = string Specifies which estimator of residual variance to use to calculate the sed's of treatment effects (RMS2, GLS); default RMS2

TOLERANCE = scalar Defines the precision to which the variance parameter λ should be estimated; default 0.01

 

Parameters

Y = variates Y-values (usually plot yields) row by row

TREATMENT = factors Plot treatments for each y-variate

NROWS = scalars Number of rows in the field layout; default 1

EFFECTS = tables To save the estimated treatment effects from each analysis

SED = matrices or symmetric matrices

To save the estimated standard errors of differences between treatments

WNOISE = variates To save the estimated white noise component

TREND = variates To save the estimated trend component

COMPONENTS = variates To save the estimated variance components: the tuning parameter λ, and either the variance of the random walk innovations (λ < 0.9) or the white noise variance (λ ≥ 0.9)

 

Description

LVARMODEL analyses a field trial, whose plots are in rows of equal length, using the Linear Variance (LV) Neighbour analysis (Williams 1986). The LV model is equivalent to the extended First Difference model of Besag & Kempton (1986). The model allows for local trends within a row, and the analysis attempts to remove these trends by using a form of smoothing. In the full LV model, the degree of smoothing is estimated from the data; alternatively the reduced model, corresponding to the ordinary First Difference (FD) model of Besag & Kempton (1986), applies a full linear detrending to the data.

The LV model specifies the data as the sum of three components: the treatment effects, a trend component which is a random walk process, and a residual white noise component. This procedure cannot be used to fit the full Linear Variance plus Incomplete Block model of Williams (1986), which has an additional random component for incomplete blocks; however, blocks may be fitted as a fixed effect by regarding each block as a separate row and setting up the data accordingly.
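The three-component structure can be illustrated by simulating a single row of plots. This is a sketch in Python rather than Genstat; the function name simulate_row, the use of Gaussian innovations and the standard deviations chosen are all assumptions made for illustration, not details taken from the procedure.

```python
import random


def simulate_row(effects, treatments, sd_walk, sd_noise, rng):
    """Simulate one row: treatment effect + random-walk trend + white noise."""
    trend = []
    level = 0.0
    for _ in treatments:
        # Random walk: each plot's trend is the previous trend plus an innovation.
        level += rng.gauss(0.0, sd_walk)
        trend.append(level)
    # White noise: independent errors, one per plot.
    noise = [rng.gauss(0.0, sd_noise) for _ in treatments]
    y = [effects[t] + tr + e for t, tr, e in zip(treatments, trend, noise)]
    return y, trend, noise


# Hypothetical example: three treatments on a row of six plots.
effects = {"A": 10.0, "B": 12.0, "C": 9.0}
treatments = ["A", "B", "C", "B", "C", "A"]
rng = random.Random(1)
y, trend, noise = simulate_row(effects, treatments, sd_walk=0.5, sd_noise=0.2, rng=rng)
```

Subtracting the trend and noise from y recovers the treatment effects exactly, which is the decomposition that the analysis tries to estimate from y alone.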

The variable to be analysed (normally a plot yield) is specified in a variate, using the Y parameter, with the values in row order (row by row). The factor defining the corresponding plot treatments is specified using the TREATMENT parameter, and the number of rows in the trial is specified with the NROWS parameter. The procedure can handle missing values in the y-variate but not in the TREATMENT factor.
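The row-by-row ordering can be pictured with a small Python sketch. The helper split_rows is hypothetical and only shows how a single vector of plot values, together with the number of rows, determines the field layout when all rows have equal length; None stands in here for a missing y-value.

```python
def split_rows(y, nrows):
    """Split values stored row by row into nrows rows of equal length."""
    if nrows < 1 or len(y) % nrows != 0:
        raise ValueError("number of values must be a multiple of the number of rows")
    ncols = len(y) // nrows
    # Row r occupies positions r*ncols up to (r+1)*ncols - 1.
    return [y[r * ncols:(r + 1) * ncols] for r in range(nrows)]


# Hypothetical trial: 2 rows of 4 plots, with one missing yield.
yields = [4.1, 4.3, None, 4.0, 3.9, 4.4, 4.2, 4.5]
layout = split_rows(yields, nrows=2)
```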

The other parameters allow information to be saved from the analysis: EFFECTS for the table of estimated treatment effects; SED for the standard errors of differences between treatment effects (in either a matrix or a symmetric matrix); WNOISE for the estimated white noise (in a variate); TREND for the trend component (in a variate); and COMPONENTS for the two variance parameters. The first variance component is the parameter λ. For λ < 0.9 the second component is the variance of the innovations in the random walk. If λ ≥ 0.9 the second component saved is the variance of the white noise component, as the random walk component disappears in the limit as λ tends to one.

Printed output is controlled by the PRINT option, with the following settings: data (y-values and treatments in tabular form); effects (estimated treatment effects); sed (standard errors of differences of effects); variances (estimates of λ and the white noise variance); and residuals (trend and white noise components).

The METHOD option controls the form of LV model to be fitted. The default setting, full, causes the full LV model to be fitted, with the variance parameters of the model estimated by Residual Maximum Likelihood (REML) (Gleeson & Cullis 1987). The variance parameters used, λ and κ, are those given by Baird & Mead (1991). The parameter λ is known as the tuning parameter, as it controls the degree of smoothing used in eliminating trend effects from the data. It is related to the parameter α of Besag & Kempton (1986), by the relationship

λ = α / (1 + α)
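For readers moving between the two parameterizations, the relationship above and its inverse can be written as a pair of small Python functions. The function names are hypothetical, and the sketch assumes that the Besag & Kempton parameter is non-negative, so that the tuning parameter lies between 0 and 1 (reaching 1 only in the limit).

```python
def tuning_from_bk(bk_parameter):
    """Tuning parameter from the Besag & Kempton parameter: x / (1 + x)."""
    if bk_parameter < 0.0:
        raise ValueError("parameter assumed to be non-negative")
    return bk_parameter / (1.0 + bk_parameter)


def bk_from_tuning(tuning):
    """Inverse relationship: x / (1 - x), defined for 0 <= x < 1."""
    if not 0.0 <= tuning < 1.0:
        raise ValueError("tuning parameter must satisfy 0 <= value < 1")
    return tuning / (1.0 - tuning)


half = tuning_from_bk(1.0)    # equal to 0.5
back = bk_from_tuning(half)   # recovers 1.0
```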

Alternatively, specifying METHOD=reduced fits the reduced form of the LV model, that is the FD model. This is equivalent to putting λ = 0.

The option LAMBDA allows the value of the tuning parameter to be set at a fixed value, which must lie between 0 and 1. By default LAMBDA=0, which for METHOD=full causes the value to be estimated as described above.

The option VARMETHOD controls the estimator used for the variance of the residual white noise component. There are two possibilities: the normal generalized least-squares estimator GLS, and an estimator based on the second differences of the errors RMS2 (Besag & Kempton 1986). The simulation study of Baird & Mead (1991) showed the standard errors of treatment effects based on RMS2 to be approximately valid under randomization for a wide range of error models. When the estimated value of λ was not close to zero, the standard errors based on GLS were found to be approximately unbiased and more efficient than those based on RMS2 for the LV model. However, the standard errors based on GLS could be seriously biased in some situations for the FD model or when λ was close to zero. Thus the default for VARMETHOD is RMS2.

Finally, the TOLERANCE option specifies the precision to which λ should be estimated.

 

Options: PRINT, METHOD, LAMBDA, VARMETHOD, TOLERANCE.

Parameters: Y, TREATMENT, NROWS, EFFECTS, SED, WNOISE, TREND, COMPONENTS.

 

Method

The model is fitted in a similar manner to that outlined in Besag & Kempton (1986), but the variance components have the parameterization used by Baird & Mead (1991) and are fitted by residual maximum likelihood (Gleeson & Cullis 1987) rather than maximum likelihood; also see Baird (1987). The optimization of the likelihood is done by golden section search on the profile likelihood for λ. Residuals are constructed by creating the smoothing matrix S that corresponds to the LV model fitted (Green et al. 1985).
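A golden section search over the interval from 0 to 1 can be sketched as follows. This is a generic Python illustration of the search technique, not the procedure's code: the function name is hypothetical, the objective is a toy quadratic standing in for minus the profile log-likelihood, and the stopping rule (interval width below the tolerance) is an assumption about how a tolerance such as the default 0.01 might be applied.

```python
import math


def golden_section_minimise(f, lower, upper, tolerance=0.01):
    """Minimise a unimodal function f on [lower, upper] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    a, b = lower, upper
    # Two interior points placed symmetrically in the bracket.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tolerance:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def toy_objective(x):
    # Stand-in for minus the profile log-likelihood; minimum at 0.3.
    return (x - 0.3) ** 2


estimate = golden_section_minimise(toy_objective, 0.0, 1.0, tolerance=0.01)
```

Each iteration needs only one new evaluation of the objective, which matters when every evaluation involves refitting the model for a trial value of the tuning parameter.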

The procedure uses a large amount of data space and computer time when the tuning parameter is estimated by REML. The computing time is proportional to the number of rows multiplied by the square of the number of columns.

 

Action with RESTRICT

The procedure ignores any restrictions, for example on Y and TREATMENT.

 

References

Baird, D.B. (1987). A Genstat 5 procedure for a First Difference analysis. Genstat Newsletter 19, 40-47.

Baird, D.B. and Mead, R. (1991). The empirical efficiency and validity of two neighbour models. Biometrics 47, 1473-1487.

Besag, J.E. and Kempton, R.A. (1986). Statistical analysis of field experiments using neighbouring plots. Biometrics 42, 231-251.

Gleeson, A.C. and Cullis, B.R. (1987). Residual maximum likelihood estimation of a neighbour model for field experiments. Biometrics 43, 277-288.

Green, P.J., Jennison, C. and Seheult, A.H. (1985). Analysis of field experiments by least squares smoothing. J. R. Statist. Soc. B 47, 299-315.

Williams, E.R. (1986). A neighbour model for field experiments. Biometrika 73, 279-287.