HANOVA procedure
Does hierarchical analysis of variance/covariance for unbalanced data
(P.W. Lane)
Options
INCHANNEL
= scalar Channel from which to read data; default * specifies that the data values are already stored in the factors and variates specified by the parameters of HANOVA
FORMAT
= variate Format for reading data; default * requests free format
ANALYSIS
= symmetric matrix For PRINT=some, this indicates which analyses to print
SSPM
= SSPM Stores the corrected sums of squares and products; default *
COEFFICIENT
= matrix Stores the estimated variance and co-variance components; default *
Parameters
VARIATES
= pointers Variates to be analysed
FACTORS
= pointers Factors defining the hierarchy, the first factor of the pointer defining the first stratum, and so on
Description
Procedure HANOVA performs hierarchical analysis of variance and covariance, estimating the components of variance corresponding to each level of a nested classification. It is designed for unbalanced classifications; balanced data are analysed more efficiently by the ANOVA directive.
Data are said to be classified hierarchically if the units have several groupings successively nested within each other. One way of representing such a classification would be to identify the groupings in each stratum of the hierarchy by a single factor; two units with the same value for one of the factors would then be required to have identical values for the factors representing the previous strata. An alternative method is to use not only the factor for the current stratum, but also the factors for previous strata, to indicate the groupings that occur there. For example, the following classifications are effectively equivalent:
        (1)                        (2)
Unit    Factor 1     Factor 2      Factor 1     Factor 2
        (stratum 1)  (stratum 2)   (stratum 1)  (stratum 2)
1       1            1             1            1
2       1            1             1            1
3       1            2             1            2
4       2            3             2            1
5       2            4             2            2
Thus, in the second form of representation, the second factor indicates the sub-divisions within each group in the first stratum, using the same levels each time. This more efficient method is the one required by HANOVA.
The simplest way to use HANOVA is to set the VARIATES parameter to a single variate (or to a pointer if several variates are to be analysed), and set the FACTORS parameter to a pointer of factors. The factors must be in the order of the hierarchy, with the first factor defining the coarsest grouping of the units and succeeding factors being nested within the first. The units of data stored in the variates and factors can be in any order.
Since hierarchical data can often be extensive, HANOVA can be requested to read the data sequentially, tabulating it with respect to the factors, so that the data need not all be held in core at the same time. The INCHANNEL option defines the channel number of the file from which the data are to be read; if INCHANNEL is not set, the data are assumed to be present already, in the factors and variates contained in the VARIATES and FACTORS parameters. The FORMAT option allows a variate to be specified for use in the FORMAT option of the READ command within the procedure; if this is not set, the default format of READ is assumed.
If a unit has a missing value for any of the variates or factors, it is omitted from all the analyses. The procedure carries out analyses of variance for specified variates, and of covariance for specified pairs of variates. Variance components are calculated for each stratum: that is, the proportion of the total variance per individual ascribable to the various strata of the classification.
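The relationship between the two equivalent factor representations described above can be illustrated outside Genstat. The following is a minimal Python sketch, not part of the procedure; the function name to_nested_levels is hypothetical. It relabels a second-stratum factor coded with unique levels, as in representation (1), so that its levels restart within each group of the first stratum, as in representation (2).

```python
def to_nested_levels(parent, child):
    """Relabel `child` so that its levels restart at 1 within each level of `parent`."""
    seen = {}  # parent level -> {original child level: new within-parent level}
    relabelled = []
    for p, c in zip(parent, child):
        mapping = seen.setdefault(p, {})
        if c not in mapping:
            # First time this child level is met inside this parent group:
            # give it the next unused level number for that group.
            mapping[c] = len(mapping) + 1
        relabelled.append(mapping[c])
    return relabelled


factor1 = [1, 1, 1, 2, 2]
factor2_unique = [1, 1, 2, 3, 4]  # representation (1)
factor2_nested = to_nested_levels(factor1, factor2_unique)
print(factor2_nested)  # representation (2): [1, 1, 2, 1, 2]
```

The new levels are assigned in order of first appearance, so the result reproduces the example table only when the units are listed group by group, as they are there.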
Output is controlled by the PRINT option: by default, the matrix of coefficients of variance components is printed, followed by an analysis of variance of each variate and of covariance of each pair of variates. To obtain only some of the analyses, option PRINT should be set to some, and the ANALYSIS option to a symmetric matrix with numbers of rows and columns equal to the number of variates. A non-zero value in the matrix indicates that the corresponding analysis of variance or covariance is to be displayed. Printed output can be suppressed by setting PRINT=none.
The matrix of coefficients can be saved using the COEFFICIENT option, and the sum of squares and products of the variates using the SSPM option.
Options: PRINT, INCHANNEL, FORMAT, ANALYSIS, COEFFICIENT, SSPM.
Parameters: VARIATES, FACTORS.
Method
HANOVA uses the method described by Gower (1962).
Action with RESTRICT
Account is taken of restriction on any factor, or on the first variate in the VARIATES parameter: subsequent variates must either have the same restriction, or be unrestricted.
Reference
Gower, J.C. (1962). Variance component estimation for unbalanced hierarchical classifications. Biometrics, 18, 537-542.
HEATUNITS procedure
Calculates accumulated heat units of a temperature dependent process
(R.J. Reader, R.A. Sutherland & K. Phelps)
Options
METHOD
= string Temperature/time relationship to be used (sawtooth, cosine, linsine, expsine); default sawt
LATITUDE
= scalar Latitude at which temperatures were measured; default 52.205 N {Wellesbourne, U.K.}
RATE
= variate Value of rate relationship at cardinal temperatures
TEMPERATURE
= variate Cardinal temperatures
PARAMETERS
= variate Parameters a, b, c (a, c in hours) for the expsine method
Parameters
MINTEMPERATURE
= variates Minimum temperature on each day
MAXTEMPERATURE
= variates Maximum temperature on each day
FIRSTDAY
= scalars Day of year of first temperature recorded
HEATUNITS
= variates Development on each day
Description
HEATUNITS calculates heat units accumulated each day by a process whose rate depends on temperature. The temperature is assumed to vary diurnally. The rate function is defined as a linear spline so that any relationship can be approximated by specifying a set of cardinal temperatures and corresponding rates.
The METHOD option specifies the form of the diurnal temperature variation; this is derived from consecutive daily maximum and minimum temperatures according to methods compared by Reicosky et al. (1989). The LATITUDE option should be set to the latitude (degrees) at which the maxima and minima were recorded (positive for the northern hemisphere and negative for the southern hemisphere). The RATE and TEMPERATURE options define the rate/temperature relationship. They specify variates of equal length, RATE containing the rate of the process at the temperature of the corresponding unit of TEMPERATURE. The PARAMETERS option is a variate containing the values of the parameters a, b and c of the METHOD expsine.
The parameters MAXTEMPERATURE and MINTEMPERATURE contain the maximum and minimum temperatures on each day respectively. The FIRSTDAY parameter specifies the day of the year of the first unit of the MAXTEMPERATURE and MINTEMPERATURE variates. The HEATUNITS parameter returns the heat units accumulated on each day.
Options: METHOD, LATITUDE, RATE, TEMPERATURE, PARAMETERS.
Parameters: MINTEMPERATURE, MAXTEMPERATURE, FIRSTDAY, HEATUNITS.
Method
The integral of each segment of the rate/temperature relationship on each day is evaluated. These integrals are then added together. Further details are given by Reader & Phelps (1991).
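The idea of integrating the rate/temperature relationship segment by segment can be sketched in Python. This is an illustration under simplifying assumptions, not the procedure's algorithm: the temperature is taken to rise linearly from the minimum to the maximum and fall back linearly to the same minimum, which is not any of the METHOD settings as defined by Reicosky et al.; the rate is held constant outside the range of the cardinal temperatures; and the function names are hypothetical. Under such a linear ramp the daily heat units equal the integral of the rate over the temperature range divided by the width of that range.

```python
def rate_at(t, temps, rates):
    """Linear-spline rate at temperature t (constant outside the cardinal range)."""
    if t <= temps[0]:
        return rates[0]
    if t >= temps[-1]:
        return rates[-1]
    for t0, t1, r0, r1 in zip(temps, temps[1:], rates, rates[1:]):
        if t0 <= t <= t1:
            return r0 + (r1 - r0) * (t - t0) / (t1 - t0)


def rate_integral(lo, hi, temps, rates):
    """Integrate the linear spline over temperatures lo..hi, one segment at a time."""
    # Split the range at every cardinal temperature that falls inside it; the
    # trapezium rule is then exact on each piece because the rate is linear there.
    knots = [lo] + [t for t in temps if lo < t < hi] + [hi]
    return sum((b - a) * (rate_at(a, temps, rates) + rate_at(b, temps, rates)) / 2
               for a, b in zip(knots, knots[1:]))


def daily_heat_units(tmin, tmax, temps, rates):
    """Average rate over one day for a linear temperature ramp (assumed shape)."""
    if tmax == tmin:
        return rate_at(tmin, temps, rates)
    return rate_integral(tmin, tmax, temps, rates) / (tmax - tmin)


# Hypothetical rate relationship: zero at 5 degrees, rising to 20 at 25 degrees.
cardinal = [5.0, 25.0]
rate = [0.0, 20.0]
print(daily_heat_units(10.0, 20.0, cardinal, rate))
print(daily_heat_units(0.0, 10.0, cardinal, rate))
```

The cardinal temperatures must be supplied in increasing order for this sketch to work.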
Action with RESTRICT
None of the options or parameters of this procedure should be restricted as the maximum and minimum temperatures must be from consecutive days. Also they should not contain missing values, except for the first minimum and final maximum which are not used.
References
Reicosky, D.C., Winkelman, L.J., Baker, J.M. & Baker, D.G. (1989). Accuracy of hourly air temperatures calculated from daily minima and maxima. Agricultural and Forest Meteorology, 46, 193-209.
Reader, R.J. & Phelps, K. (1991). Modelling the development of temperature-dependent processes. Genstat Newsletter, 28, 27-32.
IFUNCTION procedure
Estimates implicit and/or explicit functions of parameters
(W.M. Patefield)
Options
NOMESSAGE
= string Which warning messages to suppress (parameter, convergence); default *
NPARAMETER
= scalar Number of parameters; default zero
MAXCYCLE
= scalar Maximum number of iterations; default 20
STRINGENCY
= scalar Stringency of tests for convergence, 0,1,2...etc; default 5
EXITCONTROL
= string Control for exit on fault detection (job, procedure); default job for batch jobs, proc for interactive
ZCALCULATION
= expressions Specify the calculation of ZERO and DZBIMPLICIT
DZPCALCULATION
= expressions Specify the calculation of DZBPARAMETER
ECALCULATION
= expressions Specify the calculation of EXPLICIT, DEBPARAMETER and DEBIMPLICIT
Parameters
IMPLICIT
= variate or pointer to scalars Implicit functions
INITIAL
= variate Initial values for IMPLICIT functions
LOWER
= variate Lower bounds to IMPLICIT functions; default -10^10
UPPER
= variate Upper bounds to IMPLICIT functions; default +10^10
VCOVARIANCE
= symmetric matrix Variance-covariance matrix of parameter estimates
ZERO
= variate Equations defining implicit functions (values calculated by ZCALCULATION)
DZBIMPLICIT
= matrix First derivatives of equations ZERO with respect to implicit functions IMPLICIT (values calculated by ZCALCULATION); rows correspond to ZERO, columns correspond to IMPLICIT
DZBPARAMETER
= matrix First derivatives of equations ZERO with respect to parameters (must not be set for NPARAMETER=0; values calculated by DZPCALCULATION); rows correspond to ZERO, columns to parameters
DIBPARAMETER
= matrix First derivatives of IMPLICIT functions with respect to parameters (must not be set for NPARAMETER=0); rows correspond to IMPLICIT, columns correspond to parameters
EXPLICIT
= variate or pointer to scalars Explicit functions of parameters and/or implicit functions (values calculated by ECALCULATION)
DEBPARAMETER
= matrix First partial derivatives of EXPLICIT functions with respect to parameters (values calculated by ECALCULATION); rows correspond to EXPLICIT, columns correspond to parameters
DEBIMPLICIT
= matrix First partial derivatives of EXPLICIT functions with respect to IMPLICIT functions (values calculated by ECALCULATION); rows correspond to EXPLICIT, columns correspond to IMPLICIT
DFBPARAMETER
= matrix First derivatives of ESTIMATES with respect to parameters; rows correspond to ESTIMATES, columns correspond to parameters
ESTIMATES
= variate Estimates of IMPLICIT and EXPLICIT functions
SE
= variate Standard errors of ESTIMATES
CORRELATIONS
= symmetric matrix Correlation matrix of ESTIMATES
FCOVARIANCE
= symmetric matrix Variance-covariance matrix of ESTIMATES
Description
IFUNCTION solves implicit equations of functions of parameters. The equations are specified by the variate ZERO, the ith element defining the ith equation in terms of the IMPLICIT functions. The parameters ZERO and IMPLICIT must be of the same length (n), IMPLICIT being either a variate or a pointer to n scalars. The option ZCALCULATION supplies expressions for the calculation of both ZERO and the n by n matrix DZBIMPLICIT of first derivatives of ZERO with respect to the IMPLICIT functions. The element in the ith row and jth column of DZBIMPLICIT is the (partial) derivative of the ith element of ZERO with respect to the jth element of IMPLICIT. DZBIMPLICIT is initialized to zero and hence only non-zero elements need be calculated by ZCALCULATION.
The values of the IMPLICIT functions satisfying ZERO = 0 are obtained iteratively. Initial values may be given as a variate in the parameter INITIAL. If INITIAL is not set any current values of IMPLICIT are used as initial values. Output is controlled by the PRINT option. The option NOMESSAGE allows warning messages to be suppressed. The option MAXCYCLE and the parameters LOWER and UPPER are similar in their effect to their use in the RCYCLE directive. The option STRINGENCY controls the stringency with which tests for convergence are applied, higher values being more stringent. The option EXITCONTROL controls the action on fault detection.
IFUNCTION may be used to solve n simultaneous nonlinear equations in n unknowns (the IMPLICIT functions) by not setting the NPARAMETER option (or setting it to zero). More generally, the variate ZERO is a function of both the IMPLICIT functions and NPARAMETER parameter estimates from a model previously fitted using FIT, FITCURVE or FITNONLINEAR. The DZPCALCULATION option supplies expressions for calculation of the n by NPARAMETER matrix DZBPARAMETER of (partial) derivatives of ZERO with respect to the model parameters (only non-zero elements need be calculated).
In addition (or instead) m explicit functions of the model parameters and/or the IMPLICIT functions may be specified by the parameter EXPLICIT, a variate of length m or a pointer to m scalars. The (partial) derivatives of the EXPLICIT functions with respect to the model parameters are given by the m by NPARAMETER matrix DEBPARAMETER and the (partial) derivatives with respect to the IMPLICIT functions by the m by n matrix DEBIMPLICIT. If either of these matrices is not set, then it is taken to be zero (i.e. the EXPLICIT functions do not depend on the model parameters or the IMPLICIT functions respectively). Expressions for calculating EXPLICIT, DEBPARAMETER and DEBIMPLICIT are supplied by the option ECALCULATION, the two matrices being initialized to zero and hence only their non-zero elements need be calculated. For EXPLICIT functions dependent on model parameters only (i.e. not on any IMPLICIT functions), ECALCULATION need not be set, in which case their values must be supplied by EXPLICIT and their (partial) derivatives with respect to model parameters by DEBPARAMETER on entry to IFUNCTION.
The parameters ZERO, DZBIMPLICIT, DZBPARAMETER, DEBPARAMETER and DEBIMPLICIT entering into the calculations ZCALCULATION, DZPCALCULATION and ECALCULATION need not be declared before using IFUNCTION. If they are declared they must have the correct attributes. The only exception to this is when derivatives of the EXPLICIT functions are supplied directly in the matrix DEBPARAMETER rather than obtained by calculations using ECALCULATION.
It is essential that the expressions for calculating DZBIMPLICIT are formulated correctly. If they are not, faults such as divergence of the optimization algorithm or estimates becoming out of bounds may be detected and reported. Fault CA16 may also be caused by incorrectly calculating DZBIMPLICIT as a singular matrix.
The variance-covariance matrix of the fitted parameters is supplied by the parameter VCOVARIANCE containing the variance-covariance matrix from a previous FIT, FITCURVE or FITNONLINEAR.
Estimates of all n+m functions (n IMPLICIT and m EXPLICIT functions of parameters) are saved by the parameter ESTIMATES. Their derivatives with respect to the model parameters are saved by the parameter DFBPARAMETER. Their variance-covariance matrix is saved by the parameter FCOVARIANCE. The standard errors of, and correlations between, the ESTIMATES are saved by the parameters SE and CORRELATIONS.
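Options: PRINT, NOMESSAGE, NPARAMETER, MAXCYCLE, STRINGENCY, EXITCONTROL, ZCALCULATION, DZPCALCULATION, ECALCULATION.
Parameters: IMPLICIT, INITIAL, LOWER, UPPER, VCOVARIANCE, ZERO, DZBIMPLICIT, DZBPARAMETER, DIBPARAMETER, EXPLICIT, DEBPARAMETER, DEBIMPLICIT, DFBPARAMETER, ESTIMATES, SE, CORRELATIONS, FCOVARIANCE.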
Method
The implicit functions are calculated by solving the simultaneous equations ZERO = 0 iteratively using Newton-Raphson. It is assumed that a solution exists and that the initial values are sufficiently close to a solution for the optimization to converge. Poor initial values can lead to divergence. A warning message is given when divergence is detected. Reasonable initial values may be obtained by using FITNONLINEAR to minimize the function k × MAX( ABS(ZERO) ), with k equal to a large number such as 10^6.
A maximum of three convergence criteria may be employed. They are:
(i) the Increment criterion defined as
(ii) the Zero criterion defined as MAX( ABS(ZERO) / Scaling-variate ) where the Scaling-variate is the greater of the maximum value of ZERO over all cycles of the iterative process and 0.0001, and
(iii) the Gradient criterion defined as ABS( T(Inc) *+ DZBIMPLICIT *+ Inc ).
The values of criterion (ii) may be highly dependent on the initial parameter values and criterion (iii) is of use primarily when the equations ZERO = 0 are derivatives of a scalar function and DZBIMPLICIT is the matrix of second derivatives of the function.
Convergence is completed when criterion (i) cannot be further reduced. However the iterative process continues searching for lower values until other criteria cannot be further reduced. The criteria involved are determined by the STRINGENCY option. For STRINGENCY = 0 or 1 only criterion (i) is used. For STRINGENCY = 2 or 3 criterion (ii) is also used. STRINGENCY = 1 or 3 requires convergence at two successive iterations. For STRINGENCY = 4 or 5 all criteria are used, STRINGENCY = 5 requiring convergence of both criteria (i) and (ii) at two successive iterations. Higher values of STRINGENCY require convergence of all three criteria at increasing numbers of successive iterations.
The default STRINGENCY value of 5 is recommended at least until the expressions for calculations are validated. Low values may give convergence at incorrect values of the implicit functions, particularly with poor INITIAL values when the equations ZERO are not approximately linear. High values will often result in an unnecessarily large number of iterations.
IFUNCTION calculates the matrix DIBPARAMETER of derivatives of the implicit functions with respect to the model parameters (Marsden, 1984, page 211). The matrices DEBPARAMETER and DEBIMPLICIT of partial derivatives of any explicit functions with respect to the model parameters and the implicit functions respectively are evaluated using expressions supplied in ECALCULATION. By the chain rule, the derivatives of the explicit functions with respect to the parameters are given by
DEBPARAMETER + ( DEBIMPLICIT *+ DIBPARAMETER ).
This matrix is appended to DIBPARAMETER to form the n+m by NPARAMETER matrix DFBPARAMETER of derivatives of the length n+m variate
ESTIMATES = !( #IMPLICIT, #EXPLICIT )
with respect to the model parameters.
The variance-covariance matrix of model parameters resulting from a previous FIT, FITCURVE or FITNONLINEAR is supplied by the parameter VCOVARIANCE, and the variance-covariance matrix of the ESTIMATES of both the implicit and explicit functions is computed as
FCOVARIANCE = QPRODUCT(DFBPARAMETER; VCOVARIANCE).
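The chain of calculations in this section can be followed on a small case. The Python sketch below is an illustration, not the procedure's code: it uses one implicit function b defined by the hypothetical equation b^3 + p1*b - p0 = 0, two hypothetical parameter estimates p0 and p1, and an invented variance-covariance matrix V. It assumes the standard implicit function theorem result that the derivatives of the implicit functions are minus the inverse of DZBIMPLICIT times DZBPARAMETER, and that QPRODUCT(D; V) forms D V D'. The convergence test is a plain step-size check rather than the STRINGENCY scheme.

```python
def newton(z, dz, b, max_cycle=20, tol=1e-12):
    """Solve z(b) = 0 by Newton-Raphson, starting from the initial value b."""
    for _ in range(max_cycle):
        step = -z(b) / dz(b)
        b += step
        if abs(step) < tol:
            return b
    raise RuntimeError("no convergence within max_cycle iterations")


p0, p1 = 10.0, 1.0                   # hypothetical parameter estimates
V = [[0.04, 0.01], [0.01, 0.09]]     # hypothetical variance-covariance matrix

# ZERO = b^3 + p1*b - p0, with derivative 3*b^2 + p1 with respect to b.
b = newton(lambda x: x**3 + p1 * x - p0, lambda x: 3 * x**2 + p1, 1.0)

dzb = 3 * b**2 + p1                  # plays the role of DZBIMPLICIT (1 by 1)
dzp = [-1.0, b]                      # plays the role of DZBPARAMETER (1 by 2)
dib = [-d / dzb for d in dzp]        # DIBPARAMETER by the implicit function theorem

# Variance of b as the quadratic form dib * V * dib'.
var_b = sum(dib[i] * V[i][j] * dib[j] for i in range(2) for j in range(2))
se_b = var_b ** 0.5
print(b, se_b)
```

With several implicit functions the divisions become solutions of linear systems, but the sequence of steps is the same.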
Action with RESTRICT
None of the parameters of IFUNCTION may be restricted.
Reference
Marsden, J.E. (1984). Elementary Classical Analysis. W.H. Freeman and Company, San Francisco.
INSIDE procedure
Determines whether points lie within a specified polygon
(S.A. Harding)
Option
TOLERANCE
= scalar Value used for testing against zero; default 10^-4
Parameters
Y
= variates Y coordinates of points
X
= variates X coordinates of points
YPOLYGON
= variates Y coordinates of polygon
XPOLYGON
= variates X coordinates of polygon
INSIDE
= variates Indicate whether points are inside (1) the polygon, outside (-1) or on an edge (0)
Description
INSIDE takes a set of points whose x and y coordinates are specified by the X and Y parameters and determines which of these lie inside the polygon whose vertices are specified by the XPOLYGON and YPOLYGON parameters. This procedure is primarily intended for use with high-resolution graphics. It allows subsets of plotted points to be identified according to their spatial relationships so that they can be redrawn or deleted.
The output is in the form of a variate, specified by the INSIDE parameter. This will contain the value 1 for points that are located inside the polygon, 0 for those on an edge, and -1 for those outside the polygon. It can thus be used in RESTRICT, for example, to identify subsets of the values.
Usually the polygon will be defined by several points. Closure is assumed, so the last point need not be the same as the first. The polygon need not be convex. If only two points are given these are interpreted as diagonally opposite corners of a rectangle (thus maintaining compatibility with the "rubber-rectangle" type of input cursor of DREAD).
Options: TOLERANCE.
Parameters: Y, X, YPOLYGON, XPOLYGON, INSIDE.
Method
The method used is essentially that of Shimrat (1962). The algorithm counts the number of edges for which a point lies within the y-range and to the left. If this is an odd number the point must lie within the polygon. A separate check is made for points that lie on the boundary.
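The edge-counting idea can be sketched in Python as follows. This is an illustration rather than the procedure's code: the function name is hypothetical, and the way the tolerance enters the boundary check (an absolute test on a cross product) is an assumption, since the text does not say how TOLERANCE is applied.

```python
def inside(x, y, xpolygon, ypolygon, tolerance=1e-4):
    """Return 1 if (x, y) is inside the polygon, 0 if on an edge, -1 if outside."""
    xp, yp = list(xpolygon), list(ypolygon)
    if len(xp) == 2:
        # Two points: treat them as diagonally opposite corners of a rectangle.
        x0, x1 = sorted(xp)
        y0, y1 = sorted(yp)
        xp, yp = [x0, x1, x1, x0], [y0, y0, y1, y1]
    n = len(xp)
    crossings = 0
    for i in range(n):
        xa, ya = xp[i], yp[i]
        xb, yb = xp[(i + 1) % n], yp[(i + 1) % n]   # closure is assumed
        # Boundary check: the point is collinear with the edge (to within the
        # tolerance) and lies within the edge's bounding box.
        cross = (xb - xa) * (y - ya) - (yb - ya) * (x - xa)
        if (abs(cross) <= tolerance
                and min(xa, xb) - tolerance <= x <= max(xa, xb) + tolerance
                and min(ya, yb) - tolerance <= y <= max(ya, yb) + tolerance):
            return 0
        # Count the edge if the point is within its y-range and to its left.
        if (ya > y) != (yb > y):
            x_on_edge = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x_on_edge > x:
                crossings += 1
    return 1 if crossings % 2 == 1 else -1


square_x = [0.0, 2.0, 2.0, 0.0]
square_y = [0.0, 0.0, 2.0, 2.0]
print([inside(px, py, square_x, square_y)
       for px, py in [(1.0, 1.0), (3.0, 1.0), (2.0, 1.0)]])
```

The half-open comparison on the y-range means that a ray passing exactly through a vertex is counted once rather than twice.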
Action with RESTRICT
If either Y or X variate is restricted, only the restricted set of points is checked for inclusion in the polygon. Any points omitted by a restriction will be identified as lying outside the polygon. Restrictions are removed from YPOLYGON and XPOLYGON.
Reference
Shimrat, M. (1962). Position of point relative to polygon, CACM Algorithm 112, Comm. ACM, Aug. 1962.
INVNORMAL procedure
Calculates probabilities from the inverse normal distribution
(A. Keen)
Options
MU
= identifier Mean of the inverse normal distribution; no default
SIGMA
= identifier Standard deviation of the inverse normal distribution; no default
Parameters
X
= identifiers The constant(s) x for which probabilities are required; no default
CUMPROBABILITY
= identifiers To save the cumulative probabilities
Description
INVNORMAL calculates the probability that a random variable with the inverse normal distribution is less than a constant x.
The x-values are specified using the parameter X. The mean and standard deviation of the inverse normal distribution must be specified by options MU and SIGMA respectively. Options MU and SIGMA and parameter X can be set to any numerical structure. Non-scalar structures must contain the same number of values.
Output is controlled by the PRINT option, with setting cumprobability to print the cumulative probabilities Pr(X ≤ x). These probabilities can be saved by setting the CUMPROBABILITY parameter. The default type of CUMPROBABILITY is the same as that of X unless X is scalar and MU or SIGMA is non-scalar, in which case the type will be that of MU or SIGMA.
Options: PRINT, MU, SIGMA.
Parameters: X, CUMPROBABILITY.
Method
The probabilities are calculated according to the method given in Johnson & Kotz (1970), formula 16, page 141, using the function NORMAL. If the shape parameter of the inverse normal distribution is large, then the second part of the formula is approximated by the expression given in Section 26.2.12 of Abramowitz & Stegun (1972), taking an appropriate number of terms in the series expansion.
Usually the inverse normal distribution is characterized by the parameters μ and γ (see, for example, Johnson & Kotz 1970). The relation between γ and σ is σ^2 = μ^3 / γ.
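As an illustration, the cumulative probability can be sketched in Python using the standard closed form of the inverse normal (inverse Gaussian) distribution function, which has two parts, each involving the normal distribution function. It is assumed here, not confirmed from the source, that this is the formula referred to above. The function names are hypothetical, and the sketch does not include the series approximation for large shape parameters, so the exponential term can overflow when the shape is very large relative to the mean.

```python
from math import erfc, exp, sqrt


def normal_cdf(z):
    """Standard normal distribution function, accurate in both tails."""
    return 0.5 * erfc(-z / sqrt(2.0))


def invnormal_cdf(x, mu, sigma):
    """Pr(X <= x) for the inverse normal distribution with mean mu and s.d. sigma."""
    if x <= 0.0:
        return 0.0
    shape = mu**3 / sigma**2          # shape parameter from sigma^2 = mu^3 / shape
    root = sqrt(shape / x)
    first = normal_cdf(root * (x / mu - 1.0))
    # Second part of the formula; exp() may overflow for very large shape / mu.
    second = exp(2.0 * shape / mu) * normal_cdf(-root * (x / mu + 1.0))
    return first + second


print(invnormal_cdf(1.0, 1.0, 1.0))
```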
Action with RESTRICT
Restrictions are not allowed.
References
Johnson, N.L. & Kotz, S.K. (1970). Continuous Univariate Distributions - 1. Houghton Mifflin Company: Boston.
Abramowitz, M. & Stegun, I.A. (1972). Handbook of Mathematical Functions. Dover Publications: New York.
JACKKNIFE procedure
Produces Jackknife estimates and standard errors
(R.W. Payne)
Options
DATA
= variates, factors or texts Data vectors from which the statistics are to be calculated
ANCILLARY
= any type Other relevant information needed to calculate the statistics
VCOVARIANCE
= symmetric matrix Saves the variance-covariance matrix for the statistics
Parameters
LABEL
= texts Texts, each containing a single line, to label the statistics
ESTIMATE
= scalars Saves the Jackknife estimate for each statistic
SE
= scalars Saves Jackknife estimates of the standard errors
PSEUDOVALUES
= variates Saves the Jackknife pseudo-values
Description
The Jackknife provides a way of decreasing bias and obtaining standard errors in situations where the standard methods might be expected to be inappropriate. The basic form of the Jackknife method works by calculating the statistic (or statistics) of interest omitting each data value in turn. Thus, if there are n data values, n "partial estimates" T_{-1} ... T_{-n} are obtained (where T_{-j} is the estimate omitting value j). These are combined with the estimate T obtained from all the data, to produce n pseudo-values:
P_j = n × T - (n - 1) × T_{-j} : j = 1 ... n
The Jackknife estimate of the statistic is given by the mean of the pseudo-values, and the standard error by the standard error of the mean of the pseudo-values.
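The calculation of the pseudo-values, the estimate and its standard error can be sketched in Python for a single statistic. This is an illustration only: the function names are hypothetical, and the statistic chosen (the variance with divisor n) is simply a convenient biased example.

```python
def jackknife(data, statistic):
    """Return the Jackknife estimate, its standard error and the pseudo-values."""
    n = len(data)
    full = statistic(data)                       # T, from all the data
    pseudo = []
    for j in range(n):
        partial = statistic(data[:j] + data[j + 1:])   # T_{-j}, omitting value j
        pseudo.append(n * full - (n - 1) * partial)
    estimate = sum(pseudo) / n                   # mean of the pseudo-values
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n - 1)
    se = (variance / n) ** 0.5                   # standard error of that mean
    return estimate, se, pseudo


def plugin_variance(values):
    """Variance with divisor n, which is biased downwards."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


estimate, se, pseudo = jackknife([1.0, 2.0, 3.0, 4.0, 5.0], plugin_variance)
print(estimate, se)
```

For these five values the divisor-n variance is 2, and the Jackknife estimate is 2.5, which is the usual unbiased sample variance; the bias proportional to 1/n has been removed.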
The Jackknife can be shown to eliminate the term proportional to 1/n from a bias of the form
T = t + a/n + O(1/n^2)
where t is the true value of the estimate and O(1/n^2) is a term of order one divided by the square of the number of observations (Quenouille 1956). However, it is not appropriate in all situations. In particular the statistic needs to be "smooth" (small changes in the data set should cause only small changes in the statistic); it will not work for example with medians or order statistics. Further details and advice are given by Miller (1974), Bissell & Ferguson (1975), Hinkley (1983) and Efron & Tibshirani (1993).
The data for JACKKNIFE are provided as a list of vectors (variates, factors or texts) using the DATA option. From this, new vectors are formed omitting each unit of the original vectors in turn, and a subsidiary procedure RESAMPLE is called to calculate the statistics. Other relevant information can be provided for passing to RESAMPLE, in any type of data structure, using the ANCILLARY option. To use JACKKNIFE, you need to provide a version of RESAMPLE to calculate the particular statistics that you require. The default RESAMPLE procedure, which accompanies JACKKNIFE in the library, merely prints details of the syntax (also described in the Method section).
A label should be provided for each statistic, using the LABEL parameter; by default, there is assumed to be a single statistic labelled simply as Statistic. The estimates, their standard errors and variates of corresponding pseudo-values for each statistic can be saved by the ESTIMATE, SE and PSEUDOVALUES parameters, respectively. Also, if there is more than one statistic, a variance-covariance matrix can be saved for the estimates using the VCOVARIANCE option.
Printed output is controlled by the PRINT option, with settings estimates for the estimates and their standard errors, and vcovariance for the variance-covariance matrix; by default PRINT=estimates.
Options: PRINT, DATA, ANCILLARY, VCOVARIANCE.
Parameters: LABEL, ESTIMATE, SE, PSEUDOVALUES.
Method
The original papers describing the Jackknife technique are by Quenouille (1949, 1956) and by Tukey (1958). Good expository accounts are provided by Hinkley (1983) or Bissell & Ferguson (1975).
JACKKNIFE needs a subsidiary procedure RESAMPLE to calculate the statistics of interest. RESAMPLE has an option, DATA, which is used to supply the data vectors (variates, factors or texts) from which the statistics are to be calculated. (On the first occasion that RESAMPLE is called, these will be the original vectors as supplied to JACKKNIFE, in order to calculate the estimate T; subsequently, they will be new vectors containing all except one of the units.) Other relevant information can be supplied through the ANCILLARY option, which corresponds to the ANCILLARY option of JACKKNIFE itself. RESAMPLE can be called by the BOOTSTRAP procedure, and it then also has an AUXILIARY option, but this is not relevant to JACKKNIFE.
There are two parameters: STATISTIC supplies a list of scalars to store the estimates of each statistic, and EXIT a list of scalars which should be set to zero or one according to whether or not each statistic could be estimated successfully with the supplied data vectors. If the value of EXIT is not calculated in RESAMPLE, JACKKNIFE assumes that the calculations succeeded. This example shows a version of RESAMPLE which calculates the correlation between two variates.
PROCEDURE [PARAMETER=pointer] 'RESAMPLE'
OPTION 'DATA', " (I: variates, factors or texts) data
vectors from which to calculate the
statistics; no default"\
'ANCILLARY'; " (I: any type of structure) other relevant
information needed to calculate the
statistics "\
MODE=p; TYPE=!t(variate,factor,text),*; SET=yes,no; \
LIST=yes; DECLARED=yes; PRESENT=yes
PARAMETER 'STATISTIC', " (O: scalars) to save the calculated
statistics "\
'EXIT'; " (O: scalars) to save an exit code to
indicate failure (EXIT[i]=1) or success
(EXIT[i]=0) when calculating each
STATISTIC[i]"\
MODE=p; TYPE='scalar'; SET=yes
CALCULATE STATISTIC[1] = CORRELATION(DATA[1]; DATA[2])
& EXIT[1] = STATISTIC[1]==C('missing')
ENDPROCEDURE
Action with RESTRICT
If any of the data vectors is restricted, JACKKNIFE will use only the units that are not restricted for any of the vectors.
References
Bissell, A.F. & Ferguson, R.A. (1975). The jackknife - toy, tool or two-edged weapon. The Statistician, 24, 79-100.
Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall, London.
Hinkley, D. (1983). Jackknife methods. In: Encyclopedia of Statistics, Volume 4 (ed: S. Kotz, N.L. Johnson & C.B. Read). Wiley, New York.
Miller, R.G. (1974). The jackknife - a review. Biometrika, 61, 1-15.
Quenouille, M.H. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society B, 11, 18-44.
Quenouille, M.H. (1956). Notes on bias in estimation. Biometrika, 43, 353-360.
KAPLANMEIER procedure
Calculates the Kaplan-Meier estimate of the survivor function
(J.T.N.M. Thissen)
Options
GRAPHICS
= string Type of graphics to use (lineprinter, highresolution); default high
TITLE
= text General title for the graph; default *
WINDOW
= scalar Window number for the high-resolution graph; default 1
KEYWINDOW
= scalar Window number for the key (zero for no key); default 2
SCREEN
= string Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep); default clea
PROBABILITY
= scalar Probability level of the confidence interval for the Kaplan-Meier estimates; default 0.95
XLOWER
= scalar Lower bound for x-axis; default 0
XUPPER
= scalar Upper bound for x-axis; default * i.e. a value slightly larger than the maximum of the TIME parameter (or EVENT parameter if TIME is not set) is used
Parameters
TIME
= variates Observed timepoints
CENSORED
= variates Variate specifying whether the corresponding element of TIME is censored (1) or not (0); default is to assume no censoring
GROUPS
= factors Factor specifying the different groups for which the survivor function is estimated
EVENT
= variates Saves the distinct TIME values when TIME is set; otherwise supplies an input variate specifying the endpoint of each interval
NDEATH
= variates Saves the number of deaths at each EVENT when TIME is set; otherwise supplies an input variate specifying the number of deaths in each interval
NATRISK
= variates Saves the number of units at risk at each EVENT when TIME is set; otherwise supplies an input variate with the number of units at risk in each interval
ESTIMATE
= variates Saves the Kaplan-Meier estimates of the survivor function
NEWGROUPS
= factors Saves the grouping of the EVENT, NDEATH, NATRISK and ESTIMATE variates when TIME is set
Description
Survival data are data in which the response variate is the lifetime of a component or the survival time of a patient. Typically these are censored, i.e. the survival time of some units is unknown at the end of the study. The survivor function F(t) is a key element in the analysis of survival data. It is defined as the probability of an individual still surviving at time t.
KAPLANMEIER calculates the Kaplan-Meier estimate of the survivor function for two different types of data.
The first type of data occurs when all timepoints are accurately observed. The observed timepoints or the timepoints at which censoring took place are then specified using the TIME parameter. The CENSORED variate contains values 0 and 1 to specify whether the corresponding element of TIME is censored (1) or not (0); if there was no censoring, this need not be set. The GROUPS parameter can be used to specify a factor to indicate different groups whose survivor functions are to be estimated separately. The distinct TIME values can be saved using the EVENT parameter, and the number of deaths and the number of units at risk at each individual EVENT can be saved using parameters NDEATH and NATRISK respectively. The Kaplan-Meier estimate can be saved with the ESTIMATE parameter. The NEWGROUPS parameter can save a factor indicating the group structure of the output variates.
The second type of data is relevant when the units are observed at the end of time-intervals. The exact times are then unknown and input should be specified using parameters EVENT, NDEATH and NATRISK. These specify the timepoints, number of deaths and number of units at risk at the end of each interval. The GROUPS parameter can again be used to request separate group estimates.
The PRINT option controls output. Setting PRINT=estimate prints the events, number of deaths, number of units at risk and the Kaplan-Meier estimate with a confidence interval. The probability level for the interval can be set using the PROBABILITY option; by default this is 0.95. PRINT=graph plots the Kaplan-Meier estimate against the time points. If GRAPHICS=highresolution different lines are drawn for different groups, whereas GRAPHICS=lineprinter produces separate graphs for the different groups. Lower and upper bounds for the x-axis can be set by options XLOWER and XUPPER, and the TITLE option can specify a title for the plots. Options WINDOW and KEYWINDOW control the windows used for high-resolution graphs.
Options:
Parameters: TIME, CENSORED, GROUPS, EVENT, NDEATH, NATRISK, ESTIMATE, NEWGROUPS.
Method
When TIME is set, the Kaplan-Meier estimate is calculated according to equation (1.10) in Kalbfleisch & Prentice (1980). When TIME is not set, the Kaplan-Meier estimate is directly calculated from the variates specified by EVENT, NDEATH and NATRISK.
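As a rough illustration of the product-limit calculation used when TIME is set, the following Python sketch (not Genstat code, and not the procedure's source) computes the estimate from a list of times and 0/1 censoring indicators. The function name kaplan_meier, the decision to record only the times at which deaths occur, and the convention that a unit censored at time t is still at risk at t are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(times, censored=None):
    """Product-limit estimate of the survivor function (illustrative sketch).

    times    : observed or censoring time for each unit
    censored : 1 if the unit's time is censored, 0 if a death was observed;
               None means that there was no censoring
    Returns four lists: death times, number of deaths, number at risk,
    and the estimate of the survivor function at each death time.
    """
    if censored is None:
        censored = [0] * len(times)
    # Count the deaths at each distinct time (censored units are not deaths).
    deaths = Counter(t for t, c in zip(times, censored) if c == 0)
    events, n_death, n_at_risk, estimate = [], [], [], []
    surviving = 1.0
    for t in sorted(deaths):
        # Assumption: units censored at t are still at risk at t.
        at_risk = sum(1 for u in times if u >= t)
        surviving *= (at_risk - deaths[t]) / at_risk
        events.append(t)
        n_death.append(deaths[t])
        n_at_risk.append(at_risk)
        estimate.append(surviving)
    return events, n_death, n_at_risk, estimate


events, n_death, n_at_risk, estimate = kaplan_meier(
    [1, 2, 2, 3, 4], censored=[0, 0, 1, 0, 1])
```

When the data arrive already summarized by interval, the same running product can be formed directly from the supplied deaths and numbers at risk.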
Action with RESTRICT
The input variates and factor GROUPS may be restricted identically. The Kaplan-Meier estimate is based only on the units not excluded by the restriction.
Reference
Kalbfleisch, J.D. & Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. Wiley. New York.
KAPPA procedure
Calculates a kappa coefficient of agreement for nominally scaled data
(A.J. Rook)
Option
Parameters
DATA
= tables Data sets, each consisting of an object × category table whose entries are the number of judges assigning the ith object to the jth category
STATISTIC
= scalars Save the value of kappa for each data table
VARIANCE
= scalars Save the corresponding variances
Description
The kappa coefficient provides a way of assessing the agreement between judges who have rated a set of N objects or subjects using a nominal scale: that is, each judge has allocated each object to one of M different categories. The data for KAPPA, specified by the DATA parameter, consists of an N × M table whose entries indicate the number of judges that have assigned the ith object to the jth category. This must not contain any missing values and all the row totals must be equal.
Kappa takes the value one when there is complete agreement and zero when there is none (except that expected by chance). The printing of the test statistic and its associated information is controlled by the PRINT option. With the default, test, the procedure prints the actual and expected proportion of times that the judges agree, the resulting value of kappa and its variance. When N is large, the sampling distribution of kappa is approximately normal. The procedure thus also prints the value of kappa divided by the variance, and its probability assuming a normal distribution. A warning is printed if N is less than 20.
The STATISTIC and VARIANCE parameters allow kappa and its variance to be saved, in scalars.
Option: PRINT. Parameters: DATA, STATISTIC, VARIANCE.
Method
The method used is that of Siegel & Castellan (1988, pages 284-291).
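To make the quantities concrete, here is a small Python sketch (not Genstat code) of the usual multi-judge kappa calculation from an objects-by-categories table of counts. The function name kappa and the list-of-lists input are assumptions for illustration; the variance and the normal approximation are deliberately left out, and the sketch does not guard against the degenerate case where every judgement falls in a single category.

```python
def kappa(table):
    """Kappa coefficient of agreement from an N x M table of counts.

    table[i][j] is the number of judges assigning object i to category j.
    Returns (observed agreement, expected agreement, kappa).
    """
    n_objects = len(table)
    n_judges = sum(table[0])
    if n_judges < 2:
        raise ValueError("at least two judges are needed")
    if any(sum(row) != n_judges for row in table):
        raise ValueError("all row totals must be equal")
    n_categories = len(table[0])
    total = n_objects * n_judges
    # Proportion of all assignments falling in each category.
    p_category = [sum(row[j] for row in table) / total
                  for j in range(n_categories)]
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_category)
    # Observed proportion of agreeing pairs of judges.
    sum_squares = sum(n * n for row in table for n in row)
    p_agree = (sum_squares - total) / (total * (n_judges - 1))
    return p_agree, p_expected, (p_agree - p_expected) / (1 - p_expected)


# Four judges, three objects, two categories, complete agreement.
p_agree, p_expected, k = kappa([[4, 0], [0, 4], [4, 0]])
```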
Reference
Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioural Sciences (2nd Edition). McGraw-Hill, Singapore.
KOLMOG2 procedure
Performs a Kolmogorov-Smirnoff two-sample test
(S.J. Welham, N.M. Maclaren & H.R. Simpson)
Options
GROUPS
= factor Defines the groups for a two-sample test if only the Y1 parameter is specified
Parameters
Y1
= variates Identifier of the variate holding the first sample
Y2
= variates Identifier of the variate holding the second sample
R1
= variates Saves the ranks of the first sample
R2
= variates Saves the ranks of the second sample
STATISTIC
= scalars Scalar to save the test statistic (the maximum absolute difference between the cumulative distribution functions)
CHISQUARE
= scalars Scalar to save the chi-square approximation to the test statistic
DIFFERENCES
= variates Variate to save the signed differences between the cumulative distribution functions
Description
The Kolmogorov-Smirnoff test assesses the similarity between the underlying distributions of the two samples, by comparing their cumulative distribution functions; the test statistic is the maximum absolute difference between the cumulative distribution functions. The samples can be specified in two separate variates using the parameters Y1 and Y2. Alternatively, they can be given in a single variate (Y1), with the GROUPS option set to a factor to identify the samples. The GROUPS option is ignored when the Y2 parameter is set.
Output from the procedure is controlled by the PRINT option: test prints the relevant test statistic, differences prints the signed differences, and ranks prints a vector of ranks for each of the samples.
The test statistic and its chi-square approximation can be saved using the parameters STATISTIC and CHISQUARE respectively. The parameter DIFFERENCES can be used to save the differences between the cumulative distributions. The R1 and R2 parameters allow the ranks of the samples to be saved.
Options: PRINT, GROUPS. Parameters: Y1, Y2, R1, R2, STATISTIC, CHISQUARE, DIFFERENCES.
Method
The Kolmogorov-Smirnoff two sample test is a test of the null hypothesis that the two samples arise from the same distribution, against the alternative that the underlying distributions are different. The test compares the two empirical cumulative distribution functions in order to try and detect differences in shape of the underlying distributions. The cumulative distribution functions S1 and S2 are formed by
$$S_k(X) = \frac{\text{number of scores in sample } k \text{ that are} \le X}{\text{size of sample } k}$$
for k = 1, 2, and a suitable set of points X. The procedure uses the set of values taken by one or other of the samples, i.e. {X: X is in Y1 or Y2}. The maximum absolute difference
$$MD = \max_X \left| S_1(X) - S_2(X) \right|$$
is used as the basis for significance tests. The chi-square approximation (2 degrees of freedom) to this statistic is CH:
$$CH = 4 \, MD^2 \, \frac{n_1 n_2}{n_1 + n_2}$$
where n1, n2 are the sizes of the samples. (See for example Siegel 1956, pages 127-136.)
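The calculation can be sketched in a few lines of Python (an illustration only, not the procedure's source). The function name ks_two_sample is an assumption, and the chi-square approximation is written in the form given by Siegel (1956), with the maximum difference squared.

```python
def ks_two_sample(y1, y2):
    """Two-sample Kolmogorov-Smirnov statistic (illustrative sketch).

    Returns the maximum absolute difference MD between the two empirical
    cumulative distribution functions, the chi-square approximation CH
    (2 degrees of freedom), and the signed differences S1(X) - S2(X)
    at each distinct observed value X.
    """
    n1, n2 = len(y1), len(y2)
    points = sorted(set(y1) | set(y2))
    differences = []
    for x in points:
        s1 = sum(1 for v in y1 if v <= x) / n1  # empirical CDF of sample 1
        s2 = sum(1 for v in y2 if v <= x) / n2  # empirical CDF of sample 2
        differences.append(s1 - s2)
    md = max(abs(d) for d in differences)
    ch = 4.0 * md * md * n1 * n2 / (n1 + n2)
    return md, ch, differences


# Two completely separated samples give the largest possible difference.
md, ch, differences = ks_two_sample([1, 2, 3], [4, 5, 6])
```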
Action with RESTRICT
The variates Y1 and Y2 can be restricted, and in different ways. KOLMOG2 uses only those units of each variate that are not excluded by their respective restrictions.
Reference
Siegel, S. (1956). Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, New York.
KRUSKAL procedure
Carries out a Kruskal-Wallis one-way analysis of variance
(S.J. Welham, N.M. Maclaren & H.R. Simpson)
Options
GROUPS
= factor Defines the sample membership if only one variate is specified by DATA
STATISTIC
= scalar Scalar to save the Kruskal-Wallis test statistic
MEANRANKS
= variate Variate to save the mean ranks of the samples
DF
= scalar Scalar to save the degrees of freedom for the statistic
Parameters
DATA
= variates List of variates containing the data for each sample, or a single variate containing the data from all the samples (the GROUPS option must then be set to indicate the sample to which each unit belongs)
RANKS
= variates Allow the ranks to be saved (relative to the combined data)
Description
KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance on the ranks (relative to the whole data set) of a set of samples. The samples can be stored in different variates and supplied as a list in the DATA pointer. Alternatively, they can all be placed in a single variate, and the GROUPS option set to a factor to indicate the sample to which each unit belongs. Output from the procedure is controlled by the PRINT option: test (the default setting) prints the relevant test statistics, and ranks prints the vector of ranks for each sample.
The test statistic, vector of mean ranks and degrees of freedom can be saved using the STATISTIC, MEANRANKS and DF options, respectively. Parameter RANKS can be set to a variate, or variates, to store the ranks of the data relative to the whole data set.
Options: PRINT, GROUPS, STATISTIC, MEANRANKS, DF. Parameters: DATA, RANKS.
Method
The Kruskal-Wallis One-Way Analysis of Variance is used to test the hypothesis that several (K) samples come from distributions with the same mean. The test statistic, H, is formed by ranking the combined data set, then considering the sum of these ranks within each sample:
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{R_j^2}{n_j} - 3(N+1)$$
where $R_j$ is the sum of ranks for the jth sample,
$n_j$ is the size of the jth sample, and
N is the size of the combined data set.
If ties are present in the data, then an adjustment to the statistic H is required:
$$H_{\text{adjusted}} = \frac{H}{1 - \sum_k (t_k^3 - t_k) / (N^3 - N)}$$
where $t_k$ is the number of observations with rank k. (See for example Siegel 1956, pages 184-193.)
When there are at least five cases in each of the samples, H has approximately a chi-square distribution on K-1 degrees of freedom. When this condition is not satisfied, and there are three samples, KRUSKAL uses a table of calculated values of the distribution of the statistic.
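The statistic and its tie adjustment can be sketched in Python as follows (an illustration only, not the procedure's source). The function name kruskal_wallis is an assumption, tied observations are given the average of the ranks they span, and the small-sample table lookup is not attempted.

```python
def kruskal_wallis(samples):
    """Kruskal-Wallis H, tie-adjusted H and degrees of freedom (sketch).

    samples is a list of lists, one list of observations per sample.
    """
    combined = sorted(v for sample in samples for v in sample)
    n_total = len(combined)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and combined[j] == combined[i]:
            j += 1
        t = j - i                                # observations tied here
        rank_of[combined[i]] = (i + 1 + j) / 2   # average of ranks i+1..j
        tie_term += t ** 3 - t
        i = j
    sum_term = 0.0
    for sample in samples:
        rank_sum = sum(rank_of[v] for v in sample)
        sum_term += rank_sum ** 2 / len(sample)
    h = 12.0 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
    h_adjusted = h / (1 - tie_term / (n_total ** 3 - n_total))
    return h, h_adjusted, len(samples) - 1


h, h_adjusted, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With no ties the adjustment divides by one, so the two values coincide.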
Action with RESTRICT
The variates in DATA can be restricted, and in different ways. KRUSKAL uses only those units of each variate that are not excluded by their respective restrictions.
Reference
Siegel, S. (1956). Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, New York.
LATTICE procedure
Analyses square and rectangular lattice designs
(K. Ryder, E.R. Williams & D. Ratcliff)
Options
TREATMENTS
= factor Factor defining the treatments in the design; must be specified
REPLICATES
= factor Factor defining the replicates within the repeats of the basic design; must be specified
BLOCKS
= factor Factor defining the blocks within replicates; must be specified
REPEATS
= factor Factor to specify the number of complete repeats of the basic design; this may be omitted if the design is repeated only once
NOTCOMBINED
= string Can suppress printing of the combined means (no, yes); default no
Parameters
Y
= variates Variates to be analysed
RESIDUALS
= variates Variates to store the residuals from the analyses
FITTEDVALUES
= variates Variates to store the fitted values from the analyses
MEANSCOMBINED
= variates Variates to store the treatment means combining between- and within-block information
Description
Procedure LATTICE analyses either square or rectangular lattice designs with any number of replicates and repeats of the basic design (Cochran & Cox 1957, Chapter 10). The procedure first produces the standard ANOVA analysis (see Genstat 5 Release 3 Reference Manual, pages 520-522), and then forms estimates of the treatment means combining between- and within-block information, using the method of Williams & Ratcliff (1980).
The model and design are specified by the options of the procedure. The TREATMENTS option specifies the treatment factor. The REPEATS option specifies a factor to identify repeats of the basic design; if the design is repeated only once, this need not be specified. The structure of each repeat of the basic design is specified by options REPLICATES and BLOCKS, defining the replicates and the blocks within each replicate respectively.
The variate to be analysed is specified by the Y parameter. Other parameters allow the fitted values, the residuals and the combined estimates of the treatment means to be saved.
Output is controlled by options PRINT and NOTCOMBINED, for the standard ANOVA output and for the (suppression of the) printing of the combined means, respectively.
Options: PRINT, TREATMENTS, REPLICATES, BLOCKS, REPEATS, NOTCOMBINED. Parameters: Y, RESIDUALS, FITTEDVALUES, MEANSCOMBINED.
Method
The procedure first analyses the design using pseudo-factors, as explained on pages 520-522 of the Genstat 5 Release 3 Reference Manual. It then uses the method of Williams & Ratcliff (1980) to produce estimates of the treatment means combining between- and within-block information.
Action with RESTRICT
If a Y variate is restricted, its analysis will be restricted accordingly.
References
Cochran, W.G. & Cox, G.M. (1957). Experimental Designs (2nd Edition). Wiley, New York.
Williams, E.R. & Ratcliff, D. (1980). A note on the analysis of lattice designs with repeats. Biometrika, 67, 706-708.
LIBEXAMPLE procedure
Accesses examples and source code of library procedures
(R.W. Payne)
No options
Parameters
PROCEDURE
= texts Single-valued texts indicating the procedures about which the information is required
EXAMPLE
= texts Identifiers of text structures to store the example for each procedure
SOURCE
= texts Identifiers of text structures to store the source code of each procedure
Description
LIBEXAMPLE allows you to obtain an example of the use of any procedure in the Genstat 5 Procedure Library, and to access the source code of any procedure, so that you can see how it works or modify it. The names of procedures for which examples or source code are required should be listed, in quotes, using the PROCEDURE parameter. The EXAMPLE parameter can be used to specify the identifier of a text to store each example, and the SOURCE parameter to specify texts to store the source code. The examples can then be run (as macros) using the operator ##. Thus,
LIBEXAMPLE 'PERCENT'; EXAMPLE=%Ex
##%Ex
would put an example of how to use PERCENT into the text %Ex, and then run it.
The examples and source are stored in backing-store files whose names are defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the files can be attached. A file can also be defined to supply information about procedures in a local library, and LIBEXAMPLE will then look there first so that any local examples are taken in preference to those for the main library. The file can be formed using procedure FLIBHELP.
Options: none. Parameters: PROCEDURE, EXAMPLE, SOURCE.
Method
The examples are held in the same backing-store file that holds the other help information about Library procedures; the name of the file is supplied by procedure LIBFILENAME. This file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. Within the help file, the information about each procedure is stored in a subfile whose identifier is the same as the name of the procedure itself. The example is stored in a text with identifier Help['example']. After the required information has been brought back from backing store, the help file is closed. The source code of the procedures is stored in a separate backing-store file, and accessed in a similar way.
LIBFILENAME procedure
Supplies the names of information files for library procedures
(R.W. Payne)
No options
Parameters
FILENAME
= texts Text in which to store the name of the backing-store file containing the help information for the Procedure Library
CONTENTS
= strings Indicates which file is required (help, procedure, adesign, afraction, acyclic, agenerator); default help
PROCEDURE
= texts Name of the procedure for which information is required; default * assumes that it is a procedure in the Genstat rather than the local library
Description
The help information for procedures in the Genstat 5 Procedure Library is stored in a backing-store file. Procedures such as LIBHELP and LIBINFORM open the file on the first free backing-store channel, read and print the required information, and then close the file again. For flexibility, these procedures all call LIBFILENAME to ascertain the name of the file. A help information file can also be formed, using procedure FLIBHELP, for the local procedure library. If the PROCEDURE parameter is set, LIBFILENAME returns the name of the first file which contains information about that procedure, looking first in the file for the local library and then in that for the Genstat library. (By default a null local file is supplied with Genstat, containing no information.) Also, LIBEXAMPLE has a file containing the source code of the library procedures, and the procedures of the Genstat Design System have files containing information for the designs that can be generated. Thus, if the location of any file needs to be changed at a particular site, only LIBFILENAME needs to be modified.
Options: none. Parameters: FILENAME, CONTENTS, PROCEDURE.
Method
The procedure contains a text structure holding the various filenames, and the POSITION function of CALCULATE is used to set FILENAME to the appropriate one.
Action with RESTRICT
Any restriction on the FILENAME text will be cancelled.
LIBHELP procedure
Provides help information about library procedures
(R.W. Payne)
Option
Parameter
PROCEDURE
= texts Single-valued texts indicating the procedures about which the information is required; if this is not set, information is given about LIBHELP itself
Description
LIBHELP provides information about procedures in the Genstat 5 Procedure Library. It has a parameter, called PROCEDURE, which you use to indicate the procedures for which you want information; if PROCEDURE is not specified, information is given about LIBHELP itself. The names of the procedures should be given in quotes: for example
LIBHELP 'LIBINFORM'
will obtain information about the procedure LIBINFORM (you can use LIBINFORM to find out what procedures and modules are in the Library).
LIBHELP has a single option, called PRINT, with which you specify a list of strings to indicate what information you want about each procedure. The possible values, with explanations in brackets, are as follows: index (one-line description), description (full description), options (syntax of the options), parameters (syntax of the parameters), method (description of the method used), restrict (action when arguments are restricted), calls (list of procedures called by this procedure), similar (procedures with similar facilities), authors (list of authors), references (relevant publications), module (the Library module to which the procedure belongs), history (when accepted, modified &c.), errors (details of any reported errors).
The information is stored in a backing-store file whose name is defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the file can be attached. A second file can also be defined to supply information about procedures in a local library, and LIBHELP will then look there first so that any local details are taken in preference to those of the main library. The file can be formed using procedure FLIBHELP.
Option: PRINT. Parameter: PROCEDURE.
Method
The description of LIBHELP is held within LIBHELP itself, and is printed as a caption. Other information is obtained from a backing-store file, whose name is supplied by procedure LIBFILENAME. This file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. Within the Help file, the information about each procedure is stored in a subfile whose identifier is the same as the name of the procedure. The sections of information are stored in separate text structures, each with the suffixed identifier Help['section name']. After printing the requested sections, the file is closed.
LIBINFORM procedure
Prints information about the contents of the Procedure Library
(R.W. Payne)
Options
LIBRARY
= string Defines the library for which information is required (Genstat, local); default Genstat
Parameter
MODULE
= texts Single-valued texts indicating the modules about which the information is required; if this is not set, information is given about the whole library
Description
LIBINFORM provides information about the Genstat 5 Procedure Library or a local library. The MODULE parameter allows you to specify that information is required only for a specified list of modules of the library; if MODULE is not set, the information is given for the whole library. The name of each module should be given in a quoted string: for example
LIBINFORM [PRINT=index] 'AOV','MVA'
If the MODULE is not given in full, LIBINFORM identifies the first module (in alphabetic order) that matches the MODULE setting, up to the number of characters that has been specified.
The PRINT option specifies what information is required about each module. The possible values, with explanations in brackets, are as follows: contents (list of procedures in the module or in the Library; see MODULE), index (index lines for the procedures in the module/Library), errors (list of procedures in the module/Library for which errors have been reported), modules (list of modules in the Library, given only if MODULE is not set).
The information is stored in a backing-store file whose name is defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the file can be attached. A second file may also have been defined to supply information about procedures in a local library, and you can then set option LIBRARY=local to print details about the local instead of the main library.
Options: PRINT, LIBRARY. Parameter: MODULE.
Method
The information is held in subfile _contents of the backing-store file that holds help for the library; the name of the file is supplied by procedure LIBFILENAME. This file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. After printing the requested sections, the help file is closed.
LIBMANUAL procedure
Prints a "Manual" containing information about library procedures
(R.W. Payne)
Options
CHANNEL
= scalar Channel to which to print the manual; default is to use the current output channel
REFERENCE
= string Whether to print just reference information (no, yes); default no
INDENTATION
= scalar Number of spaces to leave before the first column in each line; default 0
LIBRARY
= string Defines the library for which information is required (Genstat, local); default Genstat
Parameter
MODULE
= text Modules to be included in the manual; by default the manual is for the whole library
Description
LIBMANUAL prints a manual containing information about procedures in the Genstat 5 Procedure Library. There is first a header page, with title and list of index lines giving brief details about the procedures. Then Help information is printed about each of the procedures in turn. LIBMANUAL takes account of the current environment (as controlled by the OUTPRINT option of SET) to decide whether to start each procedure on a new page. The information is stored in a backing-store file whose name is defined by Library procedure LIBFILENAME; there must be a free backing-store channel to which the file can be attached. A second file may also have been defined to supply information about procedures in the local library, and you can then set option LIBRARY=local to print a manual for the local instead of the main library.
Unless otherwise specified, the manual will contain every procedure in the library. However, there is a parameter, MODULES, which can be set to a text to indicate that only procedures in a particular set of modules should be included. Details of the modules in the library can be obtained using procedure LIBINFORM, and some procedures may belong to more than one. In particular, there are modules called PLn (where n is a positive integer 1, 2...) to indicate the procedures that were added in release PLn of the Library, PLnHELP to indicate the procedures whose help information was last modified in release PLn of the library, and PLnPROCEDURE to indicate those where the procedure itself was last modified in release PLn. Thus, a manual for the procedures that were new in Release PL6 can be obtained by putting
LIBMANUAL 'PL6'
The CHANNEL option specifies the output channel to which the Manual is to be printed; by default it is printed to the current output channel.
The REFERENCE option allows just a reference summary to be obtained, instead of the full information for each procedure. Finally, the INDENTATION option can be used to indent the information by a specified number of columns, so that the manual can conveniently be put into a folder or binder.
Options: CHANNEL, REFERENCE, INDENTATION, LIBRARY. Parameter: MODULES.
Method
LIBMANUAL first prints a header, followed by a list of index lines describing the contents of the library. It then runs a loop over the procedures in the library, accessing and printing the Help information. This information is stored in the library Help file, the name of which is supplied by procedure LIBFILENAME. The file is opened on the first available backing-store channel; if all the channels are in use, the procedure stops with a diagnostic. Afterwards the channel is closed. Directives are executed at the start and end of the procedure, to switch output channels as requested by the CHANNEL option.
LIBVERSION procedure
Provides the name of the current Genstat 5 Procedure Library
(R.W. Payne)
Option
Parameter
RELEASENAME
= text Text in which to store the name of the currently available release of the Genstat 5 Procedure Library
Description
The Genstat 5 Procedure Library is updated independently of releases of the main Genstat program, and the current release thus may not be immediately apparent. Consequently LIBVERSION is provided to allow users to obtain the name of the currently available release. The name is printed by default, but you can set option PRINT=* to suppress this. The RELEASENAME parameter allows the name, 'Genstat 5 Procedure Library Release ...', to be saved.
Option: PRINT. Parameter: RELEASENAME.
Method
RELEASENAME is formed by an ordinary TEXT declaration.
Action with RESTRICT
Any restriction on the RELEASENAME text will be cancelled.
LINDEPENDENCE procedure
Finds the linear relations associated with matrix singularities
(J.H. Maindonald)
Option
Parameters
DATA
= symmetric matrices Specifies the positive semi-definite matrix for which the information is required
COEFFICIENTS
= matrices Stores the coefficients of the linear dependencies
Description
Procedure LINDEPENDENCE takes a positive semi-definite matrix S (e.g. a matrix formed as X′X), and identifies any columns of S that are a linear combination of earlier columns. It determines the linear relations involved, and stores these in the columns of the matrix specified by the COEFFICIENTS parameter.
In more mathematical terms the output, stored as columns of COEFFICIENTS, is a basis for the null space of a positive semi-definite matrix S. If S = X′X, then this will also be a basis for the null space of X.
The first parameter, DATA, specifies the symmetric matrix S for which the information is required. The columns of the COEFFICIENTS matrix store the linear relations. This matrix will be defined automatically if it has not been declared earlier.
Printed output, showing which columns are dependent and/or the coefficients of the dependencies, can be requested with the settings dependent and coefficient of the PRINT option. By default the dependent columns are printed.
Option: PRINT. Parameters: DATA, COEFFICIENTS.
Method
The matrix function CHOLESKI is used to determine a lower triangular matrix L such that LL′ = S. Zeros on the diagonal of L identify columns of S that are a linear combination of earlier columns. The corresponding columns of L′ form a matrix H. The algorithm then replaces zeros on the diagonal of L′ by ones, to give the matrix T, and solves the equation T B = H. Finally it identifies in each column of H the element that was originally on the diagonal of L, and sets each such element to -1. For further details, see Maindonald (1984) page 105.
Warning - if S is inaccurately formed, e.g. using single precision calculations, there is a risk that it will not be detected as singular, or that it will be detected as not positive semi-definite.
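The following Python sketch illustrates the idea on a small matrix. It is a simplified variant of the steps described above, not the procedure's source: it uses a hand-written Cholesky factorization that leaves a zero column whenever a pivot vanishes, and then back-substitutes directly for each dependent column, placing -1 in that column's own position. The function name linear_dependencies, the tolerance tol and the list-of-lists representation are assumptions made for this example.

```python
import math


def linear_dependencies(s, tol=1e-9):
    """Find columns of a positive semi-definite matrix s that are linear
    combinations of earlier columns (illustrative sketch).

    Returns (dependent, coefficients): the 0-based indices of dependent
    columns, and for each one a vector b with s b = 0 and b[j] = -1.
    """
    n = len(s)
    lower = [[0.0] * n for _ in range(n)]
    dependent = []
    for j in range(n):
        pivot = s[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot <= tol * max(1.0, abs(s[j][j])):
            dependent.append(j)      # column j of lower stays zero
            continue
        lower[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (s[i][j] - partial) / lower[j][j]
    skipped = set(dependent)
    coefficients = []
    for j in dependent:
        b = [0.0] * n
        b[j] = -1.0
        # Back-substitution in lower' b = 0 over the independent columns.
        for i in range(j - 1, -1, -1):
            if i in skipped:
                continue
            acc = sum(lower[m][i] * b[m] for m in range(i + 1, j + 1))
            b[i] = -acc / lower[i][i]
        coefficients.append(b)
    return dependent, coefficients


# S = X'X where the third column of X is the sum of the first two.
dependent, coefficients = linear_dependencies(
    [[2.0, 1.0, 3.0], [1.0, 2.0, 3.0], [3.0, 3.0, 6.0]])
```

In this example the single coefficient vector is approximately (1, 1, -1), expressing that the third column equals the sum of the first two.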
Reference
Maindonald, J.H. (1984). Statistical Computation. Wiley, New York.
LRVSCREE procedure
Prints a scree diagram and/or a difference table of latent roots
(P.G.N. Digby)
Option
Parameters
ROOTS
= LRVs or any numerical structures Latent roots to be displayed; if an LRV is supplied the trace will also be extracted from it
TRACE
= scalars Supplies or saves the total of the latent roots
DIFFERENCE
= pointers Contains 3 variates to save the difference table
Description
Procedure LRVSCREE displays a set of latent roots in a convenient form. The input to the procedure is a set of latent roots (ROOTS), either as an LRV or any structure with numerical values. Optionally a scalar (TRACE) can be specified, either to supply or to save the total of the latent roots.
Printed output is controlled by the PRINT option. The setting scree produces a scree diagram, annotated with the latent roots on their original scale and expressed both as per-thousandths of the total and as cumulated per-thousandths. The setting difference prints these quantities as a table, together with the first three differences among the per-thousandth values; i.e. the first difference column gives the differences from each per-thousandth to the next, the second difference column gives differences among the first-difference values, and so on. Large first-difference values indicate latent roots occurring prior to large declines in the scree diagram. Large second and third differences mark the locations of series of two or more latent roots of similar magnitude, which can be thought of as plateaus on the scree diagram. Large positive, or negative, second differences indicate the first, or last, latent root of a plateau. Large negative third differences occur at the last latent root of one plateau that is followed by another plateau. See the example for illustration.
The DIFFERENCE parameter allows a pointer to be specified to contain three variates storing the columns of the difference table.
Option: PRINT. Parameters: ROOTS, TRACE, DIFFERENCE.
Method
Procedure LRVSCREE uses the HISTOGRAM directive to give the scree diagram.
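The columns of the difference table can be illustrated with a short Python sketch (not the procedure's source). The function name difference_table is an assumption, as are the sign convention (each value minus the next), the alignment of the shorter difference columns, and the absence of any rounding; the procedure's own conventions may differ.

```python
def difference_table(roots, trace=None):
    """Per-thousandths, cumulated per-thousandths and the first three
    differences of a set of latent roots (illustrative sketch)."""
    total = sum(roots) if trace is None else trace
    per_thousand = [1000.0 * r / total for r in roots]
    cumulated = []
    running = 0.0
    for p in per_thousand:
        running += p
        cumulated.append(running)

    def differences(values):
        # Assumed convention: each value minus the one that follows it.
        return [a - b for a, b in zip(values, values[1:])]

    first = differences(per_thousand)
    second = differences(first)
    third = differences(second)
    return per_thousand, cumulated, first, second, third


# Four roots with a plateau formed by the last two.
per_thousand, cumulated, first, second, third = difference_table([5, 3, 1, 1])
```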
Action with RESTRICT
Not relevant: LRVSCREE deals primarily with diagonal matrices or LRVs. If the latent roots are supplied in a variate, any restriction on the variate will be ignored.
LVARMODEL procedure
Analyses a field trial using the Linear Variance Neighbour model
(D.B. Baird)
Options
METHOD
= string Indicates which version of the LV model to use (full, reduced); default full
LAMBDA
= scalar Number between 0 and 1 which defines the value for the variance parameter λ (if METHOD=full and LAMBDA=0, the value is estimated by REML); default 0
VARMETHOD
= string Specifies which estimator of residual variance to use to calculate the sed's of treatment effects (RMS2, GLS); default RMS2
TOLERANCE
= scalar Defines the precision to which the variance parameter λ should be estimated; default 0.01
Parameters
Y
= variates Y-values (usually plot yields) row by row
TREATMENT
= factors Plot treatments for each y-variate
NROWS
= scalars Number of rows in the field layout; default 1
EFFECTS
= tables To save the estimated treatment effects from each analysis
SED
= matrices or symmetric matrices To save the estimated standard errors of differences between treatments
WNOISE
= variates To save the estimated white noise component
TREND
= variates To save the estimated trend component
COMPONENTS
= variates To save the estimated variance components: the tuning parameter λ, and either the variance of the random walk innovations (λ < 0.9) or the white noise variance (λ ≥ 0.9)
Description
LVARMODEL
analyses a field trial, whose plots are in rows of equal length, using the Linear Variance (LV) Neighbour analysis (Williams 1986). The LV model is equivalent to the extended First Difference model of Besag & Kempton (1986). The model allows for local trends within a row, and the analysis attempts to remove these trends by using a form of smoothing. In the full LV model, the degree of smoothing is estimated from the data; alternatively the reduced model, corresponding to the ordinary First Difference (FD) model of Besag & Kempton (1986), applies a full linear detrending to the data.The LV model specifies the data as the sum of three components: the treatment effects, a trend component which is a random walk process, and a residual white noise component. This procedure cannot be used to fit the full Linear Variance plus Incomplete Block model of Williams (1986), which has an additional random component for incomplete blocks; however, blocks may be fitted as a fixed effect by regarding each block as a separate row and setting up the data accordingly.
The variable to be analysed (normally a plot yield) is specified in a variate, using the Y parameter, with the values in row order (row by row). The factor defining the corresponding plot treatments is specified using the TREATMENT parameter, and the number of rows in the trial is specified with the NROWS parameter. The procedure can handle missing values in the y-variate but not in the TREATMENT factor.
The other parameters allow information to be saved from the analysis: EFFECTS for the table of estimated treatment effects; SED for the standard errors of differences between treatment effects (in either a matrix or a symmetric matrix); WNOISE for the estimated white noise (in a variate); TREND for the trend component (in a variate); and COMPONENTS for the two variance parameters. The first variance component is the parameter λ. For λ < 0.9 the second component is the variance of the innovations in the random walk. If λ ≥ 0.9 the second component saved is the variance of the white noise component, as the random walk component disappears in the limit as λ tends to one.
Printed output is controlled by the PRINT option with the following settings: data (y-values and treatments in a tabular form); effects (estimated treatment effects); sed (standard errors of differences of effects); variance (estimates of λ and the white noise variance); and residuals (trend and white noise components).
The METHOD option controls the form of LV model to be fitted. The default setting, full, causes the full LV model to be fitted, with the variance parameters of the model estimated by Residual Maximum Likelihood (REML) (Gleeson & Cullis 1987). The variance parameters used, λ and κ, are those given by Baird & Mead (1991). The parameter λ is known as the tuning parameter, as it controls the degree of smoothing used in eliminating trend effects from the data. It is related to the parameter α of Besag & Kempton (1986), by the relationship
Alternatively, specifying METHOD=reduced fits the reduced form of the LV model, that is the FD model. This is equivalent to putting λ = 0.
The option LAMBDA allows the value of the tuning parameter to be set at a fixed value, which must lie between 0 and 1. By default LAMBDA=0, which for METHOD=full causes the value to be estimated as described above.
The option VARMETHOD controls the estimator used for the variance of the residual white noise component. There are two possibilities: the normal generalized least-squares estimator GLS, and an estimator based on the second differences of the errors, RMS2 (Besag & Kempton 1986). The simulation study of Baird & Mead (1991) showed the standard errors of treatment effects based on RMS2 to be approximately valid under randomization for a wide range of error models. When the estimated value of λ was not close to zero, the standard errors based on GLS were found to be approximately unbiased and more efficient than those based on RMS2 for the LV model. However, the standard errors based on GLS could be seriously biased in some situations for the FD model or when λ was close to zero. Thus the default for VARMETHOD is RMS2.
Finally, the TOLERANCE option specifies the precision to which λ should be estimated.
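The way METHOD and LAMBDA jointly determine the tuning parameter, and the rule for what COMPONENTS stores second, can be summarised in a short Python sketch. This is an illustration of the rules stated in this description, not the procedure's code; the function names are hypothetical, and the handling of METHOD=reduced combined with a non-zero LAMBDA is an assumption, since the description does not cover that case.

```python
def resolve_tuning_parameter(method="full", lam=0.0):
    """Return the fixed value of lambda, or None if it is to be estimated by REML."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("LAMBDA must lie between 0 and 1")
    if method == "reduced":
        # The FD model is equivalent to lambda = 0.
        # Assumption: any non-zero LAMBDA is ignored in this case.
        return 0.0
    if method == "full":
        # LAMBDA=0 (the default) means "estimate"; otherwise lambda is fixed.
        return None if lam == 0.0 else lam
    raise ValueError("METHOD must be 'full' or 'reduced'")


def second_component_meaning(lam):
    """Describe the second value saved by COMPONENTS for a given lambda."""
    if lam < 0.9:
        return "variance of the random walk innovations"
    return "white noise variance"
```

For example, `resolve_tuning_parameter("full", 0.4)` returns the fixed value 0.4, whereas the defaults return `None` to signal that λ would be estimated.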
Options: PRINT, METHOD, LAMBDA, VARMETHOD, TOLERANCE.
Parameters: Y, TREATMENT, NROWS, EFFECTS, SED, WNOISE, TREND, COMPONENTS.
Method
The model is fitted in a similar manner to that outlined in Besag & Kempton (1986), but the variance components have the parameterization used by Baird & Mead (1991) and are fitted by residual maximum likelihood (Gleeson & Cullis 1987) rather than maximum likelihood; also see Baird (1987). The optimization of the likelihood is done by golden section search on the profile likelihood for λ. Residuals are constructed by creating the smoothing matrix S that corresponds to the LV model fitted (Green et al. 1985).
The procedure uses a large amount of data space and computer time when the tuning parameter is estimated by REML. The computing time is proportional to the number of rows multiplied by the square of the number of columns.
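The golden section search mentioned above can be sketched in Python as follows. This is a generic one-dimensional search over an interval, stopped when the bracketing interval is narrower than a tolerance (playing the role of TOLERANCE). The objective used here is a made-up stand-in, not the REML profile likelihood that the procedure evaluates, and the function name is hypothetical.

```python
import math


def golden_section_minimize(f, lo, hi, tol=0.01):
    """Minimize a unimodal function f on [lo, hi] to within tol.

    Maximizing a likelihood corresponds to minimizing its negative.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)  # lower interior point
    d = a + inv_phi * (b - a)  # upper interior point
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Hypothetical objective with its minimum at 0.3, searched over [0, 1].
lam_hat = golden_section_minimize(lambda lam: (lam - 0.3) ** 2, 0.0, 1.0, tol=0.01)
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation of the objective is needed per step; this matters when each evaluation is as expensive as the text above indicates.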
Action with RESTRICT
The procedure ignores any restrictions, for example on Y and TREATMENT.
References
Baird, D.B. (1987). A Genstat 5 procedure for a First Difference analysis. Genstat Newsletter 19, 40-47.
Baird, D.B. and Mead, R. (1991). The empirical efficiency and validity of two neighbour models. Biometrics 47, 1473-1487.
Besag, J.E. and Kempton, R.A. (1986). Statistical analysis of field experiments using neighbouring plots. Biometrics 42, 231-251.
Gleeson, A.C. and Cullis, B.R. (1987). Residual maximum likelihood estimation of a neighbour model for field experiments. Biometrics 43, 277-288.
Green, P.J., Jennison, C. and Seheult, A.H. (1985). Analysis of field experiments by least squares smoothing. J. R. Statist. Soc. B 47, 299-315.
Williams, E.R. (1986). A neighbour model for field experiments. Biometrika 73, 279-287.