DAPLOT procedure
Plots residuals from ANOVA with interactive identification of outliers
(R.J. Reader)
Options
PEN
= scalar or variate Pen or pens to be used to plot the graphs; if a variate is specified, its values define the pen to be used for each point on the graphs; default 1
SELECTED
= variate Returns the list of elements that have been selected
ADDED
= variate X-values to be used in an added variable plot
SAVE
= ANOVA save structure Specifies the analysis from which the residuals and fitted values are to be taken; by default they are taken from the most recent ANOVA
Parameter
METHOD
= strings Type of graph (up to four out of the five possible) to be plotted (histogram, fittedvalues, normal, halfnormal, added); default hist, fitt, norm, half
Description
DAPLOT provides five types of high-resolution plot for residuals from an ANOVA. These are selected using the METHOD parameter, with settings: histogram for a histogram of residuals, fittedvalues for residuals versus fitted values, normal for a Normal plot, halfnormal for a half-Normal plot, and added for an added variable plot (Cook & Weisberg, 1982). Up to four can be examined in any call of the procedure.
If METHOD is set to added, the ADDED option must be set to the variate that is to provide the x-values for the plot. These could, for example, be residuals from an analysis of variance of a possible covariate.
The PEN option controls the pen or pens used for the plotting. Other aspects of the graphics environment, such as windows, are set automatically, and restored at the end of the procedure.
If the graphs are plotted interactively, the SELECTED option allows points to be selected from any graph except a histogram. The graphs are then replotted with the selected points highlighted, and the unit numbers of the corresponding elements of the original ANOVA y-variate are saved in the variate specified by SELECTED.
The residuals and fitted values are accessed automatically from the structure specified by the SAVE option which, by default, will be that for the last y-variate analysed by ANOVA. Missing values are inserted in the fitted values and residuals in any units that were missing in the original y-variate.
Options:
PEN, SELECTED, ADDED, SAVE. Parameter: METHOD.
Method
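Residuals and fitted values are accessed, using AKEEP, from the latest ANOVA, or from that specified by the SAVE option. For a Normal plot, the Normal quantiles are calculated as
$$q_i = \mathrm{NED}\left(\frac{i - 0.375}{n + 0.25}\right)$$
while for a half-Normal plot they are given by
$$q_i = \mathrm{NED}\left(0.5 + 0.5 \times \frac{i - 0.375}{n + 0.25}\right)$$
where $n$ is the number of residuals and $i = 1 \ldots n$ is the rank of each residual in the ordered set.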
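The two plotting-position formulas can be tried out directly. The sketch below is in Python rather than Genstat, and assumes that NED corresponds to the inverse standard Normal CDF (Python's `statistics.NormalDist().inv_cdf`) and that, as is usual for half-Normal plots, the half-Normal quantiles are paired with the sorted absolute residuals; the function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: Genstat's NED is the inverse of the standard Normal CDF.
_ned = NormalDist().inv_cdf


def normal_quantiles(n):
    """q_i = NED((i - 0.375) / (n + 0.25)) for i = 1 ... n."""
    return [_ned((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def half_normal_quantiles(n):
    """q_i = NED(0.5 + 0.5 * (i - 0.375) / (n + 0.25)) for i = 1 ... n."""
    return [_ned(0.5 + 0.5 * (i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def normal_plot_points(residuals):
    """Pair the ordered residuals with Normal quantiles (None = missing, dropped)."""
    r = sorted(x for x in residuals if x is not None)
    return list(zip(normal_quantiles(len(r)), r))


def half_normal_plot_points(residuals):
    """Pair the ordered absolute residuals with half-Normal quantiles (an assumption)."""
    r = sorted(abs(x) for x in residuals if x is not None)
    return list(zip(half_normal_quantiles(len(r)), r))
```

The half-Normal quantiles are all positive because the argument of NED is always above 0.5, which is why they are matched with absolute residuals in this sketch.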
The graphs are plotted initially using the pen(s) specified by the PEN option. The characteristics of the pen(s) can be altered using the PEN directive, for example to enable different levels of a factor to be plotted with different symbols.
The QUESTION directive is used to determine the graph from which points are to be selected. The DREAD directive is then used to identify the points with the cursor, in the usual way. If any points have been selected, all the graphs are redrawn with the attributes of default pen 2 for the selected points and those of default pen 1 for the others.
Action with RESTRICT
If the y-variate in the ANOVA is restricted, only the units not excluded by the restriction are included in the graphs.
Reference
Cook, R.D. & Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman & Hall.
DAYCOUNT procedure
Converts a date to a daycount, or vice versa
(T.J. Cole)
Option
TODATE
= string Whether to convert from daycount to date, instead of from date to daycount (no, yes); default no
Parameters
NDAYS
= variates or scalars Daycount since 29th February 1600; must be set if option TODATE=yes
DAY
= variates or scalars Day of month in range 1...31 (or 30, 29 or 28 depending on month, year and century); must be set if option TODATE=no
MONTH
= variates or scalars Month of year in range 1...12; must be set if option TODATE=no
YEAR
= variates or scalars Year of century in range 00...99; must be set if option TODATE=no
CENTURY
= variates or scalars Century in range 16... ; default 19 (i.e. 1900 - 1999)
WEEKDAY
= variates or scalars Day of the week corresponding to the date, where Monday=1 and Sunday=7
Description
DAYCOUNT takes a date, expressed as day, month and year, and converts it to an exact daycount since 29th February 1600, based on the Gregorian calendar. The year is defined in two parts, the first two digits being the century and the last two the year, so 1988 is century 19 and year 88. Alternatively, if option TODATE is set to yes, a daycount is converted back to a date. The earliest allowable date is 1st March 1600, corresponding to a daycount of 1. Also calculated is the weekday of the date, where Monday is 1 and Sunday is 7. Parameter CENTURY allows the century of the date to be specified; if this is unset, 19 is assumed (i.e. 1900 - 1999). There is no printed output, other than warnings about dates being invalid or earlier than the starting date.
Option:
TODATE. Parameters: NDAYS, DAY, MONTH, YEAR, CENTURY, WEEKDAY.
Method
The method of calculation is based on Zeller's Congruence, which redefines each year as starting on 1st March. An extra day is added when the year is divisible by 4 (but, when the year is 00, only if the century is also divisible by 4); this day goes at the end of February, that is, at the end of the previous March-based year. In addition, the function INTEGER(MONTH*30.6+0.5) gives a running total of days in previous months of the year, where MONTH=0...11 represents March to February. The procedure uses similar sums to check for valid dates. A cycle of 400 years (e.g. Wednesday 1st March 1600 to Tuesday 29th February 2000) consists of an exact number of weeks (146097 days, or 20871 weeks), so the weekday of any date can be found from the daycount mod 7. Wherever possible, the procedure creates the required output structures as scalars, to save space.
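The Method can be checked with a short Python sketch (an illustration, not the procedure's source). It assumes a single full-year argument (100 times CENTURY plus YEAR) instead of DAYCOUNT's separate parameters, writes the INTEGER(MONTH*30.6+0.5) month offset in integer arithmetic, derives the weekday from the fact that daycount 1 (1st March 1600) was a Wednesday, and, for the reverse conversion, borrows the year-within-cycle step from Howard Hinnant's published days-to-civil-date algorithm. The function names are hypothetical and validity checks on the input date are omitted.

```python
def daycount(day, month, year):
    """Days since 29 February 1600, so that 1 March 1600 -> 1 (Gregorian)."""
    # March-based year: March -> 0, ..., February -> 11; January and
    # February belong to the previous March-based year.
    m = month - 3 if month >= 3 else month + 9
    y = year - 1600 - (1 if month < 3 else 0)
    # Whole March-based years: 365 days each, plus a leap day every 4 years,
    # except for century years not divisible by 400.
    days = 365 * y + y // 4 - y // 100 + y // 400
    # Earlier months of this year: INTEGER(m*30.6 + 0.5) == (306*m + 5) // 10.
    days += (306 * m + 5) // 10
    return days + day


def weekday(ndays):
    """Monday = 1 ... Sunday = 7; daycount 1 (1 March 1600) was a Wednesday."""
    return (ndays + 1) % 7 + 1


def to_date(ndays):
    """Inverse of daycount(): returns (day, month, year)."""
    era, doe = divmod(ndays - 1, 146097)  # complete 400-year cycles
    # Completed March-based years within the cycle (Hinnant's correction terms).
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # 0-based day of that year
    # Last month whose running total of previous days does not exceed doy.
    m = max(k for k in range(12) if (306 * k + 5) // 10 <= doy)
    day = doy - (306 * m + 5) // 10 + 1
    month = m + 3 if m < 10 else m - 9
    year = 1600 + 400 * era + yoe + (1 if m >= 10 else 0)
    return day, month, year
```

For example, `daycount(29, 2, 2000)` gives 146097, the length of one 400-year cycle, and `weekday(146097)` gives 2 (Tuesday), matching the cycle quoted above.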
Action with RESTRICT
If any of the parameters is restricted, the procedure will operate only on the specified set of units; other parameters must either be unrestricted or restricted to the same set of units.
DAYLENGTH procedure
Calculates daylengths at a given period of the year
(R.J. Reader & K. Phelps)
Option
LATITUDE
= scalar Latitude at which the daylength is to be calculated, positive for northern hemisphere and negative for southern hemisphere; default 52.205 N (Wellesbourne)
Parameters
DAYNUMBER
= variate Days of year for which daylengths are required
DAYLENGTH
= variate Calculated daylengths in hours
Description
DAYLENGTH calculates a set of daylengths at a given latitude. The numbers of the days during the year for which the daylengths are required should be specified, in a variate, using the DAYNUMBER parameter. The lengths will then be stored in the variate specified by the DAYLENGTH parameter. The latitude is defined by the LATITUDE option; by default LATITUDE=52.205, which is the latitude of Wellesbourne.
Option:
LATITUDE. Parameters: DAYNUMBER, DAYLENGTH.
Method
The formula by which the daylengths are calculated is given in Sellers (1965).
Action with RESTRICT
If either the DAYNUMBER or the DAYLENGTH variate is restricted, the calculations will be done only for the units not excluded by the restriction.
Reference
Sellers, W.D. (1965). Physical Climatology. Chicago, Illinois: University of Chicago Press.
DBARCHART procedure
Produces barcharts for one- or two-way tables
(Ruth Butler)
Options
TITLE
= text Title for chart; no default
WINDOW
= scalar Window for chart (1...8); default 1
KEYWINDOW
= scalar Window for key; no key is produced for one-way tables (1...8); default 2
LABELS
= text Labels for clusters of bars; by default the labels or levels of the first classifying factor of TABLE are used
APPEND
= string Whether to append bars (no, yes); default no
SCREEN
= string Whether to clear screen before displaying chart (keep, clear); default clea
KEYDESCRIPTION
= text Title for key; default is the name of the second factor of TABLE
YSCALE
= expression Defines a transformation of the data; the expression must be a function of X, for example !e(log(X)), and should be monotonically increasing in the range of the data in TABLE; default no transformation
Parameters
TABLE
= tables One- or two-way table of data
ORIGIN
= scalars Origin for y-axis; default 0
PEN
= variates or scalars Pen (or pens) to use; default is !(1...nlevel(last_classifying_factor))
DESCRIPTION
= texts Annotation for key for two-way tables; by default the labels or levels of the last classifying factor of TABLE are used
YMARKS
= variates Position of the tick-marks on the y-axis
Description
DBARCHART produces barcharts for one- or two-way tables. For a two-way table, the first factor defines the groups of bars in the chart, and the second factor the bars within each group. The table is specified by the TABLE parameter, and the origin of the y-axis, which need not be zero, can be set with the ORIGIN parameter. The PEN parameter specifies a pen, or pens, for the bars of the chart. This can be input as a scalar if the same pen is to be used for the whole plot, or as a variate to allow the groups to be drawn in different pens; by default pens 1, 2 ... are used for the successive bars. Labelling for the key can be supplied by the DESCRIPTION parameter; if this is not set, DBARCHART uses the labels of the last classifying factor. Positions of the tick-marks on the y-axis can be specified with the YMARKS parameter.
The options of the procedure mainly control the plotting: the windows that are used for the plot (WINDOW) and for the key (KEYWINDOW), titles for the graph (TITLE) and for the key (KEYDESCRIPTION), whether the groups of bars are appended or placed side-by-side (APPEND), and whether or not to clear the screen before plotting (SCREEN). The YSCALE option can specify a transformation to be used to rescale the data and y-axis; the labels on the y-axis, however, will refer to the original scale of the data.
Options:
TITLE, WINDOW, KEYWINDOW, LABELS, APPEND, SCREEN, KEYDESCRIPTION, YSCALE. Parameters: TABLE, ORIGIN, PEN, DESCRIPTION, YMARKS.
Method
If YSCALE is set, the expression is used to transform TABLE and ORIGIN. Any YMARKS are also transformed to find the positions of the tick-marks. TABLE is then rescaled so that the ORIGIN is zero. DHISTOGRAM is then used to produce the chart without a y-axis, and the y-axis is added to the chart using a DGRAPH statement, with the labelling on the original scale of TABLE. Two-way tables are first split into one-way tables classified by the second factor of TABLE, one sub-table for each level of the first factor; the chart is then produced with a single DHISTOGRAM statement for all the sub-tables. YSCALE is imported into the program by setting X as a dummy, and printing the expression into a text. A new expression is then set up using this text with the EXECUTE directive. X is then set to ORIGIN, TABLE and YMARKS in turn before the expression is calculated.
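The YSCALE steps can be illustrated with a small Python sketch for a one-way table. The function name is hypothetical, and the transformation is passed as a Python function rather than as a Genstat expression in X. The sketch transforms TABLE, ORIGIN and YMARKS, shifts everything so that the transformed origin is zero, and keeps the untransformed YMARKS values as the axis labels; the drawing itself (DHISTOGRAM and DGRAPH) is not reproduced.

```python
import math


def rescale_for_chart(table, origin, ymarks, transform=lambda x: x):
    """Return bar heights and (tick position, tick label) pairs.

    The transform (assumed monotonically increasing) is applied to the
    table values, the origin and the tick marks; heights and tick positions
    are then shifted so that the transformed origin sits at zero, while the
    tick labels keep the original, untransformed values.
    """
    t_origin = transform(origin)
    heights = [transform(v) - t_origin for v in table]
    ticks = [(transform(mark) - t_origin, mark) for mark in ymarks]
    return heights, ticks
```

With `math.log10` as the transformation and an origin of 1, values 10, 100 and 1000 become bars of height 1, 2 and 3, while the tick at height 2 is still labelled 100.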
DDENDROGRAM procedure
Draws dendrograms with control over structure and style
(P.G.N. Digby)
Options
STYLE
= string Style to use for the links of the dendrogram (average, centroid, lower, full); default aver
ORDERING
= strings How to define the order of the units for the dendrogram (given, ziggurat, size, first); default zigg, size, firs
REVERSE
= string Whether to reverse the order of the units in the dendrogram (no, yes); default no
ORIENTATION
= string Specifies the orientation of a dendrogram produced by high-resolution graphics (north, south, east, west); default west
SETSCALE
= string Whether the procedure should set the scale for the axis showing similarity to 1 for similarities, or 100 for percentage similarities, or whether the scale should be determined by the range of similarities (no, yes); default no
METHOD
= string Method used to represent the scale on which the amalgamations have been made: settings other than the default are relevant only for data not generated by HCLUSTER or HDISPLAY (similarities, percentages, distances); default simi
SCREEN
= string Setting to use for the SCREEN option of DGRAPH (clear, keep); default clea
CHANGE
= string If a dendrogram-save structure from a previous DDENDROGRAM is used as the DATA parameter, this option specifies the stage of the process where the first changes occur: see the description of the SAVE parameter (order, dendrogram, display); default orde
GRAPHICS
= string Form of graphics to be used (lineprinter, highresolution); default high
Parameters
DATA
= matrices or pointers Data defining each dendrogram in the form of either a matrix saved using the AMALGAMATIONS parameter of HCLUSTER (methods other than single linkage), or a matrix from the TREE parameter of HDISPLAY, or a SAVE structure from a previous use of DDENDROGRAM
PERMUTATION
= variates Specify or save permutations of the units for drawing each dendrogram, according to ORDERING option
LABELS
= variates or texts Supply labels to use for the units of each dendrogram; these should be in the natural order of the units, not in a permuted order
TITLE
= texts Titles for the dendrograms
WINDOW
= scalars Window to use for each dendrogram (window 1 if unset); if this is set to zero the dendrogram is not drawn, but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters
PENS
= scalars, variates, strings or texts Scalar or string specifying the graphics pen or symbol in which to draw each (high-resolution or line-printer) dendrogram; alternatively use of a variate or text allows the structure of each dendrogram to be highlighted by drawing different links with different graphics pens or symbols
ZIGGURAT
= variates Save the "ziggurat-degree" of the links in each dendrogram
SAVE
= pointers Save the information required to plot a dendrogram, for use as input for the DATA parameter in a subsequent call to DDENDROGRAM
Description
DDENDROGRAM draws dendrograms using line-printer or high-resolution graphics, as indicated by the GRAPHICS option.
Dendrograms can be drawn in many ways, often with apparently quite different results, as illustrated by Digby (1985). The procedure allows the user considerable control over the way that a dendrogram is formed; in particular the order of the units and the style used for drawing the links of the dendrogram can be varied. If high-resolution graphics is to be used, a check should be made to ensure that this facility is present in the available version of Genstat. This can be done by seeing what happens when any of the relevant directives is used (Genstat 5 Release 3 Reference Manual, Chapter 6). The directives DEVICE, FRAME and PEN should then be used to change the default settings, if required; the current settings can be ascertained using the statement
HELP ENVIRONMENT, PICTURES, CURRENT
The input for the procedure is given by the DATA parameter. This should be a matrix containing the amalgamations information from hierarchical cluster analysis (from the AMALGAMATIONS parameter of HCLUSTER) or a matrix containing the minimum spanning tree information (from the TREE parameter of the HDISPLAY directive); alternatively a SAVE structure from a previous DDENDROGRAM can be used as input. However, in the current release of Genstat, the amalgamations matrix from HCLUSTER is unusable if the clustering has been produced by single linkage, so the minimum spanning tree information, which is equivalent, should be used as input instead.
The PERMUTATION parameter can be supplied with a variate, either to specify a permutation of the rows of the dendrogram or to save the permutation generated by DDENDROGRAM, as indicated by the ORDERING option. Setting ORDERING=given takes the ordering defined by the PERMUTATION variate. The other settings of ORDERING define partial orderings of the units, and are used in conjunction with each other to obtain the full ordering: ziggurat (Critchley 1983) is associated with ultrametric distances amongst the units; size specifies that when two groups merge the smaller is always placed before the larger in the order; first specifies that when two groups merge the group containing the lowest-numbered unit is always placed before the other in the order. The orders given by settings ziggurat and size are not completely specified, and recourse may be made to the other of these settings or to first. If ORDERING is not set to given, a list of settings may be specified: the first in the list is used, the second is used to resolve indeterminacies in the order given by the first, and so on. The default is the list of settings ziggurat, size, first.
Option REVERSE allows the ordering thus obtained to be reversed.
The LABELS parameter can be given a variate or a text to supply labels for the rows of the dendrogram. Labelling can be suppressed altogether by using a text containing only spaces.
The STYLE option controls the style to use in forming the links of the dendrogram: its setting indicates where the line representing each new cluster should be placed. Assuming that the dendrogram has the units on the left-hand side, the settings can be described as follows:
average (the default) the new line is midway between the old lines;
centroid the new line is placed at the mid-point of all the units in the group it represents;
lower the new line is a continuation of the lower of the two old lines (comparable with dendrograms from HCLUSTER);
full the new line is a continuation of the upper or lower of the two old lines, so that each vertical line spans all the units in the group it represents.
The ORIENTATION option is relevant to high-resolution graphics, where it controls the orientation of the dendrogram: for example the setting north results in a "hanging dendrogram" with the units across the top. The default setting is west, which gives a dendrogram with the units on the left-hand side; this is also how DDENDROGRAM draws dendrograms on the line-printer.
The SETSCALE option controls whether the procedure should set the scale for the axis showing similarity to 1 for similarities (100 for percentage similarities), instead of determining the scale by the range of similarities or distances.
The METHOD option indicates the scale on which the amalgamations have been made. This option need be set only if the data have been obtained from a source other than HCLUSTER or HDISPLAY.
The TITLE parameter specifies a title for each dendrogram. For high-resolution graphics, the WINDOW parameter defines the graphics window to use for each plot. With line-printer graphics, two "windows" are available: window 1 has a width of 101 characters, window 2 a width of 61 characters. If WINDOW is not set, window 1 is used. If it is set to zero, the dendrogram is not drawn but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters; however, if the SAVE structure is used later as input to DDENDROGRAM, the CHANGE option must not be set to display, as the dendrogram stage will not have been completed. The SCREEN option controls whether to clear the high-resolution graphics screen before plotting (default clear).
For high-resolution graphics, the PENS parameter can be supplied with a scalar indicating the graphics pen with which to draw the dendrogram. Alternatively, a variate can be specified to highlight the structure of the dendrogram by drawing different links with different pens; the links are taken in the same order as the rows of the AMALGAMATIONS matrix from HCLUSTER or in increasing order of the links of the minimum spanning tree. DDENDROGRAM will use pen 1 if the PENS parameter is not set. Any pens used by DDENDROGRAM will be set to METHOD=line, SYMBOLS=0, JOIN=given. If a scalar is supplied or PENS is not set, the pen used will also have LINESTYLE set to 1. If a variate is used, appropriate settings of COLOUR and LINESTYLE should be set (using the PEN directive) before calling DDENDROGRAM. Similarly, with line-printer graphics, the PENS parameter can be set either to a string or to a text, according to whether the links are to be drawn with the same or different symbols; if the parameter is unset, the plus symbol (+) is used for all the links.
The ZIGGURAT parameter can be used to save the "ziggurat-degree" (Critchley 1983) of each link. This could then be used to form the setting of the PENS parameter for a later dendrogram, in order to display particular aspects of the clustering more clearly.
The SAVE parameter can be used to save the various structures that control the drawing of a dendrogram, in order to save computing time when drawing a similar dendrogram. The SAVE structure should then be used as the setting of the DATA parameter, and the CHANGE option used to indicate the stage at which to start changing aspects of the previous dendrogram. The various stages (in order) involve the following options and parameters:
order ORDERING and PERMUTATION;
dendrogram STYLE and METHOD;
display REVERSE, ORIENTATION, SETSCALE, SCREEN, LABELS, TITLE, WINDOW, PENS.
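As an illustration of how the size and first settings of ORDERING combine, here is a Python sketch that builds a unit order from a sequence of merges. The merge format is an assumption and is not the layout of the HCLUSTER AMALGAMATIONS matrix: indices 0 ... n-1 denote single units and n + k the group formed by the k-th merge (the convention SciPy uses for its linkage matrices). The ziggurat ordering is not implemented, so the sketch corresponds to ORDERING=size,first; the function name is hypothetical.

```python
def order_units(n_units, merges, reverse=False):
    """Order units 1 ... n_units from a list of (a, b) merges.

    When two groups merge, the smaller is placed first ("size"); if they
    are the same size, the group containing the lowest-numbered unit is
    placed first ("first").  reverse=True mimics the REVERSE option.
    """
    groups = [[u] for u in range(1, n_units + 1)]  # indices 0 .. n_units-1
    for a, b in merges:
        # Sort key: size of the group, then its lowest unit number.
        first, second = sorted((groups[a], groups[b]),
                               key=lambda grp: (len(grp), min(grp)))
        groups.append(first + second)  # new group gets index n_units + k
    order = groups[-1]
    return order[::-1] if reverse else order
```

For five units merged as (1,2), then unit 3 with (1,2), then (4,5), and finally the two remaining groups, this gives the order 4, 5, 3, 1, 2: unit 3 precedes the larger group (1,2), and the two-unit group (4,5) precedes the three-unit group.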
Options:
STYLE, ORDERING, REVERSE, ORIENTATION, SETSCALE, METHOD, SCREEN, CHANGE, GRAPHICS. Parameters: DATA, PERMUTATION, LABELS, TITLE, WINDOW, PENS, ZIGGURAT, SAVE.
Method
Dendrograms are constructed and drawn in four separate stages: firstly the amalgamations information is used to construct information on group sizes; secondly a permutation of the units is formed, if required, according to several possible ordering schemes; thirdly graphical information on each of the links of the dendrogram is formed; lastly this graphical information is used to display the dendrogram, subject to requirements over orientation, pens, etc. Separate procedures are used for each stage (for details see the source code of DDENDROGRAM, obtainable via LIBEXAMPLE). A preliminary stage is also needed to construct the amalgamations from information on a minimum spanning tree. Communication amongst the subsidiary procedures is obtained using a pointer, which the user may keep using the SAVE parameter. The algorithms used by the first three subsidiary procedures are similar to those described by Digby (1984a, 1984b).
Action with RESTRICT
If any of the options or parameters are restricted, unpredictable results may occur, so none of the options or parameters should be restricted.
References
Critchley, F. (1983). Ziggurats and dendrograms. Report No. 43, Department of Statistics, University of Warwick.
Digby, P.G.N. (1984a). Drawing pretty dendrograms. Genstat Newsletter, 14, 18-26.
Digby, P.G.N. (1984b). Dendrograms and ziggurats. Genstat Newsletter, 14, 14-18.
Digby, P.G.N. (1985). Graphical displays for classification. PACT Journal of the European Study Group on Physical, Chemical and Mathematical Techniques Applied to Archaeology.
DDESIGN procedure
Plots the plan of an experimental design
(K.E. Bicknell & R.W. Payne)
Options
Y
= variate Specifies the y position of the plots in standard coordinates 1 ... number of rows of plots in the experiment (taking 1 as the top row of the window)
X
= variate Specifies the x-coordinate of the plots in standard coordinates 1 ... number of columns of experimental plots
TITLE
= text Title for the plan
WINDOW
= scalar Window number for the plan; default 3
KEYWINDOW
= scalar Window number for the key; default 0
SCREEN
= string Whether to clear the screen before plotting (clear, keep); default clea
KEYDESCRIPTION
= text Overall description for the key; default *
ENDACTION
= string Action to be taken after completing the plot (continue, pause); default * uses the setting from the last DEVICE statement
CHARACTERS
= scalar Sets a limit on the length of each factor label; default * i.e. none
Parameters
FACTOR
= factors Factors to be listed on the plan and to define the layout (the procedure determines the style of line to divide each pair of plots in the design from the grid pen of the first factor in the list with which they have different levels); default * forms the list from first the factors specified by a preceding BLOCKSTRUCTURE statement, and then those specified by a preceding TREATMENTSTRUCTURE statement
PEN
= scalars Pen to be used to write the levels of each factor on the plan (if PEN=0 the levels of that factor are not included); default 1
PENGRID
= scalars Pens to be used to draw the boundaries between the plots in the design (according to the first FACTOR with which they have different levels but ignoring factors with PENGRID=0); default 1
LABEL
= texts Labels to be used for each factor if its own levels or labels are inappropriate
Description
DDESIGN uses high-resolution graphics to produce a plan of an experimental design. The plots in the design are assumed to be arranged on a rectangular grid. The rows of plots are numbered from 1 (at the top of the graph) upwards and are specified by a variate supplied by the Y option. The columns (again numbered from 1 upwards) are specified by a variate supplied by the X option. If either Y or X is not specified, DDESIGN will generate values automatically according to the factors in the design.
The TITLE, WINDOW, KEYWINDOW, SCREEN, KEYDESCRIPTION and ENDACTION options operate as usual in high-resolution graphics, while the CHARACTERS option allows a limit to be set on the length of each factor label when written on the plan.
The factors involved in the experiment can be listed using the FACTOR parameter. If this is omitted, DDESIGN forms the list firstly from the factors in the previous BLOCKSTRUCTURE statement (or a "units" factor if there was none), and then from the factors (if any) in the previous TREATMENTSTRUCTURE statement. These factors are then used to draw the plan and to label the plots in the design.
The PEN parameter allows the levels or labels of the factors to be drawn using different pens (and thus, for example, in different colours). If the pen for any factor is defined as zero, its levels or labels are not included. However, the factor can still be used to determine the lines drawn to delimit the plots. For these lines, DDESIGN considers each pair of adjacent plots and checks through the list of factors to find the first one for which they have different levels. It then uses the grid pen of that factor (defined by the PENGRID parameter) to draw the dividing line. Factors whose grid pen is zero are ignored in this search.
This makes it very easy to achieve the usual style of plan in which, for example, stronger lines are used to indicate the boundaries between different blocks than between the plots within blocks. For example, the parameter settings to draw a randomized block design with a single treatment factor Treat in this way would be
FACTOR=Block,Plots,Treat; PEN=1; PENGRID=1,2,0
if all the factors are to have their levels listed within the plots, or
FACTOR=Block,Plots,Treat; PEN=0,0,1; PENGRID=1,2,0
if only Treat is to be listed. Note that, as each pair of plots will have different levels of either Block or Plots (or both), the PENGRID specified here for Treat is irrelevant.
If a plot has no neighbour in some direction, DDESIGN will check the next-but-one plot; if this too is not used in the design, the grid pen of the first FACTOR is used to mark the boundary.
The final parameter, LABEL, allows alternative labels to be specified for each factor if the existing ones are inappropriate.
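The rule for choosing the dividing line between two adjacent plots is a first-match search over the FACTOR list. A Python sketch, with each plot represented by a hypothetical dictionary mapping factor names to levels; what DDESIGN does when two adjacent plots agree on every factor with a non-zero grid pen is not described above, so the sketch simply returns None in that case.

```python
def grid_pen(levels_a, levels_b, factors, pengrid):
    """Grid pen for the line between two adjacent plots.

    levels_a, levels_b: dicts mapping factor name -> level for each plot.
    factors: the FACTOR list, in order; pengrid: the matching PENGRID values.
    Returns the grid pen of the first factor (with non-zero PENGRID) on which
    the two plots have different levels, or None if there is no such factor.
    """
    for name, pen in zip(factors, pengrid):
        if pen == 0:
            continue  # factors with PENGRID=0 are ignored
        if levels_a[name] != levels_b[name]:
            return pen
    return None
```

With FACTOR=Block,Plots,Treat and PENGRID=1,2,0, two neighbouring plots in the same block get pen 2, while plots in different blocks get pen 1 regardless of their Plots levels.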
Options:
Y, X, TITLE, WINDOW, KEYWINDOW, SCREEN, KEYDESCRIPTION, ENDACTION, CHARACTERS. Parameters: FACTOR, PEN, PENGRID, LABEL.
Method
DDESIGN makes use only of standard Genstat facilities for manipulation and plotting.
Action with RESTRICT
If any of the factors or X or Y is restricted, only the unrestricted plots are displayed.
DECIMALS procedure
Sets the number of decimals for a structure, using its round-off
(A. Keen)
No options
Parameters
STRUCTURE
= identifiers Numerical structure for which the number of decimals is to be set
DECIMALS
= scalars To save the number of decimals
ROUND
= scalars To save the round-off
Description
The default number of decimals that Genstat applies when printing a numerical structure is not always optimal: for example, a scalar with value 0.1 is printed as 0.1000. The trivial solution is to set the DECIMALS parameter of the SCALAR directive, or the DECIMALS parameter of the PRINT directive. However, for routine printing when tidy output is required (as may be the case in procedures), this is not a feasible solution.
The numerical structure for which the number of decimals is to be determined must be specified using the STRUCTURE parameter. The procedure calculates the appropriate number of decimal places, and modifies the declaration of the structure so that this becomes its default number of decimal places for subsequent printing. Parameter DECIMALS allows the number of decimals to be saved, and parameter ROUND saves the round-off (see Method).
Options: none. Parameters:
STRUCTURE, DECIMALS, ROUND.
Method
The round-off value of a number equals $10^k$, with $k$ a negative or positive integer or zero. The round-off value of a number equals $d$ if the number after dividing by $d$ is an integer but after dividing by $10 \times d$ is not. If the round-off value is such that the number of significant digits is greater than 4, the round-off value is increased correspondingly. For example, the round-off value of 880 equals 10, that of 0.2300 equals 0.01, and that of 9999.11 equals 1 (a round-off of 0.01 would give six significant digits, so it is increased to 1). The round-off value of a structure is the minimum of the round-off values of all the elements of the structure, subject to the restriction that the number of significant digits does not exceed 4 for any of the values of the structure.
The number of decimals of a structure is calculated from the round-off value of the structure as $-\log_{10}(\text{round-off value})$, with minimum value zero. So in the above examples the number of decimals equals 0 for 880, 2 for 0.2300 and 0 for 9999.11.
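A Python sketch of the round-off rule, using the standard `decimal` module so that values are handled digit by digit rather than through binary floating point. Reading "subject to the restriction" as "coarsen the minimum round-off until the largest value shows at most four significant digits" is an assumption, as are the function names; the sketch reproduces the three worked examples above.

```python
from decimal import Decimal

MAX_SIG_DIGITS = 4


def _digit_exponents(x):
    """(leading, trailing) powers of ten of a non-zero Decimal's digits."""
    _sign, digits, exponent = x.normalize().as_tuple()
    return exponent + len(digits) - 1, exponent


def round_off_exponent(values):
    """k such that the round-off value of the structure is 10**k."""
    xs = [Decimal(str(v)) for v in values]
    exps = [_digit_exponents(x) for x in xs if x != 0]
    if not exps:
        return 0
    k = min(trailing for _, trailing in exps)  # finest digit used by any value
    top = max(leading for leading, _ in exps)  # leading digit of largest value
    # Coarsen so that no value shows more than MAX_SIG_DIGITS significant digits.
    return max(k, top - (MAX_SIG_DIGITS - 1))


def round_off(values):
    """Round-off value of the structure, as a Decimal."""
    return Decimal(1).scaleb(round_off_exponent(values))


def number_of_decimals(values):
    """-log10(round-off value), with minimum value zero."""
    return max(0, -round_off_exponent(values))
```

Passing the values as strings (for example `"0.2300"`) keeps the digits exactly as written; floats are converted through `str`, so 0.1 is treated as 0.1 and gets one decimal place.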
Action with RESTRICT
Restrictions are not allowed.
DESCRIBE procedure
Saves and/or prints summary statistics for variates
(R.C. Butler)
Options
SELECTION
= strings Selects the statistics to be produced (nval, nobs, nmv, mean, median, min, max, range, q1, q3, var, sd, sem, %cv, sum, ss, uss, skew, seskew, kurtosis, sekurtosis); default mean, min, max, nobs, nmv, medi, q1, q3
Parameters
DATA
= variates Data to summarize
SUMMARIES
= variates To save summaries for each DATA variate
Description
DESCRIBE calculates up to 21 different summary statistics for values stored in a variate. The statistics may be saved, or printed, or both. The statistics to be calculated are indicated by the SELECTION option; the available settings are:
nval        number of values
nobs        number of non-missing values
nmv         number of missing values
mean        arithmetic mean
median      median
min         minimum
max         maximum
range       range (max-min)
q1          lower quartile
q3          upper quartile
var         variance
sd          standard deviation
sem         standard error of mean
%cv         coefficient of variation
sum         total of values
ss          corrected sum of squares
uss         uncorrected sum of squares
skew        skewness (see Method)
seskew      standard error of skewness
kurtosis    kurtosis (see Method)
sekurtosis  standard error of kurtosis
By default the mean, min, max, nobs, nmv, median and both quartiles are calculated.
Printing is controlled by the PRINT option. The statistics are printed by default, so to suppress printing you need to put PRINT=*.
The SUMMARIES parameter allows the statistics to be saved in a variate, which need not be declared in advance. The units of the variate are labelled by the corresponding strings from the settings (in capital letters) of the SELECTION option, to simplify the subsequent access of any individual statistic. For example, the minimum value can be copied from a SUMMARIES variate v into a scalar m by
CALCULATE m = v$['MIN']
Options:
PRINT, SELECTION. Parameters: DATA, SUMMARIES.
Method
The statistics are calculated in a variate which is then restricted to print only those that were required, and to obtain the unit numbers of those to be copied into the SUMMARIES variate.
Skewness is calculated as
$$\frac{M_3 - 3 M_1 M_2 + 2 M_1^3}{(M_2 - M_1^2)^{3/2}}$$
where $M_i = \left(\sum x^i\right) / N$.
The standard error of skewness is calculated as
$$\sqrt{\frac{6N(N-1)}{(N-2)(N+1)(N+3)}}$$
Kurtosis is calculated as
$$\frac{M_4 - 4 M_1 M_3 + 6 M_1^2 M_2 - 3 M_1^4}{(M_2 - M_1^2)^2} - 3$$
The standard error of kurtosis is calculated as
$$\sqrt{\frac{24N(N-1)^2}{(N-2)(N-3)(N+5)(N+3)}}$$
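The formulas above translate directly into code. This Python sketch (the function name and dictionary keys are hypothetical, chosen to mirror the capitalised SUMMARIES labels) drops missing values, forms the raw moments $M_1 \ldots M_4$ and applies the expressions as written. Returning NaN for a standard error whose formula would divide by zero is an assumption about edge cases that the Method does not cover.

```python
import math


def shape_statistics(data):
    """Skewness, kurtosis and their standard errors from raw moments M_i.

    Missing values (None) are ignored.  Needs at least two distinct values;
    the standard errors need N >= 3 (skewness) and N >= 4 (kurtosis),
    otherwise NaN is returned for them.
    """
    x = [v for v in data if v is not None]
    n = len(x)
    # Raw moments M_i = (sum of x**i) / N, for i = 1 ... 4.
    M1, M2, M3, M4 = (sum(v ** i for v in x) / n for i in range(1, 5))
    m2 = M2 - M1 * M1  # second central moment (divisor N)
    skew = (M3 - 3 * M1 * M2 + 2 * M1 ** 3) / m2 ** 1.5
    kurt = (M4 - 4 * M1 * M3 + 6 * M1 ** 2 * M2 - 3 * M1 ** 4) / m2 ** 2 - 3
    se_skew = (math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
               if n > 2 else math.nan)
    se_kurt = (math.sqrt(24 * n * (n - 1) ** 2
                         / ((n - 2) * (n - 3) * (n + 5) * (n + 3)))
               if n > 3 else math.nan)
    return {"SKEW": skew, "SESKEW": se_skew,
            "KURTOSIS": kurt, "SEKURTOSIS": se_kurt}
```

For the symmetric data 1, 2, 3, 4, 5 the skewness is 0, the kurtosis is 6.8/4 - 3 = -1.3, and the standard error of kurtosis is exactly 2.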
Action with RESTRICT
The statistics are calculated for the restricted set of units from each DATA variate. Any existing restrictions are not affected by the procedure.
DESIGN procedure
Helps to select and generate effective experimental designs
(M.F. Franklin, R.W. Payne & A.E. Ainsley)
No options
No parameters
Description
DESIGN is a procedure which can be used interactively to form experimental designs of various types. The process involves answering questions, posed by Genstat, first to select the particular type of design, then to give details such as names of factors, numbers of treatments, and so on. A range of subsidiary procedures may be called, depending on the type of design selected. If you wish to avoid some of the question-and-answer process, the subsidiary procedures can also be called directly. They all have options and parameters which provide an alternative way of supplying the information otherwise obtained by the various questions and, provided you supply all the required information, they can also be used in batch.
There are 13 types of design.
Orthogonal hierarchical designs - designs such as randomized blocks, split-plots, split-split-plots, &c.
Factorial designs (with blocking) - these have several treatment factors and a single blocking factor (giving strata for blocks and plots within blocks). The blocks are too small to contain a complete replicate of the treatment combinations and so various interactions are confounded with blocks.
Fractional factorial designs (with blocking) - again there are several treatment factors but the design does not contain every treatment combination and so some interactions are aliased; there can also be a blocking factor and some interactions will then be confounded with blocks.
Lattice designs - designs for a single treatment factor whose number of levels is the square of some integer k. The design has replicates, each containing k blocks of k plots, and different treatment contrasts can be confounded with blocks in each replicate.
Lattice squares - these are similar to lattices except that the blocking structure within the replicates has rows crossed with columns; again different treatment contrasts can be confounded with the rows and columns in each replicate.
Latin squares - designs are available for 3 to 14 treatments; several different orthogonal squares are available for most of these so, for example, Graeco Latin squares can be formed by calling
Alpha designs - these again have a single treatment factor but there is no constraint on the number of levels; the blocking structure has replicates and blocks within replicates. Further details are given in the description of the procedure AFALPHA or by Patterson & Williams (1976).
Cyclic designs - these are designs with a single blocking factor which defines blocks that are too small to contain every treatment. Usually there is a single treatment factor, but you can also generate the cyclic superimposed designs of Hall & Williams (1973) in which there are two treatment factors and the treatment structure fits only the main effects. An alternative refinement (Davis & Hall 1969) has a crossed blocking structure generally taken to represent subjects*time. Details of the cyclic process by which the treatment levels are generated can be found in the description of the procedure AFCYCLIC.
Balanced-incomplete-block designs - designs where the experimental units are grouped into blocks such that every pair of treatments occurs in an equal number of blocks. All comparisons between treatments are thus made with equal accuracy, so the design is balanced and, in particular, can be analysed by ANOVA. Further details are given in the description of procedure AGBIB.
Neighbour-balanced designs - designs that allow adjustments to be made for the effect that a treatment may have on adjacent plots. Further details are given in the description of procedure AGNEIGHBOUR.
Central composite designs - used to study multi-dimensional response surfaces; see procedure AGCENTRALCOMPOSITE.
Box-Behnken designs - used to study multi-dimensional response surfaces; see procedure AGBOXBEHNKEN.
Plackett-Burman (main effect) designs - for estimating main effects of factors with two levels, using a minimum number of experimental units (Plackett & Burman 1946). Further details are given in the description of procedure AGMAINEFFECT.
You will be asked to provide a seed to be used to randomize the design and then given the opportunity to print a plan. If the design can be analysed by ANOVA, the procedures will define appropriate block and treatment formulae and then ask if you want to see the skeleton analysis-of-variance table (containing just source of variation, degrees of freedom and efficiency factors). Whether or not you choose to print any of this information, at the end of the whole process all the block and treatment factors necessary to define the design will be available, and they will have the identifiers that you have supplied in response to the various questions asked by the procedures.
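The cyclic and balanced-incomplete-block types above rest on two simple ideas: developing an initial block cyclically (adding 1, 2, ... modulo the number of treatments) and checking that every pair of treatments occurs together equally often. The Python sketch below shows the textbook versions of both; the function names are hypothetical, it is not the code used by AFCYCLIC or AGBIB, and it omits the randomization step.

```python
from collections import Counter
from itertools import combinations


def develop_cyclic(initial_block, v):
    """Develop an initial block mod v: block k adds k to every treatment number."""
    return [sorted((t + k) % v for t in initial_block) for k in range(v)]


def pair_concurrences(blocks):
    """Count how many blocks contain each unordered pair of treatments."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def is_balanced(blocks, v):
    """True if every pair of the v treatments occurs together equally (and > 0) often."""
    counts = pair_concurrences(blocks)
    values = {counts[pair] for pair in combinations(range(v), 2)}
    return len(values) == 1 and 0 not in values
```

Developing the block {0, 1, 3} modulo 7 gives seven blocks of three plots in which every pair of the seven treatments occurs exactly once, a balanced incomplete-block design; developing {0, 1, 2} instead gives a cyclic design that is not balanced.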
Options: none. Parameters: none.
Method
The QUESTION directive is used to obtain the details of the required design. The design is then generated using GENERATE and the other standard Genstat directives for calculation and manipulation. Most of the information needed to specify the designs is stored in backing-store files on the computer, and much of this was adapted from the standard designs of the program DSIGNX (Franklin & Mann 1986).
References
Davis, A.W. & Hall, W.B. (1969). Cyclic change-over designs. Biometrika, 56, 283-293.
Franklin, M.F. & Mann, A.D. (1986). DSIGNX: a program for the construction of randomized experimental plans. Scottish Agricultural Statistics Service, Edinburgh (revised edition).
Hall, W.B. & Williams, E.R. (1973). Cyclic superimposed designs. Biometrika, 60, 47-53.
Patterson, H.D. & Williams, E.R. (1976). A new class of resolvable incomplete block designs. Biometrika, 63, 83-92.
Plackett, R.L. & Burman, J.P. (1946). The design of optimum factorial experiments. Biometrika, 33, 305-325 & 328-332.
DIALLEL procedure
Analyses full and half diallel tables with parents
(J.F. Potter)
Options
LABELS
= text Labels for rowcols, one text value for each; column j has the same label as row j, so each value of LABELS is the label for a pair of parents, applying to a rowcol; default 1...N, where N is the dimension of each diallel table
METHOD
= string Whether to perform full or half diallel analysis (half, full); default full
Parameter
DATA
= matrices Each matrix contains the data for one block in the analysis; half diallel tables are presented as square matrices with the upper triangles and leading diagonals containing the values of interest; the matrices must all be of the same size
Description
DIALLEL performs analysis of variance of full diallel tables (Hayman 1954) and half diallels (Jones 1965). Work on variance and covariance relationships is also performed (Jinks 1954). The data are specified by the DATA parameter, in a square matrix for every block in the analysis. Half diallel tables are presented as square matrices with the upper triangle and leading diagonal containing the values of interest. The PRINT option controls printed output:
data        data values,
vrwr        variances and covariances of rowcols,
regression  regression of the variances on the covariances,
aov         analysis of variance table,
means       means.
The LABELS option can give a text to be used for labelling rowcols (called arrays in the literature). The METHOD option specifies whether analysis of full or half diallels is required.
Options:
PRINT, LABELS, METHOD. Parameter: DATA.
Method
DIALLEL performs analysis of variance of full diallel tables, according to the method of Hayman (1954), and half diallels, according to the method of Jones (1965). A diallel table is a representation of the results of crossing a set of male and female homozygous parents in all possible combinations, including male:female reciprocation in full diallels. DIALLEL expects parent values (selfs) to be present as the leading diagonal of the table (whether a full or half matrix).
The analysis of variance estimates the following genetic components of variation.
a: variation between mean effects of each parental line. Genetically this provides a test of additive variation, but also detects dominance if asymmetry is present, i.e. if alleles at any one locus are not equally frequent (Hayman 1954).
b: variation caused by dominance at some of the loci. This term splits into:
b1: if significant this shows that dominance is largely uni-directional;
b2: estimates the asymmetry mentioned in a;
b3: signifies that some dominance is peculiar to individual crosses.
If the symmetry condition is met, b1 and b3 together give a test of dominance equivalent to b.
c: variation between average maternal effects of each parental line.
d: variation in the reciprocal differences not attributable to c.
t: total variation.
Components c and d are reciprocal effects not available in half diallels. In the absence of replication, the d term should be used as the error term for testing components a to c in the full diallel.
DIALLEL can also analyse over any number of blocks, in which case block effects are also estimated, and block interactions with the above components can then be used as estimates of error to test the significance of the components.
Variances of rowcols (Vr) are compared with the covariances of the rowcols (Wr) with the corresponding concurrent parents, using the method of Jinks (1954). This entails the regression of Wr on Vr, which gives measures of adequacy of the model, average dominance, and the distribution of dominant and recessive genes. The analysis of diallel tables is more fully described by Mather & Jinks (1971).
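The Vr/Wr calculation described above can be sketched in a few lines of Python. This is an illustration of the general idea, not DIALLEL's code: it assumes a symmetric square table (a half diallel mirrored into a full square, or a full diallel with reciprocals averaged), takes the parents from the leading diagonal, and uses the n - 1 divisor; the divisor changes the intercept of the Wr-on-Vr regression but not its slope. Under the Hayman-Jinks additive-dominance model the slope is expected to be close to 1 when the model is adequate.

```python
from typing import List, Sequence, Tuple


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with divisor n - 1 (an assumption of this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def vr_wr(table: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Vr: variance of each rowcol; Wr: its covariance with the parental values."""
    parents = [table[j][j] for j in range(len(table))]  # selfs on the diagonal
    vr = [_cov(row, row) for row in table]
    wr = [_cov(row, parents) for row in table]
    return vr, wr


def regress(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of y on x (here Wr on Vr)."""
    slope = _cov(x, y) / _cov(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept
```

A typical use is `vr, wr = vr_wr(table)` followed by `regress(vr, wr)`.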
Many other diallel methods exist, DIALLEL representing quite a complex one, but one which makes fairly limiting assumptions. For example, only a reference population in Hardy-Weinberg equilibrium with respect to individual loci and linkage equilibrium with respect to all pairs of loci can legitimately be used to estimate the genetic variance components; this means a large population reproducing by panmixia without selection. This and other difficulties, such as the need to distinguish between ancestral and descendant reference populations, are discussed by Wright (1985).
Action with RESTRICT
Restrictions are ignored for the text LABELS and are not relevant for DATA, which is of type matrix.
References
Hayman, B.I. (1954). The Analysis of Variance of Diallel Tables. Biometrics, 10, 235-244.
Jones, R.M. (1965). Analysis of Variance of the Half Diallel Table. Heredity, 20, 117-121.
Jinks, J.L. (1954). The Analysis of Continuous Variation in a Diallel Cross of Nicotiana rustica Varieties. Genetics, 39, 767-788.
Mather, K. & Jinks, J.L. (1971). Biometrical Genetics, 249-284. Chapman & Hall Ltd.
Wright, A.J. (1985). Diallel Designs, Analyses, and Reference Populations. Heredity, 54, 307-311.
DILUTION procedure
Calculates Most Probable Numbers from dilution series data
(M.S. Ridout & S.J. Welham)
Options
%LIMITS
= scalar Percentage points for confidence limits; default 95
RMETHOD
= string Which type of residuals to form (deviance, Pearson); default deviance
MAXCYCLE
= scalar Maximum number of iterations allowed for the Newton-Raphson algorithm to converge; default 10
TOLERANCE
= scalar Defines the convergence criterion; default 0.0005
Parameters
POSITIVE
= variates Number of positive subsamples at each dilution
NSAMPLE
= variates Total number of subsamples tested at each dilution
VOLUME
= variates Volume of original sample present in each dilution
FITTED
= variates To store the fitted values
RESIDUAL
= variates To store the residuals, as specified by option RMETHOD
MPN
= scalars To store the maximum likelihood estimate of Most Probable Number
UPPER
= scalars To store the upper confidence limit for MPN
LOWER
= scalars To store the lower confidence limit for MPN
DEVIANCE
= scalars To store the residual deviance
PEARSONCHI
= scalars To store Pearson's chi-squared statistic
DF
= scalars To store the degrees of freedom for goodness-of-fit tests (zero if no goodness-of-fit test is available)
Description
A dilution series experiment seeks to estimate the number of organisms in a sample. This is done by preparing successive dilutions of the original sample (usually with a constant dilution factor at each stage), and then testing for the presence/absence of organisms in several subsamples at each dilution. Under certain assumptions, discussed, for example, by Cochran (1950), it is then possible to estimate, by maximum likelihood, the number of organisms in the original sample. In the context of dilution series data, the maximum likelihood estimator is usually known as the Most Probable Number (MPN) of organisms.
DILUTION calculates the MPN estimator, together with likelihood-based confidence limits for the number of organisms.
The number of positive subsamples at each dilution (i.e. the number of subsamples which show the presence of organisms) must be specified in a variate using the parameter POSITIVE. The total number of subsamples used at each dilution, and the volume of the original sample used at each dilution, must be supplied in variates using parameters NSAMPLE and VOLUME.
Output is controlled by the PRINT option. The estimate setting produces the MPN estimate and associated confidence limits, together with the deviance and Pearson's chi-squared statistic. The fitted setting gives observed and fitted values with residuals. All this information is produced by default. The coverage of the confidence limits can be set by option %LIMITS, the default being 95% limits, and the type of residuals produced (deviance or Pearson) is controlled by the RMETHOD option.
Both the MPN estimator and the confidence limits are calculated iteratively. Option MAXCYCLE sets the maximum number of iterations allowed in each case, the default being 10. Option TOLERANCE specifies the convergence criterion for the MPN estimator; the estimation process is considered to have converged when the absolute value of the derivative of the log-likelihood is less than TOLERANCE. The default value of TOLERANCE is 0.0005. The iterative calculation of the confidence limits is considered to have converged when the log-likelihood takes the correct value to 2 decimal places.
All the information generated can be saved using parameters of the procedure: MPN saves the estimate; UPPER and LOWER save the upper and lower confidence limits; DEVIANCE, PEARSONCHI and DF save the goodness-of-fit statistics and the degrees of freedom; and the fitted values and residuals are saved by FITTED and RESIDUAL.
Options:
PRINT, %LIMITS, RMETHOD, MAXCYCLE, TOLERANCE. Parameters:
POSITIVE, NSAMPLE, VOLUME, FITTED, RESIDUAL, MPN, UPPER, LOWER, DEVIANCE, PEARSONCHI, DF.
Method
The Newton-Raphson algorithm is used to find both the MPN and the appropriate confidence limits.
Action with RESTRICT
If any of POSITIVE, NSAMPLE or VOLUME is restricted (these restrictions must be compatible), only the restricted set of units will be used.
Reference
Cochran, W.G. (1950). Estimation of bacterial densities by means of the 'most probable number'. Biometrics, 6, 105-116.
DISCRIMINATE procedure
Performs discriminant analysis
(P.G.N. Digby)
Options
NROOTS
= scalar The number of dimensions to be used for printed and saved output, and used in calculating the distances and the allocation of units; default is to use the full dimensionality
REALLOCATE
= string Whether units from the training set are to be reallocated to groups (no, yes); default no
Parameters
DATA
= pointers Each pointer contains a set of variates to be analysed
GROUPS
= factors Define groupings for the units in each training set, or missing values for the units to be allocated
NEWGROUPS
= factors Save allocations (and reallocations)
MEANS
= matrices Save scores for group means
SCORES
= matrices Save scores for units
DISTANCES
= matrices Save unit to group-mean squared distances
LRV
= LRVs Save the LRVs from the canonical variate analyses
ADJUSTMENTS
= matrices Save adjustments to the canonical variate analyses
Description
DISCRIMINATE performs discriminant analysis (see, for example, Mardia, Kent & Bibby 1979).
The input for the procedure is given by a pointer and a factor, specified by the DATA and GROUPS parameters, respectively. The pointer contains a set of variates defining the attributes of the units. Any unit with a missing value in any of the variates is excluded from the analysis. Units can also be excluded from the analysis by restricting the factor or variates; any such restrictions must be consistent (the rules here are exactly as used by the FSSPM directive). The factor specifies the pre-defined groupings of the units from which the allocation is derived (the 'training set'); the units to be allocated by the analysis have missing factor values. The levels of the factor must all exceed -9999, or a misallocation of the units may result.
Printed output is controlled by the option PRINT with settings: lrv to print the canonical variate loadings, the latent roots and the trace; adjustments to print the adjustments required to the canonical variate scores; means to print canonical variate scores for the group means; scores to print canonical variate scores for the units; distances to print Mahalanobis squared distances between the units and the group means; newgroups to print the initial grouping and the allocation of units to groups.
The NROOTS option may be used to specify how many dimensions are to be printed and retained for the latent roots and vectors and for the scores of the means and units. The distances of the units from the group means, and thus the allocation of units, are also formed from the scores in the number of dimensions specified by NROOTS. By default results will be for the full dimensionality, i.e. the smaller of the number of variates and one less than the number of groups.
The REALLOCATE option may be used to specify whether the units in the training set are to be reallocated to groups by the procedure. If the default setting no is used, their group values, either printed or saved, will be missing.
Results from the analysis can be saved using the parameters NEWGROUPS, MEANS, SCORES, DISTANCES, LRV and ADJUSTMENTS. The structures specified for these parameters need not be declared in advance. The results correspond to p dimensions, where p is the smaller of the number of variates and the number of groups minus one.
Options:
PRINT, NROOTS, REALLOCATE. Parameters:
DATA, GROUPS, NEWGROUPS, MEANS, SCORES, DISTANCES, LRV, ADJUSTMENTS.
Method
A canonical variate analysis (CVA) is used to obtain the scores for the group means and the LRV containing the loadings (L), roots and trace; the analysis excludes units omitted by RESTRICT, or that have missing values in the data variates or the GROUPS factor. Scores are then calculated for all the units (i.e. ignoring any restrictions or missing values), using the formula
$$\mathbf{X}\mathbf{L} + \mathbf{J}\mathbf{A}$$
where X is a matrix containing the full set of units-by-variables data, J is a column vector of ones, and A is a row vector of adjustments required to place the scores for the units onto the same scale as those for the group means.
Mahalanobis squared distances between the units and the group means are calculated from the canonical variate scores. Each unit is then allocated to the group for which it has the smallest Mahalanobis squared distance to the group mean. In forming the allocations it is assumed that none of the levels of the factor GROUPS is less than or equal to -9999; otherwise a misallocation of the units may result.
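The scoring formula and the allocation rule above can be sketched in plain Python. The sketch assumes the canonical variate scores are scaled so that the within-group variance is 1 in every dimension (the usual CVA scaling), in which case the squared Euclidean distance between scores equals the Mahalanobis squared distance. Numbering groups 1, 2, ... and breaking ties in favour of the lower group are assumptions; `unit_scores` and `allocate` are hypothetical names.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def unit_scores(x: Matrix, loadings: Matrix, adjust: Sequence[float]) -> Matrix:
    """Scores = X L + J A: each unit's row of X times L, plus the row vector A."""
    ndim = len(adjust)
    return [[sum(row[k] * loadings[k][j] for k in range(len(row))) + adjust[j]
             for j in range(ndim)] for row in x]


def allocate(scores: Matrix, group_means: Matrix, nroots: Optional[int] = None):
    """Allocate each unit to the group mean with the smallest squared distance.

    Only the first nroots dimensions are used, as with the NROOTS option.
    """
    dims = range(nroots if nroots is not None else len(group_means[0]))
    allocation, distances = [], []
    for s in scores:
        d2 = [sum((s[j] - g[j]) ** 2 for j in dims) for g in group_means]
        distances.append(d2)
        allocation.append(min(range(len(d2)), key=d2.__getitem__) + 1)
    return allocation, distances
```

Reducing the number of dimensions can change the allocation: a unit with scores (0, 9) is nearer to a group mean at (0, 0) on the first dimension alone, but nearer to one at (3, 10) when both dimensions are used.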
Action with RESTRICT
The input variates and factor may be restricted. The restrictions must be identical, otherwise a diagnostic will be generated by an FSSPM statement within the procedure. The canonical variate analysis is based only on the units not excluded by the restriction. Scores are calculated for all the units; however, these are based only on the non-excluded units: the adjustments for the canonical variate scores are calculated from the non-excluded units, and the loadings used to calculate the scores are those from the canonical variate analysis.
Reference
Mardia, K.V., Kent, J.T. & Bibby, J.M. (1979). Multivariate analysis. Academic Press, London.
DMST procedure
Gives a high resolution plot of an ordination with minimum spanning tree
(A.W.A. Murray)
Options
DIMENSIONS
= scalars Two numbers specifying the dimensions to display, allowed values 1...5
TITLE
= text Title for the graph
WINDOW
= scalar Window for the graph; default 1
KEYWINDOW
= scalar Window for the key; default 2
SCREEN
= string Controls screen (clear, keep); default clea
Parameters
COORDINATES
= matrices or datamatrices Coordinates from ordination
TREE
= matrices Minimum spanning tree
SIMILARITY
= symmetric matrices Association matrix used to derive ordination
SYMBOLS
= factors or texts Symbols to label the coordinates
PENCOORDINATES
= scalars Pen to use for the coordinates
PENTREE
= scalars Pen to use for the minimum spanning tree
Description
DMST plots a minimum spanning tree using coordinates saved, for example, from a PCO. The COORDINATES parameter specifies the coordinates for the units in the plot, using either a matrix or a pointer to a set of variates (that is, a "datamatrix"). The minimum spanning tree can be supplied using the TREE parameter, or it can be calculated from the original association matrix specified using the SIMILARITY parameter. If TREE supplies a matrix with no values, these will be set to the tree calculated from the SIMILARITY matrix. If the COORDINATES structure was originally declared with row labels, the procedure will automatically use these to label the plots. Alternative symbols can be defined using the SYMBOLS parameter. You can also specify the pens to be used to plot the coordinates and tree, using parameters PENCOORDINATES and PENTREE respectively. Defining these pens outside the procedure thus allows the colour, size, font and linestyle to be controlled. By default the coordinates are plotted with colour 1 and the tree with colour 2, symbols are 0.8 of normal size, and the tree is plotted with a dotted line.
Options TITLE, WINDOW, KEYWINDOW and SCREEN function as usual for high-resolution graphics. If WINDOW is unset, a default layout with appropriately labelled axes is produced in window 1. Axes will be scaled automatically unless limits have already been set outside the procedure.
Options:
DIMENSIONS, TITLE, WINDOW, KEYWINDOW, SCREEN. Parameters:
COORDINATES, TREE, SIMILARITY, SYMBOLS, PENCOORDINATES, PENTREE.
Method
A two-dimensional representation of the results of a multivariate analysis, such as a PCO, is plotted on the current high-resolution graphics device. If a minimum spanning tree is not supplied, one is calculated (by HDISPLAY) from an input similarity matrix. The tree is superimposed on the plot. The procedure uses GETATTRIBUTE to access the row labels (if any) of the input structures. The input structures are converted to variates if necessary and DGRAPH is used to plot the desired data.
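As an illustration of what the tree represents, the Python sketch below grows a spanning tree from a similarity matrix with Prim's algorithm, always linking the unit outside the tree that is most similar to some unit already in it. This is the standard textbook algorithm, not the HDISPLAY implementation. Because a minimum spanning tree depends only on the ordering of the link weights, the maximum-similarity tree is the same as the minimum spanning tree on any distance that decreases as similarity increases (for example sqrt(2(1 - s))). The function name is hypothetical, and units are numbered from 0.

```python
from typing import List, Sequence, Tuple


def max_similarity_spanning_tree(sim: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm on similarities: returns the n - 1 links of the tree."""
    n = len(sim)
    in_tree = [False] * n
    best = [float("-inf")] * n   # best similarity linking each unit to the tree
    link = [-1] * n              # tree unit providing that best similarity
    best[0] = 0.0                # start the tree from unit 0
    edges = []
    for _ in range(n):
        # Add the outside unit with the highest similarity to the current tree.
        u = max((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((link[u], u))
        # Update the best links of the units still outside the tree.
        for w in range(n):
            if not in_tree[w] and sim[u][w] > best[w]:
                best[w] = sim[u][w]
                link[w] = u
    return edges
```

Each returned pair (i, j) is one link to draw between the plotted coordinates of units i and j.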
Action with RESTRICT
Restrictions are irrelevant for matrix input structures. They should work as expected with variates.
DOTPLOT procedure
Produces a dot-plot using line-printer or high-resolution graphics
(J. Ollerton & S.A. Harding)
Options
GRAPHICS
= string Whether to use high-resolution graphics or line-printer graphics (lineprinter, highresolution); default high
TITLE
= string Title for the Dot Plot; default *
WINDOW
= scalar Window number for the graph; default 1
SCREEN
= string Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep); default clea
ENDACTION
= string Action to be taken after completing the plot (continue, pause); default * uses the current setting
DIRECTION
= string Order in which to sort the data before plotting, DIRECTION=* implies plot unsorted data (ascending, descending); default asce
LINES
= string How to draw guide lines on the plot, LINES=* omits the guide lines (todot, full); default todot draws lines from the x-origin to the dots
Parameters
YLABELS
= texts Text specifying Y labels for each dotplot
X
= variates Data to be plotted
PENDOTS
= scalars Pen to draw the dots; default 1
PENLINES
= scalars Pen to draw the lines; default 2
Description
DOTPLOT produces a dot-plot from two parameters, a variate of x-data and a text containing y-labels. Option GRAPHICS allows the plotting to be done using line-printer graphics instead of the default high-resolution graphics.
The display takes the form of a vertical histogram, with a single row for each value of YLABELS. The length of line for each row is specified by the corresponding value of X. It is customary to sort the data according to the x-values, into either ascending or descending order. This is controlled by the DIRECTION option, which by default is ascending; setting DIRECTION=* will plot the data unsorted.
For high-resolution plots the guide lines can also be drawn across the full width of the plot (LINES=full) or can be omitted (LINES=*). By default, pens are set up to draw the dots and lines in a form appropriate for the output device. For an interactive display, solid guide lines in pale grey are used; for other devices dashed or dotted lines are used. The plotting symbol is symbol 2 (circle), except for PostScript output which uses a solid dot (SYMBOL=-9). The parameters PENDOTS and PENLINES can be used to specify pens which have been set up with different attributes.
By default the dot-plot is produced in window 1, but this can be changed using the WINDOW option. A FRAME statement can be used before DOTPLOT to change the size and position of the display (for example to widen the x lower margin to allow more space for the y-labels). The SCREEN option controls whether or not the screen is cleared before plotting and the ENDACTION option determines what action to take after completing the plot.
An AXES statement can be used to set axis titles and modify the upper and lower bounds of the x-axis. If axis titles are not set explicitly they will be generated from the identifier names of the YLABELS and X parameters.
For high-resolution plots, the default window size specifies a lower x-margin of size 0.12. This allows room for a title and labels of up to about 10 characters. To produce a dot-plot with longer labels, a FRAME statement should be used to specify new dimensions for the window that include a larger value for XMLOWER. A full-size window, with standard margins, has room for about 48 rows before the labels start to overlap. To produce a dot-plot with more rows, the margins or the axis pen size should be reduced.
Options:
GRAPHICS, TITLE, WINDOW, SCREEN, ENDACTION, DIRECTION, LINES. Parameters:
YLABELS, X, PENDOTS, PENLINES.
Method
A y-variate is constructed with values 1...NVALUES(YLABELS) and plotted against the variate X. If required the variates are sorted (this action is performed on duplicates of the data so as not to alter the original variates).
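The row layout described above (optional sorting on copies of the data, then y positions 1, 2, ..., n) can be sketched in Python as follows. The function name is hypothetical, and `direction=None` stands in for DIRECTION=*; only the data preparation is shown, not the plotting.

```python
from typing import List, Optional, Sequence, Tuple


def dotplot_rows(labels: Sequence[str], x: Sequence[float],
                 direction: Optional[str] = "ascending") -> List[Tuple[int, str, float]]:
    """Return (y, label, x) triples with y = 1..n after optional sorting by x."""
    pairs = list(zip(labels, x))  # work on a copy so the caller's data are unchanged
    if direction == "ascending":
        pairs.sort(key=lambda p: p[1])
    elif direction == "descending":
        pairs.sort(key=lambda p: p[1], reverse=True)
    elif direction is not None:
        raise ValueError("direction must be 'ascending', 'descending' or None")
    # y positions 1..n, as with the y-variate built from NVALUES(YLABELS).
    return [(y, label, value) for y, (label, value) in enumerate(pairs, start=1)]
```

Because Python's sort is stable, labels with equal x-values keep their original relative order.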
Action with RESTRICT
DOTPLOT will obey restrictions on either YLABELS or X.
Reference
Cleveland, W.S. (1985). The elements of graphing data. Wadsworth advanced books and software.
DPARALLEL procedure
Displays multivariate data using parallel coordinates
(Z. Karaman)
Options
TITLE
= text Title for the plot
GROUPS
= factor Defines grouping of the units (if any); by default, different pens are used for the observations in different groups
PERMUTATIONS
= string Whether to display all necessary permutations so that any two variates will be adjacent in at least one plot, or just display once in the order given by the DATA pointer (yes, no); default no
SCALING
= string Whether to do scaling overall (scale all variates on the same scale), or to scale each variate separately (overall, separate); default sepa
PEN
= variate Pens to be used for different groups (if any); default * uses pens from 1 up to the number of groups (number of levels of the GROUPS factor)
Parameter
DATA
= variates Data variables to be plotted
Description
The scatter plot is probably the most powerful and most frequently used statistical tool for analysing the relationship between two variables. It is a very intuitive way to look at the data since it corresponds to our perception of the world. The major drawback is that it does not generalise naturally to higher dimensions. Using interactive graphics devices like high-resolution screens one can rotate a point cloud in three dimensions (commonly called spinning), and further dimensions can be partially encoded by using different colours, symbols, or symbol sizes; however, this technique can be used only on interactive graphics devices, and it is difficult to see relationships between all the variables at once. Another possibility is the matrix of scatter plots (provided by procedure DSCATTER), but this has the drawback that it is difficult to follow one data point across several plots.
An alternative is to display multivariate data using parallel coordinates. The dimensions are not represented by orthogonal lines as is customarily done when plotting scatter diagrams (which limits the dimensionality to two, or at most three if spinning is used). Rather, they are represented by a series of parallel lines (either horizontal or vertical), and a point in a multidimensional space is represented by a broken line connecting its coordinates in each dimension. The only limit on the number of dimensions that can be displayed simultaneously by such a plot is its readability, which is a function of the underlying graphics display (hardware). The parallel coordinates geometry was developed by Inselberg (1985) in the context of computational geometry; it was applied to statistical multidimensional analysis by Wegman (1990). Inselberg also gives some interesting duality properties between the classical Euclidean plane and parallel coordinates geometry.
The relationship between two variables can be visually assessed by inspecting a parallel coordinates plot. When the correlation between two variables is close to -1, the lines cross over each other and so, in the limit, we would have a pencil of lines. (A pencil of lines is a set of lines that are coincident at a single point.) On the other hand, when the correlation approaches +1, there are fewer and fewer crossovers, so that in the limit we would have a set of parallel lines. The pairwise comparisons are easy for variables represented by adjacent axes; however, they are much more difficult for axes far apart on the graph. For n variables, there are n! possible permutations, but many of these duplicate adjacencies. Wegman (1990) has shown that, with a relatively small number of permutations of the axes (approximately n/2), every variable can be made adjacent to every other variable in at least one permutation. Multivariate outliers can be identified easily on this plot, since it is very intuitive to follow a line across the axes by eye. If the PERMUTATIONS option is set to yes, several plots will be produced so that every pair of variables is adjacent in at least one plot.
In our implementation we have chosen to arrange the axes vertically, since this way the readability is maximized for most output devices (either terminal screens or printers when printing in landscape mode). The variables can be independently scaled on a 0 to 1 scale, or left in original units if the values are of the same order of magnitude. In the first case it is easier to make a visual estimate of the correlation between two adjacent variables; on the other hand, leaving the data in original units gives a good idea of the location and spread parameters of the marginal distributions.
The data are specified, in a list of variates, using the DATA parameter. The GROUPS option can be used to specify a grouping factor. The lines for observations in each group are then plotted using different pens, thus giving an immediate insight into any patterns in the data. By default, pens 1 upwards are used for the different groups, but the PEN option can be used to specify other pens, in a variate with as many values as groups. If the GROUPS option is not set, the PEN option can be set to a scalar, to select the pen to be used for all the points. The TITLE option can be used to supply a title for the plots.
Options: TITLE, GROUPS, PERMUTATIONS, SCALING, PEN.
Parameter: DATA.
Method
DPARALLEL uses the standard Genstat directives for data manipulation and graphics. The underlying methodology is described by Inselberg (1985) and Wegman (1990). It calls the subsidiary procedure WEGMAN to generate the permutation matrix; each column of the output matrix gives one of the permutations described by Wegman (1990).
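The zigzag construction from Wegman (1990) can be sketched in Python as below. This is an illustration only: whether the WEGMAN procedure produces exactly these orderings is an assumption, and the function names are hypothetical. Variables are numbered from 0. For an even number n of variables, the base ordering 0, 1, n-1, 2, n-2, ... is shifted cyclically to give n/2 orderings; for odd n, a dummy variable is added and then dropped, giving (n+1)/2 orderings. Dropping a variable only joins its two neighbours, so no required adjacency is lost.

```python
from itertools import combinations


def zigzag_permutations(n):
    """Axis orderings (0-based) in which every pair of the n variables is
    adjacent in at least one ordering; (n + 1) // 2 orderings in total."""
    m = n + (n % 2)  # work with an even number of vertices
    # Base zigzag 0, 1, m-1, 2, m-2, ...: steps +1, -2, +3, -4, ... (mod m)
    base = [0]
    for k in range(1, m):
        step = k if k % 2 == 1 else -k
        base.append((base[-1] + step) % m)
    perms = []
    for shift in range(m // 2):
        order = [(v + shift) % m for v in base]
        # For odd n, drop the dummy vertex m - 1 (= n).
        perms.append([v for v in order if v < n])
    return perms


def covers_all_pairs(perms, n):
    """Check that every pair of variables is adjacent in some ordering."""
    adjacent = set()
    for order in perms:
        for a, b in zip(order, order[1:]):
            adjacent.add(frozenset((a, b)))
    return all(frozenset(p) in adjacent for p in combinations(range(n), 2))


for order in zigzag_permutations(6):
    print([v + 1 for v in order])  # 1-based variable numbers
# [1, 2, 6, 3, 5, 4]
# [2, 3, 1, 4, 6, 5]
# [3, 4, 2, 5, 1, 6]
```

Each printed ordering corresponds to one plot; together the three plots place all 15 pairs of six variables on adjacent axes.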
Action with RESTRICT
Restrictions are not allowed. Missing values are allowed within the input variates in DATA; observations with missing data are not excluded from the plot, but the parts of their lines adjacent to a missing value are omitted.
References
Inselberg, A. (1985). The plane with parallel coordinates. The Visual Computer, 1, 69-91.
Wegman, E. (1990). Hyperdimensional data analysis using parallel coordinates. JASA, 85, 664-675.
DPOLYGON procedure
Draws polygons using high-resolution graphics
(M.A. Mugglestone, S.A. Harding, B.Y.Y. Lee, P.J. Diggle & B.S. Rowlingson)
Options
TITLE
= text Main title for the plot; default *
WINDOW
= scalar Which graphics window to use for the plot; default 1
KEYWINDOW
= scalar Which graphics window to use for the key; default 2
YTITLE
= text Title for the vertical axis; default *
XTITLE
= text Title for the horizontal axis; default *
YLOWER
= scalar Lower limit for the vertical axis
YUPPER
= scalar Upper limit for the vertical axis
XLOWER
= scalar Lower limit for the horizontal axis
XUPPER
= scalar Upper limit for the horizontal axis
SCREEN
= string Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep); default clea
KEYDESCRIPTION
= text Overall description for the key; default *
ENDACTION
= string Action to be taken after completing the plot (continue, pause); default paus
Parameters
YPOLYGON
= variates Vertical coordinates of one or more polygons; no default - this parameter must be set
XPOLYGON
= variates Horizontal coordinates of one or more polygons; no default - this parameter must be set
PEN
= scalars or variates or factors Pen number for each graph
DESCRIPTION
= texts Annotation for the key
Description
DPOLYGON draws polygons onto the current graphics device. Parameters XPOLYGON and YPOLYGON specify variates containing the horizontal and vertical coordinates of the polygons. DPOLYGON uses procedure DPTMAP to produce the plot. This uses the AXES and FRAME directives to set up axes with equal scales. Options YLOWER, YUPPER, XLOWER and XUPPER can be used to specify bounds for the axes, or these can be set automatically. The axes are made to extend slightly beyond the range of values to be plotted, and are drawn using the box style. Titles for the horizontal and vertical axes can be specified using the XTITLE and YTITLE options, respectively. Options TITLE, WINDOW, KEYWINDOW, SCREEN, KEYDESCRIPTION and ENDACTION are as in DGRAPH.
By default, DPOLYGON uses a different pen for each polygon. The sequence of pens is the same as the default sequence of pens used by DGRAPH, but the pens are set to use METHOD=line, SYMBOLS=0 and JOIN=given, so that each polygon is drawn as a sequence of connected line segments. Other pen styles can be specified using the PEN parameter, except that the procedure will override settings of METHOD, SYMBOLS and JOIN, replacing them by METHOD=line, SYMBOLS=0 and JOIN=given. The original settings will be restored on exiting the procedure. To draw polygons in a different style, for example using lines and points, you can use DPTMAP directly, with an appropriate PEN setting, rather than DPOLYGON.
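Options: TITLE, WINDOW, KEYWINDOW, YTITLE, XTITLE, YLOWER, YUPPER, XLOWER, XUPPER, SCREEN, KEYDESCRIPTION, ENDACTION.
Parameters: YPOLYGON, XPOLYGON, PEN, DESCRIPTION.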
Method
The procedure PTCHECKXY is called to check that each pair of structures in XPOLYGON and YPOLYGON has identical restrictions. If the PEN parameter is unset, pens with METHOD=line and SYMBOLS=0 will be specified using the PEN directive. PTCLOSEPOLYGON is used to close the polygons and DPTMAP to draw them.
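Closing a polygon amounts to repeating its first vertex at the end, so that joining the vertices in order returns to the starting point. The Python sketch below illustrates this step; it is not the PTCLOSEPOLYGON code, the function name is hypothetical, and leaving an already-closed polygon unchanged is an assumption about that procedure's behaviour.

```python
def close_polygon(x, y):
    """Return new vertex lists with the first vertex repeated at the end,
    so that joining the points in order traces the complete boundary.
    A polygon whose last vertex already equals its first is returned
    unchanged (an assumption about how closing is handled)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    xs, ys = list(x), list(y)
    if xs and (xs[0], ys[0]) != (xs[-1], ys[-1]):
        xs.append(xs[0])
        ys.append(ys[0])
    return xs, ys


print(close_polygon([0, 4, 4, 0], [0, 0, 3, 3]))  # ([0, 4, 4, 0, 0], [0, 0, 3, 3, 0])
```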
Action with RESTRICT
If any of the variates in XPOLYGON and YPOLYGON are restricted, only the subset of values specified by the restriction will be included in the graph.
DPTMAP procedure
Draws maps for spatial point patterns using high-resolution graphics
(M.A. Mugglestone, S.A. Harding, B.Y.Y. Lee, P.J. Diggle & B.S. Rowlingson)
Options
TITLE
= text Main title for the plot; default *
WINDOW
= scalar Which graphics window to use for the plot; default 1
KEYWINDOW
= scalar Which graphics window to use for the key; default 2
YTITLE
= text Title for the vertical axis; default *
XTITLE
= text Title for the horizontal axis; default *
YLOWER
= scalar Lower limit for the vertical axis
YUPPER
= scalar Upper limit for the vertical axis
XLOWER
= scalar Lower limit for the horizontal axis
XUPPER
= scalar Upper limit for the horizontal axis
SCREEN
= string Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep); default clea
KEYDESCRIPTION
= text Overall description for the key; default *
ENDACTION
= string Action to be taken after completing the plot (continue, pause); default paus
Parameters
Y
= variates Vertical coordinates of one or more spatial point patterns; no default - this parameter must be set
X
= variates Horizontal coordinates of one or more spatial point patterns; no default - this parameter must be set
PEN
= scalars or variates or factors Pen number for each graph
DESCRIPTION
= texts Annotation for the key
Description
DPTMAP is a specially adapted version of DGRAPH designed for producing maps of spatial point patterns. The procedure uses the AXES and FRAME directives to set up axes with equal scales. Options YLOWER, YUPPER, XLOWER and XUPPER can be used to specify bounds for the axes, or these can be set automatically. The axes are made to extend slightly beyond the range of values to be plotted, and are drawn using the box style. The parameters X and Y specify pointers to variates containing the horizontal and vertical coordinates of one or more spatial point patterns. Titles for the horizontal and vertical axes can be specified using the XTITLE and YTITLE options, respectively. Options TITLE, WINDOW, KEYWINDOW, SCREEN, KEYDESCRIPTION and ENDACTION are as in DGRAPH.
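Options: TITLE, WINDOW, KEYWINDOW, YTITLE, XTITLE, YLOWER, YUPPER, XLOWER, XUPPER, SCREEN, KEYDESCRIPTION, ENDACTION.
Parameters: Y, X, PEN, DESCRIPTION.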
Method
The procedure PTCHECKXY is called to check that each pair of structures in X and Y has identical restrictions. If any of XLOWER, XUPPER, YLOWER and YUPPER are unset, the procedure PTBOX is used to assign suitable values based on the data in X and Y. The values of these options are then adjusted to extend the range of the axes and so produce a more attractive plot. The adjusted values are given by
XLOWER - 0.05 * range(X),
XUPPER + 0.05 * range(X),
YLOWER - 0.05 * range(Y),
YUPPER + 0.05 * range(Y),
where range(X) is the range of values in X and range(Y) is the range of values in Y. The AXES directive is then used to set up box-style axes with the required upper and lower limits and the titles specified by XTITLE and YTITLE. The FRAME directive is used to ensure equal scales on the horizontal and vertical axes. Finally, the DGRAPH directive is used to draw the map on the current graphics device.
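The axis adjustment can be sketched in Python as follows, for the case where all four limits are unset. The sketch assumes, as a hypothetical stand-in for PTBOX, that the unset limits are taken from the minimum and maximum of the data; the function name padded_limits is also hypothetical, and the FRAME step that equalizes the scales is not shown.

```python
def padded_limits(x, y, fraction=0.05):
    """Axis limits for a point map when XLOWER, XUPPER, YLOWER and YUPPER
    are all unset: the data bounding box (assumed stand-in for PTBOX),
    widened on every side by `fraction` of the range, as in
    XLOWER - 0.05*range(X), XUPPER + 0.05*range(X), and likewise for Y."""
    xlower, xupper = min(x), max(x)
    ylower, yupper = min(y), max(y)
    xr, yr = xupper - xlower, yupper - ylower
    return (xlower - fraction * xr, xupper + fraction * xr,
            ylower - fraction * yr, yupper + fraction * yr)


print(padded_limits([0, 10, 5], [0, 20, 4]))  # (-0.5, 10.5, -1.0, 21.0)
```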
Action with RESTRICT
If any of the variates in X and Y are restricted, only the subset of values specified by the restriction will be included in the graph.
DPTREAD procedure
Adds points interactively to a spatial point pattern
(M.A. Mugglestone, S.A. Harding, B.Y.Y. Lee, P.J. Diggle & B.S. Rowlingson)
Options
WINDOW
= scalar Which graphics window to use for the plot; default 1
Parameters
OLDY
= variates Vertical coordinates of each spatial point pattern; no default - this parameter must be set
OLDX
= variates Horizontal coordinates of each spatial point pattern; no default - this parameter must be set
NEWY
= variates Variates to receive the vertical coordinates of the original points and added points
NEWX
= variates Variates to receive the horizontal coordinates of the original points and added points
Description
DPTREAD uses the DREAD directive to add points to a spatial point pattern. The coordinates of the existing points must be supplied using the parameters OLDX and OLDY. These points will be plotted on the current graphics device using DPTMAP with a pen setting of SYMBOLS=1. The WINDOW option may be used to specify the graphics window to use for the plot. DREAD is not always available, and its operation may vary slightly from one system to another. The Users' Note supplied with Genstat explains how to read points and terminate input on specific devices. The usual method for reading points is to click the left mouse button at the required position, and the usual way to terminate input is to click the right mouse button. The points read using DREAD will be echoed using a pen setting of SYMBOLS=2. The coordinates of the new spatial point pattern, containing the original points and any points that have been added, may be saved using the parameters NEWX and NEWY.
Printed output is controlled using the PRINT option. The settings available are monitoring (which prints the coordinates of the points to be added) and summary (which prints the coordinates of the new pattern, consisting of the original points and any that have been added, under the headings NEWX and NEWY). The default setting is for both monitoring and summary.
Options: PRINT, WINDOW.
Parameters: OLDY, OLDX, NEWY, NEWX.
Method
The procedure PTCHECKXY is called to check that OLDX and OLDY have identical restrictions. DPTMAP is used to draw a map of the original point pattern. The DREAD directive is then used to read the coordinates of points to be added. Finally, the coordinates of the original points and the added points are combined in new variates using the EQUATE directive.
Action with RESTRICT
If OLDX and OLDY are restricted, only the subset of values specified by the restriction will be included in the calculations.
DREPMEASURES procedure
Plots profiles and differences of profiles for repeated measures data
(J.T.N.M. Thissen)
Options
TITLE
= string Title for the plots; default *
GROUPS
= factors List of one or two factors; one factor gives one plot, while a list of two factors gives as many plots as there are levels of the first factor in the list; must be set
TIMEPOINTS
= variate Variate of timepoints; by default the suffixes of the DATA pointer are used
DIFFERENCES
= string Whether to plot the differences from the first level as well as the profiles (no, yes); default no
Parameters
DATA
= pointers Each pointer contains the data variates (observed at successive times)
GROUPMEANS
= tables To save the calculated treatment means at each timepoint
Description
A repeated measures experiment is one in which the same set of units, or subjects, is observed at a sequence of times to investigate treatment effects over a period of time.
DREPMEASURES produces high-resolution graphs of the progress over time of the data variates specified in a pointer by the DATA parameter. Each variate contains the measurements made on the set of units at one of the occasions on which they were observed. The timepoints along the x-axes of the graphs are the suffixes of the pointer unless the TIMEPOINTS option is specified. The grouping of the subjects should be specified by one or two factors, input using the GROUPS option. If one factor is specified, the means of the observations at each level of the factor are plotted in one graph. If two factors are specified, several graphs are produced: each graph is a plot of the means of the observations at the various levels of the second factor for a particular level of the first.
The means are calculated with the TABULATE directive. If the data variates contain missing values, a warning is printed indicating the possibility of misleading results. (Before using DREPMEASURES, missing values can be estimated using procedure MULTMISS.)
If option DIFFERENCES=yes, two plots are produced side by side: one of the profiles and one of the differences from the first level. The default setting, no, gives the plot of the profiles only. Plots of differences can be produced only if the factor has more than one level. The TITLE option can be used to provide a title for the plots.
The calculated means can be saved by specifying the GROUPMEANS parameter.
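For a single grouping factor, the profiles and the differences from the first level can be sketched in Python as follows. This is not the procedure's code: the function names are hypothetical, the levels are taken in sorted order so that the first key is the first level, "difference" is assumed to mean level minus first level, and missing values (None) are simply skipped when averaging.

```python
def profile_means(data, groups):
    """Mean response of each group at each timepoint.

    data:   one list per timepoint, each holding one value per subject
            (the role played by the variates in the DATA pointer)
    groups: the group label of each subject (a single grouping factor)
    Missing values (None) are skipped.
    Returns {level: [mean at timepoint 1, mean at timepoint 2, ...]}.
    """
    means = {}
    for level in sorted(set(groups)):
        row = []
        for values in data:
            obs = [v for v, g in zip(values, groups) if g == level and v is not None]
            row.append(sum(obs) / len(obs) if obs else None)
        means[level] = row
    return means


def differences_from_first(means):
    """Difference between each level's profile and the first level's profile."""
    first = means[next(iter(means))]
    return {
        level: [None if m is None or b is None else m - b for m, b in zip(row, first)]
        for level, row in means.items()
    }


data = [[1.0, 2.0, 3.0, 4.0],   # timepoint 1, subjects 1-4
        [2.0, 3.0, 5.0, 6.0]]   # timepoint 2
groups = ["A", "A", "B", "B"]
means = profile_means(data, groups)
print(means)                          # {'A': [1.5, 2.5], 'B': [3.5, 5.5]}
print(differences_from_first(means))  # {'A': [0.0, 0.0], 'B': [2.0, 3.0]}
```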
Options: TITLE, GROUPS, TIMEPOINTS, DIFFERENCES.
Parameters: DATA, GROUPMEANS.
Method
Means are calculated with the TABULATE directive. If restricted variates are specified in DATA, procedure SUBSET is used to remove any levels of the factors that are not present in the subset of subjects.
Action with RESTRICT
If any of the variates in the DATA pointer is restricted, only the units not excluded by the restriction will be used for the graphs. If any other DATA variate or GROUPS factor is restricted, it must be restricted to the same set of units. The variate specified by TIMEPOINTS must not be restricted.
DRPOLYGON procedure
Reads a polygon interactively from the current graphics device
(M.A. Mugglestone, S.A. Harding, B.Y.Y. Lee, P.J. Diggle & B.S. Rowlingson)
Options
WINDOW
= scalar Window from which to read; default 1
Parameters
YPOLYGON
= variates Variates to receive the vertical coordinates of the polygons that are read
XPOLYGON
= variates Variates to receive the horizontal coordinates of the polygons that are read
PEN
= scalars Pen numbers to use to echo points
Description
DRPOLYGON uses the DREAD directive to read the coordinates of a sequence of points which define a polygon. The WINDOW option may be used to specify the window from which to read. The DREAD directive will only work within a window that contains a graph or a contour plot. A call to DRPOLYGON should, therefore, be preceded by a call to DPTMAP, DPOLYGON, DGRAPH or DCONTOUR. DREAD is not always available, and its operation may vary slightly from one system to another. The Users' Note supplied with Genstat explains how to read points and terminate input on specific devices. The usual method for reading points is to click the left mouse button at the required position, and the usual way to terminate input is to click the right mouse button. The last point of any polygon is implicitly connected to the first point. There is no need to re-enter the first point to draw a closed polygon; this will be done automatically after input has been terminated. The horizontal and vertical coordinates of the polygon may be saved using the parameters XPOLYGON and YPOLYGON, respectively.
The PEN parameter may be used to specify which pen to use to echo points which have been read. The default setting of PEN uses METHOD=line, LINESTYLE=1, SYMBOLS=1 and JOIN=given.
Printed output is controlled by the PRINT option. The default setting of summary prints the horizontal and vertical coordinates of the polygon under the headings XPOLYGON and YPOLYGON.
Options: PRINT, WINDOW.
Parameters: YPOLYGON, XPOLYGON, PEN.
Method
If the PEN parameter is unset, a pen with METHOD=line, LINESTYLE=1, SYMBOLS=1 and JOIN=given will be specified using the PEN directive. The DREAD directive is used to read in the coordinates of an open polygon, and then the DGRAPH directive is used to draw a line joining the last point of the polygon to the first point.
DSCATTER procedure
Produces a scatter-plot matrix using high-resolution graphics
(J. Ollerton)
Option
PEN
= scalar or variate or factor Pen number for the graph; default 1
Parameter
DATA
= variates A list of variates to be plotted
Description
Procedure DSCATTER produces a scatter-plot matrix, from a set of variates, using high-resolution graphics.
The parameter DATA lists the variates to be plotted; each variate is plotted against all the other variates, producing plots which are arranged as the lower triangle of a matrix with shared scales. Titles for the axes are the identifiers of the variates.
The number of variates which can be plotted by this procedure is in effect unlimited, but of course the greater the number of variates, the smaller the individual plots will be.
The pen which is used to plot the data can be specified with the PEN option.
Option: PEN.
Parameter: DATA.
Method
Each variate is plotted against every other variate, producing n(n-1)/2 graphs, where n is the number of variates. A full scatter-plot matrix would contain n(n-1) separate graphs arranged as an n × n matrix, with the lower and upper triangles each containing the same n(n-1)/2 pairs of variables (with the axes swapped). This procedure forms just the lower triangle of the scatter-plot matrix.
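The panel count can be illustrated with a short Python sketch that enumerates the lower triangle. The convention that panel (i, j), with i > j, plots variable i on the vertical axis against variable j on the horizontal axis is an assumption about the layout, and the function name and variable names are hypothetical.

```python
def lower_triangle_panels(names):
    """List the panels of the lower triangle of a scatter-plot matrix.

    Each entry is (row, column, y_variable, x_variable), with rows and
    columns counted from 0: panel (i, j), i > j, plots variable i against
    variable j, so each pair of variables appears exactly once.
    """
    return [
        (i, j, names[i], names[j])
        for i in range(1, len(names))
        for j in range(i)
    ]


for panel in lower_triangle_panels(["height", "weight", "age", "girth"]):
    print(panel)
# 4 variables give 4 * 3 / 2 = 6 panels; a full matrix would have 12 off-diagonal panels.
```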
Action with RESTRICT
If any variate in the DATA list is restricted, only the units not excluded by the restriction (and the corresponding units of the other variates) will be plotted.
Reference
Cleveland, W.S. (1985). The elements of graphing data. Wadsworth advanced books and software.
DSHADE procedure
Produces a pictorial representation of a data matrix
(S.A. Harding)
Options
WINDOW
= scalar Window number for the graph; default 1
KEYWINDOW
= scalar Window number for the key (0 for no key); default 2
SCREEN
= string Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep); default clea
GRID
= string How to draw a grid around the elements of the matrix (present, complete); default pres
Parameters
DATA
= symmetric matrices, matrices, or pointers to variates Matrices to be plotted
NGROUPS
= scalars Number of groups to form from the levels of similarity (i.e. number of different shades)
PERMUTATION
= variates Can define permutations to be done to the units of symmetric matrices prior to plotting
Description
DSHADE produces a shaded representation of a rectangular or symmetric matrix using high-resolution graphics. Each element of the data matrix is represented by a shaded rectangle, indicating the value at that location by either colour or shading density. This type of display is often used in cluster analysis for displaying a similarity matrix, but is also useful for the graphical display of spatial data.
The data for the procedure consist of a matrix, a symmetric matrix (e.g. of similarities), or a pointer to a set of variates, specified by the DATA parameter. The NGROUPS parameter defines the number of levels that are to be used when grouping the data for the display; this must be in the range 1 to 32. When producing a shaded plot of a similarity matrix, a permutation of the units can be specified using the PERMUTATION parameter; a suitable variate could be obtained, for example, from the HCLUSTER directive. PERMUTATION is ignored if DATA is not a symmetric matrix.
The individual elements are shaded using pens 1...NGROUPS, with pen 1 being used for the lowest values and pen NGROUPS for the highest. Missing values are ignored, thus leaving blank areas in the plot. The current COLOUR and BRUSH settings of these pens define how each level is represented. If the default settings do not produce a suitable display, these attributes should be set by a PEN statement before using DSHADE. Colour displays can use a solid brush pattern for all pens, with different colours representing the different levels of similarity. To obtain a grey scale for the shading, for example into ten groups, you could use the following statements:
PEN 1...10; COLOUR=2...11
COLOUR 2...11; 0.1,0.2...1.0; 0.1,0.2...1.0; 0.1,0.2...1.0
For a monochrome display, the BRUSH styles should be set to values that produce increasing density of shading as the pen number increases. Figure 6.5.5e on page 313 of the Genstat 5 Release 3 Reference Manual may be used to help select appropriate values.
The GRID option specifies whether an outline should be drawn around each element of the matrix. The default, GRID=present, produces an outline for all values that are present; i.e. it ignores missing values. This is suitable where data have been sampled over an irregularly shaped area. Alternatively, for GRID=complete, an outline is drawn around every element. Setting GRID=* stops the grid being drawn, which may be preferable if there are a large number of elements in the input data.
By default the plot is produced in window 1 with a key in window 2, but these settings can be changed using the WINDOW and KEYWINDOW options. The size and position of these windows can be specified in a FRAME statement before using DSHADE. The SCREEN option controls whether or not the screen is cleared before plotting.
Options: WINDOW, KEYWINDOW, SCREEN, GRID.
Parameters: DATA, NGROUPS, PERMUTATION.
Method
The values of the data matrix are sorted into NGROUPS groups of equal range. A box is defined as the plotting symbol for pens 1...NGROUPS, with scaling dependent on the number of rows and columns to be plotted. Each point of the matrix is then plotted using the appropriate pen. DPIE is used to produce a key, if required.
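The grouping into classes of equal range can be sketched in Python as below; each class number corresponds to the pen used to shade the element. This is not the DSHADE code: the function name is hypothetical, and placing values that fall exactly on an interior boundary in the upper class (with the maximum capped at the top class) is an assumption.

```python
import math


def equal_range_groups(values, ngroups):
    """Assign each value to one of `ngroups` classes of equal width spanning
    the observed range; class 1 holds the lowest values and class `ngroups`
    the highest (matching pens 1...NGROUPS).  Missing values (None or NaN)
    get None, so no pen is used and the cell is left blank."""
    if not 1 <= ngroups <= 32:
        raise ValueError("ngroups must be in the range 1 to 32")

    def missing(v):
        return v is None or math.isnan(v)

    present = [v for v in values if not missing(v)]
    if not present:
        return [None] * len(values)
    lo, hi = min(present), max(present)
    width = (hi - lo) / ngroups
    classes = []
    for v in values:
        if missing(v):
            classes.append(None)
        elif width == 0:
            classes.append(1)  # all present values equal: a single shade
        else:
            # Values on an interior boundary go to the upper class; the
            # maximum would land in class ngroups + 1, so it is capped.
            classes.append(min(int((v - lo) / width) + 1, ngroups))
    return classes


print(equal_range_groups([0, 2.5, 5, 7.5, 10, None], 4))  # [1, 2, 3, 4, 4, None]
```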
Action with RESTRICT
If the data are input as a pointer to a set of variates, any restrictions must be consistent and will be applied to all the variates.
EXTRABINOMIAL procedure
Fits the models of Williams (1982) to overdispersed proportions
(M.S. Ridout & P.W. Goedhart)
Options
CONSTANT
= string How to treat constant (estimate, omit); default esti
FACTORIAL
= scalar Limit for expansion of model terms; default 3
NOMESSAGE
= strings Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality); default *
METHOD
= string Which model to fit to take account of the extra variation (II, III); default II
MODIFYMODEL
= string Whether to leave the modified MODEL settings (WEIGHTS and DISPERSION) or whether to restore the original situation (yes, no); default no
WEIGHTS
= variate To save estimated weights
PHI
= scalar To save estimated overdispersion parameter
MAXCYCLE
= scalar Maximum number of iterations; default 10
TOLERANCE
= scalar Convergence criterion; default 0.01
Parameter
TERMS
= formula Model terms to be fitted; if unset it is assumed that the model consists only of a constant term
Description
In binomial regression models, residual variability is often larger than would be expected if the data were indeed binomially distributed. This may be due to a few outliers or a poor choice of link function but often it simply indicates that the data are from a distribution more variable than the binomial. Such data are said to be "overdispersed" or to exhibit "extra-binomial variation".
Williams (1982) discusses two possible models to extend the usual binomial model (Model I). Model II assumes that the true variance exceeds the binomial variance by a factor
$$V = 1 + (n - 1)\,\theta$$
where n is the binomial total and θ is the overdispersion parameter.
If the overdispersion parameter θ (which can be saved using the PHI option) were known, the data could be analysed using a binomial model with prior weights 1/V. Procedure EXTRABINOMIAL estimates θ so that the residual chi-squared statistic from this weighted analysis is (approximately) equal to the residual degrees of freedom (Moore 1987). If the binomial totals are all equal, Model II is equivalent to setting the DISPERSION option of MODEL equal to the residual chi-squared statistic divided by its degrees of freedom.
Alternatively, Model III assumes that the linear predictor varies about its expectation with a constant variance. Usually this variation is assumed to follow a Normal distribution; if there is then a logit link, the error distribution will be a logistic Normal. Extensions of Model III, with several Normal distributions contributing to the variation in the linear predictor (similar to those that occur in stratified analysis of variance), form the basis of many methods suggested for analysing generalized linear mixed models. For Model III there is generally no simple expression for the exact variance, but the delta method can be used to show that, approximately, the variance exceeds the binomial variance by a factor
$$V = 1 + (n - 1)\,\theta\,\frac{F^2}{P(1 - P)}$$
where θ is the variance on the scale of the linear predictor, n is the binomial total, P is the fitted probability and F is the derivative of the inverse of the link function, evaluated at the fitted value of the linear predictor. (For the logit link, F = P(1 - P), so the factor reduces to 1 + (n - 1)θP(1 - P).)
Before using EXTRABINOMIAL, a MODEL statement must be given in the usual way to define the y-variate, the binomial totals, the link and any offset. The error distribution must of course also be set to binomial, but any settings of WEIGHTS or DISPERSION are ignored.
The form of EXTRABINOMIAL is similar in many ways to the FIT directive. There is a single parameter TERMS to define the model terms to be fitted, and the first four options, PRINT, CONSTANT, FACTORIAL and NOMESSAGE, all have the same syntax and purpose as in FIT. The remaining options are specific to EXTRABINOMIAL.
The METHOD option selects which model to use (II or III); by default METHOD=II. Both models involve the estimation of the weight variate (1/V) required to fit the model using the standard Genstat facilities for generalized linear models. If option MODIFYMODEL=yes, EXTRABINOMIAL will leave the MODEL statement in its modified form (provided the iterative estimation of θ converges), with the WEIGHTS option set to these weights and the DISPERSION option set to 1, so that directives like DROP can be used to study the effects of individual terms in the model in the usual way. The TERMS directive will also be left set to the model specified by the TERMS parameter of EXTRABINOMIAL, and this model will be the one most recently fitted, so further output can be obtained using RDISPLAY.
Options WEIGHTS and PHI allow the weights and the estimated value of θ, respectively, to be saved. The MAXCYCLE option specifies the maximum number of iterations in the estimation, and the TOLERANCE option defines the convergence criterion:
$$\left|\,\chi^2 - \text{residual d.f.}\,\right| < \text{TOLERANCE} \times \text{residual d.f.}$$
where χ² is the residual chi-squared statistic.
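To make the two variance factors concrete, the following Python sketch evaluates them for given values of n, θ and the linear predictor. It is not the code used by EXTRABINOMIAL: the function names are hypothetical, and the logit link is used only as an example, for which F = P(1 - P) and the Model III factor reduces to 1 + (n - 1)θP(1 - P).

```python
import math


def factor_model2(n, theta):
    """Model II: variance = binomial variance * (1 + (n - 1) * theta)."""
    return 1.0 + (n - 1) * theta


def factor_model3(n, theta, eta, inv_link, inv_link_deriv):
    """Model III (delta method): the linear predictor has variance theta, so
    V = 1 + (n - 1) * theta * F**2 / (P * (1 - P)), where P = inv_link(eta)
    and F = inv_link_deriv(eta) are evaluated at the fitted linear predictor."""
    p = inv_link(eta)
    f = inv_link_deriv(eta)
    return 1.0 + (n - 1) * theta * f * f / (p * (1.0 - p))


def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def expit_deriv(eta):
    """Derivative of the inverse logit, which equals P * (1 - P)."""
    p = expit(eta)
    return p * (1.0 - p)


n, theta, eta = 20, 0.05, 0.4
print(factor_model2(n, theta))
v3 = factor_model3(n, theta, eta, expit, expit_deriv)
p = expit(eta)
print(v3, 1 + (n - 1) * theta * p * (1 - p))  # equal for the logit link
```

The prior weights used in the weighted binomial fit are then 1/V for each unit.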
Options: PRINT, CONSTANT, FACTORIAL, NOMESSAGE, METHOD, MODIFYMODEL, WEIGHTS, PHI, MAXCYCLE, TOLERANCE.
Parameter: TERMS.
Method
If the binomial totals are all equal, θ is determined (non-iteratively) from the residual chi-squared statistic.
Otherwise, θ must be found iteratively, and the method used (Williams, 1982) involves nested iterations. Each outer iteration (involving a model fit) requires an inner iteration (which uses only CALCULATE statements) to obtain the updated estimate of θ. The MAXCYCLE option controls the maximum number of outer iterations; the maximum number of inner iterations is fixed at 10.
Very precise convergence is not important in practice; the default setting of the TOLERANCE option (1%) should give a perfectly adequate estimate of θ, usually within 3 iterations.
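The non-iterative case for Model II follows directly from the definitions above, and can be sketched in Python as below together with the convergence test. The function names are hypothetical, and the sketch does not truncate a negative estimate (which would indicate underdispersion); how EXTRABINOMIAL handles that case is not described here.

```python
def theta_equal_totals(chi2, resid_df, n):
    """Model II estimate of theta when every binomial total equals n.

    With equal totals, V = 1 + (n - 1) * theta is the same for every unit, so
    weighting by 1/V leaves the fitted values unchanged and divides the
    residual chi-squared by V.  Setting chi2 / V = resid_df gives
    V = chi2 / resid_df and hence theta = (V - 1) / (n - 1).
    """
    if n < 2:
        raise ValueError("theta cannot be estimated when the binomial total is below 2")
    return (chi2 / resid_df - 1.0) / (n - 1)


def converged(chi2, resid_df, tolerance=0.01):
    """Convergence test |chi-squared - residual d.f.| < TOLERANCE * residual d.f."""
    return abs(chi2 - resid_df) < tolerance * resid_df


print(theta_equal_totals(chi2=30.0, resid_df=10, n=11))  # V = 3, so theta = 0.2
print(converged(10.05, 10), converged(10.2, 10))          # True False
```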
Action with RESTRICT
Any of the following structures may be restricted: the Y variate; the NBINOMIAL variate; the WEIGHTS variate; the OFFSET variate; any variate or factor appearing in the model formula. Restrictions on different structures must be compatible. Restricted units are excluded from the analysis.
References
Moore, D.F. (1987). Modelling the extraneous variance in the presence of extra-binomial variation. Appl. Statist. 36, 8-14.
Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-148.