Statistical analysis

Use of Chi-square, correlation, t-test

Kenny's examples (from Anthony Kenny, The Computation of Style, 1982)

Rhetoric by Aristotle, and pseudo-Rhetoric, p. 111
Frequencies of last words in sentences in samples
Are they by the same author?
Rhet1_X.dat


text       noun    verb  other
rhet-g      28      32    40
rhet-a      27      52    21


Aristotle, three books of Metaphysics, p. 116
Frequencies of last words in sentences in three chapters
Are they by the same author?
Rhet2_X.dat


noun    verb    other
17       30      23
16       48      46
68      108     124


Corneille, increasing number of words per line in dramas, p. 74
Is this significant?
Corn.dat


date  mwds
1629  8.93
1632  9.02
1635  9.15
1640  9.26
1644  9.15
1650  9.20
1662  9.22
1666  9.32
1672  9.48
1674  9.53
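
One way to answer the question, sketched here in Python rather than SPSS: compute the Pearson correlation between date and mean words per line, then convert r to a t statistic with n - 2 degrees of freedom. The choice of Pearson's r with a t-test is an assumption about the intended analysis, and the function name pearson_r is illustrative.

```python
import math

# Corn.dat: year of play and mean words per line (values from the table above).
dates = [1629, 1632, 1635, 1640, 1644, 1650, 1662, 1666, 1672, 1674]
mwds = [8.93, 9.02, 9.15, 9.26, 9.15, 9.20, 9.22, 9.32, 9.48, 9.53]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(dates, mwds)
df = len(dates) - 2
# t statistic for testing whether the population correlation is zero.
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(f"r = {r:.3f}, t = {t:.2f}, df = {df}")
```

With 8 degrees of freedom, the two-tailed critical value of t at the .01 level is 3.355. Here r comes out at about 0.90 and t well above that threshold, so the upward trend is unlikely to be due to chance.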


Frequency of words in two texts, p. 80
Are the two texts similar? (Note: different sample sizes are OK.)
Words.dat


Word    TextA   TextB
The       15     9
And       11     8
Of         9     8
To         9     7
In         7     6
Then       7     6
That       5     5
By         3     5
A          3     4
Be         1     2
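
The note about sample sizes can be checked directly. The sketch below assumes that a Pearson correlation is the intended measure of similarity (the function name pearson_r is illustrative). It computes r once on the raw counts and once on each text's counts rescaled to proportions of its own total; the two values agree, because correlation is unaffected by rescaling either column.

```python
import math

# Words.dat: counts of ten common words in two texts (from the table above).
text_a = [15, 11, 9, 9, 7, 7, 5, 3, 3, 1]
text_b = [9, 8, 8, 7, 6, 6, 5, 5, 4, 2]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_counts = pearson_r(text_a, text_b)

# Rescale each column to proportions of its own total (70 and 60 words).
rel_a = [count / sum(text_a) for count in text_a]
rel_b = [count / sum(text_b) for count in text_b]
r_props = pearson_r(rel_a, rel_b)

print(f"r on raw counts:  {r_counts:.3f}")
print(f"r on proportions: {r_props:.3f}")
```

Both lines report r = 0.950, a strong positive correlation between the two word lists.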


Coleridge Poetry, section data

(not Kenny)
Cpoet.dat


per  sect  nwds   t/t   hap-l hap-d  mwdl   mstl
1    01    8804  .307   1663   431   4.413  17.40
1    02    6281  .324   1205   366   4.495  19.94
1    03    6821  .356   1639   344   4.533  19.57
1    04    8148  .338   1752   478   4.606  21.99
2    05    6913  .268   1092   300   4.069  18.39
2    06    7174  .276   1218   297   4.197  17.37
2    07    7596  .265   1230   314   4.145  20.31
2    08    7223  .298   1398   333   4.186  18.57
1    09    6096  .319   1278   280   4.277  22.05
1    10    7769  .291   1423   357   4.247  17.34
1    11    7059  .338   1580   370   4.306  16.61
1    12    8264  .301   1589   419   4.212  19.20


Reading data
Trout.dat
(too long to show)

For a paper on this research, see Foregrounding by Miall & Kuiken.

Files for SPSS analysis (in Rutherford lab):

Corneille example from Kenny
Word frequency lists from Kenny
Coleridge poetry, section data, including selected words
Text and reader data from Foregrounding study

Left-click on the name of a file to open it in SPSS


Statistics programs: CHI and LitStats, by Stephen Reimer; z.exe, by David Miall

To run a program, right-click on its filename and save the link to a local directory, preferably a folder on the C: drive, e.g., C:\Stats

Then open a Command Prompt (available through Windows programs, under Accessories), navigate to \Stats (for example, by typing cd \Stats), type the name of the program, and press Enter to run it. You should also save to this directory any text files you want to make accessible to LitStats. Type Exit to close the window when finished.

CHI only requires that you type in a data table (as above, for example, for Rhet1_X.dat). If possible, avoid tables in which one or more cells hold a count below 5; small counts make the test less reliable.
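
What a program like CHI computes from such a table can be sketched in a few lines of Python. This is not Reimer's code: the function name chi_square and the form of the small-cell check are illustrative assumptions. The sketch uses the standard Pearson chi-square statistic, with expected counts derived from the row and column totals and (rows - 1) x (columns - 1) degrees of freedom.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom, small cells).

    `table` is a list of rows of observed counts. The third item lists the
    (row, column) positions whose observed count is below 5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    small_cells = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
            if observed < 5:
                small_cells.append((i, j))

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, small_cells


# Rhet1_X.dat: last words of sentences (noun, verb, other) in two samples.
rhet1 = [[28, 32, 40],
         [27, 52, 21]]
chi2, df, small = chi_square(rhet1)
print(f"chi-square = {chi2:.2f}, df = {df}, cells below 5: {small}")
```

For Rhet1_X.dat this gives a chi-square of about 10.70 with 2 degrees of freedom and no small cells. The same function accepts the three-row table in Rhet2_X.dat, which has 4 degrees of freedom.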

For an example run of LitStats on Heart of Darkness, see the output file Heart.sts (you can read this in Word). For other texts you submit to LitStats, save them first as .txt files with line endings.

Chi-square distribution, for testing the significance of the CHI report:

Degrees of    Level of significance
freedom        0.05     0.01
 1             3.84     6.63
 2             5.99     9.21
 3             7.81    11.34
 4             9.49    13.28
 5            11.07    15.09
 6            12.59    16.81
 7            14.07    18.48
 8            15.51    20.09
 9            16.92    21.67
10            18.31    23.21

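Example: If your table has 2 degrees of freedom (Df = 2) and your chi-square statistic is 10.70, your result is significant beyond the .01 level, because 10.70 exceeds the critical value of 9.21 in the 0.01 column. Degrees of freedom are (rows - 1) x (columns - 1), so a table of 2 rows and 3 columns, such as Rhet1_X.dat, has Df = 2.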
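
The table lookup can also be written out as a short Python sketch. The dictionary holds the critical values from the table above; the function name significance_level and the wording of its return values are illustrative.

```python
# Critical values of chi-square at the .05 and .01 levels, keyed by
# degrees of freedom (copied from the table above).
CRITICAL = {
    1: (3.84, 6.63),
    2: (5.99, 9.21),
    3: (7.81, 11.34),
    4: (9.49, 13.28),
    5: (11.07, 15.09),
    6: (12.59, 16.81),
    7: (14.07, 18.48),
    8: (15.51, 20.09),
    9: (16.92, 21.67),
    10: (18.31, 23.21),
}


def significance_level(chi2, df):
    """Describe where a chi-square statistic falls relative to the table."""
    if df not in CRITICAL:
        raise ValueError("table covers 1 to 10 degrees of freedom only")
    crit_05, crit_01 = CRITICAL[df]
    if chi2 >= crit_01:
        return "significant beyond the .01 level"
    if chi2 >= crit_05:
        return "significant at the .05 level"
    return "not significant"


print(significance_level(10.70, 2))
```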

Z-score

In Heart of Darkness, the word darkness co-occurs 8 times with heart within a span of 7 + 7 words (7 words on either side of the headword). Is this significant? Run z.exe (after opening a Command Prompt window):

Enter number of headwords [darkness]: 24
Enter total number of co-occurring words [heart]: 24
Enter number of observed co-occurrences with headword: 8
Enter number of words in text (tokens): 39092
Enter span for co-occurrences: 14
Z = 17.15957

A z-score of 2.6 or above can be considered significant.
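
The figure reported by z.exe can be reproduced with the z-score commonly used for collocations (Berry-Rogghe's formula). That z.exe uses exactly this formula is an assumption, made because it yields the same value for the inputs above; the function and parameter names are illustrative.

```python
import math


def collocation_z(headword_freq, collocate_freq, observed, tokens, span):
    """z-score for co-occurrence of a collocate within a span of a headword."""
    # Chance of meeting the collocate at any position not occupied by the headword.
    p = collocate_freq / (tokens - headword_freq)
    # Co-occurrences expected by chance across all spans around the headword.
    expected = p * headword_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))


z = collocation_z(headword_freq=24, collocate_freq=24, observed=8,
                  tokens=39092, span=14)
print(f"Z = {z:.5f}")
```

Only about 0.2 co-occurrences would be expected by chance, so 8 observed co-occurrences give a z-score of about 17.16, far above the 2.6 threshold.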



Document created September 3rd 2000 / Last revised October 23rd 2007