Use of Chi-square, correlation, t-test
Kenny's examples (from Anthony Kenny, The Computation of Style, 1982)

Rhetoric by Aristotle, and pseudo-Rhetoric, p. 111
Frequencies of last words in sentences in samples. Are they by the same author?
Rhet1_X.dat

text    noun  verb  other
rhet-g    28    32     40
rhet-a    27    52     21

Aristotle, three books of Metaphysics, p. 116
Frequencies of last words in sentences in three chapters. Are they by the same author?
Rhet2_X.dat

noun  verb  other
  17    30     23
  16    48     46
  68   108    124

Corneille, increasing number of words per line in dramas, p. 74
Is this significant?
Corn.dat

date  mwds
1629  8.93
1632  9.02
1635  9.15
1640  9.26
1644  9.15
1650  9.20
1662  9.22
1666  9.32
1672  9.48
1674  9.53

Frequency of words in two texts, p. 80
Are the two texts similar? (note: different sample sizes are OK)
Words.dat

Word  TextA  TextB
The      15      9
And      11      8
Of        9      8
To        9      7
In        7      6
Then      7      6
That      5      5
By        3      5
A         3      4
Be        1      2

Coleridge Poetry, section data (not Kenny)
Cpoet.dat

per  sect  nwds   t/t  hap-l  hap-d   mwdl   mstl
 1    01   8804  .307   1663    431  4.413  17.40
 1    02   6281  .324   1205    366  4.495  19.94
 1    03   6821  .356   1639    344  4.533  19.57
 1    04   8148  .338   1752    478  4.606  21.99
 2    05   6913  .268   1092    300  4.069  18.39
 2    06   7174  .276   1218    297  4.197  17.37
 2    07   7596  .265   1230    314  4.145  20.31
 2    08   7223  .298   1398    333  4.186  18.57
 1    09   6096  .319   1278    280  4.277  22.05
 1    10   7769  .291   1423    357  4.247  17.34
 1    11   7059  .338   1580    370  4.306  16.61
 1    12   8264  .301   1589    419  4.212  19.20

Reading data
Trout.dat (too long to show)
For the paper on this research, see Foregrounding by Miall & Kuiken.
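The Corneille question (does the number of words per line rise over time, and is the rise significant?) can be explored with a correlation. The following is a minimal sketch in Python, assuming a Pearson correlation between date and mwds is an acceptable way to ask the question; the handout does not say which test Kenny or SPSS applies here. The function names pearson_r and t_for_r are illustrative, not part of any program mentioned on this page.

```python
import math

# Corn.dat values copied from the table above: (date, mean words per line).
CORNEILLE = [
    (1629, 8.93), (1632, 9.02), (1635, 9.15), (1640, 9.26), (1644, 9.15),
    (1650, 9.20), (1662, 9.22), (1666, 9.32), (1672, 9.48), (1674, 9.53),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


dates = [date for date, _ in CORNEILLE]
mwds = [value for _, value in CORNEILLE]

r = pearson_r(dates, mwds)
t = t_for_r(r, len(dates))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(dates) - 2}")
```

To judge significance, compare t with a t table at 8 degrees of freedom. The same pearson_r function could be applied to the TextA and TextB columns of Words.dat.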
Files for SPSS analysis (in Rutherford lab):
Corneille example from Kenny
Word frequency lists from Kenny
Coleridge poetry, section data, including selected words
Text and reader data from Foregrounding study
Left-click on the name of a file to open it in SPSS
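Outside SPSS, a t-test on the Coleridge section data can be sketched in a few lines of Python. This is only an illustration under two assumptions that the handout does not confirm: that the "per" column of Cpoet.dat is a grouping code (1 or 2), and that "mwdl" is the variable to compare between the two groups. It uses the ordinary pooled-variance (Student) t-test; the function names are hypothetical.

```python
import math

# mwdl values from Cpoet.dat, split by the "per" column (assumed to be a group code).
MWDL_PER1 = [4.413, 4.495, 4.533, 4.606, 4.277, 4.247, 4.306, 4.212]
MWDL_PER2 = [4.069, 4.197, 4.145, 4.186]


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance, plus its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * sample_variance(a) + (nb - 1) * sample_variance(b)) / df
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


t_stat, df = pooled_t(MWDL_PER1, MWDL_PER2)
print(f"t = {t_stat:.2f}, df = {df}")
```

The resulting t is then compared with a t table at the printed degrees of freedom. Other columns (for example mstl) can be tested by swapping in their values.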
Statistics programs by Stephen Reimer: CHI; LitStats; by David Miall: z.exe
To run a program, right-click on its filename and save the link to a local directory, preferably a folder on the C: drive, e.g., C:\Stats
Then open a Command Prompt (available through Windows programs, under Accessories); navigate to \Stats, and type the name of the program and press Enter to run it. You should also save to this directory the text files you want to make accessible to LitStats. Type Exit to close the window when finished.
CHI only requires that you type in a data table (as above, for example, for Rhet1_X.dat). If possible, avoid tables in which any cell holds a number less than 5, since small counts make the test less reliable.
For an example run of LitStats on Heart of Darkness, see the output file Heart.sts (you can read this in Word). For other texts you submit to LitStats, save them first as .txt files with line endings.
Chi-square (χ²) distribution, for testing the significance of the CHI report:
Degrees of    Level of significance
freedom         0.05      0.01
    1           3.84      6.63
    2           5.99      9.21
    3           7.81     11.34
    4           9.49     13.28
    5          11.07     15.09
    6          12.59     16.81
    7          14.07     18.48
    8          15.51     20.09
    9          16.92     21.67
   10          18.31     23.21
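Example: Degrees of freedom are (rows - 1) x (columns - 1), so a table with 2 rows and 3 columns has Df = 2. If your table has 2 degrees of freedom and your chi-square statistic is 10.70, your result is significant beyond the .01 level, because 10.70 exceeds the critical value of 9.21.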
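As a check on the procedure, the chi-square statistic can be computed by hand or with a short script. The sketch below assumes the standard Pearson chi-square without continuity correction; whether CHI computes it in exactly this way is not stated here. It is applied to the Rhet1_X.dat table, and the critical values are copied from the table above. The function name chi_square is illustrative.

```python
# Critical values copied from the table above: df -> (0.05 level, 0.01 level).
CRITICAL = {
    1: (3.84, 6.63), 2: (5.99, 9.21), 3: (7.81, 11.34), 4: (9.49, 13.28),
    5: (11.07, 15.09), 6: (12.59, 16.81), 7: (14.07, 18.48),
    8: (15.51, 20.09), 9: (16.92, 21.67), 10: (18.31, 23.21),
}

# Rhet1_X.dat: rows are rhet-g and rhet-a; columns are noun, verb, other.
RHET1 = [[28, 32, 40], [27, 52, 21]]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


stat, df = chi_square(RHET1)
crit_05, crit_01 = CRITICAL[df]
print(f"chi-square = {stat:.2f}, df = {df}")
print("significant at .05:", stat > crit_05)
print("significant at .01:", stat > crit_01)
```

For the Rhet1_X.dat table this gives a statistic of about 10.70 with 2 degrees of freedom, the figures used in the example.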
Z-score
In Heart of Darkness, the word darkness co-occurs 8 times with heart within a span of 7 + 7 words (7 words on either side). Is this significant? Run z.exe (after opening a Command Prompt window):
Enter number of headwords [darkness]: 24
Enter total number of co-occurring words [heart]: 24
Enter number of observed co-occurrences with headword: 8
Enter number of words in text (tokens): 39092
Enter span for co-occurrences: 14
Z = 17.15957
A z-score of 2.6 or above can be considered significant (2.58 corresponds to the .01 level, two-tailed).
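The calculation behind this output can be sketched as follows. This assumes z.exe uses the collocation z-score commonly attributed to Berry-Rogghe: the probability p of the co-occurring word at any position is its frequency divided by the tokens not occupied by the headword, the expected count is p x headword frequency x span, and z = (observed - expected) / sqrt(expected x (1 - p)). The program's internals are not documented here, so treat the function and parameter names as hypothetical.

```python
import math


def collocation_z(headword_freq, collocate_freq, observed, tokens, span):
    """z-score for observed co-occurrences of a collocate near a headword."""
    # Chance of meeting the collocate at any position outside the headword itself.
    p = collocate_freq / (tokens - headword_freq)
    # Co-occurrences expected by chance across all span positions.
    expected = p * headword_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))


# Inputs from the z.exe run shown above.
z = collocation_z(headword_freq=24, collocate_freq=24, observed=8,
                  tokens=39092, span=14)
print(f"Z = {z:.5f}")
```

With the inputs from the run above, this formula gives a value that agrees with the reported Z, which suggests, but does not prove, that z.exe computes the score in the same way.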
Document created September 3rd 2000 / Last revised October 23rd 2007