homepage of R. Harald Baayen

affiliations:




Seminar fuer Sprachwissenschaft,
Eberhard Karls University, Tuebingen
&
Department of Linguistics,
University of Alberta, Edmonton
postal address:



Eberhard Karls University
Seminar fuer Sprachwissenschaft/Quantitative Linguistics
Room 3.19
Wilhelmstrasse 19, 72074 Tuebingen, Germany
map of Tuebingen and the university
e-mail addresses:
harald.baayen@uni-tuebingen.de, baayen@ualberta.ca, harald.baayen@gmail.com
fax: +49 (0)7071 29-5212 (Tuebingen), 1 780 492 0806 (Alberta)
curriculum vitae
last update: November 2011

           
           

software

Software for the statistical analysis of word frequency distributions, available under the GNU general public license (GPL), can be downloaded here:
linux/unix gzipped tar file
gzipped tar file for win32
zip archive for win32
The statistical theory underlying this software is described in R. H. Baayen, Word Frequency Distributions, Kluwer Academic Publishers, Dordrecht, 2001. Stefan Evert and Marco Baroni have developed much better software in R; see their zipfR package on CRAN. Some simple additional functionality is also provided by the languageR package; see Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R, Cambridge University Press.
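As a small illustration of the kind of object this software works with, the following base R sketch builds a frequency spectrum from a vector of word tokens. It is not taken from the downloadable software or from zipfR; the token vector and all variable names are hypothetical and chosen only for this example.

```r
# Toy token vector (hypothetical data, for illustration only)
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran")

# Frequency of each word type
type_freq <- table(tokens)

N <- length(tokens)      # sample size: number of tokens
V <- length(type_freq)   # vocabulary size: number of types

# Frequency spectrum: Vm = number of types that occur exactly m times
spectrum <- table(as.vector(type_freq))
spc <- data.frame(m  = as.integer(names(spectrum)),
                  Vm = as.vector(spectrum))

V1 <- spc$Vm[spc$m == 1] # number of hapax legomena (types occurring once)

print(spc)
```

The spectrum is a compact summary of the sample: summing Vm over all m gives V, and summing m * Vm gives N.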

The languageR package has some added functionality that is not documented in this book. The following code illustrates the basic functionality of plotLMER.fnc, a function for graphing the partial effects of fixed-effect factors and covariates in mixed-effects models fitted with lmer() from the lme4 package. It is possible to customize individual panels, to plot splines, and to visualize two-way interactions; for details, please consult the documentation (?plotLMER.fnc).



> library(languageR)
> bg.lmer = lmer(LogRT ~ ReadingScore + poly(OrthLength, 2, raw=TRUE) +
+   LogFrequency + LogFamilySize +
+   (1|Word) + (1|Subject) + (0+OrthLength|Subject) +
+   (0+LogFrequency|Subject), data = beginningReaders)
> mcmc = pvals.fnc(bg.lmer, nsim=1000, withMCMC=TRUE)
> par(mfrow=c(2,2), mar=c(5,5,1,1))
> plotLMER.fnc(bg.lmer, mcmcMat=mcmc$mcmc, fun=exp, ylabel = "RT (ms)")
effect size (range) for ReadingScore is 918.7794
effect size (range) for poly(OrthLength, 2, raw = TRUE) is 530.7917
effect size (range) for LogFrequency is 321.7646
effect size (range) for LogFamilySize is 125.2872
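To make the idea of a back-transformed partial effect concrete, here is a minimal sketch in base R. It is not the implementation of plotLMER.fnc: it assumes simulated data, an ordinary linear model (lm) in place of lmer, classical confidence intervals in place of MCMC-based intervals, and it holds the other predictor at its median. The data frame d and the coefficients used to simulate it are hypothetical; the column names merely echo those used above.

```r
set.seed(42)

# Simulated data (hypothetical): log RT decreases with log frequency
# and increases with orthographic length
n <- 500
d <- data.frame(LogFrequency = runif(n, 0, 8),
                OrthLength   = sample(3:10, n, replace = TRUE))
d$LogRT <- 7 - 0.05 * d$LogFrequency + 0.03 * d$OrthLength +
  rnorm(n, sd = 0.2)

# Ordinary linear model, used here as a stand-in for a mixed model
fit <- lm(LogRT ~ LogFrequency + OrthLength, data = d)

# Partial effect of LogFrequency: vary it over its observed range
# while holding OrthLength at its median
grid <- data.frame(
  LogFrequency = seq(min(d$LogFrequency), max(d$LogFrequency),
                     length.out = 50),
  OrthLength   = median(d$OrthLength)
)
pred <- predict(fit, newdata = grid, interval = "confidence")

# Back-transform from log RT to RT in ms (the role of fun = exp)
rt <- exp(pred)

# Effect size as the range of the back-transformed fitted values
effect_range <- diff(range(rt[, "fit"]))
cat("effect size (range) for LogFrequency is", effect_range, "\n")

plot(grid$LogFrequency, rt[, "fit"], type = "l", ylim = range(rt),
     xlab = "LogFrequency", ylab = "RT (ms)")
lines(grid$LogFrequency, rt[, "lwr"], lty = 2)
lines(grid$LogFrequency, rt[, "upr"], lty = 2)
```

Because the model is fitted on the log scale, the back-transformed curve is not a straight line, and the effect size is reported in milliseconds rather than in log units.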




The function acf.fnc is useful for exploring autocorrelational structure in successive trials in tasks such as lexical decision and naming.


> library(languageR)
> acf.fnc(beginningReaders, x="LogRT")



Each panel represents one subject (a primary school child) and displays the autocorrelation function for that subject. For some readers, response latencies are still correlated at lag 20.
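The same kind of per-subject inspection can be sketched with base R's acf(). The example below is not acf.fnc itself: it uses simulated latencies with subject-specific first-order autocorrelation, and the helper simulate_ar1, the data frame sim, and the chosen coefficients are all hypothetical.

```r
set.seed(1)

# Simulate an AR(1) series: each value depends on the previous one
simulate_ar1 <- function(n, phi, sd = 0.2) {
  e <- rnorm(n, sd = sd)
  x <- numeric(n)
  x[1] <- e[1]
  for (t in 2:n) x[t] <- phi * x[t - 1] + e[t]
  x
}

# Four hypothetical subjects with increasing trial-to-trial dependence
n_trials <- 200
phi <- c(S1 = 0.1, S2 = 0.3, S3 = 0.6, S4 = 0.8)
sim <- data.frame(
  Subject = rep(names(phi), each = n_trials),
  Trial   = rep(seq_len(n_trials), times = length(phi)),
  LogRT   = unlist(lapply(phi, function(p) 6.5 + simulate_ar1(n_trials, p)))
)

# One autocorrelation function per subject, up to lag 20
acf_by_subject <- lapply(split(sim$LogRT, sim$Subject), function(x) {
  acf(x, lag.max = 20, plot = FALSE)
})

# Lag-1 autocorrelation per subject (element 1 of $acf is lag 0)
lag1 <- sapply(acf_by_subject, function(a) a$acf[2])
print(round(lag1, 2))

# One panel per subject
op <- par(mfrow = c(2, 2))
for (s in names(acf_by_subject)) plot(acf_by_subject[[s]], main = s)
par(op)
```

Subjects simulated with a larger coefficient show autocorrelations that decay more slowly across lags, which is the pattern to look for in real trial sequences.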



R code for complexity-based ordering, as discussed in Plag and Baayen, Language, 2009, is available here. The data set for this code is available here. If the code and the data file are in the current working directory, the following lines of R code produce the graphs shown below.




  library(graph)
  library(RBGL)
  library(Rgraphviz)
  source("CBO.R")

  mOrig = loadData.fnc()  
      # Figure 1
  plotmat.fnc(mOrig)                          
      # Figure 2 
  res = analysis.fnc(mOrig)                 
  m = as.matrix(res$m)
  mG = as(m, "graphNEL")
  print(mG)
  isConnected(mG)
      # figure not shown
  plot(mG)                                     
      # Figure 3
  plotAndHighlightViolations.fnc(m, mG)         






Figure 1: the unordered adjacency matrix



Figure 2: the same adjacency matrix for complexity-based ordering



Figure 3: the corresponding directed graph for complexity-based ordering
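The relation between the adjacency matrix and the directed graph can be illustrated with a toy example in base R. This sketch is not the code in CBO.R and does not use the graph, RBGL, or Rgraphviz packages. It assumes that an entry m[i, j] = 1 means that suffix i is attested immediately before suffix j, and that a violation is an entry below the diagonal once rows and columns are arranged in a candidate ordering. The suffix labels, the matrix, and the helper functions count_violations and topo_order are hypothetical.

```r
# Toy adjacency matrix for four hypothetical suffixes
suffixes <- c("a", "b", "c", "d")
m <- matrix(0L, 4, 4, dimnames = list(suffixes, suffixes))
m["c", "a"] <- 1L
m["a", "d"] <- 1L
m["c", "b"] <- 1L
m["b", "d"] <- 1L

# Number of entries below the diagonal for a given ordering
count_violations <- function(m, ord) {
  mo <- m[ord, ord, drop = FALSE]
  sum(mo[lower.tri(mo)])
}

# Topological ordering by repeatedly removing a node without
# incoming edges; returns NULL if the graph contains a cycle
topo_order <- function(m) {
  remaining <- rownames(m)
  ord <- character(0)
  while (length(remaining) > 0) {
    sub  <- m[remaining, remaining, drop = FALSE]
    free <- remaining[colSums(sub) == 0]
    if (length(free) == 0) return(NULL)
    ord <- c(ord, free[1])
    remaining <- setdiff(remaining, free[1])
  }
  ord
}

ord <- topo_order(m)

cat("violations in the original order:", count_violations(m, suffixes), "\n")
cat("reordered suffixes:", ord, "\n")
cat("violations after reordering:", count_violations(m, ord), "\n")
print(m[ord, ord])
```

When the directed graph is acyclic, as in this toy case, an ordering exists in which every attested combination lies above the diagonal. With a cycle, no such ordering exists and some violations necessarily remain.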