Using computer algorithms, scientists have finally proven what anyone who has ever learned English likely already knows: English spelling is far from ideal.
In the unruly Wild West of modern languages, English is indisputably the baddest outlaw around. Estimated to be three times more complex than German and 40 times worse than Spanish, English spelling is notorious for its irregularity—teeming with improbably silent letters, head-scratching homographs, and the mysteriously sometimes-y.
Despite the spelling system’s infamy for inefficiency, Noam Chomsky and Morris Halle asserted in their 1968 book The Sound Pattern of English that English orthography, or spelling, is “close to optimal.” Linguists have generally dismissed the claim ever since, but it had never been scientifically disproved.
“When we saw Chomsky’s statement, we didn’t think he was exactly right, and talking to linguists whenever I bring up this claim, they say ‘oh, nobody really takes that all that seriously,’” says Garrett Nicolai, a graduate student in the Department of Computing Science. Armed with degrees in both linguistics and computing science, Nicolai was uniquely qualified to put this long-standing contention to the test.
Taking down a giant
“The Sound Pattern of English is extremely influential; it’s probably one of the most important books in phonology that exists,” says associate professor Grzegorz Kondrak, Nicolai’s PhD supervisor and co-author of the study. “And Chomsky is arguably the most influential linguist in the world, so people pay a lot of attention to everything he says.”
To challenge the famous linguist, the researchers enlisted the help of DirecTL+, a unique transliteration program developed at the University of Alberta by alumnus Sittichai Jiampojamarn (now a software engineer at Google). By converting individual letters to phonemes (individual units of speech sounds), DirecTL+ is able to create an objective baseline of optimality using its function of predicting the pronunciation of a word based on its spelling.
With English, this was easier said than done. “We found that it’s a very difficult task,” says Kondrak. “Unlike other languages—Spanish, for example—there are no hard and fast rules in English for pronunciation. And that’s not only for people learning English as a second language—even the native speakers are often not sure how to pronounce words.”
Lead author Garrett Nicolai.
To capture the linguistic imagination
Chomsky’s theory is built on the idea that people possess a theoretical representation of language in their minds made up of abstract underlying word-forms. The final surface pronunciations are then generated through a complicated system of spelling rules (think of the silent e that gives a “kick” to the preceding vowel in words like cane and wine) that fine-tune the abstract form into the one we say aloud.
From this perspective, Chomsky posited that English orthography doesn’t necessarily need to reflect the exact pronunciation as long as the same base word form is spelled consistently across the board. For example, economy and economics share a common stem (econom[i/y]) that is spelled almost the same way but pronounced differently. However, if the orthography of each word were changed to purely reflect its pronunciation, that relationship—what Chomsky called “morphological faithfulness”—would be lost.
“A lot of other groups that say they can do better than English orthography are often only considering the pronunciation, or the sound aspect,” says Nicolai. “But there is this morphological aspect as well that Chomsky said needs to be preserved in an optimal orthography.”
To address this, Nicolai and Kondrak used an automatic alignment system to evaluate the spelling of a single base word in each of its derived forms and then calculate its orthographic consistency based on the changes that occurred in the different forms (economy, for example, is seven letters long, of which six are retained in economics, so the pair scores highly). They then used DirecTL+ to process data from this lexicon of approximately 51,000 word-forms to establish a base spelling to reflect the internal default representation for each.
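The consistency idea can be illustrated with a minimal sketch. This is not the researchers' alignment system: it substitutes Python's standard difflib matcher for their aligner, and the function names retained_letters and consistency are hypothetical, chosen only for this illustration. It counts how many letters of a base word survive, in order, in a derived form.

```python
from difflib import SequenceMatcher


def retained_letters(base: str, derived: str) -> int:
    """Count letters of `base` that survive, in order, in `derived`.

    Assumption: difflib's matching blocks stand in for a real
    letter-alignment system.
    """
    matcher = SequenceMatcher(None, base, derived, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def consistency(base: str, derived_forms: list[str]) -> float:
    """Average fraction of the base spelling retained across derived forms."""
    if not base or not derived_forms:
        raise ValueError("need a non-empty base and at least one derived form")
    fractions = [retained_letters(base, d) / len(base) for d in derived_forms]
    return sum(fractions) / len(fractions)


# "econom" is shared, so 6 of the 7 letters in "economy" are retained.
print(retained_letters("economy", "economics"))
```

A purely phonemic respelling would typically score lower on a measure like this, because the shared stem would be written differently in each word.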
“The problem is that when you want to find the underlying form, typically every linguist will give you a different answer. This was a challenge we faced in our approach, and that’s why we used a computer program,” says Kondrak. “We did not want to make any subjective judgments on the data, so everything we did was basically writing programs that were doing this for us, and these programs were based on objective principles.”
Confirming the worst
When compared with other proposed spelling systems, including a completely phonemic system, a completely morphemic system, and a few others proposed by advocacy groups for spelling reform, traditional English orthography was found to be not merely lacking: it was the farthest from optimal of all the systems tested.
“Nobody has done this before,” says Kondrak. “Nobody has been able to show computationally, in a principled way, that English orthography is very far from optimal.”
Given that English is currently the dominant medium of information exchange in the world, numerous spelling reform proposals have been put forward over the years, ranging from small changes affecting a limited set of words to complete overhauls. While the goal of this study was not to put forward yet another proposal to reform English orthography, Nicolai and Kondrak expect that it may lay the groundwork for optimality testing in other languages that do undergo periodic reforms, like Dutch and some other European languages.
The most satisfying takeaway, however, may be the collective feeling of validation for anyone who has failed an English spelling test—and maybe for Nicolai, having the last word on one of the most influential linguists of our time.
The study, “English orthography is not ‘close to optimal,’” was presented at a conference of the Association for Computational Linguistics.