Computing scientists at the University of Alberta are using artificial intelligence to decipher ancient manuscripts.
The mysterious text in the 15th-century Voynich manuscript has plagued historians and cryptographers since it resurfaced in the early 20th century. This centuries-old mystery made its way to the artificial intelligence community, where computing science professor Greg Kondrak was keen to lend his expertise in natural language processing to the search.
Kondrak and his graduate student Bradley Hauer set out to use computers to resolve ambiguities in human language, with the Voynich manuscript as a case study. Their first step was to identify the language of the underlying text, which is enciphered and exquisitely written on hundreds of delicate vellum pages with accompanying illustrations.
Kondrak and Hauer used the text of the “Universal Declaration of Human Rights” in 400 different languages as samples to systematically identify the language. The scientists initially hypothesized that the Voynich manuscript was written in Arabic. After they ran their algorithms, the most likely language turned out to be Hebrew.
“That was surprising,” said Kondrak. “And just saying ‘this is Hebrew’ is the first step. The next step is how do we decipher it.”
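One simple way to compare an enciphered text against known languages is to look at how unevenly its symbols are used, since a one-to-one substitution cipher changes the symbols but not the shape of their frequency distribution. The Python sketch below illustrates that general idea only; it is not the researchers' actual method, and the function names and the `samples` dictionary (standing in for the declaration's text in each language) are assumptions made for illustration.

```python
from collections import Counter


def frequency_profile(text):
    """Relative letter frequencies, sorted from most to least common.

    Sorting discards which symbol is which, so the profile is unchanged
    by a one-to-one substitution cipher.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return sorted((n / total for n in counts.values()), reverse=True)


def profile_distance(p, q):
    """Sum of absolute differences, padding the shorter profile with zeros."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


def rank_languages(ciphertext, samples):
    """Return language names ordered from closest to farthest profile.

    `samples` maps a language name to a sample text in that language
    (hypothetical input for this sketch).
    """
    target = frequency_profile(ciphertext)
    scored = [
        (profile_distance(target, frequency_profile(sample)), name)
        for name, sample in samples.items()
    ]
    return [name for _, name in sorted(scored)]
```

With real data, `rank_languages` would be called with the transcribed manuscript text and one sample per candidate language. A ranking like this only suggests candidates; it does not prove which language is correct.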
Kondrak and Hauer hypothesized the manuscript was created using alphagrams, in which the letters of each word are rearranged into alphabetical order, a scrambling that compounds the ambiguities already present in human language. On that assumption, they set out to devise an algorithm to decipher that type of scrambled text.
“It turned out that over 80 percent of the words were in a Hebrew dictionary, but we didn’t know if they made sense together,” said Kondrak.
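To make the idea concrete: an alphagram is a word with its letters sorted into alphabetical order, so different words can collapse into the same scrambled form. The Python sketch below shows how such forms could be matched against a word list and how a dictionary-coverage figure might be computed. It is an illustration under simplifying assumptions, not the published algorithm: it assumes the manuscript's symbols have already been mapped to letters, and the names `build_index`, `candidates` and `coverage` are hypothetical.

```python
from collections import defaultdict


def alphagram(word):
    """Return the letters of `word` sorted into alphabetical order."""
    return "".join(sorted(word))


def build_index(vocabulary):
    """Map each alphagram to every dictionary word that produces it."""
    index = defaultdict(list)
    for word in vocabulary:
        index[alphagram(word)].append(word)
    return index


def candidates(scrambled, index):
    """Dictionary words whose letters match the scrambled token."""
    return index.get(alphagram(scrambled), [])


def coverage(tokens, index):
    """Fraction of tokens that match at least one dictionary word."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if alphagram(token) in index)
    return hits / len(tokens)
```

For example, with the English word list `["listen", "silent"]`, the scrambled token `eilnst` matches both words, which is why a high coverage figure alone does not show that the recovered words make sense together.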
After unsuccessfully seeking Hebrew scholars to validate their findings, the scientists turned to Google Translate. “It came up with a sentence that is grammatical, and you can interpret it,” said Kondrak. “‘She made recommendations to the priest, man of the house and me and people.’ It’s a kind of strange sentence to start a manuscript but it definitely makes sense.”
Kondrak explained that without historians of ancient Hebrew, the full meaning of the Voynich manuscript will remain a mystery. He said he is looking forward to applying the algorithms he and Hauer developed to other ancient scripts.
A language aficionado, Kondrak is renowned for his work in natural language processing, a subfield of artificial intelligence concerned with helping computers understand human language.
“We use human language to communicate with other humans, but computers don’t understand this language, because it’s designed for people. There are so many ambiguous meanings that we don’t even realize,” said Kondrak. “Natural language processing helps computers make sense of human language. Not only do we want to talk to computers in our language because it’s easier and more convenient but also there is a lot of information that exists in the form of written word. Take the internet, for example.”
Kondrak and Hauer are part of the University of Alberta’s Department of Computing Science, which has an international reputation for excellence in artificial intelligence research.
“Decoding Anagrammed Texts Written in an Unknown Language and Script” appeared in Volume 4 of the Transactions of the Association for Computational Linguistics.