Real-life CSI: finding a voice in a haystack

    New forensic method identifies voices to help nab criminals

    By Bridget Stirling for Thought Box on November 2, 2015

    Crime shows make it all look easy. A camera zooms in to magnify a licence plate to hundreds of times its original size. A quick scan of a drop of blood and a database search exonerates an innocent man. Sherlock Holmes identifies a thief based on the specific dialect she speaks. A technician compares two audio files and the peaks magically line up. And it all happens in the space of just 60 minutes (45 if you account for commercial breaks).

    In the real world, forensic science — science used for legal evidence — looks very different. But what is real is that since the emergence of technologies such as DNA analysis in the mid-1990s, empirical, quantifiable data are increasingly affecting the outcome of court cases.

    One such type of evidence is forensic voice comparison, in which a forensic scientist helps a court identify who is speaking on an audio recording. At the Alberta Phonetics Laboratory in UAlberta’s Department of Linguistics, Geoffrey Stewart Morrison, ’06 PhD, is researching a more scientific and accurate way to do this.

    The Old Way

    Until now, methods of matching voice recordings have been somewhat inexact. An analyst listens to recordings of a suspect’s voice and a voice in an unidentified recording, then applies his or her own interpretation, sometimes supported by computer analysis, to compare the sound characteristics of the two. This approach relies largely on the subjective interpretation of expert witnesses and is highly susceptible to a phenomenon known as cognitive bias: unintentional errors in interpreting the data. In addition, the method is rarely tested under conditions that match those of the original recordings.

    The New Way

    Morrison’s technique relies on quantitative measurements, statistical models and a database of voice recordings. To use a visual comparison, imagine that police arrest a bank robbery suspect with blond hair. All the witnesses say the offender had blond hair. Two things need to be assessed: 1) the probability that the offender would have blond hair if he were the suspect, and 2) the probability that the offender would have blond hair if he were not the suspect. To assess the latter, you need to know how common blond hair is in the population in question, so you would need to take a statistical sample. Dividing the first probability by the second shows how much more likely the evidence is if the suspect is the offender than if he is not.
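    The blond-hair comparison can be worked through with a few lines of Python. This is only a sketch: the two probabilities are invented for illustration (assume every witness would report blond hair if the suspect were the offender, and assume 10 per cent of the relevant population is blond), and the function name is hypothetical rather than anything from Morrison’s lab.

```python
def likelihood_ratio(p_if_same_person, p_if_different_person):
    """Ratio of the probability of the evidence under each hypothesis."""
    if p_if_different_person <= 0:
        raise ValueError("population probability must be greater than zero")
    return p_if_same_person / p_if_different_person


# Assumed, illustrative values (not real data):
p_blond_if_suspect = 1.0   # the suspect is blond, so the evidence is expected
p_blond_if_other = 0.10    # assumed share of blond people in the population

lr = likelihood_ratio(p_blond_if_suspect, p_blond_if_other)
print(f"Blond hair is about {lr:.0f} times more likely if the suspect is the offender.")
```

    With these made-up numbers the ratio is about 10. If blond hair were far more common in the population, the ratio would shrink toward 1 and the evidence would count for much less.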

    How It Works

    So, let’s say there’s a suspect in a fraud case and police have a voicemail of the offender. To calculate the strength of evidence, one thing Morrison has to know is how likely it would be for the characteristics of the offender’s voice to occur in a specific population. He must also take into account how different recording environments and technologies will affect those voices. Then he uses statistical models to estimate the likelihood of getting the same acoustic properties on the voicemail if the words were spoken by the suspect versus if they were spoken by a person selected at random from the relevant population. He tests the validity and reliability of his system under the conditions of the case. In any given case, his lab collects a large number of voice samples representing a particular population under specific conditions.
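    The same idea can be sketched for a voice measurement. The example below is a deliberate simplification and not Morrison’s actual system: it assumes a single acoustic measurement, assumes that measurement follows a bell curve for both the suspect and the population, and uses invented numbers throughout. Real systems use many measurements and more elaborate models.

```python
from statistics import NormalDist


def voice_likelihood_ratio(offender_value, suspect_model, population_model):
    """Compare how well each model explains the measurement from the voicemail."""
    p_if_suspect = suspect_model.pdf(offender_value)
    p_if_random_person = population_model.pdf(offender_value)
    return p_if_suspect / p_if_random_person


# Assumed, illustrative models of one acoustic measurement (arbitrary units):
suspect_model = NormalDist(mu=120.0, sigma=5.0)       # from recordings of the suspect
population_model = NormalDist(mu=110.0, sigma=20.0)   # from the database of voices

offender_value = 118.0  # hypothetical measurement taken from the voicemail

lr = voice_likelihood_ratio(offender_value, suspect_model, population_model)
print(f"Likelihood ratio: {lr:.2f}")
```

    A ratio above 1 means the measurement is more likely if the suspect is the speaker; a ratio below 1 means it is more likely to have come from someone else in the population. The population model is where the database of voice samples, recorded under conditions like those of the case, does its work.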

    Some Tricky Things About Voices

    Did you know that time of day can affect a person’s voice? Many people sound different first thing in the morning than they do in the late afternoon. A person’s emotional state and health are also important: a chest cold or sore throat can alter voices a lot. The environment and the technology used to record the voice also influence the characteristics of the recording. Telephones, especially mobile phones, lose acoustic information in the process of transmitting, so there are simply not as many characteristics to analyze. Compressed recording formats such as MP3 also reduce the available acoustic data. On top of that, there can be background noise such as voices, traffic or ventilation systems, or reverberation if the recording is made in a room with hard walls. Comparing recordings involves taking into account how all these factors might alter the validity and reliability of a forensic voice comparison.

    How Is It Used in Real Life?

    One of Morrison’s current research projects has to do with the Saskatchewan robocall scandal. In order to perform a forensic analysis of the voice on the voicemail recordings in question, his lab is collecting voice samples of adult men who speak Canadian English. This pool of voice recordings will be used to determine the statistical probability of certain characteristics in the recordings under investigation.

    Whatever the type of forensic science — whether fingerprinting, DNA or voice analysis — Morrison emphasizes that the same principles apply. While real forensic science might not be as fast or glamorous as the TV version, the shift toward scientific rigour is helping courts make decisions based on the best evidence possible. Bet you won’t see that on CSI.
