Deep Learning for Sound Recognition

An interdisciplinary project that seeks to create machine learning algorithms for the automatic labelling, annotation, and tagging of audio files in digital fieldwork repositories, in support of audio search, browsing, retrieval, and big data statistical analysis.

At present, audio repositories comprising one-of-a-kind field recordings are unsearchable unless hand-prepared metadata (labels, tags, annotations) is added, a process that is exceedingly time-consuming because it must be carried out in real time. This project aims to combine machine learning techniques, specifically "deep learning" artificial neural networks (ANNs), with digital repositories of ethnomusicological field recordings and hand-coded metadata, in order to create algorithms capable of labelling, annotating, and tagging such recordings automatically. These algorithms will allow large numbers of assets to be labelled consistently across multiple repositories. They will also make it possible to transform atemporal labels (ordinary metadata, which describe a recording as a whole) into temporal labels (annotations, which describe particular time spans), by applying the algorithms to sliding windows across the audio signal. The ability to label large repositories of field recordings automatically will revolutionize the way such repositories are used in ethnomusicological and anthropological research. The same algorithms are being prepared for other kinds of field recording collections (and disciplines), including natural soundscapes (bioacoustics), phonetic data (linguistics), and free improvisation (experimental music composition). If effective deep learning algorithms can be developed, they will not only help revolutionize music information retrieval (MIR), allowing researchers to locate tracks, or even particular track segments, that meet particular auditory conditions, but will also allow "big data" correlations to be computed across vast swaths of media, so that researchers can formulate and test new hypotheses about the relation between music and culture, between speech and individual identity, or between the human and the animal soundscape. At the same time, this experimental research will, we hope, lead to new breakthroughs in the field of machine learning itself.
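The sliding-window idea mentioned above, turning a clip-level label into time-stamped annotations, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the function names (`sliding_window_annotate`, `merge_adjacent`, `energy_classifier`), the window and hop lengths, and the energy threshold are all hypothetical, and the simple loudness rule merely stands in for a trained neural network.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One temporal label: (start time in seconds, end time in seconds, label).
Annotation = Tuple[float, float, str]


def sliding_window_annotate(
    samples: Sequence[float],
    sample_rate: int,
    classify: Callable[[Sequence[float]], str],
    window_sec: float = 1.0,
    hop_sec: float = 0.5,
) -> List[Annotation]:
    """Apply a clip-level classifier to successive windows of a signal."""
    window = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window_sec and hop_sec must span at least one sample")
    if len(samples) == 0:
        return []
    annotations: List[Annotation] = []
    # A signal shorter than one window is still classified once.
    last_start = max(len(samples) - window, 0)
    for start in range(0, last_start + 1, hop):
        segment = samples[start:start + window]
        end = start + len(segment)
        annotations.append(
            (start / sample_rate, end / sample_rate, classify(segment))
        )
    return annotations


def merge_adjacent(annotations: List[Annotation]) -> List[Annotation]:
    """Fuse consecutive windows that share a label into one time span."""
    merged: List[Annotation] = []
    for start, end, label in annotations:
        if merged and merged[-1][2] == label and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


def energy_classifier(segment: Sequence[float], threshold: float = 0.1) -> str:
    """Hypothetical stand-in for a trained network: label a window by loudness."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    return "sound" if rms > threshold else "silence"
```

In this sketch the classifier only ever sees one window at a time, so any model trained on whole-recording (atemporal) labels could in principle be substituted for `energy_classifier`; how well such labels transfer to short windows is an empirical question the sketch does not address.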