You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection

Venkatesh, S; Moffat, D; Miranda, E. 2022. You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection. Applied Sciences, 12 (7). 3293. https://doi.org/10.3390/app12073293

applsci-12-03293-v2.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: http://dx.doi.org/10.3390/app12073293

Abstract/Summary

Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification, a technique that divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in computer vision. Instead of frame-based classification, we convert the detection of acoustic boundaries into a regression problem. This is done by having separate output neurons to detect the presence of an audio class and to predict its start and end points. The relative improvement in F-measure of YOHO over the state-of-the-art Convolutional Recurrent Neural Network ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than with segmentation-by-classification. In addition, as this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.
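
The regression formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical output layout in which each time step holds, for every class, a triple of presence, start, and end, with start and end expressed as fractions of the step length. The function names, the 0.5 threshold, and the merging rule are likewise assumptions made for illustration only.

```python
def decode_events(pred, class_names, step_s, threshold=0.5):
    """Turn per-step (presence, start, end) triples into timed events.

    pred: nested list indexed as pred[step][class] -> (presence, start, end).
          This layout is an assumption; start and end are taken to be
          fractions (0..1) of the step duration.
    step_s: duration of one output time step in seconds (assumed value).
    """
    events = []
    for t, step in enumerate(pred):
        offset = t * step_s  # absolute start time of this output step
        for c, name in enumerate(class_names):
            presence, start, end = step[c]
            # Keep the event only if the presence neuron fires and the
            # regressed boundaries describe a non-empty interval.
            if presence >= threshold and end > start:
                events.append((name, offset + start * step_s,
                               offset + end * step_s))
    return events


def merge_events(events, gap_s=0.0):
    """Join same-class events that touch or lie within gap_s seconds."""
    merged = []
    for name, start, end in sorted(events):
        prev = merged[-1] if merged else None
        if prev and prev[0] == name and start - prev[2] <= gap_s:
            merged[-1] = (name, prev[1], max(prev[2], end))
        else:
            merged.append((name, start, end))
    return merged
```

Because boundaries are predicted directly, the only post-processing in this sketch is a single pass that merges adjacent events, rather than smoothing a long sequence of per-frame labels.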

Item Type: Publication - Article
Additional Keywords: audio segmentation; sound event detection; you only look once; deep learning; regression; convolutional neural network; music-speech detection; convolutional recurrent neural network; radio
Divisions: Plymouth Marine Laboratory > Other (PML)
Depositing User: S Hawkins
Date made live: 04 Apr 2022 14:23
Last Modified: 04 Apr 2022 14:23
URI: https://plymsea.ac.uk/id/eprint/9660
