This document describes a computer vision approach to audio enhancement that removes unwanted noises from recordings. The approach applies object detection techniques to spectrograms of audio clips: the user mimics the unwanted noise, and occurrences of that noise are then detected as "objects" in the spectrogram using HOG (histogram of oriented gradients) features and a classifier. Multiple techniques are evaluated for scanning, feature extraction, classification, and detecting multiple objects. Results show the approach can remove noises effectively, though it may struggle with similar-sounding noises and can produce incomplete detections.
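The detection step can be illustrated with a minimal sketch. It is not the document's implementation: the spectrogram is a synthetic grid of numbers, the HOG descriptor is simplified (per-cell orientation histograms with one global L2 normalisation rather than block normalisation), the scan slides along the time axis only, and a cosine-similarity threshold stands in for the trained classifier. The names `hog`, `detect`, `cell`, `n_bins`, and `threshold` are hypothetical.

```python
import math

def hog(patch, cell=4, n_bins=8):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude, concatenated and L2-normalised as a whole."""
    rows, cols = len(patch), len(patch[0])
    cells_r, cells_c = rows // cell, cols // cell
    desc = [0.0] * (cells_r * cells_c * n_bins)
    for r in range(cells_r * cell):
        for c in range(cells_c * cell):
            # Central differences, clamped at the patch border.
            gx = patch[r][min(c + 1, cols - 1)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, rows - 1)][c] - patch[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # fold orientation into [0, pi]
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            desc[((r // cell) * cells_c + c // cell) * n_bins + b] += mag
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]

def detect(spec, template, threshold=0.8):
    """Slide a template-wide window along the time axis and return
    (start_column, score) for every window scoring at or above threshold.
    Assumption: cosine similarity replaces a trained classifier."""
    width = len(template[0])
    target = hog(template)
    hits = []
    for start in range(len(spec[0]) - width + 1):
        window = [row[start:start + width] for row in spec]
        score = sum(a * b for a, b in zip(hog(window), target))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Synthetic spectrogram: 12 frequency bins (rows) by 36 time frames (columns).
rows, cols = 12, 36
spec = [[0.0] * cols for _ in range(rows)]
for c in range(2, 10):   # a steady tone: a different "object" that should not match
    spec[6][c] = 1.0
for i in range(8):       # the unwanted noise: a diagonal streak from column 20
    spec[2 + i][20 + i] = 1.0

# Template standing in for the user's mimicked noise, streak offset by 2 columns.
template = [[0.0] * 12 for _ in range(rows)]
for i in range(8):
    template[2 + i][2 + i] = 1.0

hits = detect(spec, template)
best_start, best_score = max(hits, key=lambda h: h[1])
print(best_start, round(best_score, 3))
```

Neighbouring windows may also pass the threshold, which is one reason a real pipeline needs a strategy for merging overlapping detections. A practical version would compute the spectrogram from audio and use a library HOG implementation and a trained classifier instead of these hand-rolled stand-ins.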