2025 Volume E108.D Issue 7 Pages 845-848
Recent deep learning studies have shown great advantages in acoustic echo cancellation (AEC) owing to the strong non-linear fitting capability of neural networks. However, most AEC models are based on the convolution recurrent network (CRN) architecture and use stacked convolution layers as the encoder to extract latent representations, without considering the misalignment between the reference and echo signals. Furthermore, masking-based filtering disregards inter-spectral correlation patterns and harmonic characteristics. In this paper, we propose an AEC approach called the multi-scale dual path convolution recurrent network with deep filtering block (DPDF-AEC). A multi-scale encoder captures complex patterns and time dependencies between the reference and microphone signals. After the masking stage, a post-deep filtering block is introduced, incorporating spectrum patterns to further reduce residual echo. We conduct comprehensive ablation experiments to validate the effectiveness of each component in DPDF, and the results indicate that our model outperforms the AEC Challenge baseline in terms of the Echo-MOS metrics.
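To illustrate the contrast the abstract draws between masking and deep filtering, the following is a minimal sketch, not the DPDF-AEC implementation. It assumes the generic form of deep filtering in which each output time-frequency bin is a weighted sum of the current and previous frames at the same frequency; the abstract does not specify the block's actual design. The function names `apply_mask` and `deep_filter`, the list-of-lists spectrogram layout, and the tap ordering are all hypothetical choices made for illustration.

```python
def apply_mask(spec, mask):
    """Masking: scale every time-frequency bin by a single complex gain.

    spec and mask are lists of frames; each frame is a list of complex bins.
    """
    return [[m * x for m, x in zip(mask_row, spec_row)]
            for mask_row, spec_row in zip(mask, spec)]


def deep_filter(spec, coefs):
    """Deep filtering over time (assumed generic form).

    coefs[t][f] is a list of complex taps; tap i weights frame t - i at
    frequency bin f. Frames before the start of the signal count as zero.
    """
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [[0j] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bins):
            acc = 0j
            for i, tap in enumerate(coefs[t][f]):
                if t - i >= 0:
                    acc += tap * spec[t - i][f]
            out[t][f] = acc
    return out


# Toy spectrogram: 2 frames, 2 frequency bins.
spec = [[1 + 0j, 2 + 0j],
        [3 + 0j, 4 + 0j]]

# With a single tap per bin, deep filtering reduces to masking.
mask = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
one_tap = [[[g] for g in row] for row in mask]
masked = apply_mask(spec, mask)
filtered_one_tap = deep_filter(spec, one_tap)

# With two taps, each output bin also draws on the previous frame.
two_tap = [[[1 + 0j, -1 + 0j] for _ in range(2)] for _ in range(2)]
filtered_two_tap = deep_filter(spec, two_tap)
```

With one tap per bin the two functions give the same result, which shows masking as the special case; additional taps let the output use neighbouring frames, which is the kind of context a single per-bin gain cannot exploit.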