


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 28, 2020
- Jamal Amini, Richard Christian Hendriks, Richard Heusdens, Meng Guo, Jesper Jensen:
Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. 1-12
- Chitralekha Gupta, Haizhou Li, Ye Wang:
Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference. 13-26
- Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan:
Noise-Resilient Training Method for Face Landmark Generation From Speech. 27-38
- Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling. 39-48
- Yuki Mitsufuji, Stefan Uhlich, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain. 49-60
- Yaron Laufer, Sharon Gannot:
Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field. 61-76
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Online Estimation of Reverberation Parameters for Late Residual Echo Suppression. 77-91
- Mehdi Zohourian, Rainer Martin:
Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation. 92-104
- Youhyun Shin, Sang-goo Lee:
Learning Context Using Segment-Level LSTM for Neural Sequence Labeling. 105-115
- Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of Planar Differential Microphone Arrays With Fractional Orders. 116-130
- Ming-Hsiang Su, Chung-Hsien Wu, Liang-Yu Chen:
Attention-Based Response Generation Using Parallel Double Q-Learning for Dialog Policy Decision in a Conversational System. 131-143
- Satoru Emura:
Wave-Domain Residual Echo Reduction Using Subspace Tracking. 144-156
- Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda:
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis. 157-170
- Falk-Martin Hoffmann, Philip Arthur Nelson, Filippo Maria Fazi:
DOA Estimation Performance With Circular Arrays in Sound Fields With Finite Rate of Innovation. 171-184
- Rongfeng Su, Xunying Liu, Lan Wang, Jingzhou Yang:
Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition. 185-197
- Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès, Renato De Mori:
Real to H-Space Autoencoders for Theme Identification in Telephone Conversations. 198-210
- Antonio Canclini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Methodology for the Robust Estimation of the Radiation Pattern of Acoustic Sources. 211-224
- Yi Yu, Hongsen He, Badong Chen, Jianghui Li, Youwen Zhang, Lu Lu:
M-Estimate Based Normalized Subband Adaptive Filter Algorithm: Performance Analysis and Improvements. 225-239
- Haoxiang Wen, Senquan Yang, Yuanquan Hong, Huan Luo:
A Partial Update Adaptive Algorithm for Sparse System Identification. 240-255
- Martin Bo Møller, Jan Østergaard:
A Moving Horizon Framework for Sound Zones. 256-265
- Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller:
Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation. 266-278
- Lachlan Birnie, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework. 279-293
- Wenhao Ding, Liang He:
Adaptive Multi-Scale Detection of Acoustic Events. 294-306
- Weijian Zhang, Peng Song:
Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition. 307-318
- Bidisha Sharma, Ye Wang:
Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features. 319-331
- Hai Morgenstern, Boaz Rafaely:
Perceptually-Transparent Online Estimation of Two-Channel Room Transfer Function for Sound Calibration. 332-342
- Shaojin Ding, Guanlong Zhao, Christopher Liberatore, Ricardo Gutierrez-Osuna:
Learning Structured Sparse Representations for Voice Conversion. 343-354
- Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký:
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors. 355-368
- Jia-Chen Gu, Zhen-Hua Ling, Quan Liu:
Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. 369-379
- Ke Tan, DeLiang Wang:
Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement. 380-390
- Richeng Duan, Tatsuya Kawahara, Masatake Dantsuji, Hiroaki Nanjo:
Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis. 391-401
- Xin Wang, Shinji Takaki, Junichi Yamagishi:
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. 402-415
- Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard:
Weakly Supervised Representation Learning for Audio-Visual Scene Analysis. 416-428
- Jianfei Yu, Jing Jiang, Rui Xia:
Entity-Sensitive Attention and Fusion Network for Entity-Level Multimodal Sentiment Classification. 429-439
- John G. Beerends, Niels M. P. Neumann, Egon L. van den Broek, Anna Llagostera Casanovas, Jovana Torres Menendez, Christian Schmidmer, Jens Berger:
Subjective and Objective Assessment of Full Bandwidth Speech Quality. 440-449
- Vikram C. Mathad, S. R. Mahadeva Prasanna:
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech. 450-460
- Minh Nguyen, Gia H. Ngo, Nancy F. Chen:
Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks. 461-473
- Dani Cherkassky, Sharon Gannot:
Successive Relative Transfer Function Identification Using Blind Oblique Projection. 474-486
- Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer:
Joining Sound Event Detection and Localization Through Spatial Segregation. 487-502
- Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation. 503-518
- Hamzeh Ghasemzadeh, Meisam Khalil Arjmandi:
Toward Optimum Quantification of Pathology-Induced Noises: An Investigation of Information Missed by Human Auditory System. 519-528
- Fei Ma, Wen Zhang, Thushara Dheemantha Abhayapala:
Active Control of Outgoing Broadband Noise Fields in Rooms. 529-539
- Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations. 540-552
- Tao Dai, Li Zhu, Yaxiong Wang, Kathleen M. Carley:
Attentive Stacked Denoising Autoencoder With Bi-LSTM for Personalized Context-Aware Citation Recommendation. 553-568
- Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura:
Multi-Source Neural Machine Translation With Missing Data. 569-580
- Jin Wang, Liang-Chih Yu, K. Robert Lai, Xuejie Zhang:
Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis. 581-591
- Abul Azad, Lamine Mili:
Robust Speech Filter and Voice Encoder Parameter Estimation Using the Phase-Phase Correlator. 592-604
- Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield. 605-618
- Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot:
ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field. 619-634
- Zhongqing Wang, Qingying Sun, Shoushan Li, Qiaoming Zhu, Guodong Zhou:
Neural Stance Detection With Hierarchical Linguistic Representations. 635-645
- Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. 646-655
- Yu Maeno, Yuki Mitsufuji, Prasanga N. Samarasinghe, Naoki Murata, Thushara D. Abhayapala:
Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays. 656-670
- Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao:
A Joint Sentence Scoring and Selection Framework for Neural Extractive Document Summarization. 671-681
- Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee:
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition. 682-695
- Shoichi Koyama, Gilles Chardon, Laurent Daudet:
Optimizing Source and Sensor Placement for Sound Field Control: An Overview. 696-714
- Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda:
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model. 715-728
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Integrated Sidelobe Cancellation and Linear Prediction Kalman Filter for Joint Multi-Microphone Speech Dereverberation, Interfering Speech Cancellation, and Noise Reduction. 740-754
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Square Root-Based Multi-Source Early PSD Estimation and Recursive RETF Update in Reverberant Environments by Means of the Orthogonal Procrustes Problem. 755-769
- Liwen Zhang, Ziqiang Shi, Jiqing Han:
Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification. 770-784
- Mengfan Zhang, Zhongshu Ge, Tiejun Liu, Xihong Wu, Tianshu Qu:
Modeling of Individual HRTFs Based on Spatial Principal Component Analysis. 785-797
- Laureano Moro-Velázquez, Estefanía Hernández-García, Jorge Andrés Gómez García, Juan Ignacio Godino-Llorente, Najim Dehak:
Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance. 798-812
- Yijia Liu, Wanxiang Che, Bing Qin, Ting Liu:
Exploring Segment Representations for Neural Semi-Markov Conditional Random Fields. 813-824
- Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen:
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. 825-838
- Yang Ai, Zhen-Hua Ling:
A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis. 839-851
- Dongyan Yu, Huiping Duan, Jun Fang, Bing Zeng:
Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification. 852-861
- Ali Aroudi, Simon Doclo:
Cognitive-Driven Binaural Beamforming Using EEG-Based Auditory Attention Decoding. 862-875
- Christopher Gribben, Hyunkook Lee:
The Perception of Band-Limited Decorrelation Between Vertically Oriented Loudspeakers. 876-888
- Olivier Perrotin, Ian Vince McLoughlin:
Glottal Flow Synthesis for Whisper-to-Speech Conversion. 889-900
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
Differential Beamforming on Graphs. 901-913
- Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot:
Global and Local Simplex Representations for Multichannel Source Separation. 914-928
- Henning F. Schepker, Sven Nordholm, Simon Doclo:
Acoustic Feedback Suppression for Multi-Microphone Hearing Devices Using a Soft-Constrained Null-Steering Beamformer. 929-940
- Zhong-Qiu Wang, DeLiang Wang:
Deep Learning Based Target Cancellation for Speech Dereverberation. 941-950
- Yeongseok Kim, Youngjin Park:
Blockwise Weighted Least Square Active Noise Control for CPU-GPU Architecture. 951-963
- Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller:
Speech Technology for Unwritten Languages. 964-975
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Machine Speech Chain. 976-989
- M. Khadem-hosseini, Shahrokh Ghaemmaghami, Azra Abtahi, Saeed Gazor, Farrokh Marvasti:
Error Correction in Pitch Detection Using a Deep Learning Based Classification. 990-999
- Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot:
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction. 1000-1015
- Vera Erbes, Sascha Spors:
Localisation Properties of Wave Field Synthesis in a Listening Room. 1016-1024
- Jia Pan, Genshun Wan, Jun Du, Zhongfu Ye:
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition. 1025-1037
- Weicheng Cai, Jinkun Chen, Jun Zhang, Ming Li:
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. 1038-1051
- George Sterpu, Christian Saam, Naomi Harte:
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition. 1052-1064
- Christopher Schymura, Dorothea Kolossa:
Audiovisual Speaker Tracking Using Nonlinear Dynamical Systems With Dynamic Stream Weights. 1065-1078
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
A Simple Theory and New Method of Differential Beamforming With Uniform Linear Microphone Arrays. 1079-1093
- Chung-Ying Ho, Kuo-Kai Shyu, Cheng-Yuan Chang, Sen M. Kuo:
Efficient Narrowband Noise Cancellation System Using Adaptive Line Enhancer. 1094-1103
- Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii:
A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement. 1104-1117
- Beat Gfeller, Christian Havnø Frank, Dominik Roblek, Matthew Sharifi, Marco Tagliasacchi, Mihajlo Velimirovic:
SPICE: Self-Supervised Pitch Estimation. 1118-1128
- Christoph Urbanietz, Gerald Enzner:
Direct Spatial-Fourier Regression of HRIRs from Multi-Elevation Continuous-Azimuth Recordings. 1129-1142
- Yaakov Buchris, Israel Cohen, Jacob Benesty, Alon Amar:
Joint Sparse Concentric Array Design for Frequency and Rotationally Invariant Beampattern. 1143-1158
- Tharindu Fernando, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, Clinton Fookes:
Temporarily-Aware Context Modeling Using Generative Adversarial Networks for Speech Activity Detection. 1159-1169
- Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao:
Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement. 1170-1182
- Qiaoling Zhang, WeiQiang Xu, Weiwei Zhang, Jie Feng, Zhiyong Chen:
Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. 1183-1197
- Yinhe Zheng, Guanyi Chen, Minlie Huang:
Out-of-Domain Detection for Natural Language Understanding in Dialog Systems. 1198-1209
- Ina Kodrasi, Hervé Bourlard:
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection. 1210-1222
- Bharat Padi, Anand Mohan, Sriram Ganapathy:
Towards Relevance and Sequence Modeling in Language Recognition. 1223-1232
- Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen:
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices. 1233-1247
- Vishnuvardhan Varanasi, Harshit Gupta, Rajesh M. Hegde:
A Deep Learning Framework for Robust DOA Estimation Using Spherical Harmonic Decomposition. 1248-1259
- Sahar Hashemgeloogerdi, Mark F. Bocko:
Adaptive Feedback Cancellation in Hearing Aids Based on Orthonormal Basis Functions With Prediction-Error Method Based Prewhitening. 1260-1269
- Maximo Cobos, Fabio Antonacci, Luca Comanducci, Augusto Sarti:
Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach. 1270-1281
- Yingying Zhu, Haiquan Zhao, Xiangping Zeng, Badong Chen:
Robust Generalized Maximum Correntropy Criterion Algorithms for Active Noise Control. 1282-1292
- Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang:
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement. 1293-1302
- Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu:
End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. 1303-1314
- T. Lavanya, T. Nagarajan, P. Vijayalakshmi:
Multi-Level Single-Channel Speech Enhancement Using a Unified Framework for Estimating Magnitude and Phase Spectra. 1315-1327
- Adrien Ycart, Emmanouil Benetos:
Learning and Evaluation Methodologies for Polyphonic Music Sequence Prediction With LSTMs. 1328-1341
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs. 1342-1355
- Huanyu Zuo, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Intensity Based Spatial Soundfield Reproduction Using an Irregular Loudspeaker Array. 1356-1369
- Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
SpEx: Multi-Scale Time Domain Speaker Extraction Network. 1370-1384
- Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe:
Improving End-to-End Single-Channel Multi-Talker Speech Recognition. 1385-1394
- Alakananda Vempala, Eduardo Blanco:
Extracting Biographical Spatial Timelines: Corpus and Experiments. 1395-1403
- Qiquan Zhang, Aaron Nicolson, Mingjiang Wang, Kuldip K. Paliwal, Chenxu Wang:
DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation. 1404-1415
- Dhananjay Ram, Lesly Miculicich, Hervé Bourlard:
Neural Network Based End-to-End Query by Example Spoken Term Detection. 1416-1427
- Enea Ceolini, Ilya Kiselev, Shih-Chii Liu:
Evaluating Multi-Channel Multi-Device Speech Separation Algorithms in the Wild: A Hardware-Software Solution. 1428-1439
- Su Zhu, Zijian Zhao, Rao Ma, Kai Yu:
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding. 1440-1451
- Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan:
Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture. 1452-1465
- Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian:
Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection. 1466-1478
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen:
Feedforward Selective Fixed-Filter Active Noise Control: Algorithm and Implementation. 1479-1492
- Zhihao Du, Xueliang Zhang, Jiqing Han:
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement. 1493-1505
- Yue Zhang, Yile Wang, Jie Yang:
Lattice LSTM for Chinese Sentence Representation. 1506-1519
- Zhuo Tang, Boyan Wan, Li Yang:
Word-Character Graph Convolution Network for Chinese Named Entity Recognition. 1520-1532
- Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen:
Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning. 1533-1548
- Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha:
Speech/Music Classification Using Features From Spectral Peaks. 1549-1559
- Liming Wang, Mark Hasegawa-Johnson:
Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts. 1560-1573
- Yang Fan, Fei Tian, Yingce Xia, Tao Qin, Xiang-Yang Li, Tie-Yan Liu:
Searching Better Architectures for Neural Machine Translation. 1574-1585
- Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, Muyun Yang, Hai Zhao:
Towards More Diverse Input Representation for Neural Machine Translation. 1586-1597
- Yan Zhao, DeLiang Wang, Buye Xu, Tao Zhang:
Monaural Speech Dereverberation Using Temporal Convolutional Networks With Self Attention. 1598-1607
- Yanhui Tu, Jun Du, Tian Gao, Chin-Hui Lee:
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement. 1608-1619
- Christine Evers, Heinrich W. Löllmann, Heinrich Mellmann, Alexander Schmidt, Hendrik Barfuss, Patrick A. Naylor, Walter Kellermann:
The LOCATA Challenge: Acoustic Source Localization and Tracking. 1620-1643
- Hiroaki Tsushima, Eita Nakamura, Kazuyoshi Yoshii:
Bayesian Melody Harmonization Based on a Tree-Structured Generative Model of Chord Sequences and Melodies. 1644-1655
- Keunhyoung Luke Kim, Jongpil Lee, Sangeun Kum, Chae Lin Park, Juhan Nam:
Semantic Tagging of Singing Voices in Popular Music Recordings. 1656-1668
- Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang:
Incorporating Sememes into Chinese Definition Modeling. 1669-1677
- Ryo Nishikimi, Eita Nakamura, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii:
Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories. 1678-1691
- Byeongho Jo, Franz Zotter, Jung-Woo Choi:
Extended Vector-Based EB-ESPRIT Method. 1692-1705
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Corrections to "Machine Speech Chain". 1706
- Zaixiang Zheng, Shujian Huang, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen:
Improving Self-Attention Networks With Sequential Relations. 1707-1716
- Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard:
Automatic Pathological Speech Intelligibility Assessment Exploiting Subspace-Based Analyses. 1717-1728
- Cagdas Tuna, Antonio Canclini, Federico Borra, Philipp Götz, Fabio Antonacci, Andreas Walther, Augusto Sarti, Emanuël A. P. Habets:
3D Room Geometry Inference Using a Linear Loudspeaker Array and a Single Microphone. 1729-1744
- Xianjun Xia, Roberto Togneri, Ferdous Sohel, Yuanjun Zhao, Defeng David Huang:
Sound Event Detection Using Multiple Optimized Kernels. 1745-1754
- Federico Borra, Alberto Bernardini, Fabio Antonacci, Augusto Sarti:
Efficient Implementations of First-Order Steerable Differential Microphone Arrays With Arbitrary Planar Geometry. 1755-1766
- Moti Lugasi, Boaz Rafaely:
Speech Enhancement Using Masking for Binaural Reproduction of Ambisonics Signals. 1767-1777
- Zhong-Qiu Wang, Peidong Wang, DeLiang Wang:
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR. 1778-1787
- Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud:
Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders. 1788-1800
- Kai Song, Xiaoqing Zhou, Heng Yu, Zhongqiang Huang, Yue Zhang, Weihua Luo, Xiangyu Duan, Min Zhang:
Towards Better Word Alignment in Transformer. 1801-1812
- Lian Huang, Chi-Man Pun:
Audio Replay Spoof Attack Detection by Joint Segment-Based Linear Filter Bank Feature Extraction and Attention-Enhanced DenseNet-BiLSTM Network. 1813-1825
- Yang Xiang, Changchun Bao:
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network. 1826-1838
- Hao Fei, Donghong Ji, Yue Zhang, Yafeng Ren:
Topic-Enhanced Capsule Network for Multi-Label Emotion Classification. 1839-1848
- Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo:
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion. 1849-1863
- Huayang Li, Guoping Huang, Deng Cai, Lemao Liu:
Neural Machine Translation With Noisy Lexical Constraints. 1864-1874
- Chien-Yao Wang, Tzu-Chiang Tai, Jia-Ching Wang, Andri Santoso, Seksan Mathulaprangsan, Chin-Chin Chiang, Chung-Hsien Wu:
Sound Events Recognition and Retrieval Using Multi-Convolutional-Channel Sparse Coding Convolutional Neural Networks. 1875-1887
- Chang-Le Liu, Sze-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao:
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks. 1888-1900
- Dhananjaya N. Gowda, Sudarsana Reddy Kadiri, Brad H. Story, Paavo Alku:
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals. 1901-1914
- Sebastian J. Schlecht, Emanuël A. P. Habets:
Scattering in Feedback Delay Networks. 1915-1924
- Irene Martín-Morató, Maximo Cobos, Francesc J. Ferri:
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification. 1925-1935
- Su Zhu, Ruisheng Cao, Kai Yu:
Dual Learning for Semi-Supervised Natural Language Understanding. 1936-1947
- Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution. 1948-1963
- Vinayak Abrol, Pulkit Sharma:
Learning Hierarchy Aware Embedding From Raw Audio for Acoustic Scene Classification. 1964-1973
- Daniele Mirabilii, Emanuël A. P. Habets:
Spatial Coherence-Aware Multi-Channel Wind Noise Reduction. 1974-1987
- Yougen Yuan, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Bin Ma:
Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings. 1988-2000
- Daniele Salvati, Carlo Drioli, Gian Luca Foresti:
Diagonal Unloading Beamforming in the Spherical Harmonic Domain for Acoustic Source Localization in Reverberant Environments. 2001-2012
- Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien:
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification. 2013-2024
- Thomas Sgouros, Nikolaos Mitianoudis:
A Novel Directional Framework for Source Counting and Source Separation in Instantaneous Underdetermined Audio Mixtures. 2025-2035
- Kenta Niwa, Hironobu Chiba, Noboru Harada, Guoqiang Zhang, W. Bastiaan Kleijn:
Microphone Array Wiener Post Filtering Using Monotone Operator Splitting. 2036-2046
- Hui Luo, Jiqing Han:
Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition. 2047-2060
- Ming-Hsiang Su, Chung-Hsien Wu, Hao-Tse Cheng:
A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization. 2061-2072
- Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng:
Audio Tagging by Cross Filtering Noisy Labels. 2073-2083
- Sangeeta Bagha, Debi Prasad Das, Santosh Kumar Behera:
An Efficient Narrowband Active Noise Control System for Accommodating Frequency Mismatch. 2084-2094
- Weiwei Zhang, Zhe Chen, Fuliang Yin:
Multi-Pitch Estimation of Polyphonic Music Based on Pseudo Two-Dimensional Spectrum. 2095-2108
- Yuzhou Liu, DeLiang Wang:
Causal Deep CASA for Monaural Talker-Independent Speaker Separation. 2109-2118
- Huanyu Zuo, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Particle Velocity Assisted Three Dimensional Sound Field Reproduction Using a Modal-Domain Approach. 2119-2133
- Shun Kiyono, Jun Suzuki, Tomoya Mizumoto, Kentaro Inui:
Massive Exploration of Pseudo Data for Grammatical Error Correction. 2134-2145
- Bin Wang, C.-C. Jay Kuo:
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-Based Word Models. 2146-2157
- Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert:
Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise. 2158-2173
- Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model. 2174-2183
- Hanan Beit-On, Boaz Rafaely:
Focusing and Frequency Smoothing for Arbitrary Arrays With Application to Speaker Localization. 2184-2193
- Christoph Pörschmann, Johannes M. Arend, Fabian Brinkmann:
Correction to "Directional Equalization of Sparse Head-Related Transfer Function Sets for Spatial Upsampling". 2194
- Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds:
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals. 2195-2210
- Sheng-Hua Zhong, Peiqi Liu, Zhong Ming, Yan Liu:
How to Evaluate Single-Round Dialogues Like Humans: An Information-Oriented Metric. 2211-2223
- Wilmer Lobato, Márcio Holsbach Costa:
Worst-Case-Optimization Robust-MVDR Beamformer for Stereo Noise Reduction in Hearing Aids. 2224-2237
- Luca Comanducci, Federico Borra, Paolo Bestagini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform. 2238-2251
- Feiran Yang, Jianfeng Guo, Jun Yang:
Stochastic Analysis of the Filtered-x LMS Algorithm for Active Noise Control. 2252-2266
- Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach:
Jointly Optimal Denoising, Dereverberation, and Source Separation. 2267-2282
- Haytham M. Fayek, Justin Johnson:
Temporal Reasoning via Audio Question Answering. 2283-2294
- Amulya Gupta, Zhu (Drew) Zhang:
Swings and Roundabouts: Attention-Structure Interaction Effect in Deep Semantic Matching. 2295-2307
- Chang Huai You, Jichen Yang:
Device Feature Extraction Based on Parallel Neural Network Training for Replay Spoofing Detection. 2308-2318
- Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty:
Learning Document Embeddings Along With Their Uncertainties. 2319-2332
- Mirco Pezzoli, Federico Borra, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Parametric Approach to Virtual Miking for Sources of Arbitrary Directivity. 2333-2348
- Shengbei Wang, Weitao Yuan, Masashi Unoki:
Multi-Subspace Echo Hiding Based on Time-Frequency Similarities of Audio Signals. 2349-2363
- Yujia Qin, Fanchao Qi, Sicong Ouyang, Zhiyuan Liu, Cheng Yang, Yasheng Wang, Qun Liu, Maosong Sun:
Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes. 2364-2373
- Yuchen Dong, Jie Chen, Wen Zhang:
Distributed Wave-Domain Active Noise Control Based on the Diffusion Adaptation. 2374-2385
- Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux:
Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision. 2386-2399
- Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu:
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. 2400-2411
- Taewoong Lee, Jesper Kjær Nielsen, Mads Græsbøll Christensen:
Signal-Adaptive and Perceptually Optimized Sound Zones With Variable Span Trade-Off Filters. 2412-2426
- Hao Fei, Meishan Zhang, Fei Li, Donghong Ji:
Cross-Lingual Semantic Role Labeling With Model Transfer. 2427-2437
- Kai Yu, Rao Ma, Kaiyu Shi, Qi Liu:
Neural Network Language Model Compression With Product Quantization and Soft Binarization. 2438-2449
- Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley:
Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization. 2450-2460
- Adrian Herzog, Emanuël A. P. Habets:
Direction and Reverberation Preserving Noise Reduction of Ambisonics Signals. 2461-2475
- Yu Wang, Yun Li, Ziye Zhu, Hanghang Tong, Yue Huang:
Adversarial Learning for Multi-Task Sequence Labeling With Attention Mechanism. 2476-2488
- Ashutosh Pandey, DeLiang Wang:
On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement. 2489-2499
- Mantong Zhou, Minlie Huang, Xiaoyan Zhu:
Robust Reading Comprehension With Linguistic Constraints via Posterior Regularization. 2500-2510
- Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie M. Liss, Visar Berisha:
Robust Estimation of Hypernasality in Dysarthria With Acoustic Model Likelihood Features. 2511-2522
- Lin Wang, Andrea Cavallaro:
A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones. 2523-2537
- Bowen Zhang, Xutao Li, Xiaofei Xu, Ka-Cheong Leung, Zhiyao Chen, Yunming Ye:
Knowledge Guided Capsule Attention Network for Aspect-Based Sentiment Analysis. 2538-2551
- Qi Qi, Xiaolu Wang, Haifeng Sun, Jingyu Wang, Xiao Liang, Jianxin Liao:
A Novel Multi-Task Learning Framework for Semi-Supervised Semantic Parsing. 2552-2560 - Haisong Ding

, Kai Chen
, Qiang Huo:
Improving Knowledge Distillation of CTC-Trained Acoustic Models With Alignment-Consistent Ensemble and Target Delay. 2561-2571 - Ayana

, Yun Chen, Cheng Yang
, Zhiyuan Liu
, Maosong Sun:
Reinforced Zero-Shot Cross-Lingual Neural Headline Generation. 2572-2584 - Mingming Yang

, Rui Wang
, Kehai Chen
, Xing Wang, Tiejun Zhao, Min Zhang:
A Novel Sentence-Level Agreement Architecture for Neural Machine Translation. 2585-2597 - Shuai Wang

, Yexin Yang
, Zhanghao Wu
, Yanmin Qian
, Kai Yu
:
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition. 2598-2609 - Kouhei Sekiguchi

, Yoshiaki Bando
, Aditya Arie Nugraha
, Kazuyoshi Yoshii
, Tatsuya Kawahara
:
Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation. 2610-2625 - Thi Ngoc Tho Nguyen

, Woon-Seng Gan
, Rishabh Ranjan, Douglas L. Jones:
Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network. 2626-2637 - Ondrej Cífka

, Umut Simsekli, Gaël Richard
:
Groove2Groove: One-Shot Music Style Transfer With Supervision From Synthetic Data. 2638-2650 - Judy Najnudel

, Thomas Hélie
, David Roze, Henri Boutin
:
Simulation of an Ondes Martenot Circuit. 2651-2660 - R. Jyothi

, Prabhu Babu:
SOLVIT: A Reference-Free Source Localization Technique Using Majorization Minimization. 2661-2673 - Peng Shen

, Xugang Lu, Sheng Li
, Hisashi Kawai:
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification. 2674-2683 - Vicent Molés-Cases

, Gema Piñero
, Maria de Diego
, Alberto González
:
Personal Sound Zones by Subband Filtering and Time Domain Optimization. 2684-2696 - Srinivas Parthasarathy

, Carlos Busso
:
Semi-Supervised Speech Emotion Recognition With Ladder Networks. 2697-2709 - Artuur Leeuwenberg

, Marie-Francine Moens
:
Towards Extracting Absolute Event Timelines From English Clinical Reports. 2710-2719 - Lin Sun

, Yuxuan Sun
, Fule Ji, Chi Wang
:
Joint Learning of Token Context and Span Feature for Span-Based Nested NER. 2720-2730 - Jamal Amini

, Richard Christian Hendriks
, Richard Heusdens
, Meng Guo
, Jesper Jensen
:
Spatially Correct Rate-Constrained Noise Reduction for Binaural Hearing Aids in Wireless Acoustic Sensor Networks. 2731-2742 - Zuchao Li

, Chaoyu Guan, Hai Zhao, Rui Wang
, Kevin Parnow, Zhuosheng Zhang
:
Memory Network for Linguistic Structure Parsing. 2743-2755 - Cheng Yu, Ryandhimas E. Zezario

, Syu-Siang Wang
, Jonathan Sherman, Yi-Yen Hsieh
, Xugang Lu, Hsin-Min Wang
, Yu Tsao
:
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders. 2756-2769 - Xin Liu

, Qingcai Chen
, Xiangping Wu
, Yang Hua, Jing Chen, Dongfang Li, Buzhou Tang, Xiaolong Wang:
Gated Semantic Difference Based Sentence Semantic Equivalence Identification. 2770-2780 - Gilles Boulianne

:
A Study of Inductive Biases for Unsupervised Speech Representation Learning. 2781-2795 - Yu-Te Wu, Berlin Chen, Li Su

:
Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation. 2796-2809 - Weiwei Lin

, Man-Wai Mak, Na Li, Dan Su, Dong Yu:
A Framework for Adapting DNN Speaker Embedding Across Languages. 2810-2822 - Purvi Agrawal

, Sriram Ganapathy
:
Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting. 2823-2836 - Xingwei Sun

, Ze-Feng Gao
, Zhong-Yi Lu, Junfeng Li, Yonghong Yan:
A Model Compression Method With Matrix Product Operators for Speech Enhancement. 2837-2847 - Koby Weisberg, Bracha Laufer-Goldshtein

, Sharon Gannot
:
Simultaneous Tracking and Separation of Multiple Sources Using Factor Graph Model. 2848-2864 - Chao Pan

, Jingdong Chen
, Guangming Shi
:
On Estimation of Time-Varying Variances of Source and Noise for Sensor Array Processing. 2865-2879 - Qiuqiang Kong

, Yin Cao, Turab Iqbal
, Yuxuan Wang, Wenwu Wang
, Mark D. Plumbley
:
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. 2880-2894 - Shuyang Zhao

, Toni Heittola
, Tuomas Virtanen
:
Active Learning for Sound Event Detection. 2895-2905 - Ondrej Mokrý

, Pavel Rajmic
:
Audio Inpainting: Revisited and Reweighted. 2906-2918 - Christof Weiß

, Hendrik Schreiber
, Meinard Müller
:
Local Key Estimation in Music Recordings: A Case Study Across Songs, Versions, and Annotators. 2919-2932 - Leilei Gan

, Yue Zhang
:
Investigating Self-Attention Network for Chinese Word Segmentation. 2933-2941 - Nico Gößling

, Elior Hadad
, Sharon Gannot
, Simon Doclo
:
Binaural LCMV Beamforming With Partial Noise Estimation. 2942-2955 - Yiming Wu

, Tristan Carsault, Eita Nakamura
, Kazuyoshi Yoshii
:
Semi-Supervised Neural Chord Estimation Based on a Variational Autoencoder With Latent Chord Labels and Features. 2956-2966 - Hieu-Thi Luong

, Junichi Yamagishi
:
NAUTILUS: A Versatile Voice Cloning System. 2967-2981 - Hirokazu Kameoka

, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo:
Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks. 2982-2995 - Pierre Lecomte

, Manuel Melon
, Laurent Simon
:
Spherical Fraction Beamforming. 2996-3009 - Constantinos Papayiannis

, Christine Evers
, Patrick A. Naylor
:
End-to-End Classification of Reverberant Rooms Using DNNs. 3010-3017 - Bhusan Chettri

, Emmanouil Benetos
, Bob L. T. Sturm
:
Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark. 3018-3028 - Alexios Gidiotis

, Grigorios Tsoumakas
:
A Divide-and-Conquer Approach to the Summarization of Long Documents. 3029-3040 - Huiyuan Sun

, Thushara D. Abhayapala
, Prasanga N. Samarasinghe
:
A Realistic Multiple Circular Array System for Active Noise Control Over 3D Space. 3041-3052 - Sasan Asadiabadi

, Engin Erzin
:
Vocal Tract Contour Tracking in rtMRI Using Deep Temporal Regression Network. 3053-3064 - Hung-Shin Lee

, Yu Tsao
, Shyh-Kang Jeng
, Hsin-Min Wang
:
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition. 3065-3079 - Juan M. Martín-Doñas

, Jesper Jensen
, Zheng-Hua Tan
, Angel M. Gomez
, Antonio M. Peinado
:
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation. 3080-3094 - Rui Wang

, Zhe Chen
, Fuliang Yin
:
Active Sampling Rate Calibration Method for Acoustic Sensor Networks. 3095-3107 - Yonggang Hu

, Prasanga N. Samarasinghe
, Sharon Gannot
, Thushara D. Abhayapala
:
Semi-Supervised Multiple Source Localization Using Relative Harmonic Coefficients Under Noisy and Reverberant Environments. 3108-3123 - Ashwin Bellur, Mounya Elhilali

:
Audio Object Classification Using Distributed Beliefs and Attention. 729-739
