


INTERSPEECH 2012: Portland, Oregon, USA
- 13th Annual Conference of the International Speech Communication Association, INTERSPEECH 2012, Portland, Oregon, USA, September 9-13, 2012. ISCA 2012

An Information-Extraction Approach to Speech Analysis and Processing
- Chin-Hui Lee:

An Information-Extraction Approach to Speech Analysis and Processing. 1-5
ASR: Deep Neural Networks I
- Dong Yu, Li Deng, Frank Seide:

Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks. 6-9 - Brian Kingsbury, Tara N. Sainath, Hagen Soltau:

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization. 10-13 - George Saon, Brian Kingsbury:

Discriminative feature-space transforms using deep neural networks. 14-17 - Zoltán Tüske, Ralf Schlüter, Hermann Ney, Martin Sundermeyer:

Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both? 18-21 - Andrew L. Maas, Quoc V. Le, Tyler M. O'Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng:

Recurrent Neural Networks for Noise Reduction in Robust ASR. 22-25 - Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide:

Pipelined Back-Propagation for Context-Dependent Deep Neural Networks. 26-29
Language Recognition
- Hynek Boril, Abhijeet Sangwan, John H. L. Hansen:

Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations. 30-33 - Craig S. Greenberg, Alvin F. Martin, Mark A. Przybocki:

The 2011 NIST Language Recognition Evaluation. 34-37 - Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel, Alberto Abad, David Martínez González, Jesús Antonio Villalba López, Alfonso Ortega, Eduardo Lleida:

The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance. 38-41 - Luis Fernando D'Haro, Ondrej Glembek, Oldrich Plchot, Pavel Matejka, Mehdi Soufifar, Ricardo de Córdoba, Jan Cernocký:

Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts. 42-45 - Alan McCree, Bengt J. Borgstrom:

Supervector LDA: A New Approach to Reduced-Complexity I-vector Language Recognition. 46-49 - Pavel Matejka, Oldrich Plchot, Mehdi Soufifar, Ondrej Glembek, Luis Fernando D'Haro, Karel Veselý, Frantisek Grézl, Jeff Z. Ma, Spyros Matsoukas, Najim Dehak:

Patrol Team Language Identification System for DARPA RATS P1 Evaluation. 50-53
Communication Disorders and Assistive Technologies
- Fang Hu, Yungang Wu, Wen Xu, Demin Han:

Articulatory Strategies in Obstruent Production in Mandarin Esophageal Speech. 54-57 - Marion Bechet, Fabrice Hirsch, Camille Fauth, Rudolph Sock:

Consonantal space area in Children with a Cleft Palate: An acoustic Study. 58-61 - Milton Orlando Sarria-Paja, Tiago H. Falk:

Automated Dysarthria Severity Classification for Improved Objective Intelligibility Assessment of Spastic Dysarthric Speech. 62-65 - Abdellah Kacha, Francis Grenez, Jean Schoentgen:

Assessment of Disordered Voices Using Empirical Mode Decomposition in the Log-Spectral Domain. 66-69 - Anna Katharina Fuchs, Martin Hagmüller:

Learning an Artificial F0-Contour for ALT Speech. 70-73 - Korin Richmond, Steve Renals:

Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy. 74-77
Voice Conversion
- Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen:

A Study of Mutual Information for GMM-Based Spectral Conversion. 78-81 - Na Li, Yu Qiao:

Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion. 82-85 - Daniel Erro, Eva Navas, Inma Hernáez:

Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation. 86-89 - Winston S. Percybrooks, Elliot Moore:

A HMM approach to residual estimation for high resolution voice conversion. 90-93 - Tomoki Toda, Takashi Muramatsu, Hideki Banno:

Implementation of Computationally Efficient Real-Time Voice Conversion. 94-97 - Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose:

Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion. 98-101
Speaker Trait Challenge - Part 1
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Elmar Nöth, Alessandro Vinciarelli, Felix Burkhardt, Rob van Son, Felix Weninger, Florian Eyben, Tobias Bocklet, Gelareh Mohammadi, Benjamin Weiss:

The INTERSPEECH 2012 Speaker Trait Challenge. 254-257 - Tim Polzehl, Katrin Schoenenberg, Sebastian Möller, Florian Metze, Gelareh Mohammadi, Alessandro Vinciarelli:

On Speaker-Independent Personality Perception and Prediction from Speech. 258-261 - Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth S. Narayanan:

Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network. 262-265 - Clément Chastagnol, Laurence Devillers:

Personality traits detection using a parallelized modified SFFS algorithm. 266-269 - Jouni Pohjalainen, Serdar Kadioglu, Okko Räsänen:

Feature Selection for Speaker Traits. 270-273 - Johannes Wagner, Florian Lingenfelser, Elisabeth André:

A Frame Pruning Approach for Paralinguistic Recognition Tasks. 274-277 - Alexei Ivanov, Xin Chen:

Modulation Spectrum Analysis for Speaker Personality Trait Recognition. 278-281 - Nicholas Cummins, Julien Epps, Jia Min Karen Kua:

A Comparison of Classification Paradigms for Speaker Likeability Determination. 282-285 - Dingchao Lu, Fei Sha:

Predicting Likability of Speakers with Gaussian Processes. 286-289 - Raymond Brueckner, Björn W. Schuller:

Likability Classification - A Not so Deep Neural Network Approach. 290-293 - Dongrui Wu:

Genetic Algorithm Based Feature Selection for Speaker Trait Classification. 294-297
Phonetics and Phonology
- Felix Weninger, Björn W. Schuller:

Discrimination of Linguistic and Non-Linguistic Vocalizations in Spontaneous Speech: Intra- and Inter-Corpus Perspectives. 102-105 - Mathieu Avanzi, Pauline Dubosson, Sandra Schwab, Nicolas Obin:

Accentual Transfer from Swiss-German to French. A Study of "Français Fédéral". 106-109 - Stefanie Jannedy, Melanie Weirich:

Phonology & the Interpretation of Fine Phonetic Detail in Berlin German. 110-113 - Carlos Toshinori Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita:

Evaluation of a formant-based speech-driven lip motion generation. 114-117 - Jeffrey Kallay, Jeffrey J. Holliday:

Using spectral measures to differentiate Mandarin and Korean sibilant fricatives. 118-121 - Hua-Li Jian, Richard Konopka:

EFL Conversational Triads: Foreigner-directed Speech and Hyperarticulation. 122-125 - Iris Chuoying Ouyang, Khalil Iskarous:

Syllable perception depends on tone perception. 126-129 - Masako Fujimoto, Seiya Funatsu, Ichiro Fujimoto:

How consonants, dialect and speech rate affect vowel devoicing? 134-137
Enhancement
- Thomas Fehér, Dietmar Richter, Oliver Jokisch, Rüdiger Hoffmann:

Distance-Dependent Noise Reduction for Two-Channel Microphones. 138-141 - Wei Xue, Wenju Liu:

Direction of Arrival Estimation Based on Subband Weighting for Noisy Conditions. 142-145 - Jorge I. Marin-Hurtado, David V. Anderson:

Binaural Noise Reduction Using Frequency-Warped FIR Filters. 146-149 - Meng Yu, Jack Xin:

Exploring Off Time Nature for Speech Enhancement. 150-153 - Xulei Bao, Jie Zhu:

Model-based Single-Channel Dereverberation in Noisy Acoustical Environments. 154-157 - Majid Mirbagheri, Sahar Akram, Shihab A. Shamma:

An Auditory Inspired Multimodal Framework for Speech Enhancement. 158-161 - Oldooz Hazrati, Jaewook Lee, Philipos C. Loizou:

Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments. 162-165 - Petko Nikolov Petkov, W. Bastiaan Kleijn, Gustav Eje Henter:

Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech. 166-169
Language Modeling
- Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney:

Morpheme Level Feature-based Language Models for German LVCSR. 170-173 - Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka:

Tied-State Mixture Language Model for WFST-based Speech Recognition. 174-177 - Tanel Alumäe, Kaarel Kaljurand:

Maximum Entropy Language Model Adaptation for Mobile Speech Input. 178-181 - Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlícek:

Supervised and unsupervised Web-based language model domain adaptation. 182-185 - Yik-Cheung Tam, Paul Vozila:

A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling. 186-189 - Youzheng Wu, Kazuhiko Abe, Paul R. Dixon, Chiori Hori, Hideki Kashioka:

Leveraging Social Annotation for Topic Language Model Adaptation. 190-193 - Martin Sundermeyer, Ralf Schlüter, Hermann Ney:

LSTM Neural Networks for Language Modeling. 194-197 - Puyang Xu, Brian Roark, Sanjeev Khudanpur:

Phrasal Cohort Based Unsupervised Discriminative Language Modeling. 198-201 - Damianos G. Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Tucker Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Daniel M. Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith B. Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley:

Deriving conversation-based features from unlabeled speech for discriminative language modeling. 202-205 - Erinç Dikici, Arda Çelebi, Murat Saraclar:

Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling. 206-209 - Kapil Thadani, Fadi Biadsy, Daniel M. Bikel:

On-the-fly Topic Adaptation for YouTube Video Transcription. 210-213
Spoken Language Understanding and Dialog
- Bassam Jabaian, Fabrice Lefèvre, Laurent Besacier:

Portability of Semantic Annotations for Fast Development of Dialogue Corpora. 214-217 - Zoraida Callejas, Ramón López-Cózar:

Optimization of Dialog Strategies using Automatic Dialog Simulation and Statistical Dialog Management Techniques. 218-221 - Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami:

Preference-learning based Inverse Reinforcement Learning for Dialog Control. 222-225 - Raveesh Meena, Gabriel Skantze, Joakim Gustafson:

A Data-driven Approach to Understanding Spoken Route Directions in Human-Robot Dialogue. 226-229 - Kazunori Komatani, Akira Hirano, Mikio Nakano:

Detecting System-directed Utterances using Dialogue-level Features. 230-233 - Joaquin Planells, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:

An Online Generated Transducer to Increase Dialog Manager Coverage. 234-237 - Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

A Sequential Bayesian Dialog Agent for Computational Ethnography. 238-241 - Frank Seide, Sean McDirmid:

ClippyScript: A Programming Language for Multi-Domain Dialogue Systems. 242-245 - Klaus-Peter Engelbrecht, Sebastian Möller:

Correlation Between Model-based Approximations of Grounding-related Cognition and User Judgments. 246-249 - Keith Vertanen, Per Ola Kristensson:

Spelling as a Complementary Strategy for Speech Recognition. 2294-2297
ASR: Noise Robustness
- Ken'ichi Kumatani, Bhiksha Raj, Rita Singh, John W. McDonough:

Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition. 298-301 - Felix Weninger, Martin Wöllmer, Björn W. Schuller:

Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise. 302-305 - Liang Lu, K. K. Chin, Arnab Ghoshal, Steve Renals:

Noise Compensation for Subspace Gaussian Mixture Models. 306-309 - Yang Sun, Mathew M. Doss, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:

Combination of Sparse Classification and Multilayer Perceptron for Noise-robust ASR. 310-313 - Weifeng Li, Hervé Bourlard:

Sub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition. 314-317 - Mohamed Bouallegue, Driss Matrouf, Georges Linarès, Mickael Rouvier:

Subspace Gaussian Mixture Models Based on Noise Compensation for Speech Recognition. 318-321
Spoken Language Understanding and Dialog II
- Florian Kretzschmar, Sebastian Möller:

"Help Me, I Need More User Tests!" User Simulations as Supportive Tool in the Development Process of Spoken Dialogue Systems. 322-325 - Silke M. Witt:

Caller Response Timing Patterns in Spoken Dialog Systems. 326-329 - Dilek Hakkani-Tür, Gökhan Tür, Larry P. Heck, Ashley Fidler, Asli Celikyilmaz:

A Discriminative Classification-Based Approach to Information State Updates for a Multi-Domain Dialog System. 330-333 - Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry P. Heck:

Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog. 334-337 - Gökhan Tür, Minwoo Jeong, Ye-Yi Wang, Dilek Hakkani-Tür, Larry P. Heck:

Exploiting the Semantic Web for Unsupervised Natural Language Semantic Parsing. 338-341 - Andrew Fandrianto, Maxine Eskénazi:

Prosodic Entrainment in an Information-Driven Dialog System. 342-345
Paralinguistics I
- Fabien Ringeval, Mohamed Chetouani, Björn W. Schuller:

Novel Metrics of Speech Rhythm for the Assessment of Emotion. 346-349 - Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll:

Temporal and Situational Context Modeling for Improved Dominance Recognition in Meetings. 350-353 - Marc Swerts, Kitty Leuverink, Madelène Munnik, Vera Nijveld:

Audiovisual correlates of basic emotions in blind and sighted people. 354-357 - Houwei Cao, Ragini Verma, Ani Nenkova:

Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech. 358-361 - Zixing Zhang, Björn W. Schuller:

Active Learning by Sparse Instance Tracking and Classifier Confidence in Acoustic Emotion Recognition. 362-365 - Viktor Rozgic, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Rohit Kumar, Aravind Namandi Vembu, Rohit Prasad:

Emotion Recognition using Acoustic and Lexical Features. 366-369
Pitch and Harmonic Analysis
- Phillip L. De Leon, Bryan Stewart, Junichi Yamagishi:

Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis. 370-373 - Zhengqi Wen, Hideki Kawahara, Jianhua Tao:

Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis. 374-377 - Feng Huang, Tan Lee:

Robust Pitch Estimation Using l1-regularized Maximum Likelihood Estimation. 378-381 - Gilles Degottex, Yannis Stylianou:

A full-band adaptive harmonic representation of speech. 382-385 - Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino:

Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation. 386-389 - Kota Yoshizato, Hirokazu Kameoka, Daisuke Saito, Shigeki Sagayama:

Hidden Markov Convolutive Mixture Model for Pitch Contour Analysis of Speech. 390-393
Speaker Trait Challenge - Part 2
- Benjamin Weiss, Felix Burkhardt:

Is 'not bad' good enough? Aspects of unknown voices' likability. 510-513 - Michelle Hewlett Sanchez, Aaron Lawson, Dimitra Vergyri, Harry Bratt:

Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification. 514-517 - Harm Buisman, Eric O. Postma:

The log-Gabor method: speech classification using spectrogram image analysis. 518-521 - Yazid Attabi, Pierre Dumouchel:

Anchor Models and WCCN Normalization For Speaker Trait Classification. 522-525 - Claude Montacié, Marie-José Caraty:

Pitch and Intonation Contribution to Speakers' Traits Classification. 526-529 - Gopala Krishna Anumanchipalli, Hugo Meinedo, Miguel M. F. Bugalho, Isabel Trancoso, Luís C. Oliveira, Alan W. Black:

Text-dependent pathological voice detection. 530-533 - Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth S. Narayanan:

Intelligibility classification of pathological speech using fusion of multiple high level descriptors. 534-537 - Anthony P. Stark, Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran:

Interspeech Pathology Challenge: Investigations into Speaker and Sentence Specific Effects. 538-541 - Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen L. Stone, Carol Y. Espy-Wilson, Shihab A. Shamma:

Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations. 542-545 - Dong-Yan Huang, Yongwei Zhu, Dajun Wu, Rongshan Yu:

Detecting Intelligibility by Linear Dimensionality Reduction and Normalized Voice Quality Hierarchical Features. 546-549
Perceptual Learning and Perceptual Cues to Segments and Tones
- Matthias J. Sjerps, James M. McQueen, Holger Mitterer:

Extrinsic normalization for vocal tracts depends on the signal, not on attention. 394-397 - Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki:

Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers. 402-405 - Natthawut Kertkeidkachorn, Surapol Vorapatratorn, Sirinart Tangruamsub, Proadpran Punyabukkana, Atiwong Suchato:

Contribution of Spectral Shapes to Tone Perception. 414-417 - Julien Meyer:

Pitch and phonological perception of tone in the Suruí language of Rondônia (Brazil): identification task of LHL and LHH tonal patterns. 422-425 - Rui Cao, Ratree Wayland, Edith Kaan:

The Role of Creaky Voice in Mandarin Tone 2 and Tone 3 Perception. 426-429 - K. S. Nataraj, Prem C. Pandey:

Detection of Transition Segments in VCV Utterances for Estimation of the Place of Closure of Oral Stops for Speech Training. 406-409 - Odette Scharenborg, Esther Janse, Andrea Weber:

Perceptual Learning of /f/-/s/ by Older Listeners. 398-401 - Cyril Dubois, Rudolph Sock:

Audiovisual discrimination of CV syllables: a simultaneous fMRI-EEG study. 410-413 - Charturong Tantibundhit, Chutamanee Onsuwan, P. Phienphanich, Chai Wutiwiwatchai:

Methodological Issues in Assessing Perceptual Representation of Consonant Sounds in Thai. 418-421 - Michael D. Tyler, Mona Faris:

Can litheners retune native categories acroth a thoneme boundary? 430-433
Speech Synthesis: Prosody
- Eric Morley, Esther Klabbers, Jan P. H. van Santen, Alexander Kain, Seyed Hamidreza Mohammadi:

Synthetic F0 Can Effectively Convey Speaker ID in Delexicalized Speech. 434-437 - Timo Baumann, David Schlangen:

Evaluating Prosodic Processing for Incremental Speech Synthesis. 438-441 - Kazuhiko Iwata, Tetsunori Kobayashi:

Expressing Speaker's Intentions through Sentence-Final Intonations for Japanese Conversational Speech Synthesis. 442-445 - Alok Parlikar, Alan W. Black:

Modeling Pause-Duration for Style-Specific Speech Synthesis. 446-449 - Martin Gruber:

Enumerating Differences Between Various Communicative Functions for Purposes of Czech Expressive Speech Synthesis in Limited Domain. 450-453 - Christoph Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Möller:

Quality Analysis of Macroprosodic F0 Dynamics in Text-to-Speech Signals. 454-457 - Hiroya Hashimoto, Keikichi Hirose, Nobuaki Minematsu:

Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis. 458-461 - Tomoki Koriyama, Takashi Nose, Takao Kobayashi:

Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation. 462-465 - Fanbo Meng, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai:

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data. 466-469 - Sarah Hoffmann, Beat Pfister:

Employing Sentence Structure: Syntax Trees as Prosody Generators. 470-473 - Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Kunio Kashino:

A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components. 474-477
Speaker Diarization and Age Recognition
- Jan Silovský, Petr Cerva, Jindrich Zdánský, Jan Nouza:

Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription. 478-481 - Stephen Shum, Najim Dehak, Jim Glass:

On the Use of Spectral and Iterative Methods for Speaker Diarization. 482-485 - Mary Tai Knox, Nikki Mirghafori, Gerald Friedland:

Where did I go wrong?: Identifying troublesome segments for speaker diarization systems. 486-489 - Sree Harsha Yella, Fabio Valente:

Speaker diarization of overlapping speech based on silence distribution in meeting recordings. 490-493 - Simon Bozonnet, Ravichander Vipperla, Nicholas W. D. Evans:

Phone Adaptive Training for Speaker Diarization. 494-497 - Finnian Kelly, Andrzej Drygajlo, Naomi Harte:

Compensating for Ageing and Quality variation in Speaker Verification. 498-501 - David A. van Leeuwen, Mohamad Hasan Bahari:

Calibration of probabilistic age recognition. 502-505 - Mohamad Hasan Bahari, Mitchell McLaren, Hugo Van hamme, David A. van Leeuwen:

Age Estimation from Telephone Speech using i-vectors. 506-509
ASR: Discriminative Training
- Shakti P. Rath, Martin Karafiát, Ondrej Glembek, Jan Cernocký:

A factorized representation of FMLLR transform based on QR-decomposition. 551-554 - Vikrant Singh Tomar, Richard C. Rose:

A Correlational Discriminant Approach to Feature Extraction for Robust Speech Recognition. 555-558 - Chao Weng, Biing-Hwang Juang, Daniel Povey:

Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech. 559-562 - Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu:

Discriminative Reranking for LVCSR Leveraging Invariant Structure. 563-566 - Ting-Yao Hu, Yu Tsao, Lin-Shan Lee:

Discriminative Fuzzy Clustering Maximum a Posterior Linear Regression for Speaker Adaptation. 567-570 - Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:

Simultaneous Discriminative Training and Mixture Splitting of HMMs for Speech Recognition. 571-574
Single Channel Speech Enhancement
- Laura E. Boucheron, Phillip L. De Leon:

Low-SNR, Speaker-Dependent Speech Enhancement using GMMs and MFCCs. 575-578 - Maria Koutsogiannaki, Michèle Pettinato, Cassie Mayo, Varvara Kandia, Yannis Stylianou:

Can modified casual speech reach the intelligibility of clear speech? 579-582 - Michael Carlin, Nicolas Malyska, Thomas F. Quatieri:

Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation. 583-586 - Dorothea Kolossa, Robert M. Nickel, Steffen Zeiler, Rainer Martin:

Inventory-Based Audio-Visual Speech Enhancement. 587-590 - Emma Jokinen, Paavo Alku, Martti Vainio:

Utilization of the Lombard effect in post-filtering for intelligibility enhancement of telephone speech. 591-594 - Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis:

Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments. 595-598
Conversation and Interaction I
- Léo Varnet, Julien Meyer, Michel Hoen, Fanny Meunier:

Phoneme resistance during speech-in-speech comprehension. 599-602 - Hugo Quené, Will Schuerman:

Smile with a smile. 603-606 - Rebecca Lunsford, Peter A. Heeman, Jan P. H. van Santen:

Interactions Between Turn-taking Gaps, Disfluencies and Social Obligation. 607-610 - Maeva Garnier, Lucie Ménard, Gabrielle Richard:

Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech. 611-614 - Marcin Wlodarczak, Juraj Simko, Petra Wagner:

Temporal entrainment in overlapped speech: Cross-linguistic study. 615-618 - Chi-Chun Lee, Athanasios Katsamanis, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

Based on Isolated Saliency or Causal Integration? Toward a Better Understanding of Human Annotation Process using Multiple Instance Learning and Sequential Probability Ratio Test. 619-622
Speech Synthesis: Intelligibility
- Ann K. Syrdal, H. Timothy Bunnell, Susan R. Hertz, Taniya Mishra, Murray F. Spiegel, Corine A. Bickley, Deborah Rekart, Matthew J. Makashay:

Text-To-Speech Intelligibility Across Speech Rates. 623-626 - Linfang Wang, Lijuan Wang, Yan Teng, Zhe Geng, Frank K. Soong:

Objective Intelligibility Assessment of Text-to-Speech System using Template Constrained Generalized Posterior Probability. 627-630 - Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King:

Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise. 631-634 - Tudor-Catalin Zorila, Varvara Kandia, Yannis Stylianou:

Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression. 635-638 - Daniel Erro, Yannis Stylianou, Eva Navas, Inma Hernáez:

Implementation of Simple Spectral Techniques to Enhance the Intelligibility of Speech using a Harmonic Model. 639-642 - Seyed Hamidreza Mohammadi, Alexander Kain, Jan P. H. van Santen:

Making Conversational Vowels More Clear. 643-646
Speech and Language Technologies for STEM
- Diane J. Litman, Heather Friedberg, Katherine Forbes-Riley:

Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues. 755-758 - Wayne H. Ward, Daniel Bolaños, Ronald A. Cole:

Spoken Dialogs With a Virtual Science Tutor. 759-762 - Petr Cerva, Jan Silovský, Jindrich Zdánský, Jan Nouza, Jirí Málek:

Real-Time Lecture Transcription using ASR for Czech Hearing Impaired or Deaf Students. 763-766 - Lei Chen, Su-Youn Yoon:

Application of Structural Events Detected on ASR Outputs for Automated Speaking Assessment. 767-770 - Oscar Saz, Maxine Eskénazi:

Addressing Confusions in Spoken Language in ESL Pronunciation Tutors. 771-774 - Xiaojun Qian, Helen M. Meng, Frank K. Soong:

The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training. 775-778 - Catia Cucchiarini, Joost van Doremalen, Helmer Strik:

Practice and feedback in L2 speaking: an evaluation of the DISCO CALL system. 779-782 - Thomas Hueber, Atef Ben Youssef, Gérard Bailly, Pierre Badin, Frédéric Elisei:

Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training. 783-786
Prosody I
- Chiharu Tsurutani, Shunichi Ishihara:

Naturalness Judgement of Prosodic Variation of Japanese Utterances with Prosody Modified Stimuli. 647-650 - Mathieu Avanzi, Pauline Dubosson, Sandra Schwab:

Effects of Dialectal Origin on Articulation Rate in French. 651-654 - Chiao-Hua Hsieh, Chen-Yu Chiang, Yih-Ru Wang, Hsiu-Min Yu, Sin-Horng Chen:

A New Approach of Speaking Rate Modeling for Mandarin Speech Prosody. 655-658 - David Doukhan, Albert Rilliard, Sophie Rosset, Christophe d'Alessandro:

Modelling pause duration as a function of contextual length. 659-662 - Bei Wang, Chenxia Li, Qian Wu, Xiaxia Zhang, Baofeng Wang, Yi Xu:

Production and Perception of Focus in PFC and non-PFC Languages: Comparing Beijing Mandarin and Hainan Tsat. 663-666 - Xiaxia Zhang, Bei Wang, Qian Wu, Yi Xu:

Prosodic Realization of Focus in Statement and Question in Tibetan (Lhasa Dialect). 667-670 - Martti Vainio, Daniel Aalto, Antti Suni, Anja Arnhold, Tuomo Raitio, Henri Seijo, Juhani Järvikivi, Paavo Alku:

Effect of noise type and level on focus related fundamental frequency changes. 671-674 - Anal Warsi, Tulika Basu, Debasis Mazumdar:

Role of Prosody in Automatic Modality Recognition of Bangla Speech. 675-678 - Bettina Braun:

Where to associate stressed additive particles? Evidence from speech prosody. 679-682 - Matthew Benton:

From PVI to Perception: A Return to the Roots of Rhythm in Broadcast News. 683-686 - Julien Meyer, Laure Dentel, Frank Seifart:

A methodology for the study of rhythm in drummed forms of languages: application to Bora Manguaré of Amazon. 687-690
Speech Analysis
- Jouni Pohjalainen, Tuomo Raitio, Hannu Pulakka, Paavo Alku:

Automatic Detection of High Vocal Effort in Telephone Speech. 691-694 - D. Gomathi, Sathya Adithya Thati, Karthik Venkat Sridaran, Bayya Yegnanarayana:

Analysis of Mimicry Speech. 695-698 - Christian H. Kasess, Wolfgang Kreuzer, Ewald Enzinger, Nadja Kerschhofer-Puhalo:

Estimation of the vocal tract shape of nasals using a Bayesian scheme. 699-702 - Peter Birkholz, Philippe Daechert, Christiane Neuschaefer-Rube:

Advances in combined electro-optical palatography. 703-706 - Byung Suk Lee, Daniel P. W. Ellis:

Noise Robust Pitch Tracking by Subband Autocorrelation Classification. 707-710 - Alexander Sepúlveda, Rodrigo Capobianco Guido, Germán Castellanos-Domínguez:

Inference of Critical Articulator Position for Fricative Consonants. 711-714 - Markus Brückl:

Vocal Tremor Measurement Based on Autocorrelation of Contours. 715-718 - Chatchawarn Hansakunbuntheung, Ananlada Chotimongkol, Sumonmas Thatphithakkul, Patcharika Chootrakool:

Model-based Duration-difference Approach on Accent Evaluation of L2 Learner. 719-722
Dialog Systems
- Thomas Hueber, Gérard Bailly, Bruce Denby:

Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface. 723-726 - Tatsuya Kawahara, Takuma Iwatate, Katsuya Takanashi:

Prediction of Turn-Taking by Combining Prosodic and Eye-Gaze Information in Poster Conversations. 727-730 - Ina Wechsung, Klaus-Peter Engelbrecht, Sebastian Möller:

Using Quality Ratings to Predict Modality Choice in Multimodal Systems. 731-734 - Fuming Fang, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, Toshimitsu Musha:

HMM Based Continuous EOG Recognition for Eye-input Speech Interface. 735-738 - Jason Lilley, Amanda Stent, Ilija Zeljkovic:

A Random, Semantically Appropriate Sentence Generator for Speaker Verification. 739-742 - Daniel Macías Galindo, Wilson Wong, Lawrence Cavedon, John Thangarajah:

Coherent Topic Transition in a Conversational Agent. 743-746 - Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge:

Using Reinforcement Learning for Dialogue Management Policies: Towards Understanding MDP Violations and Convergence. 747-750 - Ramón López-Cózar, Zoraida Callejas, David Griol:

Enhancing Speech Understanding in Spoken Dialogue Systems by Means of a New Frame-Correction Technique. 751-754 - Zoraida Callejas, David Griol, Klaus-Peter Engelbrecht:

Assessment of user simulators for spoken dialogue systems by means of subspace multidimensional clustering. 250-253
ASR: Bayesian Modeling
- Keith Kintzley, Aren Jansen, Hynek Hermansky:

MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors. 787-790 - Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky:

Data-driven Posterior Features for Low Resource Speech Recognition Applications. 791-794 - Xiaodong Cui, Mohamed Afify, George Saon, Vaibhava Goel:

Sparse Bayesian Factor Analysis for Stereo-based Stochastic Mapping. 795-798 - Niklas Vanhainen, Giampiero Salvi:

Word Discovery with Beta Process Factor Analysis. 799-802 - Seong-Jun Hahm, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura:

Speaker Adaptation Using Variational Bayesian Linear Regression in Normalized Feature Space. 803-806 - Alexander Krueger, Oliver Walter, Volker Leutnant, Reinhold Haeb-Umbach:

Bayesian Feature Enhancement for ASR of Noisy Reverberant Real-World Data. 807-810
Computer Assisted Language Learning I
- Emre Yilmaz, Dirk Van Compernolle, Hugo Van hamme:

Robust Tracking for Automatic Reading Tutors. 811-814 - Huang Hao, Jianming Wang, Halidan Abudureyimu:

Maximum F1-Score Discriminative Training for Automatic Mispronunciation Detection in Computer-Assisted Language Learning. 815-818 - Yow-Bang Wang, Lin-Shan Lee:

Error Pattern Detection Integrating Generative and Discriminative Learning for Computer-Aided Pronunciation Training. 819-822 - Florian Hönig, Tobias Bocklet, Korbinian Riedhammer, Anton Batliner, Elmar Nöth:

The Automatic Assessment of Non-native Prosody: Combining Classical Prosodic Analysis with Acoustic Modelling. 823-826 - Theban Stanley, Kadri Hacioglu:

Improving L1-Specific Phonological Error Diagnosis in Computer Assisted Pronunciation Training. 827-830 - Jort F. Gemmeke, Janneke van de Loo, Guy De Pauw, Joris Driesen, Hugo Van hamme, Walter Daelemans:

A Self-Learning Assistive Vocal Interface Based on Vocabulary Learning and Grammar Induction. 831-834
Conversation and Interaction II
- Gina-Anne Levow, Susan Duncan:

Contrasting Cues to Verbal and Non-Verbal Backchannels in Multi-lingual Dyadic Rapport. 835-838 - Sofia Strömbergsson, Jens Edlund, David House:

Prosodic measurements and question types in the Spontal corpus of Swedish dialogues. 839-842 - Khiet P. Truong, Dirk Heylen:

Measuring prosodic alignment in cooperative task-based conversations. 843-846 - Kornel Laskowski, Mattias Heldner, Jens Edlund:

On the Dynamics of Overlap in Multi-Party Conversation. 847-850 - Khiet P. Truong, Jürgen Trouvain:

On the acoustics of overlapping laughter in conversational speech. 851-854 - Agustín Gravano, Julia Hirschberg:

A Corpus-Based Study of Interruptions in Spoken Dialogue. 855-858
Speech Analysis and Modeling
- George P. Kafentzis, Olivier Rosec, Yannis Stylianou:

On the Modeling of Voiceless Stop Sounds of Speech using Adaptive Quasi-Harmonic Models. 859-862 - Raymond W. M. Ng, Thomas Hain, Keikichi Hirose:

An alignment matching method to explore pseudosyllable properties across different corpora. 863-866 - Benigno Uria, Iain Murray, Steve Renals, Korin Richmond:

Deep Architectures for Articulatory Inversion. 867-870 - Katharine Henry, Morgan Sonderegger, Joseph Keshet:

Automatic Measurement of Positive and Negative Voice Onset Time. 871-874 - Vahid Khanagha, Khalid Daoudi:

Efficient multipulse approximation of speech excitation using the most singular manifold. 875-878 - Aren Jansen, Samuel Thomas, Hynek Hermansky:

Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. 879-882
Analysis of Spoken Disorders in Health Applications - Part 1
- Maider Lehr, Emily Tucker Prud'hommeaux, Izhak Shafran, Brian Roark:

Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment. 1039-1042 - Daniel Bone, Matthew P. Black, Chi-Chun Lee, Marian E. Williams, Pat Levitt, Sungbok Lee, Shrikanth S. Narayanan:

Spontaneous-Speech Acoustic-Prosodic Features of Children with Autism and the Interacting Psychologist. 1043-1046 - Constantijn Kaland, Emiel Krahmer, Marc Swerts:

Contrastive intonation in autism: The effect of speaker- and listener-perspective. 1047-1050 - Christina Hagedorn, Michael I. Proctor, Louis Goldstein, Maria Luisa Gorno-Tempini, Shrikanth S. Narayanan:

Characterizing Covert Articulation in Apraxic Speech Using real-time MRI. 1051-1054 - Alberto Abad, Anna Pompili, Ângela Costa, Isabel Trancoso:

Automatic word naming recognition for treatment and assessment of aphasia. 1055-1058 - Thomas F. Quatieri, Nicolas Malyska:

Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity. 1059-1062
Language Learning and Cross-Language Production and Perception
- Odette Scharenborg, Marijt J. Witteman, Andrea Weber:

Computational Modelling of the Recognition of Foreign-Accented Speech. 883-886 - Lya Meister, Einar Meister:

The production and perception of Estonian quantity degrees by native and non-native speakers. 887-890 - Makiko Sadakata, Mizuki Shingai, Alex Brandmeyer, Kaoru Sekiyama:

Perception of the moraic obstruent /Q/: a cross-linguistic study. 891-894 - Tomoko Nariai, Kazuyo Tanaka, Tatsuya Kawahara:

Comparative Analysis of Intensity between Native Speakers and Japanese Speakers of English. 895-898 - Christos Koniaris, Olov Engwall, Giampiero Salvi:

Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations. 899-902 - Sheng Li, Lan Wang:

Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data. 903-906 - Chakir Zeroual, Diamantis Gafos, Phil Hoole, John H. Esling:

Physiological and acoustic study of word initial post-lexical gemination in Moroccan Arabic. 907-910 - Michael D. Tyler, Sarah Fenwick:

Perceptual Assimilation of Arabic Voiceless Fricatives by English Monolinguals. 911-914 - Okko Räsänen:

Non-auditory cognitive capabilities in computational modeling of early language acquisition. 915-918 - Okko Räsänen, Heikki Rasilo, Unto K. Laine:

Modeling spoken language acquisition with a generic cognitive architecture for associative learning. 919-922
Enhancement and Coding
- Dongmei Wang, Philipos C. Loizou:

Pitch Estimation Based on Long Frame Harmonic Model and Short Frame Average Correlation Coefficient. 923-926 - Sebastian Möller, Marcel Wältermann, Nicolas Côté:

Diagnostic Prediction of Transmitted Speech Quality: A New Framework for Signal-based and Parametric Models. 927-930 - Tom Bäckström:

Enumerative Algebraic Coding for ACELP. 931-934 - Atanu Saha, Tetsuya Shimamura:

Speech Enhancement With Bivariate Gamma Model. 935-938 - Marek B. Trawicki, Michael T. Johnson:

Improvements of the Beta-Order Minimum Mean-Square Error (MMSE) Spectral Amplitude Estimator using Chi Priors. 939-942 - Philip Harding, Ben Milner:

Enhancing Speech by Reconstruction from Robust Acoustic Features. 943-946 - Srikanth Raj Chetupally, Thippur V. Sreenivas:

Joint Pitch-Analysis Formant-Synthesis framework for CS recovery of speech. 947-950 - Shan Liang, Wei Jiang, Wenju Liu:

A new noise-tracking algorithm for generalizing binary time-frequency (T-F) masking to ratio masking. 951-954 - Yan Tang, Martin Cooke:

Optimised spectral weightings for noise-dependent speech intelligibility enhancement. 955-958
Speech Synthesis: Adaptation
- Langzhou Chen, Mark J. F. Gales, Vincent Wan, Javier Latorre, Masami Akamine:

Exploring Rich Expressive Information from Audiobook Data Using Cluster Adaptive Training. 959-962 - Ji He, Yao Qian, Frank K. Soong, Sheng Zhao:

Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS. 963-966 - Christophe Veaux, Junichi Yamagishi, Simon King:

Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders. 967-970 - Javier Latorre, Vincent Wan, Mark J. F. Gales, Langzhou Chen, K. K. Chin, Kate M. Knill, Masami Akamine:

Speech factorization for HMM-TTS based on cluster adaptive training. 971-974 - June Sig Sung, Doo Hwa Hong, Hyun Woo Koo, Nam Soo Kim:

Factored MLLR Adaptation Algorithm for HMM-based Expressive TTS. 975-978 - Dietmar Schabus, Michael Pucher, Gregor Hofer:

Speaker-adaptive visual speech synthesis in the HMM-framework. 979-982 - Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku, Keiichi Tokuda:

Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation. 983-986 - Mauro Nicolao, Javier Latorre, Roger K. Moore:

C2H: A Computational Model of H&H-based Phonetic Contrast in Synthetic Speech. 987-990 - Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:

Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis. 991-994 - Rasmus Dall, Christophe Veaux, Junichi Yamagishi, Simon King:

Analysis of speaker clustering strategies for HMM-based speech synthesis. 995-998
Search and Decoding
- Kuan-Yu Chen, Hao-Chin Chang, Berlin Chen, Hsin-Min Wang:

Word Relevance Modeling for Speech Recognition. 999-1002 - Frank Duckhorn, Rüdiger Hoffmann:

Using context-free grammars for embedded speech recognition with Weighted Finite-State Transducers. 1003-1006 - Richard Dufour, Géraldine Damnati, Delphine Charlet, Frédéric Béchet:

Automatic transcription error recovery for Person Name Recognition. 1007-1010 - Satoshi Kobashikawa, Takaaki Hori, Yoshikazu Yamaguchi, Taichi Asami, Hirokazu Masataki, Satoshi Takahashi:

Efficient Beam Width Control to Suppress Excessive Speech Recognition Computation Time Based on Prior Score Range Normalization. 1011-1014 - David Nolden, Ralf Schlüter, Hermann Ney:

Search Space Pruning Based on Anticipated Path Recombination in LVCSR. 1015-1018 - Ian McGraw, Alexander Gruenstein:

Estimating Word-Stability During Incremental Speech Recognition. 1019-1022 - Stefan Ziegler, Bogdan Ludusan, Guillaume Gravier:

Using broad phonetic classes to guide search in automatic speech recognition. 1023-1026 - João Miranda, João Paulo Neto, Alan W. Black:

Parallel combination of multilingual speech streams for improved ASR. 1027-1030 - Fethi Bougares, Mickael Rouvier, Yannick Estève, Georges Linarès:

Low latency combination of parallelized single-pass LVCSR systems. 1031-1034 - Jungsuk Kim, Jike Chong, Ian R. Lane:

Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine. 1035-1038
Dynamic Decoding
- Preethi Jyothi, Eric Fosler-Lussier, Karen Livescu:

Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks. 1063-1066 - Anoop Deoras, Ruhi Sarikaya, Gökhan Tür, Dilek Hakkani-Tür:

Joint Decoding for Speech Recognition and Semantic Tagging. 1067-1070 - M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney:

Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR. 1071-1074 - Paul R. Dixon, Chiori Hori, Hideki Kashioka:

A Specialized WFST Approach for Class Models and Dynamic Vocabulary. 1075-1078 - Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose:

Dynamic Grammars with Lookahead Composition for WFST-based Speech Recognition. 1079-1082 - Todd Shore, Friedrich Faubel, Hartmut Helmke, Dietrich Klakow:

Knowledge-Based Word Lattice Rescoring in a Dynamic Context. 1083-1086
Speaker Recognition I
- Richard D. McClanahan, Phillip L. De Leon:

Mixture Component Clustering for Efficient Speaker Verification. 1087-1090 - Taufiq Hasan, John H. L. Hansen:

Front-end Channel Compensation using Mixture-dependent Feature Transformations for i-Vector Speaker Recognition. 1091-1094 - William M. Campbell, Elliot Singer:

Query-by-Example using Speaker Content Graphs. 1095-1098 - Hanwu Sun, Bin Ma:

Unsupervised NAP Training Data Design for Speaker Recognition. 1099-1102 - George R. Doddington:

The Role of Score Calibration in Speaker Recognition. 1103-1106 - Takafumi Hattori, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:

A Bayesian Approach to Speaker Recognition Based on GMMs Using Multiple Model Structures. 1107-1110
Development of Speech Production and Perception
- Ellen Marklund, Francisco Lacerda, Iris-Corinna Schwarz, Ulla Sundberg:

Similarities in fundamental frequency in infant speech segmentation models. 1111-1114 - Ulrika Marklund, Ulla Sundberg, Iris-Corinna Schwarz, Francisco Lacerda:

Phonological complexity and vocabulary size in 30-month-old Swedish children. 1115-1118 - Jeesun Kim, Chris Davis, Christine Kitamura:

Auditory-visual speech to infants and adults: signals and correlations. 1119-1122 - Dongxin Xu, Jill Gilkerson, Jeffery Richards:

Objective Child Vocal Development Measurement with Naturalistic Daylong Audio Recording. 1123-1126 - Kyoko Nagao, Mark Paullin, Vilena Livinsky, James B. Polikoff, Linda D. Vallino, Thierry G. Morlet, N. Carolyn Schanen, H. Timothy Bunnell:

Speech Production-Perception Relationships in Children with Speech Delay. 1127-1130 - Sofia Strömbergsson:

Synthetic correction of deviant speech - children's perception of phonologically modified recordings of their own speech. 1131-1134
HMM Synthesis I
- Vincent Wan, Javier Latorre, K. K. Chin, Langzhou Chen, Mark J. F. Gales, Heiga Zen, Kate M. Knill, Masami Akamine:

Combining multiple high quality corpora for improving HMM-TTS. 1135-1138 - Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, Sakriani Sakti, Satoshi Nakamura:

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis. 1139-1142 - Heng Lu, Simon King:

Using Bayesian Networks to find relevant context features for HMM-based speech synthesis. 1143-1146 - Xiang Yin, Zhen-Hua Ling, Ming Lei, Li-Rong Dai:

Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis. 1147-1150 - Vataya Chunwijitra, Takashi Nose, Takao Kobayashi:

A speech parameter generation algorithm using local variance for HMM-based speech synthesis. 1151-1154 - Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine:

Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP. 1155-1158
Analysis of Spoken Disorders in Health Applications - Part 2
- Thomas Drugman, Jérôme Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit:

Audio and Contact Microphones for Cough Detection. 1303-1306 - Nancy F. Chen, Wade Shen, Joseph P. Campbell:

Analyzing and Interpreting Automatically Learned Rules Across Dialects. 1307-1310 - Andrey N. Raev, Yuri Matveev, Tatiana Goloshchapova:

The Effect of Use of Drugs on Speaker's Fundamental Frequency and Formants. 1311-1314 - Marc Swerts, Cees de Bie:

On the assessment of audiovisual cues to speaker confidence by preteens with typical development (TD) and a-typical development (AD). 1315-1318 - Theodora Chaspari, Chi-Chun Lee, Shrikanth S. Narayanan:

Interplay between verbal response latency and physiology of children with autism during ECA interactions. 1319-1322 - Myung Jong Kim, Hoirin Kim:

Combination of Multiple Speech Dimensions for Automatic Assessment of Dysarthric Speech Intelligibility. 1323-1326 - Jun Wang, Ashok Samal, Jordan R. Green, Frank Rudzicz:

Whole-Word Recognition from Articulatory Movements for Silent Speech Interfaces. 1327-1330 - Shou-Chun Yin, Richard C. Rose, Yun Tang:

Verifying Session Level Pronunciation Accuracy in a Speech Therapy Application. 1331-1334 - Daryush D. Mehta, Rebecca Woodbury Listfield, Harold A. Cheyne II, James T. Heaton, Shengran W. Feng, Matías Zanartu, Robert E. Hillman:

Duration of ambulatory monitoring needed to accurately estimate voice use. 1335-1338 - Khairun-nisa Hassanali, Yang Liu, Thamar Solorio:

Evaluating NLP Features for Automatic Prediction of Language Impairment Using Child Speech Transcripts. 1339-1342 - Géza Kiss, Jan P. H. van Santen, Emily Tucker Prud'hommeaux, Lois M. Black:

Quantitative Analysis of Pitch in Speech of Children with Neurodevelopmental Disorders. 1343-1346
Paralinguistics II
- Felix Weninger, Erik Marchi, Björn W. Schuller:

Improving Recognition of Speaker States and Traits by Cumulative Evidence: Intoxication, Sleepiness, Age and Gender. 1159-1162 - Ni Ding, Julien Epps:

Speaker Clustering in Emotion Recognition. 1163-1166 - Samuel Kim, Sree Harsha Yella, Fabio Valente:

Automatic detection of conflict escalation in spoken conversations. 1167-1170 - Uwe D. Reichel:

The entropy of intoxicated speech - lexical creativity and heavy tongues. 1171-1174 - Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan:

A Robust Unsupervised Arousal Rating Framework using Prosody with Cross-Corpora Evaluation. 1175-1178 - Carlos Busso, Tauhidur Rahman:

Unveiling the Acoustic Properties that Describe the Valence Dimension. 1179-1182 - Fabio Valente, Samuel Kim, Petr Motlícek:

Annotation and Recognition of Personality Traits in Spoken Conversations from the AMI Meetings Corpus. 1183-1186 - Shao-Ren Lyu:

The Effects of Lexical Tones and Nasal Coda /-n/ to Sadness in Taiwan Hakka. 1187-1190
ASR: Robust Modeling
- David Imseng, John Dines, Petr Motlícek, Philip N. Garner, Hervé Bourlard:

Comparing different acoustic modeling techniques for multilingual boosting. 1191-1194 - Yongqiang Wang, Mark J. F. Gales:

Model-based approaches to adaptive training in reverberant environments. 1195-1198 - Mark J. F. Gales, Federico Flego:

Model-Based Approaches for Degraded Channel Modelling in Robust ASR. 1199-1202 - William Hartmann, Eric Fosler-Lussier:

Improved Model Selection for the ASR-Driven Binary Mask. 1203-1206 - Simon Wiesler, Ralf Schlüter, Hermann Ney:

Accelerated Batch Learning of Convex Log-linear Models for LVCSR. 1207-1210 - Janne Pylkkönen, Mikko Kurimo:

Improving Discriminative Training for Robust Acoustic Models in Large Vocabulary Continuous Speech Recognition. 1211-1214 - Scott Novotney, Ivan Bulyko, Richard M. Schwartz, Sanjeev Khudanpur, Owen Kimball:

Semi-Supervised Methods for Improving Keyword Search of Unseen Terms. 1215-1218 - Xiangang Li, Dan Su, Zaihu Pang, Xihong Wu:

Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition. 1219-1222 - Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda:

Classification of Stressed Speech Using Physical Parameters Derived from Two-Mass Model. 1223-1226 - Jun Du, Qiang Huo:

IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition. 1227-1230
ASR: Robust Features I
- Niko Moritz, Jörn Anemüller, Birger Kollmeier:

Amplitude Modulation Filters as Feature Sets for Robust ASR: Constant Absolute or Relative Bandwidth? 1231-1234 - Cemil Demir, Ali Taylan Cemgil, Murat Saraçlar:

Effect of speech priors in single-channel speech-music separation for ASR. 1235-1238 - Arun Narayanan, DeLiang Wang:

On the Role of Binary Mask Pattern in Automatic Speech Recognition. 1239-1242 - Tatsuya Kawahara, Randy Gomez:

Dereverberation based on Wavelet Packet Filtering for Robust Automatic Speech Recognition. 1243-1246 - Trausti T. Kristjansson, Thad Hughes:

Spectral Intersections for Non-Stationary Signal Separation. 1247-1250 - Kyohei Odani, Longbiao Wang, Atsuhiko Kai:

Speech Recognition by Denoising and Dereverberation Based on Spectral Subtraction in a Real Noisy Reverberant Environment. 1251-1254 - Hilman Ferdinandus Pardede, Koichi Shinoda, Koji Iwano:

Q-Gaussian based spectral subtraction for robust speech recognition. 1255-1258 - Bernd T. Meyer, Constantin Spille, Birger Kollmeier, Nelson Morgan:

Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition. 1259-1262 - Peter Li, Xie Sun:

Feature extraction based on hearing system signal processing for robust large vocabulary speech recognition. 1263-1266 - Harish Arsikere, Gary K. F. Leung, Steven M. Lulich, Abeer Alwan:

Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions. 1267-1270
Computer Assisted Language Learning II
- Yurie Iribe, Takurou Mori, Kouichi Katsurada, Goh Kawai, Tsuneo Nitta:

Real-time Visualization of English Pronunciation on an IPA Chart Based on Articulatory Feature Extraction. 1271-1274 - Je Hun Jeon, Su-Youn Yoon:

Acoustic Feature-based Non-scorable Response Detection for an Automated Speaking Proficiency Assessment. 1275-1278 - Jorge Wuth, Néstor Becerra Yoma, Leopoldo Benavides, Hiram Vivanco:

Pronunciation quality evaluation of sentences by combining word based scores. 1279-1282 - Peter Bell, Myroslava O. Dzikovska, Amy Isard:

Designing a spoken language interface for a tutorial dialogue system. 1283-1286 - Long Zhang, Haifeng Li:

Automatic Pronunciation Error Detection Based on Extended Pronunciation Space Using the Unsupervised Clustering of Pronunciation Errors. 1287-1290 - Thomas Pellegrini, Ângela Costa, Isabel Trancoso:

Less errors with TTS? A dictation experiment with foreign language learners. 1291-1294 - Liang-Yu Chen, Jyh-Shing Roger Jang:

Improvement in Automatic Pronunciation Scoring using Additional Basic Scores and Learning to Rank. 1295-1298 - Jian Cheng:

Automatic Tone Assessment of Non-Native Mandarin Speakers. 1299-1302
ASR: Robust Features II
- Michael A. Carlin, Kailash Patil, Sridhar Krishna Nemala, Mounya Elhilali:

Robust phoneme recognition based on biomimetic speech contours. 1348-1351 - Kaisheng Yao, Yifan Gong, Chaojun Liu:

A Feature Space Transformation Method for Personalization using Generalized I-Vector Clustering. 1352-1355 - T. J. Tsai, Nelson Morgan:

Longer Features: They do a speech detector good. 1356-1359 - Md. Jahangir Alam, Patrick Kenny, Douglas D. O'Shaughnessy:

Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum. 1360-1363 - Florian Müller, Alfred Mertins:

Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition. 1364-1367 - Hannes Pessentheiner, Stefan Petrik, Harald Romsdorfer:

Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios. 1368-1371
ASR: Rich Transcription
- Ales Prazák, Zdenek Loose, Jan Trmal, Josef V. Psutka, Josef Psutka:

Novel Approach to Live Captioning Through Re-speaking: Tailoring Speech Recognition to Re-speaker's Needs. 1372-1375 - Jáchym Kolár, Lori Lamel:

Development and Evaluation of Automatic Punctuation for French and English Speech-to-Text. 1376-1379 - Shajith Ikbal, Sachindra Joshi, Ashish Verma, Om D. Deshmukh:

Spoken Document Clustering Using Word Confusion Networks. 1380-1383 - Xuancong Wang, Hwee Tou Ng, Khe Chai Sim:

Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction. 1384-1387 - Fabio Brugnara, Daniele Falavigna, Diego Giuliani, Roberto Gretter:

Analysis of the Characteristics of Talk-show TV Programs. 1388-1391 - Andrew Rosenberg:

Rethinking The Corpus: Moving towards Dynamic Linguistic Resources. 1392-1395
Phonetics and Phonology
- Marianna Nadeu:

Effects of stress and speech rate on vowel quality in Catalan and Spanish. 1396-1399 - Michael McAuliffe, Molly Babel:

Predictability affects vowel dispersion and dynamics in the Buckeye Corpus. 1400-1403 - Robert Allen Fox, Ewa Jacewicz:

Dialectal and generational variations in vowels in spontaneous speech. 1404-1407 - Christian DiCanio, Hosung Nam, Douglas H. Whalen, H. Timothy Bunnell, Jonathan D. Amith, Rey Castillo García:

Assessing agreement level between forced alignment models with data from endangered language documentation corpora. 130-133 - Ying Chen, Vsevolod Kapatsinski, Susan Guion-Anderson:

Acoustic Cues of Vowel Quality to Coda Nasal Perception in Southern Min. 1412-1415 - Miguel Simonet, José Ignacio Hualde, Marianna Nadeu:

Lenition of /d/ in spontaneous Spanish and Catalan. 1416-1419
HMM Synthesis II
- Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku:

Wideband Parametric Speech Synthesis Using Warped Linear Prediction. 1420-1423 - Thomas Drugman, John Kane, Christer Gobl:

Modeling the Creaky Excitation for Parametric Speech Synthesis. 1424-1427 - Zhengqi Wen, Jianhua Tao:

Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis. 1428-1431 - Nobuyuki Nishizawa, Tsuneo Kato:

Speech synthesis using a non-maximally decimated filter bank for embedded systems. 1432-1435 - Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj:

Ways to Implement Global Variance in Statistical Speech Synthesis. 1436-1439 - Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine:

HMM-based speech synthesis using sub-band basis spectrum model. 1440-1443
Glottal Source Processing: from Analysis to Applications
- Thomas Drugman, John Kane, Christer Gobl:

Resonator-based creaky voice detection. 1592-1595 - Vinay Kumar Mittal, N. Dhananjaya, Bayya Yegnanarayana:

Effect of Tongue Tip Trilling on the Glottal Excitation Source. 1596-1599 - Gang Chen, Yen-Liang Shue, Jody Kreiman, Abeer Alwan:

Estimating the voice source in noise. 1600-1603 - Alan Pinheiro, Tuomo Raitio, Danyane Gomes, Paavo Alku:

Voice source analysis using biomechanical modeling and glottal inverse filtering. 1604-1607 - Carlo Drioli, Andrea Calanca:

Speech modeling and processing by low-dimensional dynamic glottal models. 1608-1611 - Paavo Alku, Jouni Pohjalainen, Martti Vainio, Anne-Maria Laukkanen, Brad H. Story:

Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction. 1612-1615 - Akira Sasou:

Automatic Topology Generation of Glottal Source HMM. 1616-1619 - Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, Juan Manuel Montero:

Towards Glottal Source Controllability in Expressive Speech Synthesis. 1620-1623 - Ali Alpan, Jean Schoentgen, Francis Grenez:

Combining temporal and cepstral features for the automatic perceptual categorization of disordered connected speech. 1624-1627 - Rui Sun, Elliot Moore II:

A Preliminary Study on Cross-Databases Emotion Recognition using the Glottal Features in Speech. 1628-1631 - Ranniery Maia:

Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis. 1632-1635 - Christophe Mertens, Francis Grenez, Jean Schoentgen:

Analysis of vocal tremor and jitter by empirical mode decomposition of glottal cycle length time series. 1636-1639 - Harri Auvinen, Tuomo Raitio, Samuli Siltanen, Paavo Alku:

Utilizing Markov Chain Monte Carlo (MCMC) Method for Improved Glottal Inverse Filtering. 1640-1643 - Stefan Huber, Axel Röbel, Gilles Degottex:

Glottal source shape parameter estimation using phase minimization variants. 1644-1647 - Keith W. Godin, Taufiq Hasan, John H. L. Hansen:

Glottal Waveform Analysis of Physical Task Stress Speech. 1648-1651 - Juan F. Torres, Elliot Moore:

Speaker Discrimination Ability of Glottal Waveform Features. 1652-1655
Hearing
- Okko Räsänen:

Average Spectrotemporal Structure of Continuous Speech Matches with the Frequency Resolution of Human Hearing. 1444-1447 - Ibon Saratxaga, Inma Hernáez, Michael Pucher, Eva Navas, Iñaki Sainz:

Perceptual Importance of the Phase Related Information in Speech. 1448-1451 - Andrea Grigorescu, Marek Rudnicki, Michael Isik, Werner Hemmert, Stefano Rini:

Improving the Entropy Estimate of Neuronal Firings of Modeled Cochlear Nucleus Neurons. 1452-1455 - Kyoko Nagao, Mark Paullin, James B. Polikoff, Jason Lilley, H. Timothy Bunnell:

Perception of Synthetic Speech in Adult Users of Cochlear Implants. 1456-1459 - Odette Scharenborg, Esther Janse:

Hearing Loss and the Use of Acoustic Cues in Phonetic Categorisation of Fricatives. 1460-1463 - Nao Hodoshima, Takayuki Arai, Kiyohiro Kurisu:

Intelligibility of speech spoken in noise/reverberation for older adults in reverberant environments. 1464-1467 - Andrew Hines, Naomi Harte:

Improved Speech Intelligibility with a Chimaera Hearing Aid Algorithm. 1468-1471 - Elizabeth Godoy, Yannis Stylianou:

Unsupervised Acoustic Analyses of Normal and Lombard Speech, with Spectral Envelope Transformation to Improve Intelligibility. 1472-1475 - Akiko Amano-Kusumoto, Justin M. Aronoff, Motokuni Itoh, Sigfrid D. Soli:

The effect of dichotic processing on the perception of binaural cues. 1476-1479 - Nima Mesgarani, Edward Chang:

Speech and speaker separation in human auditory cortex. 1480-1483 - Jens Edlund, Mattias Heldner, Joakim Gustafson:

On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone. 1484-1487
Degraded Speech and Enhancement
- Sira Gonzalez, Mike Brookes:

Sibilant Speech Detection in Noise. 1488-1491 - Kit Thambiratnam, Weiwu Zhu, Frank Seide:

Voice Activity Detection Using Speech Recognizer Feedback. 1492-1495 - Dushyant Sharma, Gaston Hilkhuysen, Patrick A. Naylor, Nikolay D. Gaubitch, Mark A. Huckvale, Mike Brookes:

Descriptive Vocabulary Development for Degraded Speech. 1496-1499 - Ryo Yokoyama, Yu Nasu, Koichi Shinoda, Koji Iwano:

Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity. 1500-1503 - Xugang Lu, Shigeki Matsuda, Chiori Hori, Hideki Kashioka:

Speech restoration based on deep learning autoencoder with layer-wised pretraining. 1504-1507 - Rupayan Chakraborty, Climent Nadeu, Taras Butko:

Detection and Positioning of Overlapped Sounds in a Room Environment. 1508-1511 - Deepak K. T., Biswajit Dev Sarma, S. R. Mahadeva Prasanna:

Foreground Speech Segmentation using Zero Frequency Filtered Signal. 1512-1515 - Patrick Reidy, Mary E. Beckman:

The Effect of Spectral Estimator on Common Spectral Measures for Sibilant Fricatives. 1516-1519
Source Separation and Computational Auditory Scene Analysis
- Emad M. Grais, Hakan Erdogan:

Gaussian Mixture Gain Priors for Regularized Nonnegative Matrix Factorization in Single-Channel Source Separation. 1520-1523 - Shivesh Ranjan, Karen L. Payton, Pejman Mowlaee:

Speaker Independent Single Channel Source Separation using Sinusoidal Features. 1524-1527 - Yuxuan Wang, DeLiang Wang:

Boosting Classification Based Speech Separation Using Temporal Dynamics. 1528-1531 - Yuxuan Wang, Kun Han, DeLiang Wang:

Acoustic Features for Classification Based Speech Separation. 1532-1535 - Emad M. Grais, Hakan Erdogan:

Hidden Markov Models as Priors for Regularized Nonnegative Matrix Factorization in Single-Channel Source Separation. 1536-1539 - Ji Ming, Ramji Srinivasan, Danny Crookes:

Unconstrained Speech Separation by Composition of Longest Segments. 1540-1543 - Yi Zhang, Yunxin Zhao:

Modulation domain blind source separation for noisy speech mixture. 1544-1547 - Pejman Mowlaee, Rahim Saeidi, Rainer Martin:

Phase estimation for signal reconstruction in single-channel source separation. 1548-1551 - Jen-Tzung Chien, Hsin-Lung Hsieh:

Bayesian Group Sparse Learning for Nonnegative Matrix Factorization. 1552-1555
Speaker Recognition II
- Michael T. Johnson, Jianglin Wang:

Residual Phase Cepstrum Coefficients with Application to Cross-lingual Speaker Verification. 1556-1559 - Chunyan Liang, Jinchao Yang, Lin Yang, Yonghong Yan:

Speaker Verification Using Neighborhood Preserving Embedding. 1560-1563 - Chunyan Liang, Xiang Zhang, Lin Yang, Yonghong Yan:

Discriminative Decision Function Based Scoring Method in Joint Factor Analysis for Speaker Verification. 1564-1567 - Taufiq Hasan, John H. L. Hansen:

Integrated Feature Normalization and Enhancement for robust Speaker Recognition using Acoustic Factor Analysis. 1568-1571 - Lukás Machlica, Zbynek Zajíc:

Factor Analysis and Nuisance Attribute Projection Revisited. 1572-1575 - Sheng Chen, Mingxing Xu:

Compensation of Intrinsic Variability with Factor Analysis Modeling for Robust Speaker Verification. 1576-1579 - Anthony Larcher, Kong-Aik Lee, Bin Ma, Haizhou Li:

RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases. 1580-1583 - Volker Dellwo, Adrian Leemann, Marie-José Kolly:

Speaker idiosyncratic rhythmic features in the speech signal. 1584-1587 - Yun Lei, Lukás Burget, Nicolas Scheffer:

Bilinear Factor Analysis for iVector Based Speaker Verification. 1588-1591
Language Modeling: New Models and Features
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:

Paraphrastic Language Models. 1656-1659 - Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur:

Efficient Structured Language Modeling for Speech Recognition. 1660-1663 - Yangyang Shi, Pascal Wiggers, Catholijn M. Jonker:

Towards Recurrent Neural Networks Language Models with Linguistic and Contextual Features. 1664-1667 - Gwénolé Lecorvé, Petr Motlícek:

Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition. 1668-1671 - Hong-Kwang Kuo, Ebru Arisoy, Ahmad Emami, Paul Vozila:

Large Scale Hierarchical Neural Network Language Models. 1672-1675 - Brian Hutchinson, Mari Ostendorf, Maryam Fazel:

A Sparse Plus Low Rank Maximum Entropy Language Model. 1676-1679
Speaker Verification
- Ye Jiang, Kong-Aik Lee, Zhenmin Tang, Bin Ma, Anthony Larcher, Haizhou Li:

PLDA Modeling in I-Vector and Supervector Space for Speaker Verification. 1680-1683 - Konstantin Simonchik, Timur Pekhovsky, Andrey Shulipa, Anton Afanasyev:

Supervized Mixture of PLDA Models for Cross-Channel Speaker Verification. 1684-1687 - Federico Alegre, Ravichander Vipperla, Nicholas W. D. Evans:

Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals. 1688-1691 - Themos Stafylakis, Patrick Kenny, Mohammed Senoussaoui, Pierre Dumouchel:

PLDA using Gaussian Restricted Boltzmann Machines with application to Speaker Verification. 1692-1695 - Seyed Omid Sadjadi, Taufiq Hasan, John H. L. Hansen:

Mean Hilbert Envelope Coefficients (MHEC) for Robust Speaker Recognition. 1696-1699 - Zhizheng Wu, Chng Eng Siong, Haizhou Li:

Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition. 1700-1703
Speech Intelligibility in Quiet and in Noise
- Julián Villegas, Martin Cooke:

Maximising objective speech intelligibility by local f0 modulation. 1704-1707 - Catherine Mayo, Vincent Aubanel, Martin Cooke:

Effect of prosodic changes on speech intelligibility. 1708-1711 - Saya Kawase, Yue Wang:

Effects of visual speech information on native listener judgments of L2 consonant intelligibility. 1712-1715 - Guy J. Brown, Amy V. Beeston, Kalle J. Palomäki:

Perceptual compensation for the effects of reverberation on consonant identification: A comparison of human and machine performance. 1716-1719 - Michael Fitzpatrick, Jeesun Kim, Chris Davis:

The Intelligibility of Lombard Speech: Communicative setting matters. 1720-1723 - João Felipe Santos, Stefano Cosentino, Oldooz Hazrati, Philipos C. Loizou, Tiago H. Falk:

Performance Comparison of Intrusive Objective Speech Intelligibility and Quality Metrics for Cochlear Implant Users. 1724-1727
Speech Tools Demo
- Florian Metze, Eric Fosler-Lussier:

The Speech Recognition Virtual Kitchen: An Initial Prototype. 1872-1873 - Uwe D. Reichel:

PermA and Balloon: Tools for string alignment and text processing. 1874-1877 - Slim Ouni, Loic Mangeonjean, Ingmar Steiner:

VisArtico: a visualization tool for articulatory data. 1878-1881 - Przemyslaw Lenkiewicz, Dieter Van Uytvanck, Peter Wittenburg, Sebastian Drude:

Towards Automated Annotation of Audio and Video Recordings by Application of Advanced Web-services. 1882-1885 - Simone Ashby, Sílvia Barbosa, Silvia Brandão, José Pedro Ferreira, Maarten Janssen, Catarina Silva, Mário Eduardo Viaro:

A Rule Based Pronunciation Generator and Regional Accent Databank for Portuguese. 1886-1887 - Roger Chappel, Kuldip K. Paliwal:

Speech Enhancement for Android (SEA): A Speech Processing Demonstration Tool for Android Based Smart Phones and Tablets. 1888-1891 - Jacob Okamoto, Serguei V. S. Pakhomov, Elizabeth Shriberg, Andreas Stolcke:

ProTK: An Improved Prosody Toolkit. 1892-1893 - Suzanne Boyce, Harriet J. Fell, Joel MacAuslan:

SpeechMark: Landmark Detection Tool for Speech Analysis. 1894-1897 - Javier Tejedor, Fernando J. López-Colino, Jordi Porta, José Colás:

An On-Line, Cloud-Based Spanish-Spanish Sign Language Translation System. 2127-2128
Audio Analysis, Estimation and Classification
- Sourish Chaudhuri, Rita Singh, Bhiksha Raj:

Exploiting Temporal Sequence Structure for Semantic Analysis of Multimedia. 1728-1731 - Hong Liu, Xiaofei Li:

Time Delay Estimation for Speech Signal Based on FOC-Spectrum. 1732-1735 - Ziqiang Shi, Tieran Zheng, Jiqing Han, Shiwen Deng:

Low-rank Audio Signal Classification Under Soft Margin and Trace Norm Constraints. 1736-1739 - Carlos Segura, Javier Hernando:

GCC-PHAT based Head Orientation Estimation. 1740-1743 - Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj:

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation. 1744-1747 - Mariem Bouafif, Zied Lachiri:

TDOA Estimation for Multiple Speakers in Underdetermined Case. 1748-1751 - Toru Nakashika, Christophe Garcia, Tetsuya Takiguchi:

Local-feature-map Integration Using Convolutional Neural Networks for Music Genre Classification. 1752-1755 - Jeffrey Berry, Ian R. Fasel, Luciano Fadiga, Diana Archangeli:

Training Deep Nets with Imbalanced and Unlabeled Data. 1756-1759
Adaptation for ASR
- Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:

Speech Data Clustering Based on Phoneme Error Trend for Unsupervised Acoustic Model Adaptation. 1760-1763 - Wooil Kim, John H. L. Hansen:

Gaussian Map based Acoustic Model Adaptation Using Untranscribed Data for Speech Recognition in Severely Adverse Environments. 1764-1767 - Danning Jiang, Dimitri Kanevsky, Vaibhava Goel, Yong Qin:

Investigating Performance of the Discriminative Methods for Long-Term Speaker Adaptation. 1768-1771 - Bo Li, Khe Chai Sim:

A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition. 1772-1775 - Heidi Christensen, Stuart P. Cunningham, Charles Fox, Phil D. Green, Thomas Hain:

A comparative study of adaptive, automatic recognition of disordered speech. 1776-1779 - Seçkin Uluskan, John H. L. Hansen:

Phoneme Class Based Adaptation for Mismatch Acoustic Modeling of Distant Noisy Speech. 1780-1783 - Zoi Roupakia, Anton Ragni, Mark J. F. Gales:

Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition. 1784-1787 - I-Fan Chen, Chin-Hui Lee:

A Study on Using Word-Level HMMs to Improve ASR Performance over State-of-the-Art Phone-Level Acoustic Modeling for LVCSR. 1788-1791 - Michael L. Seltzer, Alex Acero:

Factored adaptation using a combination of feature-space and model-space transforms. 1792-1795
Robust Speech Recognition I
- Heyun Huang, Louis ten Bosch, Bert Cranen, Lou Boves:

Exploring Discriminative Speech Trajectory Structures. 1796-1799 - Ehsan Variani, Hynek Hermansky:

Estimating Classifier Performance in Unknown Noise. 1800-1803 - Azarakhsh Jalalvand, Fabian Triefenbach, Jean-Pierre Martens:

Continuous Digit Recognition in Noise: Reservoirs can do an excellent job! 1804-1807 - Janne Pylkkönen, Mikko Kurimo:

Optimization-Based Control for the Extended Baum-Welch Algorithm. 1808-1811 - Marc René Schädler, Birger Kollmeier:

Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems. 1812-1815 - Feipeng Li, Sri Harish Reddy Mallidi, Hynek Hermansky:

Phone recognition in critical bands using sub-band temporal modulations. 1816-1819 - Ramya Rasipuram, Mathew Magimai-Doss:

Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation. 1820-1823 - Sriram Ganapathy, Hynek Hermansky:

Analysis of Temporal Resolution in Frequency Domain Linear Prediction. 1828-1831 - Bing Zhang, Richard M. Schwartz, Stavros Tsakalidis, Long Nguyen, Spyros Matsoukas:

White Listing and Score Normalization for Keyword Spotting of Noisy Speech. 1832-1835
Rich Transcription II
- Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin J. Russell, Peter Jancovic, Michael J. Carey:

Speaker Recognition for Children's Speech. 1836-1839 - Germán Bordel, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Amparo Varona:

A simple and efficient method to align very long speech signals to acoustically imperfect transcriptions. 1840-1843 - Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki:

Estimation of Talker's Head Orientation Based on Discrimination of the Shape of Cross-power Spectrum Phase Coefficients. 1844-1847 - Ann Lee, James R. Glass:

Sentence Detection Using Multiple Annotations. 1848-1851 - Delphine Charlet, Géraldine Damnati:

A speaker-role based approach for detecting politicians in TV broadcast news. 1852-1855 - Guangting Mai:

Relative Importance of Temporal Envelope and Fine Structure Cues in Low- and High-Order Harmonic Regions for Mandarin Lexical-tone Recognition. 1856-1859 - Nitya Tiwari, Prem C. Pandey, Pandurangarao N. Kulkarni:

Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment. 1860-1863 - Taniya Mishra, Vivek Kumar Rangarajan Sridhar, Alistair Conkie:

Word Prominence Detection using Robust yet Simple Prosodic Features. 1864-1867 - Amit Srivastava, Saurabh Khanwalkar, Gretchen Markiewicz, Guruprasad Saikumar:

Online Story Segmentation of Multilingual Streaming Broadcast News. 1868-1871
Adaptation & Robust Modeling
- Yanzhang He, Eric Fosler-Lussier:

Efficient Segmental Conditional Random Fields for One-Pass Phone Recognition. 1898-1901 - Udhyakumar Nallasamy, Florian Metze, Tanja Schultz:

Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition. 1902-1905 - Jinyu Li, Michael L. Seltzer, Yifan Gong:

Efficient VTS Adaptation Using Jacobian Approximation. 1906-1909 - Milos Cernak, David Imseng, Hervé Bourlard:

Robust triphone mapping for acoustic modeling. 1910-1913 - Weibin Zhang, Pascale Fung:

sparse banded precision matrices for low resource speech recognition. 1914-1917 - Abdul Waheed Mohammed, Marco Matassoni, Hari Krishna Maganti, Maurizio Omologo:

Semi-Blind Model Adaptation using Piece-wise Energy Decay Curve for Large Reverberant Environments. 1918-1921
Multi-Channel Speech Enhancement
- Keisuke Kinoshita, Marc Delcroix, Mehrez Souden, Tomohiro Nakatani:

Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise. 1926-1929 - Shengkui Zhao, Douglas L. Jones:

A Fast-Converging Adaptive Frequency-Domain MVDR Beamformer for Speech Enhancement. 1930-1933 - Rita Singh, Ken'ichi Kumatani, John W. McDonough, Chen Liu:

A signal-separation-based array postfilter for distant speech recognition. 1934-1937 - Meng Yu, Frank K. Soong:

Constrained Multichannel Speech Dereverberation. 1938-1941 - Meng Yu, Ryan Ritch, Jack Xin:

A Triple-Microphone Real-Time Speech Enhancement Algorithm Based on Approximate Array Analytical Solutions. 1942-1945
Prosody II
- Ratree Wayland, Donruethai Laphasradakul, Edith Kaan, Cao Rui:

Perception of Pitch Contours among Native Tone Listeners. 1946-1948 - Yosuke Igarashi, Hanae Koiso:

Pitch range control of Japanese boundary pitch movements. 1949-1952 - Grace Kuo:

Perceived prosodic boundaries in Taiwanese and their acoustic correlates. 1953-1956 - Luying Hou, Yuan Jia, Aijun Li:

Phonetic Foreignization of Mandarin for Dubbing in Imported Western Movies. 1957-1960 - Helena Moniz, Fernando Batista, Isabel Trancoso, Ana Isabel Mata:

Prosodic context-based analysis of disfluencies. 1961-1964 - Britta Lintfert, Bernd Möbius:

Describing the development of intonational categories using a target-oriented parametric approach. 1965-1968
Voice Activity Detection
- Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas, Xinhui Zhou, Nima Mesgarani, Karel Veselý, Pavel Matejka:

Developing a Speech Activity Detection System for the DARPA RATS Program. 1969-1972 - Mohamed Omar:

Speech Activity Detection for Noisy Data Using Adaptation Techniques. 1973-1976 - Ananya Misra:

Speech/Nonspeech Segmentation in Web Videos. 1977-1980 - Philip Harding, Ben Milner:

On the use of Machine Learning Methods for Speech and Voicing Classification. 1981-1984 - Samuel Thomas, Sri Harish Reddy Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab A. Shamma, Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas:

Acoustic and Data-driven Features for Robust Speech Activity Detection. 1985-1988 - Shuo Wang, Wenjun Wu:

A Two-step NMF Based Algorithm for Single Channel Speech Separation. 1989-1992
Systems Demo
- Peter Bell, Myroslava O. Dzikovska, Amy Isard:

A tutorial dialogue system with unrestricted spoken input. 2113-2114 - Xie Sun, Peter Li, Manli Zhu, Qiru Zhou:

Integrating Adaptive Beam-forming and Auditory Features for Robust Large Vocabulary Speech Recognition. 2115-2116 - Hansjörg Hofmann, Ute Ehrlich, Klaus Bader, Ilona Nothelfer, André Berton:

A Natural In-Car Speech Interface to Internet Services Using Hybrid ASR. 2117-2118 - Ronald A. Cole, Daniel Bolaños, Wayne H. Ward, J. T. Carmer, Eric Borts, Edward Svirsky:

How Marni Helps English Language Learners Acquire Oral Reading Fluency. 2119-2120 - Victor S. Finomore:

Demonstration of Advanced Multi-Modal, Network-Centric Communication Management Suite. 2121-2122 - Joris Pelemans, Kris Demuynck, Patrick Wambacq:

Dutch Automatic Speech Recognition on the Web: Towards a General Purpose System. 2123-2126
Perception and Production
- Michael C. W. Yip:

Meaning inhibition and sentence processing in Chinese: Evidence from negative priming. 1993-1996 - Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno:

Similar Speaker Selection Technique Based on Distance Metric Learning with Perceptual Voice Quality Similarity. 1997-2000 - Molly Babel, Grant McGuire:

Gendered sound symbolism and masking effects in speech processing. 2001-2004 - Louis ten Bosch, Odette Scharenborg:

Modeling Cue Trading in Human Word Recognition. 2005-2008 - David Cheng-Huan Li, Elsi Kaiser:

Accounting for Speech Rate in Spoken Word Recognition. 2009-2012 - Iris Hanique, Mirjam Ernestus:

The processes underlying two frequent casual speech phenomena in Dutch: A production experiment. 2013-2016 - Peter Birkholz, Phil Hoole:

Intrinsic velocity differences of lip and jaw movements: preliminary results. 2017-2020 - Malte C. Viebahn, Mirjam Ernestus, James M. McQueen:

Co-occurrence of reduced word forms in natural speech. 2021-2024 - Ikuyo Yoshinaga, Jiangping Kong:

Voice Production Mechanisms of Vibrato in Noh. 2025-2028 - Juan Rafael Orozco-Arroyave, Julián D. Arias-Londoño, Jesús Francisco Vargas-Bonilla, Elmar Nöth:

Automatic detection of hypernasal speech signals using nonlinear and entropy measurements. 2029-2032 - Vincent Aubanel, Martin Cooke, Emma Foster, María Luisa García Lecumberri, Cassie Mayo:

Effects of the availability of visual information and presence of competing conversations on speech production. 2033-2036
Language and Accent Recognition
- Shuai Huang, Glen A. Coppersmith, Damianos G. Karakos:

Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification. 2037-2040 - Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel:

Phonotactic Language Recognition Using MLP Features. 2041-2044 - Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Mireia Díez, Germán Bordel:

The EHU Systems for the NIST 2011 Language Recognition Evaluation. 2045-2048 - Mikel Peñagarikano, Amparo Varona, Mireia Díez, Luis Javier Rodríguez-Fuentes, Germán Bordel:

Study of Different Backends in a State-Of-the-Art Language Recognition System. 2049-2052 - Sibel Yaman, Jason W. Pelecanos, Mohamed Kamal Omar:

On the Use of Non-Linear Polynomial Kernel SVMs in Language Recognition. 2053-2056 - Bing Jiang, Yan Song, Wu Guo, Li-Rong Dai:

Exemplar-Based Sparse Representation for Language Recognition on I-Vectors. 2057-2060 - Yu-Chin Shih, Hung-Shin Lee, Hsin-Min Wang, Shyh-Kang Jeng:

Subspace-Based Feature Representation and Learning for Language Recognition. 2061-2064 - Changhuai You, Haizhou Li, Bin Ma, Kong-Aik Lee:

Effect of Relevance Factor of Maximum a posteriori Adaptation for GMM-SVM in Speaker and Language Recognition. 2065-2068 - Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel, Mireia Díez:

Using Time-Synchronous Phone Co-occurrences in a SVM-Phonotactic Dialect Recognition System. 2069-2072 - Mahnoosh Mehrabani, Joseph Tepperman, Emily Nava:

Nativeness Classification with Suprasegmental Features on the Accent Group Level. 2073-2076
Voice Search and Spoken Document Retrieval
- Hung-yi Lee, Po-wei Chou, Lin-Shan Lee:

Open-Vocabulary Retrieval of Spoken Content with Shorter/Longer Queries Considering Word/Subword-based Acoustic Feature Similarity. 2077-2080 - Byungki Byun, Ilseo Kim, Sabato Marco Siniscalchi, Chin-Hui Lee:

Consumer-level multimedia event detection through unsupervised audio signal modeling. 2081-2084 - Qin Jin, Peter Franz Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, Florian Metze:

Event-based Video Retrieval Using Audio. 2085-2088 - Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan:

Compact Audio Representation for Event Detection in Consumer Media. 2089-2092 - Chao Liu, Dong Wang, Javier Tejedor:

N-gram FST Indexing for Spoken Term Detection. 2093-2096 - Haruka Majima, Rafael Torres, Yoko Fujita, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:

Spoken Inquiry Discrimination Using Bag-of-Words for Speech-Oriented Guidance System. 2097-2100 - Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan:

Robust Event Detection From Spoken Content In Consumer Domain Videos. 2101-2104 - Stephanie Pancoast, Murat Akbacak:

Bag-of-Audio-Words Approach for Multimedia Event Classification. 2105-2108 - Ken-ichi Iso, Edward Whittaker, Tadashi Emori, Junpei Miyake:

Improvements in Japanese Voice Search. 2109-2112
Sparse, Template-Based Representations
- Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran:

Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks. 2130-2133 - Jort F. Gemmeke, Hugo Van hamme:

Advances in noise robust digit recognition using hybrid exemplar-based techniques. 2134-2137 - Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen:

Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition. 2138-2141 - Yang Sun, Bert Cranen, Jort F. Gemmeke, Louis ten Bosch, Lou Boves, Mathew M. Doss:

Using Sparse Classification Outputs as Feature Observations for Noise-robust ASR. 2142-2145 - Serena Soldo, Mathew Magimai-Doss, Hervé Bourlard:

Synthetic References for Template-based ASR using posterior features. 2146-2149 - Dong Wang, Javier Tejedor:

Heterogeneous Convolutive Non-Negative Sparse Coding. 2150-2153
Speaker Diarization
- Jürgen T. Geiger, Ravichander Vipperla, Simon Bozonnet, Nicholas W. D. Evans, Björn W. Schuller, Gerhard Rigoll:

Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization. 2154-2157 - Beatriz Martínez-González, José Manuel Pardo, Julián D. Echeverry-Correa, José A. Vallejo-Pinto, Roberto Barra-Chicote:

Selection of TDOA Parameters for MDM Speaker Diarization. 2158-2161 - Orith Toledo-Ronen, Hagai Aronowitz:

Confidence for Speaker Diarization using PCA Spectral Ratio. 2162-2165 - Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:

Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model. 2166-2169 - Deepu Vijayasenan, Fabio Valente:

DiarTk : An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings. 2170-2173 - Grégor Dupuy, Mickael Rouvier, Sylvain Meignier, Yannick Estève:

I-vectors and ILP clustering adapted to cross-show speaker diarization. 2174-2177
Speech Production: Imaging and Models
- Assaf Israel, Michael I. Proctor, Louis Goldstein, Khalil Iskarous, Shrikanth S. Narayanan:

Emphatic segments and emphasis spread in Lebanese Arabic: a Real-time Magnetic Resonance Imaging Study. 2178-2181 - Ryan Shosted, Bradley P. Sutton, Abbas Benmamoun:

Using magnetic resonance to image the pharynx during Arabic speech: Static and dynamic aspects. 2182-2185 - Julián Andrés Valdés Vargas, Pierre Badin, Laurent Lamalle:

Articulatory speaker normalisation based on MRI-data using three-way linear decomposition methods. 2186-2189 - Takayuki Arai:

Vowels Produced by Sliding Three-tube Model with Different Lengths. 2190-2193 - Tokihiko Kaburagi, Tetsuro Takano, Yuki Sakamoto:

Estimating the Vocal-Tract Area Function From Formants Using a Sensitivity Function and Least Square. 2194-2197 - Jorge C. Lucero, Laura L. Koenig, Susanne Fuchs:

Modeling source-tract interaction in speech production: Voicing onset vs. vowel height after a voiceless obstruent. 2198-2201
Speech Synthesis
- Bajibabu Bollepalli, Alan W. Black, Kishore Prahallad:

Modelling a Noisy-channel for Voice Conversion Using Articulatory Features. 2202-2205 - Anna C. Janska, Erich Schröger, Thomas Jacobsen, Robert A. J. Clark:

Asymmetries in the perception of synthesized speech. 2206-2209 - Erica Greene, Taniya Mishra, Patrick Haffner, Alistair Conkie:

Predicting Character-Appropriate Voices for a TTS-based Storyteller System. 2210-2213 - Alexander Sorin, Slava Shechtman, Vincent Pollet:

Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis. 2214-2217 - Gérard Bailly, Cécilia Gouvernayre:

Pauses and respiratory markers of the structure of book reading. 2218-2221 - Blaise Potard, Matthew P. Aylett, Christopher J. Pidcock:

Proper Name Splicing in Computer Games with TTS. 2222-2225
Prosodic Prominence: Annotation, Prediction, Applications
- David Escudero Mancebo, Eva Estebas-Vilaplana:

Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments. 2382-2385 - Petra Wagner, Fabio Tamburini, Andreas Windmann:

Objective, Subjective and Linguistic Roads to Perceptual Prominence - How are they compared and why? 2386-2389 - Martin Heckmann:

Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario. 2390-2393 - Denis Arnold, Petra Wagner, Bernd Möbius:

Obtaining prominence judgments from naïve listeners - Influence of rating scales, linguistic levels and normalisation. 2394-2397 - Leonardo Badino, Robert A. J. Clark:

Towards Hierarchical Prosodic Prominence Generation in TTS Synthesis. 2398-2401 - Francesco Cutugno, Enrico Leone, Bogdan Ludusan, Antonio Origlia:

Investigating syllabic prominence with Conditional Random Fields and Latent-Dynamic Conditional Random Fields. 2402-2405 - Barbara Samlowski, Petra Wagner, Bernd Möbius:

Disentangling lexical, morphological, syntactic and semantic influences on German prominence - Evidence from a production study. 2406-2409 - Andrew Rosenberg:

Using Prominence and Phrasing Predictions to Improve Weighted Dictionary Pronunciation Models. 2410-2413 - Jean-Philippe Goldman, Mathieu Avanzi, Anne-Catherine Simon, Antoine Auchlin:

A Continuous Prominence Score Based On Acoustic Features. 2414-2417 - Christopher Sappok, Denis Arnold:

More on the Normalization of Syllable Prominence Ratings. 2418-2421 - Tim Mahrt, Jennifer Cole, Margaret M. Fleck, Mark Hasegawa-Johnson:

F0 and the Perception of Prominence. 2422-2425 - Bistra Andreeva, William J. Barry, Magdalena Wolska:

Language differences in the perceptual weight of prominence-lending properties. 2426-2429
Paralinguistics III
- Jun Deng, Björn W. Schuller:

Confidence Measures in Speech Emotion Recognition Based on Semi-supervised Learning. 2226-2229 - Rui Xia, Yang Liu:

Using i-Vector Space Model for Emotion Recognition. 2230-2233 - Nicolas Obin:

Cries and Whispers - Classification of Vocal Effort in Expressive Speech. 2234-2237 - Pouria Fewzee, Fakhri Karray:

Emotional Speech: A Spectral Analysis. 2238-2241 - Andrew Rosenberg:

Classifying Skewed Data: Importance Weighting to Optimize Average Recall. 2242-2245 - Catharine Oertel, Marcin Wlodarczak, Jens Edlund, Petra Wagner, Joakim Gustafson:

Gaze Patterns in Turn-Taking. 2246-2249 - Natalie Fecher:

The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear. 2250-2253 - Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:

A Case Study: Detecting Counselor Reflections in Psychotherapy for Addictions using Linguistic Features. 2254-2257
Speech and Speaker Segmentation
- Mahnoosh Mehrabani, John H. L. Hansen:

Speaker Clustering for a Mixture of Singing and Reading. 2258-2261 - Sayan Ghosh, T. V. Sreenivas:

Automatic Speech Segmentation Using Probabilistic Latent Component Modeling. 2262-2265 - Jonathan William Dennis, Tran Huy Dat, Engsiong Chng:

Overlapping Sound Event Recognition using Local Spectrogram Features with the Generalised Hough Transform. 2266-2269 - Ozlem Kalinli:

Automatic Phoneme Segmentation Using Auditory Attention Features. 2270-2273 - Jia Min Karen Kua, Tharmarajah Thiruvaran, Eliathamby Ambikairajah:

A Non-Uniform Filterbank for Speaker Recognition. 2274-2277 - Jaime Lorenzo-Trueba, Beatriz Martínez-González, Roberto Barra-Chicote, Verónica López-Ludeña, Javier Ferreiros, Junichi Yamagishi, Juan Manuel Montero:

Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization. 2278-2281 - Seyed Hamidreza Mohammadi, Hossein Sameti, Mahsa Sadat Elyasi Langarani, Amirhossein Tavanaei:

KNNDIST: A Non-Parametric Distance Measure for Speaker Segmentation. 2282-2285 - Wei Feng, Xuecheng Nie, Liang Wan, Lei Xie, Jianmin Jiang:

Lexical Story Co-Segmentation of Chinese Broadcast News. 2286-2289 - Montri Karnjanadecha, Stephen A. Zahorian:

Toward an Optimum Feature Set and HMM Model Parameters for Automatic Phonetic Alignment of Spontaneous Speech. 2290-2293
Spoken Language Understanding
- Tim Schlippe, Sebastian Ochs, Ngoc Thang Vu, Tanja Schultz:

Automatic Error Recovery for Pronunciation Dictionaries. 2298-2301 - Grégory Senay, Georges Linarès:

Confidence measure for speech indexing based on Latent Dirichlet Allocation. 2302-2305 - Christophe Cerisara, Alejandra Lorenzo:

Mixed probabilistic and deterministic dependency parsing. 2306-2309 - Shoko Yamahata, Yoshikazu Yamaguchi, Atsunori Ogawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:

Automatic Vocabulary Adaptation Based on Semantic Similarity and Speech Recognition Confidence Measure. 2310-2313 - Nigel G. Ward, Alejandro Vega:

Towards Empirical Dialog-State Modeling and its Use in Language Modeling. 2314-2317 - Keigo Kubo, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:

Evaluation of Many-to-Many Alignment Algorithm by Automatic Pronunciation Annotation Using Web Text Mining. 2318-2321 - Sokol Koço, Cécile Capponi, Frédéric Béchet:

Applying multiview learning algorithms to human-human conversation classification. 2322-2325 - Yuya Akita, Makoto Watanabe, Tatsuya Kawahara:

Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts. 2326-2329 - Chen Li, Yang Liu:

Normalization of Text Messages Using Character- and Phone-based Machine Translation Approaches. 2330-2333 - Aisha S. Azim, Xiaoxuan Wang, Khe Chai Sim:

A Weighted Combination of Speech with Text-based Models for Arabic Diacritization. 2334-2337 - Matthew Stephen Seigel, Philip C. Woodland:

Using Sub-word-level Information for Confidence Estimation with Conditional Random Field Models. 2338-2341
Spoken Language Applications
- Hung-yi Lee, Yu-Yu Chou, Yow-Bang Wang, Lin-Shan Lee:

Supervised Spoken Document Summarization jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine. 2342-2345 - Yun-Nung Chen, Florian Metze:

Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization. 2346-2349 - Junlan Feng, Bernard Renger:

Language Modeling for Voice-Enabled Social TV Using Tweets. 2350-2353 - Rohit Kumar, Rohit Prasad, Sankaranarayanan Ananthakrishnan, Aravind Namandi Vembu, David Stallard, Stavros Tsakalidis, Prem Natarajan:

Detecting OOV Named-Entities in Conversational Speech. 2354-2357 - Sameer Maskey, Bowen Zhou:

Unsupervised Deep Belief Features for Speech Translation. 2358-2361 - Alicia Pérez, José M. Alcaide, M. Inés Torres:

EuskoParl: a speech and text Spanish-Basque parallel corpus. 2362-2365 - Hyuksu Ryu, Sunhee Kim, Minhwa Chung:

Comparing transcription agreement on non-native English speech corpus between native and non-native annotators. 2366-2369 - Jun Ogata, Masataka Goto:

PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds. 2370-2373 - Lei Xie, Yinqing Xu, Lilei Zheng, Qiang Huang, Bingfeng Li:

Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis. 2374-2377 - Sameer Maskey, Andrew Rosenberg:

Power Mean Pyramid Scores for Summarization Evaluation. 2378-2381
Spoken Term and Unseen Word Detection
- Haiyang Li, Jiqing Han, Tieran Zheng, Guibin Zheng:

A Novel Confidence Measure Based on Context Consistency for Spoken Term Detection. 2430-2433 - Panagiota Karanasou, Lukás Burget, Dimitra Vergyri, Murat Akbacak, Arindam Mandal:

Discriminatively trained phoneme confusion model for keyword spotting. 2434-2437 - Keith Kintzley, Aren Jansen, Kenneth Church, Hynek Hermansky:

Inverting the Point Process Model for Fast Phonetic Keyword Search. 2438-2441 - Atta Norouzian, Aren Jansen, Richard C. Rose, Samuel Thomas:

Exploiting Discriminative Point Process Models for Spoken Term Detection. 2442-2445 - Ivan Bulyko, Jose Herrero, Chris Mihelich, Owen Kimball:

Subword speech recognition for detection of unseen words. 2446-2449 - Long Qin, Alexander I. Rudnicky:

OOV Word Detection using Hybrid Models with Mixed Types of Fragments. 2450-2453
Voice Search and Spoken Document Retrieval II
- Jingjing Liu, Scott Cyphers, Panupong Pasupat, Ian McGraw, James R. Glass:

A Conversational Movie Search System Based on Conditional Random Fields. 2454-2457 - Tsung-Hsien Wen, Hung-yi Lee, Lin-Shan Lee:

Interactive Spoken Content Retrieval with Different Types of Actions Optimized By a Markov Decision Process. 2458-2461 - Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk:

Voice Query Refinement. 2462-2465 - Aren Jansen, Benjamin Van Durme:

Indexing Raw Acoustic Features for Scalable Zero Resource Search. 2466-2469 - Julien Fayolle, Murat Saraclar, Fabienne Moreau, Christian Raymond, Guillaume Gravier:

Lexical-phonetic automata for spoken utterance indexing and retrieval. 2470-2473 - Ian McGraw, Scott Cyphers, Panupong Pasupat, Jingjing Liu, James R. Glass:

Automating Crowd-supervised Learning for Spoken Language Systems. 2474-2477
Speech and Age Differences
- Soroush Vosoughi, Deb Roy:

An Automatic Child-Directed Speech Detector for the Study of Child Language Development. 2478-2481 - Andrew R. Plummer:

Aligning manifolds to model the earliest phonological abstraction in infant-caretaker vocal imitation. 2482-2485 - Yoko Saikachi, Mafuyu Kitahara, Ken'ya Nishikawa, Ai Kanato, Reiko Mazuka:

The F0 fall delay of lexical pitch accent in Japanese Infant-directed speech. 2486-2489 - Irina Shport:

Children's Productions of Multi-Syllabic Lexical Stress Patterns in Different Prosodic Positions. 2490-2493 - Melissa A. Redford, Laura Dilley, Jessica Gamache, Elizabeth Wieland:

Prosodic Marking of Continuation versus Completion in Children's Narratives. 2494-2497 - Daniel Fogerty, Diane Kewley-Port, Larry E. Humes:

Judging temporal onset differences for concurrent vowels: Results for young, middle-aged, and older adults. 2498-2501
Acoustic Classification
- Pengfei Hu, Wenju Liu, Wei Jiang:

Combining frame and segment based models for environmental sound classification. 2502-2505 - Yi Ren Leng, Tran Huy Dat:

Using Blob Detection in Missing Feature Linear-Frequency Cepstral Coefficients for Robust Sound Event Recognition. 2506-2509 - Kailash Patil, Mounya Elhilali:

Goal-Oriented Auditory Scene Recognition. 2510-2513 - Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen:

Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams. 2514-2517 - Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang, Thomas S. Huang:

Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals. 2518-2521 - Lee Ngee Tan, Kantapon Kaewtip, Martin L. Cody, Charles E. Taylor, Abeer Alwan:

Evaluation of a Sparse Representation-Based Classifier For Bird Phrase Classification Under Limited Data Conditions. 2522-2525
New Trends in Vowel Nasalization: The Articulation of Nasal Vowels
- Georgia Zellou:

Nasality from Moroccan Arabic Nasal and Pharyngeal Consonants: Patterns of Airflow and Nasalance. 2678-2681 - Véronique Delvaux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies:

Inter-gestural timing in French nasal vowels: A comparative study of (Liège, Tournai) Northern French vs. (Marseille, Toulouse) Southern French. 2682-2685 - Georgia Zellou, Rebecca Scarborough:

Nasal Coarticulation and Contrastive Stress. 2686-2689 - Catarina Oliveira, Paula Martins, Samuel S. Silva, António J. S. Teixeira:

An MRI study of the oral articulation of European Portuguese nasal vowels. 2690-2693 - Rebecca Scarborough, Georgia Zellou:

Acoustic and Perceptual Similarity in Coarticulatorily Nasalized Vowels. 2694-2697 - Panying Rong, Ryan Shosted, David Kuehn:

Articulatory differences between oral and nasal vowels based on the simulation of a speaker-adaptive articulatory model. 2698-2701
Speech Synthesis: Selected Topics
- Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose, Chiori Hori, Hideki Kashioka, Paul R. Dixon:

Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring. 2526-2529 - Jian Luan:

Expand CRF to Model Long Distance Dependencies in Prosodic Break Prediction. 2530-2533 - Nanette Veilleux, Jonathan Barnes, Alejna Brugos, Stefanie Shattuck-Hufnagel:

Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech. 2534-2537 - Stefan Hahn, Paul Vozila, Maximilian Bisani:

Comparison of Grapheme-to-Phoneme Methods on Large Pronunciation Dictionaries and LVCSR Tasks. 2538-2541 - Frédéric Berthommier, Laurent Girin, Louis-Jean Boë:

A Simple Hybrid Acoustic / Morphologically-Constrained Technique for the Synthesis of Stop Consonants in Various Vocalic Contexts. 2542-2545 - Kishore Prahallad, Naresh Kumar Elluru, Venkatesh Keri, Rajendran S, Alan W. Black:

The IIIT-H Indic Speech Databases. 2546-2549 - Rubén San Segundo, Juan Manuel Montero, Verónica López-Ludeña, Simon King:

Detecting Acronyms from Capital Letter Sequences in Spanish. 2550-2553 - Patrick Lehnen, Stefan Hahn, Vlad-Andrei Guta, Hermann Ney:

Hidden Conditional Random Fields with M-to-N Alignments for Grapheme-to-Phoneme Conversion. 2554-2557 - Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran:

Phrase Boundary Assignment from Text in Multiple Domains. 2558-2561 - Nobuaki Minematsu, Shumpei Kobayashi, Shinya Shimizu, Keikichi Hirose:

Improved Prediction of Japanese Word Accent Sandhi Using CRF. 2562-2565 - Asterios Toutios, Shinji Maeda:

Articulatory VCV Synthesis from EMA Data. 2566-2569
ASR: Deep Neural Networks II
- Oriol Vinyals, Li Deng:

Are Sparse Representations Rich Enough for Acoustic Modeling? 2570-2573 - Yeming Xiao, Zhen Zhang, Shang Cai, Jielin Pan, Yonghong Yan:

A Initial Attempt on Task-Specific Adaptation for Deep Neural Network-based Large Vocabulary Continuous Speech Recognition. 2574-2577 - Navdeep Jaitly, Patrick Nguyen, Andrew W. Senior, Vincent Vanhoucke:

Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition. 2578-2581 - Yanmin Qian, Jia Liu:

Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition. 2582-2585 - Ngoc Thang Vu, Wojtek Breiter, Florian Metze, Tanja Schultz:

Initialization Schemes for Multilayer Perceptron Training and their Impact on ASR Performance using Multilingual Data. 2586-2589 - Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee:

Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models. 2590-2593 - Yotaro Kubo, Takaaki Hori, Atsushi Nakamura:

Integrating Deep Neural Networks into Structural Classification Approach based on Weighted Finite-State Transducers. 2594-2597 - Li Deng, Brian Hutchinson, Dong Yu:

Parallel Training for Deep Stacking Networks. 2598-2601 - Yanmin Qian, Jia Liu:

Articulatory Feature based Multilingual MLPs for Low-Resource Speech Recognition. 2602-2605 - Ramón Fernandez Astudillo, Alberto Abad, João Paulo Neto:

Uncertainty driven Compensation of Multi-Stream MLP Acoustic Models for Robust ASR. 2606-2609
Robust Speech Recognition II
- Frank Diehl, Philip C. Woodland:

Complementary Phone Error Training. 2610-2613 - Markus Nußbaum-Thom, Zoltán Tüske, Georg Heigold, Ralf Schlüter, Hermann Ney:

Posterior-Scaled MPE: Novel Discriminative Training Criteria. 2614-2617 - Pei Ding, Liqiang He:

Improve the Implementation of Pitch Features for Mandarin Digit String Recognition Task. 2618-2621 - Hsin-Ju Hsieh, Jeih-Weih Hung, Berlin Chen:

Exploring Joint Equalization of Spatial-Temporal Contextual Statistics of Speech Features for Robust Speech Recognition. 2622-2625 - Shigeki Matsuda, Naoya Ito, Kosuke Tsujino, Hideki Kashioka, Shigeki Sagayama:

Speaker-Dependent Voice Activity Detection Robust to Background Speech Noise. 2626-2629 - José A. González, Antonio M. Peinado, Angel M. Gomez, Ning Ma:

Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition. 2630-2633 - Ahmed Hussen Abdelaziz, Dorothea Kolossa:

Decoding of Uncertain Features Using the Posterior Distribution of the Clean Data for Robust Speech Recognition. 2634-2637 - Ning Ma, Jon Barker:

Coupling identification and reconstruction of missing features for noise-robust automatic speech recognition. 2638-2641 - Bogdan Ludusan, Stefan Ziegler, Guillaume Gravier:

Integrating Stress Information in Large Vocabulary Continuous Speech Recognition. 2642-2645 - Jen-Tzung Chien, Cheng-Chun Chiang:

Group Sparse Hidden Markov Models for Speech Recognition. 2646-2649
Speaker Recognition III
- Johann Poignant, Hervé Bredin, Viet Bac Le, Laurent Besacier, Claude Barras, Georges Quénot:

Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast. 2650-2653 - Yali Zhao, Lei Xie, Zhonghua Fu:

Mask Estimation and Refinement for MFT-based Robust Speaker Verification. 2654-2657 - Hai Yang, Chunyan Liang, Yunfei Xu, Lin Yang, Yonghong Yan:

Sparse Probabilistic Linear Discriminant Analysis for Speaker Verification. 2658-2661 - Achintya Kumar Sarkar, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre:

Study of the Effect of I-vector Modeling on Short and Mismatch Utterance Duration for Speaker Verification. 2662-2665 - Chien-Lin Huang, Chiori Hori, Hideki Kashioka, Bin Ma:

Ensemble Classifiers Using Unsupervised Data Selection for Speaker Recognition. 2666-2669 - Songgun Hyon, Hongcui Wang, Chen Zhao, Jianguo Wei, Jianwu Dang:

A method of speaker identification based on phoneme mean F-ratio contribution. 2670-2673 - Jeremiah Remus, Jenniffer Estrada, Stephanie A. C. Schuckers:

Mitigating Effects of Recording Condition Mismatch in Speaker Recognition Using Partial Least Squares. 2674-2677
