Audio and Speech Processing

Authors and titles for recent submissions

Total of 59 entries : 1-50 51-59

[1] arXiv:2607.01161 [pdf, html, other]: Title: Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages

Pol Buitrago, Javier Hernando

Comments: 5 pages, 8 figures, Submitted to IberSPEECH 2026

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[2] arXiv:2607.00899 [pdf, html, other]: Title: Positive-Incentive Noise Predictor for Adversarial Purification in Speaker Verification

Yibo Bai, Sizhou Chen, Michele Panariello, Hao Ma, Xiao-Lei Zhang, Xuelong Li, Massimiliano Todisco, Nicholas Evans

Comments: Submitted to IEEE TASLP. 13 pages for manuscript, 2 pages for supplementary material

Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:2607.00548 [pdf, html, other]: Title: AmbiDrop: Ambisonics-Based Array-Agnostic Neural Speech Enhancement

Michael Tatarjitzky, Vladimir Tourbabin, Boaz Rafaely

Comments: Submitted to IEEE Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2607.00387 [pdf, html, other]: Title: From Objectives to Applications: Aligning Architectural Biases in Audio Self-Supervised Learning

Kele Xu, Yulu Fang, Boda Zhou, Yulin Sun, Qisheng Xu, Qiya Song, Jin Zhang, Cheng Yang, Huaimin Wang

Subjects: Audio and Speech Processing (eess.AS)
[5] arXiv:2607.00260 [pdf, html, other]: Title: Do Multimodal Large Language Models Need Reasoning to Classify Dementia from Speech?

Liming Wang, Neguine Rezaii, Bradford C. Dickerson, James Glass

Subjects: Audio and Speech Processing (eess.AS)
[6] arXiv:2607.00418 (cross-list from cs.CL) [pdf, html, other]: Title: Speech Playground: An Interactive Tool for Speech Analysis and Comparison

Stephen McIntosh, Daisuke Saito, Nobuaki Minematsu

Comments: Accepted to Interspeech 2026 (Show and Tell); 2 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[7] arXiv:2606.31730 [pdf, html, other]: Title: A Fair and Transparent Framework for Speech-Based Depression Detection: Balancing Interpretability and Performance

Mariel Estevez, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Comments: 7 pages, 2 figures, 3 tables. This work has been submitted to the IEEE for possible publication

Subjects: Audio and Speech Processing (eess.AS)
[8] arXiv:2606.31729 [pdf, html, other]: Title: Is Natural Always Appropriate? Investigating Naturalness and Appropriateness Across Different Domains for TTS Evaluation

Dominika Woszczyk, Andreas Triantafyllopoulos, Jura Miniota, Éva Székely, Bjoern Schuller

Comments: Accepted at Interspeech '26

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[9] arXiv:2606.31552 [pdf, html, other]: Title: Improving multichannel speech enhancement through accurate room-acoustic simulations

Georg Götz, Alessia Milo, Steinar Guðjónsson, Daniel Gert Nielsen, Jesper Pedersen, Finnur Pind

Comments: Accepted for publication at Interspeech

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[10] arXiv:2606.31527 [pdf, html, other]: Title: How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA

Ailín Pollio San Pedro, Tomi Kinnunen, Alexandre Nikolaev, Ruchi Pandey

Comments: Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11] arXiv:2606.31365 [pdf, html, other]: Title: Beyond Cross-Reconstruction: Probing-Based Disentanglement Evaluation for Acoustic Teleportation Codecs

Philipp Grundhuber, Emanuël A. P. Habets

Comments: Accepted for Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12] arXiv:2606.30944 [pdf, html, other]: Title: Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation

Yuxuan Hu, Heng Lu, Ruchao Fan, Yao Qian, Xiaofei Wang, Jian Xue, Heming Wang, Shuohang Wang, Young Jin Kim, Yelong Shen, Jinyu Li

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2606.30780 [pdf, html, other]: Title: Detecting Audio Deepfakes on the Edge: Lightweight SSL-Based Detection in a Browser Plugin

Octavian Pascu, Dan Oneata, Horia Cucu, Nicolas M. Muller

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[14] arXiv:2606.30675 [pdf, html, other]: Title: Listening Between the Lines: Joint Learning of ASR Embeddings and LLM-Augmented Linguistics for Dementia Detection

Olivier Jiyoun Jung, Jonghyeon Park, Myungwoo Oh

Comments: Accepted at INTERSPEECH 2026

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[15] arXiv:2606.31595 (cross-list from cs.SD) [pdf, other]: Title: Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets

Johannes Hentschel, Emmanouil Karystinaios, Gerhard Widmer, Markus Neuwirth

Comments: in proceedings of the Music Encoding Conference 2026

Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[16] arXiv:2606.31338 (cross-list from cs.SD) [pdf, html, other]: Title: Beyond Binary Instrument QA: Probing Instrument Grounding in Music Audio-Language Models

Yujun Lee, Joonhyeok Shin, Hyoeun Kim, Kyuhong Shim

Comments: Workshop on Machine Learning for Audio, ICML 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[17] arXiv:2606.31259 (cross-list from cs.SD) [pdf, html, other]: Title: SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation

Binh Mai, Tran Quoc Bao Le, Hung Dinh, Cong Tran

Comments: Under review

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[18] arXiv:2606.31247 (cross-list from cs.SD) [pdf, html, other]: Title: FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

Jiaqi Li, Chaoren Wang, Xiaohai Tian, Mingjie Chen, Xinyu Liang, Xu Li, Yufan Lin, Junwen Qiu, Jun Zhang, Lu Lu, Haizhou Li, Zhizheng Wu

Comments: Preprint, under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2606.31128 (cross-list from cs.SD) [pdf, html, other]: Title: UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling

Chuanbo Zhu, Wuyou Zhou, Rongxiu Zhong, Shilei Zhang, Kun Qian, Yike Guo, Wei Xue

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:2606.31105 (cross-list from cs.SD) [pdf, html, other]: Title: Attacking UTMOS: Probing the Robustness of a Speech Quality Assessment Model

Wen-Chin Huang, Tomoki Toda

Comments: Preprint. Audio samples: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2606.31055 (cross-list from cs.CL) [pdf, html, other]: Title: Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems

Ashish Hallur, Thomas Thebaud, Georgi Tinchev, Venkatesh Ravichandran, Laureano Moro-Velazquez

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2606.30849 (cross-list from cs.CV) [pdf, html, other]: Title: SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation

Juncheng Ma, Yuxuan Du, Yanan Sun, Zhening Xing, Changlin Li, Zhenyu Tang, Bo Li, Peng-Tao Jiang, Li Yuan, Daquan Zhou, Yonghong Tian

Comments: ECCV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2606.30811 (cross-list from cs.CV) [pdf, html, other]: Title: AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

Kien T. Pham, I Chieh Chen, Qifeng Chen, Long Chen

Comments: ECCV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2606.30791 (cross-list from cs.SD) [pdf, html, other]: Title: Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection

Marjan Beheshti, Majid Rostami, Bo Chen

Comments: Submitted to Computer Speech & Language

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2606.30700 (cross-list from cs.SD) [pdf, other]: Title: BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations

Ludovic K. Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA)

Journal-ref: Interspeech 2026, Sep 2026, Sydney, Australia

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[26] arXiv:2606.30682 (cross-list from cs.SD) [pdf, html, other]: Title: ALM2Vec: Learning Audio Embeddings for Universal Audio Retrieval with Large Audio-Language Models

Fengjie Lu, Chenang Jiang, Jiarui Hai, Helin Wang, Aaron Yee

Comments: 7 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2606.30671 (cross-list from cs.SD) [pdf, html, other]: Title: Enhancing BEST-RQ Pseudo-Label Quality through Online Refinement for Automatic Speech Recognition

Jingjing Xu, Zijian Yang, Mohammad Zeineldeen, Eugen Beck, Ralf Schlueter, Hermann Ney

Comments: Accepted at Interspeech 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2606.30646 (cross-list from cs.SD) [pdf, html, other]: Title: ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection

Chukwuemeka Ugwu, Oluwafemi Richard Oyeleke

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[29] arXiv:2606.30580 [pdf, html, other]: Title: MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling

Yoonjeong Park, Jaekwon Im, Juhan Nam

Comments: Accepted to Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2606.30114 [pdf, other]: Title: Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality

Ludovic Pirard, Katarina C. Poole

Comments: Submitted, accepted and presented at the AES 2026 International Conference on Audio for Virtual and Augmented Reality and Immersive Games

Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2606.29901 [pdf, html, other]: Title: Semi-Supervised Sound Event Detection with Conditional Mixup and Embedding-Level Contrastive Loss

Nian Shao, Xian Li, Xiaofei Li

Comments: 6 pages; accepted by SMC 2026

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[32] arXiv:2606.29632 [pdf, html, other]: Title: VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition

Piyush Arora, Navlika Singh, Umberto Cappellazzo, Stavros Petridis, Maja Pantic

Comments: Accepted to INTERSPEECH 2026. Our code is available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[33] arXiv:2606.29480 [pdf, html, other]: Title: DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection

Hoyeol Sohn, Juhan Nam

Comments: 10 pages, 2 figures, accepted to INTERSPEECH 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:2606.29450 [pdf, html, other]: Title: VeRe-Flow: Guiding Flow Matching toward Clean Speech via Velocity Contrastive Regularization and Representation Alignment for Noise-Robust Bandwidth Expansion

Sujin Koo, Sangyoon Kim, Ji Sub Um, Hoirin Kim

Comments: Accepted to Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS)
[35] arXiv:2606.28884 [pdf, html, other]: Title: GigaSpeechBench: A Real-World Multilingual Speech-to-Text Benchmark

Yujie Tu, Yifan Yang, Tianrui Wang, Yanqiao Zhu, Guodong Lin, Mingchen Shao, Haoran Wang, Junzhe Liu, Yuxiang Fu, Yizhou Peng, Changsong Liu, Peng Wang, Zhikang Niu, Yunchong Xiao, Haolong Zheng, Xiuwen Zheng, Xulin Fan, Wei-Qiang Zhang, Lei Xie, Longbiao Wang, Eng-Siong Chng, Jiajun Zhang, Kele Xu, Jianwei Yu, Binbin Zhang, Jiayu Du, Wupeng Wang, Zhigao Chen, Yunlong Wu, Guoguo Chen, Xipeng Qiu, Mark Hasegawa-Johnson, Kai Yu, Zhifu Gao, Xiangang Li, Xie Chen

Subjects: Audio and Speech Processing (eess.AS)
[36] arXiv:2606.28732 [pdf, html, other]: Title: CTC-Seeded Token Edit Refinement for Non-Autoregressive Speech Recognition

Wanting Huang, Weiran Wang

Comments: Submitted to IEEE SLT 2026

Subjects: Audio and Speech Processing (eess.AS)
[37] arXiv:2606.28728 [pdf, html, other]: Title: Improving Large-Scale Weakly Supervised ASR by Filtering and Selection

Kohei Matsuura, Masato Mimura

Comments: 5 pages, 4 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[38] arXiv:2606.30356 (cross-list from cs.CL) [pdf, html, other]: Title: OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL

Karl El Hajal, Mathew Magimai.-Doss

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2606.30196 (cross-list from cs.CL) [pdf, html, other]: Title: Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector

Elys Allesiardo, Antoine Caubrière, Valentin Vielzeuf

Comments: Accepted for presentation at LREC 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[40] arXiv:2606.29534 (cross-list from cs.CL) [pdf, html, other]: Title: Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs

Nithin Rao Koluguri, Sasha Meister, Nikolay Karpov, Piotr Zelasko, Desh Raj, Jagadeesh Balam, Boris Ginsburg

Comments: Accepted at Interspeech 2026

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[41] arXiv:2606.29071 (cross-list from physics.med-ph) [pdf, html, other]: Title: An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics

Sardar Nafis Bin Ali, Maryam Naghibolhosseini, Mohsen Zayernouri

Comments: 30 pages, 9 figures

Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:2606.28988 (cross-list from cs.SD) [pdf, html, other]: Title: Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation

Quoc Thinh Vo, David K. Han

Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS) - Lecce, Italy

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[43] arXiv:2606.28249 [pdf, html, other]: Title: HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

Sihang Nie, Xiaofen Xing, Rui Xing, Haoming Li, Ruitong Xiao, Jingyuan Xing, Baiji Liu, Xiangmin Xu

Comments: 7 pages, 3 figures, 3 tables; Preprint

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[44] arXiv:2606.28114 [pdf, other]: Title: Screening Matters: A Comparative Study of Conventional and Crowdsourced Listening Tests

Anika Treffehn, Andrea Eichenseer, Emily Kratsch, Nicola Pia

Comments: accepted at Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS)
[45] arXiv:2606.28048 (cross-list from cs.SD) [pdf, html, other]: Title: DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions

Muhammad Shakeel Akram, Amal Htait, Abdul Hamid Sadka, Emma Meisingseth, Karishma Jaitly

Comments: 5 pages, 4 figures, 1 table

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46] arXiv:2606.28032 (cross-list from cs.SD) [pdf, other]: Title: A Flexible Encoding Model for Non-Unique Note Alignments

Suhit Chiruthapudi, Adam Štefunko, Silvan Peter, Patricia Hu, Jan Hajič jr., Carlos Eduardo Cancino-Chacón

Comments: Published at the Music Encoding Conference (MEC), 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:2606.28002 (cross-list from cs.CL) [pdf, html, other]: Title: Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection

Muhammad Shakeel Akram, Amal Htait, Abdul Hamid Sadka, Emma Meisingseth, Karishma Jaitly

Comments: 10 pages, 8 figures, 2 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[48] arXiv:2606.27965 (cross-list from cs.SD) [pdf, html, other]: Title: Grammar-Guided Hierarchical Parsing for Long-form Audio Activity Recognition

Peng Zhang, Qingyu Luo, Philip J.B. Jackson, Wenwu Wang

Comments: Accepted to Interspeech 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2606.27717 (cross-list from cs.CL) [pdf, html, other]: Title: Do Speech Emphasis Models Generalize across Languages and Emotions?

Megan Wei, Deepali Aneja, Jiaqi Su, Yunyun Wang, Haonan Chen, Zeyu Jin

Comments: Interspeech 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2606.27320 (cross-list from cs.SD) [pdf, html, other]: Title: Elastic Time: Dynamic Frame Rate Bottlenecks for Neural Audio Coding

Dimitrios Bralios, Paris Smaragdis, Minje Kim

Comments: Interspeech 2026

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Thu, 2 Jul 2026 (showing 6 of 6 entries)

Wed, 1 Jul 2026 (showing 22 of 22 entries)

Tue, 30 Jun 2026 (showing 14 of 14 entries)

Mon, 29 Jun 2026 (showing 8 of 8 entries)