Audio and Speech Processing
Authors and titles for recent submissions
Total of 59 entries (showing 1-50)
- [1] arXiv:2607.01161 [pdf, html, other]
  Title: Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages
  Comments: 5 pages, 8 figures, Submitted to IberSPEECH 2026
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
- [2] arXiv:2607.00899 [pdf, html, other]
  Title: Positive-Incentive Noise Predictor for Adversarial Purification in Speaker Verification
  Authors: Yibo Bai, Sizhou Chen, Michele Panariello, Hao Ma, Xiao-Lei Zhang, Xuelong Li, Massimiliano Todisco, Nicholas Evan
  Comments: Submitted to IEEE TASLP. 13 pages for manuscript, 2 pages for supplementary material
  Subjects: Audio and Speech Processing (eess.AS)
- [3] arXiv:2607.00548 [pdf, html, other]
  Title: AmbiDrop: Ambisonics-Based Array-Agnostic Neural Speech Enhancement
  Comments: Submitted to IEEE Transactions on Audio, Speech, and Language Processing
  Subjects: Audio and Speech Processing (eess.AS)
- [4] arXiv:2607.00387 [pdf, html, other]
  Title: From Objectives to Applications: Aligning Architectural Biases in Audio Self-Supervised Learning
  Authors: Kele Xu, Yulu Fang, Boda Zhou, Yulin Sun, Qisheng Xu, Qiya Song, Jin Zhang, Cheng Yang, Huaimin Wang
  Subjects: Audio and Speech Processing (eess.AS)
- [5] arXiv:2607.00260 [pdf, html, other]
  Title: Do Multimodal Large Language Models Need Reasoning to Classify Dementia from Speech?
  Subjects: Audio and Speech Processing (eess.AS)
- [6] arXiv:2607.00418 (cross-list from cs.CL) [pdf, html, other]
  Title: Speech Playground: An Interactive Tool for Speech Analysis and Comparison
  Comments: Accepted to Interspeech 2026 (Show and Tell); 2 pages, 3 figures
  Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Thu, 2 Jul 2026 (showing 6 of 6 entries)
- [7] arXiv:2606.31730 [pdf, html, other]
  Title: A Fair and Transparent Framework for Speech-Based Depression Detection: Balancing Interpretability and Performance
  Comments: 7 pages, 2 figures, 3 tables. This work has been submitted to the IEEE for possible publication
  Subjects: Audio and Speech Processing (eess.AS)
- [8] arXiv:2606.31729 [pdf, html, other]
  Title: Is Natural Always Appropriate? Investigating Naturalness and Appropriateness Across Different Domains for TTS Evaluation
  Comments: Accepted at Interspeech 26'
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [9] arXiv:2606.31552 [pdf, html, other]
  Title: Improving multichannel speech enhancement through accurate room-acoustic simulations
  Comments: Accepted for publication at Interspeech
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
- [10] arXiv:2606.31527 [pdf, html, other]
  Title: How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA
  Comments: Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [11] arXiv:2606.31365 [pdf, html, other]
  Title: Beyond Cross-Reconstruction: Probing-Based Disentanglement Evaluation for Acoustic Teleportation Codecs
  Comments: Accepted for Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [12] arXiv:2606.30944 [pdf, html, other]
  Title: Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation
  Authors: Yuxuan Hu, Heng Lu, Ruchao Fan, Yao Qian, Xiaofei Wang, Jian Xue, Heming Wang, Shuohang Wang, Young Jin Kim, Yelong Shen, Jinyu Li
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [13] arXiv:2606.30780 [pdf, html, other]
  Title: Detecting Audio Deepfakes on the Edge: Lightweight SSL-Based Detection in a Browser Plugin
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
- [14] arXiv:2606.30675 [pdf, html, other]
  Title: Listening Between the Lines: Joint Learning of ASR Embeddings and LLM-Augmented Linguistics for Dementia Detection
  Comments: Accepted at INTERSPEECH 2026
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
- [15] arXiv:2606.31595 (cross-list from cs.SD) [pdf, other]
  Title: Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets
  Comments: in proceedings of the Music Encoding Conference 2026
  Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
- [16] arXiv:2606.31338 (cross-list from cs.SD) [pdf, html, other]
  Title: Beyond Binary Instrument QA: Probing Instrument Grounding in Music Audio-Language Models
  Comments: Workshop on Machine Learning for Audio, ICML 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [17] arXiv:2606.31259 (cross-list from cs.SD) [pdf, html, other]
  Title: SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation
  Comments: Under review
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [18] arXiv:2606.31247 (cross-list from cs.SD) [pdf, html, other]
  Title: FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model
  Authors: Jiaqi Li, Chaoren Wang, Xiaohai Tian, Mingjie Chen, Xinyu Liang, Xu Li, Yufan Lin, Junwen Qiu, Jun Zhang, Lu Lu, Haizhou Li, Zhizheng Wu
  Comments: Preprint, under review
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [19] arXiv:2606.31128 (cross-list from cs.SD) [pdf, html, other]
  Title: UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [20] arXiv:2606.31105 (cross-list from cs.SD) [pdf, html, other]
  Title: Attacking UTMOS: Probing the Robustness of a Speech Quality Assessment Model
  Comments: Preprint. Audio samples: this https URL
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [21] arXiv:2606.31055 (cross-list from cs.CL) [pdf, html, other]
  Title: Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems
  Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [22] arXiv:2606.30849 (cross-list from cs.CV) [pdf, html, other]
  Title: SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation
  Authors: Juncheng Ma, Yuxuan Du, Yanan Sun, Zhening Xing, Changlin Li, Zhenyu Tang, Bo Li, Peng-Tao Jiang, Li Yuan, Daquan Zhou, Yonghong Tian
  Comments: ECCV 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [23] arXiv:2606.30811 (cross-list from cs.CV) [pdf, html, other]
  Title: AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation
  Comments: ECCV 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [24] arXiv:2606.30791 (cross-list from cs.SD) [pdf, html, other]
  Title: Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection
  Comments: Submitted to Computer Speech & Language
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [25] arXiv:2606.30700 (cross-list from cs.SD) [pdf, other]
  Title: BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations
  Journal-ref: Interspeech 2026, Sep 2026, Sydney, Australia
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [26] arXiv:2606.30682 (cross-list from cs.SD) [pdf, html, other]
  Title: ALM2Vec: Learning Audio Embeddings for Universal Audio Retrieval with Large Audio-Language Models
  Comments: 7 pages, 3 figures
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [27] arXiv:2606.30671 (cross-list from cs.SD) [pdf, html, other]
  Title: Enhancing BEST-RQ Pseudo-Label Quality through Online Refinement for Automatic Speech Recognition
  Comments: Accepted at Interspeech 2026
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [28] arXiv:2606.30646 (cross-list from cs.SD) [pdf, html, other]
  Title: ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Wed, 1 Jul 2026 (showing 22 of 22 entries)
- [29] arXiv:2606.30580 [pdf, html, other]
  Title: MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling
  Comments: Accepted to Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [30] arXiv:2606.30114 [pdf, other]
  Title: Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality
  Comments: Submitted, accepted and presented at the AES 2026 International Conference on Audio for Virtual and Augmented Reality and Immersive Games
  Subjects: Audio and Speech Processing (eess.AS)
- [31] arXiv:2606.29901 [pdf, html, other]
  Title: Semi-Supervised Sound Event Detection with Conditional Mixup and Embedding-Level Contrastive Loss
  Comments: 6 pages; accepted by SMC 2026
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
- [32] arXiv:2606.29632 [pdf, html, other]
  Title: VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition
  Comments: Accepted to INTERSPEECH 2026. Our code is available at this https URL
  Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
- [33] arXiv:2606.29480 [pdf, html, other]
  Title: DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection
  Comments: 10 pages, 2 figures, accepted to INTERSPEECH 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [34] arXiv:2606.29450 [pdf, html, other]
  Title: VeRe-Flow: Guiding Flow Matching toward Clean Speech via Velocity Contrastive Regularization and Representation Alignment for Noise-Robust Bandwidth Expansion
  Comments: Accepted to Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS)
- [35] arXiv:2606.28884 [pdf, html, other]
  Title: GigaSpeechBench: A Real-World Multilingual Speech-to-Text Benchmark
  Authors: Yujie Tu, Yifan Yang, Tianrui Wang, Yanqiao Zhu, Guodong Lin, Mingchen Shao, Haoran Wang, Junzhe Liu, Yuxiang Fu, Yizhou Peng, Changsong Liu, Peng Wang, Zhikang Niu, Yunchong Xiao, Haolong Zheng, Xiuwen Zheng, Xulin Fan, Wei-Qiang Zhang, Lei Xie, Longbiao Wang, Eng-Siong Chng, Jiajun Zhang, Kele Xu, Jianwei Yu, Binbin Zhang, Jiayu Du, Wupeng Wang, Zhigao Chen, Yunlong Wu, Guoguo Chen, Xipeng Qiu, Mark Hasegawa-Johnson, Kai Yu, Zhifu Gao, Xiangang Li, Xie Chen
  Subjects: Audio and Speech Processing (eess.AS)
- [36] arXiv:2606.28732 [pdf, html, other]
  Title: CTC-Seeded Token Edit Refinement for Non-Autoregressive Speech Recognition
  Comments: Submitted to IEEE SLT 2026
  Subjects: Audio and Speech Processing (eess.AS)
- [37] arXiv:2606.28728 [pdf, html, other]
  Title: Improving Large-Scale Weakly Supervised ASR by Filtering and Selection
  Comments: 5 pages, 4 figures, 2 tables
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
- [38] arXiv:2606.30356 (cross-list from cs.CL) [pdf, html, other]
  Title: OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL
  Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [39] arXiv:2606.30196 (cross-list from cs.CL) [pdf, html, other]
  Title: Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector
  Comments: Accepted for presentation at LREC 2026
  Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [40] arXiv:2606.29534 (cross-list from cs.CL) [pdf, html, other]
  Title: Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs
  Authors: Nithin Rao Koluguri, Sasha Meister, Nikolay Karpov, Piotr Zelasko, Desh Raj, Jagadeesh Balam, Boris Ginsburg
  Comments: Accepted at Interspeech 2026
  Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [41] arXiv:2606.29071 (cross-list from physics.med-ph) [pdf, html, other]
  Title: An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics
  Comments: 30 pages, 9 figures
  Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [42] arXiv:2606.28988 (cross-list from cs.SD) [pdf, html, other]
  Title: Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation
  Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS) - Lecce, Italy
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Tue, 30 Jun 2026 (showing 14 of 14 entries)
- [43] arXiv:2606.28249 [pdf, html, other]
  Title: HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech
  Comments: 7 pages, 3 figures, 3 tables; Preprint
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [44] arXiv:2606.28114 [pdf, other]
  Title: Screening Matters: A Comparative Study of Conventional and Crowdsourced Listening Tests
  Comments: accepted at Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS)
- [45] arXiv:2606.28048 (cross-list from cs.SD) [pdf, html, other]
  Title: DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions
  Comments: 5 pages, 4 figures, 1 table
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [46] arXiv:2606.28032 (cross-list from cs.SD) [pdf, other]
  Title: A Flexible Encoding Model for Non-Unique Note Alignments
  Authors: Suhit Chiruthapudi, Adam Štefunko, Silvan Peter, Patricia Hu, Jan Hajič jr., Carlos Eduardo Cancino-Chacón
  Comments: Published at the Music Encoding Conference (MEC), 2026
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [47] arXiv:2606.28002 (cross-list from cs.CL) [pdf, html, other]
  Title: Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection
  Comments: 10 pages, 8 figures, 2 tables
  Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [48] arXiv:2606.27965 (cross-list from cs.SD) [pdf, html, other]
  Title: Grammar-Guided Hierarchical Parsing for Long-form Audio Activity Recognition
  Comments: Accepted to Interspeech 2026
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [49] arXiv:2606.27717 (cross-list from cs.CL) [pdf, html, other]
  Title: Do Speech Emphasis Models Generalize across Languages and Emotions?
  Comments: Interspeech 2026
  Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [50] arXiv:2606.27320 (cross-list from cs.SD) [pdf, html, other]
  Title: Elastic Time: Dynamic Frame Rate Bottlenecks for Neural Audio Coding
  Comments: Interspeech 2026
  Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Mon, 29 Jun 2026 (showing 8 of 8 entries)