Sound

Authors and titles for recent submissions

Total of 67 entries : 1-50 51-67

[1] arXiv:2607.01108 [pdf, html, other]: Title: NPUsper: Eliminating Redundant Computation for Real-Time Whisper on Mobile NPUs

Sihyeon Lee, Hojeong Lee, Sungwon Woo, Chengpo Yan, Suman Banerjee, Seyeon Kim

Subjects: Sound (cs.SD)
[2] arXiv:2607.00946 [pdf, html, other]: Title: A Geometric Perspective on Composable Emotion Steering in Text-to-Speech Models

Siyi Wang, James Bailey, Ting Dang

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[3] arXiv:2607.00777 [pdf, html, other]: Title: Evaluating Pretrained Music Embeddings for Cross-Performance Jazz Standard Recognition

Çağrı Eser

Comments: 6 pages, 2 figures, 4 tables. Accepted to the ICML 2026 Workshop on Machine Learning for Audio

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[4] arXiv:2607.00363 [pdf, html, other]: Title: Enhancing Flow Matching with A Unified Guidance Framework for Efficient and Robust Speech Synthesis

Zuda Yu, Qianhui Xu, Ting Chen, Junhui Zhang, Tao Fu, Hongjiang Yu, Qiangqing Wang, Yang Song

Comments: Accepted to INTERSPEECH 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[5] arXiv:2607.00309 [pdf, html, other]: Title: A Text-Steerable Instrument for Sketching Procedural Soundscapes via Language Models

Prabal Gupta (Rama Labs, Kitchener, Canada)

Comments: 10 pages, 7 figures, 2 tables. Accepted to the International Conference on New Interfaces for Musical Expression (NIME 2026), London, UK. Supplementary material included as an appendix. Code and demo: this https URL

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[6] arXiv:2607.00247 [pdf, html, other]: Title: Adaptive Perturbation Selection for Contrastive Audio Decoding

Aaron Isidore Grace, Zhouyuan Huo, Weiran Wang

Comments: In submission

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[7] arXiv:2607.00726 (cross-list from cs.CV) [pdf, html, other]: Title: AV-SyncBench: Decoupled Benchmarking of Temporal and Semantic Audio-Visual Synchronization

Tianhong Zhou, Mingyang Han, Boyu Li, Yuxuan Jiang, Jiaxin Ye, Dongxiao Wang, Haoxiang Shi, Kunpeng Wang, Jun Song, Cheng Yu, Bo Zheng

Comments: Accepted by Interspeech 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[8] arXiv:2607.00418 (cross-list from cs.CL) [pdf, html, other]: Title: Speech Playground: An Interactive Tool for Speech Analysis and Comparison

Stephen McIntosh, Daisuke Saito, Nobuaki Minematsu

Comments: Accepted to Interspeech 2026 (Show and Tell); 2 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[9] arXiv:2606.31595 [pdf, other]: Title: Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets

Johannes Hentschel, Emmanouil Karystinaios, Gerhard Widmer, Markus Neuwirth

Comments: in proceedings of the Music Encoding Conference 2026

Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[10] arXiv:2606.31587 [pdf, html, other]: Title: ZEBRA: Zero-Shot Entropy-Regularized Prompt Learning for Base-to-Novel Generalization in Audio-Language Models

Asif Hanif, Mohammad Yaqub

Comments: Accepted in InterSpeech 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[11] arXiv:2606.31338 [pdf, html, other]: Title: Beyond Binary Instrument QA: Probing Instrument Grounding in Music Audio-Language Models

Yujun Lee, Joonhyeok Shin, Hyoeun Kim, Kyuhong Shim

Comments: Workshop on Machine Learning for Audio, ICML 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12] arXiv:2606.31259 [pdf, html, other]: Title: SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation

Binh Mai, Tran Quoc Bao Le, Hung Dinh, Cong Tran

Comments: Under review

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13] arXiv:2606.31247 [pdf, html, other]: Title: FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

Jiaqi Li, Chaoren Wang, Xiaohai Tian, Mingjie Chen, Xinyu Liang, Xu Li, Yufan Lin, Junwen Qiu, Jun Zhang, Lu Lu, Haizhou Li, Zhizheng Wu

Comments: Preprint, under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2606.31128 [pdf, html, other]: Title: UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling

Chuanbo Zhu, Wuyou Zhou, Rongxiu Zhong, Shilei Zhang, Kun Qian, Yike Guo, Wei Xue

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[15] arXiv:2606.31105 [pdf, html, other]: Title: Attacking UTMOS: Probing the Robustness of a Speech Quality Assessment Model

Wen-Chin Huang, Tomoki Toda

Comments: Preprint. Audio samples: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2606.30791 [pdf, html, other]: Title: Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection

Marjan Beheshti, Majid Rostami, Bo Chen

Comments: Submitted to Computer Speech & Language

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2606.30700 [pdf, other]: Title: BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations

Ludovic K. Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA)

Journal-ref: Interspeech 2026, Sep 2026, Sydney, Australia

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[18] arXiv:2606.30682 [pdf, html, other]: Title: ALM2Vec: Learning Audio Embeddings for Universal Audio Retrieval with Large Audio-Language Models

Fengjie Lu, Chenang Jiang, Jiarui Hai, Helin Wang, Aaron Yee

Comments: 7 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2606.30671 [pdf, html, other]: Title: Enhancing BEST-RQ Pseudo-Label Quality through Online Refinement for Automatic Speech Recognition

Jingjing Xu, Zijian Yang, Mohammad Zeineldeen, Eugen Beck, Ralf Schlueter, Hermann Ney

Comments: Accepted at Interspeech 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2606.30646 [pdf, html, other]: Title: ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection

Chukwuemeka Ugwu, Oluwafemi Richard Oyeleke

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[21] arXiv:2606.31552 (cross-list from eess.AS) [pdf, html, other]: Title: Improving multichannel speech enhancement through accurate room-acoustic simulations

Georg Götz, Alessia Milo, Steinar Guðjónsson, Daniel Gert Nielsen, Jesper Pedersen, Finnur Pind

Comments: Accepted for publication at Interspeech

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[22] arXiv:2606.31527 (cross-list from eess.AS) [pdf, html, other]: Title: How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA

Ailín Pollio San Pedro, Tomi Kinnunen, Alexandre Nikolaev, Ruchi Pandey

Comments: Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2606.31508 (cross-list from cs.CL) [pdf, other]: Title: Building an ASR Solution for Training and Assessing Children's Reading

Yacouba Diarra, Nouhoum Souleymane Coulibaly, Mamadou Dembele, Aymane Dembele, Michael Leventhal

Comments: 5 pages, 2 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD)
[24] arXiv:2606.31365 (cross-list from eess.AS) [pdf, html, other]: Title: Beyond Cross-Reconstruction: Probing-Based Disentanglement Evaluation for Acoustic Teleportation Codecs

Philipp Grundhuber, Emanuël A. P. Habets

Comments: Accepted for Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25] arXiv:2606.31055 (cross-list from cs.CL) [pdf, html, other]: Title: Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems

Ashish Hallur, Thomas Thebaud, Georgi Tinchev, Venkatesh Ravichandran, Laureano Moro-Velazquez

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2606.30944 (cross-list from eess.AS) [pdf, html, other]: Title: Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation

Yuxuan Hu, Heng Lu, Ruchao Fan, Yao Qian, Xiaofei Wang, Jian Xue, Heming Wang, Shuohang Wang, Young Jin Kim, Yelong Shen, Jinyu Li

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27] arXiv:2606.30849 (cross-list from cs.CV) [pdf, html, other]: Title: SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation

Juncheng Ma, Yuxuan Du, Yanan Sun, Zhening Xing, Changlin Li, Zhenyu Tang, Bo Li, Peng-Tao Jiang, Li Yuan, Daquan Zhou, Yonghong Tian

Comments: ECCV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2606.30811 (cross-list from cs.CV) [pdf, html, other]: Title: AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

Kien T. Pham, I Chieh Chen, Qifeng Chen, Long Chen

Comments: ECCV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[29] arXiv:2606.30642 [pdf, html, other]: Title: LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training

Shun Lei, Huaicheng Zhang, Dapeng Wu, Yaoxun Xu, Lishi Zuo, Wei Tan, Hangting Chen, Guangzheng Li, Jianwei Yu, Zhiyong Wu, Dong Yu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[30] arXiv:2606.30550 [pdf, html, other]: Title: SIGMA: Saliency-Guided Sparse Mask Attacks for Speech Emotion Recognition

Qiyang Sun, Yi Chang, Zixing Zhang, Björn W. Schuller

Comments: Under review

Subjects: Sound (cs.SD)
[31] arXiv:2606.30369 [pdf, html, other]: Title: Predicting Timbre Traits for Interpretable Assessment of Musical Sound Synthesizers

Théo Chasle Cauchy, Modan Tailleur, Lindsey Reymore, Fanny Roche, Mathieu Lagrange

Subjects: Sound (cs.SD)
[32] arXiv:2606.29897 [pdf, html, other]: Title: Child-Centric Voice Anonymization in Single and Multi-Speaker Speech via Domain-Adapted SSL Models

Pranav Tushar, Xiao Xiao Miao, Rong Tong

Comments: Accepted by INTERSPEECH 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[33] arXiv:2606.29589 [pdf, html, other]: Title: EchoHawk: A Reproducible Acoustic Pipeline for Drone Detection, Classification, and Direction-Finding, with a Cautionary Study of Session-Level Data Leakage

David Shulman

Subjects: Sound (cs.SD); Applied Physics (physics.app-ph)
[34] arXiv:2606.29575 [pdf, html, other]: Title: TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation

Qinzhe Hu, Chenda Li, Wangyou Zhang, Shujie Liu, Yan Lu, Yanmin Qian

Comments: Accepted to Interspeech 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[35] arXiv:2606.29544 [pdf, html, other]: Title: Proteus: Automated Adversarial Robustness Testing for Audio Deepfake Detectors

Nicolas M. Müller, Aditya Tirumala Bukkapatnam, Zohaib Ahmed

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[36] arXiv:2606.29497 [pdf, html, other]: Title: Position-Aware Target Speaker Extraction for Long-Form Multi-Party Conversations: A Diarization-Free Framework for ASR

Yichi Wang, Junzhe Chen, Wangjin Zhou, Tatsuya Kawahara

Comments: 5 pages, 2 figures. Accepted by Interspeech 2026

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[37] arXiv:2606.28988 [pdf, html, other]: Title: Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation

Quoc Thinh Vo, David K. Han

Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS) - Lecce, Italy

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[38] arXiv:2606.28953 [pdf, html, other]: Title: Clustering Unsupervised Representations as Defense against Poisoning Attacks on Speech Commands Classification System

Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

Comments: published in ASRU 2025

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[39] arXiv:2606.28857 [pdf, html, other]: Title: wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2

James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Tyler Kendall, Jeff Mielke

Comments: Accepted for Interspeech 2026. 6 pages, 4 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[40] arXiv:2606.28445 [pdf, html, other]: Title: LoRA-Tuned Large Language Models for Dementia Detection via Multi-View Speech-Derived Features

Jonghyeon Park, Olivier Jiyoun Jung, Myungwoo Oh

Comments: Accepted at INTERSPEECH 2026

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[41] arXiv:2606.30580 (cross-list from eess.AS) [pdf, html, other]: Title: MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling

Yoonjeong Park, Jaekwon Im, Juhan Nam

Comments: Accepted to Interspeech 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[42] arXiv:2606.30356 (cross-list from cs.CL) [pdf, html, other]: Title: OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL

Karl El Hajal, Mathew Magimai.-Doss

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2606.30001 (cross-list from cs.CV) [pdf, html, other]: Title: SICAGE: Speaker-Independent Culture-Aware Gesture Generation using TED4C-L Dataset

Ariel Gjaci, Antonio Sgorbissa, Vittorio Murino

Comments: Accepted at ECCV 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[44] arXiv:2606.29632 (cross-list from eess.AS) [pdf, html, other]: Title: VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition

Piyush Arora, Navlika Singh, Umberto Cappellazzo, Stavros Petridis, Maja Pantic

Comments: Accepted to INTERSPEECH 2026. Our code is available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[45] arXiv:2606.29480 (cross-list from eess.AS) [pdf, html, other]: Title: DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection

Hoyeol Sohn, Juhan Nam

Comments: 10 pages, 2 figures, accepted to INTERSPEECH 2026

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:2606.29335 (cross-list from cs.LG) [pdf, html, other]: Title: AMR: Adaptive Modality Routing for Multimodal Polyglot Speaker Identification

Chuxiao Zuo, Yao Zhu, Minqiang Xu, Manhong Wang, Yunke Zhang, Fei Huang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
[47] arXiv:2606.29071 (cross-list from physics.med-ph) [pdf, html, other]: Title: An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics

Sardar Nafis Bin Ali, Maryam Naghibolhosseini, Mohsen Zayernouri

Comments: 30 pages, 9 figures

Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[48] arXiv:2606.28048 [pdf, html, other]: Title: DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions

Muhammad Shakeel Akram, Amal Htait, Abdul Hamid Sadka, Emma Meisingseth, Karishma Jaitly

Comments: 5 pages, 4 figures, 1 table

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[49] arXiv:2606.28032 [pdf, other]: Title: A Flexible Encoding Model for Non-Unique Note Alignments

Suhit Chiruthapudi, Adam Štefunko, Silvan Peter, Patricia Hu, Jan Hajič jr., Carlos Eduardo Cancino-Chacón

Comments: Published at the Music Encoding Conference (MEC), 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2606.27965 [pdf, html, other]: Title: Grammar-Guided Hierarchical Parsing for Long-form Audio Activity Recognition

Peng Zhang, Qingyu Luo, Philip J.B. Jackson, Wenwu Wang

Comments: Accepted to Interspeech 2026

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 2 Jul 2026 (showing 8 of 8 entries)

Wed, 1 Jul 2026 (showing 20 of 20 entries)

Tue, 30 Jun 2026 (showing 19 of 19 entries)

Mon, 29 Jun 2026 (showing first 3 of 10 entries)