Sound
Authors and titles for recent submissions
Total of 67 entries (showing 1-50)
- [1] arXiv:2607.01108 [pdf, html, other]
  Title: NPUsper: Eliminating Redundant Computation for Real-Time Whisper on Mobile NPUs
  Subjects: Sound (cs.SD)
- [2] arXiv:2607.00946 [pdf, html, other]
  Title: A Geometric Perspective on Composable Emotion Steering in Text-to-Speech Models
  Subjects: Sound (cs.SD); Machine Learning (cs.LG)
- [3] arXiv:2607.00777 [pdf, html, other]
  Title: Evaluating Pretrained Music Embeddings for Cross-Performance Jazz Standard Recognition
  Comments: 6 pages, 2 figures, 4 tables. Accepted to the ICML 2026 Workshop on Machine Learning for Audio
  Subjects: Sound (cs.SD); Machine Learning (cs.LG)
- [4] arXiv:2607.00363 [pdf, html, other]
  Title: Enhancing Flow Matching with A Unified Guidance Framework for Efficient and Robust Speech Synthesis
  Comments: Accepted to INTERSPEECH 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
- [5] arXiv:2607.00309 [pdf, html, other]
  Title: A Text-Steerable Instrument for Sketching Procedural Soundscapes via Language Models
  Authors: Prabal Gupta (Rama Labs, Kitchener, Canada)
  Comments: 10 pages, 7 figures, 2 tables. Accepted to the International Conference on New Interfaces for Musical Expression (NIME 2026), London, UK. Supplementary material included as an appendix. Code and demo: this https URL
  Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
- [6] arXiv:2607.00247 [pdf, html, other]
  Title: Adaptive Perturbation Selection for Contrastive Audio Decoding
  Comments: In submission
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
- [7] arXiv:2607.00726 (cross-list from cs.CV) [pdf, html, other]
  Title: AV-SyncBench: Decoupled Benchmarking of Temporal and Semantic Audio-Visual Synchronization
  Authors: Tianhong Zhou, Mingyang Han, Boyu Li, Yuxuan Jiang, Jiaxin Ye, Dongxiao Wang, Haoxiang Shi, Kunpeng Wang, Jun Song, Cheng Yu, Bo Zheng
  Comments: Accepted by Interspeech 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
- [8] arXiv:2607.00418 (cross-list from cs.CL) [pdf, html, other]
  Title: Speech Playground: An Interactive Tool for Speech Analysis and Comparison
  Comments: Accepted to Interspeech 2026 (Show and Tell); 2 pages, 3 figures
  Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Thu, 2 Jul 2026 (showing 8 of 8 entries)
- [9] arXiv:2606.31595 [pdf, other]
  Title: Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets
  Comments: In proceedings of the Music Encoding Conference 2026
  Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
- [10] arXiv:2606.31587 [pdf, html, other]
  Title: ZEBRA: Zero-Shot Entropy-Regularized Prompt Learning for Base-to-Novel Generalization in Audio-Language Models
  Comments: Accepted in InterSpeech 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
- [11] arXiv:2606.31338 [pdf, html, other]
  Title: Beyond Binary Instrument QA: Probing Instrument Grounding in Music Audio-Language Models
  Comments: Workshop on Machine Learning for Audio, ICML 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [12] arXiv:2606.31259 [pdf, html, other]
  Title: SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation
  Comments: Under review
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [13] arXiv:2606.31247 [pdf, html, other]
  Title: FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model
  Authors: Jiaqi Li, Chaoren Wang, Xiaohai Tian, Mingjie Chen, Xinyu Liang, Xu Li, Yufan Lin, Junwen Qiu, Jun Zhang, Lu Lu, Haizhou Li, Zhizheng Wu
  Comments: Preprint, under review
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [14] arXiv:2606.31128 [pdf, html, other]
  Title: UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [15] arXiv:2606.31105 [pdf, html, other]
  Title: Attacking UTMOS: Probing the Robustness of a Speech Quality Assessment Model
  Comments: Preprint. Audio samples: this https URL
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [16] arXiv:2606.30791 [pdf, html, other]
  Title: Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection
  Comments: Submitted to Computer Speech & Language
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [17] arXiv:2606.30700 [pdf, other]
  Title: BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations
  Journal-ref: Interspeech 2026, Sep 2026, Sydney, Australia
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [18] arXiv:2606.30682 [pdf, html, other]
  Title: ALM2Vec: Learning Audio Embeddings for Universal Audio Retrieval with Large Audio-Language Models
  Comments: 7 pages, 3 figures
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [19] arXiv:2606.30671 [pdf, html, other]
  Title: Enhancing BEST-RQ Pseudo-Label Quality through Online Refinement for Automatic Speech Recognition
  Comments: Accepted at Interspeech 2026
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [20] arXiv:2606.30646 [pdf, html, other]
  Title: ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [21] arXiv:2606.31552 (cross-list from eess.AS) [pdf, html, other]
  Title: Improving multichannel speech enhancement through accurate room-acoustic simulations
  Comments: Accepted for publication at Interspeech
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
- [22] arXiv:2606.31527 (cross-list from eess.AS) [pdf, html, other]
  Title: How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA
  Comments: Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [23] arXiv:2606.31508 (cross-list from cs.CL) [pdf, other]
  Title: Building an ASR Solution for Training and Assessing Children's Reading
  Comments: 5 pages, 2 figures
  Subjects: Computation and Language (cs.CL); Sound (cs.SD)
- [24] arXiv:2606.31365 (cross-list from eess.AS) [pdf, html, other]
  Title: Beyond Cross-Reconstruction: Probing-Based Disentanglement Evaluation for Acoustic Teleportation Codecs
  Comments: Accepted for Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [25] arXiv:2606.31055 (cross-list from cs.CL) [pdf, html, other]
  Title: Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems
  Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [26] arXiv:2606.30944 (cross-list from eess.AS) [pdf, html, other]
  Title: Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation
  Authors: Yuxuan Hu, Heng Lu, Ruchao Fan, Yao Qian, Xiaofei Wang, Jian Xue, Heming Wang, Shuohang Wang, Young Jin Kim, Yelong Shen, Jinyu Li
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [27] arXiv:2606.30849 (cross-list from cs.CV) [pdf, html, other]
  Title: SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation
  Authors: Juncheng Ma, Yuxuan Du, Yanan Sun, Zhening Xing, Changlin Li, Zhenyu Tang, Bo Li, Peng-Tao Jiang, Li Yuan, Daquan Zhou, Yonghong Tian
  Comments: ECCV 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [28] arXiv:2606.30811 (cross-list from cs.CV) [pdf, html, other]
  Title: AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation
  Comments: ECCV 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Wed, 1 Jul 2026 (showing 20 of 20 entries)
- [29] arXiv:2606.30642 [pdf, html, other]
  Title: LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training
  Authors: Shun Lei, Huaicheng Zhang, Dapeng Wu, Yaoxun Xu, Lishi Zuo, Wei Tan, Hangting Chen, Guangzheng Li, Jianwei Yu, Zhiyong Wu, Dong Yu
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
- [30] arXiv:2606.30550 [pdf, html, other]
  Title: SIGMA: Saliency-Guided Sparse Mask Attacks for Speech Emotion Recognition
  Comments: Under review
  Subjects: Sound (cs.SD)
- [31] arXiv:2606.30369 [pdf, html, other]
  Title: Predicting Timbre Traits for Interpretable Assessment of Musical Sound Synthesizers
  Subjects: Sound (cs.SD)
- [32] arXiv:2606.29897 [pdf, html, other]
  Title: Child-Centric Voice Anonymization in Single and Multi-Speaker Speech via Domain-Adapted SSL Models
  Comments: Accepted by INTERSPEECH 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- [33] arXiv:2606.29589 [pdf, html, other]
  Title: EchoHawk: A Reproducible Acoustic Pipeline for Drone Detection, Classification, and Direction-Finding, with a Cautionary Study of Session-Level Data Leakage
  Subjects: Sound (cs.SD); Applied Physics (physics.app-ph)
- [34] arXiv:2606.29575 [pdf, html, other]
  Title: TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation
  Comments: Accepted to Interspeech 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
- [35] arXiv:2606.29544 [pdf, html, other]
  Title: Proteus: Automated Adversarial Robustness Testing for Audio Deepfake Detectors
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
- [36] arXiv:2606.29497 [pdf, html, other]
  Title: Position-Aware Target Speaker Extraction for Long-Form Multi-Party Conversations: A Diarization-Free Framework for ASR
  Comments: 5 pages, 2 figures. Accepted by Interspeech 2026
  Subjects: Sound (cs.SD); Multimedia (cs.MM)
- [37] arXiv:2606.28988 [pdf, html, other]
  Title: Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation
  Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS), Lecce, Italy
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [38] arXiv:2606.28953 [pdf, html, other]
  Title: Clustering Unsupervised Representations as Defense against Poisoning Attacks on Speech Commands Classification System
  Authors: Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
  Comments: Published in ASRU 2025
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [39] arXiv:2606.28857 [pdf, html, other]
  Title: wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2
  Comments: Accepted for Interspeech 2026. 6 pages, 4 figures
  Subjects: Sound (cs.SD); Computation and Language (cs.CL)
- [40] arXiv:2606.28445 [pdf, html, other]
  Title: LoRA-Tuned Large Language Models for Dementia Detection via Multi-View Speech-Derived Features
  Comments: Accepted at INTERSPEECH 2026
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [41] arXiv:2606.30580 (cross-list from eess.AS) [pdf, html, other]
  Title: MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling
  Comments: Accepted to Interspeech 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [42] arXiv:2606.30356 (cross-list from cs.CL) [pdf, html, other]
  Title: OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL
  Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [43] arXiv:2606.30001 (cross-list from cs.CV) [pdf, html, other]
  Title: SICAGE: Speaker-Independent Culture-Aware Gesture Generation using TED4C-L Dataset
  Comments: Accepted at ECCV 2026
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Sound (cs.SD)
- [44] arXiv:2606.29632 (cross-list from eess.AS) [pdf, html, other]
  Title: VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition
  Comments: Accepted to INTERSPEECH 2026. Our code is available at this https URL
  Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
- [45] arXiv:2606.29480 (cross-list from eess.AS) [pdf, html, other]
  Title: DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection
  Comments: 10 pages, 2 figures, accepted to INTERSPEECH 2026
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [46] arXiv:2606.29335 (cross-list from cs.LG) [pdf, html, other]
  Title: AMR: Adaptive Modality Routing for Multimodal Polyglot Speaker Identification
  Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
- [47] arXiv:2606.29071 (cross-list from physics.med-ph) [pdf, html, other]
  Title: An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics
  Comments: 30 pages, 9 figures
  Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Tue, 30 Jun 2026 (showing 19 of 19 entries)
- [48] arXiv:2606.28048 [pdf, html, other]
  Title: DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions
  Comments: 5 pages, 4 figures, 1 table
  Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [49] arXiv:2606.28032 [pdf, other]
  Title: A Flexible Encoding Model for Non-Unique Note Alignments
  Authors: Suhit Chiruthapudi, Adam Štefunko, Silvan Peter, Patricia Hu, Jan Hajič jr., Carlos Eduardo Cancino-Chacón
  Comments: Published at the Music Encoding Conference (MEC), 2026
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [50] arXiv:2606.27965 [pdf, html, other]
  Title: Grammar-Guided Hierarchical Parsing for Long-form Audio Activity Recognition
  Comments: Accepted to Interspeech 2026
  Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Mon, 29 Jun 2026 (showing first 3 of 10 entries)