
Sound

Authors and titles for recent submissions


Total of 67 entries (this page shows entries 1-50)

Thu, 2 Jul 2026 (showing 8 of 8 entries)

[1] arXiv:2607.01108 [pdf, html, other]
Title: NPUsper: Eliminating Redundant Computation for Real-Time Whisper on Mobile NPUs
Subjects: Sound (cs.SD)
[2] arXiv:2607.00946 [pdf, html, other]
Title: A Geometric Perspective on Composable Emotion Steering in Text-to-Speech Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[3] arXiv:2607.00777 [pdf, html, other]
Title: Evaluating Pretrained Music Embeddings for Cross-Performance Jazz Standard Recognition
Comments: 6 pages, 2 figures, 4 tables. Accepted to the ICML 2026 Workshop on Machine Learning for Audio
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[4] arXiv:2607.00363 [pdf, html, other]
Title: Enhancing Flow Matching with A Unified Guidance Framework for Efficient and Robust Speech Synthesis
Comments: Accepted to INTERSPEECH 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[5] arXiv:2607.00309 [pdf, html, other]
Title: A Text-Steerable Instrument for Sketching Procedural Soundscapes via Language Models
Prabal Gupta (Rama Labs, Kitchener, Canada)
Comments: 10 pages, 7 figures, 2 tables. Accepted to the International Conference on New Interfaces for Musical Expression (NIME 2026), London, UK. Supplementary material included as an appendix. Code and demo: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[6] arXiv:2607.00247 [pdf, html, other]
Title: Adaptive Perturbation Selection for Contrastive Audio Decoding
Comments: In submission
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[7] arXiv:2607.00726 (cross-list from cs.CV) [pdf, html, other]
Title: AV-SyncBench: Decoupled Benchmarking of Temporal and Semantic Audio-Visual Synchronization
Comments: Accepted by Interspeech 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[8] arXiv:2607.00418 (cross-list from cs.CL) [pdf, html, other]
Title: Speech Playground: An Interactive Tool for Speech Analysis and Comparison
Comments: Accepted to Interspeech 2026 (Show and Tell); 2 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 1 Jul 2026 (showing 20 of 20 entries)

[9] arXiv:2606.31595 [pdf, other]
Title: Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets
Comments: in proceedings of the Music Encoding Conference 2026
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[10] arXiv:2606.31587 [pdf, html, other]
Title: ZEBRA: Zero-Shot Entropy-Regularized Prompt Learning for Base-to-Novel Generalization in Audio-Language Models
Comments: Accepted in InterSpeech 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[11] arXiv:2606.31338 [pdf, html, other]
Title: Beyond Binary Instrument QA: Probing Instrument Grounding in Music Audio-Language Models
Comments: Workshop on Machine Learning for Audio, ICML 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12] arXiv:2606.31259 [pdf, html, other]
Title: SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation
Comments: Under review
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13] arXiv:2606.31247 [pdf, html, other]
Title: FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model
Comments: Preprint, under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2606.31128 [pdf, html, other]
Title: UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[15] arXiv:2606.31105 [pdf, html, other]
Title: Attacking UTMOS: Probing the Robustness of a Speech Quality Assessment Model
Comments: Preprint. Audio samples: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2606.30791 [pdf, html, other]
Title: Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection
Comments: Submitted to Computer Speech & Language
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2606.30700 [pdf, other]
Title: BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations
Ludovic K. Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA)
Journal-ref: Interspeech 2026, Sep 2026, Sydney, Australia
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[18] arXiv:2606.30682 [pdf, html, other]
Title: ALM2Vec: Learning Audio Embeddings for Universal Audio Retrieval with Large Audio-Language Models
Comments: 7 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2606.30671 [pdf, html, other]
Title: Enhancing BEST-RQ Pseudo-Label Quality through Online Refinement for Automatic Speech Recognition
Comments: Accepted at Interspeech 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2606.30646 [pdf, html, other]
Title: ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[21] arXiv:2606.31552 (cross-list from eess.AS) [pdf, html, other]
Title: Improving multichannel speech enhancement through accurate room-acoustic simulations
Comments: Accepted for publication at Interspeech
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[22] arXiv:2606.31527 (cross-list from eess.AS) [pdf, html, other]
Title: How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA
Comments: Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2606.31508 (cross-list from cs.CL) [pdf, other]
Title: Building an ASR Solution for Training and Assessing Children's Reading
Comments: 5 pages, 2 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
[24] arXiv:2606.31365 (cross-list from eess.AS) [pdf, html, other]
Title: Beyond Cross-Reconstruction: Probing-Based Disentanglement Evaluation for Acoustic Teleportation Codecs
Comments: Accepted for Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25] arXiv:2606.31055 (cross-list from cs.CL) [pdf, html, other]
Title: Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2606.30944 (cross-list from eess.AS) [pdf, html, other]
Title: Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27] arXiv:2606.30849 (cross-list from cs.CV) [pdf, html, other]
Title: SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation
Comments: ECCV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2606.30811 (cross-list from cs.CV) [pdf, html, other]
Title: AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation
Comments: ECCV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 30 Jun 2026 (showing 19 of 19 entries)

[29] arXiv:2606.30642 [pdf, html, other]
Title: LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[30] arXiv:2606.30550 [pdf, html, other]
Title: SIGMA: Saliency-Guided Sparse Mask Attacks for Speech Emotion Recognition
Comments: Under review
Subjects: Sound (cs.SD)
[31] arXiv:2606.30369 [pdf, html, other]
Title: Predicting Timbre Traits for Interpretable Assessment of Musical Sound Synthesizers
Subjects: Sound (cs.SD)
[32] arXiv:2606.29897 [pdf, html, other]
Title: Child-Centric Voice Anonymization in Single and Multi-Speaker Speech via Domain-Adapted SSL Models
Comments: Accepted by INTERSPEECH 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[33] arXiv:2606.29589 [pdf, html, other]
Title: EchoHawk: A Reproducible Acoustic Pipeline for Drone Detection, Classification, and Direction-Finding, with a Cautionary Study of Session-Level Data Leakage
Subjects: Sound (cs.SD); Applied Physics (physics.app-ph)
[34] arXiv:2606.29575 [pdf, html, other]
Title: TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation
Comments: Accepted to Interspeech 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[35] arXiv:2606.29544 [pdf, html, other]
Title: Proteus: Automated Adversarial Robustness Testing for Audio Deepfake Detectors
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[36] arXiv:2606.29497 [pdf, html, other]
Title: Position-Aware Target Speaker Extraction for Long-Form Multi-Party Conversations: A Diarization-Free Framework for ASR
Comments: 5 pages, 2 figures. Accepted by Interspeech 2026
Subjects: Sound (cs.SD); Multimedia (cs.MM)
[37] arXiv:2606.28988 [pdf, html, other]
Title: Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation
Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS) - Lecce, Italy
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[38] arXiv:2606.28953 [pdf, html, other]
Title: Clustering Unsupervised Representations as Defense against Poisoning Attacks on Speech Commands Classification System
Comments: published in ASRU 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[39] arXiv:2606.28857 [pdf, html, other]
Title: wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2
Comments: Accepted for Interspeech 2026. 6 pages, 4 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[40] arXiv:2606.28445 [pdf, html, other]
Title: LoRA-Tuned Large Language Models for Dementia Detection via Multi-View Speech-Derived Features
Comments: Accepted at INTERSPEECH 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[41] arXiv:2606.30580 (cross-list from eess.AS) [pdf, html, other]
Title: MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling
Comments: Accepted to Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[42] arXiv:2606.30356 (cross-list from cs.CL) [pdf, html, other]
Title: OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2606.30001 (cross-list from cs.CV) [pdf, html, other]
Title: SICAGE: Speaker-Independent Culture-Aware Gesture Generation using TED4C-L Dataset
Comments: Accepted at ECCV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[44] arXiv:2606.29632 (cross-list from eess.AS) [pdf, html, other]
Title: VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition
Comments: Accepted to INTERSPEECH 2026. Our code is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[45] arXiv:2606.29480 (cross-list from eess.AS) [pdf, html, other]
Title: DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection
Comments: 10 pages, 2 figures, accepted to INTERSPEECH 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:2606.29335 (cross-list from cs.LG) [pdf, html, other]
Title: AMR: Adaptive Modality Routing for Multimodal Polyglot Speaker Identification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
[47] arXiv:2606.29071 (cross-list from physics.med-ph) [pdf, html, other]
Title: An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics
Comments: 30 pages, 9 figures
Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 29 Jun 2026 (showing first 3 of 10 entries)

[48] arXiv:2606.28048 [pdf, html, other]
Title: DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions
Comments: 5 pages, 4 figures, 1 table
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[49] arXiv:2606.28032 [pdf, other]
Title: A Flexible Encoding Model for Non-Unique Note Alignments
Comments: Published at the Music Encoding Conference (MEC), 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2606.27965 [pdf, html, other]
Title: Grammar-Guided Hierarchical Parsing for Long-form Audio Activity Recognition
Comments: Accepted to Interspeech 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
