Audio and Speech Processing

Authors and titles for recent submissions

Total of 59 entries : 1-50 51-59

Thu, 2 Jul 2026 (showing 6 of 6 entries)

[1] arXiv:2607.01161 [pdf, html, other]
Title: Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages
Comments: 5 pages, 8 figures, Submitted to IberSPEECH 2026
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[2] arXiv:2607.00899 [pdf, html, other]
Title: Positive-Incentive Noise Predictor for Adversarial Purification in Speaker Verification
Comments: Submitted to IEEE TASLP. 13 pages for manuscript, 2 pages for supplementary material
Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:2607.00548 [pdf, html, other]
Title: AmbiDrop: Ambisonics-Based Array-Agnostic Neural Speech Enhancement
Comments: Submitted to IEEE Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2607.00387 [pdf, html, other]
Title: From Objectives to Applications: Aligning Architectural Biases in Audio Self-Supervised Learning
Subjects: Audio and Speech Processing (eess.AS)
[5] arXiv:2607.00260 [pdf, html, other]
Title: Do Multimodal Large Language Models Need Reasoning to Classify Dementia from Speech?
Subjects: Audio and Speech Processing (eess.AS)
[6] arXiv:2607.00418 (cross-list from cs.CL) [pdf, html, other]
Title: Speech Playground: An Interactive Tool for Speech Analysis and Comparison
Comments: Accepted to Interspeech 2026 (Show and Tell); 2 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 1 Jul 2026 (showing 22 of 22 entries)

[7] arXiv:2606.31730 [pdf, html, other]
Title: A Fair and Transparent Framework for Speech-Based Depression Detection: Balancing Interpretability and Performance
Comments: 7 pages, 2 figures, 3 tables. This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS)
[8] arXiv:2606.31729 [pdf, html, other]
Title: Is Natural Always Appropriate? Investigating Naturalness and Appropriateness Across Different Domains for TTS Evaluation
Comments: Accepted at Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[9] arXiv:2606.31552 [pdf, html, other]
Title: Improving multichannel speech enhancement through accurate room-acoustic simulations
Comments: Accepted for publication at Interspeech
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[10] arXiv:2606.31527 [pdf, html, other]
Title: How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA
Comments: Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11] arXiv:2606.31365 [pdf, html, other]
Title: Beyond Cross-Reconstruction: Probing-Based Disentanglement Evaluation for Acoustic Teleportation Codecs
Comments: Accepted for Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12] arXiv:2606.30944 [pdf, html, other]
Title: Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2606.30780 [pdf, html, other]
Title: Detecting Audio Deepfakes on the Edge: Lightweight SSL-Based Detection in a Browser Plugin
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[14] arXiv:2606.30675 [pdf, html, other]
Title: Listening Between the Lines: Joint Learning of ASR Embeddings and LLM-Augmented Linguistics for Dementia Detection
Comments: Accepted at INTERSPEECH 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[15] arXiv:2606.31595 (cross-list from cs.SD) [pdf, other]
Title: Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets
Comments: in proceedings of the Music Encoding Conference 2026
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[16] arXiv:2606.31338 (cross-list from cs.SD) [pdf, html, other]
Title: Beyond Binary Instrument QA: Probing Instrument Grounding in Music Audio-Language Models
Comments: Workshop on Machine Learning for Audio, ICML 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[17] arXiv:2606.31259 (cross-list from cs.SD) [pdf, html, other]
Title: SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation
Comments: Under review
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[18] arXiv:2606.31247 (cross-list from cs.SD) [pdf, html, other]
Title: FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model
Comments: Preprint, under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2606.31128 (cross-list from cs.SD) [pdf, html, other]
Title: UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:2606.31105 (cross-list from cs.SD) [pdf, html, other]
Title: Attacking UTMOS: Probing the Robustness of a Speech Quality Assessment Model
Comments: Preprint. Audio samples: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2606.31055 (cross-list from cs.CL) [pdf, html, other]
Title: Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2606.30849 (cross-list from cs.CV) [pdf, html, other]
Title: SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation
Comments: ECCV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2606.30811 (cross-list from cs.CV) [pdf, html, other]
Title: AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation
Comments: ECCV 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2606.30791 (cross-list from cs.SD) [pdf, html, other]
Title: Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection
Comments: Submitted to Computer Speech & Language
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2606.30700 (cross-list from cs.SD) [pdf, other]
Title: BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations
Ludovic K. Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA)
Journal-ref: Interspeech 2026, Sep 2026, Sydney, Australia
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[26] arXiv:2606.30682 (cross-list from cs.SD) [pdf, html, other]
Title: ALM2Vec: Learning Audio Embeddings for Universal Audio Retrieval with Large Audio-Language Models
Comments: 7 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2606.30671 (cross-list from cs.SD) [pdf, html, other]
Title: Enhancing BEST-RQ Pseudo-Label Quality through Online Refinement for Automatic Speech Recognition
Comments: Accepted at Interspeech 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2606.30646 (cross-list from cs.SD) [pdf, html, other]
Title: ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Tue, 30 Jun 2026 (showing 14 of 14 entries)

[29] arXiv:2606.30580 [pdf, html, other]
Title: MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling
Comments: Accepted to Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2606.30114 [pdf, other]
Title: Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality
Comments: Submitted, accepted and presented at the AES 2026 International Conference on Audio for Virtual and Augmented Reality and Immersive Games
Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2606.29901 [pdf, html, other]
Title: Semi-Supervised Sound Event Detection with Conditional Mixup and Embedding-Level Contrastive Loss
Comments: 6 pages; accepted by SMC 2026
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[32] arXiv:2606.29632 [pdf, html, other]
Title: VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition
Comments: Accepted to INTERSPEECH 2026. Our code is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[33] arXiv:2606.29480 [pdf, html, other]
Title: DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection
Comments: 10 pages, 2 figures, accepted to INTERSPEECH 2026
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:2606.29450 [pdf, html, other]
Title: VeRe-Flow: Guiding Flow Matching toward Clean Speech via Velocity Contrastive Regularization and Representation Alignment for Noise-Robust Bandwidth Expansion
Comments: Accepted to Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS)
[35] arXiv:2606.28884 [pdf, html, other]
[36] arXiv:2606.28732 [pdf, html, other]
Title: CTC-Seeded Token Edit Refinement for Non-Autoregressive Speech Recognition
Comments: Submitted to IEEE SLT 2026
Subjects: Audio and Speech Processing (eess.AS)
[37] arXiv:2606.28728 [pdf, html, other]
Title: Improving Large-Scale Weakly Supervised ASR by Filtering and Selection
Comments: 5 pages, 4 figures, 2 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[38] arXiv:2606.30356 (cross-list from cs.CL) [pdf, html, other]
Title: OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2606.30196 (cross-list from cs.CL) [pdf, html, other]
Title: Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector
Comments: Accepted for presentation at LREC 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[40] arXiv:2606.29534 (cross-list from cs.CL) [pdf, html, other]
Title: Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs
Comments: Accepted at Interspeech 2026
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[41] arXiv:2606.29071 (cross-list from physics.med-ph) [pdf, html, other]
Title: An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics
Comments: 30 pages, 9 figures
Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:2606.28988 (cross-list from cs.SD) [pdf, html, other]
Title: Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation
Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS) - Lecce, Italy
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Mon, 29 Jun 2026 (showing 8 of 8 entries)

[43] arXiv:2606.28249 [pdf, html, other]
Title: HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech
Comments: 7 pages, 3 figures, 3 tables; Preprint
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[44] arXiv:2606.28114 [pdf, other]
Title: Screening Matters: A Comparative Study of Conventional and Crowdsourced Listening Tests
Comments: accepted at Interspeech 2026
Subjects: Audio and Speech Processing (eess.AS)
[45] arXiv:2606.28048 (cross-list from cs.SD) [pdf, html, other]
Title: DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions
Comments: 5 pages, 4 figures, 1 table
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46] arXiv:2606.28032 (cross-list from cs.SD) [pdf, other]
Title: A Flexible Encoding Model for Non-Unique Note Alignments
Comments: Published at the Music Encoding Conference (MEC), 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:2606.28002 (cross-list from cs.CL) [pdf, html, other]
Title: Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection
Comments: 10 pages, 8 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[48] arXiv:2606.27965 (cross-list from cs.SD) [pdf, html, other]
Title: Grammar-Guided Hierarchical Parsing for Long-form Audio Activity Recognition
Comments: Accepted to Interspeech 2026
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2606.27717 (cross-list from cs.CL) [pdf, html, other]
Title: Do Speech Emphasis Models Generalize across Languages and Emotions?
Comments: Interspeech 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2606.27320 (cross-list from cs.SD) [pdf, html, other]
Title: Elastic Time: Dynamic Frame Rate Bottlenecks for Neural Audio Coding
Comments: Interspeech 2026
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)