IEEE Transactions on Audio, Speech and Language Processing

Scope

The IEEE Transactions on Audio, Speech and Language Processing (TASLPRO) is dedicated to innovative theory and methods for processing signals representing audio, speech and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems.

Machine learning and pattern analysis applied to any of the above areas are also welcome.

Reproducible research

The Transactions encourages authors to make their publications reproducible by making all information needed to reproduce the presented results available online. This typically requires publishing the code and data used to produce the publication's figures and tables on a website; see the supplemental materials section of the information for authors. Doing so gives other researchers easier access to the work and facilitates fair comparisons.

Multimedia content

Supporting multimedia material, such as speech samples, images, movies, and MATLAB code, can now be submitted for review and published in Xplore. A multimedia graphical abstract can also be displayed along with the traditional text. More information is available under Multimedia Materials at the IEEE Author Center.

TASLPRO Volume 33 | 2025

Adaptive Multimodal Graph Integration Network for Multimodal Sentiment Analysis

TASLPRO Volume 33 | 2025

Most current models for analyzing multimodal sequences disregard the imbalanced contributions of individual modal representations caused by varying information densities, as well as the inherent multi-relational interactions across distinct modalities. Consequently, they may foster a biased understanding of the intricate interplay among modalities, limiting prediction accuracy and effectiveness.

Memory-Tuning: A Unified Parameter-Efficient Tuning Method for Pre-Trained Language Models

TASLPRO Volume 33 | 2025

Conventional fine-tuning encounters increasing difficulties given the size of current pre-trained language models, which has made parameter-efficient tuning a focal point of frontier research. Recent advances in this field are unified tuning methods that aim to tune the representations of both the multi-head attention (MHA) and the fully connected feed-forward network (FFN) simultaneously, but they rely on existing tuning methods and do not explicitly model domain knowledge for downstream tasks.

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

TASLPRO Volume 33 | 2025

Audio and visual signals complement each other in human speech perception, and the same applies to automatic speech recognition. The visual signal is less evident than the acoustic signal, but more robust in a complex acoustic environment, as far as speech perception is concerned.

Disentangling Prosody Representations With Unsupervised Speech Reconstruction

TASLPRO Articles

TASLP Volume 32 | 2024

Human speech can be characterized by different components, including semantic content, speaker identity, and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in speech recognition and speaker verification tasks, respectively. However, extracting prosodic information remains an open and challenging question because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for supervised training schemes to achieve robust speech recognition.

Operation-Augmented Numerical Reasoning for Question Answering

TASLPRO Articles

TASLP Volume 32 | 2024

Question answering that requires numerical reasoning, which generally involves symbolic operations such as sorting, counting, and addition, is a challenging task. To address this problem, existing mixture-of-experts (MoE)-based methods design several specific answer predictors to handle different types of questions and achieve promising performance. However, they ignore the modeling and exploitation of fine-grained reasoning-related operations to support numerical reasoning, which limits their reasoning capability and interpretability.
