youngsheen/GPST

Folders and files

config
demo
fairseq_user
pics
preprocess
.gitignore
README.md

Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer

💡 Some other speech AI projects from our team may interest you ✨.

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan#, Yongxin Zhu#, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

GPST PyTorch Implementation

This is a PyTorch implementation of the paper Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer.

Overview

Demo page: https://youngsheen.github.io/GPST/demo

The following figure gives an overview of GPST.

[Figure: The overview of GPST]

Installation

Download the code

git clone https://github.com/youngsheen/GPST.git
cd GPST

Install fairseq and encodec via pip. Install seamless_communication and fairseq2.
[Optional] Install flash-attn for faster attention computation.
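
After installing, it can help to confirm which of the packages above are visible to the current Python environment. The following is a minimal sketch, not part of the repository: the import names (`fairseq`, `encodec`, `fairseq2`, `seamless_communication`, `flash_attn`) are assumed to match the package names listed above, and `is_installed` and `check_environment` are hypothetical helpers.

```python
import importlib.util

# Assumed import names for the packages named in the installation step.
REQUIRED = ["fairseq", "encodec", "fairseq2", "seamless_communication"]
OPTIONAL = ["flash_attn"]  # assumption: flash-attn is imported as `flash_attn`


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def check_environment(required=REQUIRED, optional=OPTIONAL) -> dict:
    """Map each module name to 'ok', 'missing (required)' or 'missing (optional)'."""
    report = {}
    for name in required:
        report[name] = "ok" if is_installed(name) else "missing (required)"
    for name in optional:
        report[name] = "ok" if is_installed(name) else "missing (optional)"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name:24s} {status}")
```

A missing optional entry only means the faster attention path is unavailable; the required entries should all report `ok` before running preprocessing or training.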

Preparation

Dataset

Download the LibriSpeech or LibriLight dataset and place it in your directory at $PATH_TO_YOUR_WORKSPACE/datasets. We use xlsr2_1b_v2 from SeamlessM4T to extract semantic tokens and Encodec to extract acoustic tokens. You can set the Encodec bandwidth to 6 kbps or 12 kbps to control the quality of speech resynthesis. We suggest bandwidth=12, because the first half of its acoustic tokens is identical to the 6 kbps tokens. The scripts generate a manifest listing the paths of all files and two lmdb folders, one holding the semantic tokens and the other the acoustic tokens.
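
The relationship between the two bandwidths can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's preprocessing code: it assumes the 24 kHz Encodec model, which emits 75 frames per second with 1024-entry codebooks (10 bits per token), so each codebook costs 750 bits per second. The helper names `num_codebooks` and `truncate_to_bandwidth` are hypothetical.

```python
FRAME_RATE_HZ = 75   # assumption: Encodec 24 kHz model, 75 frames per second
BITS_PER_CODE = 10   # assumption: 1024-entry codebooks, i.e. 10 bits per token


def num_codebooks(bandwidth_kbps: float) -> int:
    """Number of residual codebooks that fit into the given bandwidth."""
    bits_per_second_per_codebook = FRAME_RATE_HZ * BITS_PER_CODE
    return int(bandwidth_kbps * 1000 // bits_per_second_per_codebook)


def truncate_to_bandwidth(codes, target_kbps: float):
    """Keep only the leading codebooks of `codes` for a lower bandwidth.

    `codes` is a list of rows, one row of frame tokens per codebook.
    Residual quantizers are applied in order, so the leading rows of a
    high-bandwidth encoding are the tokens of the lower-bandwidth one.
    """
    n = num_codebooks(target_kbps)
    if n > len(codes):
        raise ValueError(f"need {n} codebooks, but only {len(codes)} are available")
    return codes[:n]


if __name__ == "__main__":
    # Dummy 12 kbps tokens: one row per codebook, three frames each.
    codes_12 = [[k, k, k] for k in range(num_codebooks(12))]
    codes_6 = truncate_to_bandwidth(codes_12, 6)
    print(len(codes_12), len(codes_6))
```

Under these assumptions, tokens extracted once at 12 kbps can serve both settings, which is why extracting at the higher bandwidth is the more flexible choice.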

bash preprocess/run.sh

Training Scripts

OUTPUT_DIR=outputs
ROOT=PATH  # replace PATH with your root directory
mkdir -p $OUTPUT_DIR
# --nproc_per_node should match the number of GPUs listed in CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=4,5 torchrun --nnodes=1 --nproc_per_node=2 --master_port=36666 \
 $(which fairseq-hydra-train) --config-dir config \
 --config-name st2at \
 hydra.run.dir=$ROOT/gpst \
 hydra.output_subdir=$OUTPUT_DIR \
 hydra.job.name=$OUTPUT_DIR/train \
 common.tensorboard_logdir=$OUTPUT_DIR/tb \
 checkpoint.save_dir=$OUTPUT_DIR/checkpoints \
 +task.data=$ROOT/LibriSpeech
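
When the GPU list changes, `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` have to change together. The sketch below assembles the same command from a list of GPU IDs so that the two stay consistent. It is an illustration only: `build_train_command` is a hypothetical helper, it uses only the flags and overrides shown in the command above, and it builds the command without running it.

```python
import shlex
import shutil


def build_train_command(gpu_ids, root, output_dir="outputs",
                        config_name="st2at", master_port=36666):
    """Return (env, args) for a single-node run with one process per listed GPU."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    # Mirror `$(which fairseq-hydra-train)`; fall back to the bare name if not on PATH.
    trainer = shutil.which("fairseq-hydra-train") or "fairseq-hydra-train"
    args = [
        "torchrun", "--nnodes=1",
        f"--nproc_per_node={len(gpu_ids)}",
        f"--master_port={master_port}",
        trainer, "--config-dir", "config",
        "--config-name", config_name,
        f"hydra.run.dir={root}/gpst",
        f"hydra.output_subdir={output_dir}",
        f"hydra.job.name={output_dir}/train",
        f"common.tensorboard_logdir={output_dir}/tb",
        f"checkpoint.save_dir={output_dir}/checkpoints",
        f"+task.data={root}/LibriSpeech",
    ]
    return env, args


if __name__ == "__main__":
    env, args = build_train_command([4, 5], root="PATH")
    prefix = " ".join(f"{k}={v}" for k, v in env.items())
    print(prefix, shlex.join(args))
```

The printed line can be pasted into a shell, or `env` and `args` can be passed to `subprocess.run` after merging `env` into `os.environ`.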

Inference Scripts

TTS

Voice Conversion

Citation

If you find GPST useful for your research and applications, please cite using this BibTeX:

@inproceedings{zhu-etal-2024-generative,
 title = "Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer",
 author = "Zhu, Yongxin and
 Su, Dan and
 He, Liqiang and
 Xu, Linli and
 Yu, Dong",
 editor = "Ku, Lun-Wei and
 Martins, Andre and
 Srikumar, Vivek",
 booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
 month = aug,
 year = "2024",
 address = "Bangkok, Thailand",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2024.acl-long.97",
 doi = "10.18653/v1/2024.acl-long.97",
 pages = "1764--1775",
}

About

[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
