liyan0628/Graduation-Project

High-Quality Emotional Virtual Human System Based on Large Models

System flowchart (image)

1. Test results

1.1 Cartoon portrait test

driven image	original	gfpgan
(image)	original_comic.mp4	gpfgan_comic.mp4

1.2 Synthetic character test

driven image	original	gfpgan
(image)	talking_2024年05月14日-16-34-25_liuyin.mp4	talking_restoration_2024年05月14日-16-34-34_liuyin.mp4

1.3 Different expressions

driven image	happy	scared	neutral
(image)	happy_sound.mp4	scared_sound.mp4	neural_sound.mp4

1.4 Different voices

liwen	fufu	liuying
me_no_back.mp4	fufu.mp4	talking_restoration_2024年05月14日-16-34-34_liuyin.mp4

1.5 Different languages

Chinese	English
talking_2024年05月03日-05-14-59.mp4	talking_2024年05月06日-22-12-02.mp4

1.6 Different poses

pose1	pose2	pose3
pose1.mp4	pose2.mp4	pose3.mp4

2. Environment setup

2.1 Prepare the EAT and GPT-SoVITS environments

System environment
- Python 3.9.19
- Ubuntu 20.04.1
- GPUs: 2 × RTX 4090

git clone https://github.com/lililuya/Graduation-Project.git
cd Graduation-Project/env

Use conda or pip

# If you use conda, edit the prefix in environment.yml, or delete it to use the default location
conda env create -f environment.yml

# If you use pip, first delete the entries in requirements.txt that point to local package indexes
pip install -r requirements.txt

2.2 Dependency conflicts between ModelScope and GPT-SoVITS: follow the ModelScope versions

pip install funasr==1.0.22
pip install modelscope==1.13.3

2.3 TensorRT installation

See the TensorRT installation notes.

2.4 Known setup problems

The most likely problem is a numba version mismatch. If it occurs, upgrade numba:

pip install -U numba
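
To confirm that the pins from section 2.2 took effect after installing everything else, a small check along these lines may help. This is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# Pins copied from section 2.2. numba is not pinned in this README, so it is not listed.
PINNED = {"funasr": "1.0.22", "modelscope": "1.13.3"}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every package that does not match."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `check_versions(PINNED)` inside the activated environment returns an empty dict when both packages match their pins.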

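3. Weight files

EAT weights
- Download and place them in ckpt under the repository root
GFPGAN
- Download and place them in restoration under the repository root
GPT-SoVITS weights
- Download and place them in GPT_SoVits/weights under the repository root
MODNet weights
- Download and place them in pretrain
DeepSpeech
- See RAD-NeRF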
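
Before launching the pipeline, it can be useful to verify that each weight directory listed above exists and is not empty. The following is a minimal sketch: the function name `missing_weight_dirs` is hypothetical, and it assumes that `pretrain` sits directly under the repository root, which the list above does not state explicitly.

```python
from pathlib import Path

# Directory names come from section 3. Placing "pretrain" under the repo root is an assumption.
WEIGHT_DIRS = {
    "EAT": "ckpt",
    "GFPGAN": "restoration",
    "GPT-SoVITS": "GPT_SoVits/weights",
    "MODNet": "pretrain",
}


def missing_weight_dirs(repo_root, weight_dirs=WEIGHT_DIRS):
    """Return the model names whose weight directory is absent or empty."""
    root = Path(repo_root)
    missing = []
    for model, rel_path in weight_dirs.items():
        directory = root / rel_path
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(model)
    return missing
```

Running `missing_weight_dirs(".")` from the repository root lists the models that still need their weights downloaded. It only checks for the presence of files, not that they are the correct checkpoints.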

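4. Running

4.1 Run locally

python whole_pipeline_GPTSOVITS_asr_en_gradio_multivoice.py

4.2 Use Gradio's built-in tunneling (public share link)

# Set share=True in the launch() call

For configuration details, see Gradio Network Traversal.

4.3 Interface

Emotional virtual human generation module (image: test_record)
Chinese and English TTS (image: test_TTS_en)
Chinese and English ASR (image: page4)
Image matting (image: page3)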
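
In Gradio, the public tunnel is enabled with the `share=True` argument of `launch()`. The sketch below shows one way to toggle it. The helper names `build_launch_kwargs` and `launch_demo` and the placeholder UI are illustrative assumptions and do not reflect how the pipeline scripts in this repository are written.

```python
def build_launch_kwargs(public=False, port=7860):
    """Keyword arguments for Gradio's launch(); 7860 is Gradio's default port."""
    return {"share": public, "server_port": port}


def launch_demo(public=True):
    """Start a placeholder Gradio app, optionally with a public share link."""
    import gradio as gr  # imported lazily so the helper above works without gradio installed

    with gr.Blocks() as demo:
        gr.Markdown("Placeholder UI; the real interface is built by the pipeline scripts.")
    demo.launch(**build_launch_kwargs(public=public))
```

With `public=True`, Gradio prints a temporary public URL in addition to the local address.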

References

@InProceedings{Gan_2023_ICCV,
 author = {Gan, Yuan and Yang, Zongxin and Yue, Xihang and Sun, Lingyun and Yang, Yi},
 title = {Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation},
 booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
 month = {October},
 year = {2023},
 pages = {22634-22645}
}
@InProceedings{wang2021gfpgan,
 author = {Xintao Wang and Yu Li and Honglun Zhang and Ying Shan},
 title = {Towards Real-World Blind Face Restoration with Generative Facial Prior},
 booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 year = {2021}
}
@inproceedings{gao22b_interspeech,
 author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
 title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
 year=2022,
 booktitle={Proc. Interspeech 2022},
 pages={2063--2067},
 doi={10.21437/Interspeech.2022-9996}
}
@inproceedings{du2022glm,
 title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
 author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
 booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
 pages={320--335},
 year={2022}
}

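Known issues

Lip-sync problems with Chinese speech; see the related issue.
Running requires 6 GB + 10 GB of GPU memory, which is too much.
The results shown are not very good: the source images chosen are not sharp, and the super-resolution model loses some precision when run through ONNX.
Compositing the head back onto the body, as suggested by the EAT author.
Background jitter, also raised by the EAT author; this repository handles it with MODNet.
DeepSpeech acceleration: extracting audio features currently takes a very long time. The deepspeech-0.1 version is used.
Custom loading of GPT-SoVITS models: this trades resources for time, since each model is about 1.8 GB. The models to load could be specified in a configuration file.

Disclaimer

This project uses EAT as its core model and is purely an experimental study. It is not intended for any other use.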
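
One possible shape for the configurable loading mentioned in the last item is a small cache that loads a voice model on first use and evicts the least recently used one when a limit is exceeded. This is only a sketch of the idea: the class `VoiceModelCache`, the `loader` callable, and the checkpoint paths are hypothetical, and the actual GPT-SoVITS loading call is deliberately left out.

```python
from collections import OrderedDict


class VoiceModelCache:
    """Keep at most `max_loaded` voice models in memory, evicting the least recently used."""

    def __init__(self, voice_paths, loader, max_loaded=1):
        self.voice_paths = dict(voice_paths)  # voice name -> checkpoint path, e.g. read from a config file
        self.loader = loader                  # callable(path) -> model; stand-in for the real loading code
        self.max_loaded = max_loaded
        self._loaded = OrderedDict()

    def get(self, voice):
        if voice in self._loaded:
            self._loaded.move_to_end(voice)   # mark as most recently used
            return self._loaded[voice]
        model = self.loader(self.voice_paths[voice])
        self._loaded[voice] = model
        while len(self._loaded) > self.max_loaded:
            self._loaded.popitem(last=False)  # drop the least recently used model
        return model
```

A larger `max_loaded` keeps more of the roughly 1.8 GB models resident and avoids reload delays when switching voices; a smaller value saves memory at the cost of reloading.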

About

A high-quality emotional virtual human system based on large models (Gradio + FunASR + ChatGLM2-6B + GPT-SoVITS + EAT + GFPGAN)
