High-Quality Emotional Virtual Human System Based on Large Models (Gradio+FUNASR+ChatGLM2-6B+GPT-SOVITS+EAT+GFPGAN)


liyan0628/Graduation-Project


High-Quality Emotional Virtual Human System Based on Large Models

1. Test results

1.1 Cartoon portrait test

driven image | original | gfpgan
(image) | original_comic.mp4 | gpfgan_comic.mp4

1.2 Synthetic character test

driven image | original | gfpgan
(image) | talking_2024年05月14日-16-34-25_liuyin.mp4 | talking_restoration_2024年05月14日-16-34-34_liuyin.mp4

1.3 Expression test

driven image | happy | scared | neutral
(image) | happy_sound.mp4 | scared_sound.mp4 | neural_sound.mp4

1.4 Voice test

liwen | fufu | liuying
me_no_back.mp4 | fufu.mp4 | talking_restoration_2024年05月14日-16-34-34_liuyin.mp4

1.5 Language test

Chinese | English
talking_2024年05月03日-05-14-59.mp4 | talking_2024年05月06日-22-12-02.mp4

1.6 Pose test

pose1 | pose2 | pose3
pose1.mp4 | pose2.mp4 | pose3.mp4

2. Environment setup

2.1 Set up the EAT and GPT-SOVITS environments

  • System environment
    • Python 3.9.19
    • Ubuntu 20.04.1
    • Graphics card: 2 × RTX 4090
git clone https://github.com/lililuya/Graduation-Project.git
cd Graduation-Project/env
  • Use conda or pip
# With conda: edit the prefix field in environment.yml, or delete it to use the default location
conda env create -f environment.yml
# With pip: first delete the entries in requirements.txt that point to local packages
pip install -r requirements.txt

2.2 Dependency conflicts between ModelScope and GPT-SOVITS: ModelScope's versions take precedence

pip install funasr==1.0.22
pip install modelscope==1.13.3
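After installing, it can help to confirm that the pinned versions actually ended up in the active environment, since a later install may silently replace them. The following is a minimal sketch using only the standard library; the helper name `check_pins` is an illustration and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in section 2.2 of this README.
PINNED = {"funasr": "1.0.22", "modelscope": "1.13.3"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Report every package whose installed version differs from the pin.
for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If nothing is printed, both packages match the pinned versions.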

2.3 TensorRT installation

See the TensorRT installation notes.

2.4 Known setup problems

  • The most likely problem is a numba version mismatch. If it occurs, upgrade numba:
pip install -U numba

3. Weight files

4. Running

4.1 Run locally

python whole_pipeline_GPTSOVITS_asr_en_gradio_multivoice.py

4.2 Use Gradio's built-in tunneling (public share link)

# Enable sharing in the Gradio launch call, e.g. launch(share=True)
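Gradio's built-in tunneling is controlled by the `share` parameter of `launch()`. The sketch below shows the idea with a placeholder echo interface; the helper `build_launch_kwargs` and the echo function are assumptions made for illustration, and the actual interface is the one built in whole_pipeline_GPTSOVITS_asr_en_gradio_multivoice.py.

```python
def build_launch_kwargs(public: bool, port: int = 7860) -> dict:
    """Keyword arguments for Gradio's launch(); share=True requests a public link."""
    return {"share": public, "server_port": port}


def main() -> None:
    # Imported lazily so the helper above can be used without Gradio installed.
    import gradio as gr

    # Placeholder echo interface standing in for the project's real UI.
    demo = gr.Interface(fn=lambda text: text, inputs="text", outputs="text")
    demo.launch(**build_launch_kwargs(public=True))


# Call main() to start the demo; with share=True Gradio prints a public URL.
```

Passing `public=False` keeps the server reachable only on the local machine, which corresponds to the local run in 4.1.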

4.3 Interface

References

@InProceedings{Gan_2023_ICCV,
 author = {Gan, Yuan and Yang, Zongxin and Yue, Xihang and Sun, Lingyun and Yang, Yi},
 title = {Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation},
 booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
 month = {October},
 year = {2023},
 pages = {22634-22645}
}
@InProceedings{wang2021gfpgan,
 author = {Xintao Wang and Yu Li and Honglun Zhang and Ying Shan},
 title = {Towards Real-World Blind Face Restoration with Generative Facial Prior},
 booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 year = {2021}
}
@inproceedings{gao22b_interspeech,
 author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
 title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
 year=2022,
 booktitle={Proc. Interspeech 2022},
 pages={2063--2067},
 doi={10.21437/Interspeech.2022-9996}
}
@inproceedings{du2022glm,
 title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
 author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
 booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
 pages={320--335},
 year={2022}
}

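Related repositories

Current problems

  • Synchronization problems with Chinese speech; see the issue.
  • Running the system requires 6 GB + 10 GB of GPU memory, which is too much.
  • The results shown here are not very good, because the source images chosen are not sharp and the super-resolution model loses some precision when run through ONNX.
  • Blending the head back into the body: see the EAT author's suggestion.
  • Background jitter: see the EAT author's suggestion. This repository uses MODNet.
  • DeepSpeech acceleration: extracting audio features currently takes a very long time. The version in use is deepspeech-0.1.
  • Custom loading of GPT-SOVITS models, trading resources for time: each model is about 1.8 GB, so the models to load could be specified in a configuration file.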
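The last item, loading GPT-SOVITS models from a configuration file, could be approached roughly as follows. This is only a sketch of the idea and not code from this repository: the configuration layout, the weight paths, and the `VoiceModelCache` class are all hypothetical, and the real model-loading function would be passed in as `loader`.

```python
import json
from typing import Callable, Dict, List

# Hypothetical configuration: which voices exist and which to load at startup.
EXAMPLE_CONFIG = """
{
  "preload": ["liwen"],
  "voices": {
    "liwen": "weights/liwen",
    "fufu": "weights/fufu"
  }
}
"""


class VoiceModelCache:
    """Loads each voice model at most once, on first use or at startup."""

    def __init__(self, config: dict, loader: Callable[[str], object]):
        self._paths: Dict[str, str] = dict(config["voices"])
        self._loader = loader
        self._loaded: Dict[str, object] = {}
        # Voices listed under "preload" trade memory for faster first use.
        for name in config.get("preload", []):
            self.get(name)

    def get(self, name: str) -> object:
        if name not in self._paths:
            raise KeyError(f"unknown voice: {name}")
        if name not in self._loaded:
            self._loaded[name] = self._loader(self._paths[name])
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        return sorted(self._loaded)
```

With about 1.8 GB per model, preloading only the voices in use and loading the rest on demand keeps memory lower at the cost of a slower first request for each new voice.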

Disclaimer

This project uses EAT as its core model and is purely an experimental study. It is not intended for any other use.
