An unofficial implementation of the combination of Soft-VC and VITS

sophiefy/Sovits

Stella VC Based on Soft-VC and VITS

This project is closed...

Update

  • Sovits 2.0 inference demo is available!

Introduction

Inspired by Rcell, I replaced the word embedding of VITS's TextEncoder with the output of the ContentEncoder used in Soft-VC to achieve any-to-one voice conversion with non-parallel data. Of course, any-to-many voice conversion is also doable!

For better voice quality, Sovits 2 uses the F0 model from StarGANv2-VC to extract the fundamental frequency (F0) feature of the input audio and feeds it to VITS's vocoder.
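Sovits 2's actual F0 feature comes from StarGANv2-VC's trained F0 network, but the idea of a fundamental-frequency contour can be illustrated without it. Below is a minimal, self-contained autocorrelation-based F0 estimate on a synthetic tone — a toy sketch, not the project's model:

```python
import numpy as np

def estimate_f0(frame, sr=22050, fmin=60.0, fmax=600.0):
    """Estimate the F0 of one frame by autocorrelation peak picking."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # restrict lags to [fmin, fmax]
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# A 220 Hz test tone at the project's 22050 Hz sampling rate.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
f0 = estimate_f0(tone[:1024], sr)  # close to 220 Hz
```

In the real pipeline this per-frame contour is computed by the F0 network and conditioned into the vocoder alongside the speech units.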

Models

A Certain Magical Index

  • Description

Multi-speaker model of four characters from A Certain Magical Index.

| Speaker | ID |
| --- | --- |
| 一方通行 (Accelerator) | 0 |
| 上条当麻 (Kamijou Touma) | 1 |
| 御坂美琴 (Misaka Mikoto) | 2 |
| 白井黑子 (Shirai Kuroko) | 3 |

Shiki Natsume

  • Description

Single speaker model of Shiki Natsume.

Shiki Natsume 2.0

  • Description

Single speaker model of Shiki Natsume, trained with F0 feature.

How to use

Train

Prepare dataset

Audio should be WAV files, mono, with a sampling rate of 22050 Hz.

Your dataset should be like:

└───wavs
    ├───dev
    │   ├───LJ001-0001.wav
    │   ├───...
    │   └───LJ050-0278.wav
    └───train
        ├───LJ002-0332.wav
        ├───...
        └───LJ047-0007.wav
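Before training, it's worth verifying that every file actually meets the mono / 22050 Hz constraint. A stdlib-only checker (a convenience sketch, not part of this repo) could look like:

```python
import wave
from pathlib import Path

def check_wav(path, target_sr=22050):
    """Return a list of problems with a WAV file (empty list = OK)."""
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels (want mono)")
        if w.getframerate() != target_sr:
            problems.append(f"{w.getframerate()} Hz (want {target_sr})")
    return problems

def check_dataset(root):
    """Print every non-conforming file under the wavs directory."""
    for p in sorted(Path(root).rglob("*.wav")):
        for msg in check_wav(p):
            print(f"{p}: {msg}")
```

Files flagged here should be resampled and downmixed (e.g. with ffmpeg or librosa) before extracting speech units.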

Extract speech units

Utilize the content encoder to extract speech units from the audio.

For more information, refer to this repo.

cd hubert
python3 encode.py soft path/to/wavs/directory path/to/soft/directory --extension .wav
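encode.py writes one unit matrix per wav, mirroring the wav's filename in the soft directory. The hypothetical helper below sketches that convention with a random matrix standing in for real content-encoder output (Soft-VC's soft units are 256-dimensional; the `.npy` extension is an assumption about the encoder's output format):

```python
import numpy as np
from pathlib import Path

def save_units(wav_path, soft_dir, units):
    """Save a (frames, 256) soft-unit matrix under the wav's stem."""
    out = Path(soft_dir) / (Path(wav_path).stem + ".npy")
    out.parent.mkdir(parents=True, exist_ok=True)
    np.save(out, units.astype(np.float32))
    return out

# Stand-in for the content encoder's output: roughly 50 unit vectors
# per second of audio, 256 dimensions each.
units = np.random.randn(100, 256)
```

The point is only the naming convention: each `path/to/wavs/X.wav` gets a matching `path/to/soft/X.npy`, which the filelists below pair up.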

Then generate filelists for both your training and validation files. It's recommended that you prepare your filelists before starting training!

Your filelists should look like:

Single speaker:

path/to/wav|path/to/unit
...

Multi-speaker:

path/to/wav|id|path/to/unit
...
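Filelists of either shape can be generated with a short script. `write_filelist` below is a hypothetical helper (the `.npy` unit extension is an assumption about how the units were saved):

```python
from pathlib import Path

def write_filelist(wav_dir, unit_dir, out_path, speaker_id=None):
    """Write 'wav|unit' lines, or 'wav|id|unit' when speaker_id is given."""
    lines = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        unit = Path(unit_dir) / (wav.stem + ".npy")  # extension is an assumption
        if speaker_id is None:
            lines.append(f"{wav}|{unit}")           # single-speaker format
        else:
            lines.append(f"{wav}|{speaker_id}|{unit}")  # multi-speaker format
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

For a multi-speaker model, call it once per speaker directory with that speaker's ID and concatenate the results.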

Train Sovits

Single speaker:

python train.py -c configs/config.json -m model_name

Multi-speaker:

python train_ms.py -c configs/config.json -m model_name
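Both scripts read a VITS-style configs/config.json. A hedged fragment of its data section (field names follow upstream VITS configs; the paths and values here are placeholders, not this repo's defaults):

```json
{
  "data": {
    "training_files": "filelists/train.txt",
    "validation_files": "filelists/val.txt",
    "sampling_rate": 22050,
    "n_speakers": 4
  }
}
```

Point the filelist paths at the files generated above; for train.py (single speaker) a speaker count is not needed, while train_ms.py expects n_speakers to match the IDs used in the multi-speaker filelists.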

You may also refer to train.ipynb.

Inference

Please refer to inference.ipynb.

TODO

  • Add F0 model
  • Add F0 loss

Contact

QQ: 2235306122

BILIBILI: Francis-Komizu

Acknowledgement

Special thanks to Rcell for giving me both inspiration and advice!

References

基于VITS和SoftVC实现任意对一VoiceConversion (Any-to-one voice conversion based on VITS and SoftVC)

Soft-VC

vits

StarGANv2-VC
