An unofficial implementation of the combination of Soft-VC and VITS

sophiefy/Sovits

Stella VC Based on Soft-VC and VITS

This project is closed...

Update

  • Sovits 2.0 inference demo is available!

Introduction

Inspired by Rcell, I replaced the word embedding of VITS's TextEncoder with the output of the ContentEncoder used in Soft-VC to achieve any-to-one voice conversion with non-parallel data. Of course, any-to-many voice conversion is also doable!

For better voice quality, Sovits 2 uses the F0 model from StarGANv2-VC to extract the fundamental frequency (F0) feature of the input audio and feeds it to VITS's vocoder.
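Sovits 2's actual F0 feature comes from StarGANv2-VC's trained F0 network, but the idea of a fundamental-frequency contour can be illustrated without it. Below is a minimal, self-contained autocorrelation-based F0 estimate on a synthetic tone — a toy sketch, not the project's model:

```python
import numpy as np

def estimate_f0(frame, sr=22050, fmin=60.0, fmax=600.0):
    """Estimate the F0 of one frame by autocorrelation peak picking."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # restrict lags to [fmin, fmax]
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# A 220 Hz test tone at the project's 22050 Hz sampling rate.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
f0 = estimate_f0(tone[:1024], sr)  # close to 220 Hz
```

In the real pipeline this per-frame contour is computed by the F0 network and conditioned into the vocoder alongside the speech units.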

Models

A Certain Magical Index

  • Description

Multi-speaker model of four characters from A Certain Magical Index.

| Speaker | ID |
| --- | --- |
| 一方通行 (Accelerator) | 0 |
| 上条当麻 (Kamijou Touma) | 1 |
| 御坂美琴 (Misaka Mikoto) | 2 |
| 白井黑子 (Shirai Kuroko) | 3 |

Shiki Natsume

  • Description

Single speaker model of Shiki Natsume.

Shiki Natsume 2.0

  • Description

Single speaker model of Shiki Natsume, trained with F0 feature.

How to use

Train

Prepare dataset

Audio should be WAV files, mono, with a sampling rate of 22050 Hz.

Your dataset should be like:

└───wavs
    ├───dev
    │   ├───LJ001-0001.wav
    │   ├───...
    │   └───LJ050-0278.wav
    └───train
        ├───LJ002-0332.wav
        ├───...
        └───LJ047-0007.wav
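Before training, it's worth verifying that every file actually meets the mono / 22050 Hz constraint. A stdlib-only checker (a convenience sketch, not part of this repo) could look like:

```python
import wave
from pathlib import Path

def check_wav(path, target_sr=22050):
    """Return a list of problems with a WAV file (empty list = OK)."""
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels (want mono)")
        if w.getframerate() != target_sr:
            problems.append(f"{w.getframerate()} Hz (want {target_sr})")
    return problems

def check_dataset(root):
    """Print every non-conforming file under the wavs directory."""
    for p in sorted(Path(root).rglob("*.wav")):
        for msg in check_wav(p):
            print(f"{p}: {msg}")
```

Files flagged here should be resampled and downmixed (e.g. with ffmpeg or librosa) before extracting speech units.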

Extract speech units

Utilize the content encoder to extract speech units from the audio.

For more information, refer to this repo.

cd hubert
python3 encode.py soft path/to/wavs/directory path/to/soft/directory --extension .wav
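encode.py writes one unit matrix per wav, mirroring the wav's filename in the soft directory. The hypothetical helper below sketches that convention with a random matrix standing in for real content-encoder output (Soft-VC's soft units are 256-dimensional; the `.npy` extension is an assumption about the encoder's output format):

```python
import numpy as np
from pathlib import Path

def save_units(wav_path, soft_dir, units):
    """Save a (frames, 256) soft-unit matrix under the wav's stem."""
    out = Path(soft_dir) / (Path(wav_path).stem + ".npy")
    out.parent.mkdir(parents=True, exist_ok=True)
    np.save(out, units.astype(np.float32))
    return out

# Stand-in for the content encoder's output: roughly 50 unit vectors
# per second of audio, 256 dimensions each.
units = np.random.randn(100, 256)
```

The point is only the naming convention: each `path/to/wavs/X.wav` gets a matching `path/to/soft/X.npy`, which the filelists below pair up.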

Then generate filelists for both your training and validation files. It's recommended that you prepare your filelists before starting training!

Your filelists should look like:

Single speaker:

path/to/wav|path/to/unit
...

Multi-speaker:

path/to/wav|id|path/to/unit
...
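Filelists of either shape can be generated with a short script. `write_filelist` below is a hypothetical helper (the `.npy` unit extension is an assumption about how the units were saved):

```python
from pathlib import Path

def write_filelist(wav_dir, unit_dir, out_path, speaker_id=None):
    """Write 'wav|unit' lines, or 'wav|id|unit' when speaker_id is given."""
    lines = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        unit = Path(unit_dir) / (wav.stem + ".npy")  # extension is an assumption
        if speaker_id is None:
            lines.append(f"{wav}|{unit}")           # single-speaker format
        else:
            lines.append(f"{wav}|{speaker_id}|{unit}")  # multi-speaker format
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

For a multi-speaker model, call it once per speaker directory with that speaker's ID and concatenate the results.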

Train Sovits

Single speaker:

python train.py -c configs/config.json -m model_name

Multi-speaker:

python train_ms.py -c configs/config.json -m model_name
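Both scripts read a VITS-style configs/config.json. A hedged fragment of its data section (field names follow upstream VITS configs; the paths and values here are placeholders, not this repo's defaults):

```json
{
  "data": {
    "training_files": "filelists/train.txt",
    "validation_files": "filelists/val.txt",
    "sampling_rate": 22050,
    "n_speakers": 4
  }
}
```

Point the filelist paths at the files generated above; for train.py (single speaker) a speaker count is not needed, while train_ms.py expects n_speakers to match the IDs used in the multi-speaker filelists.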

You may also refer to train.ipynb.

Inference

Please refer to inference.ipynb.

TODO

  • Add F0 model
  • Add F0 loss

Contact

QQ: 2235306122

BILIBILI: Francis-Komizu

Acknowledgement

Special thanks to Rcell for giving me both inspiration and advice!

References

基于VITS和SoftVC实现任意对一VoiceConversion (Any-to-one voice conversion based on VITS and SoftVC)

Soft-VC

vits

StarGANv2-VC
