albertcao/deep-speaker

Folders and files

History: 52 commits

assets
audio
bak
.gitignore
LICENSE
README.md
constants.py
librispeech_wav_reader.py
models.py
models_train.py
next_batch.py
pre_process.py
requirements.txt
triplet_loss.py
triplet_loss_test.py
triplet_visualization.py


Deep Speaker from Baidu Research


Deep Speaker: an End-to-End Neural Speaker Embedding System https://arxiv.org/pdf/1705.02304.pdf

Work accomplished so far:

Triplet loss
Triplet loss test
Model implementation
Data pipeline implementation. We're going to use the LibriSpeech dataset with 2300+ different speakers.
Train the models

Visualization of a possible triplet (Anchor, Positive, Negative) in the cosine similarity space
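
To make the triplet idea concrete, here is a minimal NumPy sketch of a triplet loss computed in cosine similarity space. It is an illustration only, not the code in `triplet_loss.py`. The function names, the hinge form `max(0, s_an - s_ap + alpha)`, and the margin `alpha=0.1` are assumptions based on my reading of the paper and should be checked against it.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return np.sum(a_unit * b_unit, axis=1)


def triplet_loss(anchor, positive, negative, alpha=0.1):
    """Sum over the batch of max(0, s_an - s_ap + alpha).

    alpha is an assumed margin; the loss is zero for a triplet once the
    anchor is more similar to the positive than to the negative by at
    least alpha.
    """
    s_ap = cosine_similarity(anchor, positive)  # same speaker
    s_an = cosine_similarity(anchor, negative)  # different speaker
    return np.sum(np.maximum(s_an - s_ap + alpha, 0.0))


if __name__ == "__main__":
    anchor = np.array([[1.0, 0.0]])
    positive = np.array([[1.0, 0.0]])
    negative = np.array([[0.0, 1.0]])
    # A well-separated triplet contributes no loss.
    print(triplet_loss(anchor, positive, negative))
    # Swapping positive and negative gives a large loss.
    print(triplet_loss(anchor, negative, positive))
```

Working with similarities rather than distances means the negative term is added and the positive term is subtracted, which is the reverse of the more common distance-based formulation.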

Contributing

Please message me if you want to contribute. I'll be happy to hear your ideas. The paper leaves many details undisclosed, such as:

Input size to the network? Which inputs exactly?
How many filter banks do we use?
Sample Rate?
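
Since these values are open questions, the sketch below only shows how they would determine the shape of the network input. LibriSpeech audio is sampled at 16 kHz. The 25 ms window and 10 ms step are common speech front-end defaults that I am assuming here, not values confirmed for this repository, and `frame_signal` is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE = 16000   # LibriSpeech sample rate in Hz
FRAME_MS = 25         # assumed window length in milliseconds
STEP_MS = 10          # assumed step between windows in milliseconds


def frame_signal(signal, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS, step_ms=STEP_MS):
    """Split a 1-D signal into overlapping frames of shape (num_frames, frame_len)."""
    signal = np.asarray(signal)
    frame_len = sample_rate * frame_ms // 1000
    step = sample_rate * step_ms // 1000
    if len(signal) < frame_len:
        # Too short for even one full frame.
        return np.empty((0, frame_len), dtype=signal.dtype)
    num_frames = 1 + (len(signal) - frame_len) // step
    # Build an index matrix: one row of sample indices per frame.
    starts = step * np.arange(num_frames)[:, None]
    offsets = np.arange(frame_len)[None, :]
    return signal[starts + offsets]


if __name__ == "__main__":
    one_second = np.zeros(SAMPLE_RATE, dtype=np.float32)
    frames = frame_signal(one_second)
    print(frames.shape)
```

Each frame would then be converted to filter bank coefficients, so the number of filter banks sets the feature dimension and the number of frames sets the time dimension of the input.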

LibriSpeech Dataset

Available here: http://www.openslr.org/12/

List of possible other datasets: http://kaldi-asr.org/doc/examples.html

Extract of this dataset:

   filenames                                                                           chapter_id  speaker_id  dataset_id
0  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.wav  128104      1272        dev-clean
1  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.wav  128104      1272        dev-clean
2  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0002.wav  128104      1272        dev-clean
3  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0003.wav  128104      1272        dev-clean
4  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0004.wav  128104      1272        dev-clean
5  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0005.wav  128104      1272        dev-clean
6  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0006.wav  128104      1272        dev-clean
7  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0007.wav  128104      1272        dev-clean
8  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0008.wav  128104      1272        dev-clean
9  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0009.wav  128104      1272        dev-clean
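
The columns in this extract follow directly from the directory layout `<dataset_id>/<speaker_id>/<chapter_id>/<file>.wav`. As a rough sketch of how one row could be derived from a file path, using only the standard library (`parse_librispeech_path` is a hypothetical helper, not necessarily what `librispeech_wav_reader.py` does):

```python
from pathlib import PurePosixPath


def parse_librispeech_path(filename):
    """Derive the extract's columns from a path ending in
    <dataset_id>/<speaker_id>/<chapter_id>/<file>.wav."""
    path = PurePosixPath(filename)
    chapter_dir = path.parent
    speaker_dir = chapter_dir.parent
    dataset_dir = speaker_dir.parent
    return {
        "filenames": str(path),
        "chapter_id": chapter_dir.name,
        "speaker_id": speaker_dir.name,
        "dataset_id": dataset_dir.name,
    }


if __name__ == "__main__":
    row = parse_librispeech_path(
        "/Volumes/Transcend/data-set/LibriSpeech/dev-clean/"
        "1272/128104/1272-128104-0000.wav"
    )
    print(row["chapter_id"], row["speaker_id"], row["dataset_id"])
```

The IDs are kept as strings here; whether the original table stores them as strings or integers is not shown in the extract.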

Training example on GPU

Training on the GPU.
