albertcao/deep-speaker

Deep Speaker from Baidu Research

Deep Speaker: an End-to-End Neural Speaker Embedding System https://arxiv.org/pdf/1705.02304.pdf

Work accomplished so far:

  • Triplet loss
  • Triplet loss test
  • Model implementation
  • Data pipeline implementation. We're going to use the LibriSpeech dataset with 2300+ different speakers.
  • Train the models


Visualization of a possible triplet (Anchor, Positive, Negative) in the cosine similarity space
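
As a rough illustration of what the triplet loss computes in cosine similarity space, here is a minimal, dependency-free sketch for a single triplet. It is not the code from this repository: the function names are hypothetical, and the margin `alpha=0.1` is an assumed value. The loss is zero once the anchor is more similar to the positive than to the negative by at least the margin.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, alpha=0.1):
    """Hinge loss on cosine similarities for one (anchor, positive, negative) triplet.

    alpha is the margin; 0.1 is an assumed default, not a value taken from this repo.
    """
    s_ap = cosine_similarity(anchor, positive)  # same speaker: should be high
    s_an = cosine_similarity(anchor, negative)  # different speaker: should be low
    return max(0.0, s_an - s_ap + alpha)


if __name__ == "__main__":
    # Anchor matches the positive and is orthogonal to the negative: no loss.
    print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
    # Roles swapped: the triplet violates the margin and is penalised.
    print(triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

In a real training loop the same expression would be computed over batches of embeddings with the deep learning framework's tensor operations, so that gradients can flow back to the network.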

Contributing

Please message me if you want to contribute. I'll be happy to hear your ideas. The paper leaves a number of details unspecified, such as:

  • What is the input size of the network, and which inputs does it take exactly?
  • How many filter banks should we use?
  • Which sample rate should we use?
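
One way to keep these open questions explicit is to gather them in a single configuration object, so the resulting input shape can be recomputed whenever a value changes. The sketch below is only an illustration: `FeatureConfig` is a hypothetical name, and the number of filters, frame length, and frame step are placeholder assumptions, not values confirmed by the paper or by this repository. The 16 kHz default matches the sample rate of the LibriSpeech recordings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureConfig:
    """Hypothetical container for the undecided front-end hyperparameters."""

    sample_rate: int = 16000        # LibriSpeech audio is sampled at 16 kHz
    n_filters: int = 64             # assumption: number of filter banks
    frame_length_ms: float = 25.0   # assumption: analysis window length
    frame_step_ms: float = 10.0     # assumption: hop between windows

    def input_shape(self, duration_s):
        """Return (n_frames, n_filters) for a clip of the given duration in seconds."""
        frame_length = int(round(self.sample_rate * self.frame_length_ms / 1000))
        frame_step = int(round(self.sample_rate * self.frame_step_ms / 1000))
        n_samples = int(round(duration_s * self.sample_rate))
        if n_samples < frame_length:
            # Clip is shorter than one analysis window: no complete frame.
            return (0, self.n_filters)
        # Count only complete windows (no padding at the end of the clip).
        n_frames = 1 + (n_samples - frame_length) // frame_step
        return (n_frames, self.n_filters)


if __name__ == "__main__":
    print(FeatureConfig().input_shape(1.0))
```

Changing any field, for example `FeatureConfig(n_filters=40)`, immediately shows how the network input size would change.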

LibriSpeech Dataset

Available here: http://www.openslr.org/12/

List of possible other datasets: http://kaldi-asr.org/doc/examples.html

Extract of this dataset:

   filenames                                                                           chapter_id  speaker_id  dataset_id
0  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.wav  128104      1272        dev-clean
1  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.wav  128104      1272        dev-clean
2  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0002.wav  128104      1272        dev-clean
3  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0003.wav  128104      1272        dev-clean
4  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0004.wav  128104      1272        dev-clean
5  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0005.wav  128104      1272        dev-clean
6  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0006.wav  128104      1272        dev-clean
7  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0007.wav  128104      1272        dev-clean
8  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0008.wav  128104      1272        dev-clean
9  /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0009.wav  128104      1272        dev-clean
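
The columns in this extract can be recovered from the file path alone, since each file sits under `<dataset_id>/<speaker_id>/<chapter_id>/`. The sketch below shows one way to do that with the standard library only. The function name `parse_librispeech_path` is hypothetical and this is not necessarily how the repository's data pipeline builds its table.

```python
from pathlib import PurePosixPath


def parse_librispeech_path(filename):
    """Split a LibriSpeech-style path into the columns shown in the extract.

    Assumes the layout .../<dataset_id>/<speaker_id>/<chapter_id>/<file>.
    """
    parts = PurePosixPath(filename).parts
    if len(parts) < 4:
        raise ValueError(f"unexpected path layout: {filename}")
    dataset_id, speaker_id, chapter_id = parts[-4], parts[-3], parts[-2]
    return {
        "filenames": filename,
        "chapter_id": chapter_id,
        "speaker_id": speaker_id,
        "dataset_id": dataset_id,
    }


if __name__ == "__main__":
    row = parse_librispeech_path(
        "/Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.wav"
    )
    print(row)
```

A list of such dictionaries can be passed directly to `pandas.DataFrame` to obtain a table with these four columns. Note that the IDs are kept as strings here.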

Training example on GPU


Training on the GPU.
