WavEncoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.

Package Contents

Layers
  • Attention
    • Dot
    • Soft
    • Additive
    • Multiplicative
  • SincNet layer
  • Time Delay Neural Network (TDNN)

Models
  • PreTrained
    • wav2vec
    • wav2vec2 (base, large, xlsr53)
    • SincNet
    • RawNet
  • Baseline
    • 1DCNN
    • LSTM Classifier
    • LSTM Attention Classifier

Transforms
  • Noise (Environment/Gaussian White Noise)
  • Speed Change
  • PadCrop
  • Clip
  • Reverberation
  • TimeShift
  • TimeMask
  • FrequencyMask

Trainer and utils
  • Classification Trainer
  • Classification Testing
  • Download Noise Dataset
  • Download Impulse Response Dataset

Wav Models to be added

  • wav2vec [1]
  • wav2vec2 [2]
  • SincNet [3]
  • PASE [4]
  • MockingJay [5]
  • RawNet [6]
  • GaborNet [7]
  • LEAF [8]
  • CNN-1D
  • CNN-LSTM
  • CNN-LSTM-Attn

Check the Demo Colab Notebook.

Installation

Use the package manager pip to install wavencoder.

pip install wavencoder
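
A minimal post-install sanity check (assuming only that the install above succeeded) is to import the package:

import wavencoder
print(wavencoder.models) # the models submodule used in the examples below; should import without errors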

Usage

Import pretrained encoders, baseline models, and classifiers

import torch
import wavencoder
x = torch.randn(1, 16000) # [1, 16000]
encoder = wavencoder.models.Wav2Vec(pretrained=True)
z = encoder(x) # [1, 512, 98]
classifier = wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,
                                                    return_attn_weights=True,
                                                    attn_type='soft')
y_hat, attn_weights = classifier(z) # [1, 2], [1, 98]
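
The attention weights returned above have one value per encoder time step (98 steps for this one-second, 16 kHz input), so they can be inspected with plain PyTorch. A small sketch, not part of the wavencoder API, to find the frame the classifier attended to most:

most_attended_frame = attn_weights.argmax(dim=1) # index of the most attended encoder frame, shape [1]
print(most_attended_frame)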

Use wavencoder with PyTorch nn.Sequential or custom nn.Module classes

import torch
import torch.nn as nn
import wavencoder
model = nn.Sequential(
    wavencoder.models.Wav2Vec(),
    wavencoder.models.LSTM_Attn_Classifier(512, 64, 2,
                                           return_attn_weights=True,
                                           attn_type='soft')
)
x = torch.randn(1, 16000) # [1, 16000]
y_hat, attn_weights = model(x) # [1, 2], [1, 98]

import torch
import torch.nn as nn
import wavencoder

class AudioClassifier(nn.Module):
    def __init__(self):
        super(AudioClassifier, self).__init__()
        self.encoder = wavencoder.models.Wav2Vec(pretrained=True)
        self.classifier = nn.Linear(512, 2)

    def forward(self, x):
        z = self.encoder(x)
        z = torch.mean(z, dim=2)
        out = self.classifier(z)
        return out

model = AudioClassifier()
x = torch.randn(1, 16000) # [1, 16000]
y_hat = model(x) # [1, 2]
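
With a pretrained encoder such as the one above, a common fine-tuning setup is to update only the classifier head. The sketch below is plain PyTorch applied to the AudioClassifier defined above; nothing in it is wavencoder-specific:

# freeze the pretrained encoder so only the linear classifier receives gradient updates
for param in model.encoder.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)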

Train the encoder-classifier models

import torch.nn as nn
from wavencoder.models import Wav2Vec, LSTM_Attn_Classifier
from wavencoder.trainer import train, test_evaluate_classifier, test_predict_classifier

model = nn.Sequential(
    Wav2Vec(pretrained=False),
    LSTM_Attn_Classifier(512, 64, 2)
)
trainloader = ...
valloader = ...
testloader = ...
trained_model, train_dict = train(model, trainloader, valloader, n_epochs=20)
test_prediction_dict = test_predict_classifier(trained_model, testloader)
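
The trainloader, valloader and testloader placeholders above are ordinary torch.utils.data.DataLoader objects. A purely illustrative sketch with random dummy data follows; the (waveform, label) batch format is an assumption here, so check the trainer documentation for the exact format it expects:

import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy data: 32 one-second, 16 kHz waveforms with binary labels (replace with your own Dataset)
waveforms = torch.randn(32, 16000)
labels = torch.randint(0, 2, (32,))
trainloader = DataLoader(TensorDataset(waveforms, labels), batch_size=4, shuffle=True)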

Add transforms to your DataLoader for augmenting/processing the wav signal

import torchaudio
from wavencoder.transforms import Compose, AdditiveNoise, SpeedChange, Clipping, PadCrop, Reverberation

audio, _ = torchaudio.load('test.wav')

transforms = Compose([
    AdditiveNoise('path-to-noise-folder', snr_levels=[5, 10, 15], p=0.5),
    SpeedChange(factor_range=(-0.5, 0.0), p=0.5),
    Clipping(p=0.5),
    PadCrop(48000, crop_position='random', pad_position='random')
])

transformed_audio = transforms(audio)
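
To run the same augmentation pipeline on the fly inside a DataLoader, the Compose object can be called from a custom Dataset's __getitem__. The sketch below uses only plain PyTorch and torchaudio; file_paths and labels are hypothetical placeholders for your own data:

import torchaudio
from torch.utils.data import Dataset, DataLoader

class AugmentedAudioDataset(Dataset):
    def __init__(self, file_paths, labels, transforms):
        self.file_paths = file_paths
        self.labels = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        audio, _ = torchaudio.load(self.file_paths[idx])
        audio = self.transforms(audio) # apply the wavencoder transform pipeline per sample
        return audio, self.labels[idx]

# dataset = AugmentedAudioDataset(file_paths, labels, transforms)
# loader = DataLoader(dataset, batch_size=4, shuffle=True)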

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

References

[1] Wav2Vec: Unsupervised Pre-training for Speech Recognition
[2] Wav2vec 2.0: Learning the structure of speech from raw audio
[3] Speaker Recognition from Raw Waveform with SincNet
[4] Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
[5] Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
[6] Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms
