
klaam

Arabic speech recognition, classification and text-to-speech using advanced models such as wav2vec2 and FastSpeech2. This repository supports both training and prediction with pretrained models.

1. Usage

1.1 Speech Classification

from klaam import SpeechClassification
wav_file = 'file.wav'  # path to the audio file to classify
model = SpeechClassification()
model.classify(wav_file)
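To classify several recordings, the call above can be wrapped in a loop. A minimal sketch, assuming `model.classify` accepts a WAV file path as in the example above; `classify_directory` is a hypothetical helper (not part of klaam), and the value returned by `classify` is passed through unchanged because its exact type is not documented here.

```python
from pathlib import Path
from typing import Any, Dict


def classify_directory(model: Any, folder: str) -> Dict[str, Any]:
    """Run model.classify on every .wav file in `folder` (hypothetical helper).

    `model` is assumed to expose classify(path), like klaam's SpeechClassification.
    Results are keyed by file name, in sorted file-name order.
    """
    results = {}
    for wav_path in sorted(Path(folder).glob("*.wav")):
        results[wav_path.name] = model.classify(str(wav_path))
    return results
```

For example, `classify_directory(SpeechClassification(), 'recordings')`, where `recordings` is a placeholder folder name, loads the model once and reuses it for every file.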

1.2 Speech Recognition

from klaam import SpeechRecognition
wav_file = 'file.wav'  # path to the audio file to transcribe
model = SpeechRecognition()
model.transcribe(wav_file)
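The wav2vec2 XLSR checkpoints listed under Models work on 16 kHz speech, so it can help to check a recording's format before transcribing or classifying it. A minimal sketch using Python's standard `wave` module; `describe_wav` and `is_model_ready` are hypothetical helpers, and whether klaam resamples or downmixes input on its own is not covered here, so treat the 16 kHz mono target as an assumption.

```python
import wave
from typing import Dict

EXPECTED_RATE = 16_000  # assumed target: wav2vec2 XLSR models are trained on 16 kHz speech


def describe_wav(path: str) -> Dict[str, float]:
    """Return sample rate, channel count and duration of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def is_model_ready(path: str) -> bool:
    """True if the file is already 16 kHz mono (the assumed model input format)."""
    info = describe_wav(path)
    return info["sample_rate"] == EXPECTED_RATE and info["channels"] == 1
```

Files that fail the check can be converted with an audio tool before being passed to `transcribe` or `classify`.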

1.3 Text To Speech

from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"
model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)
model.synthesize(sample_text)
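The paths above are relative (`../`), so they only resolve when Python runs from a subdirectory of the repository, such as `notebooks/`. A minimal sketch that builds the same five paths from the repository root and reports missing files before `TextToSpeech` is constructed; it assumes the directory layout implied by the relative paths, and `tts_config_paths` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path
from typing import List


def tts_config_paths(repo_root: str) -> List[Path]:
    """Build the five TextToSpeech paths from the repository root.

    The order matches the positional arguments of TextToSpeech in the example above.
    """
    root = Path(repo_root)
    arabic_cfg = root / "cfgs" / "FastSpeech2" / "config" / "Arabic"
    return [
        arabic_cfg / "preprocess.yaml",
        arabic_cfg / "model.yaml",
        arabic_cfg / "train.yaml",
        root / "cfgs" / "FastSpeech2" / "model_config" / "hifigan" / "config.json",
        root / "data" / "model_weights" / "hifigan" / "generator_universal.pth.tar",
    ]


def missing_files(paths: List[Path]) -> List[Path]:
    """Return the paths that are not existing files."""
    return [p for p in paths if not p.is_file()]
```

If `missing_files(paths)` is empty, the model can be created with `TextToSpeech(*map(str, paths))`.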

There are two available recognition models, targeting Modern Standard Arabic (MSA) and the Egyptian dialect (EGY). You can select either one with the lang argument.

from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
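Constructing a `SpeechRecognition` object likely loads a full wav2vec2-large checkpoint, so when switching between the MSA and Egyptian models it is cheaper to load each one once and reuse it. A minimal sketch with a hypothetical `ModelCache` class; the factory is passed in so the sketch does not depend on klaam, and the Egyptian lang code (for example 'egy') is an assumption, since only 'msa' is shown above.

```python
from typing import Any, Callable, Dict


class ModelCache:
    """Create at most one model per language code and reuse it afterwards."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory
        self._models: Dict[str, Any] = {}

    def get(self, lang: str) -> Any:
        if lang not in self._models:
            self._models[lang] = self._factory(lang)  # load only on first request
        return self._models[lang]
```

For example, `cache = ModelCache(lambda lang: SpeechRecognition(lang=lang))` followed by `cache.get('msa').transcribe('file.wav')`.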

2. Datasets

Dataset	Description	Link
MGB-3	Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours were collected from YouTube.	here [Registration required]
ADI-5	More than 50 hours collected from Aljazeera TV, covering four regional dialects, Egyptian (EGY), Levantine (LAV), Gulf (GLF) and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge.	here [Registration required]
Common voice	Multilingual dataset available on Hugging Face	here
Arabic Speech Corpus	Arabic dataset with alignments and transcriptions	here

3. Models

Our project currently supports four models, three of which are available through Hugging Face Transformers.

Language	Description	Source
Egyptian	Speech recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Speech classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-Speech	fastspeech2

4. Example Notebooks

Name	Description	Notebook
Demo	Classification, recognition and text-to-speech in a few lines of code.
Demo with mic	Speech recognition and classification of audio recorded from the microphone.

5. Training

The training scripts are modified versions of those in jqueguiner/wav2vec2-sprint.

5.1. Classification

This script trains the dialect classifier on the five classes (EGY, NOR, LAV, GLF, MSA).

python run_classifier.py \
 --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
 --output_dir=/path/to/output \
 --cache_dir=/path/to/cache/ \
 --freeze_feature_extractor \
 --num_train_epochs="50" \
 --per_device_train_batch_size="32" \
 --preprocessing_num_workers="1" \
 --learning_rate="3e-5" \
 --warmup_steps="20" \
 --evaluation_strategy="steps" \
 --save_steps="100" \
 --eval_steps="100" \
 --save_total_limit="1" \
 --logging_steps="100" \
 --do_eval \
 --do_train
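With --evaluation_strategy="steps", the values of --eval_steps, --save_steps and --logging_steps count optimizer updates rather than epochs, so it helps to estimate the total number of updates before choosing them. A minimal sketch; `training_steps` is a hypothetical helper that approximates how the Hugging Face Trainer counts updates per epoch (batches per epoch divided by gradient accumulation steps, at least one), and the dataset size in the example is made up.

```python
import math


def training_steps(num_examples: int, batch_size: int, epochs: int,
                   grad_accum: int = 1, num_devices: int = 1) -> int:
    """Approximate the total number of optimizer steps for a training run."""
    # Batches per epoch: the last, possibly partial, batch still counts.
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    # Each optimizer update consumes grad_accum batches; at least one update per epoch.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs
```

For a hypothetical 10,000 training clips with the flags above (batch size 32, 50 epochs), `training_steps(10_000, 32, 50)` gives 15650 steps, so --eval_steps="100" would evaluate about 156 times.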

5.2. Recognition

This script trains the Egyptian dialect recognition model on the MGB-3 dataset.

python run_mgb3.py \
 --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
 --output_dir=/path/to/output \
 --cache_dir=/path/to/cache/ \
 --freeze_feature_extractor \
 --num_train_epochs="50" \
 --per_device_train_batch_size="32" \
 --preprocessing_num_workers="1" \
 --learning_rate="3e-5" \
 --warmup_steps="20" \
 --evaluation_strategy="steps" \
 --save_steps="100" \
 --eval_steps="100" \
 --save_total_limit="1" \
 --logging_steps="100" \
 --do_eval \
 --do_train
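The classification and MGB-3 commands above use identical hyperparameters and differ only in the script name. A minimal sketch that builds either command from one shared flag list and runs it with `subprocess`; `build_command` and `run` are hypothetical helpers, the flags are copied from the commands above, and the output and cache directories remain placeholders you supply.

```python
import shlex
import subprocess
import sys
from typing import List

# Flags shared by run_classifier.py and run_mgb3.py, copied from the commands above.
SHARED_FLAGS = [
    "--model_name_or_path=facebook/wav2vec2-large-xlsr-53",
    "--freeze_feature_extractor",
    "--num_train_epochs=50",
    "--per_device_train_batch_size=32",
    "--preprocessing_num_workers=1",
    "--learning_rate=3e-5",
    "--warmup_steps=20",
    "--evaluation_strategy=steps",
    "--save_steps=100",
    "--eval_steps=100",
    "--save_total_limit=1",
    "--logging_steps=100",
    "--do_eval",
    "--do_train",
]


def build_command(script: str, output_dir: str, cache_dir: str) -> List[str]:
    """Return the argument list for one of the training scripts."""
    return [sys.executable, script,
            f"--output_dir={output_dir}", f"--cache_dir={cache_dir}", *SHARED_FLAGS]


def run(script: str, output_dir: str, cache_dir: str) -> None:
    """Print the equivalent shell command, then run it and fail on a non-zero exit."""
    cmd = build_command(script, output_dir, cache_dir)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.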

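This script can be used for training on the Arabic subset of Common Voice. The settings below (one epoch, 100 training and 100 validation samples) make a quick test run; increase them for full training.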

python run_common_voice.py \
 --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
 --dataset_config_name="ar" \
 --output_dir=/path/to/output/ \
 --cache_dir=/path/to/cache \
 --overwrite_output_dir \
 --num_train_epochs="1" \
 --per_device_train_batch_size="32" \
 --per_device_eval_batch_size="32" \
 --evaluation_strategy="steps" \
 --learning_rate="3e-4" \
 --warmup_steps="500" \
 --fp16 \
 --freeze_feature_extractor \
 --save_steps="10" \
 --eval_steps="10" \
 --save_total_limit="1" \
 --logging_steps="10" \
 --group_by_length \
 --feat_proj_dropout="0.0" \
 --layerdrop="0.1" \
 --gradient_checkpointing \
 --do_train --do_eval \
 --max_train_samples 100 --max_val_samples 100
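Common Voice training fine-tunes wav2vec2 with a character-level CTC head. In the Hugging Face XLSR fine-tuning recipe, the vocabulary is every distinct character in the transcripts, with the space replaced by a "|" word delimiter and "[UNK]" and "[PAD]" appended. A minimal sketch of that recipe; that run_common_voice.py in this repository builds its vocabulary the same way, including any text normalization it applies first, is an assumption, and `build_ctc_vocab` is a hypothetical name.

```python
from typing import Dict, Iterable


def build_ctc_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Map every character in the transcripts to an id for a CTC tokenizer."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    if " " in vocab:
        # Reuse the space's id for "|", the word delimiter.
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)  # wav2vec2's CTC loss uses the pad id as the blank token
    return vocab
```

The resulting dictionary can be saved with `json.dump(vocab, f, ensure_ascii=False)` and loaded with `Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")`.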

5.3. Text To Speech

We use ming024's PyTorch implementation of FastSpeech2.

The procedure is as follows:

Download the Arabic Speech Corpus and unzip it.

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip
unzip arabic-speech-corpus.zip

Create the data directories and copy the TextGrid alignment files into them.

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

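Prepare the metadata file, with one name|transcript line per .lab file.

import os

base_dir = '/content/arabic-speech-corpus'  # where the corpus was unzipped
lab_dir = os.path.join(base_dir, 'lab')
lines = []
for lab_file in sorted(os.listdir(lab_dir)):
    name = os.path.splitext(lab_file)[0]  # drop the .lab extension
    with open(os.path.join(lab_dir, lab_file), encoding='utf-8') as f:
        lines.append(f'{name}|{f.read().strip()}')  # one "name|transcript" line per file
with open(os.path.join(base_dir, 'metadata.csv'), 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines))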
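Before preprocessing, it can be worth checking that metadata.csv is well formed. A minimal sketch that parses the name|transcript lines and reports entries with an empty transcript or no matching audio file; it assumes the corpus keeps its audio in a `wav/` folder next to `lab/`, and `check_metadata` is a hypothetical helper.

```python
import os
from typing import List, Tuple


def check_metadata(base_dir: str) -> Tuple[int, List[str]]:
    """Return (number of entries, list of problems) for base_dir/metadata.csv."""
    problems = []
    with open(os.path.join(base_dir, "metadata.csv"), encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    for line in lines:
        name, sep, transcript = line.partition("|")
        if not sep or not transcript.strip():
            problems.append(f"{name}: missing transcript")
        elif not os.path.isfile(os.path.join(base_dir, "wav", name + ".wav")):  # assumed layout
            problems.append(f"{name}: no wav/{name}.wav")
    return len(lines), problems
```

An empty problem list means every transcript is non-empty and has a matching audio file.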

Clone my repository (FastSpeech2) and install the required dependencies.

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare the alignments and the preprocessed data.

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml
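Preprocessing writes train.txt and val.txt to the preprocessed_path set in preprocess.yaml. In upstream ming024 FastSpeech2 each line has the form basename|speaker|{phones}|raw_text. A minimal sketch that parses such a file, for example to count utterances or inspect phone sequences; it assumes that four-field format (verify it against the fork if lines do not split as expected), and `read_filelist` is a hypothetical helper.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    basename: str
    speaker: str
    phones: List[str]
    raw_text: str


def read_filelist(path: str) -> List[Utterance]:
    """Parse a train.txt/val.txt file in the assumed 4-field format.

    Raises ValueError if a line has fewer than four "|"-separated fields.
    """
    utterances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            basename, speaker, text, raw_text = line.split("|", 3)
            phones = text.strip("{}").split()  # "{a b c}" -> ["a", "b", "c"]
            utterances.append(Utterance(basename, speaker, phones, raw_text))
    return utterances
```

Splitting with a limit of 3 keeps any "|" inside the raw text as part of the last field.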

Unzip the HiFi-GAN vocoders.

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start the training.

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

This repository was created by the ARBML team. If you have any suggestions or contributions, feel free to open a pull request.
