k4black/fast-aug


fast-aug


fast-aug is a library for fast text augmentation, available for both Rust and Python under the name fast-aug.
It is designed with a focus on performance and real-time use (e.g. augmenting text on the fly during training), while providing a wide range of text augmentation methods.


Please refer to respective READMEs for details:

Features and TODO

Flow

  • ChanceAugmenter
  • SelectorAugmenter
  • SequentialAugmenter
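
To illustrate what the three flow augmenters are meant to do, here is a minimal pure-Python sketch of the idea. It is not the fast-aug API: the names `chance`, `selector`, `sequential` and the `Augmenter` alias are hypothetical and exist only for this example, which assumes an augmenter is simply a function from text to text.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical alias for this sketch: an augmenter maps text to text.
Augmenter = Callable[[str], str]


def chance(aug: Augmenter, probability: float, rng: random.Random) -> Augmenter:
    """Apply `aug` with the given probability, otherwise pass the text through."""
    def apply(text: str) -> str:
        return aug(text) if rng.random() < probability else text
    return apply


def selector(augs: Sequence[Augmenter], rng: random.Random,
             weights: Optional[Sequence[float]] = None) -> Augmenter:
    """Pick exactly one augmenter per call (optionally weighted) and apply it."""
    def apply(text: str) -> str:
        chosen = rng.choices(augs, weights=weights, k=1)[0]
        return chosen(text)
    return apply


def sequential(augs: Sequence[Augmenter]) -> Augmenter:
    """Apply every augmenter in order, feeding each output into the next."""
    def apply(text: str) -> str:
        for aug in augs:
            text = aug(text)
        return text
    return apply


# Usage: always upper-case, then apply one augmenter chosen from a list.
rng = random.Random(0)
pipeline = sequential([chance(str.upper, 1.0, rng), selector([str.strip], rng)])
result = pipeline("  hello ")
```

Because each combinator returns another augmenter, the three can be nested freely, which is the property that makes them useful for building pipelines.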

Text

  • RandomWordsAugmenter
    • Base - swaps/deletions
    • Insertions/Substitutions (from alphabet)
  • RandomCharsAugmenter
    • Base - swaps/deletions
    • Insertions/Substitutions (from provided list)
    • Insertions/Substitutions (from vocab by language tag)
  • RandomSpellingAugmenter
  • RandomKeyboardAugmenter
  • RandomEmbeddingsAugmenter
  • RandomTfIdfAugmenter
  • RandomPosAugmenter
  • EmojiNormalizer
  • Keep labels (e.g. POS tags) unchanged
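
As a rough illustration of the word-level "swaps/deletions" idea listed above, the following sketch uses only the Python standard library. The function names, the whitespace tokenization, and the "keep at least one word" fallback are assumptions made for this example; fast-aug's own behaviour and parameters may differ.

```python
import random


def random_words_delete(text: str, probability: float, rng: random.Random) -> str:
    """Drop each whitespace-separated word independently with `probability`."""
    words = text.split()
    kept = [w for w in words if rng.random() >= probability]
    # Assumption: never return an empty string for non-empty input.
    if words and not kept:
        kept = [rng.choice(words)]
    return " ".join(kept)


def random_words_swap(text: str, n_swaps: int, rng: random.Random) -> str:
    """Swap `n_swaps` randomly chosen pairs of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return " ".join(words)
    for _ in range(n_swaps):
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


# Usage with a seeded generator so runs are reproducible.
rng = random.Random(42)
swapped = random_words_swap("the quick brown fox jumps", 2, rng)
shortened = random_words_delete("the quick brown fox jumps", 0.3, rng)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.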

Models and utils

  • Models lazy loading
    • At creation time
    • At first use
    • Background after creation
  • candle support for DL models loading
    • HF loading
    • ONNX loading
    • Optimizations (fp16/int8/int4/layers/etc)
    • GPU support
  • TF-IDF model
    • json file loading
    • sklearn model loading
  • Alphabet model
  • Language Vocab model
  • Embeddings model
    • fasttext model loading
    • word2vec model loading
  • WordNet model
    • English
    • German
    • More?
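
The three lazy-loading modes listed above (at creation time, at first use, in the background after creation) can be sketched as follows. This is a hypothetical standard-library illustration, not code from fast-aug: `LazyModel`, its `mode` strings, and the `loader` callable are all names invented for the example.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Wrap an expensive loader and control when it runs (hypothetical helper)."""

    def __init__(self, loader: Callable[[], T], mode: str = "first_use") -> None:
        if mode not in ("creation", "first_use", "background"):
            raise ValueError(f"unknown mode: {mode!r}")
        self._loader = loader
        self._lock = threading.Lock()
        self._model: Optional[T] = None
        self._loaded = False
        if mode == "creation":
            # Load eagerly, blocking the constructor.
            self.get()
        elif mode == "background":
            # Start loading right away without blocking the caller.
            threading.Thread(target=self.get, daemon=True).start()
        # "first_use": do nothing until get() is called.

    def get(self) -> T:
        """Return the model, loading it exactly once even under concurrency."""
        with self._lock:
            if not self._loaded:
                self._model = self._loader()
                self._loaded = True
            return self._model  # type: ignore[return-value]


# Usage: the loader here is a stand-in for reading a real model file.
vocab = LazyModel(lambda: {"hello": 1.0, "world": 0.5}, mode="background")
weights = vocab.get()  # blocks only if the background load is still running
```

The lock means a foreground `get()` simply waits for an in-progress background load instead of loading the model a second time.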

Rust

  • Formatting
    • rustfmt
    • clippy
  • rust flamegraph profiling
  • Unit tests
  • Integration tests
  • CI build and tests
  • CI publish to crates.io

Python

  • Custom Python Augmenter class (user provided to use in pipelines)
  • Bindings with
    • Base pyo3 bindings
    • maturin auto build from pyproject.toml
    • Stubs (.pyi) files generation
    • Auto generate stubs on maturin build
    • Text
    • Flow
  • Auto generate return type in stubs, see pyo3 issue
  • flamegraph profiling
  • Optimizations - see this
  • Integration tests
  • CI build and tests
  • CI publish to pypi

Development

Prerequisites

Clone the repository:

git clone git@github.com:k4black/fast-aug.git
cd fast-aug

For rust library development:

For python bindings development:

  • All rust library prerequisites
  • cd bindings/python && python -m venv .venv
  • pip >= 23.1 to use --config-settings, see pip issue

Make

The Makefile contains all the commands needed for development.

make help
  • *-rust - all targets related to rust library (fast_aug/ folder)
  • *-python - all targets related to python bindings (bindings/python/ folder)

Benchmarks

All text benchmarks are run on the tweet_eval dataset - sentiment task, test set, 12k rows.

cat test_data/tweet_eval_sentiment_test_text.txt | wc
12284 182576 1156877
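
For a quick local measurement on the same file, a timing loop along these lines may be enough. It is only a sketch: `load_lines` and `measure_throughput` are hypothetical helpers, and `str.lower` stands in for a real augmenter, so it says nothing about how the project's own benchmarks are implemented.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def load_lines(path: str) -> List[str]:
    """Read a text file into a list of lines (one row per line)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def measure_throughput(augment: Callable[[str], str],
                       lines: Iterable[str]) -> Tuple[int, float]:
    """Run `augment` over every line; return (rows processed, seconds elapsed)."""
    rows = 0
    start = time.perf_counter()
    for line in lines:
        augment(line)
        rows += 1
    elapsed = time.perf_counter() - start
    return rows, elapsed


# Usage with in-memory sample rows; str.lower is a placeholder augmenter.
sample = ["Great game tonight!", "Not sure about this one...", "Worst service ever"]
rows, seconds = measure_throughput(str.lower, sample)
```

To time the dataset itself, pass `load_lines("test_data/tweet_eval_sentiment_test_text.txt")` as `lines`, so that file reading stays outside the timed loop.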

License

This project and respective libraries are licensed under the MIT License - see the LICENSE file for details.
