mdmmn378/texy

Texy: A conservative text processing library


A utility library for quickly cleaning texts

Installation

Python version in the dev environment: 3.11.5

pip install -U texy

Usage

Pipelines with parallelization in Rust:

>>> from texy.pipelines import extreme_clean, strict_clean, relaxed_clean
>>> data = ["hello ;/ from the other side 😊 \t "]
>>> print(extreme_clean(data))
['hello from the other side']
>>> print(strict_clean(data))
['hello ;/ from the other side']
>>> print(relaxed_clean(data))
['hello ;/ from the other side 😊']

Parallelize custom functions with Python Multiprocessing:

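from texy.pipelines import parallelize

def dummy(x):
    # Return the first character of every string in the batch.
    return [i[0] for i in x]

if __name__ == "__main__":  # guard needed where worker processes are spawned
    data = ["a ", "b ", "c ", "d ", "e ", "f ", "g ", "h ?."] * 100
    print(parallelize(dummy, data, 2))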
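
For readers curious how a helper of this kind can work, here is a minimal sketch of chunked parallel application built on the standard library's `multiprocessing.Pool`. It is not texy's implementation: `split_chunks`, `parallel_apply`, and `first_chars` are hypothetical names, and treating the third argument as a worker count is an assumption made for illustration.

```python
from multiprocessing import Pool


def split_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_apply(func, items, n_workers):
    """Apply a list -> list function to chunks of items in worker processes.

    Hypothetical stand-in; the worker-count meaning of n_workers is assumed.
    """
    chunks = split_chunks(items, n_workers)
    with Pool(processes=n_workers) as pool:
        results = pool.map(func, chunks)  # pool.map keeps chunk order
    # Flatten the per-chunk results back into a single list.
    return [value for chunk_result in results for value in chunk_result]


def first_chars(batch):
    # Must live at module level so worker processes can import it.
    return [text[0] for text in batch]


if __name__ == "__main__":
    data = ["a ", "b ", "c ", "d "] * 3
    print(parallel_apply(first_chars, data, 2))
```

The function passed in receives a whole batch rather than a single string, which matches the shape of `dummy` above and keeps inter-process overhead low when each item is cheap to process.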

Actions

| Pipeline | Actions |
| --- | --- |
| relaxed_clean | remove_newlines, remove_html, remove_xml, merge_spaces |
| strict_clean | remove_newlines, remove_urls, remove_emails, remove_html, remove_xml, remove_emoticons, remove_emojis, remove_infrequent_punctuations, merge_spaces |
| extreme_clean | remove_newlines, remove_urls, remove_emails, remove_html, remove_xml, remove_emoticons, remove_emojis, remove_all_punctuations, merge_spaces |
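
Each pipeline above is an ordered list of actions applied to every string. The sketch below shows that composition idea in pure Python. The three functions are simplified regex stand-ins that only borrow the action names; they are not texy's Rust implementations, and `make_pipeline` and `relaxed_like` are hypothetical names.

```python
import re


def remove_newlines(text):
    # Stand-in: replace runs of line breaks with a single space.
    return re.sub(r"[\r\n]+", " ", text)


def remove_html(text):
    # Stand-in: naive tag stripping, not a real HTML parser.
    return re.sub(r"<[^>]+>", " ", text)


def merge_spaces(text):
    # Stand-in: collapse whitespace runs and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def make_pipeline(*actions):
    """Build a function that applies the actions, in order, to each string."""
    def run(texts):
        cleaned = []
        for text in texts:
            for action in actions:
                text = action(text)
            cleaned.append(text)
        return cleaned
    return run


# Mirrors the order of the relaxed_clean row, minus remove_xml.
relaxed_like = make_pipeline(remove_newlines, remove_html, merge_spaces)
print(relaxed_like(["<p>hello\nworld</p>   again"]))  # ['hello world again']
```

Order matters here: running `merge_spaces` last cleans up the extra spaces that the earlier removals leave behind, which is consistent with its position at the end of every pipeline listed above.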
