PyYoshi/cChardet


cChardet


cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.

Supported Languages/Encodings

International (Unicode)
- UTF-8
- UTF-16BE / UTF-16LE
- UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
- ISO-8859-6
- WINDOWS-1256
Bulgarian
- ISO-8859-5
- WINDOWS-1251
Chinese
- ISO-2022-CN
- BIG5
- EUC-TW
- GB18030
- HZ-GB-2312
Croatian
- ISO-8859-2
- ISO-8859-13
- ISO-8859-16
- Windows-1250
- IBM852
- MAC-CENTRALEUROPE
Czech
- Windows-1250
- ISO-8859-2
- IBM852
- MAC-CENTRALEUROPE
Danish
- ISO-8859-1
- ISO-8859-15
- WINDOWS-1252
English
- ASCII
Esperanto
- ISO-8859-3
Estonian
- ISO-8859-4
- ISO-8859-13
- Windows-1252
- Windows-1257
Finnish
- ISO-8859-1
- ISO-8859-4
- ISO-8859-9
- ISO-8859-13
- ISO-8859-15
- WINDOWS-1252
French
- ISO-8859-1
- ISO-8859-15
- WINDOWS-1252
German
- ISO-8859-1
- WINDOWS-1252
Greek
- ISO-8859-7
- WINDOWS-1253
Hebrew
- ISO-8859-8
- WINDOWS-1255
Hungarian
- ISO-8859-2
- WINDOWS-1250
Irish Gaelic
- ISO-8859-1
- ISO-8859-9
- ISO-8859-15
- WINDOWS-1252
Italian
- ISO-8859-1
- ISO-8859-3
- ISO-8859-9
- ISO-8859-15
- WINDOWS-1252
Japanese
- ISO-2022-JP
- SHIFT_JIS
- EUC-JP
Korean
- ISO-2022-KR
- EUC-KR / UHC
Lithuanian
- ISO-8859-4
- ISO-8859-10
- ISO-8859-13
Latvian
- ISO-8859-4
- ISO-8859-10
- ISO-8859-13
Maltese
- ISO-8859-3
Polish
- ISO-8859-2
- ISO-8859-13
- ISO-8859-16
- Windows-1250
- IBM852
- MAC-CENTRALEUROPE
Portuguese
- ISO-8859-1
- ISO-8859-9
- ISO-8859-15
- WINDOWS-1252
Romanian
- ISO-8859-2
- ISO-8859-16
- Windows-1250
- IBM852
Russian
- ISO-8859-5
- KOI8-R
- WINDOWS-1251
- MAC-CYRILLIC
- IBM866
- IBM855
Slovak
- Windows-1250
- ISO-8859-2
- IBM852
- MAC-CENTRALEUROPE
Slovene
- ISO-8859-2
- ISO-8859-16
- Windows-1250
- IBM852
- M
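The Unicode encodings at the top of this list can often be recognized from a byte order mark (BOM) alone, before any statistical analysis is needed. The sketch below illustrates that check using only Python's standard `codecs` module. It is not cChardet's or uchardet's internal code: `sniff_bom` is a hypothetical name, and it handles only BOM-marked input (text without a BOM still needs a statistical detector such as cChardet).

```python
import codecs
from typing import Optional

# Hypothetical helper, not part of cChardet.
# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE
# BOM (FF FE), so the longer UTF-32 marks are checked first.
_BOMS = (
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

One caveat of this design: UTF-16LE text whose first character is U+0000 is indistinguishable from a UTF-32LE BOM and will be reported as UTF-32LE.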

Example

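import cchardet as chardet

# Run from the repository root; detection works on raw bytes, so open in binary mode.
with open(r"tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# result is a dict holding the detected encoding and a confidence score.
result = chardet.detect(msg)
print(result)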
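A common next step is to decode the bytes with the detected encoding. The sketch below assumes the detector returns a mapping with `"encoding"` and `"confidence"` keys, as `cchardet.detect` does in the example above. `decode_with_detection`, the `utf-8` fallback and the 0.5 confidence threshold are hypothetical, illustrative choices rather than anything defined by cChardet. The detector is passed in as a parameter so the helper can be used with `cchardet.detect` or tested with a stub.

```python
from typing import Any, Callable, Mapping, Tuple

# Assumed detector shape: bytes in, mapping with "encoding" and "confidence" out.
Detector = Callable[[bytes], Mapping[str, Any]]


def decode_with_detection(
    data: bytes,
    detect: Detector,
    fallback: str = "utf-8",
    min_confidence: float = 0.5,
) -> Tuple[str, str]:
    """Decode bytes using a detected encoding; return (text, encoding_used).

    Hypothetical helper: `fallback` and `min_confidence` are illustrative
    defaults, not values taken from cChardet.
    """
    result = detect(data)
    encoding = result.get("encoding")
    confidence = float(result.get("confidence") or 0.0)
    # No guess, or a weak one: use the fallback encoding instead.
    if not encoding or confidence < min_confidence:
        encoding = fallback
    try:
        return data.decode(encoding), encoding
    except (LookupError, UnicodeDecodeError):
        # The detected name may be unknown to Python's codec registry, or the
        # bytes may not fit it; decode leniently with the fallback instead.
        return data.decode(fallback, errors="replace"), fallback
```

With cChardet installed, usage would look like `text, used = decode_with_detection(msg, chardet.detect)`.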

Benchmark

$ python setup.py build_ext -i -f
$ python tests/bench.py

Results

CPU: AMD Ryzen 9 7950X3D

RAM: DDR5-5600MT/s 96GB

Platform: Ubuntu 24.04 amd64

Python 3.12.3

| Library           | Request (calls/s) |
| ----------------- | ----------------- |
| chardet v5.2.0    | 1.1               |
| cchardet v2.2.0a1 | 2263.6            |
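On these figures cchardet handled about 2,058 times as many calls per second as chardet (2263.6 / 1.1). The contents of `tests/bench.py` are not reproduced here; as a rough illustration of what a calls-per-second number measures, the hypothetical helper below calls any detection function repeatedly on one payload for a fixed wall-clock duration using `time.perf_counter`. It is a sketch, not the project's benchmark, and it includes the clock-reading overhead in every iteration.

```python
import time
from typing import Callable


def calls_per_second(
    detect: Callable[[bytes], object], payload: bytes, duration: float = 1.0
) -> float:
    """Call detect(payload) repeatedly for at least `duration` seconds.

    Hypothetical helper. Returns the observed rate in calls per second; the
    clock is read after every call, so its overhead is included in the rate.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    calls = 0
    start = time.perf_counter()
    deadline = start + duration
    now = start
    # Always runs at least once because duration > 0.
    while now < deadline:
        detect(payload)
        calls += 1
        now = time.perf_counter()
    return calls / (now - start)
```

With both libraries installed, `calls_per_second(cchardet.detect, msg)` and `calls_per_second(chardet.detect, msg)` give comparable rates for the same input.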

LICENSE

See the COPYING file.

Contact

Please use the repository's GitHub Issues.

Supported Platforms

- Windows i686, x86_64
- Linux i686, x86_64
- macOS x86_64

About

Universal character encoding detector.

Releases: 5 (latest: 2.1.7, released Oct 27, 2020)
