guess_bytes.
Updated the heuristic to fix the letter ß in UTF-8/MacRoman mojibake, which had regressed since version 5.6.
Packaging fixes to pyproject.toml.
Updated the heuristic to fix the letter Ñ with more confidence.
Fixed type annotations and added py.typed.
ftfy is packaged using Poetry now, and wheels are created and uploaded to PyPI.
Allow the keyword argument fix_entities as a deprecated alias for
unescape_html, raising a warning.
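The deprecated-alias pattern described in this entry can be sketched as follows. This is a minimal illustration using only the standard library; fix_text_demo is a hypothetical stand-in, not ftfy's actual code, and the warning text is an assumption.

```python
import html
import warnings


def fix_text_demo(text, unescape_html=True, **kwargs):
    """Hypothetical fixer that accepts an old keyword name as an alias."""
    if "fix_entities" in kwargs:
        # The old name still works, but callers are told to migrate.
        warnings.warn(
            "fix_entities has been renamed to unescape_html",
            DeprecationWarning,
            stacklevel=2,
        )
        unescape_html = kwargs.pop("fix_entities")
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    if unescape_html:
        text = html.unescape(text)
    return text
```

Calling fix_text_demo("&lt;b&gt;", fix_entities=False) leaves the entities alone and emits a DeprecationWarning, while the new spelling unescape_html=False does the same silently.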
ftfy.formatting functions now disregard ANSI terminal escapes when
calculating text width.
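To illustrate why escapes matter for width, here is a rough sketch that strips ANSI CSI sequences before counting characters. The regular expression and the visible_length helper are assumptions made for this example; ftfy's own pattern and width logic may differ, and this sketch ignores wide characters.

```python
import re

# Matches CSI sequences such as "\x1b[31m" (set colour) or "\x1b[0m" (reset).
# The pattern is an assumption for this sketch, not ftfy's own.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def visible_length(text):
    """Count characters after dropping ANSI CSI escapes.

    Every remaining character is counted as one column, so wide
    (for example CJK) characters are not handled here.
    """
    return len(ANSI_CSI.sub("", text))


colored = "\x1b[31mred\x1b[0m"
print(len(colored), visible_length(colored))
```

The raw string is 12 characters long, but only the 3 characters of "red" take up space on the terminal.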
This version is purely a cosmetic change, updating the maintainer's e-mail address and the project's canonical location on GitHub.
The remove_terminal_escapes step was accidentally not being used. This
version restores it.
Specified in setup.py that ftfy 6 requires Python 3.6 or later.
Use a lighter link color when the docs are viewed in dark mode.
New function: ftfy.fix_and_explain() can describe all the transformations
that happen when fixing a string. This is similar to what
ftfy.fixes.fix_encoding_and_explain() did in previous versions, but it
can fix more than the encoding.
fix_and_explain() and fix_encoding_and_explain() are now in the top-level
ftfy module.
Changed the heuristic entirely. ftfy no longer needs to categorize every Unicode character, but only characters that are expected to appear in mojibake.
Because of the new heuristic, ftfy will no longer have to release a new version for every new version of Unicode. It should also run faster and use less RAM when imported.
The heuristic ftfy.badness.is_bad(text) can be used to determine whether
there appears to be mojibake in a string. Some users were already using
the old function sequence_weirdness() for that, but this one is actually
designed for that purpose.
Instead of a pile of named keyword arguments, ftfy functions now take in a TextFixerConfig object. The keyword arguments still work, and become settings that override the defaults in TextFixerConfig.
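The "config object plus keyword overrides" pattern can be sketched like this. DemoConfig and resolve_config are hypothetical names with illustrative fields and defaults; they show the general idea only and are not ftfy's TextFixerConfig.

```python
from typing import NamedTuple


class DemoConfig(NamedTuple):
    """Hypothetical settings object; fields and defaults are illustrative."""
    unescape_html: bool = True
    uncurl_quotes: bool = True


def resolve_config(config=None, **kwargs):
    """Start from the given config (or the defaults) and apply overrides."""
    config = config or DemoConfig()
    # _replace returns a new tuple and rejects unknown field names.
    return config._replace(**kwargs)


print(resolve_config(uncurl_quotes=False))
print(resolve_config(DemoConfig(unescape_html=False), uncurl_quotes=False))
```

Keeping the settings in one immutable object means they can be built once and passed around, while one-off keyword arguments remain convenient for quick calls.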
Added support for UTF-8 mixups with Windows-1253 and Windows-1254.
Overhauled the documentation: https://ftfy.readthedocs.org
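As an illustration of what such a mixup looks like, the sketch below uses only Python's built-in codecs to produce and undo the mojibake by hand. The undo_mixup helper is a name invented for this example; it shows the underlying byte round trip, not how ftfy detects the problem.

```python
original = "é"

# UTF-8 bytes misread as Windows-1253 (Greek) or Windows-1254 (Turkish).
as_1253 = original.encode("utf-8").decode("cp1253")
as_1254 = original.encode("utf-8").decode("cp1254")


def undo_mixup(text, wrong_encoding):
    """Reverse a known mixup by re-encoding and decoding as UTF-8."""
    return text.encode(wrong_encoding).decode("utf-8")


print(as_1253, as_1254)
print(undo_mixup(as_1253, "cp1253"), undo_mixup(as_1254, "cp1254"))
```

The same two UTF-8 bytes show up as "Γ©" under Windows-1253 and "Ã©" under Windows-1254, which is why each code page has to be recognized separately.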
This version is brought to you by the letter à and the number 0xC3.
Tweaked the heuristic to decode, for example, "Ã" followed by a non-breaking space as the letter "à" more often.
This combines with the non-breaking-space fixer to decode "Ã " (with an ordinary space) as "à" as well. However, in many cases, the text " Ã " was intended to be " à ", preserving the space -- the underlying mojibake had two spaces after it, but the Web coalesced them into one. We detect this case based on common French and Portuguese words, and preserve the space when it appears intended.
Thanks to @zehavoc for bringing to my attention how common this case is.
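The space-coalescing problem described above can be reproduced with the standard library alone. This sketch only demonstrates the failure mode; the sample sentence is made up, and the naive replacement shown is deliberately not what ftfy does.

```python
import re

text = "voilà le texte"

# "à" is the bytes C3 A0 in UTF-8; read as Latin-1 they become
# "Ã" followed by a non-breaking space (U+00A0).
assert "à".encode("utf-8") == b"\xc3\xa0"
mojibake = text.encode("utf-8").decode("latin-1")

# A web page that normalizes whitespace merges the non-breaking space
# and the real space into a single ordinary space.
flattened = re.sub(r"\s+", " ", mojibake)

# Naively treating "Ã " as "à" then swallows the space between the words.
naive = flattened.replace("Ã ", "à")

print(repr(mojibake))
print(repr(flattened))
print(repr(naive))
```

The naive result runs the two words together, which is why the intended space has to be restored from context.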
Improved detection of UTF-8 mojibake of Greek, Cyrillic, Hebrew, and Arabic scripts.
Fixed the undeclared dependency on setuptools by removing the use of
pkg_resources.
Updated the data file of Unicode character categories to Unicode 12.1, as used in Python 3.8. (No matter what version of Python you're on, ftfy uses the same data.)
Corrected an omission where short sequences involving the ACUTE ACCENT character were not being fixed.