
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Python does not support the GEORGIAN-PS charset
Type: crash
Stage: resolved
Components: Unicode
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Caolán.McNamara, ezio.melotti, jwilk, lemburg, loewis, serhiy.storchaka, taleinat, vstinner
Priority: normal
Keywords:

Created on 2013-10-31 10:52 by Caolán.McNamara, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description
georgian_ps.py vstinner, 2013-10-31 11:24
Messages (7)
msg201800 - Author: Caolán McNamara (Caolán.McNamara) Date: 2013-10-31 10:52
LANG=ka_GE.georgianps /usr/bin/python3
Fatal Python error: Py_Initialize: Unable to get the locale encoding
LookupError: unknown encoding: GEORGIAN-PS
Aborted (core dumped)
With python-2.7.5 there is no crash:
LANG=ka_GE.georgianps /usr/bin/python2
Python 2.7.5 (default, Oct 8 2013, 12:19:40) 
[GCC 4.8.1 20130603 (Red Hat 4.8.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
(Fedora 19)
msg201801 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-31 10:56
This bug was initially reported in LibreOffice:
https://bugs.freedesktop.org/show_bug.cgi?id=68850 
msg201802 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-31 11:24
I found three Georgian encodings:
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/charmaps/GEORGIAN-PS;h=64615ff4344d74ea0c70cfd7a6c6c8019afb884e;hb=HEAD
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/charmaps/GEORGIAN-ACADEMY;h=9dc1bc9e782e9fe6092a00daf1a75274fd6dd738;hb=HEAD
http://tools.ietf.org/html/draft-giasher-geostd8-00
The first one ("GEORGIAN-PS") is probably the most accurate because it is the one included in the GNU libc.
Could you please try copying the attached georgian_ps.py file into /usr/lib64/python3.3/encodings/ (or /usr/lib/python3.3/encodings/ on 32-bit Linux)?
Then try to print Georgian letters using:
 print(bytes(range(0xc0, 0xe6)).decode("GEORGIAN-PS"))
Please also tell me your locale encoding:
 import locale; print(locale.getpreferredencoding())
@Caolán: Do you know the GEORGIAN-ACADEMY encoding? It does not appear to be used by any glibc locale.
On my Fedora 18, I have three Georgian locales:
* ka_GE.georgianps: locale encoding GEORGIAN-PS
* ka_GE: locale encoding GEORGIAN-PS
* ka_GE.utf8: locale encoding UTF-8
You can work around this issue by switching your locale from ka_GE.georgianps to ka_GE.utf8.
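The check requested above can be sketched as a small helper that asks whether Python has a codec for a given encoding name. This is only an illustration: the function name codec_available is hypothetical, and under an affected locale the interpreter may fail before any user code runs, so the helper is meant to be run from a working locale such as ka_GE.utf8.

```python
import codecs
import locale


def codec_available(name):
    """Return True if Python can find a codec for the encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Report the encoding of the current locale and whether Python supports it.
current = locale.getpreferredencoding(False)
print(f"locale encoding: {current!r}, codec available: {codec_available(current)}")
```

Calling codec_available("GEORGIAN-PS") from a working interpreter shows whether the codec file copied into the encodings directory was picked up.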
msg404214 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-10-18 19:46
With recent versions of Python (e.g. 3.9) this no longer causes a crash. Python apparently falls back to UTF-8, at least on my system:
$ LANG=ka_GE.georgianps python3.9
Python 3.9.7 (default, Sep 9 2021, 23:20:13) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; print(locale.getpreferredencoding())
UTF-8
I'm marking this as fixed. If someone still has issues with this encoding, please open a new issue with up-to-date information.
msg404250 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-18 23:46
Python uses UTF-8 if the locale is not supported:
$ LANG=xxx python3.9 -c "import sys; print(sys.flags.utf8_mode)"
1
On Fedora 34, the locale is still supported, and Python 3.11 still fails:
vstinner@apu$ LANG=ka_GE.georgianps locale
LANG=ka_GE.georgianps
LC_CTYPE="ka_GE.georgianps"
LC_NUMERIC="ka_GE.georgianps"
LC_TIME="ka_GE.georgianps"
LC_COLLATE="ka_GE.georgianps"
LC_MONETARY="ka_GE.georgianps"
LC_MESSAGES="ka_GE.georgianps"
LC_PAPER="ka_GE.georgianps"
LC_NAME="ka_GE.georgianps"
LC_ADDRESS="ka_GE.georgianps"
LC_TELEPHONE="ka_GE.georgianps"
LC_MEASUREMENT="ka_GE.georgianps"
LC_IDENTIFICATION="ka_GE.georgianps"
LC_ALL=
vstinner@apu$ LANG=ka_GE.georgianps python3.11 -c "import sys; print(sys.flags.utf8_mode)"
Python path configuration:
 PYTHONHOME = (not set)
 PYTHONPATH = (not set)
 program name = './python'
 isolated = 0
 environment = 1
 user site = 1
 import site = 1
 stdlib dir = '/home/vstinner/python/main/Lib'
 sys._base_executable = '/home/vstinner/python/main/python'
 sys.base_prefix = '/usr/local'
 sys.base_exec_prefix = '/usr/local'
 sys.platlibdir = 'lib'
 sys.executable = '/home/vstinner/python/main/python'
 sys.prefix = '/usr/local'
 sys.exec_prefix = '/usr/local'
 sys.path = [
 '/usr/local/lib/python311.zip',
 '/home/vstinner/python/main/Lib',
 '/home/vstinner/python/main/build/lib.linux-x86_64-3.11-pydebug',
 ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
LookupError: unknown encoding: GEORGIAN-PS
Current thread 0x00007ff89b81d2c0 (most recent call first):
 <no Python frame>
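To compare how an interpreter that does start has configured its encodings, a short report along the following lines may help. It is a sketch only: the helper name encoding_report is hypothetical, and sys.flags.utf8_mode requires Python 3.7 or later. UTF-8 Mode can also be requested explicitly with the -X utf8 option or the PYTHONUTF8=1 environment variable; whether that avoids the failure shown above was not tested in this thread.

```python
import sys


def encoding_report():
    """Collect the encoding settings chosen at interpreter startup."""
    return {
        "utf8_mode": sys.flags.utf8_mode,
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```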
msg404275 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-19 08:44
Possible solutions (they can be combined):
1. Add support for the GEORGIAN-PS charset and all other encodings used in libc (issue22679). The problem is that it is difficult to get the official information about these encodings.
2. Fall back to utf-8 or ascii+surrogateescape in case of an unsupported locale encoding. But typos can then slip by unnoticed.
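For option 1, a single-byte charset is typically implemented as a charmap codec. The sketch below shows the general shape using a toy table registered under the hypothetical name demo_georgian. The table is an assumption made for illustration (Latin-1 with bytes 0xC0-0xE5 mapped to U+10D0-U+10F5); it is not the verified GEORGIAN-PS mapping and not the attached georgian_ps.py.

```python
import codecs

# Hypothetical codec name, chosen so it cannot clash with a real codec.
CODEC_NAME = "demo_georgian"

# Toy decoding table: Latin-1 identity, except bytes 0xC0-0xE5, which are
# mapped to the Georgian letters U+10D0-U+10F5. This is an assumption for
# illustration only, not the verified GEORGIAN-PS mapping.
_table = [chr(i) for i in range(256)]
for offset, byte in enumerate(range(0xC0, 0xE6)):
    _table[byte] = chr(0x10D0 + offset)
DECODING_TABLE = "".join(_table)
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # Depending on the Python version, hyphens in the requested name may or
    # may not already be converted to underscores, so normalize them here.
    if name.replace("-", "_") == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

print(bytes(range(0xC0, 0xE6)).decode("demo-georgian"))
```

A codec registered this way only exists after the interpreter has started, so by itself it would not address the startup failure reported here.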
msg404290 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-10-19 11:20
On 19.10.2021 10:44, Serhiy Storchaka wrote:
> 
> Possible solutions (they can be combined):
> 
> 1. Add support for the GEORGIAN-PS charset and all other encodings used in libc (issue22679). The problem is that it is difficult to get the official information about these encodings.
As with all encodings we add: there has to be a real need to support
them natively in Python (as opposed to installing codecs via PyPI)
and we need a definite source for the encoding, e.g. a standards
document from an official body.
IMO, we should not really add more encodings to the stdlib, but instead
point people to e.g. the iconv package:
https://pypi.org/project/python-iconv/
Perhaps we ought to make it easier for such packages to provide
additional codecs even during the startup phase, e.g. via a special
env var which points Python to a list of codec packages to load
prior to initializing the I/O encoding... not sure whether this is
possible, though.
> 2. Fall back to utf-8 or ascii+surrogateescape in case of an unsupported locale encoding. But typos can then slip by unnoticed.
I think this would be a more general solution to such cases, provided
the startup logic issues a visible warning about the fallback.
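The fallback with a visible warning can be sketched in pure Python as follows. This only illustrates the idea: the real startup logic lives in the interpreter's C initialization code, and the function name pick_encoding and the choice of RuntimeWarning are assumptions.

```python
import codecs
import warnings


def pick_encoding(locale_encoding, fallback="utf-8"):
    """Return locale_encoding if Python has a codec for it, else fallback."""
    try:
        codecs.lookup(locale_encoding)
    except LookupError:
        # Make the fallback visible so that a mistyped locale is noticed.
        warnings.warn(
            f"unsupported locale encoding {locale_encoding!r}; "
            f"falling back to {fallback!r}",
            RuntimeWarning,
            stacklevel=2,
        )
        return fallback
    return locale_encoding
```

With this shape, a supported name such as "latin-1" is returned unchanged, while an unknown name yields the fallback plus a warning.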
History
Date User Action Args
2022-04-11 14:57:52 admin set github: 63658
2021-12-11 19:13:45 iritkatriel set versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.3, Python 3.4
2021-10-19 11:20:36 lemburg set messages: + msg404290
2021-10-19 08:44:49 serhiy.storchaka set messages: + msg404275
2021-10-18 23:46:36 vstinner set status: closed -> open
    resolution: fixed ->
    messages: + msg404250
2021-10-18 19:46:45 taleinat set status: open -> closed
    nosy: + taleinat
    messages: + msg404214
    resolution: fixed
    stage: resolved
2014-10-28 14:29:49 jwilk set nosy: + jwilk
2014-10-20 16:50:51 serhiy.storchaka link issue22679 dependencies
2013-10-31 11:37:25 serhiy.storchaka set nosy: + lemburg, loewis, serhiy.storchaka
2013-10-31 11:25:06 vstinner set title: Fatal Python error: Py_Initialize: Unable to get the locale encoding: GEORGIAN-PS -> Python does not support the GEORGIAN-PS charset
    versions: + Python 3.4
2013-10-31 11:24:45 vstinner set files: + georgian_ps.py
    messages: + msg201802
2013-10-31 10:56:24 vstinner set nosy: + vstinner
    messages: + msg201801
2013-10-31 10:52:59 Caolán.McNamara create
