This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author eryksun
Recipients eryksun, vstinner
Date 2022-02-07.17:53:46
Message-id <1644256426.75.0.396130146485.issue46668@roundup.psfhosted.org>
In-reply-to
Content
> I don't think that this fallback is needed anymore. Which Windows
> code page can be used as ANSI code page which is not already 
> implemented as a Python codec?
Python has full coverage of the ANSI and OEM code pages in the standard Windows locales, but I don't have any experience with custom (i.e. supplemental or replacement) locales.
https://docs.microsoft.com/en-us/windows/win32/intl/custom-locales 
Here's a simple script to check the standard locales.
    import codecs
    import ctypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    LOCALE_ALL = 0
    LOCALE_WINDOWS = 1
    LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    LOCALE_IDEFAULTCODEPAGE = 0x000B # OEM

    EnumSystemLocalesEx = kernel32.EnumSystemLocalesEx
    GetLocaleInfoEx = kernel32.GetLocaleInfoEx
    GetCPInfoExW = kernel32.GetCPInfoExW

    EnumLocalesProcEx = ctypes.WINFUNCTYPE(ctypes.c_int,
        ctypes.c_wchar_p, ctypes.c_ulong, ctypes.c_void_p)

    class CPINFOEXW(ctypes.Structure):
        _fields_ = (('MaxCharSize', ctypes.c_uint),
                    ('DefaultChar', ctypes.c_ubyte * 2),
                    ('LeadByte', ctypes.c_ubyte * 12),
                    ('UnicodeDefaultChar', ctypes.c_wchar),
                    ('CodePage', ctypes.c_uint),
                    ('CodePageName', ctypes.c_wchar * 260))

    def get_all_locale_code_pages():
        result = []
        seen = set()
        info = (ctypes.c_wchar * 100)()

        @EnumLocalesProcEx
        def callback(locale, flags, param):
            for lctype in (LOCALE_IDEFAULTANSICODEPAGE, LOCALE_IDEFAULTCODEPAGE):
                # Skip '0' and '1': Unicode-only locales, or locales
                # that have no OEM code page.
                if (GetLocaleInfoEx(locale, lctype, info, len(info)) and
                        info.value not in ('0', '1')):
                    cp = int(info.value)
                    if cp in seen:
                        continue
                    seen.add(cp)
                    cp_info = CPINFOEXW()
                    # If the code page info is unavailable, fall back to
                    # the bare number as the name.
                    if not GetCPInfoExW(cp, 0, ctypes.byref(cp_info)):
                        cp_info.CodePage = cp
                        cp_info.CodePageName = str(cp)
                    result.append(cp_info)
            return True # continue the enumeration

        if not EnumSystemLocalesEx(callback, LOCALE_WINDOWS, None, None):
            raise ctypes.WinError(ctypes.get_last_error())

        result.sort(key=lambda x: x.CodePage)
        return result

    supported = []
    unsupported = []
    for cp_info in get_all_locale_code_pages():
        cp = cp_info.CodePage
        try:
            codecs.lookup(f'cp{cp}')
        except LookupError:
            unsupported.append(cp_info)
        else:
            supported.append(cp_info)

    if unsupported:
        print('Unsupported:\n')
        for cp_info in unsupported:
            print(cp_info.CodePageName)
        print('\nSupported:\n')
    else:
        print('All Supported:\n')
    for cp_info in supported:
        print(cp_info.CodePageName)
Output:
    All Supported:

    437 (OEM - United States)
    720 (Arabic - Transparent ASMO)
    737 (OEM - Greek 437G)
    775 (OEM - Baltic)
    850 (OEM - Multilingual Latin I)
    852 (OEM - Latin II)
    855 (OEM - Cyrillic)
    857 (OEM - Turkish)
    862 (OEM - Hebrew)
    866 (OEM - Russian)
    874 (ANSI/OEM - Thai)
    932 (ANSI/OEM - Japanese Shift-JIS)
    936 (ANSI/OEM - Simplified Chinese GBK)
    949 (ANSI/OEM - Korean)
    950 (ANSI/OEM - Traditional Chinese Big5)
    1250 (ANSI - Central Europe)
    1251 (ANSI - Cyrillic)
    1252 (ANSI - Latin I)
    1253 (ANSI - Greek)
    1254 (ANSI - Turkish)
    1255 (ANSI - Hebrew)
    1256 (ANSI - Arabic)
    1257 (ANSI - Baltic)
    1258 (ANSI/OEM - Viet Nam)
Some locales are Unicode only (e.g. Hindi-India) or have no OEM code page, which the above code skips by checking for "0" or "1" as the code page value. Windows 10+ allows setting the system locale to a Unicode-only locale, for which it uses UTF-8 (65001) for ANSI and OEM.
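To make the mapping from a code page number to a Python codec concrete, here is a minimal sketch. The helper name codec_for_code_page is hypothetical and is not part of the script above. It treats 65001 as UTF-8 explicitly and otherwise looks up "cpN", as the script does. The GetACP() and GetOEMCP() calls only run on Windows; on other platforms the block only defines the helper.

```python
import codecs
import sys

CP_UTF8 = 65001

def codec_for_code_page(cp):
    """Return the normalized Python codec name for a Windows code page
    number, or None if Python has no codec for it.

    Hypothetical helper for illustration only.
    """
    # Map 65001 to UTF-8 explicitly instead of relying on a 'cp65001' alias.
    name = 'utf-8' if cp == CP_UTF8 else f'cp{cp}'
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

if sys.platform == 'win32':
    import ctypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    # Report the system ANSI and OEM code pages and their codecs.
    for label, cp in (('ANSI', kernel32.GetACP()),
                      ('OEM', kernel32.GetOEMCP())):
        print(label, cp, codec_for_code_page(cp))
```

If the system locale is set to a Unicode-only locale, both lines should report 65001 with the utf-8 codec, per the behavior described above.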
The OEM code page matters because the console input and output code pages default to OEM, e.g. for os.device_encoding(). The console's I/O code pages are used in Python by low-level os.read() and os.write(). Note that the console doesn't properly implement using UTF-8 (65001) as the input code page. In this case, input read from the console via ReadFile() or ReadConsoleA() has a null byte in place of each non-ASCII character.
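As a quick way to see which encodings Python associates with the console, the following sketch queries os.device_encoding() for the standard input and output file descriptors. The function name console_encodings is hypothetical. os.device_encoding() returns None when the descriptor is not connected to a terminal, so the result depends on how the script is run.

```python
import os

def console_encodings():
    """Return the device encodings for fd 0 (stdin) and fd 1 (stdout).

    Hypothetical helper for illustration. A value is None when the file
    descriptor is not attached to a terminal/console (e.g. redirected).
    """
    return {
        'stdin': os.device_encoding(0),
        'stdout': os.device_encoding(1),
    }

if __name__ == '__main__':
    for name, encoding in console_encodings().items():
        print(f'{name}: {encoding}')
```

On Windows, running this in a console window without redirection should reflect the console's current input and output code pages, which default to OEM as noted above.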
History
Date                 User     Action  Args
2022-02-07 17:53:46  eryksun  set     recipients: + eryksun, vstinner
2022-02-07 17:53:46  eryksun  set     messageid: <1644256426.75.0.396130146485.issue46668@roundup.psfhosted.org>
2022-02-07 17:53:46  eryksun  link    issue46668 messages
2022-02-07 17:53:46  eryksun  create
