When I try to redirect UTF-8 stdout in PowerShell by running python3 .\test.py > test.txt, an error occurs:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
The code in test.py is as follows; the file itself is encoded in UTF-8.
print("\u2714")
The code is uploaded here. https://github.com/lingsongfeng/pytest
Could you provide a reproducible example code? – Rivers, Feb 27, 2021 at 15:49
@Rivers OK, I uploaded it on GitHub. – flsflsfls, Feb 27, 2021 at 16:00
github.com/lingsongfeng/pytest – flsflsfls, Feb 27, 2021 at 16:02
Please post it here (edit your answer). Could you take some time to read this: stackoverflow.com/help/how-to-ask – Rivers, Feb 27, 2021 at 16:05
@Rivers Thank you for mentioning that. I have already edited my question. – flsflsfls, Feb 27, 2021 at 16:16
2 Answers
When stdout is redirected, Python on Windows encodes output with the system's default ANSI code page (cp1252 on US Windows) instead of UTF-8. You can override this with an environment variable:
C:\> set PYTHONIOENCODING=utf8
C:\> test.py > out.txt
In Python 3.7 and later, set PYTHONUTF8=1 also makes Python default to UTF-8 for files and I/O redirection.
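To check what either variable does without touching the shell, here is a minimal sketch that starts a child Python with its stdout piped and looks at the raw bytes it wrote. The helper name capture_child_stdout is hypothetical, and it assumes Python 3.7 or later (for capture_output and PYTHONUTF8).

```python
import os
import subprocess
import sys


def capture_child_stdout(extra_env):
    """Run a child Python with stdout piped and return the bytes it printed.

    extra_env is a dict of environment variables layered on top of the
    current environment (hypothetical helper, for illustration only).
    """
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        # The child sees the source text print("\u2714") and decodes the escape itself.
        [sys.executable, "-c", 'print("\\u2714")'],
        env=env,
        capture_output=True,
        check=True,  # raise if the child exits with an error
    )
    # Strip the trailing newline (\n or \r\n depending on the platform).
    return result.stdout.strip()


utf8_bytes = "\u2714".encode("utf-8")
print(capture_child_stdout({"PYTHONIOENCODING": "utf8"}) == utf8_bytes)
print(capture_child_stdout({"PYTHONUTF8": "1"}) == utf8_bytes)
```

If both comparisons come out true, the child encoded the check mark as UTF-8 even though its stdout was a pipe and not a console.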
sys.stdout.reconfigure(encoding='utf-8') can help: after adding this line, the redirection works. However, when I hexdumped the output file, it was in UTF-16 LE format with an FF FE byte-order mark at the beginning. As mentioned in the official Python docs:
On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console.
The special behaviour of the console can be overridden by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. In that case, the console codepages are used as for any other character device.
Under all platforms, you can override the character encoding by setting the PYTHONIOENCODING environment variable before starting Python or by using the new -X utf8 command line option and PYTHONUTF8 environment variable. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
In other words, Python uses UTF-8 when printing to the console but the ANSI code page for pipes and files. I/O redirection in PowerShell is effectively a pipe, so any character that the ANSI code page cannot represent, such as U+2714, causes the error. There are three solutions:
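By environment variable (PowerShell syntax, set before starting Python):
$env:PYTHONIOENCODING="UTF-8"
In the script itself (Python 3.7+):
import sys
sys.stdout.reconfigure(encoding='utf-8')
By command-line option: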
python -X utf8 test.py
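The failure and the in-code fix can also be reproduced without any redirection. The sketch below wraps an in-memory buffer in a text stream, which stands in for a redirected stdout. The names CHECK_MARK, write_check_mark and the choice of cp1252 as the ANSI code page are assumptions for illustration; the real code page depends on the system locale.

```python
import io

CHECK_MARK = "\u2714"


def write_check_mark(encoding, reconfigure_to=None):
    """Write the check mark to an in-memory text stream and return the bytes.

    The stream starts with `encoding`; if `reconfigure_to` is given, the
    stream is switched to that encoding before anything is written, the
    same way sys.stdout.reconfigure() is used in the script.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    if reconfigure_to is not None:
        stream.reconfigure(encoding=reconfigure_to)
    stream.write(CHECK_MARK)
    stream.flush()
    return raw.getvalue()


# cp1252 has no mapping for U+2714, so this reproduces the 'charmap' error.
try:
    write_check_mark("cp1252")
except UnicodeEncodeError as exc:
    print("failed:", exc)

# Reconfiguring the same kind of stream to UTF-8 lets the write succeed.
print(write_check_mark("cp1252", reconfigure_to="utf-8"))
```

The environment variable and -X utf8 options reach the same result from outside the script, which is handy when the script itself cannot be edited.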