When I try to redirect UTF-8 stdout in PowerShell by running python3 .\test.py > test.txt, the following error occurs:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>

test.py is saved as UTF-8 and contains the following code:

print("\u2714")

The code is also uploaded here: https://github.com/lingsongfeng/pytest

asked Feb 27, 2021 at 15:43
  • Could you provide a reproducible example code? Commented Feb 27, 2021 at 15:49
  • @Rivers OK, I uploaded it on GitHub. Commented Feb 27, 2021 at 16:00
  • github.com/lingsongfeng/pytest Commented Feb 27, 2021 at 16:02
  • Please post it here (edit your question). Could you take some time to read this: stackoverflow.com/help/how-to-ask Commented Feb 27, 2021 at 16:05
  • @Rivers Thank you for mentioning that. I have already edited my question. Commented Feb 27, 2021 at 16:16

2 Answers

When stdout is redirected, Python on Windows encodes the output with the default ANSI code page (cp1252 on US Windows). You can override this with an environment variable:

C:\> set PYTHONIOENCODING=utf8
C:\> test.py > out.txt

In Python 3.7 and later, set PYTHONUTF8=1 also makes Python default to UTF-8 for files and I/O redirection.
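The failure can also be reproduced without a shell. The sketch below is illustrative only: write_check_mark is a hypothetical helper, and an in-memory io.TextIOWrapper stands in for the redirected stdout, on the assumption that the redirected stream uses cp1252 as on a US Windows system.

```python
import io


def write_check_mark(encoding: str) -> bytes:
    """Print U+2714 through a text wrapper that uses `encoding`.

    The wrapper plays the role of a redirected sys.stdout (an assumption
    made for illustration; no real redirection happens here).
    """
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the bytes are the same
    # on every platform.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print("\u2714", file=stream)
    stream.flush()
    return raw.getvalue()


# UTF-8 can represent the check mark, so this succeeds.
print(write_check_mark("utf-8"))

# cp1252 has no mapping for U+2714, so the same print() raises.
try:
    write_check_mark("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 failed:", exc.reason)
```

Setting PYTHONIOENCODING or PYTHONUTF8 changes which encoding Python picks for the real stdout, which corresponds to switching the encoding argument in this sketch.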

answered Feb 27, 2021 at 17:33

3 Comments

I tried modifying the environment variable, but it doesn't work. However, I found that sys.stdout.reconfigure(encoding='utf-8') helps: after adding this line, the redirection works. The output file, though, is in UTF-16 LE format, with FF FE at the beginning when I hexdump it.
It works. How did you test it? I set it in a cmd window, tested, then started PowerShell and tested again.
Oh, that's my fault. I copied the CMD set command directly into PowerShell without knowing that setting an environment variable in pwsh is different... But there is still a problem: the output file is encoded as UTF-16 LE, not UTF-8. Neither VS Code nor Notepad displays the non-ASCII characters correctly.
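To confirm what a redirected file actually contains, one option is to inspect its first bytes for a byte order mark. The following is a minimal sketch, not part of the original thread: detect_bom and decode_with_bom are hypothetical helpers, and only the UTF-8 and UTF-16 marks are checked (UTF-32 is ignored).

```python
import codecs

# Byte order marks recognised by this sketch, paired with the codec that
# decodes the bytes following the mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (codec_name, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes, honouring a BOM if present, else using `fallback`."""
    name, skip = detect_bom(data)
    if name is None:
        return data.decode(fallback)
    return data[skip:].decode(name)


# Bytes shaped like the file described above: FF FE, then UTF-16 LE text.
sample = codecs.BOM_UTF16_LE + "\u2714\r\n".encode("utf-16-le")
print(detect_bom(sample))
print(ascii(decode_with_bom(sample)))
```

A leading FF FE matches the hexdump described above. As background to verify on your own system: Windows PowerShell 5.1 is generally reported to write UTF-16 LE when > is used, regardless of the bytes Python emitted, while PowerShell 6 and later default to UTF-8.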

As mentioned in the official Python docs:

On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console.

The special behaviour of the console can be overridden by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. In that case, the console codepages are used as for any other character device.

Under all platforms, you can override the character encoding by setting the PYTHONIOENCODING environment variable before starting Python or by using the new -X utf8 command line option and PYTHONUTF8 environment variable. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.

In other words, Python uses UTF-8 when printing to the console but the ANSI code page for pipes and files. I/O redirection in PowerShell is effectively a pipe, so any character that the ANSI code page cannot represent causes the error. There are three solutions:

By environment variable (PowerShell syntax):

$env:PYTHONIOENCODING="UTF-8"

In the script itself (Python 3.7+):

import sys
sys.stdout.reconfigure(encoding='utf-8')

On the command line:

python -X utf8 test.py
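
The three options can be compared from a single script. This is a rough sketch under stated assumptions: run_child, PRINT_MARK and RECONFIGURE are hypothetical names, and the child's stdout is a pipe created by subprocess rather than a PowerShell redirect, so it only shows which bytes Python itself emits.

```python
import os
import subprocess
import sys

# Same statement as test.py, kept ASCII-only so it is safe on a command line.
PRINT_MARK = 'print("\\u2714")'
RECONFIGURE = 'import sys; sys.stdout.reconfigure(encoding="utf-8"); '


def run_child(code, extra_args=(), extra_env=None):
    """Run `code` in a child interpreter with stdout piped; return raw bytes."""
    env = dict(os.environ)
    # Start from a clean slate so only the option under test is active.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
        check=True,
    )
    # strip() removes the trailing newline (\n or \r\n, depending on platform).
    return result.stdout.strip()


via_env = run_child(PRINT_MARK, extra_env={"PYTHONIOENCODING": "utf-8"})
via_code = run_child(RECONFIGURE + PRINT_MARK)
via_flag = run_child(PRINT_MARK, extra_args=["-X", "utf8"])

print(via_env, via_code, via_flag)
```

Each call should yield the UTF-8 bytes of U+2714. What the shell then does with those bytes when writing the file is a separate question that this sketch does not cover.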
answered Jan 17, 2024 at 8:14
