Working with UTF-8 encoding in Python source [duplicate]

Question 1

Consider:

$ cat bla.py 
u = unicode('d...')
s = u.encode('utf-8')
print s
$ python bla.py 
 File "bla.py", line 1
SyntaxError: Non-ASCII character '\xe2' in file bla.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

How can I declare UTF-8 strings in source code?

Question 2

"See python.org/peps/pep-0263.html for details" seems clear to me.

Question 3

In Python 3, UTF-8 is the default source encoding (see PEP 3120), so non-ASCII characters can be used in string literals, comments, and identifiers without any declaration.

In Python 2, declare the encoding in a comment on the first or second line of the source file:

# -*- coding: utf-8 -*-
....

This is described in PEP 263.

Then you can use UTF-8 in strings:

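# -*- coding: utf-8 -*-
# Python 2 only: a plain '...' literal is a byte string holding the file's UTF-8 bytes.
u = 'idzie wąż wąską dróżką'
uu = u.decode('utf8')    # UTF-8 bytes -> unicode object
s = uu.encode('cp1250')  # unicode object -> bytes in the Windows-1250 code page
print(s)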
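For comparison, here is a rough Python 3 sketch of the same conversion. It assumes Python 3, where a plain string literal is already Unicode, so the `decode` step from the Python 2 example disappears. The variable names are illustrative only.

```python
# Python 3 sketch: str literals are already Unicode, so no decode step is needed.
text = 'idzie wąż wąską dróżką'

utf8_bytes = text.encode('utf-8')      # the bytes a UTF-8 source file stores for this text
cp1250_bytes = text.encode('cp1250')   # the same text in the Windows-1250 code page

# Decoding with the matching codec recovers the original text in both cases.
round_trip_utf8 = utf8_bytes.decode('utf-8')
round_trip_cp1250 = cp1250_bytes.decode('cp1250')

# cp1250 uses one byte per character; UTF-8 needs two for each Polish letter here.
print(len(text), len(utf8_bytes), len(cp1250_bytes))
```

The point of the sketch is that the text itself is the same in both cases; only the byte representation chosen by `encode` differs.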

Question 4

Now it gives """UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)"""

Question 5

You need not use unicode(); simply write the string in UTF-8 encoding.
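The error in the previous comment is what an ASCII decode of UTF-8 bytes looks like. Python 2's `unicode()` performs that decode implicitly when given a byte string with no encoding argument. The sketch below reproduces it explicitly in Python 3. The sample text is hypothetical (it is not the asker's original string) and was chosen only because its second byte is 0xe2.

```python
# Hypothetical sample text (not the asker's original string): 'a', EN DASH, 'b'.
raw = "a\u2013b".encode("utf-8")   # b'a\xe2\x80\x93b'

try:
    raw.decode("ascii")            # the decode Python 2's unicode(raw) does implicitly
    error = None
except UnicodeDecodeError as exc:
    error = exc                    # byte 0xe2 at index 1 is outside the ASCII range

decoded = raw.decode("utf-8")      # naming the actual encoding succeeds
```

In Python 2 the equivalent fix would be to pass the encoding, as in `unicode(raw, 'utf-8')`, or to skip `unicode()` entirely as suggested above.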

Question 6

In Python versions older than 3, you also need to prefix unicode string literals with "u": some_string = u'idzie wąż wąską dróżką'.

Question 7

On a different string I am getting """UnicodeEncodeError: 'charmap' codec can't encode characters in position 1845-1846: character maps to <undefined>"""... Does that mean a different encoding is required?
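A 'charmap' encode error of this kind usually means the target code page has no slot for some characters in the text. The sketch below assumes Python 3 and a hypothetical sample string; it shows the failure with cp1250 and two possible ways around it. Which one is appropriate depends on what the consumer of the bytes expects.

```python
# Hypothetical sample: Polish letters plus two characters that cp1250 cannot represent.
text = "wąż \u65e5\u672c"

try:
    text.encode("cp1250")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = (exc.start, exc.end)   # index range of the unencodable characters

# Option 1: keep cp1250 and substitute '?' for anything it cannot represent (lossy).
lossy = text.encode("cp1250", errors="replace")

# Option 2: switch to an encoding that covers all of Unicode, such as UTF-8.
full = text.encode("utf-8")
```

So the answer to the question is likely yes: either the output encoding needs to cover those characters, or an `errors` policy has to be chosen deliberately.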

Question 8

Or:

#!/usr/bin/env python
# coding: utf-8

Question 9

Do not forget to verify that your text editor actually saves your code as UTF-8.

Otherwise, the file may contain bytes that are not valid UTF-8, even though the text looks correct in the editor.
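One way to check this without relying on the editor is to read the file's raw bytes and try to decode them. The following is a minimal sketch assuming Python 3; the helper name `check_utf8` is made up for illustration.

```python
import codecs

def check_utf8(data: bytes) -> str:
    """Classify raw file bytes as 'utf-8', 'utf-8 with BOM', or 'not utf-8'."""
    if data.startswith(codecs.BOM_UTF8):
        kind = "utf-8 with BOM"
        data = data[len(codecs.BOM_UTF8):]   # strip the byte order mark before decoding
    else:
        kind = "utf-8"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return kind

# Demonstration on in-memory bytes; for a real file, read it with open(path, "rb").
as_utf8 = check_utf8("wąż".encode("utf-8"))
with_bom = check_utf8(codecs.BOM_UTF8 + b"x = 1\n")
as_cp1250 = check_utf8("wąż".encode("cp1250"))
```

A successful decode only shows that the bytes are valid UTF-8, not that the editor meant to write UTF-8. Pure ASCII files, for instance, pass under many encodings.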

Question 10

Is this needed for Python 3? I know Python 3 treats all string literals in the code as Unicode, but does it also assume the source files are written in UTF-8?

Question 11

@RicardoCruz Yes, I believe UTF-8 is the default for Python 3. See python.org/dev/peps/pep-3120

Question 12

@ricardo-cruz With Python 3, all strings will be Unicode strings, so the original encoding of the source will have no impact at run-time.

1. PEP 3120 -- Using UTF-8 as the default source encoding
2. PEP 263 -- Defining Python Source Code Encodings

Question 13

@noobninja thanks for the links: PEP 3120 confirms that the source code itself is now assumed to be UTF-8, not just strings.
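This can be checked from within Python 3 by compiling raw source bytes directly. The sketch below assumes Python 3; `run_source` is a hypothetical helper. It compiles one snippet with no declaration and one with an explicit cp1250 declaration.

```python
def run_source(source_bytes: bytes) -> dict:
    """Compile and execute raw source bytes, returning the resulting namespace."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace

# No declaration: the bytes are decoded as UTF-8 (the PEP 3120 default).
plain = run_source("s = 'wąż'\n".encode("utf-8"))

# Explicit PEP 263 declaration: the bytes are decoded as cp1250 instead.
declared = run_source("# -*- coding: cp1250 -*-\ns = 'wąż'\n".encode("cp1250"))
```

Both namespaces end up with the same Unicode string in `s`, which matches the point above: once the source is decoded, its original encoding no longer matters at run time.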

Question 14

Use # coding: utf8 instead of # -*- coding: utf-8 -*-; the short form is far easier to remember.
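To see that both spellings are recognised, the standard library's `tokenize.detect_encoding` can be pointed at a few sample headers. This is a sketch assuming Python 3; `declared_codec` is an illustrative helper, and `codecs.lookup` is used only to normalise aliases such as `utf8` to a canonical codec name.

```python
import codecs
import io
import tokenize

def declared_codec(source: bytes) -> str:
    """Return the canonical name of the codec the tokenizer detects for the source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return codecs.lookup(encoding).name

short_form = declared_codec(b"# coding: utf8\n")
emacs_form = declared_codec(b"# -*- coding: utf-8 -*-\n")
no_cookie = declared_codec(b"print('hello')\n")

# The declaration may also sit on the second line, for example after a shebang.
after_shebang = declared_codec(b"#!/usr/bin/env python\n# coding: cp1250\n")
```

The first three calls should all resolve to the same UTF-8 codec, and the last one shows that a declaration on line two is picked up as well.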

Michał Niklas (54.6k reputation; 19 gold, 77 silver, 125 bronze badges) · Accepted Answer · 2011-06-09 07:31:59Z

876



edited Jan 3, 2023 at 7:54

answered Jun 9, 2011 at 7:31


17 Comments

Nullpoet (2011-06-09T07:36:16.827Z):

Now it gives """UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)"""

Michał Niklas (2011-06-09T08:03:22.567Z):

You need not use unicode(); simply write the string in UTF-8 encoding.

Anton Strogonoff (2011-06-09T08:06:28.08Z):

In Python versions older than 3, you also need to prefix unicode string literals with "u": some_string = u'idzie wąż wąską dróżką'.

Nullpoet (2011-06-09T08:20:00.987Z):

On a different string I am getting """UnicodeEncodeError: 'charmap' codec can't encode characters in position 1845-1846: character maps to <undefined>"""... Does that mean a different encoding is required?

warvariuc (2011-06-09T08:47:11.237Z):

Or:

#!/usr/bin/env python
# coding: utf-8
