Unicode string works on Python 2 but not on Python 3

Question 1

I wrote a program that has some Unicode labels, and a strange issue is appearing. The Unicode string works fine on Python 2 but not on Python 3. It does work on Python 3 on my other computer, but not on the live server. Kindly help, please.

I tried the same code on Python 3 on my other PC and on Python 2 on the live server, and it works in both. But when I run the same code on Python 3 on that same live server, I get an error.

>>> pt = 'Casa e Decoração'

Error:

File "<stdin>", line 0
 ^
SyntaxError: 'utf-8' codec can't decode byte 0xe7 in position 19: invalid continuation byte
>>>

Question 2

What is this "live server"?

Question 3

It runs Ubuntu 19, but the same code works fine on my other PC, which runs Ubuntu 18.

Question 4

It should work as described on this page: stackoverflow.com/questions/30539882/… That one is for Windows, though; I need it to work the same way on Ubuntu.

Question 5

Is there anyone who can really help, please? This is a strange issue: the error appears on a simple variable assignment.

Question 6

I suspect that however you're getting this data in, it's arriving as raw latin-1 (or a related one-byte-per-character ASCII superset encoding). On Python 2, a plain string literal is a byte string, so those bytes are stored as-is and nothing tries to decode them. On Python 3, string literals are Unicode and the source is decoded as UTF-8 by default (with the option to declare an alternate encoding), so latin-1 bytes are rejected.
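To make that concrete, here is a minimal sketch that reproduces the message from the question outside the interpreter prompt. It assumes the line reached Python as latin-1 bytes, which is a guess based on the 0xe7 byte in the error; the variable names are only for illustration.

```python
# Assumption: the line from the question arrived encoded as latin-1.
line = "pt = 'Casa e Decoração'"
raw = line.encode("latin-1")  # one byte per character: ç -> 0xe7, ã -> 0xe3

# Decode the way Python 3 decodes source by default (UTF-8).
error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # the codec message, including the offending byte and its position

# The same bytes decode cleanly once the right encoding is named.
decoded = raw.decode("latin-1")
print(decoded == line)
```

The byte 0xe7 sits at offset 19 of that line, which matches the position reported in the question, so the latin-1 guess is at least consistent with the error.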

If your tooling is using latin-1 for the source file bytes, then you'd see this problem. The simplest solution is to replace the literal characters with equivalent ASCII-only escapes, which Python can then decode at parse time. For example,

>>> pt = 'Casa e Decora\xe7\xe3o'

has the same meaning, and isn't subject to misinterpretation.
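A quick check of that claim, as a sketch: the escaped form and the literal form compare equal, provided the file holding the literal really is saved as UTF-8 (that is an assumption about your editor, not something Python can guarantee).

```python
# ASCII-only source text: \xe7 is U+00E7 (ç) and \xe3 is U+00E3 (ã).
escaped = 'Casa e Decora\xe7\xe3o'

# Same value written literally; only safe if this file is saved as UTF-8.
literal = 'Casa e Decoração'

print(escaped == literal)

# The value is still non-ASCII; only the way it is spelled in source changed.
utf8_bytes = escaped.encode("utf-8")
print(utf8_bytes)
```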

Alternatively, you might be able to get away with leaving your latin-1 code as is, and putting:

# -*- coding: latin-1 -*-

as the first or second line of your source file (it must be that early; at line three and beyond it's ignored) so Python knows to interpret the remaining bytes as latin-1, rather than the utf-8 default.
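The effect of the declaration can be sketched without touching a real file by compiling source bytes directly. The byte strings below stand in for a hypothetical file saved as latin-1; `compile()` accepts bytes and reads the coding line the same way it would from a file.

```python
# Hypothetical file contents, saved as latin-1 (ç is the single byte 0xe7).
body = "pt = 'Casa e Decoração'\n".encode("latin-1")
declared = b"# -*- coding: latin-1 -*-\n" + body

# With the declaration, the bytes compile and the literal decodes correctly.
namespace = {}
exec(compile(declared, "<declared>", "exec"), namespace)
print(namespace["pt"] == "Casa e Decora\xe7\xe3o")

# Without it, Python 3 assumes UTF-8 and rejects the same bytes.
try:
    compile(body, "<undeclared>", "exec")
    failed = False
except (SyntaxError, UnicodeDecodeError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```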

Question 7

Actually, I am grabbing the data from a site with Selenium, and that site is in another language. The same code works fine on Python 3 on my other PC and on Python 2 on the same live server, but not on Python 3 on the live server. I tried setting the latin-1 coding line and it's still not working. No success so far.

Question 8

As an addition to @ShadowRanger's answer about file encodings.

There is a conversion utility on Linux, iconv. For instance, assuming your file is written in Portuguese, it might be using code page 860. On the command line (i.e. in a Linux shell, not in Python code):

iconv -f CP860 -t UTF8 inputfile.py > outputfile.py

If it does not help, try to autodetect the encoding before converting. Link: https://superuser.com/q/301552
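If you would rather do the conversion from Python, a rough equivalent of the iconv call might look like the sketch below. The `reencode` helper is hypothetical, and the source encoding is something you have to supply: nothing here detects it. The demo uses latin-1 only because the 0xe7 byte in the question's error is "ç" in latin-1.

```python
from pathlib import Path
import tempfile


def reencode(src: Path, dst: Path, from_encoding: str) -> None:
    """Rewrite src as UTF-8, assuming it is currently in from_encoding."""
    text = src.read_bytes().decode(from_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file holding the question's line as latin-1 bytes.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "inputfile.py"
    dst = Path(tmp) / "outputfile.py"
    src.write_bytes(b"pt = 'Casa e Decora\xe7\xe3o'\n")

    reencode(src, dst, "latin-1")
    converted = dst.read_bytes()
    print(converted)

# A wrong guess does not raise here; it silently yields different text.
raw = b"Decora\xe7\xe3o"
as_latin1 = raw.decode("latin-1")
as_cp860 = raw.decode("cp860")
print(ascii(as_latin1), ascii(as_cp860))
```

Since a wrong source encoding can convert without any error, it is worth opening the output and checking the accented characters before replacing the original file.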

Question 9

ShadowRanger · Accepted Answer (the text under Question 6) · 2019-08-17 03:06:07Z

