How to convert string into unicode in Python3?

Question

I have tried many ways to convert a string like b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a' into Chinese characters, but all of them failed.

It's really strange: when I simply run

print(b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a')

it shows the decoded Chinese characters.

But if I read the string from my CSV file, it doesn't work. No matter how I decode the string, it only shows me b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
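For reference, in Python 3 `bytes` and `str` are separate types: `print()` on a `bytes` object shows its repr, and the characters only appear after an explicit `.decode()`. A minimal sketch, using the question's bytes without the leading `\xef\xbb\xbf`:

```python
# UTF-8 bytes for 国际友谊 (the \xef\xbb\xbf prefix from the question is left out here)
raw = b'\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

print(type(raw))   # <class 'bytes'>
print(raw)         # b'\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'  (the repr, not Chinese text)

text = raw.decode('utf-8')  # bytes -> str
print(type(text))  # <class 'str'>
print(text)        # 国际友谊
```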

Here is my script:

import csv

with open('need_convert.csv', 'r+') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        new_row = ''.join(row)
        print('new_row:')
        print(type(new_row))
        print(new_row)
        print('convert:')
        print(new_row.decode('utf-8'))
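The last line of this script cannot work in Python 3: `csv.reader` always yields `str` cells, and `str` has no `.decode()` method. A small sketch reproducing that, where the `io.StringIO` content is an assumed stand-in for need_convert.csv modeled on the data in the question:

```python
import csv
import io

# Hypothetical in-memory stand-in for need_convert.csv: the cell holds the
# ASCII *text* b'\xef\xbb\xbf...' (backslashes included), not encoded bytes
fake_file = io.StringIO(r"b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'")

cell = next(csv.reader(fake_file))[0]

print(type(cell))               # <class 'str'>
print(hasattr(cell, 'decode'))  # False: in Python 3 only bytes has .decode()

try:
    cell.decode('utf-8')
except AttributeError as exc:
    print('AttributeError:', exc)
```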

Here is my data (the CSV file):

b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
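Every value in this data starts with `\xef\xbb\xbf`, which is the UTF-8 byte order mark (BOM). A minimal sketch of how the two UTF-8 codecs treat it, using the second value from the data:

```python
import codecs

raw = b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'

# The first three bytes are the UTF-8 BOM
print(raw.startswith(codecs.BOM_UTF8))  # True

with_bom = raw.decode('utf-8')         # keeps the BOM as the invisible character U+FEFF
without_bom = raw.decode('utf-8-sig')  # strips a leading BOM if present

print(len(with_bom), len(without_bom))  # 4 3
print(repr(with_bom[0]))                # '\ufeff'
```

With plain `'utf-8'` the BOM survives as an invisible leading character, which can break string comparisons even though the printed text looks right.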

Comment

Please don't post code or data as images; post them as text.

Comment

Have you tried print(str(your_encoding))?
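For reference, `str()` on a `bytes` object behaves differently depending on whether an encoding is passed. Without one it returns the repr text (`b'...'`); with one it decodes. Writing `str(some_bytes)` to a file is one way a file can end up holding literal `b'...'` text, though the question doesn't say how need_convert.csv was produced. A minimal sketch:

```python
raw = b'\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

# Without an encoding, str() just returns the repr of the bytes object
as_repr = str(raw)
print(as_repr)  # b'\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

# With an encoding, str() decodes, like raw.decode('utf-8')
as_text = str(raw, 'utf-8')
print(as_text == raw.decode('utf-8'))  # True
```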

Comment

Welcome to Stack Overflow! Please edit your question to include the Python code as text, and also include some more examples of the encoded characters in text form. Thanks!

Comment

You need to read with the correct encoding.
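A sketch of what reading with an explicit encoding looks like, assuming a CSV that stores real UTF-8 text with a BOM (the file name `example_utf8.csv` and its rows are hypothetical). Note that this alone would not fix need_convert.csv as posted, because that file contains the ASCII text `b'...'` rather than encoded Chinese characters.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file: genuine UTF-8 text preceded by a BOM
    path = os.path.join(tmp, 'example_utf8.csv')
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        csv.writer(f).writerows([['国际友谊'], ['麒麟杯']])

    # 'utf-8-sig' decodes UTF-8 and drops the BOM at the start of the file
    with open(path, encoding='utf-8-sig', newline='') as f:
        rows = [row[0] for row in csv.reader(f)]

print(rows)
```

Passing `newline=''` when opening files for the `csv` module is what its documentation recommends.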

Comment

Hi Fallenreaper, yes, I've tried your method; it doesn't work. Sorry.

Comment

What does one do when they do not trust the input?

Accepted answer by mike.k · 2018-06-19 03:40:47Z

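The elements of row (and therefore new_row) are str objects, not bytes: the CSV file contains the literal text b'\xef\xbb\xbf...', i.e. the printed form of a bytes object, so there is nothing to decode yet. Below, I use exec('s=' + row[0]) to evaluate that text back into a bytes object and then decode it, assuming the input is safe.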

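import csv

with open('need_convert.csv', 'r+') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(type(row[0]), row[0])  # a str holding the text b'\xef...'
        # Evaluate that text as Python code, producing a real bytes object in s.
        # Only safe if you trust the file's contents.
        exec('s=' + row[0])
        print(type(s), s)
        # bytes -> str; note 'utf-8' keeps the leading BOM as '\ufeff',
        # while 'utf-8-sig' would drop it
        print(s.decode('utf-8'))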

Output:

<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
<class 'str'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
<class 'bytes'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
麒麟杯
<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
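On the comment about untrusted input: `ast.literal_eval` from the standard library evaluates only Python literals (bytes literals included) and raises `ValueError` for anything else, such as function calls, so it can stand in for `exec` here. A sketch, where `decode_bytes_literals` is a hypothetical helper name and `io.StringIO` stands in for the real file:

```python
import ast
import csv
import io


def decode_bytes_literals(lines):
    """Yield decoded text for each CSV cell that holds a bytes literal such as b'...'."""
    for row in csv.reader(lines):
        for cell in row:
            value = ast.literal_eval(cell)  # parses literals only; never runs code
            if not isinstance(value, bytes):
                raise ValueError(f'not a bytes literal: {cell!r}')
            yield value.decode('utf-8-sig')  # 'utf-8-sig' also drops the leading BOM


# Stand-in for open('need_convert.csv', newline='')
sample = io.StringIO(
    r"b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'" "\n"
    r"b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'" "\n"
)

for text in decode_bytes_literals(sample):
    print(text)
```

With the real file, the same helper can be fed the open file object directly.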
