I tried a lot of ways to convert the string like b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a' into Chinese characters but all failed.
It's really strange that when I just use
print(b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a')
It will show decoded Chinese Characters.
But if I got the string by reading from my CSV file, it won't do. No matter how I decode the string, it will only show me b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
Here is my script:
import csv
with open('need_convert.csv','r+') as csvfile:
reader=csv.reader(csvfile)
for row in reader:
new_row=''.join(row)
print('new_row:')
print(type(new_row))
print(new_row)
print('convert:')
print(new_row.decode('utf-8'))
Here is my data (csv file): b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a' b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf' b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
1 Answer 1
row contents and new_row are both strings, not byte types. Below, I'm using exec('s=' + row[0]) to interpret them as desired, assuming the input is safe.
import csv
with open('need_convert.csv','r+') as csvfile:
reader=csv.reader(csvfile)
for row in reader:
print(type(row[0]), row[0])
exec('s=' + row[0])
print(type(s), s)
print(s.decode('utf-8'))
Output:
<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
<class 'str'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
<class 'bytes'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
麒麟杯
<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
print(str(your_encoding))