Special characters in python

Question 1

I have a file with a lot of entries about Nobel prizes. I then convert that file into a list like this:

file = open(path, 'r')
file.readline()
content = []
for line in file:
    line = line.replace('\n', '')
    content.append(line.split(';'))
content = check(content, 'röntgen')

After that I have a function that takes that list and another argument and checks whether the list contains that argument. However, if the argument contains a special character like ö, it doesn't work, because when the file is read Python stores it as: Ã¶

def check(content, attr):
    reducedList = []
    for i in range(len(content)):
        curr = content[i][4]
        if curr.find(attr) != -1:
            reducedList.append(content[i])
    return reducedList

with:

curr = 'voor hun verdiensten op het gebied van de analyse van de kristalstructuur door middel van rÃ¶ntgenstraling'
attr = 'röntgen'

I have tried converting it with utf-8 but that doesn’t seem to help. Does anyone have a solution?

Question 2

Try the iso-8859-1 encoding.

Question 3

Are both your Python file and your text file encoded using UTF-8?

Question 4

The Python file is declared with # -*- coding: utf-8 -*- and the text file is encoded in utf-8.

Question 5

Check your encoding and open your file specifying the correct one, e.g. file = open(path, 'r', encoding='utf-8').

Question 6

yes it worked with open(path, 'r', encoding='utf-8'), thank you!

Question 7

This happens because you are using Python 2, likely on Windows, and your file is encoded in utf-8, not latin-1.
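The mismatch itself can be reproduced without any file. This is only a minimal sketch, assuming the text was stored as UTF-8 and then decoded with a single-byte codec such as latin-1 (cp1252 gives the same result for these two bytes); the variable names are illustrative:

```python
word = 'röntgen'

# In UTF-8 the single character 'ö' is stored as two bytes: 0xC3 0xB6.
utf8_bytes = word.encode('utf-8')

# Decoding with a single-byte codec turns each byte into its own character,
# which is where the 'Ã¶' in the question comes from.
misread = utf8_bytes.decode('latin-1')
print(ascii(misread))          # escaped form, safe on any console

# The search from the question fails on the misread text.
print(misread.find(word))      # -1

# Decoding with the codec the file was written in restores the original.
correct = utf8_bytes.decode('utf-8')
print(correct == word)         # True
```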

The best thing you can do, instead of trying to fix it at random (and that includes the first comments on your question, which are all random suggestions), is to understand what is going on. So, stop what you are trying to do.

Read this: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Then, switch to Python3 if you can - that should handle most issues automatically.

If you can't, you have to deal properly with the text decoding and re-encoding manually; the concepts are in the link above. Assume your input files are in utf-8.
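Manual decoding could look roughly like the sketch below. It assumes the input really is UTF-8, as stated above; the sample file, its contents, and the variable names are made up for illustration. `io.open` is shown because it accepts an `encoding` argument on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

# Made-up sample data written as UTF-8 bytes, standing in for the real file.
sample = u'name;note\nWilhelm;r\u00f6ntgenstraling\n'
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'wb') as f:
    f.write(sample.encode('utf-8'))

# Option 1: read raw bytes, then decode them explicitly.
with open(path, 'rb') as f:
    raw = f.read()
text = raw.decode('utf-8')

# Option 2: let io.open do the decoding while reading.
with io.open(path, 'r', encoding='utf-8') as f:
    text_via_io = f.read()

print(text == text_via_io)        # True
print(u'r\u00f6ntgen' in text)    # True
```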

Question 8

I'm using Python 3.5, and I do understand completely what is going on. I resorted to asking here not because I didn't know what was going on, but because I didn't know what I was supposed to do about the problem.

Question 9

The solution is to replace open(path, 'r') with open(path, 'r', encoding='utf-8'). If you add the encoding parameter, Python will read the file as UTF-8, so when you compare the strings they are truly the same.
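Put together with the code from the question, the fix might look like the following sketch. The helper name `load_rows`, the `column` parameter, the header names, and the two-row sample file are assumptions added for illustration; only `check`, the `;` separator, the skipped header line, and column index 4 come from the question.

```python
import os
import tempfile


def load_rows(path):
    """Read a ';'-separated file as UTF-8 and return its rows as lists."""
    with open(path, 'r', encoding='utf-8') as f:
        f.readline()  # skip the header line, as in the question
        return [line.rstrip('\n').split(';') for line in f]


def check(content, attr, column=4):
    """Return the rows whose field at `column` contains `attr`."""
    return [row for row in content if attr in row[column]]


# Made-up sample file; the second row reuses the sentence from the question.
sample = (
    'a;b;c;d;motivation\n'
    '1;x;y;z;voor een andere ontdekking\n'
    '2;x;y;z;voor hun verdiensten op het gebied van de analyse van de '
    'kristalstructuur door middel van röntgenstraling\n'
)
path = os.path.join(tempfile.mkdtemp(), 'nobel.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    f.write(sample)

rows = load_rows(path)
matches = check(rows, 'röntgen')
print(len(rows), len(matches))  # 2 1
```

Using `with` also closes the file, which the original `open` call without a matching `close` did not do.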

jsbueno · Accepted Answer · 2017-01-16 15:25:09Z

