1

I have a file with a lot of entries about Nobel prizes. I then convert that file into a list like this:

file = open(path, 'r')
file.readline()
content = []
for line in file:
    line = line.replace('\n', '')
    content.append(line.split(';'))
content = check(content, 'röntgen')

After that, I have a function that takes that list and another argument and checks whether the list contains that argument. However, if the argument contains a special character like ö, it doesn't work, because when the file is read Python stores it as: ö

def check(content, attr):
    reducedList = []
    for i in range(len(content)):
        curr = content[i][4]
        if curr.find(attr) != -1:
            reducedList.append(content[i])
    return reducedList

with:

curr = 'voor hun verdiensten op het gebied van de analyse van de kristalstructuur door middel van röntgenstraling'
attr = 'röntgen'

I have tried converting it with utf-8 but that doesn’t seem to help. Does anyone have a solution?

asked Jan 16, 2017 at 15:12
6
  • try the iso-8859-1 encoding Commented Jan 16, 2017 at 15:16
  • Are both your python file and your text file encoded using UTF-8 ? Commented Jan 16, 2017 at 15:16
  • The Python file is encoded with # -*- coding: utf-8 -*- and the text file is encoded in UTF-8. Commented Jan 16, 2017 at 15:18
  • 1
    Check your encoding and open your file specifying the correct one, e.g. file = open(path, 'r', encoding='utf-8'). Commented Jan 16, 2017 at 15:18
  • yes it worked with open(path, 'r', encoding='utf-8'), thank you! Commented Jan 16, 2017 at 15:21

2 Answers

1

This happens because you are using Python 2, likely on Windows, and your file is encoded in utf-8, not latin-1.

The best thing you can do, instead of trying to fix it at random (including with the first comments on your question, which are all random suggestions), is to understand what is going on. So, stop what you are trying to do.

Read this: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Then, switch to Python3 if you can - that should handle most issues automatically.

If you can't, you have to deal with the text decoding and re-encoding properly and manually; the concepts are covered in the link above. Assume your input files are in UTF-8.
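To make the decoding mismatch concrete, here is a minimal sketch in Python 3. It assumes the file's bytes are UTF-8 and that the platform default was a Windows code page such as cp1252; that default is an assumption, not something stated in the question. It works on in-memory bytes, so no file is needed.

```python
# A sketch of the mismatch: UTF-8 bytes decoded with the wrong codec.
# cp1252 is an assumed platform default, used here only for illustration.
text = "röntgenstraling"
raw = text.encode("utf-8")        # the bytes as they would sit in a UTF-8 file

garbled = raw.decode("cp1252")    # wrong codec: each byte of "ö" becomes its own character
correct = raw.decode("utf-8")     # right codec: the original text comes back

print(len(correct), len(garbled))  # the garbled string is one character longer
print("röntgen" in garbled)        # False: the search term no longer matches
print("röntgen" in correct)        # True
```

The two bytes that encode ö in UTF-8 are read as two separate cp1252 characters, which is why the substring search fails even though the file itself is fine.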

answered Jan 16, 2017 at 15:25

1 Comment

I'm using Python 3.5, and I do understand what is going on. I posted here not because I didn't know what was going on, but because I didn't know what I was supposed to do about the problem.
0

The solution is to replace open(path, 'r') with open(path, 'r', encoding='utf-8'). If you add the encoding parameter, Python will make sure the file is read as UTF-8, so when you compare the strings they are truly the same.
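As a rough end-to-end sketch of that fix: the helper name load_rows, the column parameter, and the sample rows below are hypothetical and only there to make the example self-contained. The check function mirrors the one in the question, and the first row's text is the example string from the question.

```python
import os
import tempfile


def load_rows(path, encoding="utf-8"):
    """Read a semicolon-separated file, skipping the header line."""
    with open(path, "r", encoding=encoding) as f:
        f.readline()  # skip the header, as in the question
        return [line.rstrip("\n").split(";") for line in f]


def check(content, attr, column=4):
    """Keep the rows whose given column contains attr."""
    return [row for row in content if attr in row[column]]


# Hypothetical sample data: five columns, with the searched text in column 4.
sample = (
    "year;category;name;country;motivation\n"
    "0000;a;Name A;X;analyse van de kristalstructuur door middel van röntgenstraling\n"
    "0000;b;Name B;Y;iets anders\n"
)

# Write the sample as UTF-8 so the demo does not depend on the platform default.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".csv", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    matches = check(load_rows(path), "röntgen")
finally:
    os.remove(path)

print(len(matches))  # 1
```

Passing the encoding explicitly makes the result independent of the platform's default encoding, which differs between systems.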

answered Jan 16, 2017 at 20:42
