I receive a text file, but some of its characters are wrong.
One example is the text below:
Apresentação/ divulgação do curso
But the correct text is:
Apresentação/ divulgação do curso
I use the PHP function utf8_decode and it works; see the example below:
echo utf8_decode("Apresentação/ divulgação do curso");
Result: Apresentação/ divulgação do curso
But I can't make it work in Python. I tried:
my_str = 'Apresentação/ divulgação do curso'
print( my_str.decode("utf-8") )
But I got the following error:
AttributeError: 'str' object has no attribute 'decode'
How can I make this work in Python?
-
Can you show the code where you obtain this string? E.g. by opening the said text file. – lenz, Mar 21, 2019 at 14:04
-
I get the text from a CSV file. – fabiobh, Mar 21, 2019 at 14:43
2 Answers
The string is the result of decoding the raw UTF-8 bytes as latin-1. So just re-encode them as latin-1, then decode as utf-8:
>>> my_str = 'Apresentação/ divulgação do curso'
>>> print( my_str.encode('latin-1').decode("utf-8") )
Apresentação/ divulgação do curso
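To guard against strings that are not garbled in the first place, the round trip above can be wrapped in a small helper. This is only a sketch: `fix_mojibake` is a hypothetical name, not a standard library function, and it assumes the damage came from UTF-8 bytes being decoded as latin-1.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as latin-1.

    Hypothetical helper: returns the input unchanged if the round trip fails.
    """
    try:
        # latin-1 maps each code point 0-255 back to the same byte value,
        # which recovers the original UTF-8 bytes.
        raw = text.encode("latin-1")
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin-1, or the bytes are
        # not valid UTF-8, so the string was probably fine already.
        return text


broken = "Apresentação/ divulgação do curso"
print(fix_mojibake(broken))          # garbled input is repaired
print(fix_mojibake("Apresentação"))  # already-correct input is left alone
```

Falling back to the original string is a design choice for mixed input; if a failure should be visible instead, let the exception propagate.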
If this is coming from a file you opened in Python, open most likely fell back to a default encoding of latin-1 (or the similar cp1252); the default depends on your platform and locale. In that case, the better fix is to pass the file's actual encoding to open so the text is decoded correctly in the first place, changing something like:
with open('myfile.txt') as f:
    my_str = f.read()
to:
with open('myfile.txt', encoding='utf-8') as f:
    my_str = f.read()
so no additional encode or decode steps are required.
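Since the asker mentions the text comes from a CSV file, the same idea carries over to the csv module. The following is a minimal sketch, assuming the file is UTF-8 encoded; the file name `cursos.csv` and its single row are made up for illustration.

```python
import csv
import os
import tempfile

# Create a small UTF-8 encoded CSV file to stand in for the real one
# (hypothetical name and content).
path = os.path.join(tempfile.mkdtemp(), "cursos.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["1", "Apresentação/ divulgação do curso"])

# Reading with the wrong encoding reproduces the garbled text.
with open(path, encoding="latin-1", newline="") as f:
    wrong_rows = list(csv.reader(f))

# Reading with the encoding the file was written in gives the right text.
with open(path, encoding="utf-8", newline="") as f:
    right_rows = list(csv.reader(f))

print(wrong_rows[0][1])
print(right_rows[0][1])
```

Passing `newline=""` follows the csv module's documented recommendation for files handed to `csv.reader` and `csv.writer`.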
Comments
I think the initial text was UTF-8 misread as iso-8859-1. This will fix it:
>>> s = 'Apresentação/ divulgação do curso'
>>> bytes(s, 'iso-8859-1').decode('utf-8')
'Apresentação/ divulgação do curso'