I am trying to open a basic file.txt file located in the current working directory of my Python interpreter.
So I do a=open("file.txt","r")
Then I want to display its content (it contains only one test line, like hello world).
So I do content=a.read()
For reference, when I type a and press Enter, I get this:
a
<_io.TextIOWrapper name='fichier.txt' mode='r' encoding='UTF-8'>
Then I get an error I don't understand. Does anyone have an idea how to fix this?
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    contenu=a.read()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 15: invalid continuation byte
2 Answers
Your file is probably not encoded in UTF-8. Try:
from chardet import detect
with open("file.txt", "rb") as infile:
    raw = infile.read()
encoding = detect(raw)['encoding']
print(encoding)
Comments
chardet is a third-party package, so it has to be installed first with pip3 (the 3 is because this is Python 3), the tool used for installing packages. To install a package, type pip3 install <package name>; in this case, pip3 install chardet. If that runs successfully, then back at the Python prompt you can do import chardet or from chardet import detect.

Your file is not encoded in UTF-8. The encoding is controlled by the tool used to create the file. Make sure you use the right encoding.
Here's an example:
>>> s = 'Sébastien Chabrol'
>>> s.encode('utf8') # é in UTF-8 is encoded as bytes C3 A9.
b'S\xc3\xa9bastien Chabrol'
>>> s.encode('cp1252') # é in cp1252 is encoded as byte E9.
b'S\xe9bastien Chabrol'
>>> s.encode('utf8').decode('1252') # decoding incorrectly can produce wrong characters...
'SÃ©bastien Chabrol'
>>> s.encode('cp1252').decode('utf8') # or just fail.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
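To see exactly which byte triggers a failure like the one above, the exception object itself can be inspected. The following is a minimal sketch, not part of the original answer; the helper name describe_decode_failure is hypothetical, and it relies only on the standard start, end and reason attributes of UnicodeDecodeError.

```python
def describe_decode_failure(raw: bytes, encoding: str = "utf-8") -> str:
    """Return a short report on where decoding `raw` fails, or 'ok'."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the offending bytes inside `raw`.
        bad = raw[exc.start:exc.end]
        # A few neighbouring bytes often hint at the surrounding word.
        context = raw[max(0, exc.start - 5):exc.end + 5]
        return f"{exc.reason} at position {exc.start}: {bad!r} in {context!r}"
    return "ok"


# Same data as the example above: cp1252 bytes decoded as UTF-8.
sample = "Sébastien Chabrol".encode("cp1252")
report = describe_decode_failure(sample)
print(report)
```

Applied to the raw bytes of the real file (read in 'rb' mode), the same helper would show the bytes around position 15, which usually makes the intended character easy to recognize.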
If using Python 3, you can provide the encoding when you open the file (substitute the file's actual encoding for 'utf8'):
a = open('file.txt','r',encoding='utf8')
On Python 2 or 3, you can also use the backward-compatible syntax:
import io
a = io.open('file.txt','r',encoding='utf8')
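As an illustration of how the encoding argument changes the result, here is a small self-contained sketch. It assumes a cp1252-encoded file, which it creates itself in a temporary location; the sample text and the variable names good and lossy are made up for the demonstration. It also shows the errors='replace' option of open(), which avoids the exception at the cost of losing the undecodable bytes.

```python
import os
import tempfile

text = "Sébastien Chabrol"

# Create a small cp1252-encoded file to experiment with (sample data only).
with tempfile.NamedTemporaryFile(
    "w", encoding="cp1252", suffix=".txt", delete=False
) as tmp:
    tmp.write(text)
    path = tmp.name

try:
    # Reading with the encoding the file was written in gives the text back.
    with open(path, "r", encoding="cp1252") as f:
        good = f.read()

    # Reading with the wrong encoding and errors="replace" raises nothing,
    # but each undecodable byte becomes U+FFFD (the replacement character).
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        lossy = f.read()
finally:
    os.remove(path)

print(good)
print(lossy)
```

errors='replace' is best treated as a diagnostic aid: the text is readable, but the original characters are gone, so finding the real encoding is still the proper fix.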
If you have no idea of the encoding, you can open in binary mode to see the raw byte content and at least make a guess:
a = open('file.txt','rb')
print(a.read())
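Once you have the raw bytes, one way to make that guess is to try a short list of likely encodings in order. This is only a sketch under assumptions: the function name guess_encoding and the candidate list are hypothetical, and a successful decode does not prove the encoding is the right one (latin-1 in particular accepts any byte sequence), so the decoded text still needs to be checked by eye.

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
        return name
    return None


# Bytes as they would come from open('file.txt', 'rb').read()
print(guess_encoding("Sébastien".encode("utf-8")))
print(guess_encoding("Sébastien".encode("cp1252")))
```

Putting utf-8 first is deliberate: it is strict, so text in another encoding that contains accented characters almost always fails to decode as UTF-8 and falls through to the next candidate.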
Read more about Python and encodings here: https://nedbatchelder.com/text/unipain.html
Can you run file -I fichier.txt in the terminal and tell us the output?