I have a problem understanding an error in Python

Question 1

I am trying to open a simple text file, file.txt, which is located in my Python interpreter's current working directory.

So I do a=open("file.txt","r")

Then I want to display its content (there's only one test line like hello world in it)

So I do content=a.read()

For reference, when I type a and press Enter, I get this:

a
<_io.TextIOWrapper name='fichier.txt' mode='r' encoding='UTF-8'>

Then I get an error I don't understand. Does anyone have an idea how to fix this?

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    contenu=a.read()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 15: invalid continuation byte

Question 2

Can you show us exactly what your file contains? This error indicates that the file contains a byte sequence that is not valid UTF-8, specifically at byte offset 15 (the sixteenth byte, since positions count from zero). Fix that, or open the file with its actual encoding, and this should run properly.
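
To make the message concrete, here is a minimal sketch that reproduces the same error on made-up bytes (the sample data is an assumption, not the asker's actual file). In UTF-8, the byte 0xC6 announces a two-byte character, so decoding fails when the byte that follows is not a continuation byte (0x80 to 0xBF).

```python
# Hypothetical data: 15 ASCII bytes, then 0xC6 followed by an ordinary space.
raw = b"a" * 15 + b"\xc6" + b" rest of line"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

if error is not None:
    # error.start is the zero-based offset of the offending byte.
    print(error.start, hex(raw[error.start]), error.reason)
```

The start attribute is the same number that appears after "in position" in the traceback.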

Question 3

Can you run file -I fichier.txt in the terminal and tell us the output?

Question 4

It's always good to try to read a regular txt file with basic characters to see if there are some issues with the content.

Question 5

OK, so I created a new document with the .rtf extension. The text inside is "this file is vanilla. It only contains letters and dots.". Now Python seems to read it, but doesn't display the contents properly. Instead, I see:

Question 6

'{\\rtf1\\ansi\\ansicpg1252\\cocoartf1671\\cocoasubrtf200\n{\\fonttbl\\f0\\fswiss\\fcharset0 Helvetica;}\n{\\colortbl;\\red255\\green255\\blue255;}\n{\*\\expandedcolortbl;;}\n\\paperw11900\\paperh16840\\margl1440\\margr1440\\vieww10800\\viewh8400\\viewkind0\n\\pard\\tx566\\tx1133\\tx1700\\tx2267\\tx2834\\tx3401\\tx3968\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural\\partightenfactor0\n\n\\f0\\fs24 \\cf0 this file is vanilla. It only contains letters and dots.}'
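
That output is the raw RTF markup: an .rtf file stores formatting commands alongside the text, so read() returns all of it. Saving the document as plain text (.txt) avoids this. A small sketch for spotting the situation, assuming only that RTF documents begin with {\rtf (looks_like_rtf is a hypothetical helper name, not a library function):

```python
def looks_like_rtf(raw):
    # RTF documents start with the bytes {\rtf (the backslash is escaped below).
    return raw.lstrip().startswith(b"{\\rtf")

# Sample bytes modelled on the output above, shortened for readability.
sample = b"{\\rtf1\\ansi\\ansicpg1252\\cocoartf1671 this file is vanilla.}"

print(looks_like_rtf(sample))
print(looks_like_rtf(b"hello world"))
```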

Question 7

Your file is probably not encoded in UTF-8. Try:

from chardet import detect
with open("file.txt", "rb") as infile:
    raw = infile.read()
    encoding = detect(raw)['encoding']
    print(encoding)
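
If installing a package is not an option, a rough standard-library alternative is to try a few likely encodings in turn. This is only a sketch: guess_encoding and the candidate list are assumptions chosen for illustration, and a successful decode does not prove the guess is right, only that the bytes are valid in that encoding.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

# Hypothetical sample: 'é' saved as the single cp1252 byte E9.
sample = "Sébastien".encode("cp1252")
print(guess_encoding(sample))
```

UTF-8 is tried first because it is strict: bytes from another encoding rarely happen to be valid UTF-8, whereas latin-1 accepts any byte and is therefore kept as the last resort.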

Question 8

You have to pip3 install chardet first

Question 9

I don't have the "chardet" package. I have really no idea how to download it and link it to my python. I'm a beginner btw

Question 10

Open up Terminal.app. You are now in a bash prompt. It's similar to the python prompt in your screenshot but it's used for controlling your computer. When you install Python it installs a bash command called pip3 (the 3 is because this is Python 3) which is used for installing packages. To install a package you type pip3 install <package name>. In this case, pip3 install chardet. If that runs successfully, when you're back in the python prompt, you can do import chardet or from chardet import detect.
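
To check from the Python prompt whether the install worked, something like the following sketch can help. is_installed is a hypothetical helper name; it relies on importlib.util.find_spec, which returns None when a top-level package cannot be found.

```python
import importlib.util
import sys

def is_installed(package_name):
    """Return True if this interpreter can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

if is_installed("chardet"):
    print("chardet is available")
else:
    # Running pip through this interpreter avoids installing into a different Python.
    print(f"Run in Terminal: {sys.executable} -m pip install chardet")
```

Printing sys.executable is useful when several Python versions are installed, since it shows which one the prompt is actually using.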

Question 11

Your file is not encoded in UTF-8. The encoding is controlled by the tool used to create the file. Make sure you use the right encoding.

Here's an example:

>>> s = 'Sébastien Chabrol'
>>> s.encode('utf8') # é in UTF-8 is encoded as bytes C3 A9.
b'S\xc3\xa9bastien Chabrol'
>>> s.encode('cp1252') # é in cp1252 is encoded as byte E9.
b'S\xe9bastien Chabrol'
>>> s.encode('utf8').decode('1252') # decoding incorrectly can produce wrong characters...
'SÃ©bastien Chabrol'
>>> s.encode('cp1252').decode('utf8') # or just fail.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

If using Python 3, you can specify the encoding when you open the file (utf8 here is only a placeholder; substitute the encoding the file was actually saved with):

a = open('file.txt','r',encoding='utf8')

On Python 2 or 3, you can also use the backward-compatible syntax:

import io
a = io.open('file.txt','r',encoding='utf8')
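
As a self-contained illustration of why the encoding argument matters, this sketch writes a throwaway file in cp1252 and reads it back twice. The temporary file and the choice of cp1252 are assumptions standing in for the real file and its real encoding.

```python
import os
import tempfile

# Create a throwaway file saved as cp1252 (a stand-in for the real file).
with tempfile.NamedTemporaryFile("w", encoding="cp1252", suffix=".txt", delete=False) as tmp:
    tmp.write("Sébastien Chabrol")
    path = tmp.name

try:
    try:
        with open(path, "r", encoding="utf8") as f:
            wrong = f.read()
    except UnicodeDecodeError:
        wrong = None  # same kind of failure as in the question

    # Reading with the encoding the file was written in succeeds.
    with open(path, "r", encoding="cp1252") as f:
        right = f.read()
finally:
    os.remove(path)

print(wrong, right)
```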

If you have no idea of the encoding, you can open in binary mode to see the raw byte content and at least make a guess:

with open('file.txt', 'rb') as a:
    print(a.read())
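
For anything longer than a line or two, printing the whole file gets noisy. A sketch that returns only the bytes around the first position that fails to decode, using the start and end attributes of UnicodeDecodeError (show_bad_bytes and the sample text are hypothetical):

```python
def show_bad_bytes(raw, encoding="utf-8", width=5):
    """Return the bytes surrounding the first undecodable position, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        start = max(exc.start - width, 0)
        return raw[start:exc.end + width]
    return None

# Hypothetical sample: French text saved as cp1252, then checked as UTF-8.
sample = "un été chaud".encode("cp1252")
print(show_bad_bytes(sample, width=3))
print(show_bad_bytes(b"plain ascii"))
```

With a real file, pass the result of reading it in 'rb' mode as raw.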

Read more about Python and encodings here: https://nedbatchelder.com/text/unipain.html

gaFF 7575 silver badges12 bronze badges · Accepted Answer · 2019-05-21 13:12:10Z

