I am trying to open a basic file.txt file located in the current working directory of my Python interpreter.

So I do a=open("file.txt","r")

Then I want to display its content (there's only one test line like hello world in it)

So I do content=a.read()

For reference, when I type a and press Enter, I get this:

a
<_io.TextIOWrapper name='fichier.txt' mode='r' encoding='UTF-8'>

Then I get an error I don't understand. Does anyone have an idea how to fix this?

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    contenu=a.read()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 15: invalid continuation byte
asked May 21, 2019 at 13:06
  • Can you show us what exactly your file contains? This error indicates that the file contains a byte sequence that is not valid UTF-8, specifically at byte offset 15 (counting from zero). Fix that and this should run properly. Commented May 21, 2019 at 13:10
  • Can you run file -I fichier.txt in the terminal and tell us the output? Commented May 21, 2019 at 13:14
  • It's always good to try to read a regular txt file with basic characters to see if there are some issues with the content. Commented May 21, 2019 at 13:23
  • OK, so I created a new document with the .rtf extension. The text inside is "this file is vanilla. It only contains letters and dots.". Now Python seems to read it, but it doesn't display the contents properly. Instead, I see: Commented May 21, 2019 at 14:45
  • '{\\rtf1\\ansi\\ansicpg1252\\cocoartf1671\\cocoasubrtf200\n{\\fonttbl\\f0\\fswiss\\fcharset0 Helvetica;}\n{\\colortbl;\\red255\\green255\\blue255;}\n{\*\\expandedcolortbl;;}\n\\paperw11900\\paperh16840\\margl1440\\margr1440\\vieww10800\\viewh8400\\viewkind0\n\\pard\\tx566\\tx1133\\tx1700\\tx2267\\tx2834\\tx3401\\tx3968\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural\\partightenfactor0\n\n\\f0\\fs24 \\cf0 this file is vanilla. It only contains letters and dots.}' Commented May 21, 2019 at 14:46

2 Answers

Your file is probably not encoded in UTF-8. Try:

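from chardet import detect  # third-party package: pip3 install chardet

# Read the raw bytes so that no decoding is attempted yet.
with open("file.txt", "rb") as infile:
    raw = infile.read()

# detect() returns a dict; 'encoding' holds its best guess (or None).
encoding = detect(raw)['encoding']
print(encoding)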
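
If installing chardet is not an option, a rough guess can be made with the standard library alone. The sketch below is only an illustration: the helper name guess_encoding and the candidate list are assumptions, not part of the original answer. It returns the first candidate that decodes the bytes without raising an error. Single-byte encodings such as cp1252 or mac_roman accept almost any byte sequence, so a match means "this decodes", not "this is correct".

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "mac_roman")):
    """Return the first candidate encoding that decodes raw without error, else None.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
        return name
    return None


# Demo with bytes that are valid cp1252 but not valid UTF-8.
sample = "Sébastien".encode("cp1252")
print(guess_encoding(sample))
```

To use it on a real file, read the bytes with open("file.txt", "rb") and pass them in, then check the decoded text by eye before trusting the result.
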
answered May 21, 2019 at 13:12

3 Comments

You have to pip3 install chardet first
I don't have the "chardet" package. I have really no idea how to download it and link it to my python. I'm a beginner btw
Open up Terminal.app. You are now in a bash prompt. It's similar to the python prompt in your screenshot but it's used for controlling your computer. When you install Python it installs a bash command called pip3 (the 3 is because this is Python 3) which is used for installing packages. To install a package you type pip3 install <package name>. In this case, pip3 install chardet. If that runs successfully, when you're back in the python prompt, you can do import chardet or from chardet import detect.

Your file is not encoded in UTF-8. The encoding is controlled by the tool used to create the file. Make sure you use the right encoding.

Here's an example:

>>> s = 'Sébastien Chabrol'
>>> s.encode('utf8') # é in UTF-8 is encoded as bytes C3 A9.
b'S\xc3\xa9bastien Chabrol'
>>> s.encode('cp1252') # é in cp1252 is encoded as byte E9.
b'S\xe9bastien Chabrol'
>>> s.encode('utf8').decode('1252') # decoding incorrectly can produce wrong characters...
'SÃ©bastien Chabrol'
>>> s.encode('cp1252').decode('utf8') # or just fail.
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

If using Python 3, you can provide the encoding when you open the file:

a = open('file.txt','r',encoding='utf8')
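
As a minimal sketch of what the encoding argument changes, the snippet below writes a small cp1252 file to a temporary directory (the path and the sample text are stand-ins, not the asker's real file) and reads it back two ways: with the matching encoding, and as UTF-8 with errors="replace", which avoids the exception but loses the undecodable byte.

```python
import os
import tempfile

# Create a small cp1252-encoded file to stand in for the problem file.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="cp1252") as f:
    f.write("Sébastien Chabrol")

# Reading with the matching encoding returns the original text.
with open(path, "r", encoding="cp1252") as f:
    good = f.read()

# Reading as UTF-8 with errors="replace" does not raise;
# each undecodable byte becomes U+FFFD (the replacement character).
with open(path, "r", encoding="utf-8", errors="replace") as f:
    lossy = f.read()

print(good)
print(lossy)
```

errors="replace" is useful for a quick look at a file, but the replaced characters are gone for good, so finding the real encoding is the better fix.
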

On Python 2 or 3, you can also use the backward-compatible syntax:

import io
a = io.open('file.txt','r',encoding='utf8')

If you have no idea of the encoding, you can open in binary mode to see the raw byte content and at least make a guess:

with open('file.txt', 'rb') as a:
    print(a.read())
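
To look only at the bytes around the failure instead of the whole file, the attributes of UnicodeDecodeError (start, end) give the offset of the offending bytes. The following is a sketch; the helper name describe_decode_failure and the context size are assumptions made for illustration.

```python
def describe_decode_failure(raw, encoding="utf-8", context=10):
    """Return (offset, bad_bytes, surrounding_bytes) for the first decode failure.

    Returns None if raw decodes cleanly. Hypothetical helper for illustration.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        # exc.start and exc.end delimit the bytes the codec rejected.
        return exc.start, raw[exc.start:exc.end], raw[lo:exc.end + context]
    return None


# Demo: the same cp1252 bytes that fail to decode as UTF-8 in the example above.
sample = "Sébastien Chabrol".encode("cp1252")
print(describe_decode_failure(sample))
```

The reported offset is a zero-based byte position, which is the same number the error message shows after "in position".
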

Read more about Python and encodings here: https://nedbatchelder.com/text/unipain.html

answered May 21, 2019 at 16:54
