UnicodeDecodeError, Invalid continuation byte again

Question 1

I can't figure out how to solve these problems once and for all. I first ran into them when I tried to write "è" (I'm Italian). After some research, I found that adding "#coding: utf-8" at the very beginning seemed to solve the problem... UNTIL NOW.

I edited some code I wrote a day or two ago that had been working perfectly. Now, whenever I run the script, it doesn't work: it never starts, and I'm stuck with this error:

SyntaxError: 'utf-8' codec can't decode byte 0xe0 in position 32: invalid continuation byte.

The problem is... position 32? Where? Which line is the problem? I don't know exactly what I added, because I made a couple of changes. Running in debug mode doesn't help either: when I "Step Into" at the very beginning of the script, the error shows up immediately (by the way, I'm using Wingware 101 as my IDE, on Windows 7). Sorry if I haven't given enough information. I could post the code, but I'm reluctant to: it's a mess written in Italian, and it might not be easy to understand what's going on.

Thank you for replies and happy holidays!

Question 2

Well, I tried deleting the line "#coding: utf-8". Now when I run the script, the program throws a bunch of Unicode errors at me, but luckily I now get some information about the lines. The problem lies in using "à" or "è" in some comments (I'm pretty sure 0xe0 is indeed "à"). I got rid of those pesky characters and now it works. But now I have to write " a' " instead of " à ", which is still a little annoying... damn Unicode errors. I hate them.
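The guess about 0xe0 can be checked directly. A quick sketch (`show_bytes` is a hypothetical helper, not from the thread): in cp1252 each accented letter is a single byte, so "à" is 0xe0, while in UTF-8 the same letter takes two bytes. A cp1252-saved "à" therefore reaches a UTF-8 decoder as a lone, invalid 0xe0.

```python
from typing import List, Tuple


def show_bytes(text: str) -> List[Tuple[str, bytes, bytes]]:
    """Pair each character with its cp1252 (Windows "ANSI") and UTF-8 encodings."""
    return [(ch, ch.encode("cp1252"), ch.encode("utf-8")) for ch in text]


for ch, cp1252_bytes, utf8_bytes in show_bytes("àè"):
    print(ch, cp1252_bytes, utf8_bytes)
# à b'\xe0' b'\xc3\xa0'
# è b'\xe8' b'\xc3\xa8'
```

Once the file's real encoding and its declaration agree, there is no need to replace "à" with "a'".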

Question 3

Reading this reference might help you: docs.python.org/3/howto/unicode.html#python-s-unicode-support. Basically, the encoding declaration you added is not in the correct form (I don't know if that's an issue). Since 3.0, Python supports Unicode by default, and I'm sure the Italian special characters are in Unicode, as it was designed for world domination: stackoverflow.com/a/2709023/2419215. You might consider switching to English comments, as that's the easiest way to go.
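On whether the declaration is in the correct form: PEP 263 gives a regular expression that an encoding comment on line 1 or 2 must match, and "#coding: utf-8" does match it. A sketch of that check (`declared_encoding` is a hypothetical helper; the regex is the one from PEP 263, and a declaration on line 2 is only honored here when line 1 is a comment or blank):

```python
import re
from typing import Optional

# Encoding-declaration regex from PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: str) -> Optional[str]:
    """Return the encoding declared on line 1 or 2 of `source`, or None."""
    for index, line in enumerate(source.splitlines()[:2]):
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        stripped = line.lstrip(" \t\f")
        # A declaration on line 2 only counts if line 1 is a comment or blank.
        if index == 0 and stripped and not stripped.startswith("#"):
            break
    return None


print(declared_encoding("#coding: utf-8\nprint('x')\n"))            # utf-8
print(declared_encoding("# -*- coding: latin-1 -*-\nprint('x')\n"))  # latin-1
print(declared_encoding("print('x')\n"))                            # None
```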

Question 4

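0xe0 followed by an ordinary ASCII byte is not valid UTF-8 (in UTF-8, 0xe0 can only start a three-byte sequence), so the file is not actually UTF-8 and you should not use that declaration. Use the charset the file is really saved in instead.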

Question 5

You could try using latin1 instead of utf-8
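A caveat on latin1, as a sketch (`decode_both` is a hypothetical helper): latin-1 assigns a character to every byte value, so decoding with it never fails, and for "à" and "è" it gives the same result as cp1252. The two differ in the 0x80 to 0x9F range, though (cp1252 puts "€" and curly quotes there), so if the editor really saves cp1252, declaring cp1252 is the more accurate choice.

```python
from typing import Tuple


def decode_both(raw: bytes) -> Tuple[str, str]:
    """Decode the same bytes as latin-1 and as cp1252 for comparison."""
    return raw.decode("latin-1"), raw.decode("cp1252")


# Identical for accented Italian letters...
print(decode_both("hàllo è".encode("cp1252")))  # ('hàllo è', 'hàllo è')
# ...but byte 0x80 is a control character in latin-1 and the euro sign in cp1252.
print(decode_both(b"\x80"))                     # ('\x80', '€')
```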

Question 6

We need to see some code, the exception, and the stack trace. Without them, it's impossible to help you.

Question 7

#coding: utf8 is a declaration that the source code is saved in UTF-8. Make sure that is actually the encoding of the source file. For example, the following file was created in Windows Notepad and saved as "ANSI", which on US Windows is the Windows-1252 encoding:

#coding: utf8
print('hàllo')

It produces the following error on Python 2.7:

 File "test.py", line 2
SyntaxError: 'utf8' codec can't decode byte 0xe0 in position 8: invalid continuation byte

As you can see, the 8th position (counting from 0) of line 2 is à, which in Windows-1252 is byte 0xe0. The wrong encoding is being used, and the error message is clear.
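The same failure can be reproduced without a file. A sketch (`utf8_error_at` is a hypothetical helper): encode the line as Windows-1252, as Notepad's "ANSI" option does on US Windows, then decode it as UTF-8. The `start` attribute of the resulting `UnicodeDecodeError` is the byte offset the message reports as "position".

```python
from typing import Optional, Tuple


def utf8_error_at(raw: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (offset, offending byte, reason) for the first UTF-8 error, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start], exc.reason
    return None


line = "print('hàllo')".encode("cp1252")  # what an "ANSI" save writes on US Windows
print(line)                               # b"print('h\xe0llo')"
offset, byte, reason = utf8_error_at(line)
print(offset, hex(byte), reason)          # 8 0xe0 invalid continuation byte
print(utf8_error_at("print('hàllo')".encode("utf-8")))  # None
```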

Either declare the correct encoding for your source file, or re-save the source file in UTF-8.
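For the second option, a minimal sketch of re-saving a file as UTF-8, assuming it is currently Windows-1252 (`resave_as_utf8` is a hypothetical helper, not part of any library). It leaves files that already decode as UTF-8 alone, because decoding non-ASCII UTF-8 as cp1252 would garble it, and it writes bytes rather than text so line endings are not translated.

```python
import tempfile
from pathlib import Path


def resave_as_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Re-encode `path` to UTF-8 in place. Return False if it already was valid UTF-8."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already UTF-8 (or pure ASCII); nothing to convert
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding either.
    text = raw.decode(source_encoding)
    # write_bytes keeps the original line endings; write_text could translate them on Windows.
    path.write_bytes(text.encode("utf-8"))
    return True


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "test.py"
    script.write_bytes("#coding: utf8\nprint('hàllo')\n".encode("cp1252"))
    print(resave_as_utf8(script))  # True
    print(script.read_bytes())     # b"#coding: utf8\nprint('h\xc3\xa0llo')\n"
```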

Note: I don't have Python 3.4 installed, but Python 3.5 gives a less clear error message:

 File "x.py", line 1
SyntaxError: encoding problem: utf8

It doesn't match your error message, but it still indicates that the file is not declared with the right encoding.

Question 8

Oh, now I see! What seemed strange to me was the position without any indication of the line... I didn't think it was referring to the n-th byte in the file.
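Since the message only gives a byte offset, a small script can point at the offending line and column instead. A sketch (`find_non_utf8_lines` is a hypothetical helper): it splits the raw bytes on newlines, which is safe because the newline byte never occurs inside a multi-byte UTF-8 sequence, and tries to decode each line on its own.

```python
from typing import List, Tuple


def find_non_utf8_lines(raw: bytes) -> List[Tuple[int, int, int]]:
    """Return (line number, byte column, offending byte) for each line that is not valid UTF-8."""
    problems = []
    # Splitting on b"\n" is safe: 0x0a never occurs inside a multi-byte UTF-8 sequence.
    for lineno, line in enumerate(raw.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((lineno, exc.start, line[exc.start]))
    return problems


sample = b"#coding: utf8\nprint('h\xe0llo')\n"
for lineno, column, byte in find_non_utf8_lines(sample):
    print(f"line {lineno}, byte {column}: 0x{byte:02x}")  # line 2, byte 8: 0xe0
```

For a real script, pass the file's bytes instead, e.g. `Path("yourscript.py").read_bytes()` with `from pathlib import Path` (the file name is a placeholder).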

Question 9

So, if I get this right, the problem is related to how Wingware 101 encodes the ".py" files. If WW uses an encoding other than UTF-8, I get this error... and that's why I had no problems using "à" in a script written in the IDLE bundled with Python... I guess.

Question 10

I finally found the problem: it was indeed Wingware's fault. By default WW uses cp1252; I switched to UTF-8 and now it works properly.

Question 11

You could have used #coding:cp1252 instead of changing the default, but UTF-8 supports all Unicode code points, so it is a better choice. Also, Python 3 assumes a file is UTF-8-encoded unless told otherwise with the #coding line, so IDLE must use UTF-8 if there was no encoding declared. I don't normally use IDLE, but at a quick glance I didn't see any option to change the encoding. I use PythonWin (bundled with the pywin32 module), and it automatically saves in the encoding declared, which is handy.
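The standard library's `tokenize.detect_encoding()` applies the same rule Python uses for source files: it returns the encoding named in a coding line, or `'utf-8'` when there is none. A sketch that pairs it with a full decode to check whether a file's bytes actually match what it declares (`check_source_encoding` is a hypothetical helper):

```python
import io
import tokenize
from typing import Tuple


def check_source_encoding(raw: bytes) -> Tuple[str, bool]:
    """Return (encoding Python would use, whether the whole file decodes with it)."""
    # detect_encoding raises SyntaxError if the first lines are unreadable and undeclared.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    try:
        raw.decode(encoding)
        return encoding, True
    except UnicodeDecodeError:
        return encoding, False


print(check_source_encoding(b"print('hello')\n"))                     # ('utf-8', True)
print(check_source_encoding(b"#coding:cp1252\nprint('h\xe0llo')\n"))  # ('cp1252', True)
# Declared UTF-8 but saved as cp1252: the second element is False.
print(check_source_encoding(b"#coding: utf8\nprint('h\xe0llo')\n"))
```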

Mark Tolonen · Accepted Answer · 2015-12-27 00:36:36Z
