
UTF-8 on Windows in Python

I have an HTML file to read, parse, etc. It's encoded in Unicode (I saw that in Notepad), but when I tried

infile = open("path", "r") 
infile.read()

it fails with the famous error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position xx: character maps to <undefined>
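If the original file really is what Notepad calls "Unicode" (UTF-16 little-endian preceded by a byte order mark), the encoding has to be named when opening it rather than left to the platform default. A minimal Python 3 sketch under that assumption; the temporary file and sample markup stand in for the real HTML file:

```python
import codecs
import os
import tempfile

sample = "<html><body>caf\u00e9</body></html>"

# Assumption: Notepad's "Unicode" option means UTF-16 LE preceded by a BOM
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + sample.encode("utf-16-le"))

# The "utf-16" codec reads the BOM to pick the byte order and drops it
with open(path, "r", encoding="utf-16") as f:
    text = f.read()

os.remove(path)
print(text == sample)  # True
```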

So, as a test, I copied and pasted the contents of the file into a new one, saved it as UTF-8, and then tried to open it with codecs like this:

import codecs
inFile = codecs.open("path", "r", encoding="utf-8")
outputStream = inFile.read()

But I get this error message:

UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>

I really don't understand, because I created this file in UTF-8.
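The u'\ufeff' at position 0 is the byte order mark that Notepad writes at the start of files saved as "UTF-8". The plain "utf-8" codec keeps it as a character, while "utf-8-sig" removes it if present. A minimal Python 3 sketch, using an in-memory byte string in place of the real file:

```python
import codecs

# What Notepad writes for "UTF-8": the UTF-8 BOM followed by the encoded text
raw = codecs.BOM_UTF8 + "<html>caf\u00e9</html>".encode("utf-8")

with_bom = raw.decode("utf-8")         # BOM survives as the character U+FEFF
without_bom = raw.decode("utf-8-sig")  # BOM is stripped if present

print(repr(with_bom[0]))     # '\ufeff'
print(repr(without_bom[0]))  # '<'
```

For a file on disk, the same codec name works with open(path, encoding="utf-8-sig") in Python 3 or codecs.open(path, "r", encoding="utf-8-sig") in Python 2.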

  • If Notepad says "Unicode" (as the OP said) it means UTF-16. The other encodings are usually called "ANSI" (cp1252 and friends) and "UTF-8" (which is UTF-8 with BOM). Commented Sep 24, 2015 at 22:05
  • @roeland: yes. "it's encode on unicode (I saw it with the notepad)" from the question can be interpreted that way. The issue with that theory is that codecs.open("path", encoding='utf-8').read() returns text starting with u'\ufeff', i.e., utf-8-sig is more likely. The 'utf-8' codec fails on both BOM_UTF16_BE and BOM_UTF16_LE. Commented Sep 24, 2015 at 22:19
  • Yeah, the question is a bit confusing as it involves two files, the original file in "Unicode", and the file he re-saved as "UTF-8". Commented Sep 24, 2015 at 22:23
  • @roeland: anyway, the issue is a UnicodeEncodeError, i.e., it happens when the OP tries to print Unicode text to the Windows console. Commented Sep 24, 2015 at 22:26
  • Aha, I see. That was subtle Commented Sep 24, 2015 at 22:32
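The comments point out that the UnicodeEncodeError comes from printing, not reading: encoding text to a legacy Windows code page such as cp1252 goes through Python's 'charmap' codec, which has no mapping for U+FEFF. A minimal Python 3 sketch that reproduces the message without a console, assuming cp1252 as the console code page; safe_print is a hypothetical helper, and errors="replace" trades unencodable characters for "?" rather than fixing the console itself:

```python
import io

text = "\ufeff<p>caf\u00e9 \u2713</p>"

# Encoding to a legacy code page fails on U+FEFF (the BOM character),
# giving the same "'charmap' codec can't encode" error as in the question.
error = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

def safe_print(s, stream, encoding="cp1252"):
    # Hypothetical helper: drop a leading BOM, then substitute "?" for any
    # character the target code page cannot represent instead of raising.
    s = s.lstrip("\ufeff")
    stream.write(s.encode(encoding, errors="replace").decode(encoding) + "\n")

buf = io.StringIO()
safe_print(text, buf)  # buf now holds "<p>café ?</p>\n"
```

Stripping the BOM at read time with "utf-8-sig" avoids the U+FEFF failure at its source; the replace fallback only matters for characters the console code page genuinely lacks.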

