This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: bytes.decode() UnicodeEncodeError on Apple iOS (>16-bit) characters
Type: behavior
Stage:
Components: Unicode
Versions: Python 3.3
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, silverbacknet, vstinner
Priority: normal
Keywords:

Created on 2011-12-17 05:37 by silverbacknet, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (3)
msg149659 - Author: Silverback Networks (silverbacknet) Date: 2011-12-17 05:37
I've searched high and low for a way to make Python accept Apple's iOS characters, but it looks like Python does not correctly support characters wider than 16 bits. The leading byte of each group is \xf0, which indicates a 4-byte UTF-8 sequence and therefore a code point that does not fit in 16 bits. I've tried all three "errors" arguments to decode (ignore, replace, and strict) and still get this error each time:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 140: character maps to <undefined>
So I have no way to proceed short of rolling my own corrected unicode decoder. My assumption is that Python should convert a character regardless of whether it's found in the internal lookup database, or at a minimum there should be a way to signal Python to do so.
Below is a sample bytes string that will reproduce the problem:
b'<?xml version=\'1.0\' encoding=\'utf-8\'?>\n <dict>\n <key>\n average-user-rating\n </key>\n <real>\n 1\n </real>\n <key>\n text\n </key>\n <string>\n \xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81 if you haven&#39;t checked this out yet please do. download APP TRAILERS and go to videos use promo code FREE4U and enjoy free apps courtesy of apple MERRY CHRISTMAS \xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\n </string>\n <key>\n title\n </key>\n <string>\n 4. IF YOU LOVE FREE STUFF (v1.5)\n </string>\n <key>\n type\n </key>\n <string>\n review\n </string>\n <key>\n user-name\n </key>\n <string>\n Freenesss on Dec 16, 2011\n </string>\n </dict>\n <dict>\n <key>\n average-user-rating\n </key>\n <real>\n 0.8\n </real>\n <key>\n text\n </key>\n <string>\n This application is very cool .. I hope only be added to the dictionary other languages \xe2\x80\x8b\xe2\x80\x8b..\n </string>\n <key>\n title\n </key>\n <string>\n 8. the dictionary (v1.5)\n </string>\n <key>\n type\n </key>\n <string>\n review\n </string>\n <key>\n user-name\n </key>\n <string>\n Rnaa on Dec 16, 2011\n </string>\n </dict>\n <dict>\n <key>\n average-user-rating\n </key>\n <real>\n 1\n </real>\n <key>\n text\n </key>\n <string>\n Hey I&#39;m 13 trying to b discovered plz check my 1st video out on you tube its called speak now cover by Bekka burton thnx and I luv luv luv this app\n </string>\n <key>\n title\n </key>\n <string>\n 9. Love this app+check me out on you tube (v1.5)\n </string>\n <key>\n type\n </key>\n <string>\n review\n </string>\n <key>\n user-name\n </key>\n <string>\n Lol\xee\x84\x86 on Dec 16, 2011\n </string>\n </dict>\n'
(Obviously, stripped down to not-well-formed XML, but for conversion purposes that's irrelevant.)
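The point about the \xf0 lead byte can be checked directly. The following is a minimal sketch, not part of the original report: the helper name `utf8_sequence_length` is hypothetical, and the sample is the first four-byte group from the bytes string above. It shows that the lead byte announces a 4-byte sequence and that the decoded code point lies above U+FFFF.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, given its lead byte.

    Hypothetical helper for illustration; it only inspects the lead byte
    and does not validate the continuation bytes.
    """
    if lead_byte < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4  # code points U+10000 and above
    raise ValueError("not a valid UTF-8 lead byte: 0x%02x" % lead_byte)


# First four-byte group from the sample in the report.
sample = b"\xf0\x9f\x8e\x84"

length = utf8_sequence_length(sample[0])
code_point = ord(sample.decode("utf-8"))

print(length)           # 4
print(hex(code_point))  # 0x1f384
```

Decoding succeeds here, which suggests the bytes themselves are valid UTF-8 and that the failure comes from a later step.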
msg149664 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2011-12-17 07:04
I tried to decode this with utf-8 and I got
'...\U0001f384\U0001f385\U0001f381\U0001f384\U0001f385\U0001f381 if you haven&#39;t checked this out yet please do. download APP TRAILERS and go to videos use promo code FREE4U and enjoy free apps courtesy of apple MERRY CHRISTMAS \U0001f384\U0001f385\U0001f381\U0001f384\U0001f385\U0001f381...'
How did you get that error?
msg149665 - Author: Silverback Networks (silverbacknet) Date: 2011-12-17 09:20
I feel foolish now: I was calling print() on the result in the same step as decoding it, and it was print(), not decode(), that raised the exception. Sorry about that. Next time I'll be more careful to separate every step before reporting.
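The distinction between decoding and printing can be reproduced without a console. This is a minimal sketch under one stated assumption: that the reporter's terminal used a single-byte charmap encoding such as cp1252 (the thread does not say which one). The names `data`, `text`, `message` and `safe` are illustrative. Decoding the UTF-8 bytes succeeds; the UnicodeEncodeError appears only when the resulting string is encoded again for output, which is what print() does implicitly.

```python
# Shortened excerpt of the bytes from the report.
data = b"\xf0\x9f\x8e\x84 MERRY CHRISTMAS \xf0\x9f\x8e\x81"

# Step 1: decoding is fine; the input is valid UTF-8.
text = data.decode("utf-8")

# Step 2: encoding for output fails, because cp1252 (assumed console
# encoding) has no mapping for code points such as U+1F384.
message = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    message = str(exc)

# One possible workaround: choose an error handler when encoding for output,
# so unmappable characters become escape sequences instead of raising.
safe = text.encode("cp1252", errors="backslashreplace").decode("cp1252")
print(safe)
```

The "errors" argument passed to decode() has no effect on this failure, since the exception is raised on the encode side. Other options, depending on the environment, include setting the PYTHONIOENCODING environment variable or, on Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") before printing.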
History
Date                 User           Action  Args
2022-04-11 14:57:24  admin          set     github: 57827
2011-12-17 09:20:34  silverbacknet  set     status: open -> closed; resolution: not a bug; messages: + msg149665
2011-12-17 07:04:13  ezio.melotti   set     messages: + msg149664
2011-12-17 05:54:02  vstinner       set     nosy: + vstinner
2011-12-17 05:37:54  silverbacknet  set     title: bytes.deocde() UnicodeEncodeError on Apple iOS characters -> bytes.decode() UnicodeEncodeError on Apple iOS (>16-bit) characters
2011-12-17 05:37:32  silverbacknet  create
