Created on 2011-12-17 05:37 by silverbacknet, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (3) | |||
|---|---|---|---|
| msg149659 | Author: Silverback Networks (silverbacknet) | Date: 2011-12-17 05:37 | |
I've searched high and low for a way to make Python accept Apple's iOS characters, but it looks like Python does not correctly support characters wider than 16 bits. If you look at the leading byte of each group, it's \xf0, indicating a 4-byte sequence, which also indicates a character wider than 16 bits. I've tried all three "errors" arguments to decode (ignore, replace, and strict) and still get this error each time: UnicodeEncodeError: 'charmap' codec can't encode characters in position 140: character maps to <undefined>. So I have no way to proceed short of rolling my own corrected Unicode decoder. My assumption is that Python should convert a character regardless of whether it's found in the internal lookup database, or at a minimum there should be a way to signal Python to do so. Below is a sample bytes string that reproduces the problem: b'<?xml version=\'1.0\' encoding=\'utf-8\'?>\n <dict>\n <key>\n average-user-rating\n </key>\n <real>\n 1\n </real>\n <key>\n text\n </key>\n <string>\n \xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81 if you haven't checked this out yet please do. download APP TRAILERS and go to videos use promo code FREE4U and enjoy free apps courtesy of apple MERRY CHRISTMAS \xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\n </string>\n <key>\n title\n </key>\n <string>\n 4. IF YOU LOVE FREE STUFF (v1.5)\n </string>\n <key>\n type\n </key>\n <string>\n review\n </string>\n <key>\n user-name\n </key>\n <string>\n Freenesss on Dec 16, 2011\n </string>\n </dict>\n <dict>\n <key>\n average-user-rating\n </key>\n <real>\n 0.8\n </real>\n <key>\n text\n </key>\n <string>\n This application is very cool .. I hope only be added to the dictionary other languages \xe2\x80\x8b\xe2\x80\x8b..\n </string>\n <key>\n title\n </key>\n <string>\n 8. the dictionary (v1.5)\n </string>\n <key>\n type\n </key>\n <string>\n review\n </string>\n <key>\n user-name\n </key>\n <string>\n Rnaa on Dec 16, 2011\n </string>\n </dict>\n <dict>\n <key>\n average-user-rating\n </key>\n <real>\n 1\n </real>\n <key>\n text\n </key>\n <string>\n Hey I'm 13 trying to b discovered plz check my 1st video out on you tube its called speak now cover by Bekka burton thnx and I luv luv luv this app\n </string>\n <key>\n title\n </key>\n <string>\n 9. Love this app+check me out on you tube (v1.5)\n </string>\n <key>\n type\n </key>\n <string>\n review\n </string>\n <key>\n user-name\n </key>\n <string>\n Lol\xee\x84\x86 on Dec 16, 2011\n </string>\n </dict>\n' (Obviously, stripped down to not-well-formed XML, but for conversion purposes that's irrelevant.) |
|||
| msg149664 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011-12-17 07:04 | |
I tried to decode this with utf-8 and I got '...\U0001f384\U0001f385\U0001f381\U0001f384\U0001f385\U0001f381 if you haven't checked this out yet please do. download APP TRAILERS and go to videos use promo code FREE4U and enjoy free apps courtesy of apple MERRY CHRISTMAS \U0001f384\U0001f385\U0001f381\U0001f384\U0001f385\U0001f381...' How did you get that error? |
|||
| msg149665 | Author: Silverback Networks (silverbacknet) | Date: 2011-12-17 09:20 | |
I feel like a 'tard now, it was because I was trying to print() it at the same time I decoded it, which is what threw up. Well, sorry about that, next time I'll be a little more careful to separate every step before I go reporting it. |
|||
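
The resolution in the thread comes down to the difference between decoding and encoding: `bytes.decode()` handled the 4-byte UTF-8 sequences fine, and the `UnicodeEncodeError` came from `print()` encoding the resulting string for the output stream. The following is a minimal sketch of that distinction, not the reporter's actual code. It uses a short excerpt of the sample bytes, and `cp1252` is only an assumed stand-in for a charmap-based console encoding, since the thread does not say which encoding the reporter's terminal used.

```python
# Excerpt of the sample from the report: three 4-byte UTF-8 sequences
# (U+1F384, U+1F385, U+1F381) followed by plain ASCII text.
raw = b"\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81 MERRY CHRISTMAS"

# Step 1: decoding succeeds. Each sequence starting with 0xF0 becomes a
# single code point above U+FFFF.
text = raw.decode("utf-8")

# Step 2: print() has to *encode* the string for the output stream.
# cp1252 is an assumed example of a charmap codec (not named in the thread);
# it has no mapping for these code points, so encoding fails.
encode_error = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    encode_error = exc
    print("encode failed:", exc.reason)

# Step 3: an error handler on the encode side (not on decode) avoids the
# exception by escaping whatever the target codec cannot represent.
safe = text.encode("cp1252", errors="backslashreplace").decode("cp1252")
print(safe)
```

This also suggests why changing the `errors` argument to `decode()` had no effect in the report: the failure happened in a later encode step that the argument does not control.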
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:24 | admin | set | github: 57827 |
| 2011-12-17 09:20:34 | silverbacknet | set | status: open -> closed; resolution: not a bug; messages: + msg149665 |
| 2011-12-17 07:04:13 | ezio.melotti | set | messages: + msg149664 |
| 2011-12-17 05:54:02 | vstinner | set | nosy: + vstinner |
| 2011-12-17 05:37:54 | silverbacknet | set | title: bytes.deocde() UnicodeEncodeError on Apple iOS characters -> bytes.decode() UnicodeEncodeError on Apple iOS (>16-bit) characters |
| 2011-12-17 05:37:32 | silverbacknet | create | |