Convert XML/HTML Entities into Unicode String in Python
I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For example:
I get back:
ǎ
which represents an "ǎ" with a tone mark. In binary, this is represented as the 16 bit 01ce. I want to convert the html entity into the value u'\u01ce'
Cristian
- 44.2k
- 28
- 90
- 99
default