I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For example:
I get back: & #x01ce; (There is no space. I put that so Markdown won't interpret it) which
ǎ
which represents an "a""ǎ" with a tone mark. In binary, this is represented as the 16 bit 01ce. I want to convert the html entity into the value u'\u01ce'u'\u01ce'
I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For example:
I get back: & #x01ce; (There is no space. I put that so Markdown won't interpret it) which represents an "a" with a tone mark. In binary, this is represented as the 16 bit 01ce. I want to convert the html entity into the value u'\u01ce'
I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For example:
I get back:
ǎ
which represents an "ǎ" with a tone mark. In binary, this is represented as the 16 bit 01ce. I want to convert the html entity into the value u'\u01ce'
Convert XML/HTML Entities into Unicode String in Python
I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For example:
I get back: & #x01ce; (There is no space. I put that so Markdown won't interpret it) which represents an "a" with a tone mark. In binary, this is represented as the 16 bit 01ce. I want to convert the html entity into the value u'\u01ce'