
Post Closed as "Duplicate" by Antti Haapala

Decode HTML entities in Python string?

occurred Oct 3, 2017 at 8:44

deleted 57 characters in body; edited tags


edited Dec 16, 2010 at 6:38

Josh Lee




I'm doing some web scraping, and sites frequently use HTML entities to represent non-ASCII characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?

For example:

I get back:

&#x01ce;

which represents an "ǎ" with a tone mark. In hexadecimal, its 16-bit code point is 01ce. I want to convert the HTML entity into the value u'\u01ce'
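
To make the desired conversion concrete, here is a minimal sketch. It assumes Python 3.4 or later, where the standard library provides `html.unescape`; the question itself predates that function and uses Python 2's `u'...'` literals. The wrapper name `decode_entities` is hypothetical and only there for illustration.

```python
import html


def decode_entities(text: str) -> str:
    """Return text with HTML character references replaced by the
    characters they stand for (hypothetical helper name)."""
    # html.unescape handles numeric references such as &#x01ce;
    # as well as named ones such as &eacute;.
    return html.unescape(text)


decoded = decode_entities("&#x01ce;")
# Compare against the code point from the question, U+01CE.
print(decoded == "\u01ce")
```

Decoding is a single pass, so an escaped ampersand such as `&amp;lt;` comes back as the literal text `&lt;` rather than `<`.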


asked Sep 11, 2008 at 21:28

Cristian


Convert XML/HTML Entities into Unicode String in Python
