

2 of 2
deleted 57 characters in body; edited tags
Josh Lee

Convert XML/HTML Entities into Unicode String in Python

I'm doing some web scraping, and sites frequently use HTML entities to represent non-ASCII characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?

For example, I get back:

&#x01ce;

which represents an "ǎ" with a tone mark. In hexadecimal, its code point is the 16-bit value 01ce (U+01CE). I want to convert the HTML entity into the value u'\u01ce'
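To make the desired behavior concrete, here is a minimal sketch of the conversion I'm after. It assumes Python 3.4 or later, where the standard library provides html.unescape; the helper name entities_to_unicode is hypothetical and only there for illustration. It is not necessarily the best approach for older Python versions.

```python
import html


def entities_to_unicode(text):
    """Replace HTML character references in text with the characters they name.

    Hypothetical helper: a thin wrapper around the standard library's
    html.unescape (available in Python 3.4+).
    """
    return html.unescape(text)


# The hexadecimal reference from the question should decode to U+01CE.
decoded = entities_to_unicode("&#x01ce;")
print(decoded == "\u01ce")
```

Decimal references (such as &#462;) and named entities (such as &amp;) should go through the same call.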

Cristian