
Post Closed as "Duplicate" by jfs

Decode HTML entities in Python string?

occurred Feb 2, 2016 at 8:34

edited tags


edited Sep 26, 2008 at 0:53

lillq



asked Sep 10, 2008 at 0:30

Nick Fortescue


Getting international characters from a web page?

I want to scrape some information off a football (soccer) web page using simple Python regexps. The problem is that a player's name such as the first chap's, ÄÄRITALO, comes out as &#196;&#196;RITALO!
That is, the HTML uses escaped markup for the special characters, such as &#196; for Ä.

Is there a simple way of reading the HTML into the correct Python string? If it were XML/XHTML it would be easy, since the parser would do it.
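
To make the problem concrete, here is a minimal sketch of one way to decode the escapes, assuming Python 3.4 or later, where the standard library provides `html.unescape`. The `page` fragment, the `<td>` pattern, and the `extract_names` helper are hypothetical stand-ins for the real page and scraping code, which are not shown in the question.

```python
import html
import re

# Hypothetical fragment standing in for the scraped page; the real
# markup is not shown in the question.
page = "<td>&#196;&#196;RITALO</td><td>H&Auml;KKINEN</td>"


def extract_names(markup):
    """Pull the text of each <td> cell and decode its HTML escapes."""
    # Simple regexp scrape, as in the question (assumed cell pattern).
    cells = re.findall(r"<td>(.*?)</td>", markup)
    # html.unescape handles numeric (&#196;) and named (&Auml;) references.
    return [html.unescape(cell) for cell in cells]


print(extract_names(page))
```

Decoding after the regexp match, rather than before, keeps the pattern working on the raw markup and only converts the captured text.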

python html unicode parse
