Post closed as "Duplicate" by jfs
edited tags: lillq
Source: Nick Fortescue

Getting international characters from a web page?

I want to scrape some information off a football (soccer) web page using simple Python regexps. The problem is that players such as the first chap, ÄÄRITALO, come out as &#196;&#196;RITALO!
That is, the HTML uses escaped markup for the special characters, such as &#196;

Is there a simple way of reading the HTML into the correct Python string? If it were XML/XHTML it would be easy, as the parser would do it.
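To make the goal concrete, here is a minimal sketch of one possible approach, assuming Python 3.4 or later, where the standard library's `html.unescape` decodes both named references (`&Auml;`) and numeric ones (`&#196;`). The sample markup, the `class="player"` pattern, and the function name `extract_player_names` are hypothetical, since the real page's structure isn't shown here.

```python
import html
import re

# Hypothetical sample markup; the real page's structure is not shown in the question.
SAMPLE_HTML = (
    '<td class="player">&#196;&#196;RITALO</td>'
    '<td class="player">&Auml;&Auml;RITALO</td>'
)


def extract_player_names(page_source):
    """Return player names with HTML character references decoded."""
    # Hypothetical pattern: grab the text of each "player" table cell.
    raw_names = re.findall(r'<td class="player">(.*?)</td>', page_source)
    # html.unescape turns references such as &#196; or &Auml; into real characters.
    return [html.unescape(name) for name in raw_names]


if __name__ == "__main__":
    print(extract_player_names(SAMPLE_HTML))
```

Decoding after the regex match, rather than before, keeps the pattern working against the raw markup; whether that ordering suits the actual page is an assumption.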

