Linked Questions

390 votes
7 answers
368k views

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me: >>> from BeautifulSoup import BeautifulSoup >>&...
jkp • 81.8k
6 votes
1 answer
49k views

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python I am attempting to scrape a website using Python. I import and use the urllib2, BeautifulSoup and re modules. response =...
3 votes
2 answers
2k views

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python In html sources, there are tons of chars like "&# 58;" or "&# 46;" (have to put space between &# and numbers ...
-1 votes
1 answer
838 views

I have text like this: ‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled. I understand that #...
user2784753
1 vote
1 answer
748 views

I have an HTML text: If I'm reading lots of articles I am trying to replace ' and other such special characters into unicode '. I did rawtxt.encode('utf-8').encode('ascii','ignore'...
Harshit • 1,217
0 votes
2 answers
1k views

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python I am reading an excel XML document using Python. I end up with a lot of characters such as é That ...
2 votes
1 answer
256 views

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python Decode HTML entities in Python string? I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...
rodling • 998
0 votes
0 answers
28 views

I have these symbols, which I am quite sure are Chinese characters: 旅行時,我生病 Does anyone know what kind of unicode is ...
Carson Yau
346 votes
37 answers
620k views

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into notepad. I'd like something more ...
179 votes
15 answers
254k views

I have a string that is HTML encoded: '''<img class="size-medium wp-image-113"\ style="margin-left: 15px;" title="su1"\ src="...
rksprst • 6,651
16 votes
3 answers
20k views

I have HTML text like this: &lt;xml ... &gt; and I want to convert it to something readable: <xml ...> Is there an easy (and fast) way to do it in Python?
13 votes
2 answers
13k views

I'm having trouble displaying content, my program: #! /usr/bin/python import urllib import re url = "http://yahoo.com" pattern = '''<span class="medium item-label".*?>(.*)</span>''' ...
Vor • 35.6k
7 votes
3 answers
20k views

I would like to convert HTML entities back to their human-readable format, e.g. '&pound;' to '£', '&deg;' to '°' etc. I've read several posts regarding this question Converting html source ...
D.Q. • 547
6 votes
3 answers
2k views

I want to scrape some information off a football (soccer) web page using simple Python regexps. The problem is that players such as the first chap, ÄÄRITALO, come out as &#196;&#196;RITALO! ...
3 votes
1 answer
3k views

I'm creating a sub-class based on 'HTMLParser' to pull out html content. Whenever I have character refs such as '&nbsp;' '&amp;' '&ndash;' '&#8230;' I'd like to replace them with ...
Dan Holman
