Linked Questions
35 questions linked to/from Convert XML/HTML Entities into Unicode String in Python
390 votes, 7 answers, 368k views
Decode HTML entities in Python string?
I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:
>>> from BeautifulSoup import BeautifulSoup
>>&...
6 votes, 1 answer, 49k views
How do I get rid of characters like ' that appear instead of apostrophes? [duplicate]
Possible Duplicate:
Convert XML/HTML Entities into Unicode String in Python
I am attempting to scrape a website using Python. I import and use the urllib2, BeautifulSoup and re modules.
response =...
3 votes, 2 answers, 2k views
How do I convert characters like "&#58;" to ":" in python? [duplicate]
Possible Duplicate:
Convert XML/HTML Entities into Unicode String in Python
In html sources, there are tons of chars like "&# 58;" or "&# 46;" (have to put space between &# and numbers ...
-1 votes, 1 answer, 838 views
Convert ascii characters to normal text [duplicate]
I have text like this:
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled.
I understand that #...
1 vote, 1 answer, 748 views
encoding/decoding unicode and utf-8 : Python [duplicate]
I have a html text : If I&#39;m reading lots of articles
I am trying to replace &#39; and other such special characters into unicode '. I did
rawtxt.encode('utf-8').encode('ascii','ignore'...
0 votes, 2 answers, 1k views
Python, XML, é type encodings [duplicate]
Possible Duplicate:
Convert XML/HTML Entities into Unicode String in Python
I am reading an excel XML document using Python. I end up with a lot of characters such as
é
That ...
2 votes, 1 answer, 256 views
Unicode encoding in python [duplicate]
Possible Duplicate:
Convert XML/HTML Entities into Unicode String in Python
Decode HTML entities in Python string?
I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...
0 votes, 0 answers, 28 views
Anyone know what kind of unicode is this with &# and semicolon please? how to turn it into chinese characters string in Python please? [duplicate]
I have these symbols which I am quite sure they are chinese characters.
旅行時,我生病
Please anyone know what kind of unicode is ...
346 votes, 37 answers, 620k views
Extracting text from HTML file using Python
I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into notepad.
I'd like something more ...
179 votes, 15 answers, 254k views
How do I perform HTML decoding/encoding using Python/Django?
I have a string that is HTML encoded:
'''<img class="size-medium wp-image-113"\
style="margin-left: 15px;" title="su1"\
src="...
16 votes, 3 answers, 20k views
Replace html entities with the corresponding utf-8 characters in Python 2.6
I have a html text like this:
&lt;xml ... &gt;
and I want to convert it to something readable:
<xml ...>
Any easy (and fast) way to do it in Python?
13 votes, 2 answers, 13k views
Change ' into normal character
I'm having trouble displaying content,
my program:
#! /usr/bin/python
import urllib
import re
url = "http://yahoo.com"
pattern = '''<span class="medium item-label".*?>(.*)</span>'''
...
7 votes, 3 answers, 20k views
HTMLParser.HTMLParser().unescape() doesn't work
I would like to convert HTML entities back to their human-readable format, e.g. '&pound;' to '£', '&deg;' to '°' etc.
I've read several posts regarding this question
Converting html source ...
6 votes, 3 answers, 2k views
Getting international characters from a web page? [duplicate]
I want to scrape some information off a football (soccer) web page using simple python regexp's. The problem is that players such as the first chap, ÄÄRITALO, come out as &#196;&#196;RITALO!
...
3 votes, 1 answer, 3k views
Decoding html content and HTMLParser
I'm creating a sub-class based on 'HTMLParser' to pull out html content. Whenever I have character refs such as
'&nbsp;' '&amp;' '&ndash;' '&hellip;'
I'd like to replace them with ...
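
The questions listed here all ask for the same kind of conversion: turning named or numeric character references back into the characters they stand for. A minimal sketch, assuming Python 3.4 or later, where the standard library provides html.unescape. The helper name decode_entities is hypothetical and only wraps that call; it is not taken from any of the linked questions or their answers.

```python
import html


def decode_entities(text: str) -> str:
    """Return text with HTML character references replaced by characters.

    Hypothetical helper for illustration: it simply delegates to
    html.unescape, which handles named references (&amp;, &pound;),
    decimal references (&#58;) and hexadecimal references (&#x27;).
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Sample inputs modelled on the reference styles mentioned in the listing.
    for raw in ["&#58;", "&pound;5 &amp; 30&deg;", "&lt;xml ... &gt;"]:
        print(repr(raw), "->", repr(decode_entities(raw)))
```

Several of the linked questions target Python 2, where html.unescape does not exist, so this sketch would not run there unchanged.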