Linked Questions
35 questions linked to/from Convert XML/HTML Entities into Unicode String in Python
3
votes
3
answers
2k
views
Make sequence of string.replace statements more readable
When I'm processing HTML code in Python I have to use the following code because of special characters.
line = string.replace(line, "&quot;", "\"")
line = string.replace(line, "&apos;", "'")
...
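A chain of replace calls like the one in this excerpt can usually be collapsed into a single standard-library call. The following is only a minimal sketch, assuming Python 3.4 or later (the excerpt itself uses the Python 2 `string` module); `decode_entities` and the sample string are hypothetical names chosen for illustration.

```python
import html


def decode_entities(line: str) -> str:
    """Decode named and numeric HTML character references in one pass."""
    # html.unescape handles &quot;, &amp;, &#39;, &#x27; and the other
    # HTML5 named references, so no per-entity replace call is needed.
    return html.unescape(line)


sample = "&quot;Tom &amp; Jerry&quot; said &#39;hi&#39;"
print(decode_entities(sample))
```

Because the text is decoded in a single pass, a doubly escaped input such as `&amp;lt;` becomes `&lt;` rather than `<`, which a sequence of separate replace calls can get wrong depending on their order.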
0
votes
4
answers
4k
views
Python Text Encoding
I have this text in a file - Recuérdame (notice it's a French word). When I read this file with a python script, I get this text as Recuérdame.
I read it as a unicode string. Do I need to ...
4
votes
3
answers
2k
views
Unescaping HTML entities (&#nnnn;) into plain UTF-8 [closed]
We have HTML source files which contain special characters encoded as &#nnnn; like in the word:
au&#223;ergew&#246;hnlich
We would like to convert them into plain UTF-8:
außergew...
5
votes
2
answers
2k
views
Safely remove all html code from a string in python
I've been reading many q&a on how to remove all the html code from a string using python but none was satisfying. I need a way to remove all the tags, preserve/convert the html entities and work ...
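One way to approach what this excerpt asks for (drop the tags, keep the text, convert the entities) is the standard-library HTML parser. This is a rough sketch, assuming Python 3; `TextExtractor` and `strip_tags` are hypothetical helper names, and the code is not a security sanitizer (for example, it keeps the text inside script and style elements).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and ignore tags (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode character
        # references such as &amp; before handle_data is called.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(markup: str) -> str:
    """Return the text content of markup with entities decoded."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


print(strip_tags("<p>Fish &amp; <b>chips</b> at the caf&eacute;</p>"))
```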
1
vote
1
answer
8k
views
Simple example BeautifulSoup Python
I was working on a simple example with BeautifulSoup, but I was getting weird results.
Here is my code:
soup = BeautifulSoup(page)
print soup.prettify()
stuff = soup.findAll('td', attrs={'class' : '...
0
votes
2
answers
3k
views
convert html entities to unicode(utf-8) strings in c? [duplicate]
Possible Duplicate:
How to decode HTML Entities in C?
This question is very similar to that one, but I need to do the same thing in C, not python. Here are some examples of what the function ...
1
vote
1
answer
3k
views
How to decode html hex elements?
Part of a website I'm trying to scrape has this weird block of hex values instead of characters. How can I decode this with python?
I am using urllib.request to get the page source
http://www....
0
votes
1
answer
3k
views
Python, convert HTML entities to Unicode
(Edit: I'm using Python 2.7)
(Edit 2: I have already checked Convert XML/HTML Entities into Unicode String in Python, the solutions do not work. Please do not flag this as already answered.)
I've ...
2
votes
0
answers
4k
views
use of \x27 to convert to apostrophe not working in python
I tried taking some data from the web:
Example: the name 'Schindler's list' is printed as 'Schindler&#x27;s List' straight from the web... tried asking python to print 'Schindler\x27s list' instead ...
0
votes
1
answer
2k
views
Convert &#xxxx; to normal character?
lxml.etree.parse() has generated strings in a UTF-16 file as &#xxxx;. How can I convert them back?
Opening the output file in a web browser works fine. However, I still need regular strings in the output file, too.
...
3
votes
2
answers
951
views
Decoding querystring parameter in Django view
I have a search form in my app that uses a jQuery autocomplete plugin. The plugin sends over the suggested item after running the querystring through encodeURI(q).
So an item like Johnny's sports ...
2
votes
0
answers
2k
views
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128)
Sorry for posting this again. I am getting this error UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128) when I run the following code strip_html():
...
1
vote
2
answers
1k
views
python reading unicode characters from html
I have this script, which reads the text from web page:
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page);
paragraphs = soup.findAll('p');
for p in paragraphs:
content = content+p....
0
votes
1
answer
220
views
python - possible encoding and decoding values
I'm trying to decode characters which have been encoded in the following way:
&#number;
I tried:
s.decode("utf8")
and:
s.decode("unicode-escape")
but neither seems to work.
What is the ...
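The `&#number;` form in this excerpt is an HTML numeric character reference, not a byte encoding, which is probably why codec-based `decode` calls do not touch it. Here is a small sketch of the mechanism, assuming Python 3; `decode_numeric_refs` is a hypothetical name, and in practice the standard library's `html.unescape` covers the same cases plus named entities.

```python
import re

# Matches decimal (&#223;) and hexadecimal (&#xDF;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(s: str) -> str:
    """Replace each numeric character reference with its code point."""

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: leave the text untouched.
            return match.group(0)

    return _NUMERIC_REF.sub(_replace, s)


print(decode_numeric_refs("au&#223;ergew&#246;hnlich"))
```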
2
votes
1
answer
433
views
Decoding html entities in python2
I have a string of escaped HTML markup, '&iacute;', and I want to convert it to the correct accented character 'í'.
Having read around SO, this is my attempt:
messy = '&iacute;'
print type(messy)
>>&...