Linked Questions

3 votes
3 answers
2k views

When I'm processing HTML code in Python I have to use the following code because of special characters. line = string.replace(line, "&quot;", "\"") line = string.replace(line, "'", "'") ...
xralf • 3,792
0 votes
4 answers
4k views

I have this text in a file - Recuérdame (notice it's a French word). When I read this file with a python script, I get this text as Recuérdame. I read it as a unicode string. Do I need to ...
4 votes
3 answers
2k views

We have HTML source files which contain special characters encoded as &#nnnn; like in the word: außergewöhnlich We would like to convert them into plain UTF-8: außergew&#...
dagnelies • 5,345
5 votes
2 answers
2k views

I've been reading many q&a on how to remove all the html code from a string using python but none was satisfying. I need a way to remove all the tags, preserve/convert the html entities and work ...
1 vote
1 answer
8k views

I was working on a simple example with BeautifulSoup, but I was getting weird results. Here is my code: soup = BeautifulSoup(page) print soup.prettify() stuff = soup.findAll('td', attrs={'class' : '...
0 votes
2 answers
3k views

Possible Duplicate: How to decode HTML Entities in C? This question is very similar to that one, but I need to do the same thing in C, not python. Here are some examples of what the function ...
1 vote
1 answer
3k views

Part of a website I'm trying to scrape has this weird block of hex values instead of characters. How can I decode this with python? I am using urllib.request to get the page source http://www....
0 votes
1 answer
3k views

(Edit: I'm using Python 2.7) (Edit 2: I have already checked Convert XML/HTML Entities into Unicode String in Python, the solutions do not work. Please do not flag this as already answered.) I've ...
GrantD71 • 1,885
2 votes
0 answers
4k views

I tried taking some data from the web. Example: the name 'Schindler's list' is printed as 'Schindler&#x27s List' straight from the web... tried asking python to print 'Schindler\x27s list' instead ...
melony • 75
0 votes
1 answer
2k views

lxml.etree.parse() has generated strings in a UTF-16 file as &#xxxx; How can I convert them back? Opening the output file in a web browser is fine. However, I still need regular strings in the output file, too. ...
3 votes
2 answers
951 views

I have a search form in my app that uses a jQuery autocomplete plugin. The plugin sends over the suggested item after running the querystring through encodeURI(q). So an item like Johnny's sports ...
Abid A • 7,866
2 votes
0 answers
2k views

Sorry for posting this again. I am getting this error UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128) when I run the following code strip_html(): ...
mchangun • 10.6k
1 vote
2 answers
1k views

I have this script, which reads the text from web page: page = urllib2.urlopen(url).read() soup = BeautifulSoup(page); paragraphs = soup.findAll('p'); for p in paragraphs: content = content+p....
torayeff • 9,732
0 votes
1 answer
220 views

I'm trying to decode characters which have been encoded in the following way: &#number; I tried: s.decode("utf8") and: s.decode("unicode-escape") but neither seems to work. What is the ...
tomermes • 23.5k
2 votes
1 answer
433 views

I have a string of escaped HTML markup, 'í', and I want to convert it to the correct accented character 'í'. Having read around SO, this is my attempt: messy = 'í' print type(messy) >>&...
