Linked Questions

35 questions linked to/from Convert XML/HTML Entities into Unicode String in Python

3 votes

3 answers

2k views

Make sequence of string.replace statements more readable

When I'm processing HTML code in Python I have to use the following code because of special characters. line = string.replace(line, """, "\"") line = string.replace(line, "'", "'") ...

xralf

3,792

asked Jul 31, 2011 at 11:27
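
For the replace chain in the question above, here is a minimal Python 3 sketch. The `string.replace` function used there exists only in Python 2. If every HTML entity should be decoded, the standard library's `html.unescape` already handles that. If only a fixed set should be touched, a lookup table plus one regex pass keeps the code readable. `ENTITY_TABLE` and `replace_entities` are hypothetical names, and the table contents are an assumption.

```python
import html
import re

# Hypothetical table: only these entities are replaced (an assumption).
ENTITY_TABLE = {
    "&quot;": '"',
    "&apos;": "'",
    "&lt;": "<",
    "&gt;": ">",
    "&amp;": "&",
}

# One alternation of the escaped keys, so each entity is replaced in a single
# left-to-right pass and already-replaced text is never scanned again.
_ENTITY_RE = re.compile("|".join(re.escape(key) for key in ENTITY_TABLE))


def replace_entities(line: str) -> str:
    """Replace only the entities listed in ENTITY_TABLE."""
    return _ENTITY_RE.sub(lambda m: ENTITY_TABLE[m.group(0)], line)


if __name__ == "__main__":
    line = "&quot;a&quot; &amp;lt;"
    print(replace_entities(line))  # "a" &lt;
    print(html.unescape(line))     # "a" &lt;
```

The single pass also avoids an ordering bug in chained replaces. If `&amp;` were replaced first, `&amp;lt;` would become `&lt;` and then `<`.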

0 votes

4 answers

4k views

Python Text Encoding

I have this text in a file - Recuérdame (notice it's a French word). When I read this file with a python script, I get this text as Recuérdame. I read it as a unicode string. Do I need to ...

Srikar Appalaraju

74.1k

asked Dec 16, 2010 at 6:34
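
The symptom in the question above usually means the file's UTF-8 bytes were decoded with a single-byte codec. This minimal sketch assumes the file is UTF-8 and that the wrong codec was Latin-1. `read_text` is a hypothetical helper.

```python
def read_text(path: str, encoding: str = "utf-8") -> str:
    """Read a text file with an explicit encoding instead of the platform default."""
    with open(path, encoding=encoding) as f:
        return f.read()


original = "Recuérdame"
utf8_bytes = original.encode("utf-8")   # b'Recu\xc3\xa9rdame'

# Decoding UTF-8 bytes as Latin-1 produces the typical garbled form.
garbled = utf8_bytes.decode("latin-1")  # 'RecuÃ©rdame'

# Decoding with the codec that wrote the bytes gives the right text.
correct = utf8_bytes.decode("utf-8")

# Repairing text that was already mis-decoded: re-encode with the wrong codec,
# then decode as UTF-8 (only valid if the wrong codec really was Latin-1).
repaired = garbled.encode("latin-1").decode("utf-8")
```

Passing `encoding=` explicitly avoids depending on the platform default. The `repaired` round trip is only a fallback for text that has already been mis-decoded.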

4 votes

3 answers

2k views

Unescaping HTML entities (&#nnnn;) into plain UTF-8 [closed]

We have HTML source files which contain special characters encoded as &#nnnn; like in the word: außergewöhnlich We would like to convert them into plain UTF-8: außergew&#...

dagnelies

5,345

asked Jun 22, 2010 at 13:05
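
This sketch for the question above assumes the goal is to decode only numeric references (`&#nnnn;` and the hex form `&#xhhhh;`) in HTML source. Markup-significant characters stay escaped, so the result is still valid HTML. `decode_numeric_refs` and `convert_file` are hypothetical helpers. For plain text rather than HTML source, the standard library's `html.unescape` decodes every reference at once.

```python
import re

# Decimal (&#223;) and hexadecimal (&#xDF;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Code points that must stay escaped for the output to remain valid HTML.
_KEEP_ESCAPED = {0, ord("<"), ord(">"), ord("&"), ord('"'), ord("'")}


def decode_numeric_refs(text: str) -> str:
    """Turn &#nnnn; / &#xhhhh; into characters, leaving unsafe ones alone."""
    def repl(m):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if cp in _KEEP_ESCAPED or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return m.group(0)  # leave invalid or markup-significant references
        return chr(cp)
    return _NUMERIC_REF.sub(repl, text)


def convert_file(src: str, dst: str, src_encoding: str = "utf-8") -> None:
    """Hypothetical helper: rewrite an HTML file as UTF-8 with references decoded."""
    with open(src, encoding=src_encoding) as f:
        content = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(decode_numeric_refs(content))
```

If the pages declare a different charset in a `<meta>` tag, that declaration would also need updating to UTF-8.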

5 votes

2 answers

2k views

Safely remove all html code from a string in python

I've been reading many q&a on how to remove all the html code from a string using python but none was satisfying. I need a way to remove all the tags, preserve/convert the html entities and work ...

Arjuna Del Toso

asked Apr 9, 2013 at 0:37
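
Here is one standard-library approach to the question above. It assumes "remove" means keeping only text nodes. With `convert_charrefs=True`, `html.parser.HTMLParser` passes already-decoded text to `handle_data`, so entities come out as characters. `_TextExtractor` and `strip_tags` are hypothetical names. Skipping `<script>` and `<style>` contents is a design choice, not something the question specifies.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; convert_charrefs=True decodes &amp;, &#233;, etc."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def strip_tags(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Adjacent block elements are joined without whitespace, so `<p>a</p><p>b</p>` becomes `ab`. The result is plain text and must be escaped again, for example with `html.escape`, before it is inserted into HTML.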

1 vote

1 answer

8k views

Simple example BeautifulSoup Python

I was working on a simple example with BeautifulSoup, but I was getting weird results. Here is my code: soup = BeautifulSoup(page) print soup.prettify() stuff = soup.findAll('td', attrs={'class' : '...

James Hallen

5,084

asked May 21, 2013 at 21:12

0 votes

2 answers

3k views

convert html entities to unicode(utf-8) strings in c? [duplicate]

Possible Duplicate: How to decode HTML Entities in C? This question is very similar to that one, but I need to do the same thing in C, not python. Here are some examples of what the function ...

Kim Stebel

42.1k

asked Sep 12, 2009 at 15:09

1 vote

1 answer

3k views

How to decode html hex elements?

Part of a website I'm trying to scrape has this weird block of hex values instead of characters. How can I decode this with python? I am using urllib.request to get the page source http://www....

Natko Kraševac

asked Apr 7, 2015 at 18:48
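
This sketch assumes the "hex values" in the question above are hexadecimal character references such as `&#x48;`. It decodes the response bytes first, then lets `html.unescape` resolve the references. `fetch_text` is a hypothetical helper, and the UTF-8 fallback is an assumption.

```python
import html
import urllib.request


def fetch_text(url: str) -> str:
    """Download a page and return its text with character references decoded."""
    with urllib.request.urlopen(url) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        page = resp.read().decode(charset)
    return html.unescape(page)


if __name__ == "__main__":
    print(html.unescape("&#x48;&#x65;&#x6c;&#x6c;&#x6f;"))  # Hello
```

Unescaping a whole page also decodes `&lt;` and `&amp;`. It is therefore safer to apply it to extracted text than to markup you plan to parse again.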

0 votes

1 answer

3k views

Python, convert HTML entities to Unicode

(Edit: I'm using Python 2.7) (Edit 2: I have already checked Convert XML/HTML Entities into Unicode String in Python, the solutions do not work. Please do not flag this as already answered.) I've ...

GrantD71

1,885

asked Oct 8, 2013 at 2:22
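
For Python 2.7 code that should also run on Python 3, this sketch uses a version-tolerant import. Python 3.4+ ships `html.unescape`, while Python 2.7's `HTMLParser.HTMLParser` has an undocumented `unescape` method. Relying on that undocumented method is an assumption worth testing on the interpreter in use.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    # Undocumented in 2.7; pass unicode objects to get unicode back.
    unescape = HTMLParser().unescape

# Named, decimal and escaped-ampersand references in one string.
text = unescape(u"caf&eacute; &#8211; &amp;")
```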

2 votes

0 answers

4k views

use of \x27 to convert to apostrophe not working in python

I tried taking some data from the web: Example:the name 'Schindler's list' is printed as 'Schindler&#x27s List' straight from the web... tried asking python to print 'Schindler\x27s list' instead ...

melony

asked May 19, 2012 at 7:01
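
In the question above, `'\x27'` in Python source is just another spelling of `"'"`, so it cannot change text that literally contains `&#x27`. The scraped reference also lacks its closing semicolon. `html.unescape` (Python 3.4+) follows the HTML5 rules and still decodes numeric references without a semicolon. This small check shows both points:

```python
import html

scraped = "Schindler&#x27s List"  # as it arrived from the web, no ';'

# '\x27' is the apostrophe itself; it only matters inside Python literals.
assert "\x27" == "'"

title = html.unescape(scraped)
print(title)  # Schindler's List
```

A missing semicolon only decodes as intended when the next character is not a hex digit. For example, `&#x27a` would be read as a different code point.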

0 votes

1 answer

2k views

Convert &#xxxx; to normal character?

lxml.etree.parse() has generated strings in a UTF-16 file as &#xxxx;. How can I convert them back? Opening the output file in a web browser is fine. However, I still need regular strings in the output file, too. ...

Bonn

asked Sep 18, 2016 at 10:19
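
The `&#xxxx;` output in the question above is likely a serialization choice rather than damage. When a tree is serialized with the default ASCII encoding, non-ASCII characters are written as character references. This sketch uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra packages. lxml's `etree.tostring` likewise accepts `encoding="unicode"`. The `write_utf8` helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<title>Recuérdame</title>")

# Default encoding is US-ASCII, so é is written as a &#233; reference.
ascii_bytes = ET.tostring(root)

# encoding="unicode" returns a str containing the characters themselves.
as_text = ET.tostring(root, encoding="unicode")


def write_utf8(tree: ET.ElementTree, path: str) -> None:
    """Hypothetical helper: write the file in an encoding that can hold every character."""
    tree.write(path, encoding="utf-8", xml_declaration=True)
```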

3 votes

2 answers

951 views

Decoding querystring parameter in Django view

I have a search form in my app that uses a jQuery autocomplete plugin. The plugin sends over the suggested item after running the querystring through encodeURI(q). So an item like Johnny's sports ...

Abid A

7,866

asked Sep 23, 2012 at 18:22
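
This sketch for the question above assumes the value reaching the view may carry two layers of encoding: percent-encoding from `encodeURI` and HTML escaping such as `&#39;`. Which one actually applies is an assumption, since the question is truncated. Django's `request.GET` already percent-decodes query parameters once, so `unquote` is only needed if the value was encoded twice. `clean_query_value` is a hypothetical helper.

```python
import html
from urllib.parse import unquote


def clean_query_value(raw: str) -> str:
    """Hypothetical helper: undo percent-encoding, then HTML character references."""
    return html.unescape(unquote(raw))


# In a Django view (sketch): request.GET has already been percent-decoded,
# so usually only html.unescape(request.GET.get("q", "")) is needed.

print(clean_query_value("Johnny%27s%20sports"))  # Johnny's sports
print(clean_query_value("Johnny&#39;s sports"))  # Johnny's sports
```

Decoding twice is not free: a value that legitimately contains `%` or `&` would be altered. Apply only the layers that were actually added.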

2 votes

0 answers

2k views

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128)

Sorry for posting this again. I am getting this error UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128) when I run the following code strip_html(): ...

mchangun

10.6k

asked Oct 28, 2013 at 17:39
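
The byte in the error above, `0xe2`, is the first byte of a three-byte UTF-8 sequence. Typographic punctuation such as the right single quotation mark U+2019 is encoded as `e2 80 99`. This small sketch reproduces the error and the fix, assuming the input is UTF-8. In Python 2 the ASCII decode typically happens implicitly when a byte string is mixed with a `unicode` object.

```python
raw = "It\u2019s".encode("utf-8")  # b'It\xe2\x80\x99s'

error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # 'ascii' codec can't decode byte 0xe2 in position 2 ...

# Decoding with the codec that produced the bytes succeeds.
text = raw.decode("utf-8")
```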

1 vote

2 answers

1k views

python reading unicode characters from html

I have this script, which reads the text from web page: page = urllib2.urlopen(url).read() soup = BeautifulSoup(page); paragraphs = soup.findAll('p'); for p in paragraphs: content = content+p....

torayeff

9,732

asked May 14, 2012 at 18:04
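
For the question above, one way to avoid garbled characters is to decode the downloaded bytes with the charset the server declares before parsing. This sketch assumes the `Content-Type` header value is available. With `urllib.request`, `response.headers.get_content_charset()` returns the same value directly. `decode_body` is a hypothetical helper, and the UTF-8 fallback is an assumption.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode response bytes using the charset named in a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    charset = msg.get_content_charset() or default
    return raw.decode(charset, errors="replace")


print(decode_body(b"Recu\xe9rdame", "text/html; charset=ISO-8859-1"))  # Recuérdame
```

Collecting the paragraph strings in a list and joining them once with `"\n".join(...)` is the idiomatic alternative to repeated `content = content + ...` concatenation.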

0 votes

1 answer

220 views

python - possible encoding and decoding values

I'm trying to decode characters which have been encoded in the following way: &#number; I tried: s.decode("utf8") and: s.decode("unicode-escape") but neither seems to work. What is the ...

tomermes

23.5k

asked May 11, 2013 at 9:39
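
In the attempts in the question above, `utf8` and `unicode_escape` are codecs for bytes and backslash escapes, so neither one touches HTML references like `&#233;`. This short comparison shows the difference:

```python
import html

s = "&#233;t&#233;"

# unicode_escape only interprets backslash escapes such as \u00e9;
# an HTML character reference passes through unchanged.
via_codec = s.encode("ascii").decode("unicode_escape")

# html.unescape understands &#nnn;, &#xhhh; and named references.
via_html = html.unescape(s)

print(via_codec)  # &#233;t&#233;
print(via_html)   # été
```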

2 votes

1 answer

433 views

Decoding html entities in python2

I have a string of escaped HTML markup, 'í', and I want to convert it to the correct accented character 'í'. Having read around SO, this is my attempt: messy = 'í' print type(messy) >>&...

user2958776

asked Nov 6, 2013 at 3:49
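
For code like the question above, named references can also be resolved by hand through the entity table. In Python 3 it is `html.entities.name2codepoint`; in Python 2 it is `htmlentitydefs.name2codepoint`, and `unichr` replaces `chr`. This is a Python 3 sketch. `named_entity` and `decode_named` are hypothetical helpers. Leaving unknown names untouched is a design choice.

```python
import re
from html.entities import name2codepoint

_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_entity(name: str) -> str:
    """Return the character for a named entity, e.g. 'iacute' -> 'í'."""
    return chr(name2codepoint[name])


def decode_named(text: str) -> str:
    """Replace known named entities; unknown ones are left as they are."""
    def repl(m):
        name = m.group(1)
        return named_entity(name) if name in name2codepoint else m.group(0)
    return _NAMED.sub(repl, text)


print(decode_named("Mart&iacute;n &foo;"))  # Martín &foo;
```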
