Linked Questions
107 questions linked to/from Decode HTML entities in Python string?
90 votes, 6 answers, 120k views
How do I unescape HTML entities in a string in Python 3.1? [duplicate]
I have looked all around and only found solutions for python 2.6 and earlier, NOTHING on how to do this in python 3.X. (I only have access to Win7 box.)
I HAVE to be able to do this in 3.1 and ...
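For readers landing on this entry, a minimal sketch of entity decoding with the standard library. It assumes Python 3.4 or later, where `html.unescape` is available; the question itself targets 3.1, which predates that function. The `decode_entities` helper and the sample string are illustrative, not taken from the question.

```python
import html


def decode_entities(text):
    """Return text with named and numeric HTML entities decoded (Python 3.4+)."""
    return html.unescape(text)


# Hypothetical sample mixing named (&lt;, &eacute;, &amp;) and numeric (&#8364;) entities.
sample = "&lt;b&gt;Caf&eacute; &amp; Bar&lt;/b&gt; &#8364;5"
decoded = decode_entities(sample)
print(decoded)
```

The same call handles both named and numeric character references, so no lookup table is needed.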
77 votes, 10 answers, 76k views
Convert XML/HTML Entities into Unicode String in Python [duplicate]
I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For ...
14 votes, 4 answers, 17k views
HTML Entity Codes to Text [duplicate]
Does anyone know an easy way in Python to convert a string with HTML entity codes (e.g. &lt; &amp;) to a normal string (e.g. < &)?
cgi.escape() will escape strings (poorly), but there ...
8 votes, 1 answer, 23k views
How can I use Python to replace HTML escape characters? [duplicate]
Possible Duplicate:
Decode HTML entities in Python string?
I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;.
Do any Python libraries offer reliable ...
8 votes, 1 answer, 19k views
How can I change '>' to '&gt;' and '&gt;' to '>'? [duplicate]
print u'&lt;'
How can I print <
print '>'
How can I print &gt;
6 votes, 3 answers, 2k views
Getting international characters from a web page? [duplicate]
I want to scrape some information off a football (soccer) web page using simple Python regexps. The problem is that players such as the first chap, ÄÄRITALO, come out as &#196;&#196;RITALO!
...
2 votes, 1 answer, 5k views
Removing escaped entities from a String in Python [duplicate]
I've a huge csv file of tweets. I read them both into the computer and stored them in two separate dictionaries - one for negative tweets, one for positive. I wanted to read the file in and parse it ...
2 votes, 2 answers, 5k views
Replacing HTML representation to ascii using Python [duplicate]
Possible Duplicate:
Decode HTML entities in Python string?
I have parsed some HTML text. But some punctuation marks, like the apostrophe, are replaced by &#8217;. How do I revert them back to `
P.S: I am ...
-1 votes, 1 answer, 5k views
How to replace '&amp;' with just '&' [duplicate]
I'm trying to write a json to csv - using Python 3.6 - and the json contains &amp;.
How can I write just plain ampersands (&) instead of &amp;?
I've tried str.replace using a variety of ...
1 vote, 1 answer, 3k views
Convert HTML entities in plain text to characters [duplicate]
I scraped news article titles and URLs, and stored the titles and urls in a tsv file as plain text. For some reason, the scraper I use converts some characters (€ for example) into hexacode. I have ...
1 vote, 1 answer, 1k views
Requests to Handle Response Encoding [duplicate]
I am using requests to request a page. The task is very simple, but I have a problem with encoding. The page contains non-ascii, Turkish characters, but in the HTML source, the result is as below:
...
0 votes, 1 answer, 3k views
Python: Replace URLEncoded characters in String with what they represent [duplicate]
I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like:
I don't know what I ...
0 votes, 2 answers, 670 views
Parsing malformed string in python [duplicate]
Possible Duplicate:
Decode HTML entities in Python string?
I have a malformed string in Python:
Muhammad Ali&#39;s fight with Larry Holmes
where &#39; is an apostrophe.
Firstly what ...
0 votes, 1 answer, 610 views
Splitting on a lookahead [duplicate]
I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this?
my_str = 'HRC&#8217;s'
import re
print(re.split(r'.(?=&)', my_str))
My ...
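A small sketch of the two options that seem relevant here, assuming Python 3.7 or later, where `re.split` accepts a pattern that can match an empty string. The sample string is hypothetical and is not the asker's data.

```python
import html
import re

# Hypothetical sample containing two entities.
text = "a&amp;b&lt;c"

# Option 1: split on a pure lookahead, so no character is consumed before each '&'.
parts = re.split(r"(?=&)", text)
print(parts)

# Option 2: if the goal is readable text, decode the entities instead of splitting.
decoded = html.unescape(text)
print(decoded)
```

The pattern in the question, `.(?=&)`, consumes the character in front of each `&`, which a bare lookahead avoids.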
-2 votes, 1 answer, 2k views
it&#39;s instead of it's in python string [duplicate]
My python string contains &#39; instead of ' (single quotes). My current objective is to expand compound words like It's to It is, Haven't to Have not.
"This has been great for me. I&#...
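A minimal sketch of the two-step approach this question points toward: decode the entities first, then expand the contractions. It assumes Python 3.4+ for `html.unescape`; the `CONTRACTIONS` table and the `expand` helper are hypothetical and cover only the two examples named in the question.

```python
import html

# Hypothetical lookup table; a real one would list many more contractions.
CONTRACTIONS = {
    "It's": "It is",
    "Haven't": "Have not",
}


def expand(text):
    """Decode HTML entities, then replace each known contraction."""
    text = html.unescape(text)  # turns &#39; back into '
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


expanded = expand("It&#39;s fine. Haven&#39;t tried.")
print(expanded)
```

Decoding before matching means the table only has to list the plain-apostrophe spellings.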