Linked Questions

107 questions linked to/from Decode HTML entities in Python string?
90 votes
6 answers
120k views

I have looked all around and only found solutions for Python 2.6 and earlier, NOTHING on how to do this in Python 3.X. (I only have access to Win7 box.) I HAVE to be able to do this in 3.1 and ...
77 votes
10 answers
76k views

I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type? For ...
Cristian's user avatar
  • 44.2k
14 votes
4 answers
17k views

Does anyone know an easy way in Python to convert a string with HTML entity codes (e.g. &lt; &amp;) to a normal string (e.g. < &)? cgi.escape() will escape strings (poorly), but there ...
tghw's user avatar
  • 25.4k
8 votes
1 answer
23k views

Possible Duplicate: Decode HTML entities in Python string? I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;. Do any Python libraries offer reliable ...
8 votes
1 answer
19k views

print u'&lt;' How can I print < print '>' How can I print &gt;
zjm1126's user avatar
  • 67.5k
6 votes
3 answers
2k views

I want to scrape some information off a football (soccer) web page using simple Python regexps. The problem is that players such as the first chap, ÄÄRITALO, come out as &#196;&#196;RITALO! ...
2 votes
1 answer
5k views

I've a huge csv file of tweets. I read them both into the computer and stored them in two separate dictionaries - one for negative tweets, one for positive. I wanted to read the file in and parse it ...
2 votes
2 answers
5k views

Possible Duplicate: Decode HTML entities in Python string? I have parsed some HTML text. But some punctuation marks, such as the apostrophe, are replaced by &#8217;. How to revert them back to ` P.S: I am ...
bdhar's user avatar
  • 23.4k
-1 votes
1 answer
5k views

I'm trying to write a json to csv - using Python 3.6 - and the json contains &amp;. How can I write just plain ampersands (&) instead of &amp;? I've tried str.replace using a variety of ...
John's user avatar
  • 1
1 vote
1 answer
3k views

I scraped news article titles and URLs, and stored the titles and URLs in a tsv file as plain text. For some reason, the scraper I use converts some characters (€ for example) into hex code. I have ...
1 vote
1 answer
1k views

I am using requests to request a page. The task is very simple, but I have a problem with encoding. The page contains non-ascii, Turkish characters, but in the HTML source, the result is as below: ...
0 votes
1 answer
3k views

I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like: I don&#39;t know what I ...
user3716714's user avatar
0 votes
2 answers
670 views

Possible Duplicate: Decode HTML entities in Python string? I have a malformed string in Python: Muhammad Ali&#39;s fight with Larry Holmes where &#39; is an apostrophe. Firstly what ...
Bruce's user avatar
  • 35.5k
0 votes
1 answer
610 views

I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this? my_str = 'HRC&#226;&#128;&#153;s' import re print(re.split(r'.(?=&)', my_str)) My ...
mtkilic's user avatar
  • 1,253
-2 votes
1 answer
2k views

My python string consists of &#039; instead of ' (single quotes). My current objective is to expand compound words like It's to It is, Haven't to Have not. "This has been great for me. I&#...
