Linked Questions

107 questions linked to/from Decode HTML entities in Python string?
90 votes
6 answers
120k views

I have looked all around and only found solutions for python 2.6 and earlier, NOTHING on how to do this in python 3.X. (I only have access to Win7 box.) I HAVE to be able to do this in 3.1 and ...
77 votes
10 answers
76k views

I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type? For ...
Cristian's user avatar
  • 44.2k
14 votes
4 answers
17k views

Does anyone know an easy way in Python to convert a string with HTML entity codes (e.g. &lt; &amp;) to a normal string (e.g. < &)? cgi.escape() will escape strings (poorly), but there ...
tghw's user avatar
  • 25.4k
8 votes
1 answer
23k views

Possible Duplicate: Decode HTML entities in Python string? I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;. Do any Python libraries offer reliable ...
8 votes
1 answer
19k views

print u'&lt;' How can I print < print '>' How can I print &gt;
zjm1126's user avatar
  • 67.5k
6 votes
3 answers
2k views

I want to scrape some information off a football (soccer) web page using simple python regexp's. The problem is that players such as the first chap, ÄÄRITALO, comes out as &#196;&#196;RITALO! ...
2 votes
1 answer
5k views

I've a huge csv file of tweets. I read them both into the computer and stored them in two separate dictionaries - one for negative tweets, one for positive. I wanted to read the file in and parse it ...
2 votes
2 answers
5k views

Possible Duplicate: Decode HTML entities in Python string? I have parsed some HTML text. But some punctuation marks, such as the apostrophe, are replaced by &#8217;. How can I revert them back to ` P.S: I am ...
bdhar's user avatar
  • 23.4k
-1 votes
1 answer
5k views

I'm trying to write a json to csv - using Python 3.6 - and the json contains &amp;. How can I write just plain ampersands (&) instead of &amp;? I've tried str.replace using a variety of ...
John's user avatar
  • 1
1 vote
1 answer
3k views

I scraped news article titles and URLs, and stored the titles and urls in a tsv file as plain text. For some reason, the scraper I use converts some characters (€ for example) into hexacode. I have ...
1 vote
1 answer
1k views

I am using requests to request a page. The task is very simple, but I have a problem with encoding. The page contains non-ascii, Turkish characters, but in the HTML source, the result is as below: ...
0 votes
1 answer
3k views

I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like: I don&#39;t know what I ...
user3716714's user avatar
0 votes
2 answers
670 views

Possible Duplicate: Decode HTML entities in Python string? I have a malformed string in Python: Muhammad Ali&#39;s fight with Larry Holmes where &#39; is an apostrophe. Firstly what ...
Bruce's user avatar
  • 35.5k
0 votes
1 answer
610 views

I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this? my_str = 'HRC&#226;&#128;&#153;s' import re print(re.split(r'.(?=&)', my_str)) My ...
mtkilic's user avatar
  • 1,253
-2 votes
1 answer
2k views

My python string consists of &#039; instead of ' (single quotes). My current objective is to expand compound words like It's to It is, Haven't to Have not. "This has been great for me. I&#...
2 votes
2 answers
252 views

The task is to convert &#1044;&#1077; to Де. Does Python 3 have a built-in function for this, or do I need to parse the string and then use the built-in chr function to convert each number to a character?
Artem Rys's user avatar
  • 147
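
Many of the questions linked here come down to the same operation: turning named or numeric character references back into text. A minimal sketch, assuming Python 3.4 or later, where the standard library provides `html.unescape`; the `decode_entities` wrapper is a hypothetical name used only for illustration, and the inputs are taken from the excerpts on this page:

```python
import html


def decode_entities(text: str) -> str:
    """Hypothetical helper: decode named and numeric character references."""
    return html.unescape(text)


# Decimal numeric references (the Cyrillic example above)
print(decode_entities("&#1044;&#1077;"))            # Де
# Named references
print(decode_entities("&lt; &amp;"))                # < &
# Numeric reference for an apostrophe
print(decode_entities("Muhammad Ali&#39;s fight"))  # Muhammad Ali's fight
```

On Python 2 the function lives elsewhere, so this sketch does not apply as written to the Python 2.x questions in the list.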
2 votes
1 answer
256 views

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python Decode HTML entities in Python string? I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...
rodling's user avatar
  • 998
0 votes
1 answer
373 views

I have used the strip_tags function. It removes tags like "<p>, <b>", etc., but things like "&nbsp; &ldquo;" and other HTML encodings remain. How can I remove them?
0 votes
1 answer
242 views

I'm getting data from a website and this is an example of a sentence I retrieved : PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K The sentence is ...
0 votes
1 answer
203 views

I have a string like below: THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to ...
Tim Bueno's user avatar
  • 401
0 votes
1 answer
137 views

In Python 2.7, I want to convert strings like "&#128;", "&#380;" and similar to UTF-8 strings. How can I do it?
-2 votes
1 answer
163 views

I am trying to scrape a webpage whose charset is declared like this: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> and when I get the page source using python requests, I get content ...
Arman's user avatar
  • 25
0 votes
1 answer
204 views

I have a string "hello&#91; World&#93;" and I want to convert it to "hello[World]" I tried something like this: a.encode("utf-8").decode("ascii") I got back same string as input.
javaMan's user avatar
  • 6,720
0 votes
1 answer
86 views

I am trying to scrape a page and ran into an issue where some words show up as '&#xe091;' and '&#xe3c4;'. I searched the whole web but found no answer on how to decode them, so I came here to ask for ...
M_Sea's user avatar
  • 491
0 votes
1 answer
90 views

I have an email address which is encoded su&#098;&#105;&#046;&#098;&#104;&#097;&#115;&#107;&#097;&#114;&#097;&#110;@in.ibm.co&#109; I have to ...
Mounarajan's user avatar
  • 1,437
-2 votes
1 answer
60 views

I have a database of texts, some of which contain unconverted hex codes: import pandas as pd example = pd.DataFrame({"content": ["Zwischen den Parteien ist unstreitig, dass Sch&#...
hyperinfer's user avatar
-1 votes
1 answer
45 views

I used Python to get an HTML page from a Japanese comic site, and used regex to extract only some chapter titles of the comics. I can get most of them correctly as is, but some of them come in ...
modbender's user avatar
  • 429
1 vote
1 answer
189 views

I am having problems with code written in Python: when I generate some texts in JSON, the accents are not displayed correctly. This is the code I'm using: import requests url = requests.get(f&...
0 votes
0 answers
37 views

I have a curious problem where I am given many strings in a list. For instance, the list may be: [ "Cartier 'D&#233;claration d'un Soir'", "Hue Cool", "Lagos Caviar&#8482; Hoop" ] ...
robert's user avatar
  • 819
0 votes
0 answers
31 views

I'm parsing a large XML file using lxml in Python 3 that has HTML character codes (e.g. &lsqb; and &rsqb;). Here's an example of the problem and an example of my attempt to use html....
EpicAdv's user avatar
  • 1,222
0 votes
0 answers
28 views

I am fetching a string from a website and it is returning the string with some sort of weird characters in it: Pok&#233;mon &#38; Magic the Gathering How can I easily convert it to say ...
0 votes
0 answers
18 views

I have a list with results from re.findall from a web page. List items contain & # 171; & # 8211; etc. How do I remove these substrings from the list items?
321's user avatar
  • 1
42 votes
9 answers
17k views

I'm using beautifulsoup with html5lib, it puts the html, head and body tags automatically: BeautifulSoup('<h1>FOO</h1>', 'html5lib') # => <html><head></head><body&...
20 votes
4 answers
21k views

I've got a string from an HTTP header, but it's been escaped.. what function can I use to unescape it? myemail%40gmail.com -> myemail@gmail.com Would urllib.unquote() be the way to go?
Ian's user avatar
  • 25.6k
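
Percent-encoding such as `%40` is URL escaping rather than an HTML entity, so it needs a different decoder. A minimal sketch, assuming Python 3, where the function the asker names as `urllib.unquote()` (its Python 2 location) is available as `urllib.parse.unquote`:

```python
from urllib.parse import unquote

# The escaped value from the question excerpt above
escaped = "myemail%40gmail.com"

# unquote replaces each %XX escape with the character it encodes
decoded = unquote(escaped)
print(decoded)  # myemail@gmail.com
```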
20 votes
4 answers
43k views

I am downloading HTML pages that have data defined in them in the following way: ... <script type= "text/javascript"> window.blog.data = {"activity":{"type":"read"}}; </script> ... I ...
16 votes
3 answers
20k views

I have a html text like this: &lt;xml ... &gt; and I want to convert it to something readable: <xml ...> Any easy (and fast) way to do it in Python?
9 votes
4 answers
16k views

I'd like to be able to use unicode in my python string. For instance I have an icon: icon = '&#x25B2;' print icon which should create icon = '▲' but instead it literally returns it in ...
Modelesq's user avatar
  • 5,452
7 votes
3 answers
20k views

I would like to convert HTML entities back to its human readable format, e.g. '&pound;' to '£', '&deg;' to '°' etc. I've read several posts regarding this question Converting html source ...
D.Q.'s user avatar
  • 547
8 votes
3 answers
6k views

The solutions in other answers do not work when I try them, the same string outputs when I try those methods. I am trying to do web scraping using Python 2.7. I have the webpage downloaded and it has ...
Ivankovich's user avatar
5 votes
1 answer
6k views

I have a reference list containing function names in this format: list_1_ref = ['Name1<abc>' , 'Name2<abc>'] From another function I get a return list containing elements with html format:...
Chris's user avatar
  • 95
7 votes
2 answers
7k views

I'd hate to open a new question even though many questions have been opened on this same topic, but I'm literally at my wits' end as to why this isn't working. I am attempting to create a JSON object ...
Dorian Dore's user avatar
4 votes
2 answers
7k views

There's a xml file: <body> <entry> I go to <hw>to</hw> to school. </entry> </body> For some reason, I changed <hw> to &lt;hw&gt; and ...
user1610952's user avatar
  • 1,309
8 votes
1 answer
2k views

I am trying to use the Google Translate API as below. The translation seems OK except for the apostrophe characters, which are returned as &#39; instead. Is it possible to fix those? I can of course make a ...
2 votes
2 answers
4k views

I'm trying to read a CSV file with pandas read_csv. The data looks like this (example) thing;weight;price;colour apple;1;2;red m &amp; m's;0;10;several cherry;0,5;2;dark red Because of the HTML-...
3 votes
3 answers
2k views

When I'm processing HTML code in Python I have to use the following code because of special characters. line = string.replace(line, "&quot;", "\"") line = string.replace(line, "&apos;", "'") ...
xralf's user avatar
  • 3,792
2 votes
3 answers
6k views

I'm using Ajax and Django to dynamically populate a combo box. The Ajax component works fine and passes the data to the view, but in the view, when I use the split function, it gives me a ...
0 votes
2 answers
907 views

Input text: Ell &#233;s la v&#237;ctima que expia els nostres pecats, i no tan sols els nostres, sin&#243; els del m&#243;n sencer. Expected output: Ell és la víctima que expia ...
josifoski's user avatar
  • 1,726
0 votes
2 answers
3k views

I receive HTML files and they contain strings like &quot; ("), &#252;(ü) and so on. I need them human-readable. So I could use str.replace() for that. But isn't there a package/library ...
buhtz's user avatar
  • 12.5k
0 votes
3 answers
3k views

I'm currently learning how to parse XML data using ElementTree. I got an error that says: ParseError: not well-formed (invalid token): line 1, column 2. My code is right below, and a bit of the xml ...
2 votes
1 answer
2k views

Say I have the following HTML emoji entity: '&#x1f604 ;' Note there isn't actually a space between the 4 and the ; it's just there so that it doesn't show up as a smiley The emoji's Python form ...
