Linked Questions
107 questions linked to/from Decode HTML entities in Python string?
90 votes, 6 answers, 120k views
How do I unescape HTML entities in a string in Python 3.1? [duplicate]
I have looked all around and only found solutions for python 2.6 and earlier, NOTHING on how to do this in python 3.X. (I only have access to Win7 box.)
I HAVE to be able to do this in 3.1 and ...
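For Python 3.4 and later, the standard library's `html.unescape` handles named, decimal and hexadecimal references in one call. Python 3.1 itself predates it. Older 3.x releases relied on the undocumented `html.parser.HTMLParser().unescape` method, which was deprecated in 3.4 and removed in 3.9. A minimal sketch with an illustrative input string:

```python
import html

# html.unescape (Python 3.4+) resolves named (&amp;), decimal (&#62;) and
# hexadecimal (&#x3e;) character references in a single pass.
raw = "Fish &amp; Chips &lt;3 &#62; &#x3e; &quot;caf&eacute;&quot;"
text = html.unescape(raw)
print(text)
```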
77 votes, 10 answers, 76k views
Convert XML/HTML Entities into Unicode String in Python [duplicate]
I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For ...
14 votes, 4 answers, 17k views
HTML Entity Codes to Text [duplicate]
Does anyone know an easy way in Python to convert a string with HTML entity codes (e.g. &lt; &amp;) to a normal string (e.g. < &)?
cgi.escape() will escape strings (poorly), but there ...
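`cgi.escape` was deprecated and then removed in Python 3.8. `html.escape` replaces it, and `html.unescape` reverses it. A round-trip sketch with an illustrative string:

```python
import html

original = "<a title='Tom & Jerry'>\"hi\"</a>"
# quote=True (the default) also escapes " and ' in addition to &, < and >.
escaped = html.escape(original)
print(escaped)
# Unescaping the escaped form gives back the original text.
print(html.unescape(escaped) == original)
```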
8 votes, 1 answer, 23k views
How can I use Python to replace HTML escape characters? [duplicate]
Possible Duplicate:
Decode HTML entities in Python string?
I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;.
Do any Python libraries offer reliable ...
8 votes, 1 answer, 19k views
How can I change '&gt;' to '>' and '&#62;' to '>'? [duplicate]
print u'&lt;'
How can I print <
print '&#62;'
How can I print >
6 votes, 3 answers, 2k views
Getting international characters from a web page? [duplicate]
I want to scrape some information off a football (soccer) web page using simple python regexp's. The problem is that players such as the first chap, ÄÄRITALO, comes out as ÄÄRITALO!
...
2 votes, 1 answer, 5k views
Removing escaped entities from a String in Python [duplicate]
I've a huge csv file of tweets. I read them both into the computer and stored them in two separate dictionaries - one for negative tweets, one for positive. I wanted to read the file in and parse it ...
2 votes, 2 answers, 5k views
Replacing HTML representation to ascii using Python [duplicate]
Possible Duplicate:
Decode HTML entities in Python string?
I have parsed some HTML text. But some punctuations like apostrophe are replaced by ’. How to revert them back to `
P.S: I am ...
-1 votes, 1 answer, 5k views
How to replace '&amp;' with just '&' [duplicate]
I'm trying to write a json to csv - using Python 3.6 - and the json contains &amp;.
How can I write just plain ampersands (&) instead of &amp;?
I've tried str.replace using a variety of ...
1 vote, 1 answer, 3k views
Convert HTML entities in plain text to characters [duplicate]
I scraped news article titles and URLs, and stored the titles and urls in a tsv file as plain text. For some reason, the scraper I use converts some characters (€ for example) into hexacode. I have ...
1 vote, 1 answer, 1k views
Requests to Handle Response Encoding [duplicate]
I am using requests to request a page. The task is very simple, but I have a problem with encoding. The page contains non-ascii, Turkish characters, but in the HTML source, the result is as below:
...
0 votes, 1 answer, 3k views
Python: Replace URLEncoded characters in String with what they represent [duplicate]
I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like:
I don't know what I ...
0 votes, 2 answers, 670 views
Parsing malformed string in python [duplicate]
Possible Duplicate:
Decode HTML entities in Python string?
I have a malformed string in Python:
Muhammad Ali's fight with Larry Holmes
where ' is an apostrophe.
Firstly what ...
0 votes, 1 answer, 610 views
Splitting on a lookahead [duplicate]
I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this?
my_str = 'HRC’s'
import re
print(re.split(r'.(?=&)', my_str))
My ...
-2 votes, 1 answer, 2k views
it&#39;s instead of it's in python string [duplicate]
My python string consists of &#39; instead of ' (single quotes). My current objective is to expand compound words like It's to It is, Haven't to Have not.
"This has been great for me. I&#...
2 votes, 2 answers, 252 views
Python 3 function to convert "&#1044;" to string [duplicate]
The task is to convert &#1044;&#1077; to Де.
Does Python 3 have a built-in function for this, or do I need to parse this string and then use the built-in chr function to convert each number to a character?
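Both routes the asker mentions work. Below is a sketch of the manual `chr` approach using a regular expression, next to the one-line `html.unescape` call. `decode_numeric_refs` is a hypothetical helper. Unlike `html.unescape`, it does not validate code points; for example, `chr` raises `ValueError` above `0x10FFFF`.

```python
import html
import re

# Decimal (&#1044;) or hexadecimal (&#x414; / &#X414;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace each numeric character reference with chr(code point)."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)

    return NUMERIC_REF.sub(repl, text)


s = "&#1044;&#1077;"
print(decode_numeric_refs(s))  # manual chr() route
print(html.unescape(s))        # standard-library route
```

The manual version only touches numeric references. Named ones such as `&amp;` pass through unchanged, which is one reason `html.unescape` is usually the better default.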
2 votes, 1 answer, 256 views
Unicode encoding in python [duplicate]
Possible Duplicate:
Convert XML/HTML Entities into Unicode String in Python
Decode HTML entities in Python string?
I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...
0 votes, 1 answer, 373 views
how to remove html encodings from a string in django [duplicate]
I have used the strip_tags function. It removes tags like "<p>, <b>", etc but things like " “" and other html encodings remain. How to remove them ?
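`strip_tags` removes the markup but leaves character references behind. In a Django project, `html.unescape(strip_tags(value))` is a short alternative. The standard-library sketch below assumes Python 3.5+, where `HTMLParser` accepts `convert_charrefs`. It collects only the text nodes, with references already decoded. `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; convert_charrefs=True decodes &...; before handle_data."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(html_to_text("<p>&ldquo;Hi&rdquo; &amp; <b>bye</b>&nbsp;!</p>"))
```

This keeps the contents of `<script>` and `<style>` elements as text. Filter those in `handle_starttag`/`handle_endtag` if they matter.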
0 votes, 1 answer, 242 views
Python problem with accents when decoding from base64 [duplicate]
I'm getting data from a website and this is an example of a sentence I retrieved : PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K
The sentence is ...
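The payload above is base64 that still contains HTML entities after decoding, so it needs two steps. A sketch that assumes the decoded bytes are UTF-8:

```python
import base64
import html

encoded = "PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K"

# Step 1: base64 -> bytes -> str (assumes the payload is UTF-8).
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # still contains &eacute;

# Step 2: resolve the HTML entities.
text = html.unescape(decoded)
print(text)
```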
0 votes, 1 answer, 203 views
Encoding a string with ASCII Characters (Find and replace?) [duplicate]
I have a string like below:
THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you've read the first two bestselling collections of SAGA , you're all caught up and ready to ...
0 votes, 1 answer, 137 views
Python - How to convert HTML entity to UTF-8 [duplicate]
I want to convert in Python 2.7 string like
"€", "ż"
and similar to UTF-8 string.
How to do it?
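In Python 3, `html.unescape` returns a `str`. A "UTF-8 string" in the bytes sense only exists after `.encode("utf-8")`. This sketch assumes the inputs are references such as `&euro;` and `&#380;` (the listing's examples are garbled) and targets Python 3 rather than the asker's 2.7. `entity_to_utf8` is a hypothetical helper.

```python
import html


def entity_to_utf8(ref):
    """Decode one HTML character reference and return its UTF-8 bytes (Python 3)."""
    return html.unescape(ref).encode("utf-8")


print(entity_to_utf8("&euro;"))
print(entity_to_utf8("&#380;"))
```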
-2 votes, 1 answer, 163 views
decode utf-8 content in python [duplicate]
I am trying to scrape a webpage whose charset is declared like this
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
and when I get the page source using python requests, I get content ...
0 votes, 1 answer, 204 views
Python print ascii character in string instead of value [duplicate]
I have a string "hello[ World]" and I want to convert it to "hello[World]"
I tried something like this:
a.encode("utf-8").decode("ascii")
I got back same string as input.
0 votes, 1 answer, 86 views
python decode the words beginning with  such as '' and '' [duplicate]
I am trying scraping and ran into an issue with words shown as ''and ''. I searched the whole web but found no answer about how to decode them, so I came here to ask for ...
0 votes, 1 answer, 90 views
Decoding encoded email id in python [duplicate]
I have an email address which is encoded
subi.bhaskaran@in.ibm.com
I have to ...
-2 votes, 1 answer, 60 views
Unicode translation [duplicate]
I have a database of texts, some of which contain unconverted hex codes:
import pandas as pd
example = pd.DataFrame({"content": ["Zwischen den Parteien ist unstreitig, dass Sch&#...
-1 votes, 1 answer, 45 views
How to convert this text to these words using python? [duplicate]
I used Python to get an HTML page from a Japanese comic site, and used regex to extract only some chapter titles of the comics. I get most of them correctly as they are, but some of them come in ...
1 vote, 1 answer, 189 views
When generating texts in JSON, accented characters look different [duplicate]
I am having problems with code written in Python: when I generate some texts in JSON, the accented characters are not displayed correctly.
This is the code I'm using:
import requests
url = requests.get(f&...
0 votes, 0 answers, 37 views
Best way to convert ascii to characters [duplicate]
I have a curious problem where I am given many strings in a list. For instance, the list may be:
[
"Cartier 'Déclaration d'un Soir'",
"Hue Cool",
"Lagos Caviar™ Hoop"
]
...
0 votes, 0 answers, 31 views
How to deal with malformed XML with HTML character codes in lxml [duplicate]
I'm parsing a large XML file using lxml in Python 3 that has HTML character codes (e.g. [ and ]).
Here's an example of the problem and an example of my attempt to use html....
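XML predefines only five entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so an XML parser rejects HTML-only names like `&eacute;`. Numeric references, by contrast, are always valid XML. The question's exact examples are garbled in this listing, so this sketch assumes the file contains HTML *named* references. It rewrites them as numeric references using the standard `html.entities.html5` table before parsing. It uses `xml.etree.ElementTree`, and the rewritten string should parse the same way with lxml. `html_entities_to_xml` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import html5

# The five entities XML defines itself; leave them for the XML parser.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_xml(text):
    """Rewrite HTML-only named references as numeric references XML accepts."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown name: left as-is (parser will still reject it)
        return "".join(f"&#{ord(c)};" for c in chars)

    return NAMED_REF.sub(repl, text)


doc = "<title>caf&eacute;&nbsp;&amp; co</title>"
root = ET.fromstring(html_entities_to_xml(doc))
print(repr(root.text))
```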
0 votes, 0 answers, 28 views
Why are weird encoded characters in my string? [duplicate]
I am fetching a string from a website and it is returning the string with some sort of weird characters in it
Pokémon & Magic the Gathering
How can I easily convert it to say ...
0 votes, 0 answers, 18 views
Python: How to remove html entities from list items? [duplicate]
I have a list with results from re.findall from a web page.
List items contains & # 171; & # 8211; etc.
How do I remove these substrings from list items?
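The question asks to *remove* the references, but dropping them also deletes real characters. A sketch contrasting both options on hypothetical list items:

```python
import html
import re

items = ["Caf&#233; &#171;menu&#187;", "Tea &#8211; hot"]

# Option 1: drop decimal numeric references entirely (what the question asks).
stripped = [re.sub(r"&#\d+;", "", s) for s in items]

# Option 2: decode them into the characters they stand for.
decoded = [html.unescape(s) for s in items]

print(stripped)
print(decoded)
```

Option 1 turns "Café" into "Caf", which is why decoding is usually what scraped text actually needs.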
42 votes, 9 answers, 17k views
Don't put html, head and body tags automatically, beautifulsoup
I'm using beautifulsoup with html5lib, it puts the html, head and body tags automatically:
BeautifulSoup('<h1>FOO</h1>', 'html5lib') # => <html><head></head><body&...
20 votes, 4 answers, 21k views
Unescape Python Strings From HTTP
I've got a string from an HTTP header, but it's been escaped.. what function can I use to unescape it?
myemail%40gmail.com -> myemail@gmail.com
Would urllib.unquote() be the way to go?
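`%40` is URL percent-encoding, not an HTML entity, so `html.unescape` leaves it untouched. In Python 3, the function the asker mentions moved from `urllib.unquote` to `urllib.parse.unquote`:

```python
import html
from urllib.parse import unquote, unquote_plus

print(unquote("myemail%40gmail.com"))        # percent-decoding
print(unquote_plus("hello+world%21"))        # also turns '+' into a space
print(html.unescape("myemail%40gmail.com"))  # unchanged: not an HTML entity
```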
20 votes, 4 answers, 43k views
How to extract a JSON object that was defined in a HTML page javascript block using Python?
I am downloading HTML pages that have data defined in them in the following way:
... <script type= "text/javascript"> window.blog.data = {"activity":{"type":"read"}}; </script> ...
I ...
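The page snippet above embeds a JSON literal in a script block. This standard-library sketch assumes the assigned value is strict JSON (double-quoted keys, no trailing commas) and that `window.blog.data` appears once. `extract_js_object` is a hypothetical helper.

```python
import json


def extract_js_object(page, marker="window.blog.data"):
    """Return the JSON object assigned right after `marker`, or None if absent."""
    start = page.find(marker)
    if start == -1:
        return None
    brace = page.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses one JSON value and ignores trailing text like '; </script>'.
    obj, _end = json.JSONDecoder().raw_decode(page[brace:])
    return obj


page = '<script type= "text/javascript"> window.blog.data = {"activity":{"type":"read"}}; </script>'
print(extract_js_object(page))
```

`raw_decode` is used instead of a regex so that nested braces inside the object do not confuse the match.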
16 votes, 3 answers, 20k views
Replace html entities with the corresponding utf-8 characters in Python 2.6
I have a html text like this:
&lt;xml ... &gt;
and I want to convert it to something readable:
<xml ...>
Any easy (and fast) way to do it in Python?
9 votes, 4 answers, 16k views
How to use Unicode characters in a python string
I'd like to be able to use unicode in my python string. For instance I have an icon:
icon = '▲'
print icon
which should create icon = '▲'
but instead it literally returns it in ...
7 votes, 3 answers, 20k views
HTMLParser.HTMLParser().unescape() doesn't work
I would like to convert HTML entities back to their human-readable form, e.g. '&pound;' to '£', '&deg;' to '°' etc.
I've read several posts regarding this question
Converting html source ...
8 votes, 3 answers, 6k views
Decoding ampersand hash strings (|xa)etc
The solutions in other answers do not work when I try them, the same string outputs when I try those methods.
I am trying to do web scraping using Python 2.7. I have the webpage downloaded and it has ...
5 votes, 1 answer, 6k views
Python: how to replace html characters &lt; with < and &gt; with > in a Python list
I have a reference list containing function names in this format:
list_1_ref = ['Name1<abc>' , 'Name2<abc>']
From another function I get a return list containing elements with html format:...
7 votes, 2 answers, 7k views
JSON.parse() returns a string instead of object
I'd hate to open a new question even though many questions have been opened on this same topic, but I'm literally at my ends as to why this isn't working.
I am attempting to create a JSON object ...
4 votes, 2 answers, 7k views
How to convert &lt; into < in lxml, Python?
There's an XML file:
<body>
<entry>
I go to <hw>to</hw> to school.
</entry>
</body>
For some reason, I changed <hw> to &lt;hw&gt; and ...
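ElementTree-style parsers already turn `&lt;` back into `<` when they read text, so the decoded markup is available from `.text`. The sketch below uses the standard library's `xml.etree.ElementTree` rather than lxml; `lxml.etree` offers a similar `fromstring`/`find`/`.text` API. Re-parsing the text to recover `<hw>` as a real element is an illustrative choice, not necessarily what the asker needed.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body><entry>I go to &lt;hw&gt;to&lt;/hw&gt; to school.</entry></body>"
)
entry = doc.find("entry")
print(entry.text)  # the parser has already decoded &lt; and &gt;

# Re-parse the decoded text so <hw> becomes an element again.
fixed = ET.fromstring("<entry>" + entry.text + "</entry>")
print(fixed.find("hw").text)
```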
8 votes, 1 answer, 2k views
google translate api does not return apostrophe as apostrophe in python
I am trying to use google translate api as below. Translation seems ok except for the apostrophe chars, which are returned as &#39; instead.
Is it possible to fix those ? I can of course make a ...
2 votes, 2 answers, 4k views
Reading CSV files with python (pandas) when there is HTML escaped string in there
I'm trying to read a CSV file with pandas read_csv. The data looks like this (example)
thing;weight;price;colour
apple;1;2;red
m &amp; m's;0;10;several
cherry;0,5;2;dark red
Because of the HTML-...
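The sample uses `;` as the field separator, and a reference such as `&amp;` itself contains a `;`, so that line splits into too many fields. This sketch uses the standard `csv` module and assumes the raw file holds `m &amp; m's` on that line. It decodes the whole text first, then parses. With pandas, the same idea would be `pd.read_csv(io.StringIO(html.unescape(raw)), sep=";", decimal=",")`.

```python
import csv
import html
import io

raw = (
    "thing;weight;price;colour\n"
    "apple;1;2;red\n"
    "m &amp; m's;0;10;several\n"
    "cherry;0,5;2;dark red\n"
)

# Parsing as-is: the ';' inside '&amp;' creates an extra field.
naive = list(csv.reader(io.StringIO(raw), delimiter=";"))
print(naive[2])

# Decode the entities first, then parse.
rows = list(csv.reader(io.StringIO(html.unescape(raw)), delimiter=";"))
print(rows[2])
```

Decoding the entire file is a blunt approach. If the data ever contains an encoded separator such as `&#59;`, unescaping it creates a new field boundary.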
3 votes, 3 answers, 2k views
Make sequence of string.replace statements more readable
When I'm processing HTML code in Python I have to use the following code because of special characters.
line = string.replace(line, """, "\"")
line = string.replace(line, "'", "'")
...
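A single regular expression over a lookup table can replace a chain of `string.replace` calls and performs every substitution in one pass. This is a sketch: the `REPLACEMENTS` table is a hypothetical example, and for full HTML coverage `html.unescape` is the simpler choice.

```python
import re

# Hypothetical table; extend it with whatever entities the input contains.
REPLACEMENTS = {
    "&quot;": '"',
    "&#39;": "'",
    "&lt;": "<",
    "&gt;": ">",
    "&amp;": "&",
}
PATTERN = re.compile("|".join(map(re.escape, REPLACEMENTS)))


def replace_all(line):
    """Substitute every known entity in a single left-to-right pass."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], line)


print(replace_all("&quot;Tom&#39;s&quot; &amp;lt;"))
```

One pass also avoids a subtle bug in chained replaces. Replacing `&amp;` first and `&lt;` second would turn the literal text `&amp;lt;` into `<`. The single pass yields `&lt;`, which is also what `html.unescape` produces.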
2 votes, 3 answers, 6k views
django:ValueError need more than 1 value to unpack
I'm using ajax and django to dynamically populate a combo box. The ajax component works really fine and it passes the data to the view, but in the view, when I'm using the split function it gives me a ...
0 votes, 2 answers, 907 views
python re.sub with variable
Input text:
Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.
Expected output:
Ell és la víctima que expia ...
0 votes, 2 answers, 3k views
Replace HTML-special-character-codes in Python3 [duplicate]
I receive HTML files that contain strings like &quot; (") and &uuml; (ü).
I need them human-readable. I could use str.replace() for that, but isn't there a package/library ...
0 votes, 3 answers, 3k views
Parsing large xml data using python's elementtree
I'm currently learning how to parse xml data using elementtree. I got an error that says: ParseError: not well-formed (invalid token): line 1, column 2.
My code is right below, and a bit of the xml ...
2 votes, 1 answer, 2k views
Convert HTML Entity to Python Emoji
Say I have the following HTML emoji entity: '&#x1F604 ;'
Note there isn't actually a space between the 4 and the ; it's just there so that it doesn't show up as a smiley
The emoji's Python form ...