Linked Questions

Question 1

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me: >>> from BeautifulSoup import BeautifulSoup >>&...

Question 2

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python I am attempting to scrape a website using Python. I import and use the urllib2, BeautifulSoup and re modules. response =...

Question 3

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python In html sources, there are tons of chars like "&# 58;" or "&# 46;" (have to put space between &# and numbers ...

Question 4

I have text like this: ‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled. I understand that #...

Question 5

I have a html text : If I'm reading lots of articles I am trying to replace ' and other such special characters into unicode '. I did rawtxt.encode('utf-8').encode('ascii','ignore'...

Question 6

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python I am reading an excel XML document using Python. I end up with a lot of characters such as &#233; That ...

Question 7

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python Decode HTML entities in Python string? I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...

Question 8

I have these symbols which I am quite sure they are chinese characters. 旅行時，我生病 Please anyone know what kind of unicode is ...

Question 9

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into notepad. I'd like something more ...

Question 10

I have a string that is HTML encoded: '''<img class="size-medium wp-image-113"\ style="margin-left: 15px;" title="su1"\ src="...

Question 11

I have an HTML text like this: &lt;xml ... &gt; and I want to convert it to something readable: <xml ...> Any easy (and fast) way to do it in Python?
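
For a string like the one in Question 11, here is a minimal sketch. It assumes Python 3.4 or later, whereas the linked question targets Python 2.6, where `html.unescape` is not available. The standard library's `html.unescape` decodes named and numeric character references in one call. The sample string is illustrative, since the question elides the real markup.

```python
import html

# Illustrative input only; the original question does not show the full markup.
escaped = "&lt;xml version=&quot;1.0&quot;&gt; caf&#233; &amp; bar"

# html.unescape handles named (&lt;), decimal (&#233;) and hex (&#xe9;) references.
readable = html.unescape(escaped)
print(readable)
```

The result is an ordinary `str`; encode it to UTF-8 only when writing bytes to a file or socket.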

Question 12

I'm having trouble displaying content, my program: #! /usr/bin/python import urllib import re url = "http://yahoo.com" pattern = '''<span class="medium item-label".*?>(.*)</span>''' ...

Question 13

I would like to convert HTML entities back to their human-readable format, e.g. '&pound;' to '£', '&deg;' to '°' etc. I've read several posts regarding this question Converting html source ...

Question 14

I want to scrape some information off a football (soccer) web page using simple python regexp's. The problem is that players such as the first chap, ÄÄRITALO, comes out as ÄÄRITALO! ...

Question 15

I'm creating a sub-class based on 'HTMLParser' to pull out html content. Whenever I have character refs such as ' ' '&' '–' '…' I'd like to replace them with ...

Question 16

When I'm processing HTML code in Python I have to use the following code because of special characters. line = string.replace(line, "&quot;", "\"") line = string.replace(line, "&apos;", "'") ...
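
The chain of `string.replace` calls in Question 16 could be collapsed into a table-driven loop. This is a sketch under assumptions: the mapping is a hypothetical subset of entities, and `unescape_basic` is an illustrative name, not something from the linked question. `&amp;` is handled last so that input such as `&amp;quot;` is decoded only once.

```python
# Hypothetical subset of entities, chosen for illustration; extend as needed.
REPLACEMENTS = [
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&amp;", "&"),  # must come last to avoid decoding twice
]


def unescape_basic(line):
    """Replace a fixed set of entities, applying the pairs in order."""
    for entity, char in REPLACEMENTS:
        line = line.replace(entity, char)
    return line


print(unescape_basic("&quot;Tom &amp; Jerry&apos;s&quot;"))
```

On Python 3, `html.unescape` from the standard library covers the full set of named and numeric references and is usually the simpler choice.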

Question 17

I have this text in a file - Recuérdame (notice it's a French word). When I read this file with a python script, I get this text as Recuérdame. I read it as a unicode string. Do I need to ...

Question 18

We have HTML source files which contain special characters encoded as &#nnnn; like in the word: außergewöhnlich We would like to convert them into plain UTF-8: außergew&#...

Question 19

I've been reading many q&a on how to remove all the html code from a string using python but none was satisfying. I need a way to remove all the tags, preserve/convert the html entities and work ...

Question 20

I was working a simple example with BeautifulSoup, but I was getting weird results. Here is my code: soup = BeautifulSoup(page) print soup.prettify() stuff = soup.findAll('td', attrs={'class' : '...

Question 21

Possible Duplicate: How to decode HTML Entities in C? This question is very similar to that one, but I need to do the same thing in C, not python. Here are some examples of what the function ...

Question 22

Part of a website I'm trying to scrape has this weird block of hex values instead of characters. How can I decode this with python? I am using urllib.request to get the page source http://www....

Question 23

(Edit: I'm using Python 2.7) (Edit 2: I have already checked Convert XML/HTML Entities into Unicode String in Python, the solutions do not work. Please do not flag this as already answered.) I've ...

Question 24

I tried taking some data from the web: Example:the name 'Schindler's list' is printed as 'Schindler&#x27s List' straight from the web... tried asking python to print 'Schindler\x27s list' instead ...

Question 25

lxml.etree.parse() have generate string in utf-16 file as &#xxxx; How can I convert it back? Opening output file in web browser is fine. However I still need regular string in output file, too. ...

Question 26

I have a search form in my app that uses a jQuery autocomplete plugin. The plugin sends over the suggested item after running the querystring through encodeURI(q). So an item like Johnny's sports ...

Question 27

Sorry for posting this again. I am getting this error UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128) when I run the following code strip_html(): ...

Question 28

I have this script, which reads the text from web page: page = urllib2.urlopen(url).read() soup = BeautifulSoup(page); paragraphs = soup.findAll('p'); for p in paragraphs: content = content+p....

Question 29

I'm trying to decode characters which have been encoded in the following way: &#number; I tried: s.decode("utf8") and: s.decode("unicode-escape") but neither seems to work. What is the ...

Question 30

I have a string of escaped html markup , 'í', and I want it to the correct accented character 'í'. Having read around SO, this is my attempt: messy = 'í' print type(messy) >>&...

Question 31

The problem from bs4 import BeautifulSoup a=BeautifulSoup('<p class="t5">₹ 10,000 or $ 133.46</p>') b=open('file.html','w') b.write(str(a)) The result is ...

Question 32

I have the following description I want scrap using my program. <hr>Provides AFROTC cadets up to 13 options for practical leadership and specialized training through exposure to USAF ...

Question 33

I am trying to make an offline copy of this website: ieeghn. Part of this task is to download all css/js that being referred to using Beautiful Soup and modify any external link to this newly ...

Question 34

I am writing a program which collects data (title, author, article) from a web page with a news article. I use the Readability Python library. My problem is that content(which program) of article (if article is ...

Question 35

I am getting HTML entity from server as JSON response example 😁 => 😁 , now i wish to show this emoji on my button. if i receive unicode for emoji it's works fine simply placing ...

Linked Questions

Decode HTML entities in Python string?

How do I get rid of characters like ' that appear instead of apostrophes? [duplicate]

How do I convert characters like "&#58;" to ":" in python? [duplicate]

Convert ascii characters to normal text [duplicate]

encoding/decoding unicode and utf-8 : Python [duplicate]

Python, XML, &#233 type encodings [duplicate]

Unicode encoding in python [duplicate]

Anyone know what kind of unicode is this with &# and semicolon please? how to turn it into chinese characters string in Python please? [duplicate]

Extracting text from HTML file using Python

How do I perform HTML decoding/encoding using Python/Django?

Replace html entities with the corresponding utf-8 characters in Python 2.6

Change &#39 into normal character

HTMLParser.HTMLParser().unescape() doesn't work

Getting international characters from a web page? [duplicate]

Decoding html content and HTMLParser

Make sequence of string.replace statements more readable

Python Text Encoding

Unescaping HTML entities (&#nnnn;) into plain UTF-8 [closed]

Safely remove all html code from a string in python

Simple example BeautifulSoup Python

convert html entities to unicode(utf-8) strings in c? [duplicate]

How to decode html hex elements?

Python, convert HTML entities to Unicode

use of \x27 to convert to apostrophe not working in python

Convert &#xxxx; to normal character?

Decoding querystring parameter in Django view

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128)

python reading unicode characters from html

python - possible encoding and decoding values

Decoding html entities in python2

I want to save HTML Entity (hex) from bs4 beautifulSoup object into a file

Inquiry: Why is my regex code not reading all characters?

How to properly replace the contents of text file

Encoding of content

Convert HTML entity to iOS Emoji? [duplicate]