Linked Questions

35 questions linked to/from Convert XML/HTML Entities into Unicode String in Python

390 votes

7 answers

368k views

Decode HTML entities in Python string?

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me: >>> from BeautifulSoup import BeautifulSoup >>&...

jkp

81.8k

asked Jan 18, 2010 at 16:08

6 votes

1 answer

49k views

How do I get rid of characters like ' that appear instead of apostrophes? [duplicate]

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python I am attempting to scrape a website using Python. I import and use the urllib2, BeautifulSoup and re modules. response =...

nindalf

1,126

asked Dec 22, 2011 at 17:50

3 votes

2 answers

2k views

How do I convert characters like "&#58;" to ":" in python? [duplicate]

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python In html sources, there are tons of chars like "&# 58;" or "&# 46;" (have to put space between &# and numbers ...

Shane

5,003

asked Feb 18, 2011 at 11:47

-1 votes

1 answer

838 views

Convert ascii characters to normal text [duplicate]

I have text like this: ‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled. I understand that #...

user2784753

asked Sep 28, 2013 at 15:35

1 vote

1 answer

748 views

encoding/decoding unicode and utf-8 : Python [duplicate]

I have a html text : If I'm reading lots of articles I am trying to replace ' and other such special characters into unicode '. I did rawtxt.encode('utf-8').encode('ascii','ignore'...

Harshit

1,217

asked May 16, 2013 at 11:51

0 votes

2 answers

1k views

Python, XML, &#233 type encodings [duplicate]

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python I am reading an excel XML document using Python. I end up with a lot of characters such as é That ...

Neil Aggarwal

asked Dec 18, 2012 at 8:09

2 votes

1 answer

256 views

Unicode encoding in python [duplicate]

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python Decode HTML entities in Python string? I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...

rodling

asked Sep 10, 2012 at 14:55

0 votes

0 answers

28 views

Anyone know what kind of unicode is this with &# and semicolon please? how to turn it into chinese characters string in Python please? [duplicate]

I have these symbols which I am quite sure they are chinese characters. 旅行時，我生病 Please anyone know what kind of unicode is ...

Carson Yau

asked Sep 5, 2017 at 7:21

346 votes

37 answers

620k views

Extracting text from HTML file using Python

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into notepad. I'd like something more ...

John D. Cook

30.2k

asked Nov 30, 2008 at 2:28

179 votes

15 answers

254k views

How do I perform HTML decoding/encoding using Python/Django?

I have a string that is HTML encoded: '''<img class="size-medium wp-image-113"\ style="margin-left: 15px;" title="su1"\ src="...

rksprst

6,651

asked Nov 8, 2008 at 20:44

16 votes

3 answers

20k views

Replace html entities with the corresponding utf-8 characters in Python 2.6

I have a html text like this: &lt;xml ... &gt; and I want to convert it to something readable: <xml ...> Any easy (and fast) way to do it in Python?

Alexandru

25.9k

asked Apr 8, 2009 at 14:32

13 votes

2 answers

13k views

Change &#39 into normal character

I'm having trouble displaying content, my program: #! /usr/bin/python import urllib import re url = "http://yahoo.com" pattern = '''<span class="medium item-label".*?>(.*)</span>''' ...

Vor

35.6k

asked Sep 28, 2012 at 19:27

7 votes

3 answers

20k views

HTMLParser.HTMLParser().unescape() doesn't work

I would like to convert HTML entities back to its human readable format, e.g. '£' to '£', '°' to '°' etc. I've read several posts regarding this question Converting html source ...

D.Q.

asked Jul 19, 2013 at 16:48

6 votes

3 answers

2k views

Getting international characters from a web page? [duplicate]

I want to scrape some information off a football (soccer) web page using simple python regexp's. The problem is that players such as the first chap, ÄÄRITALO, comes out as &#196;&#196;RITALO! ...

Nick Fortescue

44.4k

asked Sep 10, 2008 at 0:30

3 votes

1 answer

3k views

Decoding html content and HTMLParser

I'm creating a sub-class based on 'HTMLParser' to pull out html content. Whenever I have character refs such as ' ' '&' '–' '…' I'd like to replace them with ...

python

Dan Holman

asked Aug 22, 2011 at 18:51
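
Most of the questions listed here ask for the same operation: turning character references such as &#58;, &#39 or &#233; back into the characters they stand for. As a minimal sketch, assuming Python 3.4 or later (several of the questions target Python 2, where this function is not available), the standard library's html.unescape covers both named and numeric references. The helper name decode_entities is hypothetical and exists only for illustration.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named and numeric character references with the characters they encode.

    Hypothetical helper for illustration; it is a thin wrapper around the
    standard library's html.unescape (Python 3.4+).
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Sample inputs modelled on the kinds of references mentioned in the questions.
    samples = ["&#58;", "&#39", "&#233;", "&lt;xml ... &gt;", "&#26053;&#34892;"]
    for sample in samples:
        print(repr(sample), "->", repr(decode_entities(sample)))
```

The function operates on str and returns str, so bytes read from a network response would need to be decoded first (for example with .decode("utf-8"), assuming that is the page's encoding).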
