Python: Replace URLEncoded characters in String with what they represent [duplicate]

I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get text like:

I don&#39;t know what I am talking about

I can't seem to find any Python library that will replace those characters with what they should be, so that the resulting string looks like this:

I don't know what I am talking about

The closest I've gotten was

urllib.unquote(post_content).decode('utf-8')

But that still does not replace the URL-encoded character with a '. Does anyone know a good way to replace those URL-encoded characters with the ASCII characters they represent? There are also other errors, like ( and ) appearing as &#40; and &#41;.

asked Mar 16, 2015 at 1:19

user3716714

This question is more suited to Stack Overflow. Programmers SE is about program design issues, not specific questions about source code.

– logc

Commented Mar 16, 2015 at 14:01

1 Answer

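Those strings are HTML entities, not URL-encoded characters, which is why urllib.unquote leaves them untouched. You can decode them as described in this question: Decode HTML entities in Python string?. On Python 3.4 and later, use the unescape function from the html module (html.unescape); on Python 2, use HTMLParser().unescape() from the HTMLParser module.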
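
A minimal sketch of that decoding, assuming Python 3.4 or later (the question itself uses Python 2's urllib.unquote). The sample string is taken from the question; the variable names are only illustrative.

```python
import html
from urllib.parse import unquote

# Sample text from the question, containing the HTML entity &#39;
raw = "I don&#39;t know what I am talking about"

# URL unquoting only handles percent-escapes such as %27,
# so the HTML entity passes through unchanged.
still_escaped = unquote(raw)

# html.unescape converts numeric and named HTML entities to characters.
decoded = html.unescape(raw)

print(still_escaped)
print(decoded)
```

The same call should also handle the parentheses case, since &#40; and &#41; are ordinary numeric entities.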

edited May 23, 2017 at 11:50

Community Bot

answered Mar 16, 2015 at 4:28

jkd
