
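I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed in Python with BeautifulSoup, and every now and then the text comes back with encoded characters in it. For example, the apostrophe in a line like this one shows up as an encoded character sequence instead of a ':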

I don't know what I am talking about

I can't find any Python library that will replace those sequences with the characters they stand for, so that the resulting string looks like this:

I don't know what I am talking about

The closest I've gotten was

urllib.unquote(post_content).decode('utf-8')

But that still does not replace the URL-encoded character with a '. Does anyone know a good way to turn those URL-encoded characters back into the ASCII characters they represent? Other characters get mangled the same way too, for example ( and ) showing up in encoded form instead of as themselves.
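The attempt above mixes up two different encodings. A minimal sketch in Python 3 (where Python 2's urllib.unquote lives at urllib.parse.unquote) shows that percent-decoding only touches %XX escapes. The sample strings are hypothetical, assuming the feed writes the apostrophe as the numeric entity &#39;.

```python
from urllib.parse import unquote  # Python 3 location of Python 2's urllib.unquote

# Hypothetical sample strings, for illustration only.
percent_escaped = "I don%27t know"   # apostrophe written as a URL percent-escape
entity_escaped = "I don&#39;t know"  # apostrophe written as an HTML numeric entity (assumed form)

# unquote() turns %27 back into an apostrophe...
print(unquote(percent_escaped))  # I don't know
# ...but it does not understand &#39;, so that string comes back unchanged.
print(unquote(entity_escaped))   # I don&#39;t know
```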

asked Mar 16, 2015 at 1:19
  • This question is more suited to Stack Overflow. Programmers SE is about program design issues, not specific questions about source code. Commented Mar 16, 2015 at 14:01

1 Answer


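Those weird strings are called HTML entities (character references). They are not URL encoding, which is why urllib.unquote leaves them alone. You can decode them as described in this question: Decode HTML entities in Python string?. In Python 3.4+ the function to use is unescape from the standard html module (html.unescape); in Python 2, use the unescape method of HTMLParser.HTMLParser.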
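A minimal sketch of that approach, written so it runs on Python 3.4+ (html.unescape) and falls back to Python 2 (the unescape method of HTMLParser.HTMLParser, which was removed in Python 3.9); Python 3.0 to 3.3 are not covered. The sample strings are hypothetical and assume the feed uses entities such as &#39;, &#40; and &#41;.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:  # Python 2 fallback
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# Hypothetical strings in the shape described in the question.
samples = [
    "I don&#39;t know what I am talking about",
    "&#40;see notes&#41;",
    "&quot;hi&quot; &amp; bye",
]

for s in samples:
    # unescape() handles both numeric (&#39;) and named (&quot;, &amp;) references.
    print(unescape(s))
```

This prints the three lines with the apostrophe, parentheses, quotes and ampersand restored.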

answered Mar 16, 2015 at 4:28