5

In Python, I want to convert special characters like "%$!&@á é ©" to HTML entities, not only '<&">', which is all that the documentation and references I've found so far cover. cgi.escape doesn't solve the problem.

For example, the string "á ê ĩ &" should be converted to "&aacute; &ecirc; &itilde; &amp;".

Does anybody know how to solve it? I'm using Python 2.6.

joshua
asked Mar 8, 2012 at 11:27
4
  • 2
    Be aware of two things: (1) named entities may cause problems, so you should probably use numeric entities instead. (2) Why use entities at all? In most cases, a better solution is to UTF-8-encode the document so that it can contain the letters directly, and not use entities. Commented Mar 8, 2012 at 11:30
  • 1
    wiki.python.org/moin/EscapingHtml Commented Mar 8, 2012 at 11:32
  • I agree with you @KonradRudolph. I don't like using entities, but the system I'm working on uses them, so I have no choice. =/ Commented Mar 8, 2012 at 11:35
  • 1
    @Jayme No problem, sometimes you have no choice. Just wanted to make sure you were aware of this. Commented Mar 8, 2012 at 11:38
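
The numeric-entity suggestion in the comments above can be sketched as follows. This is a minimal illustration, assuming Python 3 (where cgi.escape is replaced by html.escape); the function name to_numeric_entities is hypothetical and not from the thread. The 'xmlcharrefreplace' error handler also exists in Python 2 for unicode strings.

```python
import html


def to_numeric_entities(text):
    """Escape markup characters, then turn non-ASCII characters into
    decimal character references (hypothetical helper for illustration)."""
    # Escape &, <, > and quotes first so they become &amp;, &lt;, etc.
    escaped = html.escape(text, quote=True)
    # Any character that cannot be encoded as ASCII is replaced by &#NNN;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("á ê ĩ &"))
```

Numeric references do not depend on which named-entity table the consumer supports, which is the concern raised in point (1) of the first comment.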

2 Answers

7

You could build your own loop using the dictionaries you can find in http://docs.python.org/library/htmllib.html#module-htmlentitydefs

The one you're looking for is htmlentitydefs.codepoint2name
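
A loop over that dictionary might look like the sketch below. It assumes Python 3, where the module is named html.entities rather than htmlentitydefs; the function name escape_named and the numeric fallback for characters without a named entry are assumptions added for illustration, not part of this answer.

```python
from html.entities import codepoint2name


def escape_named(text):
    """Replace each character that has a named entity in codepoint2name;
    fall back to a numeric reference for other non-ASCII characters."""
    parts = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            # Covers &, <, > and " as well as accented letters and symbols.
            parts.append("&%s;" % name)
        elif ord(ch) > 127:
            # No named entity in this table: use a decimal reference instead.
            parts.append("&#%d;" % ord(ch))
        else:
            parts.append(ch)
    return "".join(parts)


print(escape_named("á é © &"))
```

Because each input character is visited exactly once, the ampersands inside generated entities are never escaped a second time. Note that codepoint2name holds the HTML 4 entity set, so a character such as "ĩ" gets a numeric reference here rather than a name.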

answered Mar 8, 2012 at 11:30

1 Comment

The link is no longer working. Use HTMLParser instead in Python 2, and the equivalent, html.parser, in Python 3.
5

I found a solution while searching for the htmlentitydefs.codepoint2name that @Ruben Vermeersch mentioned in his answer. The solution was found here: http://bytes.com/topic/python/answers/594350-convert-unicode-chars-html-entities

Here's the function:

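from htmlentitydefs import codepoint2name

def htmlescape(text):
    # Decode the UTF-8 byte string so characters can be matched individually.
    text = text.decode('utf-8')
    # Map each character to its named entity; "&" (code point 38) is excluded
    # because it is handled separately below.
    d = dict((unichr(code), u'&%s;' % name)
             for code, name in codepoint2name.iteritems() if code != 38)
    # Escape "&" first so the ampersands of generated entities are left alone.
    if u"&" in text:
        text = text.replace(u"&", u"&amp;")
    for key, value in d.iteritems():
        if key in text:
            text = text.replace(key, value)
    return text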

Thank you all for helping! ;)

answered Mar 8, 2012 at 11:46
