
Input text:

Ell &#233;s la v&#237;ctima que expia els nostres pecats, i no tan sols els nostres, sin&#243; els del m&#243;n sencer.

Expected output:

Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.

Known facts: unichr(233)=é

For now I have:

re.sub('&#([^;]*);', r'unichr(int(\1))', inputtext, flags=re.UNICODE)

Of course this does not work: I don't know how to apply a function to \1.

Any idea?

Aran-Fey
asked Jan 13, 2015 at 0:21

2 Answers

5

Use a lambda function:

re.sub('&#([^;]*);', lambda match: unichr(int(match.group(1))), t, flags=re.UNICODE)
answered Jan 13, 2015 at 0:25

1 Comment

This was very speedy @rawing, let me check
4

Fortunately for you, re.sub also accepts a function as the replacement argument. The function receives a match object, from which you can get the captured groups with match.group(1), match.group(2), and so on. Its return value is the string that replaces the whole match in the input text.

import re

def fn(match):
    # group(1) is the digit string captured between '&#' and ';'
    return unichr(int(match.group(1)))
re.sub('&#([^;]*);', fn, inputtext, flags=re.UNICODE)

If you really want, you can inline this and use a lambda, but I think a lambda makes it harder to read in this case [1].
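For readers on Python 3, here is a minimal sketch of the same callback approach. It assumes chr in place of Python 2's unichr, and the names NUMERIC_REF, decode_numeric_ref and decode_text are made up for illustration. The pattern is narrowed to \d+ (an assumption, stricter than the [^;]* above) so that int() is only ever called on digits.

```python
import re

# Matches decimal numeric character references such as "&#233;".
# \d+ is stricter than [^;]*, so int() never sees non-digit text.
NUMERIC_REF = re.compile(r'&#(\d+);')

def decode_numeric_ref(match):
    # match.group(1) holds the digits captured between "&#" and ";".
    return chr(int(match.group(1)))

def decode_text(text):
    # re.sub calls decode_numeric_ref once per match and substitutes
    # its return value for the whole matched reference.
    return NUMERIC_REF.sub(decode_numeric_ref, text)

print(decode_text('Ell &#233;s la v&#237;ctima'))  # Ell és la víctima
```

Named entities such as &amp; and hexadecimal references such as &#xE9; are deliberately left untouched by this sketch.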


By the way, depending on your Python version, there are better ways to unescape HTML, which also handle named escape sequences such as '&amp;':

Python 2.x

>>> import HTMLParser
>>> s = 'Ell &#233;s la v&#237;ctima que expia els nostres pecats, i no tan sols els nostres, sin&#243; els del m&#243;n sencer.'
>>> print HTMLParser.HTMLParser().unescape(s)
Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.

Python 3.x (3.4 or later)

>>> import html
>>> html.unescape(s)
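
As a quick check of the Python 3 route, the sketch below runs html.unescape on a made-up sample string (not the original input) that mixes a decimal reference, a hexadecimal reference and a named entity:

```python
import html

# Hypothetical sample: decimal (&#233;, &#237;), named (&amp;) and hex (&#xF3;) forms.
s = 'Ell &#233;s la v&#237;ctima &amp; m&#xF3;n'

decoded = html.unescape(s)
print(decoded)  # Ell és la víctima & món
```

All three forms are decoded in one call, which the hand-written regex above does not do.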

reference

[1] Especially if you give fn a more sensible name ;-)

answered Jan 13, 2015 at 0:26

4 Comments

will check it @mgilson, tnx for quick response
Yay for def, boo for lambda!-)
@josifoski -- I realized that it looks like you're formatting HTML strings. If so, there's a better way that doesn't involve regex on your part :-). See my update.
@mgilson tnx, much better way, yes i want to make 'readable' html text
