python re.sub with variable

Input text:

Ell &#233;s la v&#237;ctima que expia els nostres pecats, i no tan sols els nostres, sin&#243; els del m&#243;n sencer.

Expected output:

Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.

Known fact: unichr(233) returns é

So far I have:

re.sub('&#([^;]*);', r'unichr(int(\1))', inputtext, flags=re.UNICODE)

and of course it doesn't work: I don't know how to apply a function to the captured group \1.

Any idea?
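Why the attempt fails: when the replacement passed to re.sub is a string, it is treated as a template. Only group references such as \1 are substituted; everything else, including "unichr(int(...))", is copied literally and never executed. A small Python 3 sketch on a shortened sample string (the flags argument is dropped because it does not affect the result here):

```python
import re

text = 'Ell &#233;s la v&#237;ctima'

# A string replacement is a template: \1 is replaced by the group text,
# but "unichr(int(...))" is inserted as literal characters, not evaluated.
result = re.sub(r'&#([^;]*);', r'unichr(int(\1))', text)
print(result)  # Ell unichr(int(233))s la vunichr(int(237))ctima
```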

asked Jan 13, 2015 at 0:21 by josifoski (edited Jan 13, 2015 at 0:23 by Aran-Fey)

2 Answers

Use a lambda function:

re.sub('&#([^;]*);', lambda match: unichr(int(match.group(1))), inputtext, flags=re.UNICODE)
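unichr exists only in Python 2. A minimal Python 3 sketch of the same lambda approach, assuming the input is a str: chr takes the place of unichr, and re.UNICODE is omitted because Unicode matching is already the default for str patterns in Python 3.

```python
import re

inputtext = ('Ell &#233;s la v&#237;ctima que expia els nostres pecats, '
             'i no tan sols els nostres, sin&#243; els del m&#243;n sencer.')

# The lambda receives a match object; its return value replaces the whole match.
output = re.sub(r'&#([^;]*);', lambda match: chr(int(match.group(1))), inputtext)
print(output)
# Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.
```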

answered Jan 13, 2015 at 0:25 by Aran-Fey

1 Comment

josifoski (Jan 13, 2015 at 0:26): This was very speedy @rawing, let me check

Fortunately for you, re.sub also accepts a function as the replacement argument. The function receives a match object, from which you can get the matched groups with match.group(1), match.group(2), and so on. The function's return value is the string that replaces the entire match in the input text (here the whole "&#233;" entity, not just the captured group).

import re
def fn(match):
    # match.group(1) is the decimal code point, e.g. '233'
    return unichr(int(match.group(1)))
re.sub('&#([^;]*);', fn, inputtext, flags=re.UNICODE)
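One possible refinement, not part of the original answer: a Python 3 sketch with a more descriptive function name and a narrower pattern. The names DECIMAL_ENTITY and decode_decimal_entity are hypothetical. The \d+ pattern is also an assumption: with [^;]*, a hexadecimal reference such as "&#xE9;" is matched too, and int('xE9') raises ValueError, whereas \d+ simply leaves it untouched.

```python
import re

# Hypothetical, narrower pattern: only decimal references such as &#233;.
DECIMAL_ENTITY = re.compile(r'&#(\d+);')

def decode_decimal_entity(match):
    """Return the character for the decimal code point captured in group 1."""
    return chr(int(match.group(1)))

text = 'sin&#243; els del m&#243;n, caf&#xE9;'
print(DECIMAL_ENTITY.sub(decode_decimal_entity, text))
# sinó els del món, caf&#xE9;
```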

If you really want, you can inline this as a lambda, but I think a lambda makes it harder to read in this case¹.

By the way, depending on your Python version, there are better ways to unescape HTML, since they also handle named escape sequences like '&amp;':

Python 2.x

>>> import HTMLParser
>>> s = 'Ell &#233;s la v&#237;ctima que expia els nostres pecats, i no tan sols els nostres, sin&#243; els del m&#243;n sencer.'
>>> print HTMLParser.HTMLParser().unescape(s)
Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.

Python 3.x

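>>> import html
>>> s = 'Ell &#233;s la v&#237;ctima que expia els nostres pecats, i no tan sols els nostres, sin&#243; els del m&#243;n sencer.'
>>> html.unescape(s)
'Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer.'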
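To illustrate the point about named escape sequences, a Python 3 comparison on a made-up sample string: the numeric-entity regex from the answers never matches "&amp;" (it requires "&#"), so the named reference survives, while html.unescape decodes both kinds.

```python
import html
import re

sample = 'Tom &amp; Jerry: caf&#233;'

# Regex approach from the answers: only numeric "&#...;" references are decoded.
regex_result = re.sub(r'&#([^;]*);', lambda m: chr(int(m.group(1))), sample)

# html.unescape also decodes named references such as &amp;.
html_result = html.unescape(sample)

print(regex_result)  # Tom &amp; Jerry: café
print(html_result)   # Tom & Jerry: café
```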


¹ Especially if you give fn a more sensible name ;-)

answered Jan 13, 2015 at 0:26 by mgilson (edited May 23, 2017 at 11:58 by Community Bot)

4 Comments

josifoski (Jan 13, 2015 at 0:28): will check it @mgilson, tnx for quick response

Alex Martelli (Jan 13, 2015 at 0:29): Yay for def, boo for lambda!-)

mgilson (Jan 13, 2015 at 0:40): @josifoski I realized that it looks like you're formatting HTML strings. If so, there's a better way, one that doesn't involve regex on your part :-). See my update.

josifoski (Jan 13, 2015 at 0:42): @mgilson tnx, much better way, yes I want to make 'readable' HTML text
