Using a regular expression to replace upper case repeated letters in python with a single lowercase letter

Asked 15 years, 1 month ago

Viewed 53k times

I am trying to replace any uppercase letter that appears twice in a row in a string with a single lowercase instance of that letter. The following regular expression matches the repeated uppercase letters, but I am unsure how to make the replacement letter lowercase.

import re
s = 'start TT end'
re.sub(r'([A-Z]){2}', r"\1", s)
>>> 'start T end'

How can I make the "\1" lower case? Should I not be using a regular expression to do this?

asked Nov 10, 2010 at 14:21

ajt

Don't know how to make it lowercase, but you should use '([A-Z]){2,}' instead of '([A-Z]){2}' to replace any instances.
– khachik, Commented Nov 10, 2010 at 14:25

Your regex also matches two different caps.
– Sven Marnach, Commented Nov 10, 2010 at 14:26

7 Answers

Pass a function as the repl argument. The MatchObject is passed to this function and .group(1) gives the first parenthesized subgroup:

import re
s = 'start TT end'
callback = lambda pat: pat.group(1).lower()
re.sub(r'([A-Z]){2}', callback, s)

EDIT
And yes, you should use ([A-Z])\1 instead of ([A-Z]){2} in order to not match e.g. AZ. (See @bobince's answer.)

import re
s = 'start TT end'
re.sub(r'([A-Z])\1', lambda pat: pat.group(1).lower(), s) # Inline

Gives:

'start t end'
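
To make the difference between the two patterns concrete, here is a minimal sketch. The helper name `collapse_pairs` is made up for illustration and is not part of the answer above. It compares the quantified group `([A-Z]){2}` with the backreference `([A-Z])\1` on input that contains two different capitals.

```python
import re

def collapse_pairs(text):
    # ([A-Z])\1 matches an uppercase letter followed by the SAME letter again.
    return re.sub(r'([A-Z])\1', lambda m: m.group(1).lower(), text)

# ([A-Z]){2} matches any two uppercase letters; group(1) holds the last one.
quantified = re.sub(r'([A-Z]){2}', lambda m: m.group(1).lower(), 'start AZ end')

# The backreference version leaves two different capitals untouched.
backref = collapse_pairs('start AZ end')

print(quantified)                      # start z end
print(backref)                         # start AZ end
print(collapse_pairs('start TT end'))  # start t end
```

With the quantified group, "AZ" is silently rewritten to "z", which is almost certainly not what the question intends.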

edited May 23, 2017 at 12:17

Community Bot

answered Nov 10, 2010 at 14:27

jensgram

You can't change case in a replacement string. You would need a replacement function:

>>> def replacement(match):
...     return match.group(1).lower()
...
>>> re.sub(r'([A-Z])\1', replacement, 'start TT end')
'start t end'

answered Nov 10, 2010 at 14:29

bobince

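import re
def replace(s):
    # findall returns the captured letter of each match, so the result holds
    # only those letters (lowercased, space-separated), not the full string.
    return " ".join(re.findall(r"([A-Z]){2}", s)).lower()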

I guess this is what you are looking for.

edited Jun 20, 2020 at 7:18

answered Jun 16, 2020 at 9:44

Akash g krishnan

You can do it with a regular expression, just pass a function as the replacement like the docs say. The problem is your pattern.

As it is, your pattern matches runs of any two capital letters. I'll leave the actual pattern to you, but it starts with AA|BB|CC|.
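
One way to read the hint above is to generate the alternation instead of typing it out. The following is only a sketch under that assumption. The names `pair_pattern` and `collapse_with_alternation` are invented for illustration.

```python
import re
import string

# Build "AA|BB|CC|...|ZZ" from the ASCII uppercase letters.
pair_pattern = re.compile('|'.join(c * 2 for c in string.ascii_uppercase))

def collapse_with_alternation(text):
    # group(0) is the matched pair; keep one letter and lowercase it.
    return pair_pattern.sub(lambda m: m.group(0)[0].lower(), text)

print(collapse_with_alternation('start TT end'))  # start t end
print(collapse_with_alternation('start AZ end'))  # start AZ end
```

The alternation needs no capture group or backreference, at the cost of a longer pattern.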

answered Nov 10, 2010 at 14:27

Ignacio Vazquez-Abrams

The 'repl' parameter that identifies the replacement can be either a string (as you have it here) or a function. This will do what you wish:

import re
def toLowercase(matchobj):
 return matchobj.group(1).lower()
s = 'start TT end'
re.sub(r'([A-Z]){2}', toLowercase, s)
>>> 'start t end'

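As a small side-by-side sketch of the two kinds of `repl` argument: a string can reuse the captured text with `\1` but cannot transform it, while a function can return any string. The variable names are illustrative, and the pattern here uses the backreference form `([A-Z])\1` rather than `{2}`.

```python
import re

pattern = re.compile(r'([A-Z])\1')
s = 'start TT end'

# String replacement: the captured letter is inserted unchanged.
with_string = pattern.sub(r'\1', s)

# Function replacement: called once per match with the match object.
with_function = pattern.sub(lambda m: m.group(1).lower(), s)

print(with_string)    # start T end
print(with_function)  # start t end
```
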
answered Nov 10, 2010 at 14:30

bgporter

Try this:

import re

def tol(m):
    # group(0) is the whole matched run; keep its first letter, lowercased
    return m.group(0)[0].lower()

s = 'start TTT AAA end'
re.sub(r'([A-Z]){2,}', tol, s)

Note that this doesn't replace single uppercase letters. If you want that as well, use r'([A-Z]){1,}'.
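
Note that `([A-Z]){2,}` matches a run of any uppercase letters, not necessarily the same one. If the goal is to collapse runs of one repeated letter, a backreference with `+` is one option. This is a sketch, and `collapse_runs` is a made-up name.

```python
import re

def collapse_runs(text):
    # ([A-Z])\1+ : an uppercase letter followed by one or more copies of itself.
    return re.sub(r'([A-Z])\1+', lambda m: m.group(1).lower(), text)

print(collapse_runs('start TTT AAA end'))  # start t a end
print(collapse_runs('start TTA end'))      # start tA end
```

On 'start TTA end', the `{2,}` version would match all of "TTA" and return 'start t end', dropping the "A".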

edited Nov 10, 2010 at 14:35

SilentGhost

answered Nov 10, 2010 at 14:34

khachik

4 Comments

OP says: repeat themselves twice
– SilentGhost, Commented Nov 10, 2010 at 14:35

@SilentGhost. My fault. The re should be as suggested by Ignacio, if single upper characters shouldn't be touched.
– khachik, Commented Nov 10, 2010 at 14:40

If you look at bobince's and jens's answers, you'll see the shorter way to do this.
– SilentGhost, Commented Nov 10, 2010 at 14:44

I see, thanks. The question about single upper chars is still open.
– khachik, Commented Nov 10, 2010 at 14:56

WARNING! This post does not use re as requested. Continue at your own risk!

I do not know how likely corner cases are, but this is my naive approach in plain Python.

import string
s = 'start TT end AAA BBBBBBB'
for c in string.ascii_uppercase:
    # Replace each doubled uppercase letter with one lowercase letter
    s = s.replace(c + c, c.lower())
print(s)
""" Output:
start t end aA bbbB
"""

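For comparison, the same pairwise behaviour can be sketched in a single pass with `itertools.groupby`, still without `re`. The function name `collapse_pairs_no_regex` is hypothetical. A run of n identical uppercase letters becomes n // 2 lowercase letters plus one leftover uppercase letter when n is odd, which is what repeated `str.replace` produces.

```python
import string
from itertools import groupby

def collapse_pairs_no_regex(text):
    out = []
    for ch, run in groupby(text):
        n = len(list(run))  # length of this run of identical characters
        if ch in string.ascii_uppercase:
            # Each full pair becomes one lowercase letter; an odd one stays as is.
            out.append(ch.lower() * (n // 2) + ch * (n % 2))
        else:
            out.append(ch * n)
    return ''.join(out)

print(collapse_pairs_no_regex('start TT end AAA BBBBBBB'))  # start t end aA bbbB
```
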
answered Nov 10, 2010 at 15:55

Tony Veijalainen
