42

I am trying to replace any uppercase letter that appears twice in a row in a string with a single lowercase instance of that letter. The regular expression below matches the repeated uppercase letters, but I am unsure how to make the replacement letter lowercase.

import re
s = 'start TT end'
re.sub(r'([A-Z]){2}', r"\1", s)
>>> 'start T end'

How can I make the "\1" lowercase? Should I not be using a regular expression to do this?

asked Nov 10, 2010 at 14:21
2
  • Don't know how to make it lowercase, but you should use '([A-Z]){2,}' instead of '([A-Z]){2}' to replace any instances. Commented Nov 10, 2010 at 14:25
  • Your regex also matches two different caps. Commented Nov 10, 2010 at 14:26

7 Answers

63

Pass a function as the repl argument. The MatchObject is passed to this function and .group(1) gives the first parenthesized subgroup:

import re
s = 'start TT end'
callback = lambda pat: pat.group(1).lower()
re.sub(r'([A-Z]){2}', callback, s)

EDIT
And yes, you should use ([A-Z])\1 instead of ([A-Z]){2} in order to not match e.g. AZ. (See @bobince's answer.)

import re
s = 'start TT end'
re.sub(r'([A-Z])\1', lambda pat: pat.group(1).lower(), s) # Inline

Gives:

'start t end'
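
To see why the backreference matters, here is a minimal sketch comparing the two patterns. The helper name `collapse_pairs` is made up for illustration and is not part of the original answer.

```python
import re

def collapse_pairs(text):
    """Replace each doubled uppercase letter (e.g. 'TT') with one lowercase letter."""
    return re.sub(r'([A-Z])\1', lambda m: m.group(1).lower(), text)

# {2} repeats the character class, so any two capitals in a row match.
print(re.findall(r'[A-Z]{2}', 'AZ TT'))                            # ['AZ', 'TT']

# \1 repeats whatever the group captured, so only a doubled letter matches.
print([m.group(0) for m in re.finditer(r'([A-Z])\1', 'AZ TT')])    # ['TT']

print(collapse_pairs('start TT end AZ'))                           # start t end AZ
```

With the backreference, 'AZ' is left alone and only 'TT' is collapsed.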
answered Nov 10, 2010 at 14:27

10

You can't change case in a replacement string. You would need a replacement function:

>>> import re
>>> def replacement(match):
...     return match.group(1).lower()
...
>>> re.sub(r'([A-Z])\1', replacement, 'start TT end')
'start t end'
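
One tempting shortcut is to call .lower() on the replacement string itself. The sketch below, which assumes the same sample string as the question, shows why that has no effect while a function does.

```python
import re

s = 'start TT end'

# .lower() runs on the template text '\1' itself, before re.sub ever sees a
# match, and that template contains no letters to change.
print(re.sub(r'([A-Z])\1', r'\1'.lower(), s))                  # start T end

# A function is called once per match, so it can lowercase the captured text.
print(re.sub(r'([A-Z])\1', lambda m: m.group(1).lower(), s))   # start t end
```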
answered Nov 10, 2010 at 14:29


1
import re
def replace(s):
    # findall returns the letter captured by each two-capital match
    return " ".join(re.findall(r"([A-Z]){2}", s)).lower()

I guess this is what you are looking for.

answered Jun 16, 2020 at 9:44


0

You can do it with a regular expression, just pass a function as the replacement like the docs say. The problem is your pattern.

As it is, your pattern matches runs of any two capital letters. I'll leave the actual pattern to you, but it starts with AA|BB|CC|.
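
For illustration only, the explicit alternation hinted at above could be generated rather than typed out. This is a sketch; the names `doubled` and `lower_first` are made up here, and the result behaves the same as the shorter ([A-Z])\1 pattern from the other answers.

```python
import re
import string

# Build 'AA|BB|...|ZZ' explicitly, one alternative per uppercase letter.
doubled = '|'.join(c * 2 for c in string.ascii_uppercase)
pattern = re.compile(doubled)

def lower_first(match):
    # The whole match is a doubled letter; keep one copy, lowercased.
    return match.group(0)[0].lower()

print(pattern.sub(lower_first, 'start TT end AZ'))  # start t end AZ
```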

answered Nov 10, 2010 at 14:27


0

The 'repl' parameter that identifies the replacement can be either a string (as you have it here) or a function. This will do what you wish:

import re
def toLowercase(matchobj):
    return matchobj.group(1).lower()
s = 'start TT end'
re.sub(r'([A-Z]){2}', toLowercase, s)
>>> 'start t end'
answered Nov 10, 2010 at 14:30


0

Try this:

import re
def tol(m):
    # group(0) is the whole match; keep its first letter, lowercased
    return m.group(0)[0].lower()
s = 'start TTT AAA end'
re.sub(r'([A-Z]){2,}', tol, s)

Note that this doesn't replace single uppercase letters. If you want that as well, use r'([A-Z]){1,}'.
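
Note that ([A-Z]){2,} also matches runs of different capitals. A possible variant, sketched here with a backreference (the name `collapse_runs` is hypothetical), only collapses runs of the same letter:

```python
import re

def collapse_runs(text):
    """Replace each run of two or more identical capitals with one lowercase letter."""
    return re.sub(r'([A-Z])\1+', lambda m: m.group(1).lower(), text)

print(collapse_runs('start TTT AAA end'))   # start t a end
print(collapse_runs('start TTAA X end'))    # start ta X end

# For comparison, {2,} treats 'TTAA' as one run of four capitals.
print(re.sub(r'([A-Z]){2,}', lambda m: m.group(0)[0].lower(), 'start TTAA X end'))  # start t X end
```

Single capitals such as 'X' are left untouched by both patterns.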

SilentGhost
answered Nov 10, 2010 at 14:34

4 Comments

OP says: repeat themselves twice
@SilentGhost. My fault. The re should be as suggested by Ignacio if single uppercase characters shouldn't be touched.
If you look at bobince's and jens's answers, you'll see the shorter way to do this.
I see, thanks. The question about single upper chars is still open.
0

WARNING! This post does not use re as requested. Continue at your own risk!

I do not know how likely corner cases are, but this is my naive approach in plain Python.

import string
s = 'start TT end AAA BBBBBBB'
# Python 3: string.uppercase is now string.ascii_uppercase, print is a function
for c in string.ascii_uppercase:
    s = s.replace(c + c, c.lower())
print(s)
""" Output:
start t end aA bbbB
"""
answered Nov 10, 2010 at 15:55
