I need to replace some characters as follows: &
➔ \&
, #
➔ \#
, ...
I coded as follows, but I guess there should be some better way. Any hints?
strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...
-
4See stackoverflow.com/questions/3367809/… and stackoverflow.com/questions/3411006/…Tim McNamara– Tim McNamara2010年08月05日 04:35:46 +00:00Commented Aug 5, 2010 at 4:35
17 Answers 17
Replacing two characters
I timed all the methods in the current answers along with one extra.
With an input string of abc&def#ghi
and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#')
.
Timings for each function:
- a) 1000000 loops, best of 3: 1.47 μs per loop
- b) 1000000 loops, best of 3: 1.51 μs per loop
- c) 100000 loops, best of 3: 12.3 μs per loop
- d) 100000 loops, best of 3: 12 μs per loop
- e) 100000 loops, best of 3: 3.27 μs per loop
- f) 1000000 loops, best of 3: 0.817 μs per loop
- g) 100000 loops, best of 3: 3.64 μs per loop
- h) 1000000 loops, best of 3: 0.927 μs per loop
- i) 1000000 loops, best of 3: 0.814 μs per loop
Here are the functions:
def a(text):
chars = "&#"
for c in chars:
text = text.replace(c, "\\" + c)
def b(text):
for ch in ['&','#']:
if ch in text:
text = text.replace(ch,"\\"+ch)
import re
def c(text):
rx = re.compile('([&#])')
text = rx.sub(r'\\1円', text)
RX = re.compile('([&#])')
def d(text):
text = RX.sub(r'\\1円', text)
def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
esc(text)
def f(text):
text = text.replace('&', '\&').replace('#', '\#')
def g(text):
replacements = {"&": "\&", "#": "\#"}
text = "".join([replacements.get(c, c) for c in text])
def h(text):
text = text.replace('&', r'\&')
text = text.replace('#', r'\#')
def i(text):
text = text.replace('&', r'\&').replace('#', r'\#')
Timed like this:
python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
Replacing 17 characters
Here's similar code to do the same but with more characters to escape (\`*_{}>#+-.!$):
def a(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
text = text.replace(c, "\\" + c)
def b(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace(ch,"\\"+ch)
import re
def c(text):
rx = re.compile('([&#])')
text = rx.sub(r'\\1円', text)
RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
text = RX.sub(r'\\1円', text)
def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
esc(text)
def f(text):
text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')
def g(text):
replacements = {
"\\": "\\\\",
"`": "\`",
"*": "\*",
"_": "\_",
"{": "\{",
"}": "\}",
"[": "\[",
"]": "\]",
"(": "\(",
")": "\)",
">": "\>",
"#": "\#",
"+": "\+",
"-": "\-",
".": "\.",
"!": "\!",
"$": "\$",
}
text = "".join([replacements.get(c, c) for c in text])
def h(text):
text = text.replace('\\', r'\\')
text = text.replace('`', r'\`')
text = text.replace('*', r'\*')
text = text.replace('_', r'\_')
text = text.replace('{', r'\{')
text = text.replace('}', r'\}')
text = text.replace('[', r'\[')
text = text.replace(']', r'\]')
text = text.replace('(', r'\(')
text = text.replace(')', r'\)')
text = text.replace('>', r'\>')
text = text.replace('#', r'\#')
text = text.replace('+', r'\+')
text = text.replace('-', r'\-')
text = text.replace('.', r'\.')
text = text.replace('!', r'\!')
text = text.replace('$', r'\$')
def i(text):
text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')
Here's the results for the same input string abc&def#ghi
:
- a) 100000 loops, best of 3: 6.72 μs per loop
- b) 100000 loops, best of 3: 2.64 μs per loop
- c) 100000 loops, best of 3: 11.9 μs per loop
- d) 100000 loops, best of 3: 4.92 μs per loop
- e) 100000 loops, best of 3: 2.96 μs per loop
- f) 100000 loops, best of 3: 4.29 μs per loop
- g) 100000 loops, best of 3: 4.68 μs per loop
- h) 100000 loops, best of 3: 4.73 μs per loop
- i) 100000 loops, best of 3: 4.24 μs per loop
And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$
):
- a) 100000 loops, best of 3: 7.59 μs per loop
- b) 100000 loops, best of 3: 6.54 μs per loop
- c) 100000 loops, best of 3: 16.9 μs per loop
- d) 100000 loops, best of 3: 7.29 μs per loop
- e) 100000 loops, best of 3: 12.2 μs per loop
- f) 100000 loops, best of 3: 5.38 μs per loop
- g) 10000 loops, best of 3: 21.7 μs per loop
- h) 100000 loops, best of 3: 5.7 μs per loop
- i) 100000 loops, best of 3: 5.13 μs per loop
Adding a couple of variants:
def ab(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
text = text.replace(ch,"\\"+ch)
def ba(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
if c in text:
text = text.replace(c, "\\" + c)
With the shorter input:
- ab) 100000 loops, best of 3: 7.05 μs per loop
- ba) 100000 loops, best of 3: 2.4 μs per loop
With the longer input:
- ab) 100000 loops, best of 3: 7.71 μs per loop
- ba) 100000 loops, best of 3: 6.08 μs per loop
So I'm going to use ba
for readability and speed.
Addendum
Prompted by haccks in the comments, one difference between ab
and ba
is the if c in text:
check. Let's test them against two more variants:
def ab_with_check(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace(ch,"\\"+ch)
def ba_without_check(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
text = text.replace(c, "\\" + c)
Times in μs per loop on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so cannot be compared directly.
╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input ║ ab │ ab_with_check │ ba │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │ 4.22 │ 3.45 │ 8.01 │
│ Py3, short ║ 5.54 │ 1.34 │ 1.46 │ 5.34 │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long ║ 9.3 │ 7.15 │ 6.85 │ 8.55 │
│ Py3, long ║ 7.43 │ 4.38 │ 4.41 │ 7.02 │
└────────────╨──────┴───────────────┴──────┴──────────────────┘
We can conclude that:
Those with the check are up to 4x faster than those without the check
ab_with_check
is slightly in the lead on Python 3, butba
(with check) has a greater lead on Python 2However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and fastest on Python 2!
-
Is
if c in text:
necessary inba
?haccks– haccks2017年10月28日 07:08:25 +00:00Commented Oct 28, 2017 at 7:08 -
@haccks It's not necessary, but it's 2-3x quicker with it. Short string, with:
1.45 usec per loop
, and without:5.3 usec per loop
, Long string, with:4.38 usec per loop
and without:7.03 usec per loop
. (Note these aren't directly comparable with the results above, because it's a different machine etc.)Hugo– Hugo2017年10月28日 14:00:54 +00:00Commented Oct 28, 2017 at 14:00 -
1@Hugo; I think this difference in time is because of
replace
is called only whenc
is found intext
in case ofba
while it is called in every iteration inab
.haccks– haccks2017年10月28日 14:28:31 +00:00Commented Oct 28, 2017 at 14:28 -
2@haccks Thanks, I've updated my answer with further timings: adding the check is better for both, but the biggest lesson is Python 3 is up to 3x faster!Hugo– Hugo2017年10月29日 07:03:22 +00:00Commented Oct 29, 2017 at 7:03
-
1@Hugo: i.pinimg.com/originals/ed/55/82/…The Wanderer– The Wanderer2020年08月31日 20:57:56 +00:00Commented Aug 31, 2020 at 20:57
Here is a python3 method using str.translate
and str.maketrans
:
s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\&', '#': '\#'})))
The printed string is abc\&def\#ghi
.
-
12This is a good answer, but in practice doing one
.translate()
appears to be slower than three chained.replace()
(using CPython 3.6.4).Changaco– Changaco2018年02月16日 11:53:56 +00:00Commented Feb 16, 2018 at 11:53 -
1@Changaco Thanks for timing it 👍 In practice I would use
replace()
myself, but I added this answer for the sake of completeness.tommy.carstensen– tommy.carstensen2018年04月12日 04:30:28 +00:00Commented Apr 12, 2018 at 4:30 -
For large strings and many replacements this should be faster, though some testing would be nice...Graipher– Graipher2018年11月07日 10:29:08 +00:00Commented Nov 7, 2018 at 10:29
-
6This method allows to perform "clobbering replacements" that the chained versions do not. E.g., replace "a" with "b" and "b" with "a".adavid– adavid2020年02月18日 07:01:18 +00:00Commented Feb 18, 2020 at 7:01
-
1@parity3 Since 3.6 invalid escape sequence generate a DeprecationWarning. Since 3.12 it is a SyntaxWarning. In an unspecified future version it will be a SyntaxError. docs.python.org/3.12/whatsnew/3.12.html#other-language-changesJolbas– Jolbas2024年12月17日 07:45:36 +00:00Commented Dec 17, 2024 at 7:45
>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
... if ch in string:
... string=string.replace(ch,"\\"+ch)
...
>>> print string
abc\&def\#ghi
-
Why was a double backslash needed? Why doesn't just "\" work?axolotl– axolotl2016年06月17日 06:38:01 +00:00Commented Jun 17, 2016 at 6:38
-
4The double backslash escapes the backslash, otherwise python would interpret "\" as a literal quotation character within a still-open string.Riet– Riet2016年07月21日 12:55:01 +00:00Commented Jul 21, 2016 at 12:55
-
1@MattSom replace() doesn't modify the original string, but returns a copy. So you need the assignment for the code to have any effect.user1502657– user15026572017年04月19日 23:16:18 +00:00Commented Apr 19, 2017 at 23:16
-
1why are you using
string
as a variable name?Mick_– Mick_2018年04月02日 21:32:50 +00:00Commented Apr 2, 2018 at 21:32 -
8Do you really need the if? It looks like a duplication of what the replace will be doing anyway.lorenzo– lorenzo2018年06月30日 11:33:04 +00:00Commented Jun 30, 2018 at 11:33
Simply chain the replace
functions like this
strs = "abc&def#ghi"
print strs.replace('&', '\&').replace('#', '\#')
# abc\&def\#ghi
If the replacements are going to be more in number, you can do this in this generic way
strs, replacements = "abc&def#ghi", {"&": "\&", "#": "\#"}
print "".join([replacements.get(c, c) for c in strs])
# abc\&def\#ghi
Late to the party, but I lost a lot of time with this issue until I found my answer.
Short and sweet, translate
is superior to replace
. If you're more interested in funcionality over time optimization, do not use replace
.
Also use translate
if you don't know if the set of characters to be replaced overlaps the set of characters used to replace.
Case in point:
Using replace
you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4")
to return "2344"
, but it will return in fact "4444"
.
Translation seems to perform what OP originally desired.
Are you always going to prepend a backslash? If so, try
import re
rx = re.compile('([&#])')
# ^^ fill in the characters here.
strs = rx.sub('\\\\\1円', strs)
It may not be the most efficient method but I think it is the easiest.
For Python 3.8 and above, one can use assignment expressions
[text := text.replace(s, f"\\{s}") for s in "&#" if s in text];
Although, I am quite unsure if this would be considered "appropriate use" of assignment expressions as described in PEP 572, but looks clean and reads quite well (to my eyes). The semicolon at the end suppresses output if you run this in a REPL.
This would be "appropriate" if you wanted all intermediate strings as well. For example, (removing all lowercase vowels):
text = "Lorem ipsum dolor sit amet"
intermediates = [text := text.replace(i, "") for i in "aeiou" if i in text]
['Lorem ipsum dolor sit met',
'Lorm ipsum dolor sit mt',
'Lorm psum dolor st mt',
'Lrm psum dlr st mt',
'Lrm psm dlr st mt']
On the plus side, it does seem (unexpectedly?) faster than some of the faster methods in the accepted answer, and seems to perform nicely with both increasing strings length and an increasing number of substitutions.
The code for the above comparison is below. I am using random strings to make my life a bit simpler, and the characters to replace are chosen randomly from the string itself. (Note: I am using ipython's %timeit magic here, so run this in ipython/jupyter).
import random, string
def make_txt(length):
"makes a random string of a given length"
return "".join(random.choices(string.printable, k=length))
def get_substring(s, num):
"gets a substring"
return "".join(random.choices(s, k=num))
def a(text, replace): # one of the better performing approaches from the accepted answer
for i in replace:
if i in text:
text = text.replace(i, "")
def b(text, replace):
_ = (text := text.replace(i, "") for i in replace if i in text)
def compare(strlen, replace_length):
"use ipython / jupyter for the %timeit functionality"
times_a, times_b = [], []
for i in range(*strlen):
el = make_txt(i)
et = get_substring(el, replace_length)
res_a = %timeit -n 1000 -o a(el, et) # ipython magic
el = make_txt(i)
et = get_substring(el, replace_length)
res_b = %timeit -n 1000 -o b(el, et) # ipython magic
times_a.append(res_a.average * 1e6)
times_b.append(res_b.average * 1e6)
return times_a, times_b
#----run
t2 = compare((2*2, 1000, 50), 2)
t10 = compare((2*10, 1000, 50), 10)
-
Actually, my code showed that stacked replace functions are faster. i.sstatic.net/6sel0.pngPeyman– Peyman2022年12月11日 02:47:28 +00:00Commented Dec 11, 2022 at 2:47
-
Interesting. Which Python version is this? And can you share the comparison code?krm– krm2022年12月12日 05:17:21 +00:00Commented Dec 12, 2022 at 5:17
-
It's Python 3.8. Here it is. Should be run in Jupyter notebook. gist.github.com/kiasar/0c1bfcff7646a78b15268a5345d3faaaPeyman– Peyman2022年12月12日 05:50:04 +00:00Commented Dec 12, 2022 at 5:50
-
Ah! The difference seems to be the use of a loop for replacing items in a list instead of manually chaining each item to be replaced. I guess if the items to be replaced never change, then chaining multiple replace calls (your way) is the the fastest way to do this. @Peymankrm– krm2022年12月30日 10:45:01 +00:00Commented Dec 30, 2022 at 10:45
You may consider writing a generic escape function:
def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
>>> esc = mk_esc('&#')
>>> print esc('Learn & be #1')
Learn \& be \#1
This way you can make your function configurable with a list of character that should be escaped.
FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I'm aware of this).
As a somewhat ridiculous but interesting exercise, wanted to see if I could use python functional programming to replace multiple chars. I'm pretty sure this does NOT beat just calling replace() twice. And if performance was an issue, you could easily beat this in rust, C, julia, perl, java, javascript and maybe even awk. It uses an external 'helpers' package called pytoolz, accelerated via cython (cytoolz, it's a pypi package).
from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))
I'm not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.
-
1"functional programming" doesn't mean "using as many functions as possible", you know.Craig Andrews– Craig Andrews2017年12月12日 11:01:56 +00:00Commented Dec 12, 2017 at 11:01
-
1This is a perfectly good, pure functional multi-char replacer: gist.github.com/anonymous/4577424f586173fc6b91a215ea2ce89e No allocations, no mutations, no side effects. Readable, too.Craig Andrews– Craig Andrews2017年12月12日 11:21:35 +00:00Commented Dec 12, 2017 at 11:21
How about this?
def replace_all(dict, str):
for key in dict:
str = str.replace(key, dict[key])
return str
then
print(replace_all({"&":"\&", "#":"\#"}, "&#"))
output
\&\#
similar to answer
Using reduce which is available in python2.7 and python3.* you can easily replace mutiple substrings in a clean and pythonic way.
# Lets define a helper method to make it easy to use
def replacer(text, replacements):
return reduce(
lambda text, ptuple: text.replace(ptuple[0], ptuple[1]),
replacements, text
)
if __name__ == '__main__':
uncleaned_str = "abc&def#ghi"
cleaned_str = replacer(uncleaned_str, [("&","\&"),("#","\#")])
print(cleaned_str) # "abc\&def\#ghi"
In python2.7 you don't have to import reduce but in python3.* you have to import it from the functools module.
-
To add the 'if' condition (variant
ba
mentioned by Hugo):lambda text, ptuple: text.replace(ptuple[0], ptuple[1]) if ptuple[0] in text else text
Jean Monet– Jean Monet2020年07月28日 12:03:24 +00:00Commented Jul 28, 2020 at 12:03
advanced way using regex
import re
text = "hello ,world!"
replaces = {"hello": "hi", "world":" 2020", "!":"."}
regex = re.sub("|".join(replaces.keys()), lambda match: replaces[match.string[match.start():match.end()]], text)
print(regex)
>>> a = '&#'
>>> print a.replace('&', r'\&')
\&#
>>> print a.replace('#', r'\#')
&\#
>>>
You want to use a 'raw' string (denoted by the 'r' prefixing the replacement string), since raw strings to not treat the backslash specially.
Maybe a simple loop for chars to replace:
a = '&#'
to_replace = ['&', '#']
for char in to_replace:
a = a.replace(char, "\\"+char)
print(a)
>>> \&\#
This will help someone looking for a simple solution.
def replacemany(our_str, to_be_replaced:tuple, replace_with:str):
for nextchar in to_be_replaced:
our_str = our_str.replace(nextchar, replace_with)
return our_str
os = 'the rain in spain falls mainly on the plain ttttttttt sssssssssss nnnnnnnnnn'
tbr = ('a','t','s','n')
rw = ''
print(replacemany(os,tbr,rw))
Output:
he ri i pi fll mily o he pli
Example is given below for the or condition, it will delete all ' and , from the given string. pass as many characters as you want separated by |
import re
test = re.sub("('|,)","",str(jsonAtrList))
Before: enter image description here
After: enter image description here
Multiple replace with .translate() function
# Original string
original_string = "12#4526&18##0"
translation_dict = {'&': r'\&',
'#': r'\#'}
# Create a translation table using the dictionary
translation_table = str.maketrans(translation_dict)
# Translate the string
modified_string = original_string.translate(translation_table)
print(modified_string) # Output: 12\#4526\&18\#\#0
-
This answer has already been proposed, it looks as a duplicate.d-stroyer– d-stroyer2024年10月03日 07:35:25 +00:00Commented Oct 3, 2024 at 7:35