Python: how to replace substrings in a string given list of indices

Question 1

I have a string:

"A XYZ B XYZ C"

and a list of index-tuples:

((2, 5), (8, 11))

I would like to replace each substring defined by a pair of indices with the sum of those indices:

A 7 B 19 C

I can't use str.replace, as it would match both instances of XYZ. Replacing by index breaks from the second iteration onward, because the indices shift as the string changes.

Is there a nice solution for the problem?

UPDATE: The string is only an example. I don't know its contents in advance, nor can I use them in the solution.

My dirty solution is:

text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))
offset = 0  # how far later indices have shifted so far
for start, end in replace_list:
    l = start + offset
    r = end + offset
    # sum the original indices, not the shifted ones
    replacement = str(start + end)
    text = text[:l] + replacement + text[r:]
    offset += len(replacement) - (r - l)

This relies on the index tuples being in ascending order. Could it be done more nicely?

Question 2

Imperative and stateful:

s = 'A XYZ B XYZ C'
indices = ((2, 5), (8, 11))
res = []
i = 0  # end of the previous range
for start, end in indices:
    res.append(s[i:start] + str(start + end))
    i = end
res.append(s[i:])  # s[i:] also works when indices is empty
print(''.join(res))

Result:

A 7 B 19 C

Question 3

This is very simple and neat

Question 4

You can use re.sub():

In [16]: import re
In [17]: s = "A XYZ B XYZ C"
In [18]: ind = ((2, 5), (8, 11))
In [19]: inds = map(sum, ind)
In [20]: re.sub(r'XYZ', lambda _: str(next(inds)), s)
Out[20]: 'A 7 B 19 C'

But note that if there are more matches than index pairs, next() raises StopIteration. In that case you can pass a default value as the second argument to next(), and it will be used as the replacement.
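A minimal sketch of that default-argument variant, assuming leftover matches should simply stay unchanged. The third XYZ in the sample string is added here only for illustration:

```python
import re

s = "A XYZ B XYZ C XYZ"  # three matches, but only two index pairs
ind = ((2, 5), (8, 11))
sums = map(sum, ind)

# next(sums, m.group(0)) falls back to the matched text once the sums
# run out, so no StopIteration escapes from the callback.
result = re.sub(r"XYZ", lambda m: str(next(sums, m.group(0))), s)
print(result)  # A 7 B 19 C XYZ
```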

If you want to use the index tuples to locate the substrings, here is another solution:

In [81]: flat_ind = tuple(i for sub in ind for i in sub)
# Build consecutive (start, end) pairs covering the whole string, including the gaps between the given ranges.
In [82]: inds = [(0, ind[0][0]), *zip(flat_ind, flat_ind[1:]), (ind[-1][-1], len(s))]
# Replace a slice with the sum of its indices if the pair is one of the given ranges; otherwise keep the substring itself.
In [85]: ''.join([str(i+j) if (i, j) in ind else s[i:j] for i, j in inds])
Out[85]: 'A 7 B 19 C'

Question 5

XYZ is just an example, they want to replace the items in the given ranges.

Question 6

@AshwiniChaudhary Yeah, I saw the edit now. I'll update the answer, thanks for the note.

Question 7

One way to do this using itertools.groupby.

from itertools import groupby
indices = ((2, 5), (8, 11))
data = list("A XYZ B XYZ C")

We start by replacing each indexed range with an equal number of None values.

for a, b in indices:
    data[a:b] = [None] * (b - a)
print(data)
# ['A', ' ', None, None, None, ' ', 'B', ' ', None, None, None, ' ', 'C']

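Then we loop over the grouped data and replace each group of None values with the sum of the next pair from the indices list.

it = iter(indices)
output = []
for k, g in groupby(data, lambda x: x is not None):
    if k:
        # ordinary characters: keep them as they are
        output.extend(g)
    else:
        # a run of None placeholders: emit the sum of the next index pair
        output.append(str(sum(next(it))))
print(''.join(output))
# A 7 B 19 C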

Question 8

Here's a quick and slightly dirty solution using string formatting and tuple unpacking:

s = 'A XYZ B XYZ C'
reps = ((2, 5), (8, 11))
totals = (sum(r) for r in reps)
print(s.replace('XYZ', '{}').format(*totals))

This prints:

A 7 B 19 C

First, we use a generator expression to find the totals for each of our replacements. Then, by replacing 'XYZ' with '{}' we can use string formatting - *totals will ensure we get the totals in the correct order.

Edit

I didn't realise the indices were actually string indices - my bad. To do this, we could use re.sub as follows:

import re
s = 'A XYZ B XYZ C'
reps = ((2, 5), (8, 11))
for a, b in reps:
    # mask each range with tildes of the same length so later indices stay valid
    s = s[:a] + '~'*(b-a) + s[b:]
totals = (sum(r) for r in reps)
print(re.sub(r'(~+)', r'{}', s).format(*totals))

This assumes there are no tildes (~) in your string; if there are, use a different placeholder character. It also assumes that none of the replacement ranges are consecutive.
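To illustrate the last caveat, here is a small sketch. The sample string with two touching ranges is made up for this example. The two ranges merge into a single run of tildes, so only one placeholder is produced and the second total is silently dropped:

```python
import re

s = 'A XYZXYZ C'
reps = ((2, 5), (5, 8))  # two ranges that touch each other

masked = s
for a, b in reps:
    masked = masked[:a] + '~' * (b - a) + masked[b:]

template = re.sub(r'~+', '{}', masked)
totals = [sum(r) for r in reps]
# str.format ignores surplus positional arguments, so no error is raised
result = template.format(*totals)

print(masked)    # A ~~~~~~ C
print(template)  # A {} C
print(result)    # A 7 C
```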

Question 9

That's a particular case. Actually, I don't know which substrings the indices define; XYZ is just an example of duplicate tokens.

Question 10

@DenisKulagin My apologies, I misunderstood the question. Let me update the answer

Question 11

Assuming there are no overlaps, you could do it in reverse order:

text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))
for start, end in reversed(replace_list):
    text = f'{text[:start]}{start + end}{text[end:]}'
print(text)  # A 7 B 19 C
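The reverse-order idea works because replacing a later range never shifts the indices of an earlier one. A hedged sketch that wraps it in a function, sorts the ranges itself, and rejects overlaps; the name replace_spans_reversed and the ValueError policy are assumptions for illustration, not part of the answer above:

```python
def replace_spans_reversed(text, spans):
    """Replace each (start, end) span with str(start + end), right to left."""
    ordered = sorted(spans, reverse=True)
    # In descending order, each span must end at or before the start of
    # the span checked just before it (initially the end of the text).
    limit = len(text)
    for start, end in ordered:
        if not 0 <= start <= end <= limit:
            raise ValueError(f"bad or overlapping span: {(start, end)}")
        limit = start
    for start, end in ordered:
        text = f"{text[:start]}{start + end}{text[end:]}"
    return text


# The spans may be given in any order.
print(replace_spans_reversed("A XYZ B XYZ C", ((8, 11), (2, 5))))  # A 7 B 19 C
```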

Question 12

Here's a reversed-order list-slice assignment solution:

text = "A XYZ B XYZ C"
indices = ((2, 5), (8, 11))
chars = list(text)
for start, end in reversed(indices):
    chars[start:end] = str(start + end)
text = ''.join(chars)  # A 7 B 19 C

Question 13

There is also a solution which does exactly what you want. I have not worked it out completely, but you may want to use: re.sub() from the re library.

Look here, and look for the functions re.sub() or re.subn(): https://docs.python.org/2/library/re.html

If I have time, I will work out your example later today.

Question 14

Yet another itertools solution

from itertools import chain
s = "A XYZ B XYZ C"
inds = ((2, 5), (8, 11))
res = 'A 7 B 19 C'
# Flatten into boundaries [0, 2, 5, 8, 11, 13]; every second slice is a range to replace.
inds = list(chain([0], *inds, [len(s)]))
res_ = ''.join(s[i:j] if k % 2 == 0 else str(i + j)
               for k, (i, j) in enumerate(zip(inds, inds[1:])))
assert res == res_

Question 15

Anticipating that if these pairs-of-integer selections are useful here, they will also be useful in other places, I would probably do something like this:

def make_selections(data, selections):
    start = 0
    # sorted(selections) if you don't want to require the caller to provide them in order
    for selection in selections:
        yield None, data[start:selection[0]]
        yield selection, data[selection[0]:selection[1]]
        start = selection[1]
    yield None, data[start:]

def replace_selections_with_total(data, selections):
    return ''.join(
        str(selection[0] + selection[1]) if selection else value
        for selection, value in make_selections(data, selections)
    )

This still relies on the selections not overlapping, but I'm not sure what it would even mean for them to overlap.

You could then make the replacement itself more flexible too:

def replace_selections(data, selections, replacement):
    return ''.join(
        replacement(selection, value) if selection else value
        for selection, value in make_selections(data, selections)
    )

def replace_selections_with_total(data, selections):
    return replace_selections(data, selections, lambda s, _: str(s[0] + s[1]))
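As a usage sketch, the same helper can drive different replacements. The two functions are repeated so the snippet runs on its own; the lowercase replacement and the variable names text, spans, totals and lowered are illustrative assumptions:

```python
def make_selections(data, selections):
    # Yields (None, gap_text) for untouched parts and
    # (selection, selected_text) for each selected range.
    start = 0
    for selection in selections:
        yield None, data[start:selection[0]]
        yield selection, data[selection[0]:selection[1]]
        start = selection[1]
    yield None, data[start:]


def replace_selections(data, selections, replacement):
    return ''.join(
        replacement(selection, value) if selection else value
        for selection, value in make_selections(data, selections)
    )


text = "A XYZ B XYZ C"
spans = ((2, 5), (8, 11))

# Replace each range with the sum of its indices, as in the question.
totals = replace_selections(text, spans, lambda sel, _: str(sel[0] + sel[1]))
# Replace each range with a transformed copy of its own text.
lowered = replace_selections(text, spans, lambda _, value: value.lower())

print(totals)   # A 7 B 19 C
print(lowered)  # A xyz B xyz C
```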

Accepted answer: the "Imperative and stateful" solution above, by Mike Müller (answered Jul 27, 2017 at 11:40, edited Jul 27, 2017 at 11:42).

Comment by asongtoruin (2017-07-27 11:57): This is very simple and neat
