6

I have a list of coordinates:

coordinates = [[1,5], [10,15], [25, 35]]

I have a string as follows:

line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'

I want to replace intervals indicated in pairs in coordinates as start and end with character 'N'.

The only way I can think of is the following:

for element in coordinates:
 length = element[1] - element[0]
 line = line.replace(line[element[0]:element[1]], 'N'*length)

The desired output would be:

line = 'ANNNNGTGTGNNNNNACGTACGTGTNNNNNNNNNNGTGKWSGTGAAAAAKCT'

where intervals, [1,5), [10,15) and [25, 35) are replaced with N in line.

This requires me to loop through the coordinate list and update my string line, every time. I was wondering if there is another way that one can replace a list of intervals in a string?

Note: The original solution in this question has a bug. In line.replace(line[element[0]:element[1]], 'N'*length), replace substitutes every occurrence of the substring line[element[0]:element[1]] anywhere in the sequence, not only the one at the given indices. For people working with DNA, this is definitely not what you want! I am nevertheless keeping the code as it is, so as not to disturb the flow of the comments and discussion that follow.
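To make the pitfall described in the note concrete, here is a minimal sketch. The short sequence is a made-up example (not from the question), chosen so that the substring at [1, 5) occurs a second time further along.

```python
# Hypothetical sequence: 'TCAC' occurs at index 1 and again at index 7.
line = 'ATCACGGTCACT'
start, end = 1, 5

# Approach from the question: str.replace matches by content,
# so every occurrence of 'TCAC' is replaced, not just the one at [1, 5).
buggy = line.replace(line[start:end], 'N' * (end - start))

# Position-based slicing only touches the requested interval.
fixed = line[:start] + 'N' * (end - start) + line[end:]

print(buggy)  # ANNNNGGNNNNT
print(fixed)  # ANNNNGGTCACT
```

The second run of N in the first result is the unintended replacement.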

asked Jul 30, 2020 at 9:11
8
  • 2
    Please add example (desired) output to the question. Commented Jul 30, 2020 at 9:14
  • 2
    But I think this should do what you want: for start, end in coordinates: line = line[:start] + "N" * (end - start) + line[end:] -- if I've correctly understood. Commented Jul 30, 2020 at 9:17
  • 1
    I am not sure your current solution even does what you expect. replace replaces all occurrences of the sub-string so it might not only replace the indices you give it Commented Jul 30, 2020 at 9:19
  • @Tomerikoo Oh, really? That's important. In my example it looks like it is working correctly with the indices I give it. How do you think it could cause a problem? Is there another method I could use instead? Commented Jul 30, 2020 at 9:22
  • 2
    @Homap it might cause a problem if for example the substring between indices 1 and 5 (TCAC) appears somewhere else in the string, so it will be replaced as well. That might not be what you want Commented Jul 30, 2020 at 9:31

2 Answers 2

6

Instead of string concatenation (which is wasteful because a new string instance is created and destroyed on every step), use a list:

coordinates = [[1,5], [10,15], [25, 35]] # sorted
line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'
result = list(line)
# opted for exclusive end pos
for r in [range(start,end) for start,end in coordinates]:
    for p in r:
        result[p]='N'
res = ''.join(result)
print(res)

To get:

ANNNNGTGTGNNNNNACGTACGTGTNNNNNNNNNNGTGKWSGTGAAAAAKCT

Optimized to use slice assignment (the end position is still exclusive):

for start,end in coordinates:
 result[start:end] = ["N"]*(end-start)
res = ''.join(result)
print(line)
print(res)

This gives you the desired output:

ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT 
ANNNNGTGTGNNNNNACGTACGTGTNNNNNNNNNNGTGKWSGTGAAAAAKCT
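For very large sequences, the same slice-assignment idea can also be applied to a bytearray instead of a list of one-character strings, since a bytearray stores one byte per character rather than one object reference. The following is only a sketch under assumptions: the helper name mask_intervals is hypothetical, the sequence is assumed to be ASCII-only, and the coordinates are assumed to be 0-based with an exclusive end and 0 <= start <= end. It has not been benchmarked here.

```python
def mask_intervals(line, coordinates, fill='N'):
    """Return `line` with each half-open interval [start, end) overwritten by `fill`.

    Hypothetical helper; assumes ASCII-only text and a single-character fill.
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    buf = bytearray(line, 'ascii')      # mutable copy, one byte per character
    fill_byte = fill.encode('ascii')
    for start, end in coordinates:
        end = min(end, len(buf))        # clamp so the sequence length never changes
        if start < end:
            buf[start:end] = fill_byte * (end - start)
    return buf.decode('ascii')


coordinates = [[1, 5], [10, 15], [25, 35]]
line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'
masked = mask_intervals(line, coordinates)
print(masked)
```

Because each replacement has the same length as the slice it overwrites, the intervals do not need to be sorted and may overlap.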
alani
answered Jul 30, 2020 at 9:21

1 Comment

This solution took about 89 seconds on a 2.4 GB file.
2

Good question. This should work:

coordinates = [[1,5], [10,15], [25, 35]]
line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'
for L,R in coordinates:
 line = line[:L] + "N"*(R-L) + line[R:]
print(line)

You may need to adjust this depending on how the coordinates are defined, e.g. if they are 1-indexed or have an inclusive end.
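As an illustration of such an adjustment, here is a sketch that converts coordinates to the 0-based, end-exclusive form used above before masking. The helper name to_half_open and its flags are hypothetical, and the 1-based inclusive input is an assumed convention (common in some genomics formats), not something stated in the question.

```python
def to_half_open(coordinates, one_based=True, inclusive_end=True):
    """Convert (start, end) pairs to 0-based pairs with an exclusive end.

    Hypothetical helper; the two flags describe the convention of the input.
    """
    converted = []
    for start, end in coordinates:
        if one_based:
            start -= 1      # shift both ends to 0-based positions
            end -= 1
        if inclusive_end:
            end += 1        # an inclusive end becomes exclusive
        converted.append((start, end))
    return converted


# Assumed input: the same intervals as in the question, written 1-based inclusive.
one_based_coords = [[2, 5], [11, 15], [26, 35]]
half_open = to_half_open(one_based_coords)
print(half_open)  # [(1, 5), (10, 15), (25, 35)]

line = 'ATCACGTGTGTGTACACGTACGTGTGNGTNGTTGAGTGKWSGTGAAAAAKCT'
for start, end in half_open:
    line = line[:start] + "N" * (end - start) + line[end:]
print(line)
```

Doing the conversion once, up front, keeps the masking loop itself identical regardless of the input convention.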

We need more people working with DNA, so great work.

answered Jul 30, 2020 at 9:17

2 Comments

Good. The code in the question would probably imply that the indices should be as you have now shown (and I thought the same - see my comment under the question) but some example output from the OP would certainly help clarify this.
Ah, now we have example output in the question, and it is as suspected.
