Return to Answer

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 12:40

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

Added link to explain what yield does.

Source Link

edit approved Dec 24, 2013 at 15:15

richnis

edit approved Dec 24, 2013 at 15:15

richnis

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

Something's gone wrong with the identationindentation in this function. A copy-paste error?
The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.
The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.
There's no explanation in the documentation for the behaviour if you don't specify a value for t.

The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

 >>> help(frequentPatterns)
 Help on function frequentPatterns in module __main__:
 frequentPatterns(text='', k=2, numberOfOccurances=None)

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

Something's gone wrong with the identation in this function. A copy-paste error?
The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.
The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.
There's no explanation in the documentation for the behaviour if you don't specify a value for t.

The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

 >>> help(frequentPatterns)
 Help on function frequentPatterns in module __main__:
 frequentPatterns(text='', k=2, numberOfOccurances=None)

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

Something's gone wrong with the indentation in this function. A copy-paste error?
The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.
The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.
There's no explanation in the documentation for the behaviour if you don't specify a value for t.

The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

 >>> help(frequentPatterns)
 Help on function frequentPatterns in module __main__:
 frequentPatterns(text='', k=2, numberOfOccurances=None)

Source Link

answered Dec 23, 2013 at 15:00

Gareth Rees

answered Dec 23, 2013 at 15:00

Gareth Rees

50.1k
3
130
210

1. Introduction

This code has so many basic problems with documentation, layout, spelling, naming, and so on, that I didn't get around to looking at what it does. In a professional context we would say, "this code is not remotely ready for review" and send it back for you to fix!

It's worth reading the Python style guide (PEP8) and following the advice there. It's not compulsory, but it will make it easier for you to collaborate with other Python programmers and read each others' code. In particular, I'd use 4 spaces indentation, and lowercase_words_with_underscores for function and method names.

2. The function patternOccurances2

Your main code calls patternOccurances but when you define the function it's called patternOccurances2.
There's no documentation for patternOccurances2. What does it do and how am I supposed to call it?
This function takes default arguments text="GATATATGCATATACTT", pattern="ATAT". These do not seem like suitable default arguments. Default arguments are normally used for common values, to save the caller having to specify them. If you want to provide test data, then write a test case separately. See below for one way to do it.
This function takes its arguments in the order (text, pattern) but it would be better for it to take the arguments the other way round, for consistency with functions in the standard library like re.search.

This function starts by calling:

 position = config.genome.find(pattern, position+1)

What is config.genome and how is it relevant here? It looks as if it might be a mistake for text.

It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

 result = []
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return result
 result.append(position)

 def pattern_occurrences(pattern, text):
 """Generate the indices of the (possibly overlapping) occurrences of
 pattern in text. For example:
 >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
 [1, 3, 9]
 """
 position = -1
 while True:
 position = text.find(pattern, position + 1)
 if position == -1:
 return
 yield position

Notice that the docstring includes an example test case. You can run this and check that the result is correct using the doctest module.

3. The function frequentPatterns

Something's gone wrong with the identation in this function. A copy-paste error?
The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.
The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.
There's no explanation in the documentation for the behaviour if you don't specify a value for t.

The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

 >>> help(frequentPatterns)
 Help on function frequentPatterns in module __main__:
 frequentPatterns(text='', k=2, numberOfOccurances=None)

If you put this documentation into a docstring, like this:

 def frequent_patterns(text="", k=2, t=None):
 """Return a list of the most frequent k-mers in a string.
 Input: A string text, an integer k, and a number of occurrences t.
 Output: All k-mers that appear at least t times in text.
 If t is omitted, return a list containing the k-mer that appears
 the most often (or the k-mers, if there is a tie).
 """
 words = {}
 # ...

then you'd be able to read it from the interactive interpreter:

<!-- language: none -->
 >>> help(frequent_patterns)
 Help on function frequent_patterns in module __main__:
 frequent_patterns(text='', k=2, t=None)
 Return a list of the most frequent k-mers in a string.
 Input: A string text, an integer k, and a number of occurrences t.
 Output: All k-mers that appear at least t times in text.
 If t is omitted, return a list containing the k-mer that appears
 the most often (or the k-mers, if there is a tie).

lang-py