It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction, explained here here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
Something's gone wrong with the identationindentation in this function. A copy-paste error?
The documentation says
Input ... number of occurances t
. But the actual argument is callednumberOfOccurances
, nott
.The documentation for
frequentPatterns
saysOutput: All most frequent k-mers in text.
But actually, only the k-mers that appear at leastt
times are output.There's no explanation in the documentation for the behaviour if you don't specify a value for
t
.The documentation for
frequentPatterns
is in a comment. But that means that I can't read it from the interactive interpreter:>>> help(frequentPatterns) Help on function frequentPatterns in module __main__: frequentPatterns(text='', k=2, numberOfOccurances=None)
It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
Something's gone wrong with the identation in this function. A copy-paste error?
The documentation says
Input ... number of occurances t
. But the actual argument is callednumberOfOccurances
, nott
.The documentation for
frequentPatterns
saysOutput: All most frequent k-mers in text.
But actually, only the k-mers that appear at leastt
times are output.There's no explanation in the documentation for the behaviour if you don't specify a value for
t
.The documentation for
frequentPatterns
is in a comment. But that means that I can't read it from the interactive interpreter:>>> help(frequentPatterns) Help on function frequentPatterns in module __main__: frequentPatterns(text='', k=2, numberOfOccurances=None)
It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
Something's gone wrong with the indentation in this function. A copy-paste error?
The documentation says
Input ... number of occurances t
. But the actual argument is callednumberOfOccurances
, nott
.The documentation for
frequentPatterns
saysOutput: All most frequent k-mers in text.
But actually, only the k-mers that appear at leastt
times are output.There's no explanation in the documentation for the behaviour if you don't specify a value for
t
.The documentation for
frequentPatterns
is in a comment. But that means that I can't read it from the interactive interpreter:>>> help(frequentPatterns) Help on function frequentPatterns in module __main__: frequentPatterns(text='', k=2, numberOfOccurances=None)
1. Introduction
This code has so many basic problems with documentation, layout, spelling, naming, and so on, that I didn't get around to looking at what it does. In a professional context we would say, "this code is not remotely ready for review" and send it back for you to fix!
It's worth reading the Python style guide (PEP8) and following the advice there. It's not compulsory, but it will make it easier for you to collaborate with other Python programmers and read each others' code. In particular, I'd use 4 spaces indentation, and lowercase_words_with_underscores
for function and method names.
2. The function patternOccurances2
Your main code calls
patternOccurances
but when you define the function it's calledpatternOccurances2
.There's no documentation for
patternOccurances2
. What does it do and how am I supposed to call it?This function takes default arguments
text="GATATATGCATATACTT", pattern="ATAT"
. These do not seem like suitable default arguments. Default arguments are normally used for common values, to save the caller having to specify them. If you want to provide test data, then write a test case separately. See below for one way to do it.This function takes its arguments in the order
(text, pattern)
but it would be better for it to take the arguments the other way round, for consistency with functions in the standard library likere.search
.This function starts by calling:
position = config.genome.find(pattern, position+1)
What is config.genome
and how is it relevant here? It looks as if it might be a mistake for text
.
It seems inelegant to append a
-1
to the list of occurrences, and then remove it again. It would be better to write the loop like this:result = [] position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return result result.append(position)
Instead of returning a list, it's nearly always better in Python to generate the results using the
yield
instruction. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:def pattern_occurrences(pattern, text): """Generate the indices of the (possibly overlapping) occurrences of pattern in text. For example: >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT')) [1, 3, 9] """ position = -1 while True: position = text.find(pattern, position + 1) if position == -1: return yield position
Notice that the docstring includes an example test case. You can run this and check that the result is correct using the doctest
module.
3. The function frequentPatterns
Something's gone wrong with the identation in this function. A copy-paste error?
The documentation says
Input ... number of occurances t
. But the actual argument is callednumberOfOccurances
, nott
.The documentation for
frequentPatterns
saysOutput: All most frequent k-mers in text.
But actually, only the k-mers that appear at leastt
times are output.There's no explanation in the documentation for the behaviour if you don't specify a value for
t
.The documentation for
frequentPatterns
is in a comment. But that means that I can't read it from the interactive interpreter:>>> help(frequentPatterns) Help on function frequentPatterns in module __main__: frequentPatterns(text='', k=2, numberOfOccurances=None)
If you put this documentation into a docstring, like this:
def frequent_patterns(text="", k=2, t=None):
"""Return a list of the most frequent k-mers in a string.
Input: A string text, an integer k, and a number of occurrences t.
Output: All k-mers that appear at least t times in text.
If t is omitted, return a list containing the k-mer that appears
the most often (or the k-mers, if there is a tie).
"""
words = {}
# ...
then you'd be able to read it from the interactive interpreter:
<!-- language: none -->
>>> help(frequent_patterns)
Help on function frequent_patterns in module __main__:
frequent_patterns(text='', k=2, t=None)
Return a list of the most frequent k-mers in a string.
Input: A string text, an integer k, and a number of occurrences t.
Output: All k-mers that appear at least t times in text.
If t is omitted, return a list containing the k-mer that appears
the most often (or the k-mers, if there is a tie).