Skip to main content
Code Review

Return to Answer

replaced http://stackoverflow.com/ with https://stackoverflow.com/
Source Link
  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    
  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    
  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    
  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    
  1. Something's gone wrong with the identationindentation in this function. A copy-paste error?

  2. The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.

  3. The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.

  4. There's no explanation in the documentation for the behaviour if you don't specify a value for t.

  5. The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

     >>> help(frequentPatterns)
     Help on function frequentPatterns in module __main__:
     frequentPatterns(text='', k=2, numberOfOccurances=None)
    
  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    
  1. Something's gone wrong with the identation in this function. A copy-paste error?

  2. The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.

  3. The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.

  4. There's no explanation in the documentation for the behaviour if you don't specify a value for t.

  5. The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

     >>> help(frequentPatterns)
     Help on function frequentPatterns in module __main__:
     frequentPatterns(text='', k=2, numberOfOccurances=None)
    
  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction, explained here. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    
  1. Something's gone wrong with the indentation in this function. A copy-paste error?

  2. The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.

  3. The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.

  4. There's no explanation in the documentation for the behaviour if you don't specify a value for t.

  5. The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

     >>> help(frequentPatterns)
     Help on function frequentPatterns in module __main__:
     frequentPatterns(text='', k=2, numberOfOccurances=None)
    
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210

1. Introduction

This code has so many basic problems with documentation, layout, spelling, naming, and so on, that I didn't get around to looking at what it does. In a professional context we would say, "this code is not remotely ready for review" and send it back for you to fix!

It's worth reading the Python style guide (PEP8) and following the advice there. It's not compulsory, but it will make it easier for you to collaborate with other Python programmers and read each others' code. In particular, I'd use 4 spaces indentation, and lowercase_words_with_underscores for function and method names.

2. The function patternOccurances2

  1. Your main code calls patternOccurances but when you define the function it's called patternOccurances2.

  2. There's no documentation for patternOccurances2. What does it do and how am I supposed to call it?

  3. This function takes default arguments text="GATATATGCATATACTT", pattern="ATAT". These do not seem like suitable default arguments. Default arguments are normally used for common values, to save the caller having to specify them. If you want to provide test data, then write a test case separately. See below for one way to do it.

  4. This function takes its arguments in the order (text, pattern) but it would be better for it to take the arguments the other way round, for consistency with functions in the standard library like re.search.

  5. This function starts by calling:

     position = config.genome.find(pattern, position+1)
    

What is config.genome and how is it relevant here? It looks as if it might be a mistake for text.

  1. It seems inelegant to append a -1 to the list of occurrences, and then remove it again. It would be better to write the loop like this:

     result = []
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return result
     result.append(position)
    
  2. Instead of returning a list, it's nearly always better in Python to generate the results using the yield instruction. This means that you can process the results one by one without having to build up an intermediate list in memory. So I would write the function like this:

     def pattern_occurrences(pattern, text):
     """Generate the indices of the (possibly overlapping) occurrences of
     pattern in text. For example:
     >>> list(pattern_occurrences('ATAT', 'GATATATGCATATACTT'))
     [1, 3, 9]
     """
     position = -1
     while True:
     position = text.find(pattern, position + 1)
     if position == -1:
     return
     yield position
    

Notice that the docstring includes an example test case. You can run this and check that the result is correct using the doctest module.

3. The function frequentPatterns

  1. Something's gone wrong with the identation in this function. A copy-paste error?

  2. The documentation says Input ... number of occurances t. But the actual argument is called numberOfOccurances, not t.

  3. The documentation for frequentPatterns says Output: All most frequent k-mers in text. But actually, only the k-mers that appear at least t times are output.

  4. There's no explanation in the documentation for the behaviour if you don't specify a value for t.

  5. The documentation for frequentPatterns is in a comment. But that means that I can't read it from the interactive interpreter:

     >>> help(frequentPatterns)
     Help on function frequentPatterns in module __main__:
     frequentPatterns(text='', k=2, numberOfOccurances=None)
    

If you put this documentation into a docstring, like this:

 def frequent_patterns(text="", k=2, t=None):
 """Return a list of the most frequent k-mers in a string.
 Input: A string text, an integer k, and a number of occurrences t.
 Output: All k-mers that appear at least t times in text.
 If t is omitted, return a list containing the k-mer that appears
 the most often (or the k-mers, if there is a tie).
 """
 words = {}
 # ...

then you'd be able to read it from the interactive interpreter:

<!-- language: none -->
 >>> help(frequent_patterns)
 Help on function frequent_patterns in module __main__:
 frequent_patterns(text='', k=2, t=None)
 Return a list of the most frequent k-mers in a string.
 Input: A string text, an integer k, and a number of occurrences t.
 Output: All k-mers that appear at least t times in text.
 If t is omitted, return a list containing the k-mer that appears
 the most often (or the k-mers, if there is a tie).
lang-py

AltStyle によって変換されたページ (->オリジナル) /