Skip to main content
Code Review

Return to Answer

format mathematics; PEP8
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210
  1. You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.

  2. You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.

  3. Instead of comparing three pairs of array elements, you can compare one pair of array slices.

  4. Organize the code into a function with a docstring.

  5. Generalize it from 3 to k\$k\$.

  6. Python doesn't have an "array" type, but it does have sequences. So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k = 3k=3):
 """
 Return"""Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is O(n2)\$ O(n^2) \$, where n\$n\$ is the length of the sequence. The algorithm below is O(n + m)\$O(n + m)\$, where m\$m\$ is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2)\$m = Θ(n^2)\$, but in cases where m = ο(n2)\$m = ο(n^2)\$ it should be an improvement.

def subseq_distance_counts(seq, k = 3k=3):
 """
  Return"""Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))
  1. You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.

  2. You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.

  3. Instead of comparing three pairs of array elements, you can compare one pair of array slices.

  4. Organize the code into a function with a docstring.

  5. Generalize it from 3 to k.

  6. Python doesn't have an "array" type, but it does have sequences. So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is O(n2), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

def subseq_distance_counts(seq, k = 3):
 """
  Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))
  1. You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.

  2. You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.

  3. Instead of comparing three pairs of array elements, you can compare one pair of array slices.

  4. Organize the code into a function with a docstring.

  5. Generalize it from 3 to \$k\$.

  6. Python doesn't have an "array" type, but it does have sequences. So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k=3):
 """Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is \$ O(n^2) \$, where \$n\$ is the length of the sequence. The algorithm below is \$O(n + m)\$, where \$m\$ is the number of repeated triples. This won't make any different in the worst case, where \$m = Θ(n^2)\$, but in cases where \$m = ο(n^2)\$ it should be an improvement.

def subseq_distance_counts(seq, k=3):
 """Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))
subsequences must be hashable
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is O(n2), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

Here's an alternative approach. Your original algorithm is O(n2), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is O(n2), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

improve names of variables; can avoid `abs` by using `list`
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210
  1. You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.

  2. You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.

  3. Instead of comparing three pairs of array elements, you can compare one pair of array slices.

  4. Organize the code into a function with a docstring.

  5. Generalize it from 3 to k.

  6. Python doesn't have an "array" type, but it does have sequences . So your variables could be better named.

from collections import Counter
from itertools import combinations
def substring_distance_countssubseq_distance_counts(arrayseq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 are the number of repeatedidentical substringspairs of `k`-length `k`subsequences
 in `array` at that
of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(arrayseq) - k + 1), 2):
 if array[iseq[i:i + k] == array[jseq[j:j + k]:
 counts[j - i] += 1
 return counts
 return Counter(j - i
 for i, j in combinations(range(len(arrayseq) - k + 1), 2)
 if array[iseq[i:i + k] == array[jseq[j:j + k])

Here's an alternative approach. Your original algorithm is O(n2), where n is the length of the arraysequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

def substring_distance_countssubseq_distance_counts(arrayseq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 are the number of repeatedidentical substringspairs of `k`-length `k`subsequences
 in `array` at that
of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(setlist) # SetList of positions where each substringsubsequence appears.
 for i in range(len(arrayseq) - k + 1):
 positions[array[ipositions[seq[i:i + k]].addappend(i)
 return Counter(abs(j - i)
 for p in positions.itervalues()
 for i, j in combinations(p, 2))
  1. You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.

  2. You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.

  3. Instead of comparing three pairs of array elements, you can compare one pair of array slices.

  4. Organize the code into a function with a docstring.

  5. Generalize it from 3 to k.

from collections import Counter
from itertools import combinations
def substring_distance_counts(array, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 the number of repeated substrings of length `k` in `array` at that
 distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(array) - k + 1), 2):
 if array[i:i + k] == array[j:j + k]:
 counts[j - i] += 1
 return counts
 return Counter(j - i
 for i, j in combinations(range(len(array) - k + 1), 2)
 if array[i:i + k] == array[j:j + k])

Here's an alternative approach. Your original algorithm is O(n2), where n is the length of the array. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

def substring_distance_counts(array, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 the number of repeated substrings of length `k` in `array` at that
 distance. `k` defaults to 3.
 """
 positions = defaultdict(set) # Set of positions where each substring appears.
 for i in range(len(array) - k + 1):
 positions[array[i:i + k]].add(i)
 return Counter(abs(j - i)
 for p in positions.itervalues()
 for i, j in combinations(p, 2))
  1. You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.

  2. You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.

  3. Instead of comparing three pairs of array elements, you can compare one pair of array slices.

  4. Organize the code into a function with a docstring.

  5. Generalize it from 3 to k.

  6. Python doesn't have an "array" type, but it does have sequences . So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts
 return Counter(j - i
 for i, j in combinations(range(len(seq) - k + 1), 2)
 if seq[i:i + k] == seq[j:j + k])

Here's an alternative approach. Your original algorithm is O(n2), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n2), but in cases where m = ο(n2) it should be an improvement.

def subseq_distance_counts(seq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))
need an `abs`
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210
Loading
generalize to arbitrary k
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210
Loading
Source Link
Gareth Rees
  • 50.1k
  • 3
  • 130
  • 210
Loading
lang-py

AltStyle によって変換されたページ (->オリジナル) /