Return to Answer

format mathematics; PEP8

edited May 4, 2015 at 9:36

50.1k
3
130
210

You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.
You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.
Instead of comparing three pairs of array elements, you can compare one pair of array slices.
Organize the code into a function with a docstring.
Generalize it from 3 to k\$k\$.
Python doesn't have an "array" type, but it does have sequences. So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k = 3k=3):
 """
 Return"""Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is O(n²)\$ O(n^2) \$, where n\$n\$ is the length of the sequence. The algorithm below is O(n + m)\$O(n + m)\$, where m\$m\$ is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n²)\$m = Θ(n^2)\$, but in cases where m = ο(n²)\$m = ο(n^2)\$ it should be an improvement.

def subseq_distance_counts(seq, k = 3k=3):
 """
  Return"""Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))

You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.
You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.
Instead of comparing three pairs of array elements, you can compare one pair of array slices.
Organize the code into a function with a docstring.
Generalize it from 3 to k.
Python doesn't have an "array" type, but it does have sequences. So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is O(n²), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n²), but in cases where m = ο(n²) it should be an improvement.

def subseq_distance_counts(seq, k = 3):
 """
  Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))

You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.
You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.
Instead of comparing three pairs of array elements, you can compare one pair of array slices.
Organize the code into a function with a docstring.
Generalize it from 3 to \$k\$.
Python doesn't have an "array" type, but it does have sequences. So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k=3):
 """Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

Here's an alternative approach (that you could use if the subsequences are hashable). Your original algorithm is \$ O(n^2) \$, where \$n\$ is the length of the sequence. The algorithm below is \$O(n + m)\$, where \$m\$ is the number of repeated triples. This won't make any different in the worst case, where \$m = Θ(n^2)\$, but in cases where \$m = ο(n^2)\$ it should be an improvement.

def subseq_distance_counts(seq, k=3):
 """Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))

subsequences must be hashable

Source Link

edited Dec 10, 2012 at 19:49

Gareth Rees

edited Dec 10, 2012 at 19:49

Gareth Rees

50.1k
3
130
210

Here's an alternative approach. Your original algorithm is O(n²), where n is the length of the sequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n²), but in cases where m = ο(n²) it should be an improvement.

improve names of variables; can avoid `abs` by using `list`

Source Link

edited Dec 10, 2012 at 17:34

Gareth Rees

edited Dec 10, 2012 at 17:34

Gareth Rees

50.1k
3
130
210

You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.
You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.
Instead of comparing three pairs of array elements, you can compare one pair of array slices.
Organize the code into a function with a docstring.
Generalize it from 3 to k.
Python doesn't have an "array" type, but it does have sequences . So your variables could be better named.

from collections import Counter
from itertools import combinations
def substring_distance_countssubseq_distance_counts(arrayseq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 are the number of repeatedidentical substringspairs of `k`-length `k`subsequences
 in `array` at that
of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(arrayseq) - k + 1), 2):
 if array[iseq[i:i + k] == array[jseq[j:j + k]:
 counts[j - i] += 1
 return counts

 return Counter(j - i
 for i, j in combinations(range(len(arrayseq) - k + 1), 2)
 if array[iseq[i:i + k] == array[jseq[j:j + k])

Here's an alternative approach. Your original algorithm is O(n²), where n is the length of the arraysequence. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n²), but in cases where m = ο(n²) it should be an improvement.

def substring_distance_countssubseq_distance_counts(arrayseq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 are the number of repeatedidentical substringspairs of `k`-length `k`subsequences
 in `array` at that
of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(setlist) # SetList of positions where each substringsubsequence appears.
 for i in range(len(arrayseq) - k + 1):
 positions[array[ipositions[seq[i:i + k]].addappend(i)
 return Counter(abs(j - i)
 for p in positions.itervalues()
 for i, j in combinations(p, 2))

You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.
You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.
Instead of comparing three pairs of array elements, you can compare one pair of array slices.
Organize the code into a function with a docstring.
Generalize it from 3 to k.

from collections import Counter
from itertools import combinations
def substring_distance_counts(array, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 the number of repeated substrings of length `k` in `array` at that
 distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(array) - k + 1), 2):
 if array[i:i + k] == array[j:j + k]:
 counts[j - i] += 1
 return counts

 return Counter(j - i
 for i, j in combinations(range(len(array) - k + 1), 2)
 if array[i:i + k] == array[j:j + k])

Here's an alternative approach. Your original algorithm is O(n²), where n is the length of the array. The algorithm below is O(n + m), where m is the number of repeated triples. This won't make any different in the worst case, where m = Θ(n²), but in cases where m = ο(n²) it should be an improvement.

def substring_distance_counts(array, k = 3):
 """
 Return a dictionary whose keys are distances and whose values are
 the number of repeated substrings of length `k` in `array` at that
 distance. `k` defaults to 3.
 """
 positions = defaultdict(set) # Set of positions where each substring appears.
 for i in range(len(array) - k + 1):
 positions[array[i:i + k]].add(i)
 return Counter(abs(j - i)
 for p in positions.itervalues()
 for i, j in combinations(p, 2))

You can avoid the need to check counts.has_key(j-i) if you use collections.Counter.
You are looping over all pairs of distinct indexes between 0 and len(array)-2. The function itertools.combinations provides a way to cleanly express your intention.
Instead of comparing three pairs of array elements, you can compare one pair of array slices.
Organize the code into a function with a docstring.
Generalize it from 3 to k.
Python doesn't have an "array" type, but it does have sequences . So your variables could be better named.

from collections import Counter
from itertools import combinations
def subseq_distance_counts(seq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 """
 counts = Counter()
 for i, j in combinations(range(len(seq) - k + 1), 2):
 if seq[i:i + k] == seq[j:j + k]:
 counts[j - i] += 1
 return counts

 return Counter(j - i
 for i, j in combinations(range(len(seq) - k + 1), 2)
 if seq[i:i + k] == seq[j:j + k])

def subseq_distance_counts(seq, k = 3):
 """
 Return a dictionary whose keys are distances and whose values
 are the number of identical pairs of `k`-length subsequences
 of `seq` separated by that distance. `k` defaults to 3.
 >>> subseq_distance_counts('abcabcabc')
 Counter({3: 4, 6: 1})
 >>> subseq_distance_counts('aaaaaa', 1)
 Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})
 """
 positions = defaultdict(list) # List of positions where each subsequence appears.
 for i in range(len(seq) - k + 1):
 positions[seq[i:i + k]].append(i)
 return Counter(j - i
 for p in positions.itervalues()
 for i, j in combinations(p, 2))