I'm happy to hear thoughts and ideas on structure/performance/testing/whatever and multi-threading, which I haven't gotten into yet with Python.
Full code here. Assignment file and a test file included in the folder.
Excerpt:
import unittest
import sys
def Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd):
"""
Merges two similar length pre-sorted sub-slices of array A while counting
inversions.
Input:
array A, start_index (location of first sub-slice)
len_1st,len_2nd (subslice_lengths).
Output: inversions (values where i<j, but A[i]>A[j])
Side Effect: two consecutive A subsections are sorted.
"""
inversions=0
temp_array=[]
index_1st=start_index
index_2nd=start_index+len_1st
#while both indices are in range
while ((index_1st < start_index+len_1st) and
(index_2nd < start_index+len_1st+len_2nd)):
#place smaller value in temp_array, increase inversions
if A[index_1st]<=A[index_2nd]:
temp_array.append(A[index_1st])
index_1st += 1
else:
temp_array.append(A[index_2nd])
index_2nd += 1
inversions+=(start_index+len_1st-index_1st)
#one index is out of range:
if index_2nd == start_index+len_1st+len_2nd: # 2nd index complete
#add leftover 1st half values to temp_array
#before destroying them by the write process.
temp_array.extend(A[index_1st:start_index+len_1st])
else: # 1st index complete
pass #no need to write over leftovers in 2nd sub-array
#write temp_array over A[start_index:start_index+len(temp_array)]
write_index=start_index
for value in temp_array:
A[write_index]=value
write_index +=1
return inversions
def Sort_and_Count(A, n, start_index=0):
"""
Input: array A, length n, start_index(default=0)
Output: inversions (values where i<j, but A[i]>A[j])
Side Effect: A is sorted.
"""
if n==1:
return 0 # base case, array of length 1
else:
# Input: 1st and 2nd halves of current sub-array
len_1st=n//2
len_2nd=n//2+n%2
x=Sort_and_Count(A, len_1st, start_index)
y=Sort_and_Count(A, len_2nd, start_index+len_1st)
# merge the (newly sorted) half-sized sub-arrays
z=Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd)
return x+y+z
class TestSort_and_Count(unittest.TestCase):
"""
Basic test class
"""
def test_Sort_and_Count(self):
A=[1]
res1 = Sort_and_Count(A,len(A)) # single element
self.assertEqual(res1, 0)
B=[1,3,5,2,4,6]
res2 = Sort_and_Count(B,len(B)) # even length
self.assertEqual(res2, 3)
C=[1,3,5,2,4,6,3]
res3 = Sort_and_Count(C,len(C)) # odd length, duplicate value
self.assertEqual(res3, 6)
def main(file_name):
#TODO: take values from file and run sort_and_count
with open(file_name) as fh:
if file_name[:4] == 'test':
print(fh.readline()) # get rid of first answer line from debug file
A = list(map(int, [line.strip() for line in fh]))
print(Sort_and_Count(A,len(A)))
if __name__ == '__main__':
#TODO: working with argv to accept file input
if len(sys.argv) > 2:
sys.exit("Usage: inv_count <file_name> (leave empty for testing)")
if len(sys.argv) == 1:
print("No filename input, testing...")
unittest.main()
# else: argv == 2
main(sys.argv[1])
1 Answer 1
1. Bug
If you pass the empty list to Sort_and_Count
, it recurses until it runs out of stack:
>>> Sort_and_Count([], 0)
Traceback (most recent call last):
File "cr107928.py", line 67, in test_Sort_and_Count
Sort_and_Count([],0)
... many lines omitted ...
File "cr107928.py", line 49, in Sort_and_Count
if n==1:
RecursionError: maximum recursion depth exceeded in comparison
2. Review
The code omits spaces around operators. This makes it hard to read. It is better to follow the Python style guide (PEP8):
Always surround these binary operators with a single space on either side: assignment (
=
), augmented assignment (+=
,-=
etc.), comparisons (==
,<
,>
,!=
,<>
,<=
,>=
,in
,not in
,is
,is not
), Booleans (and
,or
,not
). [...] If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies).The names do not follow the Python style guide. This recommends:
Class names should normally use the CapWords convention.
so
TestSortAndCount
would be better thanTestSort_and_Count
.Function names should be lowercase, with words separated by underscores as necessary to improve readability.
so
merge_and_count_inversions
would be better thanMerge_and_CountSplitInv
.It's not compulsory to follow the Python style guide, but it makes it easier for other Python programmers to read your code.
Python's documentation uses the term "sequence" to refer to indexable collections, not "array". There's less risk of confusion if you stick to the standard term.
The docstring says, "Merges two similar length pre-sorted sub-slices", but in fact the function works fine even if the subsequences are not of similar length. I know that when called from
Sort_and_Count
the subsequences will be equal in length or differ by one, but it's better to document the actual behaviour of the function.Instead of
index_1st
andindex_2nd
, I would usei
andj
. These variables are used many times, so using short names reduces the amount of code and make it easier to read.The arguments
len_1st
andlen_2nd
are only used within the expressionsstart_index+len_1st
andstart_index+len_1st+len_2nd
. This suggests that the arguments to the function should bestart
,middle
andend
instead.Instead of updating
A
one element at a time:write_index=start_index for value in temp_array: A[write_index]=value write_index +=1
update all elements at once using a slice assignment:
A[start_index:start_index + len(temp_array)] = temp_array
This is efficient (in CPython, anyway) because the replacement is the same length as the slice being assigned.
In
Sort_and_Count
, the caller is expected to pass in the length of the sequence. It would be simpler ifSort_and_Count
calledlen
instead. This could easily be done using a helper function; see updated code below.The test case doesn't try the empty list (see "Bug" above).
The test cases only test that the inversion count is correct, but don't actually test that the sequence is sorted! You need something like:
self.assertEqual(A, sorted(A))
The test cases have hard-coded counts of inversions. But it is easy to count the inversions in a sequence, using
itertools.combinations
:sum(j < i for i, j in combinations(seq, 2))
This is inefficient (it's \$ Θ(n^2) \$), but clearly correct, and so a good comparison for the efficient, but not clearly correct, counts produced by the function under test.
The test cases are very repetitive. This could be avoided with a helper method, something like this:
def check(self, seq): expected = sum(j < i for i, j in combinations(seq, 2)) found = sort_and_count_inversions(seq) self.assertEqual(expected, found) self.assertEqual(seq, sorted(seq))
The test cases are not very thorough. You could easily test many more cases using random test generation:
for _ in range(100): self.check([randrange(100) for _ in range(randrange(100))])
3. Revised code
from itertools import combinations
from random import randrange
from unittest import TestCase
def merge_and_count_inversions(seq, start, middle, end):
"""Merge two adjacent sorted sub-sequences of seq and count
inversions.
Arguments:
seq -- sequence to sort
start -- index of beginning of first sorted subsequence
middle -- end of first, and beginning of second, sorted subsequence
end -- end of second sorted subsequence
Result: number of inversions (cases where i < j, but seq[i] > seq[j]).
Side effect: seq is sorted between start and end.
"""
assert 0 <= start < middle < end <= len(seq)
inversions = 0
temp = []
i = start
j = middle
while i < middle and j < end:
if seq[i] <= seq[j]:
temp.append(seq[i])
i += 1
else:
temp.append(seq[j])
j += 1
inversions += middle - i
if j == end:
# Second subsequence is complete: process remainder of first.
temp.extend(seq[i:middle])
else:
# First subsequence is complete: no need to process
# remaininder of second, since it does not move.
pass
# Insert sorted results into original sequence.
seq[start:start+len(temp)] = temp
return inversions
def sort_and_count_inversions(seq):
"""Sort a sequence and count inversions.
Argument: seq -- a sequence
Result: number of inversions (cases where i < j, but seq[i] > seq[j]).
Side effect: seq is sorted.
"""
def sort_and_count(seq, start, end):
if end - start < 2:
return 0
middle = (start + end) // 2
return (sort_and_count(seq, start, middle)
+ sort_and_count(seq, middle, end)
+ merge_and_count_inversions(seq, start, middle, end))
return sort_and_count(seq, 0, len(seq))
class TestSortAndCount(TestCase):
def check(self, seq):
expected = sum(j < i for i, j in combinations(seq, 2))
found = sort_and_count_inversions(seq)
self.assertEqual(expected, found)
self.assertEqual(seq, sorted(seq))
def runTest(self):
self.check([]) # empty sequence
self.check([1]) # single element
self.check([1, 3, 5, 2, 4, 6]) # even length
self.check([1, 3, 5, 2, 4, 6, 3]) # odd length, duplicate value
for _ in range(100): # random test cases
self.check([randrange(100) for _ in range(randrange(100))])
-
\$\begingroup\$ I riffed over all the notes and will go over them again later - esp. the testing notes. Nice find on the bug +1 just for that. On a quick note, I wanted to use slice replacement, but was worried it takes O(n), which breaks the algorithm. \$\endgroup\$ofer.sheffer– ofer.sheffer2015年10月20日 22:17:49 +00:00Commented Oct 20, 2015 at 22:17
-
\$\begingroup\$ The implementation of slice assignment in CPython checks whether the replacement is the same size as the slice being replaced, and if so it does the efficient thing. See
list_ass_slice
in listobject.c. \$\endgroup\$Gareth Rees– Gareth Rees2015年10月20日 22:25:50 +00:00Commented Oct 20, 2015 at 22:25 -
\$\begingroup\$ 3. the term "sequence" - will try to keep in mind. \$\endgroup\$ofer.sheffer– ofer.sheffer2015年10月25日 15:13:19 +00:00Commented Oct 25, 2015 at 15:13
-
\$\begingroup\$ 8. It would be simpler if Sort_and_Count called len instead. This could easily be done using a helper function; -- very nice workaround. \$\endgroup\$ofer.sheffer– ofer.sheffer2015年10月27日 10:30:45 +00:00Commented Oct 27, 2015 at 10:30
Explore related questions
See similar questions with these tags.