I've been trying to internalize some of the basic sorting algorithms recently, and to do so I've been looking at their wikipedia pages to refresh myself with how they work, and then coding my own "version" of the algorithm from there.
I did this with MergeSort recently, and I got an algorithm that for what I can tell works but I'm not sure it runs in \$O(n \log{n})\$ time like MergeSort does.
def mergeSortAlgorithm(x):
listOfLists = []
for i in range(len(x)):
temp = [x[i]]
listOfLists.append(temp) #I first put every item in the array into
#its own list, within a larger list -> divide step
#so long as there is more than one sorted list, then we need to keep merging
while(len(listOfLists) != 1):
j = 0
#for every pair of lists in the list of lists, we need to merge them
while(j < len(listOfLists)-1):
tempList = merge(listOfLists[j], listOfLists[j+1])
listOfLists[j] = tempList
del listOfLists[j+1] #I shift the new list to index j, and delete the second list from the now-merged pair at index j+1
j = j+1 #increment the value of j, basically a counter
print(listOfLists)
print(listOfLists[0], "is sorted!")
def merge(a, b): #function to merge two lists in order
newList = []
count1, count2 = 0, 0
#basically, walk through each list, adding whichever element is smallest
while((count1 < len(a)) and (count2 < len(b))):
if(a[count1] > b[count2]):
newList.append(b[count2])
count2 = count2 + 1 #update the counter along b
elif(a[count1] < b[count2]):
newList.append(a[count1])
count1 = count1 + 1 #update the counter along a
elif(a[count1] == b[count2]): #if they're equal, add the value from a first
newList.append(a[count1])
newList.append(b[count2])
count1, count2 = count1 + 1, count2 + 1 #update the counter on each
if(count1 < len(a)): #if the while loop exited with left-over values in a, add them
for f in range(count1, len(a)):
newList.append(a[f])
elif(count2 < len(b)): #do the same for b - the values are already in order so you're good
for k in range(count2, len(b)):
newList.append(b[k])
return newList
The algorithm correctly sorts what I give it, but I'm thinking that the runtime is more like \$O(\log{n} * \log{n} * n)\$ because the outer while
loop in mergeSortAlgorithm()
will do \$log(n)\$ passes, as will the inner while
loop, and the merge()
algorithm will take in total \$n\$ operations.
Is that right? If so, in what respect does my algorithm differentiate from MergeSort? (i.e. where am I going wrong?)
2 Answers 2
The approach may lose the stability. If list
a
containsa1, a2
, and listb
containsb1, b2
and all of them are compared equal, the resulting list will have aa1, b1, a2, b2
, whereas the stable result should bea1, a2, b1, b2
. The fix is to not special casea[count1] == b[count2]
:if b[count2] < a[count1]: merge from b else: merge from a
Your big-O worries are unfounded. At the \$k\$'th iteration there remain \$O(\frac{n}{2^k})\$ sublists, but the length of each sublist is \$O(2^k)$\,ドル so the entire body of the inner loop executes in \$O(n)\$ time regardless of the iteration; and as you noticed there are \$O(\log{n})\$ of them.
-
\$\begingroup\$ Are you saying that the body of the inner while loop executes in O(n) time, and that there are O(log n) iterations of the inner loop? Or? Because if that's the case, I'm still confused as to how the runtime is O(n log n) because the outer while loop has to be accounted for. OR - are you saying that the entire body of the outer while loop (including its inner loop) will execute in O(n) time, with O(log n) iterations of the outer while loop, producing a total runtime of O(n log n)? \$\endgroup\$dgamz– dgamz2017年06月21日 20:43:47 +00:00Commented Jun 21, 2017 at 20:43
-
1\$\begingroup\$ @dgamz No. The body of outer loop takes \$O(n)\$ time to execute, and it is executed \$O(\log{n})\$ times. The entire complexity is therefore \$O(n\log{n})\$. \$\endgroup\$vnp– vnp2017年06月21日 20:46:45 +00:00Commented Jun 21, 2017 at 20:46
Advice 1
while(len(listOfLists) != 1):
j = 0
...
should, according to Python conventions, be
while len(listOfLists) != 1:
j = 0
...
Advice 2
print(listOfLists)
Printing to I/O from an algorithm is a no-no. Remove them.
Note 1
According to this Gist your implementation is both slow and incorrect.
Explore related questions
See similar questions with these tags.