I am currently going through and implementing basic algorithms and data structures as I feel I didn't learn enough about this during my Data Structures and Algorithms unit. So far I've done the following algorithms, yet to get to data structures.
Sorting
- Bucket
import random
#toSort = [round(random.uniform(0.10, 0.99), 2) for _ in range(20)]
toSort = [round(random.uniform(0.10, 0.99), 2) for i in range(10, 0, -1)]
print(toSort)
def bucket_sort(array):
buckets = [[] for i in range(len(array))]
sorted_buckets = []
for index in range(len(array)):
bucket_num = len(array) * array[index]
print(bucket_num)
buckets[int(bucket_num)].append(array[index])
for bucket in buckets:
insertion_sort(bucket)
for bucket in buckets:
if len(bucket) == 0:
continue
elif len(bucket) > 1:
for num in bucket:
sorted_buckets.append(num)
else:
sorted_buckets.append(bucket[0])
return sorted_buckets
def insertion_sort(array):
for unsorted_val in range(1, len(array)):
val = array[unsorted_val]
val_index = unsorted_val
while val_index > 0 and array[val_index - 1] > val:
array[val_index] = array[val_index - 1]
val_index -= 1
array[val_index] = val
toSort = bucket_sort(toSort)
- Insertion
- Merge
- Quick
- Selection
Searching
- Binary
- Linear
Merging
- ArrayMerge
I was planning to include the code for the rest of the algorithms I've implemented but I have found code reviews for those already so I do not want to double up on the questions.
However, I did want to ask this on the meta page but I didn't have enough reputation to post. If I were to go through and update my code with the suggestions in the other code reviews, would it be okay for me to post here with my implementation and ask for further review?
1 Answer 1
You have implemented an unusual bucket sort. First, the logic to compute the
bucket number makes assumptions about the values themselves and will fail on
many types of numbers (for example, positive integers). And second, if N
is the size
of the input list, you are creating N
buckets. Typically, bucket sort uses a
number of buckets that is smaller than N
. A common approach is to make an initial pass
over the values to find their min and max. Then each bucket will have a
span of (MAX - MIN) / K
, where K
is the number of buckets
(which might be set either by the caller or by the code based on N
).
For any x
value, I think its bucket index would be
min(K - 1, int((x - MIN) / SPAN))
(you should double check that).
My other comments relate to code readability and simplicity.
Use convenience variables to eliminate repeated calculations, such as
len(array)
. If you need it multiple times, create a variable and lighten the
visual weight of your code.
Organize your code into commented "paragraphs" -- one paragraph per small step in the logic of your algorithm (shown below).
If you need to iterate over values in a collection, do it directly, not
indirectly via indexes. Use for x in xs
not for i in range(len(xs)
. If the
algorithm requires both values an indexes, use enumerate()
. Only iterate
over indexes if you don't actually need the values or if the algorithm's
readability is simpler that way (for example, in your
insertion_sort()
function).
Your code to reassemble the sorted buckets it overly complicated --
specifically, the size of the buckets is not important. The work can be done
either with a list comprehension (as shown) or the equivalent use of 2 for
loops.
Consider using a naming convention that I learned from functional programming:
xs
for a collection of things and x
for one thing.
Its extendable (ys
and y
, zs
and z
, etc) and
it works quite nicely in
generic situations like this where we know nothing about the substantive
meaning of the values. This also lightens up the code weight -- enhancing
readability without any loss of understandability.
The variable naming in insertion_sort()
is backwards. You iterate over the
indexes but call each index an unsorted_val
. If it's an index, just call it
index
or, even better, i
(a convention everyone understands). Then if you
also need the value, get it with xs[i]
. Again, notice how these short
variable naming conventions can often enhance readability -- especially if the
scope is small and well defined.
Finally, it is unusual to modify an index value during an
iteration over indexes, as you do in insertion_sort()
. It forces
your reader to puzzle things over. I've seen more intuitive insertion
sort implementations. For comparison, see the this pseudo-code.
Note how the use of "swap" in that alternative implementation really
helps the reader understand what's going on. Either adjust your
code or add some guidance to your reader.
Here's an edit focused on the readability and simplicity issues only:
def bucket_sort(xs):
# Convenience variables.
N = len(xs)
# Put values into buckets.
buckets = [[] for _ in range(N)]
for x in xs:
i = int(N * x)
buckets[i].append(x)
# Sort each bucket.
#
# To keep hammering the point, `b` is a better variable
# name than `bucket` within this tiny, well-defined context.
for b in buckets:
insertion_sort(b)
# Return the sorted values.
return [
x
for b in buckets
for x in b
]
def insertion_sort(xs):
# Only stylistic edits here.
for i in range(1, len(xs)):
x = xs[i]
while i > 0 and xs[i - 1] > x:
xs[i] = xs[i - 1]
i -= 1
xs[i] = x
-
\$\begingroup\$ Hi thank you for the detailed reply, really appreciate it! I like the idea of using x in xs notation for sets of data. I also like organising the code into paragraphs as it gives me a better understanding of the algorithm itself. Would you be able to elaborate a little on how I can calculate K from N? and would you recommend having K just given to the function instead as it allows more flexibility to the caller? \$\endgroup\$Vehicular IT– Vehicular IT2020年09月10日 06:30:51 +00:00Commented Sep 10, 2020 at 6:30
-
\$\begingroup\$ @VehicularIT If you're trying to making this function truly general purpose and robust, you probably should do some internet searching to see what the computer-science experts advise regarding bucket sizing. However, if you're just wanting a practical approach and you're mostly doing this for basic algorithm study, you could take a simpler approach. For example, make
k
an optional function argument, and then set it with a very basic rule based onN
. Here's one that might not be very good, but it conveys the spirit:k = k or min(N // 3, 10)
. \$\endgroup\$FMc– FMc2020年09月10日 13:52:13 +00:00Commented Sep 10, 2020 at 13:52
Explore related questions
See similar questions with these tags.
rest of [my] question
? I didn't ask anything.) \$\endgroup\$