I'm a new programmer and I'm periodically going into the Intro To Algorithms CLRS textbook and trying to translate pseudocode into Ruby for skill practice. This is my implementation/translation of insertion sort.
I'm looking for feedback on readability as well as efficiency.
Pseudocode is found on page 17 of Intro To Algorithms Second Edition just in case you happen to have it lying around, more interested in the feedback just on the code itself.
class InsertionSort
attr_accessor :arr
def initialize(arr)
@arr = arr
end
def sort
key = 1
to_the_left = key - 1
while key < self.arr.length
if arr[key] < arr[to_the_left]
mover_key = key
left_ele_moving_right = to_the_left
until arr[mover_key] > arr[left_ele_moving_right] || left_ele_moving_right < 0
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
end
end
key += 1
to_the_left += 1
end
self.arr
end
end
```
2 Answers 2
Consistency
Sometimes you are using 4 spaces for indentation, sometimes 2. Sometimes you are using vertical whitespace (an empty line) before a method definition, sometimes you don't. Sometimes you have whitespace at the end of a line, sometimes you don't. In one case, you indent a couple of lines just for no discernible reason.
You should choose one style and stick with it. If you are editing some existing code, you should adapt your style to be the same as the existing code. If you are part of a team, you should adapt your style to match the rest of the team.
Most communities have developed standardized community style guides. In Ruby, there are multiple such style guides. They all agree on the basics (e.g. indentation is 2 spaces), but they might disagree on more specific points (single quotes or double quotes).
In general, if you use two different ways to write the exact same thing, the reader will think that you want to convey a message with that. So, you should only use two different ways of writing the same thing IFF you actually want to convey some extra information.
For example, some people always use parentheses for defining and calling purely functional side-effect free methods, and never use parentheses for defining and calling impure methods. That is a good reason to use two different styles (parentheses and no parentheses) for doing the same thing (defining methods).
Indentation
The community standard for indentation is 2 spaces. For example, your first two lines should look like this:
class InsertionSort
attr_accessor :arr
Instead of this:
class InsertionSort
attr_accessor :arr
Furthermore, you should only indent in places where there is a logically nested structure, e.g. a method definition inside a class definition, the body of the method definition, the body of a block, etc. An exception is a continuation of the previous line, this should also be indented.
Here, you are indenting for no apparent reason:
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
On the other hand, in the same place, you should indent this:
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
To make it clear that the second line is a continuation of the first.
No whitespace at the end of the line
There should be no whitespace at the end of the line. For example, this line:
if arr[key] < arr[to_the_left]
Has a space at the end. That shouldn't be there. And there are two more lines like this.
Also, there are multiple "empty" lines that actually contain only spaces. Again, those shouldn't be there.
No empty line at the end of the file
Your file should end with a single newline. There should not be an extra empty line at the end of the file.
Vertical whitespace
There should be a blank line after every "logical" break. In particular, there should be a blank line after the attr_accessor
:
attr_accessor :arr
def initialize(arr)
On the other hand, there should be no empty line directly after the if
and the until
.
Otherwise, your use of vertical whitespace looks good! Logical steps in the method are clearly separated by empty lines, the code looks "light", not all bunched together with "room to breathe". I like it, well done!
self
is the implicit receiver
In Ruby, if you don't explicitly specify the receiver of a message send, the receiver is self
. In idiomatic code, you should never explicitly specify the receiver self
, unless it is absolutely necessary.
Right now, I can only think of two cases, where it would be necessary:
- To use an attribute writer method:
self.foo = bar
. - To use an operator:
self + foo
So, this:
self.arr
should just be:
arr
Same here:
while key < self.arr.length
should be
while key < arr.length
Prefer predicate methods over relational operators
You should prefer "speaking" predicate methods such as Numeric#negative?
over relational operators such as < 0
.
So, this:
left_ele_moving_right < 0
should be
left_ele_moving_right.negative?
Code Formatting
If possible, you should set your editor or IDE to automatically format your code when you type, when you paste, and when you save, and set up your version control system to automatically format your commit when you push, as well as set up your CI system to reject code that is not correctly formatted. If not possible, you should seriously consider using a different editor or IDE, version control system, or CI system.
Here's the result of your code, when I simply paste it into my editor, without doing anything else:
class InsertionSort
attr_accessor :arr
def initialize(arr)
@arr = arr
end
def sort
key = 1
to_the_left = key - 1
while key < arr.length
if arr[key] < arr[to_the_left]
mover_key = key
left_ele_moving_right = to_the_left
until arr[mover_key] > arr[left_ele_moving_right] || left_ele_moving_right < 0
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
end
end
key += 1
to_the_left += 1
end
arr
end
end
As you can see, simply copying your code into my editor, the editor corrected every single thing I wrote above except one.
Linting
You should run some sort of linter or static analyzer on your code. Rubocop is a popular one, but there are others.
Rubocop was able to detect all of the style violations I pointed out above (plus some more), and also was able to autocorrect all of the ones I listed.
Let me repeat that: I have just spent two pages pointing out how to correct tons of stuff that you can actually correct within milliseconds at the push of a button. I have set up my editor such that it automatically runs Rubocop with auto-fix as soon as I hit "save".
In particular, running Rubocop on your code, it detects 26 offenses, of which it can automatically correct 23.
Here's what the result of the auto-fix looks like:
class InsertionSort
attr_accessor :arr
def initialize(arr)
@arr = arr
end
def sort
key = 1
to_the_left = key - 1
while key < arr.length
if arr[key] < arr[to_the_left]
mover_key = key
left_ele_moving_right = to_the_left
until arr[mover_key] > arr[left_ele_moving_right] || left_ele_moving_right.negative?
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
end
end
key += 1
to_the_left += 1
end
arr
end
end
And here are the offenses that Rubocop could not automatically correct:
Inspecting 1 file
C
Offenses:
insertion_sort.rb:1:1: C: Style/Documentation: Missing top-level class documentation comment.
class InsertionSort
^^^^^
insertion_sort.rb:8:3: C: Metrics/AbcSize: Assignment Branch Condition size for sort is too high. [<10, 21, 7> 24.29/17]
def sort ...
^^^^^^^^
insertion_sort.rb:8:3: C: Metrics/MethodLength: Method has too many lines. [17/10]
def sort ...
^^^^^^^^
1 file inspected, 3 offenses detected
Similar to Code Formatting, it is a good idea to set up your tools such that the linter is automatically run when you paste code, edit code, save code, commit code, or build your project, and that passing the linter is a criterium for your CI pipeline.
attr_reader
vs. attr_accessor
You are never writing to arr
, so it should probably be a Module#attr_reader
instead of a Module#attr_accessor
:
attr_reader :arr
Access Restrictions
arr
is not intended to be used by other objects. In fact, it shouldn't be used by other objects! It is private internal state of the sorter object. Therefore, it should not be part of the public API, it should be private
:
private attr_reader :arr
Iterators
In Ruby, you almost never use loops. You would normally use at least an low-level iterator such as Kernel#loop
, #each
, or Integer#times
. Really, you want to use higher-level iterators such as Enumerable#map
, Enumerable#select
, Enumerable#group_by
, Enumerable#flat_map
, Enumerable#inject
, etc.
In this case, we are going to use Integer#times
for the outer loop.
def sort
arr.length.times do |key|
to_the_left = key - 1
if arr[key] < arr[to_the_left]
mover_key = key
left_ele_moving_right = to_the_left
until arr[mover_key] > arr[left_ele_moving_right] || left_ele_moving_right.negative?
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
end
end
end
arr
end
Guard clauses
If you have a case where an entire method or block is wrapped in a conditional, you can replace that with a "guard clause" and reduce the level of nesting.
E.g. this:
def something
if foo
bar
baz
quux
else
42
end
end
can become this:
def something
return 42 unless foo
bar
baz
quux
end
After the above change, there is now an opportunity to do that in your code:
def sort
arr.length.times do |key|
to_the_left = key - 1
next unless arr[key] < arr[to_the_left]
mover_key = key
left_ele_moving_right = to_the_left
until arr[mover_key] > arr[left_ele_moving_right] || left_ele_moving_right.negative?
arr[left_ele_moving_right], arr[mover_key] =
arr[mover_key], arr[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
end
end
arr
end
Objects
I don't see a particular reason why insertion sort should be an object. It doesn't have any state (having the array to be sorted, as you do, as state is somewhat weird), and it only has a simple single behavior. It would make more sense for a sorting algorithm to be an object if it carries some complex state that persists between different runs of the algorithm.
In a language which supports procedures or functions, this should probably be a function.
In Ruby, I would make it maybe a singleton method of a module, or an instance method of Array
. However, I would not name it sort
because that method already exists. I would also not name it insertion_sort
because that is an implementation detail. I think stable_sort
would be a good name, because the fact that insertion sort is stable (if implemented correctly) is an important distinguishing feature that sets it apart from the default sort
in Ruby, which is not stable.
So, either something like this:
module Sort
module_function def stable_sort(arr)
# ...
end
end
or something like this:
class Array
def stable_sort
length.times do |key|
# ...
end
end
end
Actually, I am not a big fan of monkey-patching core classes. At least put the method in a separate mixin, so that it is easier to trace where the code is coming from:
module StableSortExtension
def stable_sort
length.times do |key|
# ...
end
end
end
class Array
include StableSortExtension
end
But even better would be to make it a Refinement:
module StableSort
module StableSortExtension
def stable_sort
length.times do |key|
to_the_left = key - 1
next unless self[key] < self[to_the_left]
mover_key = key
left_ele_moving_right = to_the_left
while self[mover_key] <= self[left_ele_moving_right] && left_ele_moving_right >= 0
self[left_ele_moving_right], self[mover_key] =
self[mover_key], self[left_ele_moving_right]
mover_key -= 1
left_ele_moving_right -= 1
end
end
end
end
refine Array do
include StableSortExtension
end
end
Now, the Array#stable_sort
method will only exist for code which explicitly activates the Refinement with
using StableSort
Observe:
ary = [5, 4, 6, 3, 7, 2, 8, 1, 9, 0, 10]
ary.stable_sort
# undefined method `stable_sort' for [5, 4, 6, 3, 7, 2, 8, 1, 9, 0, 10]:Array (NoMethodError)
using StableSort
ary.stable_sort
p ary
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
However, I would not do it this way. This stable_sort
method mutates its receiver, which is generally not the case for other sort
methods in Ruby, and thus very surprising to people.
So, the method should at least dup
the receiver and perform its mutations on that duplicate, then return it.
Which brings me to my next point:
Mutation
Your method mutates its argument. That is an absolute no-go. You should never-ever break someone else's toy. An argument that is handed to you, is sacred.
You can fix this by simply adding something like arr = arr.dup
at the beginning of the method (or arr = dup
for the case where you want to make it an instance method). But I believe we can improve the code more generally.
More generally, your code looks more like FORTRAN, Pascal, or C, with lots of mutation, looping, manual fiddling with indices.
Insertion sort can actually be expressed very elegantly without mutation and without looping, which will not only simplify the code, but also get rid of the mutation problem.
Something like this:
module Enumerable
protected def ins(element)
return [element] if count.zero?
head = first
return [element] + self if element <= head
[head] + drop(1).ins(element)
end
def stable_sort
return [] if count.zero?
drop(1).stable_sort.ins(first)
end
end
This version does not mutate anything, and it works with all Enumerable
s (including Enumerator
s), not just Array
s. (It does not work with infinite Enumerator
s, though.)
Frozen string literals
Immutable data structures and purely functional code are always preferred, unless mutability and side-effects are required for clarity or performance. In Ruby, strings are always mutable, but there is a magic comment you can add to your files (also available as a command-line option for the Ruby engine), which will automatically make all literal strings immutable:
# frozen_string_literal: true
It is generally preferred to add this comment to all your files.
Note: I put this at the very end, because there actually are no String
literals in your code, so it doesn't matter. Rubocop and some other Linters will complain about this regardless. It's your choice whether you want to get into the habit of always adding it, regardless of whether you use String
literals in the code or not. Personally, I always add it, just in case I add a String
literal later on.
-
\$\begingroup\$ What is the purpose of StableSortExtension? \$\endgroup\$Samuel Samuelson– Samuel Samuelson2021年01月10日 06:12:36 +00:00Commented Jan 10, 2021 at 6:12
-
\$\begingroup\$ As I wrote in my answer: "put the method in a separate mixin, so that it is easier to trace where the code is coming from". Now, if you do something like
Array.ancestors
,StableSortExtension
will show up in the inheritance tree, and if you are wondering where thisstable_sort
method that you never heard about and never noticed before comes from, you can probably guess it came from there. \$\endgroup\$Jörg W Mittag– Jörg W Mittag2021年01月10日 08:09:23 +00:00Commented Jan 10, 2021 at 8:09
Insertion sort is undoubtedly the easiest-to-understand and easiest-to-code sorting method, requiring but one line of Ruby code. It seems overkill in the extreme to define a class with instance variables and multiple instance methods. I have written a single method insertion_sort
.
If desired it could be written as a module method in a module that could be required as needed. Assuming you wished to compare sorting methods (considering that the more efficient built-in methods Array#sort, Enumerable#sort and Enumerable#sort_by would be the methods of choice for general sorting needs) you might create a module such as the following.
module SortingMethods
def self.insertion_sort(arr)
...
end
def self.bubble_sort(arr)
...
end
...
end
The method would then be invoked
sorted = SortingMethods.insertion_sort(arr)
This is analogous to the use of the built-in Math module, which contains module methods only, methods that would serve as functions (which are not invoked on receivers) in other languages.
The method insertion_sort
can be written is as follows.
def insertion_sort(arr)
arr.each_with_object([]) do |o, sorted|
sorted.insert(sorted.bsearch_index { |e| e >= o } || sorted.size, o)
end
end
insertion_sort [8, 29, 10, 20, 26, 5, 20, 2]
#=> [2, 5, 8, 10, 20, 20, 26, 29]
insertion_sort ["dog", "cat", "pig", "owl", "cow", "bat"]
#=> ["bat", "cat", "cow", "dog", "owl", "pig"]
This relies on two methods.
Array#insert
inserts a given object into self
before the element at a given index. If that index equals the size of (the array that is) self
, the element is inserted after the last element of self
.
Array#bsearch_index returns the index of the element of sorted
before which a given object is to be inserted in order to keep sorted
sorted. nil
is returned if the given object is greater than the last element of sorted
(hence the need for || sorted.size
). See also Array#bsearch for details.
Note that the argument is not mutated (modified).
The easiest way to demonstrate the calculations is to run the code after having salted it with puts
statements. I will write the code in a simpler way that effectively performs the same operations.
arr = [8, 29, 10, 20, 26, 5, 20, 2]
sorted = []
arr.each do |o|
puts "\no = #{o}, sorted = #{sorted}"
idx = sorted.bsearch_index { |e| e >= o }
puts "idx from bsearch initially = #{ idx.nil? ? 'nil' : idx }"
idx = idx || sorted.size
puts "idx after 'idx || sorted' = #{idx}"
sorted.insert(idx, o)
puts "sorted after sorted.insert(#{idx}, #{o}) = #{sorted}"
end
o = 8, sorted = []
idx from bsearch initially = nil
idx after 'idx || sorted' = 0
sorted after sorted.insert(0, 8) = [8]
o = 29, sorted = [8]
idx from bsearch initially = nil
idx after 'idx || sorted' = 1
sorted after sorted.insert(1, 29) = [8, 29]
o = 10, sorted = [8, 29]
idx from bsearch initially = 1
idx after 'idx || sorted' = 1
sorted after sorted.insert(1, 10) = [8, 10, 29]
o = 20, sorted = [8, 10, 29]
idx from bsearch initially = 2
idx after 'idx || sorted' = 2
sorted after sorted.insert(2, 20) = [8, 10, 20, 29]
o = 26, sorted = [8, 10, 20, 29]
idx from bsearch initially = 3
idx after 'idx || sorted' = 3
sorted after sorted.insert(3, 26) = [8, 10, 20, 26, 29]
o = 5, sorted = [8, 10, 20, 26, 29]
idx from bsearch initially = 0
idx after 'idx || sorted' = 0
sorted after sorted.insert(0, 5) = [5, 8, 10, 20, 26, 29]
o = 20, sorted = [5, 8, 10, 20, 26, 29]
idx from bsearch initially = 3
idx after 'idx || sorted' = 3
sorted after sorted.insert(3, 20) = [5, 8, 10, 20, 20, 26, 29]
o = 2, sorted = [5, 8, 10, 20, 20, 26, 29]
idx from bsearch initially = 0
idx after 'idx || sorted' = 0
sorted after sorted.insert(0, 2) = [2, 5, 8, 10, 20, 20, 26, 29]
-
\$\begingroup\$ It was in reference to a different answer to this review. I've deleted it to avoid any further confusion. \$\endgroup\$Samuel Samuelson– Samuel Samuelson2021年11月12日 21:16:04 +00:00Commented Nov 12, 2021 at 21:16