Sorting algorithms and command-line interface

Question 1

I have implemented both some sorting algorithms (namely bubble sort and merge sort), as well as a command-line interface, in Clojure for fun and to test my Clojure abilities (I've tampered with Clojure for a while now, but I'm still quite a beginner).

The code looks alright, but I think the styling of some parts of the code could use improvement (mainly the merge sort implementation, which looks awfully verbose). As well as that, I could use some other improvements in general, such as speed and performance, especially with larger lists.

core.clj:

(ns experiments.core
 (:gen-class)
 (:require [experiments.command-line :as cl]))
(defn bubble-sort [array]
 (loop [ind 0
 sort-arr array
 swaps? 0]
 (if (= ind (dec (count array)))
 (if (not= swaps? 0)
 (bubble-sort sort-arr))
 (if (> (nth sort-arr ind) (nth sort-arr (inc ind)))
 (let
 [temp-arr
 (vec
 (concat
 (subvec sort-arr 0 ind)
 [(nth sort-arr (inc ind))]
 [(nth sort-arr ind)]
 (subvec sort-arr (+ ind 2))))]
 (do
 (println temp-arr)
 (recur (inc ind) temp-arr (inc swaps?))))
 (recur (inc ind) sort-arr swaps?)))))
(defn merge-sort [array]
 (defn split-array [arr]
 (loop [orig arr final []]
 (if (< (count orig) 2)
 (if (= (count orig) 1)
 (conj final orig)
 final)
 (recur
 (subvec orig 2)
 (conj final (subvec orig 0 2))))))
 (defn sort-two [[a b]]
 (loop [arr-a a
 arr-b b
 final []]
 (if
 (or
 (empty? arr-a)
 (empty? arr-b))
 (vec
 (concat
 final
 (if (empty? arr-a) arr-b arr-a)))
 (if (> (first arr-a) (first arr-b))
 (recur
 arr-a
 (vec (rest arr-b))
 (conj final (first arr-b)))
 (recur
 (vec (rest arr-a))
 arr-b
 (conj final (first arr-a)))))))
 (loop
 [sort-arr
 (split-array
 (vec
 (for [a (range (count array))] [(nth array a)])))]
 (if (= (count sort-arr) 1)
 (println (sort-two sort-arr))
 (recur
 (split-array
 (loop [ind 0
 temp-arr sort-arr]
 (println temp-arr)
 (if (= ind (count temp-arr)) temp-arr
 (recur
 (inc ind)
 (vec
 (concat
 (subvec temp-arr 0 ind)
 [(if (= (count (nth temp-arr ind)) 1)
 (nth (nth temp-arr ind) 0)
 (sort-two (nth temp-arr ind)))]
 (subvec temp-arr (inc ind))))))))))))
(defn gen-random-array [length]
 (loop [arr []]
 (if (= (count arr) length) arr
 (let [rand-val (rand-int length)]
 (if (some #{rand-val} arr)
 (recur arr)
 (recur (conj arr rand-val)))))))
(defn init-args []
 (cl/add-arg "-b" "bubble"
 ["Print the procedure of bubble-sorting,"
 "given an arbitrary amount of arguments."]
 (fn
 ([]
 (bubble-sort
 (gen-random-array 10)))
 ([args]
 (bubble-sort
 (vec
 (for [a (range (count args))]
 (read-string (nth args a))))))))
 (cl/add-arg "-m" "merge"
 ["Print the procedure of merge-sorting,"
 "given an arbitrary amount of arguments."]
 (fn
 ([]
 (merge-sort
 (gen-random-array 10)))
 ([args]
 (merge-sort
 (vec
 (for [a (range (count args))]
 (read-string (nth args a))))))))
 (cl/add-arg "-r" "random"
 ["Generate a random array."]
 (fn
 ([args]
 (println
 (gen-random-array
 (read-string (nth args 0))))))))
(defn -main [& args]
 (init-args)
 (cl/parse-arg (vec args)))

command-line.clj:

(ns experiments.command-line
 (:gen-class)
 (:require [clojure.string :as string]))
(def names (atom [["-h" "help"]]))
(def docs (atom [[""]]))
(def funcs (atom [(fn [] ())]))
(defn add-arg [s-name l-name arg-doc func]
 (swap! names conj [s-name l-name])
 (swap! docs conj arg-doc)
 (swap! funcs conj func))
(defn disp-help []
 (println "\nCommands: ")
 (loop [a 1]
 (if (= a (count @names))
 (print "")
 (do
 (println " "
 (nth (nth @names a) 0) "\b,"
 (nth (nth @names a) 1) "\b:\n "
 (string/join "\n "
 (nth @docs a)))
 (println "")
 (recur (+ a 1))))))
(defn parse-arg [args]
 (let [main-arg (nth args 0)
 other-args (subvec args 1)]
 (loop [a 0]
 (if (= a (count @names))
 (println "No match!")
 (if
 (or
 (= main-arg (nth (nth @names a) 0))
 (= main-arg (nth (nth @names a) 1)))
 (if (= a 0)
 (disp-help)
 ((nth @funcs a) other-args))
 (recur (+ a 1)))))))

Question 2

I'm impressed with your implementation assuming that it works. I recently attempted a merge sort, and couldn't figure out how to do it without using clojure.walk. Maybe I'll take a closer look at yours. One suggestion I would make is a basic Clojure idiom to not use defn inside of another defn. Instead, use (let [split-array (fn [x] ...)]...) or (letfn [(split-array [x] ...)]...)

Question 3

Let's deal with bubble-sort.

The if form

 (if (not= swaps? 0)
 (bubble-sort sort-arr))

... has no expression for what happens if the condition fails, hence then returns nil. The above needs to be ...

 (if (not= swaps? 0)
 (bubble-sort sort-arr)
 sort-arr)

This works.

Next, every time you do a swap, you reconstruct an entire vector using concat. It is faster and neater to use assoc to exchange the two neighbouring elements:

(defn bubble-sort [array]
 (loop [ind 0
 sort-arr array
 swaps? 0]
 (if (= ind (dec (count array)))
 (if (zero? swaps?)
 array
 (bubble-sort sort-arr))
 (if (> (nth sort-arr ind) (nth sort-arr (inc ind)))
 (let [temp-arr (assoc sort-arr
 ind (sort-arr (inc ind))
 (inc ind) (sort-arr ind))]
 (recur (inc ind) temp-arr (inc swaps?)))
 (recur (inc ind) sort-arr swaps?)))))

I've also used zero? to tidy up the if we looked at before.

Another couple of improvements:

The bubble-sort function performs a recursive call for each pass through the vector. This limits it to sequences about 10K long, depending on how the JVM is configured. Otherwise you get stack overflow.

The solution is to replace the recursive call (bubble-sort sort-arr) with the equivalent (recur 0 sort-arr 0). We also need to replace the reference to array with one to sort-arr, since array is no longer renewed on every pass. So we replace

(if (zero? swaps?)
 array
 (bubble-sort sort-arr))

with ...

(if (zero? swaps?)
 sort-arr
 (recur 0 sort-arr 0))

This works. Try ...

(-> (range 10000 0 -1) vec bubble-sort)

before and after.

A fundamental mismatch is that bubble sort works by update-in-place, whereas clojure's core data structures are persistent - you can't change them. What you get back from assoc and the like is a modified version of the original which shares most of its structure. This carries a substantial cost in performance.

Since you are not using the persistence of the vector, you can use transients. This reduces the cost of updates considerably - YMMV. To do so,

Use transient to create the transient version.
Replace assoc with assoc!
Use persistent! to recover a persistent version.

We get ...

(defn bubble-sort [array]
 (loop [ind 0
 sort-arr (transient array)
 swaps? 0]
 (if (= ind (dec (count array)))
 (if (zero? swaps?)
 (persistent! sort-arr)
 (recur 0 sort-arr 0))
 (if (> (nth sort-arr ind) (nth sort-arr (inc ind)))
 (let [temp-arr (assoc! sort-arr
 ind (sort-arr (inc ind))
 (inc ind) (sort-arr ind))]
 (recur (inc ind) temp-arr (inc swaps?)))
 (recur (inc ind) sort-arr swaps?)))))

This seems to run a bit quicker, though it's still horribly slow compared to the core sort.

Question 4

Thanks! I would love to see more of the review. I know this might sound annoying, but... can I have more stuff please?

Question 5

Thanks for the full bubble-sort review! I thought I couldn't turn the function into a recursive one using recur. I changed swaps? to a boolean as well, so it's easier to follow. I'm not sure if it's faster overall, though.

Question 6

@Qwerp-Derp recur takes the place of a recursive call, but is implemented by a loop - faster and uses no call stack. You can only use it when the returned value is that of the recursive call itself. This is called tail position. Lucklly, that prevails here.

Question 7

I've revisited this review, and I kinda want reviews for the other sorts. If you have time, could you do it?

Thumbnail Thumbnail 1,6349 silver badges16 bronze badges · Answer 1 · 2017-02-16 12:27:20Z

Let's deal with bubble-sort.

The if form

 (if (not= swaps? 0)
 (bubble-sort sort-arr))

... has no expression for what happens if the condition fails, hence then returns nil. The above needs to be ...

 (if (not= swaps? 0)
 (bubble-sort sort-arr)
 sort-arr)

This works.

Next, every time you do a swap, you reconstruct an entire vector using concat. It is faster and neater to use assoc to exchange the two neighbouring elements:

(defn bubble-sort [array]
 (loop [ind 0
 sort-arr array
 swaps? 0]
 (if (= ind (dec (count array)))
 (if (zero? swaps?)
 array
 (bubble-sort sort-arr))
 (if (> (nth sort-arr ind) (nth sort-arr (inc ind)))
 (let [temp-arr (assoc sort-arr
 ind (sort-arr (inc ind))
 (inc ind) (sort-arr ind))]
 (recur (inc ind) temp-arr (inc swaps?)))
 (recur (inc ind) sort-arr swaps?)))))

I've also used zero? to tidy up the if we looked at before.

Another couple of improvements:

The bubble-sort function performs a recursive call for each pass through the vector. This limits it to sequences about 10K long, depending on how the JVM is configured. Otherwise you get stack overflow.

The solution is to replace the recursive call (bubble-sort sort-arr) with the equivalent (recur 0 sort-arr 0). We also need to replace the reference to array with one to sort-arr, since array is no longer renewed on every pass. So we replace

(if (zero? swaps?)
 array
 (bubble-sort sort-arr))

with ...

(if (zero? swaps?)
 sort-arr
 (recur 0 sort-arr 0))

This works. Try ...

(-> (range 10000 0 -1) vec bubble-sort)

before and after.

A fundamental mismatch is that bubble sort works by update-in-place, whereas clojure's core data structures are persistent - you can't change them. What you get back from assoc and the like is a modified version of the original which shares most of its structure. This carries a substantial cost in performance.

Since you are not using the persistence of the vector, you can use transients. This reduces the cost of updates considerably - YMMV. To do so,

Use transient to create the transient version.
Replace assoc with assoc!
Use persistent! to recover a persistent version.

We get ...

(defn bubble-sort [array]
 (loop [ind 0
 sort-arr (transient array)
 swaps? 0]
 (if (= ind (dec (count array)))
 (if (zero? swaps?)
 (persistent! sort-arr)
 (recur 0 sort-arr 0))
 (if (> (nth sort-arr ind) (nth sort-arr (inc ind)))
 (let [temp-arr (assoc! sort-arr
 ind (sort-arr (inc ind))
 (inc ind) (sort-arr ind))]
 (recur (inc ind) temp-arr (inc swaps?)))
 (recur (inc ind) sort-arr swaps?)))))

This seems to run a bit quicker, though it's still horribly slow compared to the core sort.

Thanks! I would love to see more of the review. I know this might sound annoying, but... can I have more stuff please?
Thanks for the full bubble-sort review! I thought I couldn't turn the function into a recursive one using recur. I changed swaps? to a boolean as well, so it's easier to follow. I'm not sure if it's faster overall, though.
@Qwerp-Derp recur takes the place of a recursive call, but is implemented by a loop - faster and uses no call stack. You can only use it when the returned value is that of the recursive call itself. This is called tail position. Lucklly, that prevails here.
I've revisited this review, and I kinda want reviews for the other sorts. If you have time, could you do it?

Stack Exchange Network

Sorting algorithms and command-line interface

core.clj:

command-line.clj:

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Sorting algorithms and command-line interface

core.clj:

command-line.clj:

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions