
Below is a Python implementation of the quicksort algorithm using parallelism. It takes about 1 second for every 10 items in the list, which is hilariously unacceptable. Why is it so slow?

from multiprocessing import Pipe, Process

def quicksort(lyst, connection=None):
    if len(lyst) > 1:
        # Partition around the last element.
        pivot = lyst.pop(len(lyst) - 1)
        wall = 0
        for i in range(len(lyst)):
            if lyst[i] <= pivot:
                lyst[wall], lyst[i] = lyst[i], lyst[wall]
                wall += 1
        # Sort each partition in its own process and collect the results.
        receiveLeft, sendLeft = Pipe()
        receiveRight, sendRight = Pipe()
        Process(target=quicksort, args=(lyst[:wall], sendLeft)).start()
        Process(target=quicksort, args=(lyst[wall:], sendRight)).start()
        lyst = receiveLeft.recv() + [pivot] + receiveRight.recv()
    if connection:
        connection.send(lyst)
        connection.close()
    return lyst

if __name__ == '__main__':
    quicksort([8, 4, 6, 9, 1, 3, 10, 2, 7, 5])

EDIT: Thanks for the responses. As it turns out, switching to threads and limiting how many of them I was spawning sped things up. However, my sequential version of the algorithm still performed better.

asked Sep 11, 2016 at 15:51
  • Please consider adding your imports so your code will be easier to review, and consider adding example usage: how the function is called on example data and its result. Commented Sep 11, 2016 at 16:37

2 Answers


The problem seems to be that you are not working with threads (as you should), but with entire processes, which are far more expensive to create. Refer to this page for details on how to spawn threads.

Also, as a side note, it makes no sense to keep spawning new threads over and over again, since spawning a thread can easily take a couple of milliseconds, which would obliterate any performance gain. Instead, let the caller pass the maximum number of threads allowed, and never spawn more than that. Furthermore, it would be reasonable to skip spawning a new thread when the range it receives to sort is too small.
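For illustration, here is a minimal sketch of that approach (my own, not code from this answer): it partitions in place, hands one half to a new thread only while a depth budget remains, and falls back to plain recursion for small ranges. The names threaded_quicksort, max_depth and cutoff are made up for this example.

    import threading

    def threaded_quicksort(lyst, lo=0, hi=None, max_depth=2, cutoff=1000):
        if hi is None:
            hi = len(lyst) - 1
        if lo >= hi:
            return
        # Partition around the last element (same scheme as the question).
        pivot = lyst[hi]
        wall = lo
        for i in range(lo, hi):
            if lyst[i] <= pivot:
                lyst[wall], lyst[i] = lyst[i], lyst[wall]
                wall += 1
        lyst[wall], lyst[hi] = lyst[hi], lyst[wall]

        if max_depth <= 0 or hi - lo < cutoff:
            # Thread budget exhausted or range too small: recurse sequentially.
            threaded_quicksort(lyst, lo, wall - 1, 0, cutoff)
            threaded_quicksort(lyst, wall + 1, hi, 0, cutoff)
        else:
            # Sort one half in a new thread, the other half in this thread.
            left = threading.Thread(
                target=threaded_quicksort,
                args=(lyst, lo, wall - 1, max_depth - 1, cutoff))
            left.start()
            threaded_quicksort(lyst, wall + 1, hi, max_depth - 1, cutoff)
            left.join()

    if __name__ == '__main__':
        import random
        data = [random.randint(0, 10**6) for _ in range(10**5)]
        threaded_quicksort(data)
        print(data == sorted(data))  # True

With max_depth=2 this spawns at most a handful of threads in total, instead of one per recursive call.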

Hope that helps.

answered Sep 11, 2016 at 17:27
  • Is this really the accepted answer? There is no real parallel execution with threads in Python. Keep your processes, but be reasonable about how many of them you create. Alex Commented May 20, 2018 at 16:49

Spawning new processes is expensive (the cost also varies greatly between operating systems). One optimization would be to spawn the processes up front and keep them ready to accept tasks; then you only pay the overhead of passing the data and results between them.
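As a rough sketch of that "spawn once, reuse" idea (my own illustration, not code from this answer), a multiprocessing.Pool can be created a single time, each worker can sort one chunk, and the parent can merge the already-sorted chunks; the names pool_sort and workers are made up here.

    import multiprocessing
    from heapq import merge

    def pool_sort(lyst, workers=4):
        # Cut the list into roughly equal chunks, one per worker.
        size = (len(lyst) + workers - 1) // workers
        chunks = [lyst[i:i + size] for i in range(0, len(lyst), size)]
        # The pool's worker processes are created once and reused for all chunks.
        with multiprocessing.Pool(processes=workers) as pool:
            sorted_chunks = pool.map(sorted, chunks)
        # k-way merge of the sorted chunks in the parent process.
        return list(merge(*sorted_chunks))

    if __name__ == '__main__':
        import random
        data = [random.random() for _ in range(10**5)]
        print(pool_sort(data) == sorted(data))  # True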

A much better alternative is to use thread-based parallelism. However, Python has some issues in this regard, as explained here: https://softwareengineering.stackexchange.com/questions/186889/why-was-python-written-with-the-gil
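To make that caveat concrete, here is a small, hypothetical timing harness (my own sketch): because only one thread executes Python bytecode at a time under the GIL, the threaded run of this CPU-bound work is not expected to beat the sequential run.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def cpu_bound(chunk):
        # Purely CPU-bound work; it holds the GIL while it runs.
        return sorted(chunk)

    if __name__ == '__main__':
        import random
        data = [random.random() for _ in range(2 * 10**6)]
        chunks = [data[i::4] for i in range(4)]

        start = time.perf_counter()
        for c in chunks:
            cpu_bound(c)
        print('sequential:', time.perf_counter() - start)

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=4) as ex:
            list(ex.map(cpu_bound, chunks))
        print('threaded:  ', time.perf_counter() - start)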

answered Sep 11, 2016 at 18:06