Main Questions:
- Is my runtime analysis correct?
- Is my approach optimal or close to it?
- Is there a better way and/or improvements?
Objective:
- Write an algorithm to assign
n
jobs tok
entities (servers, cpus, whatever) in a round robin fashion. The job that takes longest will be assign first, the shortest last. In the case of a tie, the job with lowest id will be assigned first. - return the job ids assigned to each entity, where an entity can be represented as a list
sample input:
jobs= [40, 80,40, 4, 2]
- entities = 2
- where
jobs[i]
is the length of time the task, andi
is its identifier.
sample output:
- [[1, 2, 4], [0, 3]]
I came up with two algorithms and wanted to see if my runtime analysis is correct, and if there is a better way to approach this
Implementation 1: Linear Search
Pseudocode:
- linear search for the longest job, from left to right, saving its job id (index)
- only update the index if we find a larger job in list
- once we find it assign to another list, which will contain the jobs in order
- iterate over the ordered list, assigning to each server in round robin fashion
Hypothesis for runtime:
- time proportional to
n ^ 2
, wheren
are the number of jobs
Code
def findMax(jobs):
ind_of_max = 0
### iterate over jobs
for k in range(len(jobs)):
# if current job we are processing is greater than the previous lomgest job
if jobs[k] > jobs[ind_of_max]:
ind_of_max = k
# sinonimouns to removing the job from the PQ
jobs[ind_of_max] = -1
return ind_of_max
def determineJobOrder(jobs):
# list of jobIds
order_of_jobs = []
num_of_jobs = len(jobs)
#### TOTAL RUNTIME ANALYSIS:
# - runs in time proportional to N ^ 2
####
# runs in time proportional to N
while num_of_jobs > 0:
# removing jobs from list (aka performing dequeue operations until we run out of jobs)
# runs in time proportional to N
ind_of_max = findMax(jobs)
order_of_jobs.append(ind_of_max)
num_of_jobs -= 1
return order_of_jobs
def assignJobsToServers(servers, jobs):
jobs = determineJobOrder(jobs)
# create a list of lists, where the number of inner lists = number of servers
assignment = [[] for i in range(0, servers)]
num_jobs = len(jobs)
job_ind = 0
server_ind = 0
###
### RUNTIME ANALYSIS:
# - runs in time proportional to n
#while there are jobs left to assign
while num_jobs > 0:
assignment[server_ind % servers].append(jobs[job_ind])
# update server index, job index, and number of jobs left to assign
num_jobs -= 1
job_ind += 1
server_ind += 1
return assignment
Implementation 2: Using Heap
Pseudocode:
- Iterate over all jobs, creating a Job object, Job(job_id, time)
- Since python's
heapq
class is a min heap, I useheapq.nlargest
, passing a key for comparison, the list of Jobs
Hypothesis of runtime:
- time proportional to
n log n
, where n are the number of jobs
Code:
## To enncapsulate the jobs and be able to compare by multiple attributes
class Job:
def __init__(self, job_id, time):
self.job_id = job_id
self.time = time
def __repr__(self):
return 'Job(id: {}, time: {})'.format(self.job_id, self.time)
import heapq
def assignJobsToServers(servers, jobs):
for k, exec_time in enumerate(jobs):
jobs[k] = Job(k, exec_time)
# resource used to figure out how to sort by two attributes, one in natural order one in reverse:
# https://www.techiedelight.com/sort-list-of-objects-by-multiple-attributes-python/
#### Runtime Analysis:
# - I am assuming that what this will do is build the max heap, and then remove the n items to get them in max order.
# - Inserts into heap take log (n).
# - removals take log(n)
# - if we are enqueueing n items, this will take n log n
# - if we are removing n items, this will take n log n for removals
# - hence: runtime should be proportional n log n, ignoring constants
####
# key defines that that the larger the time the larger the, and on a tie, the smallest the job id the largest the item
jobs_ordered = heapq.nlargest(len(jobs), jobs, key = lambda job: (job.time, -job.job_id))
# create a list of lists, where the number of inner lists = number of servers
assignments = [[] for i in range(0, servers)]
num_jobs = len(jobs)
server_ind = 0
job_ind = 0
###### RUNTIME ANALYSIS:
# - at this point we already did all the difficult computation. We now just iterate over all jobs which will be in order and assign
# - this takes time proportional to n, the number of jobs
while num_jobs > 0:
assignments[server_ind % servers].append(jobs_ordered[job_ind].job_id)
# update server index, and number of jobs left to assign
num_jobs -= 1
server_ind += 1
job_ind += 1
return assignments
1 Answer 1
Your runtime analysis seems fine and your heap-based approach is optimal in the
narrow sense that it achieves O(n log n)
performance. However, the software
itself is needlessly complex.
The path to simplicity is found by remembering the most famous O(n log n)
operation: sorting. You have a list of job durations that you
want to sort from largest to smallest, breaking ties via their ID/index. There
are a variety of reasonable ways to achieve that. Here's one illustration:
# Some job durations and a generator to produce (DURATION, ID) pairs.
durations = [40, 80, 40, 4, 2]
pairs = enumerate(durations)
# A function to take a pair and return a modified pair for sorting purposes.
sort_key = lambda pair: (-pair[1], pair[0])
# Sort the pairs and extract job IDs from them.
job_ids = [jid for jid, dur in sorted(pairs, key = sort_key)]
# Allocate the job IDs to N workers/entities in round-robin fashion.
n = 2
allocated_job_ids = [job_ids[i::n] for i in range(n)]
print(allocated_job_ids) # [[1, 2, 4], [0, 3]]
-
\$\begingroup\$ got it. not sure why that did not come to mind but that would definitely be simpler in this case, Thank you. Would a PQ implemented with Heap be preferred in the case our list of jobs is not static , meaning, let's say, two jobs can be added to the initial list, then 1 remove, followed by more additions and removals interleaved? it seems like in that case we couldn't achieve n log n for both removals and additions. What do you think? \$\endgroup\$MPC– MPC2021年11月16日 12:17:02 +00:00Commented Nov 16, 2021 at 12:17
-
1\$\begingroup\$ @MPC Correct. The superpower of a heap is its log-n ability to maintain sorted order in the face of inserts/removals. \$\endgroup\$FMc– FMc2021年11月16日 15:57:04 +00:00Commented Nov 16, 2021 at 15:57
Explore related questions
See similar questions with these tags.