I have a situation where I want to run a complex algorithm consisting of many subtasks in parallel on a number of processors. These subtasks depend on one another, so that this algorithm is characterized by a DAG1 of subtasks. An arrow from A to B in the DAG means that subtask A must be completed before subtask B starts. The scheduler assigns each subtask to a processor. Assuming that we know the runtime of each subtask in advance, we want the total runtime (the "makespan") to be minimized.
It is known (see for example Cormen et al., Introduction to Algorithms) that this problem is NP-hard, and that greedy list scheduling is a 2-approximation. But in practical applications, improving the runtime by 10% may be a big deal. My questions are:
Are there more sophisticated alternatives to greedy scheduling that take into account the structure of the DAG and/or the runtimes of the subtasks?
Are there known methods of adding dependencies to the DAG in order to improve the runtime of a greedy scheduler?
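For concreteness, here is a minimal sketch of the kind of greedy (list) scheduler I am comparing against. The data layout (dicts of durations and predecessor sets) and the task names are just for illustration; the point is that whenever a processor becomes idle, the scheduler starts any ready subtask without looking ahead at the DAG structure.

```python
import heapq

def greedy_schedule(durations, preds, num_procs):
    """Greedy list scheduling: whenever a processor is idle, start any
    subtask whose predecessors have all finished. Returns the makespan."""
    # Number of unfinished predecessors per task, and the reverse edges.
    remaining = {t: len(preds.get(t, ())) for t in durations}
    succs = {t: [] for t in durations}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)

    ready = [t for t, r in remaining.items() if r == 0]  # runnable now
    running = []          # min-heap of (finish_time, task)
    idle = num_procs
    now = 0

    while ready or running:
        # Start as many ready tasks as there are idle processors.
        while ready and idle > 0:
            t = ready.pop()  # arbitrary choice; a priority rule would go here
            heapq.heappush(running, (now + durations[t], t))
            idle -= 1
        # Advance time to the next completion and release its successors.
        now, done = heapq.heappop(running)
        idle += 1
        for s in succs[done]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return now

# Example: diamond DAG A -> {B, C} -> D on 2 processors; makespan is 7.
durations = {"A": 2, "B": 3, "C": 1, "D": 2}
preds = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(greedy_schedule(durations, preds, 2))
```

My first question is essentially about what could replace the arbitrary `ready.pop()` choice, and the second about whether adding extra edges to `preds` can ever help such a scheduler.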
1: Directed acyclic graph
- Given that you are running the same algorithm multiple times, simply compute the fastest order and save it. – Ewan, Nov 10, 2022 at 22:51
- How many is many? Realistic applications of some NP problems sometimes don't have large instance sizes, and can be brute forced. Larger instance sizes can also be optimized probabilistically, similar to a chess engine that doesn't evaluate all possible actions but samples a representative set. In any case, there's a large body of literature on the optimal job scheduling problem. – amon, Nov 11, 2022 at 1:02
- Surely an interesting topic. However, we expect askers here to do some research on their own before they ask, and to tell us what they found. Did you look into Wikipedia? Did you google for "scheduling tasks parallel dag"? Why didn't it help? – Doc Brown, Nov 11, 2022 at 11:32