I want to run a complex algorithm consisting of many subtasks in parallel on a number of processors. These subtasks depend on one another, so that this algorithm is characterized by a DAG (Directed Acyclic Graph) of subtasks. An arrow from A to B in the DAG means that subtask A must be completed before subtask B starts.
There are several problems I'd like to solve with regard to this DAG. For example, I'd like to identify bottlenecks in order to prioritize them in the scheduling process, and efficiently find pairs of subtasks that might be involved in race conditions. Some of the problems that I want to solve are NP-hard for general DAGs. But they might be tractable for the specific types of DAGs that one tends to get from algorithm design. So I want to know: What are the typical properties of dependency graphs?
The field of Complex Networks involves the study of graphs and networks arising in the real world. https://en.wikipedia.org/wiki/Complex_network
It has been discovered that many such networks have common properties, such as a heavy tailed degree distribution and strong local dependencies between edges. People have studied computer networks, biological networks, social networks, etc, but I'm not aware of (and haven't been able to find) any research on the properties of large dependency graphs.
I'd like to find out what these properties could be! If anyone can point me in the direction of some relevant research, that would be amazing. If not, where could I get examples of such dependency graphs to analyze myself? Maybe there are online datasets I could use? If any experienced software engineers have any intuitions or guesses as to what to look at here, that would be great too!
-
Can you tell us exactly what you exactly mean by "dependency graphs", and in how you separate them from arbitrary DAGs? Note also that "typical" is a very opinionated term. Currently, from what I read in this question, it could be any kind of DAG, which turns your question into "what are the typical properties of DAGs?", which is definitely too broad to be answerable on this site in a sensible manner.Doc Brown– Doc Brown12/21/2022 18:33:21Commented Dec 21, 2022 at 18:33
-
So when parallelizing an algorithm, we divide it into subtasks. Each subtask must run on a single processor, but the idea is that different subtasks can run in parallel. However, some subtasks depend on others. This can be represented as a DAG. In principle any DAG could be a dependency graph, but in practice I'm guessing that dependency graphs have some typical characteristics. For example, maybe the degree distribution has a heavy tail? I don't really know, that's the question.Zur Luria– Zur Luria12/21/2022 19:19:54Commented Dec 21, 2022 at 19:19
-
Arbitrary algorithms with arbitrary subtasks can form arbitrary DAGs, and I am sure not just in theory, but also in practice.Doc Brown– Doc Brown12/21/2022 20:21:02Commented Dec 21, 2022 at 20:21