4
$\begingroup$

I'm curious if there is existing literature or well known algorithms out there for a problem I am working on.

Given a normalized relational database with valid foreign key constraints and a list of fields across different tables, what is a good or at least consistent algorithm for determining the (inner) joins required to provide that information? Ideally using the minimal number of joins possible.

Given a graph and a set of nodes that I want. How do I find a subgraph which is connected that contains all those nodes.


My initial thoughts:

I see the table as a directed graph with tables as vertices and foreign keys as edges between them. For the purposes of the algorithm treating the graph as undirected seems like it might be useful.

Shortest path between tables seemed obvious but as soon as more than 2 tables are requested then I would be finding the shortest path that visits all nodes (traveling salesman territory ACK!). My second thought is the minimum spanning tree of the subgraph. However, how would I choose the subgraph?


Edit: After thinking about it more I think this could work. Run Kruskal's minimum spanning tree on the graph beginning from a desired node. Assign weights to the edges like so weight=1 if (a,b) is between two desired nodes. Weight = n^2, where n is the number of nodes, if (a,b) is between a desired node and a undesired. Weight = n^4 if between two undesired nodes. The weights are arbitrary. I just thought it would be better to use values that couldn't overlap.

Once all desired nodes are added to the MST, exit Kruskal's. Then from the root (starting node) walk down to each leaf. If a leaf is not desired node, then delete it. Repeat until no nodes are deleted.

Raphael
73.3k31 gold badges183 silver badges403 bronze badges
asked Jun 19, 2015 at 5:23
$\endgroup$

1 Answer 1

1
$\begingroup$

This is the minimum Steiner tree problem for general graphs. It is known to be NP-hard. Here you have the special case where every edge has weight 1, but that special case is still NP-hard.

However, there are approximation algorithms, which you could use to get somewhat close to the optimum. Alternatively, if you want the exact optimal solution and your graph is not too large, you could express your problem as an instance of integer linear programming (ILP) and try using an off-the-shelf ILP solver, though the worst-case running time will still be exponential.

answered Jun 19, 2015 at 17:53
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.