Database design: circular reference but dynamically created

Question 1

I am trying to design a database for this case:

Assignments have vectors, relation is 1:N
Assignments have submissions, relation is 1:N
Submissions have executions, relation is 1:N
Every execution has one vector.

Database EER diagram

Business logic

A teacher creates assignments and defines test vectors.
A student uploads a solution, which creates a record in submissions.
After a submission compiles successfully, it is executed against the defined test vectors. Each execution is one record in executions (one execution per vector).

The circular reference is therefore created only once a submission has been executed; if compilation fails, no record is created in executions. The link between vectors and executions is needed for score calculation, where the reference output from vectors is compared to the output from executions.
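To make the setup concrete, here is a minimal sketch of the four tables and the "one execution per vector, only after a successful compile" rule, using Python's built-in sqlite3. The column names (`input`, `expected_output`, `output`, `compiled`) and the helper `run_submission` are assumptions for illustration; they are not taken from the EER diagram.

```python
import sqlite3

# Hypothetical schema: column names are assumed, not taken from the diagram.
SCHEMA = """
CREATE TABLE assignments (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE vectors (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id),
    input TEXT NOT NULL,
    expected_output TEXT NOT NULL
);
CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id),
    compiled INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE executions (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id),
    vector_id INTEGER NOT NULL REFERENCES vectors(id),
    output TEXT,
    UNIQUE (submission_id, vector_id)
);
"""


def run_submission(conn, submission_id, compiled_ok, run):
    """Record the compile result; on success, insert one execution per vector."""
    conn.execute(
        "UPDATE submissions SET compiled = ? WHERE id = ?",
        (int(compiled_ok), submission_id),
    )
    if not compiled_ok:
        return 0  # failed compilation: no rows in executions
    vectors = conn.execute(
        "SELECT v.id, v.input FROM vectors v "
        "JOIN submissions s ON s.assignment_id = v.assignment_id "
        "WHERE s.id = ?",
        (submission_id,),
    ).fetchall()
    for vector_id, vector_input in vectors:
        conn.execute(
            "INSERT INTO executions (submission_id, vector_id, output) "
            "VALUES (?, ?, ?)",
            (submission_id, vector_id, run(vector_input)),
        )
    return len(vectors)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO assignments (id, title) VALUES (1, 'Echo')")
conn.executemany(
    "INSERT INTO vectors (assignment_id, input, expected_output) VALUES (1, ?, ?)",
    [("a", "a"), ("b", "b")],
)
conn.execute("INSERT INTO submissions (id, assignment_id) VALUES (1, 1)")
conn.execute("INSERT INTO submissions (id, assignment_id) VALUES (2, 1)")

# Submission 1 compiles and "runs" (here: a stand-in that echoes its input).
ok_count = run_submission(conn, 1, True, lambda s: s)
# Submission 2 fails to compile, so nothing is executed.
failed_count = run_submission(conn, 2, False, lambda s: s)

# Score: number of executions whose output matches the vector's reference output.
score = conn.execute(
    "SELECT COUNT(*) FROM executions e JOIN vectors v ON v.id = e.vector_id "
    "WHERE e.submission_id = 1 AND e.output = v.expected_output"
).fetchone()[0]
```

The score query is where the executions-to-vectors link gets used: each execution row is compared against exactly one vector row.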

So in my case the circular reference is not persistent but depends on runtime. Is this a wrong design?

Question 2

The tricky part is that relational databases don't care about the direction of the relationship or circular dependencies. In the object model we do care.

Define your "has a" relationships, and then abstract one of the classes with an interface to break the circular dependency. For example:

Assignment aggregates TestVector (i.e. has a collection of)
Submission has an Assignment
Execution has a Submission
Execution aggregates ExecutionStep (interface)
Vector is an ExecutionStep (interface)

See how Execution refers to an abstract ExecutionStep rather than a concrete Vector? That breaks the circular dependency, because now you can define other things for the execution to run without changing the object model.
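As a rough sketch of that object model in Python: `Execution` only knows about the abstract `ExecutionStep`, and `Vector` happens to implement it. The method name `passes`, the field names, and the idea of modelling a submission's program as a callable are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, List


class ExecutionStep(ABC):
    """Abstraction that Execution depends on instead of a concrete Vector."""

    @abstractmethod
    def passes(self, program: Callable[[str], str]) -> bool:
        """Return True if the program produces the expected result."""


@dataclass
class Vector(ExecutionStep):
    # Hypothetical fields: a test input and its reference output.
    input_data: str
    expected_output: str

    def passes(self, program: Callable[[str], str]) -> bool:
        return program(self.input_data) == self.expected_output


@dataclass
class Assignment:
    title: str
    vectors: List[Vector] = field(default_factory=list)


@dataclass
class Submission:
    assignment: Assignment
    program: Callable[[str], str]  # stand-in for the compiled solution


@dataclass
class Execution:
    submission: Submission
    steps: List[ExecutionStep]  # no reference to Vector here

    def passed_count(self) -> int:
        return sum(1 for step in self.steps if step.passes(self.submission.program))


assignment = Assignment("Upper-case", [Vector("a", "A"), Vector("b", "X")])
submission = Submission(assignment, program=str.upper)
execution = Execution(submission, steps=list(assignment.vectors))
passed = execution.passed_count()
```

Any other class implementing `ExecutionStep` could be handed to `Execution` without touching `Execution` itself.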

(However, looking at it now, you don't need the foreign key from execution to vector, because you can get that with a join of Execution -> Submission -> Assignment -> Vector.)
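A minimal sketch of that join, assuming a variant of the schema where executions has no vector_id column (table and column names are guesses at the diagram). Note that it returns every vector of the assignment; by itself it cannot say which single vector a given execution ran against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (id INTEGER PRIMARY KEY);
CREATE TABLE vectors (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
-- Hypothetical variant: executions has no vector_id column.
CREATE TABLE executions (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id)
);
INSERT INTO assignments (id) VALUES (1), (2);
INSERT INTO vectors (id, assignment_id) VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO submissions (id, assignment_id) VALUES (100, 1);
INSERT INTO executions (id, submission_id) VALUES (1000, 100);
""")


def vectors_for_execution(conn, execution_id):
    """Follow Execution -> Submission -> Assignment -> Vector in one query."""
    rows = conn.execute(
        """
        SELECT v.id
        FROM executions e
        JOIN submissions s ON s.id = e.submission_id
        JOIN assignments a ON a.id = s.assignment_id
        JOIN vectors v ON v.assignment_id = a.id
        WHERE e.id = ?
        ORDER BY v.id
        """,
        (execution_id,),
    ).fetchall()
    return [row[0] for row in rows]


vector_ids = vectors_for_execution(conn, 1000)
```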

...

When you write the classes to model these tables, then you have a circular code dependency, which is bad. So you can break the code dependency with an interface, but that's a separate issue from the database design.

In the database design, the reason Execution -> Vector is problematic is that it duplicates Execution -> Submission -> Assignment -> Vector. So you'd have to make sure those stay in sync.

Unless:

you're duplicating the data as a performance optimization & know the risks
you want Execution to reflect the vectors involved when it ran

That is, let's say an Assignment adds a new Vector. Any existing Executions won't see that new vector, because they already point to their list of vectors. But that may be a good thing, because it's saying "When this Execution ran, it used these Vectors, even though an Assignment may have added or removed Vectors since then".

I was thinking these artifacts wouldn't change; that they are all immutable. But that may be a bad assumption. So I could be wrong -- the reference from Execution to Vector may not be redundant. You'd know the answer to that better than me.
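If you do keep the Execution -> Vector reference, the "stay in sync" concern can at least be checked with a query. This is only a sketch under assumed table and column names: it lists executions whose vector belongs to a different assignment than their submission does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (id INTEGER PRIMARY KEY);
CREATE TABLE vectors (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE executions (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id),
    vector_id INTEGER NOT NULL REFERENCES vectors(id)
);
INSERT INTO assignments (id) VALUES (1), (2);
INSERT INTO vectors (id, assignment_id) VALUES (10, 1), (20, 2);
INSERT INTO submissions (id, assignment_id) VALUES (100, 1);
-- Execution 1000 is consistent; 1001 points at a vector of another assignment.
INSERT INTO executions (id, submission_id, vector_id)
VALUES (1000, 100, 10), (1001, 100, 20);
""")


def out_of_sync_executions(conn):
    """Return ids of executions whose two paths to an assignment disagree."""
    rows = conn.execute(
        """
        SELECT e.id
        FROM executions e
        JOIN submissions s ON s.id = e.submission_id
        JOIN vectors v ON v.id = e.vector_id
        WHERE v.assignment_id <> s.assignment_id
        ORDER BY e.id
        """
    ).fetchall()
    return [row[0] for row in rows]


bad_ids = out_of_sync_executions(conn)
```

An empty result means the direct reference and the path through Submission agree for every execution.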

Question 3

Thanks for the answer. Yes, a join is possible, but is it good practice to join 3 tables? By ExecutionStep, do you mean an m:n table like (vector_id, execution_id)? Thanks.

Question 4

execution -> submissions -> assignments are all just index lookups, so those are fast. The only outer join is from assignments -> vectors. In other words, the only join that "blows up" the result set is the last one. The other two are just lookups. In a big production system you might have joins across dozens of tables. But if you need performance, you might not want to have any joins at all. In this case, since the execution is doing offline processing, it looks OK.

Question 5

You could break it up into two if your object-relational mapping made it messy. Go Execution -> Submission -> Assignment to resolve the original assignment, and then Assignment -> Vector to get the list of vectors. But I'd probably just do it in one shot unless it got too messy.
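The two-step version might look roughly like this; the helper names `assignment_for_execution` and `vectors_for_assignment` and the table layout are assumptions for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (id INTEGER PRIMARY KEY);
CREATE TABLE vectors (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE executions (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id)
);
INSERT INTO assignments (id) VALUES (1), (2);
INSERT INTO vectors (id, assignment_id) VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO submissions (id, assignment_id) VALUES (100, 1);
INSERT INTO executions (id, submission_id) VALUES (1000, 100);
""")


def assignment_for_execution(conn, execution_id):
    """Step 1: Execution -> Submission -> Assignment (primary-key lookups)."""
    row = conn.execute(
        "SELECT s.assignment_id FROM executions e "
        "JOIN submissions s ON s.id = e.submission_id WHERE e.id = ?",
        (execution_id,),
    ).fetchone()
    return row[0] if row else None


def vectors_for_assignment(conn, assignment_id):
    """Step 2: Assignment -> Vector (the only step that returns many rows)."""
    rows = conn.execute(
        "SELECT id FROM vectors WHERE assignment_id = ? ORDER BY id",
        (assignment_id,),
    ).fetchall()
    return [row[0] for row in rows]


assignment_id = assignment_for_execution(conn, 1000)
vector_ids = vectors_for_assignment(conn, assignment_id)
```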

Question 6

Sorry -- I missed the second part of your question. I amended the answer to pick that up. :)

Question 7

It should work as you have written: "...it's saying "When this Execution ran, it used these Vectors, even though an Assignment may have added or removed Vectors since then"." The assumption is that a submission was executed with this set of vectors. When something changes in the vectors of the selected assignment, it creates a new submission ... but one execution has only one vector.

sea-rob · Accepted Answer · 2014-03-22 14:58:03Z

