How to avoid data corruption with dual parent/child foreign keys

Question 1

Imagine the following:

Persons table: (Id, FirstName, LastName)
PersonEmails table: (Id, PersonId, Address) (to allow a person to have multiple emails)
Contacts table: (Id, PersonId, UnsubscribeAll) (special table, because only some people are in the role which allows them to subscribe)
Subscribers table: (Id, ContactId) (contact-role people subscribed to something)
SubscriberEmails table: (Id, SubscriberId, PersonEmailId) (which email address each person would like to receive the notifications at)

It's trivial to imagine how rows can be inserted into SubscriberEmails where SubscriberId points at Subscribers for one person, but PersonEmailId points at PersonEmails for another person. Is there a way to enforce this logical structure in the schema and avoid this corruption?
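To make the failure concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. Only the table and column names come from the schema above; the column types, sample rows, and Id values are assumptions made for illustration. It shows that a mismatched row satisfies every single-column foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE PersonEmails (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    Address TEXT NOT NULL);
CREATE TABLE Contacts (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    UnsubscribeAll INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Subscribers (
    Id INTEGER PRIMARY KEY,
    ContactId INTEGER NOT NULL REFERENCES Contacts(Id));
CREATE TABLE SubscriberEmails (
    Id INTEGER PRIMARY KEY,
    SubscriberId INTEGER NOT NULL REFERENCES Subscribers(Id),
    PersonEmailId INTEGER NOT NULL REFERENCES PersonEmails(Id));

-- Sample data (invented): Ann is a contact and a subscriber, Bob is neither.
INSERT INTO Persons VALUES (1, 'Ann', 'A'), (2, 'Bob', 'B');
INSERT INTO PersonEmails VALUES (10, 1, 'ann@example.com'), (20, 2, 'bob@example.com');
INSERT INTO Contacts VALUES (100, 1, 0);
INSERT INTO Subscribers VALUES (1000, 100);
""")

# Ann's subscriber row paired with Bob's email address.
# Both foreign keys point at existing rows, so the insert is accepted.
conn.execute("INSERT INTO SubscriberEmails VALUES (1, 1000, 20)")

# Walk both paths back to Persons and report rows where they disagree.
mismatches = conn.execute("""
    SELECT se.Id, c.PersonId, pe.PersonId
    FROM SubscriberEmails se
    JOIN Subscribers s ON s.Id = se.SubscriberId
    JOIN Contacts c ON c.Id = s.ContactId
    JOIN PersonEmails pe ON pe.Id = se.PersonEmailId
    WHERE c.PersonId <> pe.PersonId
""").fetchall()
print(mismatches)  # [(1, 1, 2)]
```

The final query is also roughly what a scheduled consistency check would run.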

Possible but clunky solutions:

Triggers
Job on a regular schedule to find problems
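For reference, the trigger option might look something like the sketch below. It uses SQLite syntax through Python's sqlite3 module; the trigger name, column types, and sample rows are assumptions, and other engines have their own trigger syntax. It only guards INSERT on SubscriberEmails, so a complete version would also need to cover UPDATE and any change of PersonId on the parent tables, which is part of why this feels clunky.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE PersonEmails (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    Address TEXT NOT NULL);
CREATE TABLE Contacts (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    UnsubscribeAll INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Subscribers (
    Id INTEGER PRIMARY KEY,
    ContactId INTEGER NOT NULL REFERENCES Contacts(Id));
CREATE TABLE SubscriberEmails (
    Id INTEGER PRIMARY KEY,
    SubscriberId INTEGER NOT NULL REFERENCES Subscribers(Id),
    PersonEmailId INTEGER NOT NULL REFERENCES PersonEmails(Id));

-- Hypothetical trigger: abort when the two parents resolve to different persons.
CREATE TRIGGER SubscriberEmails_SamePerson_Insert
BEFORE INSERT ON SubscriberEmails
BEGIN
    SELECT RAISE(ABORT, 'subscriber and email belong to different persons')
    WHERE (SELECT c.PersonId
           FROM Subscribers s JOIN Contacts c ON c.Id = s.ContactId
           WHERE s.Id = NEW.SubscriberId)
          IS NOT
          (SELECT pe.PersonId FROM PersonEmails pe WHERE pe.Id = NEW.PersonEmailId);
END;

-- Sample data (invented): Ann is a contact and a subscriber, Bob is neither.
INSERT INTO Persons VALUES (1, 'Ann', 'A'), (2, 'Bob', 'B');
INSERT INTO PersonEmails VALUES (10, 1, 'ann@example.com'), (20, 2, 'bob@example.com');
INSERT INTO Contacts VALUES (100, 1, 0);
INSERT INTO Subscribers VALUES (1000, 100);
""")

# Ann's subscriber with Bob's email: the trigger aborts the statement.
try:
    conn.execute("INSERT INTO SubscriberEmails VALUES (1, 1000, 20)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Ann's subscriber with Ann's email: accepted.
conn.execute("INSERT INTO SubscriberEmails VALUES (2, 1000, 10)")
count = conn.execute("SELECT COUNT(*) FROM SubscriberEmails").fetchone()[0]
print(rejected, count)
```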

Question 2

Assuming that an API exists whereby changes to the email address lists are maintained, one could simply provide some validation in the API endpoint to prevent the situation you described from occurring.
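A minimal sketch of what such endpoint validation could look like, assuming the schema from the question and Python's sqlite3 module as a stand-in for the real data layer. The function name `add_subscriber_email` and the exception `PersonMismatchError` are hypothetical, not part of any existing API. The check and the insert run inside one transaction block, so a failed check leaves nothing behind.

```python
import sqlite3


class PersonMismatchError(ValueError):
    """Raised when the subscriber and the email belong to different persons."""


def add_subscriber_email(conn, subscriber_id, person_email_id):
    """Insert a SubscriberEmails row only if both parents share one person."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            """
            SELECT c.PersonId, pe.PersonId
            FROM Subscribers s
            JOIN Contacts c ON c.Id = s.ContactId
            JOIN PersonEmails pe ON pe.Id = ?
            WHERE s.Id = ?
            """,
            (person_email_id, subscriber_id),
        ).fetchone()
        if row is None:
            raise LookupError("unknown subscriber or email")
        if row[0] != row[1]:
            raise PersonMismatchError("email belongs to a different person")
        cur = conn.execute(
            "INSERT INTO SubscriberEmails (SubscriberId, PersonEmailId) VALUES (?, ?)",
            (subscriber_id, person_email_id),
        )
        return cur.lastrowid


# Demo setup; column types and sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE PersonEmails (Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id), Address TEXT NOT NULL);
CREATE TABLE Contacts (Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    UnsubscribeAll INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Subscribers (Id INTEGER PRIMARY KEY,
    ContactId INTEGER NOT NULL REFERENCES Contacts(Id));
CREATE TABLE SubscriberEmails (Id INTEGER PRIMARY KEY,
    SubscriberId INTEGER NOT NULL REFERENCES Subscribers(Id),
    PersonEmailId INTEGER NOT NULL REFERENCES PersonEmails(Id));
INSERT INTO Persons VALUES (1, 'Ann', 'A'), (2, 'Bob', 'B');
INSERT INTO PersonEmails VALUES (10, 1, 'ann@example.com'), (20, 2, 'bob@example.com');
INSERT INTO Contacts VALUES (100, 1, 0);
INSERT INTO Subscribers VALUES (1000, 100);
""")

new_id = add_subscriber_email(conn, 1000, 10)  # Ann's subscriber, Ann's email
try:
    add_subscriber_email(conn, 1000, 20)       # Ann's subscriber, Bob's email
    blocked = False
except PersonMismatchError:
    blocked = True
count = conn.execute("SELECT COUNT(*) FROM SubscriberEmails").fetchone()[0]
print(new_id, blocked, count)
```

This only protects writes that go through the function; anything that writes to the table directly bypasses it.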

Question 3

@RobertHarvey's suggestion sounds reasonable. But I'm actually curious why you need SubscriberEmails at all. Why wouldn't Subscribers be (Id, PersonEmailId)? You can easily get to Person via the PersonEmail they subscribed with. (Or to the associated Contact.)

Question 4

@svidgen I need the Subscribers table because 1) it has other state which is by subscriber, and not by email; 2) I have other entities in my model for SMS subscriptions, which share base classes with the email system, and for SMS, a contact can only have one phone number subscribed (per the business requirements).

Question 5

Gotcha! I'd favor putting the rules in your API then, as Robert suggested. Restrict write access to the database to that API.

score 2 · Caleth · Answer 1 · 2023-03-30 16:22:18Z

Foreign key relationships don't have to be single columns; you can include "extra" columns in the relations to enforce your requirement.

Persons table with (Id, ...)
PersonEmails table with (Id, PersonId, ...), relates to Persons by PersonId
Subscribers table with (Id, PersonId, ...), relates to Persons by PersonId
SubscriberEmails table with (Id, PersonId, SubscriberId, PersonEmailId, ...), relates to Persons by PersonId, to PersonEmails by (PersonId, PersonEmailId) and to Subscribers by (PersonId, SubscriberId)
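A minimal sketch of that layout, written in SQLite syntax and run through Python's sqlite3 module. The column types, the Address column, and the sample rows are assumptions. The added `UNIQUE (PersonId, Id)` constraints are there because a composite foreign key generally has to reference a key or unique constraint on the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY);
CREATE TABLE PersonEmails (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    Address TEXT NOT NULL,
    UNIQUE (PersonId, Id));           -- target for the composite FK below
CREATE TABLE Subscribers (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    UNIQUE (PersonId, Id));           -- target for the composite FK below
CREATE TABLE SubscriberEmails (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    SubscriberId INTEGER NOT NULL,
    PersonEmailId INTEGER NOT NULL,
    -- The same PersonId value takes part in both keys,
    -- so both parents must belong to that one person.
    FOREIGN KEY (PersonId, SubscriberId) REFERENCES Subscribers (PersonId, Id),
    FOREIGN KEY (PersonId, PersonEmailId) REFERENCES PersonEmails (PersonId, Id));

-- Sample data (invented).
INSERT INTO Persons VALUES (1), (2);
INSERT INTO PersonEmails VALUES (10, 1, 'ann@example.com'), (20, 2, 'bob@example.com');
INSERT INTO Subscribers VALUES (1000, 1);
""")

# Person 1's subscriber with person 1's email: accepted.
conn.execute("INSERT INTO SubscriberEmails VALUES (1, 1, 1000, 10)")

# Person 1's subscriber with person 2's email: (1, 20) is not a PersonEmails row.
try:
    conn.execute("INSERT INTO SubscriberEmails VALUES (2, 1, 1000, 20)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
count = conn.execute("SELECT COUNT(*) FROM SubscriberEmails").fetchone()[0]
print(rejected, count)
```

The cost is one redundant PersonId column on the child table, in exchange for a guarantee the engine enforces on every write.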

answered Mar 30, 2023 at 16:22 by Caleth · edited Mar 30, 2023 at 22:09 by Mr. TA

Thank you for the suggestion. This may work for others who come across this question. Unfortunately, it won't work for me because 1) I actually have another table, Contacts, which Subscribers references, and that Contacts table has PersonId, and Subscribers.PersonId actually references the Contacts table; and 2) it requires adding a PersonId column to SubscriberEmails, which breaks the entity model and is difficult to accomplish cleanly, because I use an ORM. Still, I up-voted your answer because it can be useful to others.
– Mr. TA, Mar 30, 2023 at 16:47

@Mr.TA: The same advice still applies to your situation; it's just another column to track. However, it looks to me like you're creating a complicated data structure instead of a preliminary data validation procedure, and I'm not convinced that you're taking the path of least resistance here.
– Flater, Mar 31, 2023 at 0:45
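To illustrate the "just another column to track" point, here is a rough sketch that carries PersonId through Contacts as well. It is written in SQLite syntax and run through Python's sqlite3 module; the column types, sample rows, and the helper `try_insert` are assumptions for illustration. It does not address the ORM objection, only the extra table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY);
CREATE TABLE PersonEmails (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    Address TEXT NOT NULL,
    UNIQUE (PersonId, Id));
CREATE TABLE Contacts (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons(Id),
    UnsubscribeAll INTEGER NOT NULL DEFAULT 0,
    UNIQUE (PersonId, Id));
CREATE TABLE Subscribers (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL,
    ContactId INTEGER NOT NULL,
    UNIQUE (PersonId, Id),
    -- PersonId must match the person of the referenced contact.
    FOREIGN KEY (PersonId, ContactId) REFERENCES Contacts (PersonId, Id));
CREATE TABLE SubscriberEmails (
    Id INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL,
    SubscriberId INTEGER NOT NULL,
    PersonEmailId INTEGER NOT NULL,
    FOREIGN KEY (PersonId, SubscriberId) REFERENCES Subscribers (PersonId, Id),
    FOREIGN KEY (PersonId, PersonEmailId) REFERENCES PersonEmails (PersonId, Id));

-- Sample data (invented): contact 100 belongs to person 1.
INSERT INTO Persons VALUES (1), (2);
INSERT INTO PersonEmails VALUES (10, 1, 'ann@example.com'), (20, 2, 'bob@example.com');
INSERT INTO Contacts VALUES (100, 1, 0);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


SUB = "INSERT INTO Subscribers (Id, PersonId, ContactId) VALUES (?, ?, ?)"
SE = "INSERT INTO SubscriberEmails (PersonId, SubscriberId, PersonEmailId) VALUES (?, ?, ?)"

ok_sub = try_insert(SUB, (1000, 1, 100))    # person 1, contact of person 1
bad_sub = try_insert(SUB, (1001, 2, 100))   # person 2 claiming person 1's contact
ok_email = try_insert(SE, (1, 1000, 10))    # everything belongs to person 1
bad_email = try_insert(SE, (1, 1000, 20))   # email 20 belongs to person 2
print(ok_sub, bad_sub, ok_email, bad_email)
```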

jmoreno · Answer 2 · 2023-03-30 23:23:39Z

You could add a check constraint that validates the relationship. The only problem with doing so is that it validates only one end of the relationship, so you have to add either another check constraint or a trigger on the other end.

See https://stackoverflow.com/questions/3880698/can-a-check-constraint-relate-to-another-table for some guidance on what you want to do.

score -2 · Answer 3 · 2023-03-30 15:29:39Z

The constraints available to most relational databases do not verify things quite to this level. If you ever notice this kind of inconsistency, application code is likely to blame.

You have identified a few solutions to this problem (using triggers or a scheduled job). Other alternatives (not an exhaustive list) include:

Stored procedures
Performing checks in the application layer

You need to choose from a number of possible options. There is no single best solution here, but generally these checks are enforced in triggers, stored procedures, or application code that executes before SQL is sent across the network. The one place they won't exist is in the database schema, where I use the term "schema" very loosely to mean "database tables, columns, and constraints."

You will need to make a judgement yourself which layer of the application should handle this. These decisions are subjective by nature, so analyze other code in the application to identify pre-established patterns, or consult your team for further guidance. If you are the one making this judgement call, balance the ease of development, testing, and maintenance with the likelihood that something like this would actually happen.

answered Mar 30, 2023 at 15:29 by Greg Burghardt · edited Mar 30, 2023 at 19:31

SPs are out for me because I use an ORM, and there already exist checks in the application layer, but those failed in this situation (thankfully, it's not a huge problem). I decided to create triggers for now. I would disagree that this is an application logic problem; under no logical circumstance should the persons be different. To me, it looks like a schema restriction that database engines simply don't support.
– Mr. TA, Mar 30, 2023 at 15:45

@Mr.TA: remember what databases were designed for: storing data. Data consistency is a realm that relational databases can help with, but they don't solve all consistency problems. That's why this is an application logic issue.
– Greg Burghardt, Mar 30, 2023 at 15:58

I believe this is completely wrong. You could just as well ask how the database could ensure that a foreign key points to an existing row in another table. And yet databases do that all the time as it is a data integrity problem and not a business rule problem.
– Stack Exchange Broke The Law, Mar 30, 2023 at 16:06

@user253751: yes, but relational database systems have easily solved that kind of consistency issue using, just as you said, foreign key constraints. The kind of inconsistency the OP is talking about is not a constraint that relational databases have solved. I never said this wasn't a data integrity problem. I was saying that an RDBMS does not have a solution using constraints, and instead you need to choose from a number of other options, some of which reside outside of the database.
– Greg Burghardt, Mar 30, 2023 at 17:34

@GregBurghardt then you're going to have to justify why a "single foreign key" is something databases should solve and a "double foreign key" is something they shouldn't. Keep in mind that you did not answer that no database currently has this feature - you answered that this is a bad feature that no database should ever have.
– Stack Exchange Broke The Law, Mar 30, 2023 at 17:35
