Purging of unreferenced data via unconditional DELETE, by leaning on foreign key constraints

Question 1

I'm designing a database from which old data has to be purged regularly for legal reasons, and I'm trying to figure out the best way to organise the purging.

Dealing with the main tables is trivial, since they all have a month column and cascading deletes are set up for all owned rows in detail tables. So I can simply iterate over a list of these "master" tables and delete rows older than a given month.

However, there are some tables for which things aren't that simple. They can be referenced from quite a few other tables but their rows have to disappear when they are no longer referenced (because of data protection laws).

I could write some code find out which rows in a certain table aren't referenced anywhere else, based on the foreign key meta data in the database schema.

However, I'd rather lean on the foreign key constraints instead and simply use the moral equivalent of DELETE FROM @TableName. The constraints keep referenced rows from getting deleted and all unreferenced rows disappear as intended. Hence I would make simply make a second list of table names to which the unconditional DELETE should be applied during a purge, and that's it.

That solution would certainly be ideal: you can't make it any simpler, and it is directly based on the declared database schema, i.e. the foreign key constraints.

Would that be considered acceptable practice? Are there any drawbacks*?

*) apart from the fact that the consequences of someone dropping a required foreign key constraint are not only unusually dire but also unusually delayed (until the next purge at the end of the year)

Additional considerations

I found the fly in the ointment: the constraint not only blocks the deletion, it also results in an error state and abortion of the current statement. Hence the scheme doesn't work as is. Back to the drawing board then... Any pointers welcome.

Question 2

Any chance you could put together a Minimal, Complete, and Verifiable Example? This would help wrap some proper context around an equivalent data model to the one you're asking about.

Question 3

@John: good idea, but I won't be able to do it before Saturday since I'm working 15-hour days on the primary project (which is one of the reasons why I am trying to offload as much work onto the DBMS as possible).

Question 4

I don't think anyone on the site cares how quickly you put it together; just don't expect (m)any answers until you can provide more context.

Question 5

One option would be to create views for these "inconvenient" tables that only return rows where all foreign key relationships have been broken. Then you could do unconditional deletes from those views.

You would have the overhead of coding each of these views*, but that's a one-time startup cost. This is essentially the same as what you mentioned already:

I could write some code find out which rows in a certain table aren't referenced anywhere else

But in the end, you will have the "unconditional deletes" in your cleanup job.

Note: the approach below would not scale well on really large tables without indexes, and potentially some form of batching (e.g., putting a TOP 100 and ORDER BY in the view definition

Say you have two "main tables" that reference this "inconvenient table" (the one you need to remove rows from when they are no longer referenced:

CREATE TABLE dbo.InconvenientTable
(
 Id int IDENTITY(1,1) NOT NULL,
 [Description] varchar(10) NOT NULL,
 CONSTRAINT PK_InconvenientTable 
 PRIMARY KEY (Id)
);
GO
CREATE TABLE dbo.MainTable1
(
 Id int IDENTITY(1,1) NOT NULL,
 [Description] varchar(10) NOT NULL,
 InconvenientTableId int NOT NULL,
 CONSTRAINT FK_MainTable1_InconvenientTable 
 FOREIGN KEY (InconvenientTableId)
 REFERENCES dbo.InconvenientTable (Id)
);
CREATE TABLE dbo.MainTable2
(
 Id int IDENTITY(1,1) NOT NULL,
 [Description] varchar(10) NOT NULL,
 InconvenientTableId int NOT NULL,
 CONSTRAINT FK_MainTable2_InconvenientTable 
 FOREIGN KEY (InconvenientTableId)
 REFERENCES dbo.InconvenientTable (Id)
);

The inconvenient table has 3 rows, all 3 of which are referenced in MainTable1, while only 1 is referenced in MainTable2.

INSERT INTO dbo.InconvenientTable
 ([Description])
VALUES
 ('One'),
 ('Two'),
 ('Three');
GO
INSERT INTO dbo.MainTable1
 ([Description], InconvenientTableId)
VALUES
 ('One', 1),
 ('Two', 2),
 ('Three', 3);
INSERT INTO dbo.MainTable2
 ([Description], InconvenientTableId)
VALUES
 ('Two', 2);

Now we need a view that shows all rows in dbo.InconvenientTable that are not referenced by the two main tables:

CREATE VIEW dbo.InconvenientTable_RowsToDelete
AS
SELECT it.*
FROM dbo.InconvenientTable it
WHERE 
 NOT EXISTS (SELECT NULL FROM dbo.MainTable1 mt1 WHERE mt1.InconvenientTableId = it.Id)
 AND NOT EXISTS (SELECT NULL FROM dbo.MainTable2 mt2 WHERE mt2.InconvenientTableId = it.Id);
GO

Currently all rows are referenced, so this query returns 0 rows:

screenshot of SSMS showing no rows returned by the view

Now let's delete the 'Two' row from both main tables:

DELETE dbo.MainTable2 WHERE InconvenientTableId = 2;
DELETE dbo.MainTable1 WHERE InconvenientTableId = 2;

And now the view returns that unreferenced row:

screenshot of SSMS showing one row

Now we can delete everything from the view, which successfully removes the 'Two' row from our inconvenient table for compliance reasons:

screenshot of SSMS showing 1 row being deleted from the view

*You could also attempt to automate the creation of the views with dynamic SQL and metadata queries, but that seems risky

Josh Darnell Josh Darnell 30.2k5 gold badges70 silver badges124 bronze badges · Answer 1 · 2019-06-20 18:07:36Z

One option would be to create views for these "inconvenient" tables that only return rows where all foreign key relationships have been broken. Then you could do unconditional deletes from those views.

You would have the overhead of coding each of these views*, but that's a one-time startup cost. This is essentially the same as what you mentioned already:

I could write some code find out which rows in a certain table aren't referenced anywhere else

But in the end, you will have the "unconditional deletes" in your cleanup job.

Note: the approach below would not scale well on really large tables without indexes, and potentially some form of batching (e.g., putting a TOP 100 and ORDER BY in the view definition

Say you have two "main tables" that reference this "inconvenient table" (the one you need to remove rows from when they are no longer referenced:

CREATE TABLE dbo.InconvenientTable
(
 Id int IDENTITY(1,1) NOT NULL,
 [Description] varchar(10) NOT NULL,
 CONSTRAINT PK_InconvenientTable 
 PRIMARY KEY (Id)
);
GO
CREATE TABLE dbo.MainTable1
(
 Id int IDENTITY(1,1) NOT NULL,
 [Description] varchar(10) NOT NULL,
 InconvenientTableId int NOT NULL,
 CONSTRAINT FK_MainTable1_InconvenientTable 
 FOREIGN KEY (InconvenientTableId)
 REFERENCES dbo.InconvenientTable (Id)
);
CREATE TABLE dbo.MainTable2
(
 Id int IDENTITY(1,1) NOT NULL,
 [Description] varchar(10) NOT NULL,
 InconvenientTableId int NOT NULL,
 CONSTRAINT FK_MainTable2_InconvenientTable 
 FOREIGN KEY (InconvenientTableId)
 REFERENCES dbo.InconvenientTable (Id)
);

The inconvenient table has 3 rows, all 3 of which are referenced in MainTable1, while only 1 is referenced in MainTable2.

INSERT INTO dbo.InconvenientTable
 ([Description])
VALUES
 ('One'),
 ('Two'),
 ('Three');
GO
INSERT INTO dbo.MainTable1
 ([Description], InconvenientTableId)
VALUES
 ('One', 1),
 ('Two', 2),
 ('Three', 3);
INSERT INTO dbo.MainTable2
 ([Description], InconvenientTableId)
VALUES
 ('Two', 2);

Now we need a view that shows all rows in dbo.InconvenientTable that are not referenced by the two main tables:

CREATE VIEW dbo.InconvenientTable_RowsToDelete
AS
SELECT it.*
FROM dbo.InconvenientTable it
WHERE 
 NOT EXISTS (SELECT NULL FROM dbo.MainTable1 mt1 WHERE mt1.InconvenientTableId = it.Id)
 AND NOT EXISTS (SELECT NULL FROM dbo.MainTable2 mt2 WHERE mt2.InconvenientTableId = it.Id);
GO

Currently all rows are referenced, so this query returns 0 rows:

screenshot of SSMS showing no rows returned by the view

Now let's delete the 'Two' row from both main tables:

DELETE dbo.MainTable2 WHERE InconvenientTableId = 2;
DELETE dbo.MainTable1 WHERE InconvenientTableId = 2;

And now the view returns that unreferenced row:

screenshot of SSMS showing one row

Now we can delete everything from the view, which successfully removes the 'Two' row from our inconvenient table for compliance reasons:

screenshot of SSMS showing 1 row being deleted from the view

*You could also attempt to automate the creation of the views with dynamic SQL and metadata queries, but that seems risky

Stack Exchange Network

Purging of unreferenced data via unconditional DELETE, by leaning on foreign key constraints

Additional considerations

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Purging of unreferenced data via unconditional DELETE, by leaning on foreign key constraints

Additional considerations

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions