I'm designing a database from which old data has to be purged regularly for legal reasons, and I'm trying to figure out the best way to organise the purging.
Dealing with the main tables is trivial, since they all have a month column and cascading deletes are set up for all owned rows in detail tables. So I can simply iterate over a list of these "master" tables and delete rows older than a given month.
However, there are some tables for which things aren't that simple. They can be referenced from quite a few other tables but their rows have to disappear when they are no longer referenced (because of data protection laws).
I could write some code find out which rows in a certain table aren't referenced anywhere else, based on the foreign key meta data in the database schema.
However, I'd rather lean on the foreign key constraints instead and simply use the moral equivalent of DELETE FROM @TableName
. The constraints keep referenced rows from getting deleted and all unreferenced rows disappear as intended. Hence I would make simply make a second list of table names to which the unconditional DELETE
should be applied during a purge, and that's it.
That solution would certainly be ideal: you can't make it any simpler, and it is directly based on the declared database schema, i.e. the foreign key constraints.
Would that be considered acceptable practice? Are there any drawbacks*?
*) apart from the fact that the consequences of someone dropping a required foreign key constraint are not only unusually dire but also unusually delayed (until the next purge at the end of the year)
Additional considerations
I found the fly in the ointment: the constraint not only blocks the deletion, it also results in an error state and abortion of the current statement. Hence the scheme doesn't work as is. Back to the drawing board then... Any pointers welcome.
-
1Any chance you could put together a Minimal, Complete, and Verifiable Example? This would help wrap some proper context around an equivalent data model to the one you're asking about.John Eisbrener– John Eisbrener2019年06月20日 13:36:08 +00:00Commented Jun 20, 2019 at 13:36
-
@John: good idea, but I won't be able to do it before Saturday since I'm working 15-hour days on the primary project (which is one of the reasons why I am trying to offload as much work onto the DBMS as possible).DarthGizka– DarthGizka2019年06月20日 16:12:17 +00:00Commented Jun 20, 2019 at 16:12
-
2I don't think anyone on the site cares how quickly you put it together; just don't expect (m)any answers until you can provide more context.John Eisbrener– John Eisbrener2019年06月20日 16:39:24 +00:00Commented Jun 20, 2019 at 16:39
1 Answer 1
One option would be to create views for these "inconvenient" tables that only return rows where all foreign key relationships have been broken. Then you could do unconditional deletes from those views.
You would have the overhead of coding each of these views*, but that's a one-time startup cost. This is essentially the same as what you mentioned already:
I could write some code find out which rows in a certain table aren't referenced anywhere else
But in the end, you will have the "unconditional deletes" in your cleanup job.
Note: the approach below would not scale well on really large tables without indexes, and potentially some form of batching (e.g., putting a TOP 100 and ORDER BY in the view definition
Say you have two "main tables" that reference this "inconvenient table" (the one you need to remove rows from when they are no longer referenced:
CREATE TABLE dbo.InconvenientTable
(
Id int IDENTITY(1,1) NOT NULL,
[Description] varchar(10) NOT NULL,
CONSTRAINT PK_InconvenientTable
PRIMARY KEY (Id)
);
GO
CREATE TABLE dbo.MainTable1
(
Id int IDENTITY(1,1) NOT NULL,
[Description] varchar(10) NOT NULL,
InconvenientTableId int NOT NULL,
CONSTRAINT FK_MainTable1_InconvenientTable
FOREIGN KEY (InconvenientTableId)
REFERENCES dbo.InconvenientTable (Id)
);
CREATE TABLE dbo.MainTable2
(
Id int IDENTITY(1,1) NOT NULL,
[Description] varchar(10) NOT NULL,
InconvenientTableId int NOT NULL,
CONSTRAINT FK_MainTable2_InconvenientTable
FOREIGN KEY (InconvenientTableId)
REFERENCES dbo.InconvenientTable (Id)
);
The inconvenient table has 3 rows, all 3 of which are referenced in MainTable1, while only 1 is referenced in MainTable2.
INSERT INTO dbo.InconvenientTable
([Description])
VALUES
('One'),
('Two'),
('Three');
GO
INSERT INTO dbo.MainTable1
([Description], InconvenientTableId)
VALUES
('One', 1),
('Two', 2),
('Three', 3);
INSERT INTO dbo.MainTable2
([Description], InconvenientTableId)
VALUES
('Two', 2);
Now we need a view that shows all rows in dbo.InconvenientTable
that are not referenced by the two main tables:
CREATE VIEW dbo.InconvenientTable_RowsToDelete
AS
SELECT it.*
FROM dbo.InconvenientTable it
WHERE
NOT EXISTS (SELECT NULL FROM dbo.MainTable1 mt1 WHERE mt1.InconvenientTableId = it.Id)
AND NOT EXISTS (SELECT NULL FROM dbo.MainTable2 mt2 WHERE mt2.InconvenientTableId = it.Id);
GO
Currently all rows are referenced, so this query returns 0 rows:
screenshot of SSMS showing no rows returned by the view
Now let's delete the 'Two' row from both main tables:
DELETE dbo.MainTable2 WHERE InconvenientTableId = 2;
DELETE dbo.MainTable1 WHERE InconvenientTableId = 2;
And now the view returns that unreferenced row:
screenshot of SSMS showing one row
Now we can delete everything from the view, which successfully removes the 'Two' row from our inconvenient table for compliance reasons:
screenshot of SSMS showing 1 row being deleted from the view
*You could also attempt to automate the creation of the views with dynamic SQL and metadata queries, but that seems risky
Explore related questions
See similar questions with these tags.