Tables with hierarchy: create a constraint to prevent circularity through foreign keys

Question 1

Suppose we have a table that has a foreign key constraint to itself, like such:

CREATE TABLE Foo 
 (FooId BIGINT PRIMARY KEY,
 ParentFooId BIGINT,
 FOREIGN KEY([ParentFooId]) REFERENCES Foo ([FooId]) )
INSERT INTO Foo (FooId, ParentFooId) 
VALUES (1, NULL), (2, 1), (3, 2)
UPDATE Foo SET ParentFooId = 3 WHERE FooId = 1

This table will have the following records:

FooId ParentFooId
----- -----------
1 3
2 1
3 2

There are cases where this kind of design could make sense (e.g. the typical "employee-and-boss-employee" relationship), and in any case: I'm in a situation where I have this in my schema.

This kind of design unfortunately allows for circularity in data records, as shown in the example above.

My question then is:

Is it possible to write a constraint that checks this? and
Is it feasible to write a constraint that checks this? (if needed only to a certain depth)

For part (2) of this question it may be relevant to mention that I expect only hundreds or perhaps in some cases thousands of records in my table, normally not nested any deeper than about 5 to 10 levels.

Question 2

You are using the Adjacency List model, where it is difficult to enforce such a constraint.

You can examine the Nested Set model, where only true hierarchies can be represented (no circular paths). This has other drawbacks though, like slow Inserts/Updates.

Question 3

I'm accepting this answer, because it was the one that helped me understand the possibility and feasability, i.e. it answered the question for me. However, anyone landing at this question should have a look at @a1ex07's answer for a constraint that works in simple cases, and @JohnGietzen's answer for the great links to HIERARCHYID which seems to be a native MSSQL2008 implementation of the nested set model.

Question 4

I have seen 2 main ways of enforcing this:

1, the OLD way:

CREATE TABLE Foo 
 (FooId BIGINT PRIMARY KEY,
 ParentFooId BIGINT,
 FooHierarchy VARCHAR(256),
 FOREIGN KEY([ParentFooId]) REFERENCES Foo ([FooId]) )

The FooHierarchy column would contain a value like this:

"|1|27|425"

Where the numbers map to the FooId column. You would then enforce that the Hierarchy column ends with "|id" and the rest of the string matches the FooHieratchy of the PARENT.

2, the NEW way:

SQL Server 2008 has a new datatype called the HierarchyID, that does all of this for you.

It operates on the same principal as the OLD way, but it is handled efficiently by SQL Server, and is suitable for use as a REPLACEMENT for your "ParentID" column.

CREATE TABLE Foo 
 (FooId BIGINT PRIMARY KEY,
 FooHierarchy HIERARCHYID )

Question 5

Do you have a source or brief demo demonstrating that HIERARCHYID prevents the creation of hierarchy loops?

Question 6

@NickChammas HierarchyID is materialized path. One cannot materialize a cycle.

Question 7

It is kind of possible: you can invoke a scalar UDF from you CHECK constraint, and it kind of can detect cycles of any length. Unfortunately, this approach is extremely slow and unreliable: you can have false positives and false negatives.

Instead, I would use materialized path.

Another way to avoid cycles is to have a CHECK(ID> ParentID), which is probably not very feasible either.

Yet another way to avoid cycles is to add two more columns, LevelInHierarchy and ParentLevelInHierarchy, have (ParentID, ParentLevelInHierarchy) refer to (ID, LevelInHierarchy), and have a CHECK(LevelInHierarchy> ParentLevelInHierarchy).

Question 8

I believe it's possible :

create function test_foo (@id bigint) returns bit
as
begin
declare @retval bit;
with t1 as (select @id as FooId, 0 as lvl 
union all 
 select f.FooId , t1.lvl+1 from t1 
 inner join Foo f ON (f.ParentFooId = t1.FooId)
 where lvl<11) -- you said that max nested level 10, so if there is any circular 
-- dependency, we don't need to go deeper than 11 levels to detect it
 select @retval =
 CASE(COUNT(*)) 
 WHEN 0 THEN 0 -- for records that don't have children
 WHEN 1 THEN 0 -- if a record has children
 ELSE 1 -- recursion detected
 END
 from t1
 where t1.FooId = @id ;
return @retval; 
end;
GO
alter table Foo add constraint CHK_REC1 CHECK (dbo.test_foo(ParentFooId) = 0)

I might have missed something (sorry, I'm not able to test it throughly), but it seems to work.

Question 9

I agree that "it seems to work", but it may fail for multi-row updates, fail under snapshot isolation, and is very slow.

Question 10

@AlexKuznetsov: I realize that recursive query are relatively slow, and I agree that multi-row updates may be a problem (they can be disabled though).

Question 11

@a1ex07 Thx for this suggestion. I tried it, and in simple cases it seems to work fine indeed. Not sure yet if failure on multi-row updates is a problem (though probably it is). I'm unsure though what you mean by "they can be disabled"?

Question 12

In my understanding, the task implies cursor(or row) based logic. So it makes sense to disable updates that modifies more than 1 row (simple instead of update trigger that raises an error if inserted table has more than 1 row).

Question 13

If you cannot redesign the table, I 'd create a procedure that checks all constraints and adds/updates record. Then I 'll make sure nobody except this sp can insert /update this table.

Question 14

Here is another option: a trigger that allows multi-row updates and enforces no cycles. It works by traversing the ancestor chain until it finds a root element (with parent NULL), thus proving there is no cycle. It is limited to 10 generations since of course a cycle is endless.

It only works with the current set of modified rows, so as long as updates don't touch a huge number of very deep items in the table, performance shouldn't be too bad. It does have to go all the way up the chain for each element, so it will have some performance impact.

A truly "intelligent" trigger would look for cycles directly by checking to see if an item reached itself and then bailing. However, this requires checking state of all previously-found nodes during each loop and thus takes a WHILE loop and more coding than I wanted to do right now. This shouldn't be really any more expensive because the normal operation would be to not have cycles and in this case it will be faster working with only the prior generation rather than all previous nodes during each loop.

I'd love input from @AlexKuznetsov or anyone else on how this would fare in snapshot isolation. I suspect it wouldn't very well, but would like to understand it better.

CREATE TRIGGER TR_Foo_PreventCycles_IU ON Foo FOR INSERT, UPDATE
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
IF EXISTS (
 SELECT *
 FROM sys.dm_exec_session
 WHERE session_id = @@SPID
 AND transaction_isolation_level = 5
)
BEGIN;
 SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
END;
DECLARE
 @CycledFooId bigint,
 @Message varchar(8000);
WITH Cycles AS (
 SELECT
 FooId SourceFooId,
 ParentFooId AncestorFooId,
 1 Generation
 FROM Inserted
 UNION ALL
 SELECT
 C.SourceFooId,
 F.ParentFooId,
 C.Generation + 1
 FROM
 Cycles C
 INNER JOIN dbo.Foo F
 ON C.AncestorFooId = F.FooId
 WHERE
 C.Generation <= 10
)
SELECT TOP 1 @CycledFooId = SourceFooId
FROM Cycles C
GROUP BY SourceFooId
HAVING Count(*) = Count(AncestorFooId); -- Doesn't have a NULL AncestorFooId in any row
IF @@RowCount > 0 BEGIN
 SET @Message = CASE WHEN EXISTS (SELECT * FROM Deleted) THEN 'UPDATE' ELSE 'INSERT' END + ' statement violated TRIGGER ''TR_Foo_PreventCycles_IU'' on table "dbo.Foo". A Foo cannot be its own ancestor. Example value is FooId ' + QuoteName(@CycledFooId, '"') + ' with ParentFooId ' + Quotename((SELECT ParentFooId FROM Inserted WHERE FooID = @CycledFooId), '"');
 RAISERROR(@Message, 16, 1);
 ROLLBACK TRAN; 
END;

Update

I figured out how to avoid an extra join back to the Inserted table. If anyone sees a better way to do the GROUP BY to detect those that don't contain a NULL please let me know.

I also added a switch to READ COMMITTED if the current session is in SNAPSHOT ISOLATION level. This will prevent inconsistencies, though unfortunately will cause increased blocking. That is kind of unavoidable for the task at hand.

Question 15

You should use WITH (READCOMMITTEDLOCK) hint. Hugo Kornelis wrote an example: web.archive.org/web/20171103062422/http://sqlblog.com/blogs/…

Question 16

Thanks @Alex those articles were dynamite and helped me understand snapshot isolation a lot better. I've added a conditional switch to read uncommitted to my code.

Question 17

Switching to read committed is not sufficient. READCOMMITTEDLOCK hints must be used on every data access because RCSI might be enabled, and that also allows reading versions instead of the most recently-committed data. We must not validate constraints using out-of-date information.

Question 18

If your records are nested more than 1 level, a constraint isn't going to work (I'm assuming by that you mean e.g. record 1 is the parent of record 2, and record 3 is the parent of record 1). The only way to do this would be either in the parent code or with a trigger, but if you're looking at a large table and multiple levels this would be pretty intensive.

ypercubeTM ypercubeTM 99.7k13 gold badges217 silver badges306 bronze badges · Accepted Answer · 2012-03-05 18:02:14Z

6

You are using the Adjacency List model, where it is difficult to enforce such a constraint.

You can examine the Nested Set model, where only true hierarchies can be represented (no circular paths). This has other drawbacks though, like slow Inserts/Updates.

Share

Improve this answer

edited Mar 14, 2012 at 22:38

Nick Chammas's user avatar

Nick Chammas

14.8k17 gold badges77 silver badges124 bronze badges

answered Mar 5, 2012 at 18:02

ypercubeTM's user avatar

ypercubeTM ypercubeTM

99.7k13 gold badges217 silver badges306 bronze badges

1

I'm accepting this answer, because it was the one that helped me understand the possibility and feasability, i.e. it answered the question for me. However, anyone landing at this question should have a look at @a1ex07's answer for a constraint that works in simple cases, and @JohnGietzen's answer for the great links to HIERARCHYID which seems to be a native MSSQL2008 implementation of the nested set model.

Jeroen
– Jeroen

2012年03月14日 22:20:54 +00:00
Commented Mar 14, 2012 at 22:20

Add a comment |

Stack Exchange Network

Tables with hierarchy: create a constraint to prevent circularity through foreign keys

6 Answers 6

1, the OLD way:

2, the NEW way:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Tables with hierarchy: create a constraint to prevent circularity through foreign keys

6 Answers 6

1, the OLD way:

2, the NEW way:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions