Suppose we have a table that has a foreign key constraint to itself, like such:
CREATE TABLE Foo
(FooId BIGINT PRIMARY KEY,
ParentFooId BIGINT,
FOREIGN KEY([ParentFooId]) REFERENCES Foo ([FooId]) )
INSERT INTO Foo (FooId, ParentFooId)
VALUES (1, NULL), (2, 1), (3, 2)
UPDATE Foo SET ParentFooId = 3 WHERE FooId = 1
This table will have the following records:
FooId ParentFooId
----- -----------
1 3
2 1
3 2
There are cases where this kind of design could make sense (e.g. the typical "employee-and-boss-employee" relationship), and in any case: I'm in a situation where I have this in my schema.
This kind of design unfortunately allows for circularity in data records, as shown in the example above.
My question then is:
- Is it possible to write a constraint that checks this? and
- Is it feasible to write a constraint that checks this? (if needed only to a certain depth)
For part (2) of this question it may be relevant to mention that I expect only hundreds or perhaps in some cases thousands of records in my table, normally not nested any deeper than about 5 to 10 levels.
6 Answers 6
You are using the Adjacency List model, where it is difficult to enforce such a constraint.
You can examine the Nested Set model, where only true hierarchies can be represented (no circular paths). This has other drawbacks though, like slow Inserts/Updates.
-
I'm accepting this answer, because it was the one that helped me understand the possibility and feasability, i.e. it answered the question for me. However, anyone landing at this question should have a look at @a1ex07's answer for a constraint that works in simple cases, and @JohnGietzen's answer for the great links to
HIERARCHYID
which seems to be a native MSSQL2008 implementation of the nested set model.Jeroen– Jeroen2012年03月14日 22:20:54 +00:00Commented Mar 14, 2012 at 22:20
I have seen 2 main ways of enforcing this:
1, the OLD way:
CREATE TABLE Foo
(FooId BIGINT PRIMARY KEY,
ParentFooId BIGINT,
FooHierarchy VARCHAR(256),
FOREIGN KEY([ParentFooId]) REFERENCES Foo ([FooId]) )
The FooHierarchy column would contain a value like this:
"|1|27|425"
Where the numbers map to the FooId column. You would then enforce that the Hierarchy column ends with "|id" and the rest of the string matches the FooHieratchy of the PARENT.
2, the NEW way:
SQL Server 2008 has a new datatype called the HierarchyID, that does all of this for you.
It operates on the same principal as the OLD way, but it is handled efficiently by SQL Server, and is suitable for use as a REPLACEMENT for your "ParentID" column.
CREATE TABLE Foo
(FooId BIGINT PRIMARY KEY,
FooHierarchy HIERARCHYID )
-
1Do you have a source or brief demo demonstrating that
HIERARCHYID
prevents the creation of hierarchy loops?Nick Chammas– Nick Chammas2012年03月14日 22:58:27 +00:00Commented Mar 14, 2012 at 22:58 -
1@NickChammas HierarchyID is materialized path. One cannot materialize a cycle.2022年02月06日 14:05:42 +00:00Commented Feb 6, 2022 at 14:05
It is kind of possible: you can invoke a scalar UDF from you CHECK constraint, and it kind of can detect cycles of any length. Unfortunately, this approach is extremely slow and unreliable: you can have false positives and false negatives.
Instead, I would use materialized path.
Another way to avoid cycles is to have a CHECK(ID> ParentID), which is probably not very feasible either.
Yet another way to avoid cycles is to add two more columns, LevelInHierarchy and ParentLevelInHierarchy, have (ParentID, ParentLevelInHierarchy) refer to (ID, LevelInHierarchy), and have a CHECK(LevelInHierarchy> ParentLevelInHierarchy).
I believe it's possible :
create function test_foo (@id bigint) returns bit
as
begin
declare @retval bit;
with t1 as (select @id as FooId, 0 as lvl
union all
select f.FooId , t1.lvl+1 from t1
inner join Foo f ON (f.ParentFooId = t1.FooId)
where lvl<11) -- you said that max nested level 10, so if there is any circular
-- dependency, we don't need to go deeper than 11 levels to detect it
select @retval =
CASE(COUNT(*))
WHEN 0 THEN 0 -- for records that don't have children
WHEN 1 THEN 0 -- if a record has children
ELSE 1 -- recursion detected
END
from t1
where t1.FooId = @id ;
return @retval;
end;
GO
alter table Foo add constraint CHK_REC1 CHECK (dbo.test_foo(ParentFooId) = 0)
I might have missed something (sorry, I'm not able to test it throughly), but it seems to work.
-
2I agree that "it seems to work", but it may fail for multi-row updates, fail under snapshot isolation, and is very slow.A-K– A-K2012年03月05日 18:02:02 +00:00Commented Mar 5, 2012 at 18:02
-
@AlexKuznetsov: I realize that recursive query are relatively slow, and I agree that multi-row updates may be a problem (they can be disabled though).a1ex07– a1ex072012年03月05日 18:10:51 +00:00Commented Mar 5, 2012 at 18:10
-
@a1ex07 Thx for this suggestion. I tried it, and in simple cases it seems to work fine indeed. Not sure yet if failure on multi-row updates is a problem (though probably it is). I'm unsure though what you mean by "they can be disabled"?Jeroen– Jeroen2012年03月06日 11:29:06 +00:00Commented Mar 6, 2012 at 11:29
-
In my understanding, the task implies cursor(or row) based logic. So it makes sense to disable updates that modifies more than 1 row (simple instead of update trigger that raises an error if inserted table has more than 1 row).a1ex07– a1ex072012年03月06日 16:10:13 +00:00Commented Mar 6, 2012 at 16:10
-
If you cannot redesign the table, I 'd create a procedure that checks all constraints and adds/updates record. Then I 'll make sure nobody except this sp can insert /update this table.a1ex07– a1ex072012年03月06日 16:16:28 +00:00Commented Mar 6, 2012 at 16:16
Here is another option: a trigger that allows multi-row updates and enforces no cycles. It works by traversing the ancestor chain until it finds a root element (with parent NULL), thus proving there is no cycle. It is limited to 10 generations since of course a cycle is endless.
It only works with the current set of modified rows, so as long as updates don't touch a huge number of very deep items in the table, performance shouldn't be too bad. It does have to go all the way up the chain for each element, so it will have some performance impact.
A truly "intelligent" trigger would look for cycles directly by checking to see if an item reached itself and then bailing. However, this requires checking state of all previously-found nodes during each loop and thus takes a WHILE loop and more coding than I wanted to do right now. This shouldn't be really any more expensive because the normal operation would be to not have cycles and in this case it will be faster working with only the prior generation rather than all previous nodes during each loop.
I'd love input from @AlexKuznetsov or anyone else on how this would fare in snapshot isolation. I suspect it wouldn't very well, but would like to understand it better.
CREATE TRIGGER TR_Foo_PreventCycles_IU ON Foo FOR INSERT, UPDATE
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
IF EXISTS (
SELECT *
FROM sys.dm_exec_session
WHERE session_id = @@SPID
AND transaction_isolation_level = 5
)
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
END;
DECLARE
@CycledFooId bigint,
@Message varchar(8000);
WITH Cycles AS (
SELECT
FooId SourceFooId,
ParentFooId AncestorFooId,
1 Generation
FROM Inserted
UNION ALL
SELECT
C.SourceFooId,
F.ParentFooId,
C.Generation + 1
FROM
Cycles C
INNER JOIN dbo.Foo F
ON C.AncestorFooId = F.FooId
WHERE
C.Generation <= 10
)
SELECT TOP 1 @CycledFooId = SourceFooId
FROM Cycles C
GROUP BY SourceFooId
HAVING Count(*) = Count(AncestorFooId); -- Doesn't have a NULL AncestorFooId in any row
IF @@RowCount > 0 BEGIN
SET @Message = CASE WHEN EXISTS (SELECT * FROM Deleted) THEN 'UPDATE' ELSE 'INSERT' END + ' statement violated TRIGGER ''TR_Foo_PreventCycles_IU'' on table "dbo.Foo". A Foo cannot be its own ancestor. Example value is FooId ' + QuoteName(@CycledFooId, '"') + ' with ParentFooId ' + Quotename((SELECT ParentFooId FROM Inserted WHERE FooID = @CycledFooId), '"');
RAISERROR(@Message, 16, 1);
ROLLBACK TRAN;
END;
Update
I figured out how to avoid an extra join back to the Inserted table. If anyone sees a better way to do the GROUP BY to detect those that don't contain a NULL please let me know.
I also added a switch to READ COMMITTED if the current session is in SNAPSHOT ISOLATION level. This will prevent inconsistencies, though unfortunately will cause increased blocking. That is kind of unavoidable for the task at hand.
-
You should use WITH (READCOMMITTEDLOCK) hint. Hugo Kornelis wrote an example: web.archive.org/web/20171103062422/http://sqlblog.com/blogs/…A-K– A-K2012年03月16日 14:56:52 +00:00Commented Mar 16, 2012 at 14:56
-
Thanks @Alex those articles were dynamite and helped me understand snapshot isolation a lot better. I've added a conditional switch to read uncommitted to my code.ErikE– ErikE2012年03月16日 20:34:37 +00:00Commented Mar 16, 2012 at 20:34
-
Switching to read committed is not sufficient.
READCOMMITTEDLOCK
hints must be used on every data access because RCSI might be enabled, and that also allows reading versions instead of the most recently-committed data. We must not validate constraints using out-of-date information.2022年02月06日 14:04:08 +00:00Commented Feb 6, 2022 at 14:04
If your records are nested more than 1 level, a constraint isn't going to work (I'm assuming by that you mean e.g. record 1 is the parent of record 2, and record 3 is the parent of record 1). The only way to do this would be either in the parent code or with a trigger, but if you're looking at a large table and multiple levels this would be pretty intensive.
Explore related questions
See similar questions with these tags.