When implementing soft delete on a table that can be searched by other columns, what is the correct way to index it?
Let's say the table has an id field, a couple of text fields, and finally an isDeleted Boolean field.
All queries will include WHERE isDeleted=FALSE AND ...
Should I add one index for each column (id, each text column, and finally one for isDeleted)? Or composite indexes that include isDeleted (e.g. INDEX x ON "Table"("id","isDeleted"), etc.)? Or something else?
I'm tempted to leave isDeleted unindexed, since it will only be checked on rows I might return based on other indexes, and I don't expect most of the data to be deleted.
3 Answers
If the index exists solely to support the soft-delete filter, you may create a filtered (partial) index:
CREATE INDEX "IX_NAME" on "table" (id,text1,..) where isDeleted =FALSE;
This way the index will be compact and efficient. You may choose to create a compound index if multiple columns are referenced in the same query. If the "ID" alone provides high selectivity you could also remove other fields from the index.
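As a rough sketch of how this could be checked, the statements below create a partial index and then ask the planner for its plan. The table name items, the index name ix_items_text1_active, and the column text1 are hypothetical and only serve the illustration. On a very small table the planner may still prefer a sequential scan, so the plan is only meaningful with realistic data volumes.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE items (
    id        bigint PRIMARY KEY,
    text1     text,
    isDeleted boolean NOT NULL DEFAULT FALSE
);

-- Partial index: only rows that are not soft-deleted are indexed.
CREATE INDEX ix_items_text1_active
    ON items (text1)
    WHERE isDeleted = FALSE;

-- The query repeats the index predicate in the same form,
-- so the planner is able to consider the partial index.
EXPLAIN
SELECT *
FROM items
WHERE isDeleted = FALSE
  AND text1 = 'foo';
```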
Regarding the comment from Laurenz: while the "IS" and "=" operators return the same rows in a query, their semantics differ. When they encounter a NULL value, "IS" evaluates to FALSE and "=" evaluates to NULL. Even though rows that evaluate to either NULL or FALSE are excluded from the result, the query predicate and the index definition should match literally. Because of this, an index defined with "=" does not support a query written with "IS", and vice versa. An index that could support both is:
CREATE INDEX "IX_NAME" on "table" (id,text1,..) where (isDeleted =FALSE or isDeleted is FALSE);
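The difference in NULL handling described above can be seen with a small standalone query. This is only an illustration; the aliases val, equals_false and is_false are made up for the example.

```sql
-- Compare "= FALSE" and "IS FALSE" for each possible boolean state.
SELECT v          AS val,
       v = FALSE  AS equals_false,
       v IS FALSE AS is_false
FROM (VALUES (TRUE), (FALSE), (NULL::boolean)) AS t(v);

-- val   | equals_false | is_false
-- true  | false        | false
-- false | true         | true
-- NULL  | NULL         | false
```

In a WHERE clause both NULL and FALSE reject the row, which is why the two forms return the same rows even though the expressions are not identical.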
@LaurenzAlbe That was really an interesting piece of information. At first I wrote it with "=", but then I saw an example on a site with "IS", hence I changed it. I am curious: should the application use "IS" or "=" for booleans? "IS" for nullable columns? – goodfella, Aug 28 at 7:31
I think that single-column indexes are sometimes wasted.
Assuming you want to filter out the rows where isDeleted is set, I would create these composite indexes:
- Id, isDeleted
- Text1, isDeleted
- Text2, isDeleted
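Written out as DDL, the three indexes listed above might look like the following sketch. The table definition, the column names text1 and text2, and the index names are assumptions for illustration; the identifiers are quoted to match the question's "Table"("id","isDeleted") style.

```sql
-- Hypothetical table matching the question's description.
CREATE TABLE "Table" (
    "id"        bigint PRIMARY KEY,
    "text1"     text,
    "text2"     text,
    "isDeleted" boolean NOT NULL DEFAULT FALSE
);

-- One composite index per search column, each with the
-- soft-delete flag as the trailing column.
CREATE INDEX ix_table_id_isdeleted    ON "Table" ("id",    "isDeleted");
CREATE INDEX ix_table_text1_isdeleted ON "Table" ("text1", "isDeleted");
CREATE INDEX ix_table_text2_isdeleted ON "Table" ("text2", "isDeleted");
```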
I like to "move" archived (aka soft-deleted) records to a different table partition:
CREATE TABLE t1 (id int GENERATED ALWAYS AS IDENTITY , f1 text, f2 text, del_stamp timestamptz) PARTITION BY LIST ( (del_stamp IS NULL ));
CREATE TABLE t1_active PARTITION OF t1 FOR VALUES IN (TRUE);
CREATE TABLE t1_archive PARTITION OF t1 FOR VALUES IN (FALSE);
ALTER TABLE t1_active -- pkey for foreign key usage
ADD CONSTRAINT pkey_t1_active PRIMARY KEY (id);
ALTER TABLE t1_archive -- pkey for foreign key usage
ADD CONSTRAINT pkey_t1_archive PRIMARY KEY (id);
INSERT INTO t1(f1, f2)
VALUES ('foo', 'bar')
,('another foo', 'some more bar');
ANALYSE t1;
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NULL;
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NOT NULL; -- deleted records
UPDATE t1
SET del_stamp = now() -- soft delete
WHERE id = 1
AND del_stamp IS NULL; -- not "deleted" yet
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NULL;
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NOT NULL; -- deleted records
And one of the query plans, only reading from public.t1_active:
Seq Scan on public.t1_active t1  (cost=0.00..1.02 rows=2 width=29) (actual time=0.043..0.044 rows=1 loops=1)
  Output: t1.id, t1.f1, t1.f2, t1.del_stamp
  Filter: (t1.del_stamp IS NULL)
  Buffers: shared hit=1
Query Identifier: -8475205591691465029
Planning Time: 0.131 ms
Execution Time: 0.066 ms
As you can see in the query plan, the planner has already selected the partition you need. You don't need to worry about an index on the soft-delete column. Another benefit is that deleted records do not pollute your active partition. And when you implement the soft-delete feature with a timestamp, data retention will be much easier to handle as well. You can even create sub-partitions per month and discard entire partitions after a specified number of months.
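The monthly sub-partitioning idea could be sketched roughly as follows. This is an assumption-laden illustration, not the setup used above: the table t2, the partition names, and the month boundaries are all hypothetical, and primary keys are left out because a primary key on a partitioned table has to include its partition key columns.

```sql
-- Hypothetical variant of the table, with the archive side
-- sub-partitioned by the month of the soft-delete timestamp.
CREATE TABLE t2 (
    id        int NOT NULL,
    f1        text,
    del_stamp timestamptz
) PARTITION BY LIST ((del_stamp IS NULL));

CREATE TABLE t2_active PARTITION OF t2 FOR VALUES IN (TRUE);

-- The archive partition is itself partitioned by range.
CREATE TABLE t2_archive PARTITION OF t2 FOR VALUES IN (FALSE)
    PARTITION BY RANGE (del_stamp);

CREATE TABLE t2_archive_2025_08 PARTITION OF t2_archive
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');
CREATE TABLE t2_archive_2025_09 PARTITION OF t2_archive
    FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');

-- Catch-all for soft-deleted rows outside the defined months,
-- so that an UPDATE never fails for lack of a matching partition.
CREATE TABLE t2_archive_default PARTITION OF t2_archive DEFAULT;

-- Retention: discard a whole month of soft-deleted rows at once.
DROP TABLE t2_archive_2025_08;
```

Dropping a partition removes all of its rows in one step, which avoids a large DELETE and the cleanup work that follows it.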