
When implementing soft delete on a table that can be searched by other columns, what is the correct way to index it?

Let's say the table has an id field, a couple of text fields, and an isDeleted boolean field.

All queries will include WHERE isDeleted=FALSE AND ...

Should I add one index per column (id, each text column, and one for isDeleted)? Or composite indexes that include isDeleted (e.g. INDEX x ON "Table"("id","isDeleted"), etc.)? Or something else?

I'm tempted to leave isDeleted unindexed, since it will only be checked on rows that the other indexes already return, and I don't expect most of the data to be deleted.
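For reference, a minimal PostgreSQL sketch of the table described in the question, indexed the way the asker is tempted to: plain indexes on the searched columns only, with isDeleted left unindexed. The table name items, the columns text1 and text2, and the FALSE default are assumptions for illustration.

```sql
-- Hypothetical table matching the question: an id, two searchable
-- text columns and a soft-delete flag.
CREATE TABLE items (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- the PK already indexes id
    text1     text,
    text2     text,
    isDeleted boolean DEFAULT FALSE
);

-- Plain indexes on the searched columns only; isDeleted is not indexed.
CREATE INDEX items_text1_idx ON items (text1);
CREATE INDEX items_text2_idx ON items (text2);

INSERT INTO items (text1, text2, isDeleted)
VALUES ('foo', 'bar', FALSE),
       ('foo', 'baz', TRUE);

-- Query shape from the question: the soft-delete check plus a search condition.
SELECT id, text1, text2
FROM items
WHERE isDeleted = FALSE
  AND text1 = 'foo';
```

On a table of realistic size the planner can locate rows through items_text1_idx and check isDeleted on each row it fetches; with only two rows, as here, it will most likely just use a sequential scan.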

mustaccio
asked Aug 28 at 1:24

3 Answers


If the index exists solely to filter out deleted rows, you can create a filtered or partial index:

CREATE INDEX "IX_NAME" on "table" (id,text1,..) where isDeleted =FALSE;

This keeps the index compact and efficient, because soft-deleted rows are not stored in it at all. Create a compound (multicolumn) index if several of these columns are referenced in the same query. If id alone is highly selective, you could also drop the other columns from the index.
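A runnable sketch of the partial-index approach, assuming PostgreSQL and a hypothetical table docs with searchable columns text1 and text2. The query's WHERE clause repeats the index predicate, so the planner can prove that the partial index applies; whether it actually chooses the index still depends on table size and statistics.

```sql
CREATE TABLE docs (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    text1     text,
    text2     text,
    isDeleted boolean DEFAULT FALSE
);

-- Compound partial index: only rows with isDeleted = FALSE are stored in it.
CREATE INDEX docs_active_text1_idx ON docs (text1, text2)
    WHERE isDeleted = FALSE;

-- If id alone is selective enough, a partial index on id is sufficient.
CREATE INDEX docs_active_id_idx ON docs (id)
    WHERE isDeleted = FALSE;

-- Sample data: every tenth row is soft-deleted.
INSERT INTO docs (text1, text2, isDeleted)
SELECT 'row ' || g::text, 'x', g % 10 = 0
FROM generate_series(1, 10000) AS g;
ANALYZE docs;

-- The query repeats the index predicate, so docs_active_text1_idx is usable.
EXPLAIN
SELECT id, text1
FROM docs
WHERE isDeleted = FALSE
  AND text1 = 'row 42';
```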

Following up on the comment from Laurenz: although the "IS" and "=" operators return the same rows in this query, their semantics differ. When the column is NULL, "isDeleted IS FALSE" evaluates to FALSE, whereas "isDeleted = FALSE" evaluates to NULL. Rows that evaluate to either NULL or FALSE are excluded from the result, but PostgreSQL uses a partial index only when it can prove that the query's WHERE clause implies the index predicate, so the query and the index definition should be a literal match. As a result, an index defined with "=" does not support a query written with "IS", and vice versa. An index that supports both is:

CREATE INDEX "IX_NAME" on "table" (id,text1,..) where (isDeleted =FALSE or isDeleted is FALSE);
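The NULL difference described above can be checked directly. A minimal sketch, assuming PostgreSQL; the table flags and the index flags_active_idx are hypothetical, and the index uses the combined predicate from the answer.

```sql
-- How the two operators treat a NULL flag:
SELECT NULL::boolean = FALSE  AS eq_result,   -- NULL
       NULL::boolean IS FALSE AS is_result;   -- false

CREATE TABLE flags (
    id        int PRIMARY KEY,
    text1     text,
    isDeleted boolean            -- nullable, so the difference can show up
);
INSERT INTO flags VALUES (1, 'a', FALSE), (2, 'b', TRUE), (3, 'c', NULL);

-- Both spellings return only row 1: a NULL result and a FALSE result
-- both exclude the row from a WHERE clause.
SELECT id FROM flags WHERE isDeleted = FALSE;
SELECT id FROM flags WHERE isDeleted IS FALSE;

-- Index predicate covering both spellings, as in the answer.
CREATE INDEX flags_active_idx ON flags (id, text1)
    WHERE (isDeleted = FALSE OR isDeleted IS FALSE);
```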
answered Aug 28 at 4:16
    @LaurenzAlbe that was a really interesting piece of information. At first I wrote it with "=", but then I saw an example on the site using "IS", so I changed it. I'm curious: should the application use "IS" or "=" for booleans? "IS" for nullable columns? Commented Aug 28 at 7:31

I think single-column indexes are sometimes wasted.

Assuming you want to filter out deleted rows, I would create these indexes:

  • Id, isDeleted
  • Text1, isDeleted
  • Text2, isDeleted
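The indexes listed above, written out for PostgreSQL against a hypothetical table posts (the table and column names are assumptions). Each index leads with the searched column and carries isDeleted as its second key column, so both conditions of a query like the one in the question can be applied as index conditions.

```sql
CREATE TABLE posts (
    id        bigint PRIMARY KEY,
    text1     text,
    text2     text,
    isDeleted boolean DEFAULT FALSE
);

-- One composite index per searched column, with isDeleted as the second key.
CREATE INDEX posts_id_deleted_idx    ON posts (id, isDeleted);
CREATE INDEX posts_text1_deleted_idx ON posts (text1, isDeleted);
CREATE INDEX posts_text2_deleted_idx ON posts (text2, isDeleted);

-- A query of the shape from the question; both conditions match
-- the key columns of posts_text1_deleted_idx.
SELECT id, text1
FROM posts
WHERE isDeleted = FALSE
  AND text1 = 'foo';
```

Unlike the partial index in the first answer, these indexes also store the soft-deleted rows.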
answered Aug 28 at 3:20

I prefer to "move" archived (a.k.a. soft-deleted) records to a different table partition:

CREATE TABLE t1 (id int GENERATED ALWAYS AS IDENTITY , f1 text, f2 text, del_stamp timestamptz) PARTITION BY LIST ( (del_stamp IS NULL ));
CREATE TABLE t1_active PARTITION OF t1 FOR VALUES IN (TRUE);
CREATE TABLE t1_archive PARTITION OF t1 FOR VALUES IN (FALSE);
ALTER TABLE t1_active -- pkey for foreign key usage
 ADD CONSTRAINT pkey_t1_active PRIMARY KEY (id);
ALTER TABLE t1_archive -- pkey for foreign key usage
 ADD CONSTRAINT pkey_t1_archive PRIMARY KEY (id);
INSERT INTO t1(f1, f2)
VALUES ('foo', 'bar')
 ,('another foo', 'some more bar');
ANALYSE t1;
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NULL;
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NOT NULL; -- deleted records
UPDATE t1
SET del_stamp = now() -- soft delete
WHERE id = 1
AND del_stamp IS NULL; -- not "deleted" yet
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NULL;
EXPLAIN(ANALYSE , VERBOSE , BUFFERS )
SELECT *
FROM t1
WHERE del_stamp IS NOT NULL; -- deleted records

Here is one of the query plans; it reads only from public.t1_active:

Seq Scan on public.t1_active t1  (cost=0.00..1.02 rows=2 width=29) (actual time=0.043..0.044 rows=1 loops=1)
  Output: t1.id, t1.f1, t1.f2, t1.del_stamp
  Filter: (t1.del_stamp IS NULL)
  Buffers: shared hit=1
Query Identifier: -8475205591691465029
Planning Time: 0.131 ms
Execution Time: 0.066 ms

As the plan shows, the planner has already pruned the scan down to the partition you need, so you don't need an index on the soft-delete column. Another benefit is that deleted records don't bloat your active partition. And when you record the soft delete as a timestamp, data retention becomes much easier to handle as well: you can even sub-partition the archive by month and drop entire partitions after a specified number of months.
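A sketch of the monthly sub-partitioning mentioned above, assuming a recent PostgreSQL version. It mirrors t1 from the answer, except that the archive partition is itself partitioned by RANGE on del_stamp; the table t2, its partition and index names, and the fixed months are hypothetical. A DEFAULT sub-partition catches months that have no partition of their own, so a soft delete with now() does not fail. It also shows that an index on a searched column can be declared once on the parent.

```sql
CREATE TABLE t2 (
    id        int GENERATED ALWAYS AS IDENTITY,
    f1        text,
    f2        text,
    del_stamp timestamptz
) PARTITION BY LIST ((del_stamp IS NULL));

CREATE TABLE t2_active PARTITION OF t2 FOR VALUES IN (TRUE);

-- The archive partition is sub-partitioned by deletion month.
CREATE TABLE t2_archive PARTITION OF t2 FOR VALUES IN (FALSE)
    PARTITION BY RANGE (del_stamp);
CREATE TABLE t2_archive_2024_07 PARTITION OF t2_archive
    FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');
CREATE TABLE t2_archive_2024_08 PARTITION OF t2_archive
    FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');
-- Catch-all for months without a dedicated partition.
CREATE TABLE t2_archive_other PARTITION OF t2_archive DEFAULT;

-- Declared on the parent, this index is created on every (sub-)partition.
CREATE INDEX t2_f1_idx ON t2 (f1);

INSERT INTO t2 (f1, f2, del_stamp)
VALUES ('live',  'row', NULL),            -- lands in t2_active
       ('old',   'row', '2024-07-15'),    -- lands in t2_archive_2024_07
       ('newer', 'row', '2024-08-20');    -- lands in t2_archive_2024_08

-- Retention: discard a whole month of soft-deleted rows without a bulk DELETE.
-- ALTER TABLE t2_archive DETACH PARTITION ... would keep the data as a standalone table instead.
DROP TABLE t2_archive_2024_07;
```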

answered Aug 28 at 16:37
