I have a table, train_statuses, with the following schema:

CREATE TABLE public.train_statuses (
 id uuid NOT NULL,
 status text NOT NULL,
 updated_at timestamp without time zone NOT NULL
);
Indexes:
 "train_statuses_id_updated_at_key" UNIQUE CONSTRAINT, btree (id, updated_at)
 "idx_train_statuses_id_updated_at" btree (id, updated_at)

This table can have multiple entries for the same id. The status can be one of these values: 'cancelled', 'queued', 'executed', 'failed', 'succeeded'.

Now I need to write a query that inserts a new entry with status 'cancelled' into train_statuses for every id whose existing statuses are only 'queued' or 'executed'.

I came up with this query:

WITH ts AS (
    SELECT DISTINCT ON (ts.id) ts.id, ts.status
    FROM train_statuses ts
    INNER JOIN train_statuses t ON ts.id = t.id
    WHERE ts.status IN ('queued', 'executed')
      AND ts.status NOT IN ('failed', 'succeeded', 'cancelled')
)
INSERT INTO train_statuses (id, updated_at, status)
SELECT id, now(), 'cancelled'
FROM ts;

This technically works, but I feel there should be a more efficient way to get this done.

Joining train_statuses with itself doesn't seem great, since the table is around 150 GB and we don't have an index on status.

Please let me know if there is a better way to get the desired result.

Thanks

Note: If there is any better title for this question please add an edit suggestion.

asked Dec 28, 2020 at 9:50
  • (1) ts.status IN ('queued','executed') is excess - status is defined as NOT NULL. (2) NOT IN is slow - use NOT EXISTS. (3) ts.status selected in CTE is not used - remove it. Commented Dec 28, 2020 at 13:38
  • So, train_statuses can have multiple records for one id? Commented Dec 31, 2020 at 17:06
  • Yes, it can have multiple records @GerardH.Pille Commented Jan 3, 2021 at 13:32
  • Can the same train be cancelled or queued, or whatever, multiple times too? Commented Jan 3, 2021 at 14:49
  • Therefore – and please correct me if I'm wrong – this is not about trains having their status as either queued or executed but rather about those having their latest status as either queued or executed, correct? Commented Jan 3, 2021 at 16:13
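
If the intended reading is the one raised in the last comment, that only each id's latest status matters, a sketch along the following lines may be closer to the goal. The question does not confirm this interpretation, so treat it as an assumption. It relies on PostgreSQL's DISTINCT ON to keep one row per id, ordered so that the most recent updated_at wins; the subquery alias latest is only an illustrative name.

```sql
-- Assumption: "latest status" means the row with the greatest updated_at per id.
INSERT INTO train_statuses (id, updated_at, status)
SELECT latest.id, now(), 'cancelled'
FROM (
    -- DISTINCT ON keeps the first row of each id group,
    -- and the ORDER BY makes that first row the newest one.
    SELECT DISTINCT ON (id) id, status
    FROM train_statuses
    ORDER BY id, updated_at DESC
) AS latest
WHERE latest.status IN ('queued', 'executed');
```

This reads the table once and needs no self-join. The existing unique index on (id, updated_at) matches the sort order, but status is not part of it, so the planner may still choose a sequential scan plus a sort; checking with EXPLAIN on real data would be the way to find out.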

1 Answer

I removed the superfluous join and criteria:

WITH ts AS (
    SELECT ts.id
    FROM train_statuses ts
    WHERE ts.status IN ('queued', 'executed')
    EXCEPT
    SELECT ts.id
    FROM train_statuses ts
    WHERE ts.status NOT IN ('queued', 'executed')
)
INSERT INTO train_statuses (id, updated_at, status)
SELECT id, now(), 'cancelled'
FROM ts;
answered Jan 3, 2021 at 14:17
  • In PostgreSQL, it's EXCEPT, not MINUS. Other than that, an important detail has just come up in the question's comment section that's likely going to invalidate your otherwise neat solution. Commented Jan 3, 2021 at 15:54
  • @AndriyM you're awful but I like you. Commented Jan 3, 2021 at 15:56
  • I aim to please, even though I'm awful at that Commented Jan 3, 2021 at 16:15
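
The EXCEPT query in the answer reads train_statuses twice. Under the same reading of the requirement (an id qualifies only if every one of its rows is 'queued' or 'executed'), a single pass with an aggregate may do the same job. This is a sketch rather than a tested replacement: it uses PostgreSQL's bool_and aggregate and relies on status being NOT NULL, as the schema states.

```sql
-- One scan: group by id and keep the ids whose rows are all
-- 'queued' or 'executed'. status is NOT NULL, so the IN test
-- never yields NULL inside bool_and.
INSERT INTO train_statuses (id, updated_at, status)
SELECT id, now(), 'cancelled'
FROM train_statuses
GROUP BY id
HAVING bool_and(status IN ('queued', 'executed'));
```

Whether this is actually faster on a 150 GB table depends on the plan the optimizer picks for the grouping (hash or sort), so comparing EXPLAIN output for both forms would be worthwhile before running either.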
