5
\$\begingroup\$

I have the following table in a PostgreSQL DB (simplified for the question), where I want to select only the rows marked with *:

uuid | status | info1 | info2
-------------------------------
1234 | CONF | - | 10
1234 | START | abc | 10
1234 | FIN | abc | 10 *
-------------------------------
7890 | CONF | - | 20 *
-------------------------------
1111 | CONF | - | 30
1111 | START | zzz | 30
1111 | FIN | zzz | 30
1111 | REM | zzz | 30
-------------------------------
2222 | CONF | - | 100
2222 | START | ijk | 100 *

The separating lines between different values of uuid are only for clarity, as I will explain now.

The logic for selection is the following:

  • Partition the table by the column uuid in order to create groups of all the different values.
  • Discard all the groups where there is at least one row with status REM.
  • For all the remaining groups, select only the one with the last status. The values of this column define a custom order like this: CONF < START < FIN. The value REM is only used to remove this group of the selection.

I wrote this query that does the job, but I think there might be a better solution, and I have the sensation of abusing the subtable structures:

SELECT *
FROM (
 SELECT MAX(numericphase.phase) OVER (PARTITION BY numericphase.uuid) AS s
 , *
 FROM(
 SELECT
 CASE
 WHEN status = 'FIN' THEN 3
 WHEN status = 'START' THEN 2
 WHEN status = 'CONF' THEN 1
 ELSE -1
 END AS phase
 , *
 FROM (
 SELECT *
 , BOOL_OR(status = 'REM')
 OVER (PARTITION BY uuid) AS removed
 FROM mytable
 ) nonremoved
 WHERE NOT removed
 ) numericphase
) lastphase
WHERE s = phase
AND s > 0

Here I'm doing the following operations (from inner to outer):

  • First, select all rows where status is not REM (using BOOL_OR).
  • Then, map each status value to a numeric one so they can be sorted.
  • Compute the maximum value of each uuid group (corresponding to the last status following the custom sorting).
  • Finally, select only rows where status = max(status). This way, only the rows corresponding to the maximum status for each group are selected.

Any possible status that does not correspond to any value in this question is not suitable, so they are mapped to -1 and discarded.

I think is easier to understand the problem by looking at the table than trying to redact it, so feel free to correct me if something is not clear.

asked Feb 17, 2020 at 14:06
\$\endgroup\$
0

1 Answer 1

1
\$\begingroup\$

As I wrote in my edit of the question, I came to the following solution, which I think could be better optimized, but does the job:

SELECT *
FROM (
 SELECT *
 , MAX(numericphase.phase) OVER (PARTITION BY numericphase.uuid) AS s
 FROM (
 SELECT *
 , CASE
 WHEN status = 'REM' THEN 4
 WHEN status = 'FIN' THEN 3
 WHEN status = 'START' THEN 2
 WHEN status = 'CONF' THEN 1
 ELSE -1
 END AS phase
 FROM mytable
 ) numericphase
) lastphase
WHERE phase != 4
AND s = phase
AND s > 0
answered Feb 21, 2020 at 8:39
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Thanks for posting this (and it can now be removed from the question, where it doesn't really fit). You could improve the value of this answer by actually reviewing the code and showing how this replacement improves on it. We value self-answers; it does help if they reach the same standards of explanation as other answers. \$\endgroup\$ Commented Feb 21, 2020 at 9:58

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.