capturing counts of open items between dates

Question 1

DB version is: Oracle 12c. Restricted to the use of inline PL/SQL.

The end goal here is to get a count of open items by day, inclusive of created date, exclusive of the end date.

The data structure is approximately:

| id | open_date | close_date |
|----|------------|------------|
| a | 01/01/2020 | 01/04/2020 |
| b | 01/02/2020 | 01/05/2020 |

The end result would be something like:

| date | open_item |
|------------|-----------|
| 01/01/2020 | 1 |
| 01/02/2020 | 2 |
| 01/03/2020 | 2 |
| 01/04/2020 | 1 |
| 01/05/2020 | 0 |

the start date in all this is user input, but that's not terribly relevant.

What I had done is generate a table of dates using that user input date in a cte like so:

with caledar_dates as (
select user_input_date + rownum-1 dates
from dual connect by rownum < sysdate-user_input_date+1
)

Then joined the items table to the calendar_dates table in another CTE like so:

item_list as (
 select
 a.dates,
 b.id
 from calendar_dates a
 left join ticket_table b on a.dates>=b.created_date-1 and a.dates<b.closed_date
)

Finally, I then simply

select count(b.id), dates from item_list group by dates

There may be a few syntax errors above since I am truncating the actual code. This query works; however, it seems like an inefficient, wordy, ugly approach to the problem and if the user inputs a date a few years back, the query takes a few minutes to run.

Looking for an alternative and hopefully more efficient approach.

Not sure if this went on SO or here so happy to post over there if that seems the better venue.

Mahalo's in advance.

Question 2

Welcome to Code Review! "I am truncating the actual code" is usually not something that is liked very much here on this site. Unfortunately my knowledge on your specific topic is not deep enough to be able to judge if crucial information is missing because of the abbreviated code.

Question 3

Thanks AlexV. The code that is removed is just to remove information specific to the database I am working, ie other columns I am pulling + a few column renames. The time difference between running the truncated code vs. the actual code is not measurable on my side.

Question 4

One thing to note, I would recommend using TRUNC(b.created_date) instead of b.created_date-1. I know it seems trivial, but I ran into an issue with this a few years back where something happened exactly at midnight (DATE datatype so down to the second) and it was a huge pain to debug what was going wrong when I pulled too much info back.

Question 5

In looking at your query, I think there may be a few things you can do to improve your performance. The first is that you can aggregate your ticket_table ahead of time based on the request interval. Second, unless it is needed for the logic that you redacted, I would avoid using the CTE once you start doing the real logic. You may be depriving the optomizer of information it could use. You might try something like the following query:

WITH query_interval AS
(
 SELECT TO_DATE('09/01/2012', 'MM/DD/YYYY') AS START_DATE, TRUNC(SYSDATE) AS END_DATE
 FROM DUAL
),
interval_dates AS
(
 SELECT qi.start_date + ROWNUM - 1 AS I_DATE
 FROM query_interval qi
 CONNECT BY ROWNUM < qi.end_date - qi.start_date
)
SELECT i.i_date, SUM(sub.cycle_count)
FROM interval_dates i
LEFT JOIN (SELECT GREATEST(TRUNC(tt.start_date), qi.start_date) AS OPEN_DATE,
 LEAST(TRUNC(tt.end_date)-1, qi.end_date) AS CLOSE_DATE,
 COUNT(*) AS CYCLE_COUNT
 FROM ticket_table tt
 CROSS JOIN query_interval qi
 WHERE GREATEST(TRUNC(tt.start_date), qi.start_date) <= LEAST(TRUNC(tt.end_date)-1, qi.end_date)
 GROUP BY GREATEST(TRUNC(tt.start_date), qi.start_date), LEAST(TRUNC(tt.end_date)-1, qi.end_date)) sub ON I_DATE BETWEEN sub.open_date AND sub.close_date
GROUP BY i.i_date
ORDER BY i.i_date;

Most of this is similar to what you already had, but there are some differences. First you notice that I added a CTE with just the interval. This allows me to perform the aggregation that I mentioned earlier. The aggregation works like this:

First, we limit our search to tickets which were open at some point during our query interval. That I'm using the GREATEST <= LEAST to determine an overlap between the ticket and the query interval.
We clip tickets that hang over the ends of our search interval. You really don't care that they are open before your search, just that they are open during your search.
We aggregate all cycles which now have the same start and end dates.

After that, the logic is pretty similar to what you already had. A left join between the full list of dates and our modified table. Then instead of using COUNT, we use SUM.

Del Del 1384 bronze badges · Accepted Answer · 2020-04-02 16:15:23Z

In looking at your query, I think there may be a few things you can do to improve your performance. The first is that you can aggregate your ticket_table ahead of time based on the request interval. Second, unless it is needed for the logic that you redacted, I would avoid using the CTE once you start doing the real logic. You may be depriving the optomizer of information it could use. You might try something like the following query:

WITH query_interval AS
(
 SELECT TO_DATE('09/01/2012', 'MM/DD/YYYY') AS START_DATE, TRUNC(SYSDATE) AS END_DATE
 FROM DUAL
),
interval_dates AS
(
 SELECT qi.start_date + ROWNUM - 1 AS I_DATE
 FROM query_interval qi
 CONNECT BY ROWNUM < qi.end_date - qi.start_date
)
SELECT i.i_date, SUM(sub.cycle_count)
FROM interval_dates i
LEFT JOIN (SELECT GREATEST(TRUNC(tt.start_date), qi.start_date) AS OPEN_DATE,
 LEAST(TRUNC(tt.end_date)-1, qi.end_date) AS CLOSE_DATE,
 COUNT(*) AS CYCLE_COUNT
 FROM ticket_table tt
 CROSS JOIN query_interval qi
 WHERE GREATEST(TRUNC(tt.start_date), qi.start_date) <= LEAST(TRUNC(tt.end_date)-1, qi.end_date)
 GROUP BY GREATEST(TRUNC(tt.start_date), qi.start_date), LEAST(TRUNC(tt.end_date)-1, qi.end_date)) sub ON I_DATE BETWEEN sub.open_date AND sub.close_date
GROUP BY i.i_date
ORDER BY i.i_date;

Most of this is similar to what you already had, but there are some differences. First you notice that I added a CTE with just the interval. This allows me to perform the aggregation that I mentioned earlier. The aggregation works like this:

First, we limit our search to tickets which were open at some point during our query interval. That I'm using the GREATEST <= LEAST to determine an overlap between the ticket and the query interval.
We clip tickets that hang over the ends of our search interval. You really don't care that they are open before your search, just that they are open during your search.
We aggregate all cycles which now have the same start and end dates.

After that, the logic is pretty similar to what you already had. A left join between the full list of dates and our modified table. Then instead of using COUNT, we use SUM.

Stack Exchange Network

capturing counts of open items between dates

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

capturing counts of open items between dates

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions