DB version is: Oracle 12c. Restricted to the use of inline PL/SQL.
The end goal here is to get a count of open items by day, inclusive of created date, exclusive of the end date.
The data structure is approximately:
| id | open_date | close_date |
|----|------------|------------|
| a | 01/01/2020 | 01/04/2020 |
| b | 01/02/2020 | 01/05/2020 |
The end result would be something like:
| date | open_item |
|------------|-----------|
| 01/01/2020 | 1 |
| 01/02/2020 | 2 |
| 01/03/2020 | 2 |
| 01/04/2020 | 1 |
| 01/05/2020 | 0 |
The start date in all this is user input, but that's not terribly relevant.
What I had done is generate a table of dates using that user input date in a cte like so:
with calendar_dates as (
select user_input_date + rownum-1 dates
from dual connect by rownum < sysdate-user_input_date+1
)
Then joined the items table to the calendar_dates table in another CTE like so:
item_list as (
select
a.dates,
b.id
from calendar_dates a
left join ticket_table b on a.dates>=b.created_date-1 and a.dates<b.closed_date
)
Finally, I then simply
select count(id), dates from item_list group by dates
There may be a few syntax errors above since I am truncating the actual code. This query works; however, it seems like an inefficient, wordy, ugly approach to the problem and if the user inputs a date a few years back, the query takes a few minutes to run.
Looking for an alternative and hopefully more efficient approach.
Not sure if this went on SO or here so happy to post over there if that seems the better venue.
Mahalo's in advance.
1 Answer
Looking at your query, I think there are a few things you can do to improve performance. First, you can aggregate your ticket_table ahead of time based on the requested interval. Second, unless it is needed for the logic that you redacted, I would avoid using the CTE once you start doing the real logic. You may be depriving the optimizer of information it could use. You might try something like the following query:
WITH query_interval AS
(
    SELECT TO_DATE('09/01/2012', 'MM/DD/YYYY') AS START_DATE, TRUNC(SYSDATE) AS END_DATE
    FROM DUAL
),
interval_dates AS
(
    -- One row per day from start_date through end_date, inclusive
    SELECT qi.start_date + ROWNUM - 1 AS I_DATE
    FROM query_interval qi
    CONNECT BY ROWNUM <= qi.end_date - qi.start_date + 1
)
-- NVL turns the NULL sum on days with no open tickets into 0
SELECT i.i_date, NVL(SUM(sub.cycle_count), 0) AS OPEN_ITEMS
FROM interval_dates i
LEFT JOIN (SELECT GREATEST(TRUNC(tt.start_date), qi.start_date) AS OPEN_DATE,
                  LEAST(TRUNC(tt.end_date)-1, qi.end_date) AS CLOSE_DATE,
                  COUNT(*) AS CYCLE_COUNT
           FROM ticket_table tt
           CROSS JOIN query_interval qi
           -- Keep only tickets that overlap the query interval
           WHERE GREATEST(TRUNC(tt.start_date), qi.start_date) <= LEAST(TRUNC(tt.end_date)-1, qi.end_date)
           GROUP BY GREATEST(TRUNC(tt.start_date), qi.start_date),
                    LEAST(TRUNC(tt.end_date)-1, qi.end_date)) sub
    ON i.i_date BETWEEN sub.open_date AND sub.close_date
GROUP BY i.i_date
ORDER BY i.i_date;
Most of this is similar to what you already had, but there are some differences. First you notice that I added a CTE with just the interval. This allows me to perform the aggregation that I mentioned earlier. The aggregation works like this:
- First, we limit our search to tickets which were open at some point during our query interval. I'm using the GREATEST <= LEAST comparison to detect an overlap between the ticket and the query interval.
- We clip tickets that hang over the ends of our search interval. You really don't care that they are open before your search, just that they are open during your search.
- We aggregate all cycles which now have the same start and end dates.
After that, the logic is pretty similar to what you already had. A left join between the full list of dates and our modified table. Then instead of using COUNT, we use SUM.
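To make the aggregation steps above concrete, here is a minimal sketch of the inner subquery run on its own against inline sample data. The `ticket_table` CTE is a hypothetical stand-in for the real table: rows `a` and `b` come from the question, while row `c` is invented here as a duplicate of `b` purely to show two tickets collapsing into one aggregated row. The fixed query interval of 01/01/2020 through 01/05/2020 is also an assumption for illustration.

```sql
WITH query_interval AS
(
    -- Assumed interval, chosen to match the sample output in the question
    SELECT DATE '2020-01-01' AS start_date, DATE '2020-01-05' AS end_date
    FROM DUAL
),
ticket_table AS
(
    -- Hypothetical inline data standing in for the real ticket_table
    SELECT 'a' AS id, DATE '2020-01-01' AS start_date, DATE '2020-01-04' AS end_date FROM DUAL
    UNION ALL
    SELECT 'b', DATE '2020-01-02', DATE '2020-01-05' FROM DUAL
    UNION ALL
    -- Invented row: same dates as 'b', so the two share one aggregated row
    SELECT 'c', DATE '2020-01-02', DATE '2020-01-05' FROM DUAL
)
SELECT GREATEST(TRUNC(tt.start_date), qi.start_date) AS open_date,   -- clip to interval start
       LEAST(TRUNC(tt.end_date) - 1, qi.end_date)    AS close_date,  -- last open day, clipped to interval end
       COUNT(*)                                      AS cycle_count  -- tickets sharing this clipped range
FROM ticket_table tt
CROSS JOIN query_interval qi
-- Overlap test: keep only tickets open at some point in the interval
WHERE GREATEST(TRUNC(tt.start_date), qi.start_date) <= LEAST(TRUNC(tt.end_date) - 1, qi.end_date)
GROUP BY GREATEST(TRUNC(tt.start_date), qi.start_date),
         LEAST(TRUNC(tt.end_date) - 1, qi.end_date);
```

With this sample data the subquery should return two rows: ticket `a` as open 01/01 through 01/03 with a count of 1, and tickets `b` and `c` combined as open 01/02 through 01/04 with a count of 2. The outer join then touches one row per distinct date range rather than one row per ticket, so the saving depends on how many tickets share the same clipped dates.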
Consider TRUNC(b.created_date) instead of b.created_date-1. I know it seems trivial, but I ran into an issue with this a few years back where something happened exactly at midnight (DATE datatype, so down to the second), and it was a huge pain to debug what was going wrong when I pulled too much info back.
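A small sketch of the midnight edge case described in this comment, assuming the calendar dates carry no time component (midnight) and using a hypothetical ticket created at exactly 00:00:00 on 01/02/2020:

```sql
WITH t AS
(
    -- Hypothetical ticket created exactly at midnight
    SELECT TO_DATE('2020-01-02 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AS created_date
    FROM DUAL
)
SELECT created_date - 1    AS minus_one_bound,  -- lower bound becomes 2020-01-01 00:00:00
       TRUNC(created_date) AS trunc_bound,      -- lower bound stays 2020-01-02 00:00:00
       -- Would the calendar date 01/01/2020 count this ticket as open?
       CASE WHEN DATE '2020-01-01' >= created_date - 1
            THEN 'counted' ELSE 'skipped' END   AS jan1_with_minus_one,
       CASE WHEN DATE '2020-01-01' >= TRUNC(created_date)
            THEN 'counted' ELSE 'skipped' END   AS jan1_with_trunc
FROM t;
```

Under these assumptions the `-1` comparison counts the ticket on 01/01/2020, a day before it was created, while the `TRUNC` comparison skips that day. For any creation time after midnight the two bounds select the same calendar dates, which is why the discrepancy is easy to miss.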