Part of our ERP system is a sub-system for running background jobs. We track a variety of metadata about our jobs in a table, including timestamps for when each job was submitted, started, and ended.
I'm creating a report showing the performance of our job system, broken down by day. One KPI is the maximum number of jobs running at once. The algorithm I'm currently using is:
dim cnt as integer 'number of overlapping jobs
dim max as integer 'maximum number of jobs running at one time

for each Job where
    Job.SubmitTs = today

    cnt = 0 'reset the count for this job

    'bJob is a 2nd instance of Job
    for each bJob where
        bJob.SubmitTs = today and
        bJob.StartTs <= Job.EndTs and
        bJob.EndTs >= Job.StartTs
        cnt = cnt + 1
    next

    if cnt > max then max = cnt
next
The problem with this algorithm is that it is very slow due to all the nested looping. Is there a faster way to implement this?
Edit: I cannot use SQL queries to access the data.
- If you post a schema of that table, we might be able to suggest a query that can get what you need. – FrustratedWithFormsDesigner, May 18, 2012 at 15:50
- I can't use a SQL query to get the data, as the language doesn't support it. – briddums, May 18, 2012 at 16:19
- This was asked and solved multiple times, in SQL and outside of it: stackoverflow.com/questions/117962/… stackoverflow.com/questions/2037618/… stackoverflow.com/questions/1025688/… stackoverflow.com/questions/325933/… – Job, May 18, 2012 at 16:20
- @Job Three of your links point to SQL solutions, which I cannot use. The fourth deals with comparing two date ranges; I am looking for a more efficient solution for comparing 1.2 million ranges. – briddums, May 18, 2012 at 16:32
- @maple_shaft For codereview.SE there should be real code involved; this is just pseudocode as far as I can tell, and wrong at that (although it resembles VB). To the OP: if you can provide at least a language name, I can give you a better code snippet... – K.Steff, May 18, 2012 at 16:45
2 Answers
- Include all the start and end points (in time) of the jobs in an array; this creates 2*N elements (one for each start, one for each end).
- Sort the array by the timestamp of each event, then iterate over the 2*N elements as follows:

for each element X:
    if X.type == start:
        counter = counter + 1
    else:
        counter = counter - 1
    ans = max(ans, counter)

Complexity: O(n log n) for the initial build of the sorted structure, plus O(n) for the iteration through its elements.
Edit: Giorgio suggested using an array, which is a better option. (The suggestion originally was to use a red/black tree, but no task-removal capability seems to be needed, so maintaining order is trivial.)
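A minimal sketch of this sweep in Python (the function name and the (start, end) pair input format are illustrative assumptions, not part of the original post):

def max_concurrent_jobs(jobs):
    """jobs: iterable of (start_ts, end_ts) pairs for the day in question."""
    events = []
    for start, end in jobs:
        events.append((start, 1))   # a job starting adds one running job
        events.append((end, -1))    # a job ending removes one
    # Sort by time; on ties the -1 sorts before the +1, so a job ending at t
    # and another starting at t are not counted as running together.
    events.sort()
    best = running = 0
    for _, delta in events:
        running += delta
        best = max(best, running)
    return best

For example, max_concurrent_jobs([(10, 15), (15, 23), (12, 20)]) returns 2: the first and third jobs overlap, while the second only starts when the first ends.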
- +1: Cool idea! Wouldn't it be possible to just sort all the start and end points in an array (this also takes O(n log n)) and then scan the array as you have described? – Giorgio, May 18, 2012 at 16:56
- @Giorgio Actually yes, the OP never says tasks are to be removed, so a dynamic array is a better fit. Seems I unconsciously introduced further complexity :) – K.Steff, May 18, 2012 at 16:58
- Brilliant! Collapsing it into a single array made it super-fast. – briddums, May 18, 2012 at 17:34
- Having the array contents already sorted seems like a prerequisite for the algorithm to work at all. – matt b, Jul 9, 2013 at 14:31
SQL knows how to sort stuff, and I'd bet your engine uses the most efficient algorithm it can (i.e. O(n log n)), while still working if the result set doesn't fit in RAM (in contrast with K.Steff's answer, which will fail if the array doesn't fit in RAM). In Python:
import sqlite3

connection = sqlite3.connect("database.db")
cursor = connection.cursor()
# Turn every job into two events: +1 when it starts, -1 when it ends,
# and let the database sort them by time.
cursor.execute("SELECT isStart FROM (" +
               "  SELECT startTime AS time, 1 AS isStart FROM myTable " +
               "  UNION ALL " +
               "  SELECT endTime AS time, -1 AS isStart FROM myTable " +
               "  ORDER BY time ASC, isStart ASC" +
               ")")
maxOverlap = 0
currentOverlap = 0
for (isStart,) in cursor:
    currentOverlap += isStart                    # running count of active jobs
    maxOverlap = max(maxOverlap, currentOverlap)
print(maxOverlap)
Note the isStart ASC order is necessary so the intervals [10,15] and [15,23] are not considered overlapping.
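As a quick sanity check (hypothetical values, not from the original answer), counting over just those two intervals with the end event sorted before the start event on ties gives a peak of 1, i.e. they never run at the same time:

events = [(10, 1), (15, -1), (15, 1), (23, -1)]   # [10,15] and [15,23], end before start at t=15
peak = running = 0
for _, delta in events:
    running += delta
    peak = max(peak, running)
print(peak)   # prints 1: the two intervals are not counted as overlapping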
For further improvement, the columns startTime and endTime probably have an index, in which case the SQL engine should efficiently merge the two already-sorted lists, resulting in an O(n) operation (the O(log n) factor then happens at insertion time for each row, when it is added to the index, but you pay that anyway). If your SQL engine isn't smart enough to do an efficient merge, do it yourself on the fly (still in Python):
import sqlite3

connection = sqlite3.connect("database.db")
# Two cursors: one over the sorted start times, one over the sorted end times.
cursorStart = connection.cursor()
cursorStart.execute("SELECT startTime FROM myTable ORDER BY startTime ASC")
cursorEnd = connection.cursor()
cursorEnd.execute("SELECT endTime FROM myTable ORDER BY endTime ASC")

maxOverlap = 0
currentOverlap = 0
currentStart = cursorStart.fetchone()
currentEnd = cursorEnd.fetchone()
# Merge the two sorted streams: a start increments the running count,
# an end decrements it. Ties go to the end, so touching intervals don't overlap.
while currentStart is not None and currentEnd is not None:
    if currentStart < currentEnd:
        currentOverlap += 1
        currentStart = cursorStart.fetchone()
    else:
        currentOverlap -= 1
        currentEnd = cursorEnd.fetchone()
    maxOverlap = max(maxOverlap, currentOverlap)
print(maxOverlap)