I would like to find out how much time lies between 22.00 and 6.00 o'clock for a given event, per day.
So if the event starts at 21.00 and ends at 23.59, the result would be 1.59. For a start at 22.00 and an end at 7.00, it would be 2.00 + 6.00 = 8.00 hours. So basically I want to calculate the night shift part per day.
I came up with the following implementation. It works, but I wonder if there is an easier way that uses less overhead or fewer lines of code.
from datetime import datetime, timedelta, time
from typing import Dict, Generator


def daterange(start_date: datetime, end_date: datetime) -> Generator[datetime, None, None]:
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)


def get_overlap(dt1: datetime, dt2: datetime) -> timedelta:
    # Check that the two datetimes are on the same day
    if dt1.date() != dt2.date():
        raise ValueError('Datetimes must be on the same day')
    # Calculate the start and end times of the two periods of interest
    early_shift_start = datetime.combine(dt1.date(), time(hour=0))
    early_shift_end = datetime.combine(dt1.date(), time(hour=6))
    late_shift_start = datetime.combine(dt1.date(), time(hour=22))
    late_shift_end = datetime.combine(dt1.date(), time(hour=23, minute=59, second=59))
    # Calculate the amount of overlap between the two periods
    overlap = max(min(dt2, early_shift_end) - max(dt1, early_shift_start), timedelta())
    overlap += max(min(dt2, late_shift_end) - max(dt1, late_shift_start), timedelta())
    return overlap


def split_on_midnight() -> Dict[str, float]:
    start_date = datetime(2023, 3, 1, 12, 30, 0)  # example start datetime
    end_date = datetime(2023, 3, 3, 16, 45, 0)  # example end datetime
    total_duration = timedelta()
    for date in daterange(start_date, end_date + timedelta(days=1)):
        midnight = datetime.combine(date, time.min)
        chunk_start = max(start_date, midnight)
        chunk_end = min(end_date, midnight + timedelta(days=1) - timedelta(seconds=1))
        if chunk_start < chunk_end:
            duration = get_overlap(chunk_start, chunk_end)
            total_duration += duration
            print(f"Chunk from {chunk_start} to {chunk_end}: {duration}")
    return {"total_duration": total_duration.total_seconds() / 3600}


print(split_on_midnight())
1 Answer
Set up your code for ease of testing. This is useful both for your own
debugging and when asking others for help or code review. Instead of writing an
English paragraph giving us example inputs and expected outputs (or hardcoding
a single example in your program), organize the entire program for ease
of testing and experimentation, from the ground up. In any project of
significance, I would do that with a testing tool (I like pytest, but there are
other good options as well). In a simple situation like an online code review,
including a main() function with a simple testing apparatus works fine, as
shown below.
Speaking of testing, the example in your code has a bug. Given
datetime(2023, 3, 1, 12, 30, 0) and datetime(2023, 3, 3, 16, 45, 0), your
code returned the equivalent of 57598 seconds. But the answer should be 57600
seconds: 28800 + 28800 (two 8-hour night shifts, one for 3/1 to 3/2 and the
other for 3/2 to 3/3). I did not try to track down the source of the bug,
because my overall reaction to your code was the following: it looks like the
code has been put together in a thoughtful way, but there is a lot of
complexity in it, and my gut tells me there is a simpler way to address the
problem (your question text makes me think that your gut was telling you
something similar).
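For what it's worth, the two missing seconds are consistent with measuring each evening up to 23:59:59 instead of up to the following midnight, which would drop one second per evening. The sketch below isolates that effect for a single day; the `overlap` helper and the variable names are made up for illustration and are not part of the question's code.

```python
from datetime import datetime, timedelta


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals, never negative."""
    return max(min(a_end, b_end) - max(a_start, b_start), timedelta())


day = datetime(2023, 3, 1)
event_start = datetime(2023, 3, 1, 12, 30)
late_start = day.replace(hour=22)

# Closed end of day at 23:59:59, as in the question's code.
closed_end = day.replace(hour=23, minute=59, second=59)
closed = overlap(event_start, closed_end, late_start, closed_end)

# Half-open end of day: the interval runs up to the next midnight.
next_midnight = day + timedelta(days=1)
half_open = overlap(event_start, next_midnight, late_start, next_midnight)

print(closed)     # 1:59:59
print(half_open)  # 2:00:00
```

Treating every interval as half-open (start included, end excluded) avoids this kind of off-by-one-second loss at day boundaries.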
If something is a datetime, don't call it a date. You have a few variables
that are misnamed in that fashion: start_date and end_date are the two most
prominent examples. This may seem pedantic (indeed it is), but conceptual
clarity is rewarded in programming in the form of fewer bugs, more readable
code, and more efficient communication with others working on a project (this
becomes even more true as projects grow in size and complexity). If it's a
datetime, don't call it a date; if it's a dict, don't call it JSON (the latter
is text); and vice versa. Often, one can achieve greater clarity simply with a
different naming convention: for example, start and end are shorter names,
completely clear in context, and don't imply anything incorrect about their
underlying nature.
An alternative approach. My idea for this problem was to maintain a tiny list of datetimes. The list would initially contain the input start and end times, plus the upcoming start or end point for the next night shift. Sort the list in reverse order. Pop off the earliest datetime (the last item after the reverse sort). If the span from the previously popped datetime to the newly popped one falls within a night shift, add its duration to the total. As needed, add another night shift endpoint to the list and sort again. Stop when the tiny list no longer contains the input end time.
I started with some constants:
from datetime import datetime, timedelta, time
NIGHT_START_TIME = time(22, 0)
NIGHT_END_TIME = time(6, 0)
Then a function to take the input start and yield appropriate night shift endpoints in the future:
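def nightshift_endpoints_gen(start):
    start_date = start.date()
    combine = datetime.combine
    # A start before 06:00 is inside a shift that ends later the same day,
    # so that endpoint has to be yielded first.
    dt = combine(start_date, NIGHT_END_TIME)
    if dt > start:
        yield dt
    n = 0
    while True:
        dt = combine(start_date + timedelta(days = n), NIGHT_START_TIME)
        if dt > start:
            yield dt
        n += 1
        yield combine(start_date + timedelta(days = n), NIGHT_END_TIME)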
With those building blocks, computing the overall duration is not too bad:
def nightshift_duration(start, end):
    dts = [end, start]
    e = start
    gen = nightshift_endpoints_gen(start)
    tot = 0
    while end in dts:
        dts.append(next(gen))
        dts.sort(reverse = True)
        s, e = (e, dts.pop())
        start_time = s.time()
        if start_time >= NIGHT_START_TIME or start_time < NIGHT_END_TIME:
            tot += (e - s).total_seconds()
    return int(tot)
And a testing apparatus:
def main():
    TESTS = (
        # Example from your code.
        (
            datetime(2023, 3, 1, 12, 30, 0),
            datetime(2023, 3, 3, 16, 45, 0),
            16 * 3600,
        ),
        # Examples from your comments.
        (
            datetime(2020, 2, 28, 21, 0, 0),
            datetime(2020, 2, 28, 23, 59, 0),
            119 * 60,
        ),
        (
            datetime(2020, 2, 2, 22, 0, 0),
            datetime(2020, 2, 3, 7, 0, 0),
            8 * 3600,
        ),
        # A time span entirely within the night shift.
        (
            datetime(2020, 2, 3, 3, 0, 0),
            datetime(2020, 2, 3, 5, 5, 0),
            125 * 60,
        ),
    )
    for start, end, exp in TESTS:
        got = nightshift_duration(start, end)
        if got == exp:
            print(got, 'ok')
        else:
            print(got, exp)

if __name__ == '__main__':
    main()
An alternative approach: solve directly. A different idea is to solve
directly rather than iterating over all days from start to end. The idea
here is to decompose the total duration into three chunks: (1) the partial
night shift that start might be part of, (2) the partial night shift that
end might be part of, and (3) the full night shift(s) that might sit in
between.
# Five night shifts (equal signs), plus start and end (carets).

========     ========     ========     ========     ========
    ^                                                  ^

# The chunks:

Chunk 1: start to the first night-shift-end.
Chunk 2: the last night-shift-start to end.
Chunk 3: three full night shifts.
To solve the problem along those lines, we could use a function that takes a datetime and returns its closest "neighbor" to the right (future) or left (past), where a neighbor is a night shift startpoint or endpoint. We can use that function to compute the partial chunks (1 and 2) and to find reference points for counting the full night shifts in the middle chunk (we will always use night shift starts for that purpose). To handle all of that, we can break the situation down into three possibilities:
# Notation
dt : datetime
NS : night shift startpoint
NE : night shift endpoint
|| : midnight

# Three possibilities:
NE dt NS       #1: dt is not in a night shift
NS dt || NE    #2: dt is in a night shift, before midnight
NS || dt NE    #3: dt is in a night shift, after midnight
Start with some constants:
from datetime import datetime, timedelta, time

NS = NIGHT_START_TIME = time(22, 0)
NE = NIGHT_END_TIME = time(6, 0)

NON_NIGHTSHIFT = 0
BEFORE_MIDNIGHT = 1
AFTER_MIDNIGHT = 2

LEFT, RIGHT = (0, 1)

SAME_DAY = timedelta(0)
NEXT_DAY = timedelta(1)
PREV_DAY = timedelta(-1)

NEIGHBOR_PARAMS = {
    # in_nightshift() : [LEFT neighbor, RIGHT neighbor]
    NON_NIGHTSHIFT : [(SAME_DAY, NE), (SAME_DAY, NS)],
    BEFORE_MIDNIGHT : [(SAME_DAY, NS), (NEXT_DAY, NE)],
    AFTER_MIDNIGHT : [(PREV_DAY, NS), (SAME_DAY, NE)],
}

SECONDS_PER_NIGHTSHIFT = 8 * 3600
Then two utility functions, one to determine a datetime's night shift status and another to return the "neighbor" to the right or left:
def in_nightshift(dt):
    t = dt.time()
    return (
        BEFORE_MIDNIGHT if t >= NIGHT_START_TIME else
        AFTER_MIDNIGHT if t < NIGHT_END_TIME else
        NON_NIGHTSHIFT
    )

def neighbor(dt, direction):
    params = NEIGHBOR_PARAMS[in_nightshift(dt)]
    # Named at_time so the imported time class is not shadowed.
    td, at_time = params[direction]
    return datetime.combine(dt.date() + td, at_time)
And finally, our primary function and its immediate helper, which finds the relevant neighbor and the associated chunk #1 or #2.
def nightshift_duration(start, end):
    ns1, chunk1 = get_nightshift_startpoint(start, True)
    ns2, chunk2 = get_nightshift_startpoint(end, False)
    chunk3 = (ns2 - ns1).days * SECONDS_PER_NIGHTSHIFT
    return int(chunk1 + chunk2 + chunk3)

def get_nightshift_startpoint(dt, is_start):
    if in_nightshift(dt):
        if is_start:
            dt2 = neighbor(dt, RIGHT)
            ns = neighbor(dt2, RIGHT)
        else:
            dt2 = ns = neighbor(dt, LEFT)
        chunk = abs((dt - dt2).total_seconds())
        return (ns, chunk)
    else:
        ns = neighbor(dt, RIGHT)
        return (ns, 0)
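One case that is easy to miss is a span that sits entirely inside a single night shift: the two partial chunks then overlap, and the middle chunk comes out negative to compensate. The sketch below works through that arithmetic by hand for the 03:00 to 05:05 test case; the neighbor datetimes are written out literally and the variable names are illustrative only, not part of the code above.

```python
from datetime import datetime

SECONDS_PER_NIGHTSHIFT = 8 * 3600

# A span that sits entirely inside one night shift.
start = datetime(2020, 2, 3, 3, 0)
end = datetime(2020, 2, 3, 5, 5)

# Neighbors worked out by hand for this example.
shift_end_after_start = datetime(2020, 2, 3, 6, 0)    # NE to the right of start
next_shift_start = datetime(2020, 2, 3, 22, 0)        # NS to the right of that NE
shift_start_before_end = datetime(2020, 2, 2, 22, 0)  # NS to the left of end

chunk1 = (shift_end_after_start - start).total_seconds()  # 3h    = 10800
chunk2 = (end - shift_start_before_end).total_seconds()   # 7h05m = 25500

# The reference point for end lies one day before the one for start,
# so the middle chunk is minus one full shift.
days_between = (shift_start_before_end - next_shift_start).days  # -1
chunk3 = days_between * SECONDS_PER_NIGHTSHIFT                    # -28800

total = int(chunk1 + chunk2 + chunk3)
print(total)  # 7500, i.e. 125 minutes
```

The negative middle chunk removes exactly the part of the shift that chunks 1 and 2 counted twice, plus the parts before start and after end.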
I suppose this approach is "better" in the sense that it can immediately
compute any duration, even if start and end are millions of days apart
[after typing this I looked up the max year supported by datetime, and it is
shockingly low, so I guess Python has low expectations for humanity]. But it
was definitely harder to think through all of the details of this approach.
Whether the resulting code is easier or harder to understand than my
first approach is a close call. In its first draft, the new approach was
definitely harder to understand, but it got better as I added more
constants, so I guess that's my final piece of advice: use
constants and data structures to simplify coding logic and
enhance readability.
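When comparing two implementations like these, a deliberately naive reference can help with testing. The sketch below walks the span one minute at a time and counts the minutes that fall in a night shift. The function name `nightshift_seconds_bruteforce` and the `step` parameter are hypothetical, and the result is only exact when start, end and the shift boundaries all fall on whole multiples of the step, so treat it as a test aid rather than a replacement.

```python
from datetime import datetime, time, timedelta

NIGHT_START_TIME = time(22, 0)
NIGHT_END_TIME = time(6, 0)


def nightshift_seconds_bruteforce(start, end, step=timedelta(minutes=1)):
    """Count night shift seconds by walking the span one step at a time.

    Only exact when start, end and the shift boundaries are aligned to
    step; slow for long spans, so intended for cross-checking in tests.
    """
    total = 0
    current = start
    while current < end:
        t = current.time()
        # A step counts if it begins at or after 22:00, or before 06:00.
        if t >= NIGHT_START_TIME or t < NIGHT_END_TIME:
            total += int(step.total_seconds())
        current += step
    return total


print(nightshift_seconds_bruteforce(
    datetime(2023, 3, 1, 12, 30), datetime(2023, 3, 3, 16, 45)
))  # 57600
```

Running both faster implementations against this reference on a handful of minute-aligned inputs is a cheap way to catch boundary mistakes.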