I'm trying to come up with a solution to this interview question:
You have a stream of RPC requests coming into a server, and each request is logged. Each log entry is of the form [id, timestamp, type ('Start' or 'End')]. Given a sequence of log entries and a timeout value, you need to figure out at the earliest possible time whether an RPC call has timed out (e.g. print a message as soon as you detect such a situation).
Example (Timeout = 3):

id - time - type
0  - 0    - Start
1  - 1    - Start
0  - 2    - End
2  - 6    - Start   # figured out id 1 had timed out at time 6
1  - 7    - End
I believe that my solution is O(N log N). Is there any way to improve it?
from sortedcontainers import SortedList

def process_log(log: list[list], timeout=3):
    rpc_ids = {}                # rpc_id -> start timestamp of in-flight RPCs
    rpcs = SortedList()         # [start timestamp, rpc_id] pairs, sorted by start
    for rpc_id, timestamp, action in log:
        if action == 'Start':
            rpcs.add([timestamp, rpc_id])
            rpc_ids[rpc_id] = timestamp
        else:
            if rpc_id in rpc_ids:
                entry = [rpc_ids[rpc_id], rpc_id]
                if timestamp - rpc_ids[rpc_id] > timeout:
                    report_rpc(rpc_id, rpc_ids[rpc_id], timestamp)
                del rpc_ids[rpc_id]
                rpcs.remove(entry)
        # Report everything strictly older than timestamp - timeout; the
        # -inf sentinel keeps ties (elapsed == timeout) out, matching the
        # strict '>' check in the 'End' branch above.
        idx = rpcs.bisect_left([timestamp - timeout, float('-inf')])
        if idx > 0:
            for i in range(idx):
                start_time, rpc_id = rpcs[i]
                report_rpc(rpc_id, start_time, timestamp)
                del rpc_ids[rpc_id]
            del rpcs[:idx]

def report_rpc(rpc_id, start_time, timestamp):
    print(f'RPC #{rpc_id} started at {start_time} has timed out (@{timestamp})')
process_log([  # RPC #1 times out at timestamp 6
    [0, 0, 'Start'],
    [1, 1, 'Start'],
    [0, 2, 'End'],
    [2, 6, 'Start'],
    [1, 7, 'End'],
])
# prints: RPC #1 started at 1 has timed out (@6)
1 Answer
Here's an O(n) solution using just a dictionary:
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class LogEntry:
    rpc_id: int
    timestamp: int
    action: str

class LogProcessor:
    def __init__(self, timeout: int):
        self.entries: dict[int, LogEntry] = {}
        self.timeout = timeout

    def process_log_entry(self, entry: LogEntry) -> list[LogEntry]:
        if entry.action == 'Start':
            self.entries[entry.rpc_id] = entry
        elif entry.rpc_id in self.entries and entry.timestamp - self.entries[entry.rpc_id].timestamp <= self.timeout:
            del self.entries[entry.rpc_id]
        timed_out = []
        for e in self.entries.values():
            if entry.timestamp - e.timestamp <= self.timeout:
                break
            timed_out.append(e)
        for e in timed_out:
            del self.entries[e.rpc_id]
        return timed_out
log_processor = LogProcessor(timeout=3)
assert log_processor.process_log_entry(LogEntry(1, 0, 'Start')) == []
assert log_processor.process_log_entry(LogEntry(2, 1, 'Start')) == []
assert log_processor.process_log_entry(LogEntry(1, 2, 'End')) == []
assert log_processor.process_log_entry(LogEntry(3, 6, 'Start')) == [LogEntry(2, 1, 'Start')]
As Python dictionaries are insertion ordered, we can check for timed-out entries on each call to process_log_entry() by iterating once from the front of the dictionary and stopping at the first entry that has not yet timed out. Since each entry is inserted once and removed at most once, and every scan step except the last removes an entry, the total time complexity over the whole stream is still O(n).
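To make that argument concrete, here is a minimal, self-contained sketch of the scan-from-the-front idea using a plain dict and invented toy values (not the class above):

```python
# Toy state: rpc_id -> start time, populated in stream order.
starts = {}
starts[1] = 0   # rpc 1 started at t=0
starts[2] = 1   # rpc 2 started at t=1
del starts[1]   # rpc 1 ended in time; dict order is now 2, then 3
starts[3] = 6   # rpc 3 started at t=6

# Iteration follows insertion order, i.e. ascending start time, so we
# can stop at the first entry that has not yet timed out.
now, timeout = 6, 3
timed_out = []
for rpc_id, started in starts.items():
    if now - started <= timeout:
        break
    timed_out.append(rpc_id)
print(timed_out)  # [2]
```

Because timestamps only increase, insertion order is also start-time order, which is what makes the early break safe.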
P.S. The code has also been refactored to use a dataclass for the log entry and a class for the log processor, which should be more readable/maintainable.