2
\$\begingroup\$

I'm trying to come up with a solution to this interview question:

You have a stream of RPC requests coming into a server which is being logged. Each log entry is of the form [id, timestamp, type ('Start' or 'End')]. Given a sequence of log entries and a timeout value, you need to figure out at the earliest possible time if an RPC call has timed out (e.g. print a message as soon as you detect such situation).

Example:
Timeout = 3
id - time - type
0 - 0 - Start
1 - 1 - Start
0 - 2 - End
2 - 6 - Start # figured out id 1 had timed out at time 6
1 - 7 - End

I believe that my solution is O(NlogN), is there any way to improve it?

from sortedcontainers import SortedList
def process_log(log: list[list], timeout=3):
 rpc_ids = {}
 rpcs = SortedList()
 for rpc_id, timestamp, action in log:
 if action == 'Start':
 rpcs.add([timestamp, rpc_id])
 rpc_ids[rpc_id] = timestamp
 else:
 if rpc_id in rpc_ids:
 entry = [rpc_ids[rpc_id], rpc_id]
 if timestamp - rpc_ids[rpc_id] > timeout:
 report_rpc(rpc_id, rpc_ids[rpc_id], timestamp)
 del rpc_ids[rpc_id]
 rpcs.remove(entry)
 idx = rpcs.bisect_left([timestamp-timeout, float('inf')])
 if idx > 0:
 for i in range(idx):
 start_time, rpc_id = rpcs[i]
 report_rpc(rpc_id, start_time, timestamp)
 del rpc_ids[rpc_id]
 del rpcs[:idx]
def report_rpc(rpc_id, start_time, timestamp):
 print(f'RPC #{rpc_id} started at {start_time} has timed out (@{timestamp})')
process_log([ # RPC #1 times out at timestamp 6
 [0, 0, 'Start'],
 [1, 1, 'Start'],
 [0, 2, 'End'],
 [2, 6, 'Start'],
 [1, 7, 'End'],
])
asked Feb 13, 2023 at 21:07
\$\endgroup\$
0

1 Answer 1

1
\$\begingroup\$

Here's a O(n) solution using just a dictionary:

from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class LogEntry:
 rpc_id: int
 timestamp: int
 action: str
class LogProcessor:
 def __init__(self, timeout: int):
 self.entries: dict[int, LogEntry] = {}
 self.timeout = timeout
 def process_log_entry(self, entry: LogEntry) -> list[LogEntry]:
 if entry.action == 'Start':
 self.entries[entry.rpc_id] = entry
 elif entry.rpc_id in self.entries and entry.timestamp - self.entries[entry.rpc_id].timestamp <= self.timeout:
 del self.entries[entry.rpc_id]
 timed_out = []
 for e in self.entries.values():
 if entry.timestamp - e.timestamp <= self.timeout:
 break
 timed_out.append(e)
 for e in timed_out:
 del self.entries[e.rpc_id]
 return timed_out
log_processor = LogProcessor(timeout=3)
assert log_processor.process_log_entry(LogEntry(1, 0, 'Start')) == []
assert log_processor.process_log_entry(LogEntry(2, 1, 'Start')) == []
assert log_processor.process_log_entry(LogEntry(1, 2, 'End')) == []
assert log_processor.process_log_entry(LogEntry(3, 6, 'Start')) == [LogEntry(2, 1, 'Start')]

As Python dictionaries are insertion ordered, we can check for timed out entries on each call to process_log_entry() by iterating once from the beginning of the dictionary. Since we remove timed out entries immediately after the iteration, the total time complexity will still be O(n).

P.S. The code has also been refactored to use a dataclass for log entry and a class for log processor, which should be more readable/maintainable.

answered Mar 10, 2023 at 15:28
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.