I have written a script that records a timestamp for each value it finds:
import random
from datetime import datetime, timedelta
from typing import Dict, List
import time


class RequestFilter:
    """Tracks requests and filters them to prevent hammering."""

    def __init__(self, cooldown: timedelta):
        self._cooldown = cooldown
        self._requests: Dict[str, datetime] = {}

    def filter(self, requests: List[str], time: datetime) -> List[str]:
        """Filter requests to only those that haven't been made
        previously within our defined cooldown period."""
        # Get filtered set of requests.
        filtered = [
            r for r in list(set(requests))
            if (
                r not in self._requests or time - self._requests[r] >= self._cooldown
            )
        ]
        # Refresh timestamps for requests we're actually making.
        for r in filtered:
            self._requests[r] = time
        print(self._requests)
        return filtered


if __name__ == '__main__':
    from time import sleep
    request_filter = RequestFilter(timedelta(minutes=5))
    firstReq = []
    for _ in range(random.randint(1,5)):
        firstReq.append(f"US {random.randint(1, 10)}")
    for _ in range(100):
        newReq = []
        for _ in range(random.randint(2, 8)):
            newReq.append(f"US {random.randint(1, 10)}")
        if len(newReq) > len(firstReq):
            print(request_filter.filter(newReq, datetime.now()), datetime.now())
            sleep(1)
            firstReq = newReq
        else:
            print("Length is not bigger, testing again in 3 sec...")
            time.sleep(3)
            firstReq = newReq
As you can see at the very bottom, I check whether the list from the previous request is shorter than the list from the newest request (at the moment the lists come from a random function, but later they will be read from HTML). If it is, something has been added to the webpage and we want to see which value was added. If the value already has a timestamp, filter checks whether at least 5 minutes have passed since that timestamp, and if so we should say "New value has been found!"
However, I am not happy with the way I compare the length of firstReq with the length of newReq. For example, if I request a page and it contains US 3, and the next request contains US 6 instead, the comparison is 1 > 1, which is false, so nothing is printed even though the value is different. My question is: how can I improve the code so that I can skip the > comparison and check the lists directly?
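To make the problem concrete, here is a minimal sketch of one possible alternative, assuming each page read yields a list of strings. It compares the two reads as sets, so a replaced value is noticed even when both lists have the same length. The helper name added_values is hypothetical and not part of the script above.

```python
from typing import List, Set


def added_values(previous: List[str], current: List[str]) -> Set[str]:
    """Return the values present in `current` but absent from `previous`."""
    return set(current) - set(previous)


if __name__ == "__main__":
    # Same length, different content: a length check would miss this change.
    print(added_values(["US 3"], ["US 6"]))
```

The result could then be passed to RequestFilter.filter so that the cooldown still applies to each newly seen value.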
1 Answer
Global code
Move the code after your __main__ check into a function, since it's still in global scope.
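As a rough sketch of what that could look like: the loop body lives in a main() function and the __main__ check only calls it. The RequestFilter call and the sleeps are left out to keep the example short, and random_requests and the rounds parameter are hypothetical names introduced for illustration.

```python
import random
from typing import List


def random_requests(low: int, high: int) -> List[str]:
    """Stand-in for reading the page: a random list of 'US n' values."""
    return [f"US {random.randint(1, 10)}" for _ in range(random.randint(low, high))]


def main(rounds: int = 3) -> None:
    # Everything that used to sit at module level is now local to main().
    first_req = random_requests(1, 5)
    for _ in range(rounds):
        new_req = random_requests(2, 8)
        if len(new_req) > len(first_req):
            print("New request list is longer:", new_req)
        else:
            print("Length is not bigger, testing again...")
        first_req = new_req


if __name__ == "__main__":
    main()
```

With this layout, names such as first_req no longer leak into the module namespace, and main() can be imported and called from a test.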
PEP8
firstReq should be first_req, and the same for new_req.
Dict comprehension
I would rearrange your filtered comprehension to be only a dict, and to use dict.update. In other words,
filtered = {r: self._requests.get(r) for r in requests}
self._requests.update({
    r: time
    for r, orig_time in filtered.items()
    if orig_time is None or time - orig_time >= self._cooldown
})
List comprehension
firstReq = []
for _ in range(random.randint(1, 5)):
    firstReq.append(f"US {random.randint(1, 10)}")
can be
first_req = [
    f"US {random.randint(1, 10)}"
    for _ in range(random.randint(1, 5))
]
Diagnostic printing
print(self._requests) is out of place. Remove it, and if you still want this to happen, issue an equivalent statement from main(), perhaps printing the entire RequestFilter and overriding __str__ to return pprint.pformat(self._requests).
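A minimal sketch of that suggestion follows, with the class reduced to the parts needed to show __str__. Writing to the private _requests dict from main() is done here only to have something to print, and the fixed datetime is an arbitrary example value.

```python
import pprint
from datetime import datetime, timedelta
from typing import Dict


class RequestFilter:
    """Reduced version of the class: only what is needed to show __str__."""

    def __init__(self, cooldown: timedelta):
        self._cooldown = cooldown
        self._requests: Dict[str, datetime] = {}

    def __str__(self) -> str:
        # pprint.pformat returns the pretty-printed representation as a string.
        return pprint.pformat(self._requests)


def main() -> None:
    request_filter = RequestFilter(timedelta(minutes=5))
    # Demonstration only: populate the private dict directly.
    request_filter._requests["US 1"] = datetime(2020, 10, 29, 15, 58)
    print(request_filter)


if __name__ == "__main__":
    main()
```

The diagnostic output is then controlled by the caller, and filter() itself stays free of side effects on stdout.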
Change detection
Apparently what you actually want is to return a boolean that is true if self._requests has changed. This can be done by copying the dict before the update and comparing afterwards:
filtered = {r: self._requests.get(r) for r in requests}
old_requests = dict(self._requests)
self._requests.update({
    r: time
    for r, orig_time in filtered.items()
    if orig_time is None or time - orig_time >= self._cooldown
})
# True when at least one timestamp was added or refreshed.
return old_requests != self._requests
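For reference, here is the same idea as a complete, runnable class. It is a sketch, not a drop-in replacement: filter() now returns a bool instead of a list, the comparison is written with != so that True means "something changed", and the fixed start time is an arbitrary value chosen to make the behaviour reproducible.

```python
from datetime import datetime, timedelta
from typing import Dict, List


class RequestFilter:
    def __init__(self, cooldown: timedelta):
        self._cooldown = cooldown
        self._requests: Dict[str, datetime] = {}

    def filter(self, requests: List[str], time: datetime) -> bool:
        """Record eligible requests; return True if any timestamp changed."""
        filtered = {r: self._requests.get(r) for r in requests}
        old_requests = dict(self._requests)  # shallow copy for comparison
        self._requests.update({
            r: time
            for r, orig_time in filtered.items()
            if orig_time is None or time - orig_time >= self._cooldown
        })
        return old_requests != self._requests


if __name__ == "__main__":
    start = datetime(2020, 10, 29, 16, 0)
    request_filter = RequestFilter(timedelta(minutes=5))
    # First sighting of a value.
    print(request_filter.filter(["US 3"], start))
    # Same value again, one minute later (inside the cooldown).
    print(request_filter.filter(["US 3"], start + timedelta(minutes=1)))
    # A different value, same list length as before.
    print(request_filter.filter(["US 6"], start + timedelta(minutes=1)))
    # The first value again, after the cooldown has elapsed.
    print(request_filter.filter(["US 3"], start + timedelta(minutes=6)))
```

Because the decision is made per value, the caller no longer needs to compare list lengths at all.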
Hello! I appreciate the advice and I will get right on it! I agree about PEP8. I was used to camelCase but will change to underscores instead :) About the Dict comprehension: I tried to run the code you provided and it gives me an error at self._requests.update({ with TypeError: cannot unpack non-iterable NoneType object. I'm not sure what could be causing it at the moment. Also, what should we return in that case if we remove filtered? – PythonNewbie, Oct 29, 2020 at 15:58
About the List comprehension: is it really necessary to have a list of different values if we check the timestamp for each value that has been found anyway? In that case I wouldn't need if len(newReq) > len(firstReq): at all, and I could just do the timestamp comprehension? :D – PythonNewbie, Oct 29, 2020 at 15:58
Re. the TypeError - my mistake; I needed to use .items(). Please try again. – Reinderien, Oct 29, 2020 at 16:00
I'm having a bit of trouble understanding your Dict comprehension compared to what I have done. As I understand it, I check the current request against self._requests, and if a new value has been added, it is added to the filtered list, which we can then return and treat as true (meaning that the filter function found something new). If no value has been added, it returns an empty filtered list, which is false. I'm not sure if yours does the same? – PythonNewbie, Oct 29, 2020 at 16:17
If you're looking to return an indicator that self._requests has changed, there's a better way to do it. I'll update. – Reinderien, Oct 29, 2020 at 16:51