Conversation
Force-pushed from 4cb6c02 to 93fc4d4
This commit deals with two types of error. First, it adds a Sentry log when a file is not found on S3. Second, it catches any exceptions raised when calling the code which gets the boundary review response, logs them to Sentry, but still returns a response. This seemed better than raising a 5xx in this situation, where a client might want the rest of the information in the response even if the boundary review lookup failed.
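A minimal sketch of that pattern (the function names here are hypothetical, not the actual handlers in this commit):

```python
import sentry_sdk


def get_boundary_review_section(postcode):
    # Hypothetical wrapper: log the failure to Sentry but degrade
    # gracefully instead of returning a 5xx to the client.
    try:
        return fetch_boundary_reviews(postcode)  # assumed lookup call
    except Exception as error:
        sentry_sdk.capture_exception(error)
        # The rest of the response can still be assembled and served.
        return None
```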
Force-pushed from 93fc4d4 to aa12f29
Force-pushed from 3d4a92d to 7904efb
Force-pushed from 7904efb to 05a8eb4
---
This now happens inside every request/response cycle that looks for static data.
I don't think it will add much, but it feels like the wrong place for it.
Would it be better to try to add some checks to the state machine that run at the end of each run and do data quality assurance?
---
I think it is fine to run consistency checks on every request. That is what we are doing on WDIV. I definitely think we should also try to prevent errors at write time. However, as long as we're using a format that doesn't allow us to enforce constraints I think we also need to be defensive when we consume the data. This is one of the reasons I think there is mileage in looking at something like SQLite/DuckDB as the static data format. It would allow us to enforce some constraints and "trust" the data a bit more in the client code.
Anyway, yes. Let's check the data here before we consume it.
I think the other failure modes we should care about here are:
- The Postcode we are fetching does exist (we got a response from WDIV) but it is not in the parquet file.
- The UPRN we are fetching does exist (we got a response from WDIV) but it is not in the parquet file.
I think I would also want a notification in Sentry if either of those things happens, because it means our data is out of sync (although we can serve the "there are no applicable boundary reviews for this query" response to the user). Are those two cases captured anywhere?
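A rough sketch of the kind of check I mean (the column names, function name, and message are assumptions, not the real code):

```python
import polars as pl
import sentry_sdk


def check_uprn_consistency(outcode_df: pl.DataFrame, postcode: str, uprn: str):
    # WDIV returned this postcode/UPRN, so we expect the static parquet
    # data to contain it too. If it doesn't, our data is out of sync.
    matches = outcode_df.filter(
        (pl.col("postcode") == postcode) & (pl.col("uprn") == uprn)
    )
    if matches.is_empty():
        # Notify Sentry; the caller can still serve the "no applicable
        # boundary reviews for this query" response to the user.
        sentry_sdk.capture_message(
            "Postcode/UPRN known to WDIV but missing from parquet",
            level="warning",
        )
```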
---
This isn't something we can do without changing the lambda that writes outcode parquet files in data baker. Specifically these lines:
```python
if has_any_non_null_filter_column:
    print(
        f"At least one UPRN in {outcode} has data in {filter_column}, writing a file with data"
    )
    outcode_df.sort(by=["postcode", "uprn"])
    outcode_df.write_parquet(outcode_path)
else:
    print(
        f"No {filter_column} for any address in {outcode}, writing an empty file"
    )
    polars.DataFrame().write_parquet(outcode_path)
```
It will mean writing loads of files with empty columns, but I can go with that if you think it's worth it.
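For what it's worth, the change might only be a line in the else branch; a sketch (untested):

```python
# Write an empty frame that keeps the outcode's schema, rather than a
# schemaless polars.DataFrame(), so consumers can still see the columns.
outcode_df.clear().write_parquet(outcode_path)
```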
---
I think one of us isn't understanding the other on this.
Let's have a look at this one on a call together.
Force-pushed from d2c7a10 to f5256bb
---
I think this is why this was failing when you deployed it to dev.
---
I don't think this is quite what is needed. I had to add f5256bb to get it working on dev. Having just the folder name means I can run `LOCAL_STATIC_DATA_PATH=~/cloud/aws/democracy-club/pollingstations.private.data python run_local_api.py --function voting_information --port 8000` to have it work locally.
---
Hmm. So I am running this locally with
`S3_CLIENT_ENABLED=1 AWS_PROFILE=wdiv-prod WDIV_API_KEY=[redacted] python run_local_api.py --function voting_information`
to query the real S3 bucket from my local copy.
For me to get this to work, I have to set `BOUNDARY_REVIEWS_DATA_KEY_PREFIX` to `addressbase/production/current_boundary_reviews_parquet/`.
With the default, every request throws `FileNotFoundError`.
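In case it's useful, this is roughly how I'd expect that prefix to be wired up (an illustrative sketch, not the real settings code; only the env var name and path come from above):

```python
import os

# Hypothetical settings wiring: let the S3 key prefix be overridden from
# the environment, falling back to a default.
BOUNDARY_REVIEWS_DATA_KEY_PREFIX = os.environ.get(
    "BOUNDARY_REVIEWS_DATA_KEY_PREFIX",
    "addressbase/production/current_boundary_reviews_parquet/",
)
```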
---
I've not tested this, but does passing a `context=` kwarg to `capture_exception()` here work?
I think you might need to set context like this now?
Can we double-check this on dev?
---
I went off the docs here. They say you can either set `scope` or `scope_kwargs` (as described in `Scope.update_from_kwargs`). But I will deploy to dev and check.
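If the docs are right, an untested sketch of what that call would look like with `scope_kwargs` (the loader and key names here are made up):

```python
import sentry_sdk

try:
    df = load_boundary_reviews(key)  # hypothetical loader
except FileNotFoundError as error:
    # In sentry-sdk 2.x, extra kwargs are applied via
    # Scope.update_from_kwargs, so extras/tags can be passed directly.
    sentry_sdk.capture_exception(error, extras={"key": key}, tags={"source": "s3"})
```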
---
…or you can set up your local copy with a Sentry DSN. That will make it easier to deliberately trigger exceptions.
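Setting that up should just be a case of something like this (a sketch; `SENTRY_DSN` is an assumed variable name):

```python
import os

import sentry_sdk

# Point the SDK at a personal/dev Sentry project for local runs.
sentry_sdk.init(dsn=os.environ.get("SENTRY_DSN"))
```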
---
Hmm. My gut instinct reading this is that if I write test code that is trying to fetch a fixture that doesn't exist, that seems like it should raise an exception rather than silently returning an empty DataFrame.
---
I did this because it mirrors what `load_fixture` does. Essentially this helper is a wrapper around `load_fixture` that returns a DataFrame rather than some JSON. This line isn't really doing anything, so it can be deleted, but I thought it made the behaviour more obvious. If I delete it, then if the fixture doesn't exist `load_fixture` will return `[]` (and `pl.DataFrame().equals(pl.DataFrame([]))` is `True`).
I could change `load_fixture`:
```diff
@@ -12,7 +12,7 @@ def load_fixture(testname, fixture, api_version="v1"):
         dirname / api_version / "test_data" / testname / f"{fixture}.json"
     )
     if not file_path.exists():
-        return []
+        raise FileNotFoundError(f"Could not find fixture:{fixture} at {file_path}")
     with file_path.open("r") as f:
         return json.loads(f.read())
```
but that breaks some other tests.
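For context, the helper in question is shaped roughly like this (a sketch; `load_fixture_df` is an assumed name):

```python
import polars as pl


def load_fixture_df(testname, fixture, api_version="v1"):
    # Wrap load_fixture() and hand back a DataFrame instead of parsed JSON.
    # With load_fixture() as it stands, a missing fixture yields [] and
    # therefore an empty DataFrame rather than an error.
    return pl.DataFrame(load_fixture(testname, fixture, api_version=api_version))
```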
---
OK. I feel like that underlying behaviour in `load_fixture()` is probably wrong/unhelpful, and if there are specific requests we want to mock as returning `[]` we should explicitly write files to disk containing `[]`. But I think pulling that thread right this second is a distraction from the core thing we're trying to accomplish in this PR.
Let's leave this for now, but I would like to revisit what `load_fixture()` is doing here.
Force-pushed from f6cfc71 to 451388a
Force-pushed from 451388a to 55b9daa
---
Including the postcode/UPRN in the exception message means Sentry probably won't group all `DuplicateUPRNError`s together, i.e. if we throw this 500 times for different UPRNs then Sentry will probably consider that 500 completely different issues instead of 1 issue with 500 events.
This will be very annoying.
There are multiple ways to skin this cat. One of them is to set a fingerprint based on the exception class, so you can do something like this:
https://docs.sentry.io/platforms/python/usage/sdk-fingerprinting/#group-errors-more-aggressively
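Adapted from that docs page (untested here, and it assumes `DuplicateUPRNError` is importable where the SDK is initialised):

```python
import sentry_sdk


def before_send(event, hint):
    # Group every DuplicateUPRNError into one issue, regardless of the
    # per-UPRN detail in the exception message.
    if "exc_info" in hint:
        exc_type, exc_value, tb = hint["exc_info"]
        if isinstance(exc_value, DuplicateUPRNError):
            event["fingerprint"] = ["duplicate-uprn-error"]
    return event


sentry_sdk.init(dsn="...", before_send=before_send)
```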
…or in WDIV, I set the fingerprints at log time, e.g.:
Another way to do it is something like:
```python
class DuplicateUPRNError(ValueError):
    def __init__(self, postcode, uprns):
        self.postcode = postcode
        self.uprns = sorted(uprns)
        # static message, so Sentry groups on this
        super().__init__("Duplicate UPRNs found")

    def __str__(self):
        # human-readable for local dev
        return f"Duplicate UPRNs found for postcode {self.postcode}: {self.uprns}"


# ..and when we raise it:
with sentry_sdk.push_scope() as scope:
    # attach extras for sentry
    scope.set_extra("postcode", self.postcode.with_space)
    scope.set_extra("uprns", duplicate_uprns)
    raise DuplicateUPRNError(postcode=self.postcode.with_space, uprns=duplicate_uprns)
```
I've not tested that code, but it should be… roughly right. Can you try setting up a Sentry DSN locally and have a go with one or other of these approaches?