Comments

Boundary changes endpoint.#629

Open

GeoWill wants to merge 18 commits intomaster from

boundary_changes

Open

Boundary changes endpoint. #629
GeoWill wants to merge 18 commits intomaster from
boundary_changes

Conversation

@GeoWill

Copy link

Contributor

@GeoWill GeoWill commented Jan 13, 2026

No description provided.

@GeoWill GeoWill force-pushed the boundary_changes branch from 4cb6c02 to 93fc4d4 Compare

January 20, 2026 17:31

GeoWill added 5 commits

February 3, 2026 12:13

@GeoWill


 Models for boundary changes endpoint.

091f4f8

@GeoWill


 Dry static data helper

58ca32b

@GeoWill


 boundary changes client

@GeoWill


 boundary changes in voting info endpoint

22be494

@GeoWill


 Boundary review error handling

aecb7bd

This commit deals with two types of error.
First it adds a sentry log when there is a file not found on s3.
Second it catches any exceptions raised when calling the code which gets
the boundary review response. It then logs this to sentry, but still
returns a response. This seemed better than raising a 5xx in this
situation, where a client might want the rest of the information in the
response, even if boundary change failed.

@GeoWill GeoWill force-pushed the boundary_changes branch from 93fc4d4 to aa12f29 Compare

February 3, 2026 12:13

@GeoWill GeoWill marked this pull request as ready for review

February 3, 2026 13:14

@GeoWill GeoWill force-pushed the boundary_changes branch from 3d4a92d to 7904efb Compare

February 3, 2026 14:50

GeoWill added 3 commits

February 3, 2026 15:02

@GeoWill


 Integration test for boundary_changes

c787687

@GeoWill


 Unit tests for boundary review client

067f218

@GeoWill


 Add data quality checks to StaticDataHelper

05a8eb4

@GeoWill GeoWill force-pushed the boundary_changes branch from 7904efb to 05a8eb4 Compare

February 3, 2026 15:02

@GeoWill GeoWill changed the title ~~(削除) WiP models for boundary changes endpoint. (削除ここまで)~~ (追記) Boundary changes endpoint. (追記ここまで)

Feb 3, 2026

GeoWill

GeoWill commented

Feb 3, 2026

View reviewed changes

api/endpoints/v1/voting_information/static_data_helper.py

data = self.get_data_for_uprn()

return self.query_to_dict(data)

def data_quality_check(self, postcode_df):

Copy link

Contributor Author

@GeoWill GeoWill Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now happens inside every request/response cycle that looks for static data.
I don't think it will add much, but it feels like the wrong place for it.

Would it be better to try and add some checks to the statemachine that run at the end of each run and do data quality assurance.

Copy link

Member

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to run consistency checks on every request. That is what we are doing on WDIV. I definitely think we should also try to prevent errors at write time. However, as long as we're using a format that doesn't allow us to enforce constraints I think we also need to be defensive when we consume the data. This is one of the reasons I think there is mileage in looking at something like SQLite/DuckDB as the static data format. It would allow us to enforce some constraints and "trust" the data a bit more in the client code.

Anyway, yes. Lets check the data here before we consume it.

I think the other failure modes we should care about here are:

The Postcode we are fetching does exist (we got a response from WDIV) but it is not in the parquet file.
The UPRN we are fetching does exist (we got a response from WDIV) but it is not in the parquet file.

I think I would also want a notification in Sentry if either of those things happen because it means our data is out of sync (although we can serve the "there are no applicable boundary reviews to this query" response to the user). Are those two cases captured anywhere?

Copy link

Contributor Author

@GeoWill GeoWill Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't something we can do without changing the lambda that writes outcode parquet files in data baker. Specifically these lines:

 if has_any_non_null_filter_column:
 print(
 f"At least one UPRN in {outcode} has data in {filter_column}, writing a file with data"
 )
 outcode_df.sort(by=["postcode", "uprn"])
 outcode_df.write_parquet(outcode_path)
 else:
 print(
 f"No {filter_column} for any address in {outcode}, writing an empty file"
 )
 polars.DataFrame().write_parquet(outcode_path)

It will mean writing loads of files with empty columns, but can go with that if you think it's worth it.

Copy link

Member

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of us isn't understanding the other on this.
Lets have a look at this one on a call together.

GeoWill

GeoWill commented

Feb 4, 2026

View reviewed changes

api/endpoints/v1/voting_information/stitcher.py Outdated Show resolved Hide resolved

GeoWill

GeoWill commented

Feb 4, 2026

View reviewed changes

api/endpoints/v1/voting_information/boundary_changes/client.py Outdated Show resolved Hide resolved

@GeoWill


 Add BOUNDARY_REVIEWS_DATA_KEY_PREFIX to deploy

f5256bb

@GeoWill GeoWill force-pushed the boundary_changes branch from d2c7a10 to f5256bb Compare

February 5, 2026 19:57

chris48s

chris48s reviewed

Feb 10, 2026

View reviewed changes

common/common/conf.py

self.BOUNDARY_REVIEWS_ENABLED = True

self.BOUNDARY_REVIEWS_DATA_KEY_PREFIX = os.environ.get(

"BOUNDARY_REVIEWS_DATA_KEY_PREFIX",

"current_boundary_reviews_parquet/",

Copy link

Member

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

"current_boundary_reviews_parquet/",

"addressbase/production/current_boundary_reviews_parquet/",

I think this is why this was failing when you deployed it to dev.

Copy link

Contributor Author

@GeoWill GeoWill Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is quite what is needed. I had to add f5256bb to get it working on dev. Having just the folder name means I can run LOCAL_STATIC_DATA_PATH=~/cloud/aws/democracy-club/pollingstations.private.data python run_local_api.py --function voting_information --port 8000 to have it work locally.

Copy link

Member

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. so I am running this locally with

S3_CLIENT_ENABLED=1 AWS_PROFILE=wdiv-prod WDIV_API_KEY=[redacted] python run_local_api.py --function voting_information

to query the real s3 bucket from my local copy.

For me to get this to work, I have to make the BOUNDARY_REVIEWS_DATA_KEY_PREFIX equal to addressbase/production/current_boundary_reviews_parquet/

With the default, every request throws FileNotFoundError

api/endpoints/v1/sandbox/sandbox-responses/AD11DD.json Show resolved Hide resolved

api/endpoints/v1/sandbox/sandbox-responses/DD11DD.json Outdated Show resolved Hide resolved

api/endpoints/v1/voting_information/config.py Outdated Show resolved Hide resolved

api/endpoints/v1/voting_information/stitcher.py Outdated Show resolved Hide resolved

api/endpoints/v1/voting_information/static_data_helper.py

Comment on lines +135 to +137

sentry_sdk.capture_exception(

ex, context={"s3_key": key, "bucket": bucket}

)

Copy link

Member

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not tested this, but does passing a context= kwarg to capture_exception() here work?
I think you might need to set context like this now?
Can we double-check this on dev.

https://github.com/DemocracyClub/UK-Polling-Stations/blob/76fd6a1947c1f6df6a140386f5677eb5e769cb87/polling_stations/apps/data_finder/helpers/baked_data_helper.py#L75-L91

Copy link

Contributor Author

@GeoWill GeoWill Feb 17, 2026 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went off the docs here. That says you can either set scope or scope_kwargs (as described in Scope.update_from_kwargs). But will deploy to dev and check.

Copy link

Member

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..or you can set up your local copy with a sentry DSN. That will make it easier to deliberately trigger exceptions.

api/endpoints/v1/voting_information/boundary_changes/client.py Outdated Show resolved Hide resolved

tests/helpers.py

Comment on lines +78 to +79

if not fixture_data:

return pl.DataFrame()

Copy link

Member

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. My gut instinct reading this is that if I write test code that is trying to fetch a fixture that doesn't exist, that seems like it should raise an exception rather than silently returning an empty DataFrame.

Copy link

Contributor Author

@GeoWill GeoWill Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this, because it mirrors what load_fixture does. Essentially this helper is a wrapper load_fixture to return a DataFrame rather than some json. This line isn't really doing anything, so can be deleted, but I thought it made the behaviour more obvious. If I delete it then if the fixture doesn't exists load_fixture will return [] (and pl.DataFrame().equals( pl.DataFrame([])) is True).

I could change load_fixture:

@@ -12,7 +12,7 @@ def load_fixture(testname, fixture, api_version="v1"):
 dirname / api_version / "test_data" / testname / f"{fixture}.json"
 )
 if not file_path.exists():
- return []
+ raise FileNotFoundError(f"Could not find fixture:{fixture} at {file_path}")
 with file_path.open("r") as f:
 return json.loads(f.read())

but that breaks some other tests.

Copy link

Member

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I feel like that underlying behaviour in load_fixture() is probably wrong/unhelpful, and if there are specific requests we want to mock as returning [] we should explicitly write files to disk containing [] but I think pulling that thread right this second is a distraction from the core thing we're trying to accomplish in this PR.

Lets leave this for now, but I would like to revisit what load_fixture() is doing here

api/endpoints/v1/voting_information/boundary_changes/client.py Show resolved Hide resolved

api/endpoints/v1/voting_information/static_data_helper.py

data = self.get_data_for_uprn()

return self.query_to_dict(data)

def data_quality_check(self, postcode_df):

Copy link

Member

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, yes. Lets check the data here before we consume it.

I think the other failure modes we should care about here are:

The Postcode we are fetching does exist (we got a response from WDIV) but it is not in the parquet file.
The UPRN we are fetching does exist (we got a response from WDIV) but it is not in the parquet file.

chris48s

chris48s reviewed

Feb 10, 2026

View reviewed changes

api/endpoints/v1/voting_information/static_data_helper.py Show resolved Hide resolved

GeoWill added 6 commits

February 19, 2026 09:08

@GeoWill


 fixup! boundary changes client

ade8916

@GeoWill


 fixup! Boundary review error handling

334b636

@GeoWill


 fixup! boundary changes in voting info endpoint

cdd6970

@GeoWill


 fixup! Integration test for boundary_changes

eec462b

@GeoWill


 fixup! Add data quality checks to StaticDataHelper

2c8b1ff

@GeoWill


 fixup! Models for boundary changes endpoint.

ae2ee92

@GeoWill GeoWill force-pushed the boundary_changes branch from f6cfc71 to 451388a Compare

February 19, 2026 11:44

GeoWill added 2 commits

February 19, 2026 11:49

@GeoWill


 fixup! Integration test for boundary_changes

6289ccd

@GeoWill


 fixup! Unit tests for boundary review client

55b9daa

@GeoWill GeoWill force-pushed the boundary_changes branch from 451388a to 55b9daa Compare

February 19, 2026 11:49

@GeoWill


 fixup! Models for boundary changes endpoint.

1ed24d6

chris48s

chris48s reviewed

Feb 19, 2026

View reviewed changes

api/endpoints/v1/voting_information/stitcher.py

and not resp["address_picker"]

try:

resp["boundary_reviews"] = None

Copy link

Member

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

resp["boundary_reviews"] = None

resp["boundary_reviews"] = []

chris48s

chris48s reviewed

Feb 19, 2026

View reviewed changes

api/endpoints/v1/voting_information/static_data_helper.py

Comment on lines +205 to +218

class DuplicateUPRNError(ValueError):

def __init__(self, postcode, uprns):

message = (

f"Duplicate UPRNs found for postcode {postcode}: {sorted(uprns)}"

)

super().__init__(message)

class MultipleAddressbaseSourceError(ValueError):

def __init__(self, postcode, sources):

message = f"Multiple addressbase sources found for postcode {postcode}: {sources}"

super().__init__(message)

Copy link

Member

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the postcode/UPRN in the exception message means Sentry probably won't group all DuplicateUPRNErrors together
i.e: if we throw this 500 times for different UPRNs then Sentry will probably consider that 500 completely different issues instead of 1 issue with 500 events
This will be very annoying.

There are multiple ways to skin this cat. One of them is to set a fingerprint based on exception class. So you can do something like this:

https://docs.sentry.io/platforms/python/usage/sdk-fingerprinting/#group-errors-more-aggressively

..or in WDIV, I set the fingerprints at log time e.g:

https://github.com/DemocracyClub/UK-Polling-Stations/blob/b0a46a1df2591d9280b7b1d2f93e3ed32223b4c0/polling_stations/apps/data_finder/helpers/baked_data_helper.py#L160-L190

Another way to do it is something like

class DuplicateUPRNError(ValueError):
 def __init__(self, postcode, uprns):
 self.postcode = postcode
 self.uprns = sorted(uprns)
 # static — Sentry groups on this
 super().__init__("Duplicate UPRNs found")
 def __str__(self):
 # human-readable for local dev
 return f"Duplicate UPRNs found for postcode {self.postcode}: {self.uprns}"
# ..and when we raise it:
with sentry_sdk.push_scope() as scope:
 # attach extras for sentry
 scope.set_extra("postcode", self.postcode.with_space)
 scope.set_extra("uprns", duplicate_uprns)
 raise DuplicateUPRNError(postcode=self.postcode.with_space, uprns=duplicate_uprns)

I've not tested that code, but it should be.. roughly right. Can you try setting up a sentry DSN locally and have a go with one or other of these approaches.

Labels

None yet

3 participants

@GeoWill @symroe @chris48s

Comments

Conversation

@GeoWill GeoWill commented Jan 13, 2026

Uh oh!

@GeoWill GeoWill Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

@GeoWill GeoWill Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

@GeoWill GeoWill Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

@GeoWill GeoWill Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

@GeoWill GeoWill Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

@chris48s chris48s Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

@chris48s chris48s Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

@GeoWill GeoWill Feb 17, 2026 •

edited

Loading