-
Notifications
You must be signed in to change notification settings - Fork 43
Implement lazy loading to defer metadata RPCs until data access time ...#253
Implement lazy loading to defer metadata RPCs until data access time ... #253pvrraju wants to merge 2 commits intogoogle:main from
Conversation
xee/ext.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this eager getattr, I think we should just pass lazy_load as an argument to this function.
xee/ext.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this magic number. Is there a more principled way to check if the cache needs updating?
xee/ext.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: maybe we can break this out into a function and call it here and in get_info as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm really happy to see this test.
xee/ext_integration_test.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use time.perf_counter() to capture time instead (and at the places below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is good to have in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that we have two levels of caching: this private dictionary and the functools.cached_property level. Is there any way we can use one system instead of both? I anticipate cache invalidation problems down the line.
One way we could clean this system up is by breaking out what info we get into multiple functions (that are also cached): e.g. a helper fn gets the first line essential stuff below. Since it's cached, we'd need less if...else logic to manage state; that would be happen in Python's memoizer decorator -- we'd make use of the cache just by calling the helper functions.
WDYT about this approach?
@alxmrs
alxmrs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided specific and high-level feedback.
In addition, I have a general warning: I don't have much control over whether PRs like this will actually get merged into Xee; patches like this require secondary review to ensure that the code is compatible with Google's internal use of Xee, and I'm not sure if there is capacity on the relevant teams to do these types of reviews right now.
pvrraju
commented
Sep 12, 2025
Hi @alxmrs I've addressed all the feedback from the review:
- Pass
lazy_loadas an argument toget_info()instead of usinggetattr - Replaced the magic number with proper key sets to check cache status
- Extracted property fetching into a separate helper method
_fetch_collection_properties() - Used
time.perf_counter()instead oftime.time()in tests for more accurate timing - Refactored the caching system to use a more consistent approach
Let me know if there's anything else I should adjust!
Hi @alxmrs,
Added lazy loading (issue #44):
lazy_load=True option
Defers metadata RPCs → faster large dataset opens
Backward compatible + tested
can you review my PR?