potential performance improvement for GSPath globbing capabilities #513

Open
@fafnirZ

Description

Hey, I've been using GSPath's globbing capabilities to glob over a fairly large GCS bucket (a couple of GB) and have noticed that it takes a lot longer than an equivalent google-cloud-storage implementation:

list_blobs(match_glob="**/version_1/**")

Furthermore, with Task Manager open while globbing the bucket, I observe a significantly higher network footprint with GSPath than with the list_blobs implementation.

My guess is that cloudpathlib may be sending more network requests than necessary (correct me if I'm wrong).

Is there any reason we don't just leverage the match_glob arg for GSPath's glob capabilities?

GCloud SDK list_blobs(match_glob="") reference below:
https://github.com/googleapis/python-storage/blob/main/google/cloud/storage/bucket.py#L1407

To my belief, GSPath("/path/to/folder/").glob("**/version_1/**") can be translated to list_blobs as follows:
list_blobs(prefix="/path/to/folder", match_glob="**/version_1/**")
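To make the proposed translation concrete, here is a rough sketch of how a folder URI and a glob pattern could be mapped onto `list_blobs` arguments. The helper names `glob_to_list_blobs_kwargs` and `iter_matching_blob_names` are hypothetical and not part of cloudpathlib. The sketch assumes that `match_glob` is evaluated against the full object name (so the prefix is prepended to the pattern) and that object names carry no leading slash; both assumptions should be checked against the GCS documentation before relying on this.

```python
from urllib.parse import urlparse


def glob_to_list_blobs_kwargs(gs_uri: str, pattern: str) -> dict:
    """Hypothetical helper: map a gs:// folder URI and a glob pattern
    onto keyword arguments for google.cloud.storage.Client.list_blobs."""
    parsed = urlparse(gs_uri)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {gs_uri!r}")

    # Object names in GCS have no leading slash; normalise to "a/b/c/".
    prefix = parsed.path.strip("/")
    if prefix:
        prefix += "/"

    return {
        "bucket_or_name": parsed.netloc,
        # An empty prefix means "whole bucket", so pass None instead.
        "prefix": prefix or None,
        # Assumption: match_glob is applied to the full object name,
        # so the folder prefix is prepended to the pattern.
        "match_glob": prefix + pattern,
    }


def iter_matching_blob_names(gs_uri: str, pattern: str):
    """Yield matching object names using server-side glob filtering.
    Needs google-cloud-storage (with match_glob support) and credentials."""
    from google.cloud import storage  # imported lazily; only needed here

    client = storage.Client()
    for blob in client.list_blobs(**glob_to_list_blobs_kwargs(gs_uri, pattern)):
        yield blob.name
```

Calling `iter_matching_blob_names("gs://my-bucket/path/to/folder/", "**/version_1/**")` (with a placeholder bucket name) would then let the server do the filtering in a single paginated listing, instead of the client walking the tree.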

Happy to submit something if you would like this change incorporated :)
