-
Notifications
You must be signed in to change notification settings - Fork 266
PHPLIB-1323 Implement unlink for GridFS stream wrapper
#1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0a08f8e to
7cbf64e
Compare
src/GridFS/WritableStream.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can improve on this by collecting all IDs, then deleting all files with the given filename, followed by removing all chunks for the IDs we previously found.
While there's always a chance to end up in an undesirable state, e.g. by having older revisions of files remaining, the suggestion above would reduce this by instructing the server to delete all file documents with a given name, ensuring that future requests for such files would not produce any result. Thus, if there was an error while deleting any chunks, we wouldn't end with older revisions left over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will implement a solution consistent with rename #1207 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would definitely benefit from some bulk strategy to remove the fs.files documents first (likely using $in on multiple _id values), and then cleaning up the orphan chunks. I think it'd also benefit from being addressed in the GridFS spec, since we'd probably want to have a consistent way to address this in any driver that wishes to do so. And codifying that in a spec would ensure we're not overlooking some edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the bulk delete
- get all file ids by filename
- delete all files by id
- delete all chunks by file ids
The limitation is if there is more revisions that supported by a single command body.
cae12ee to
03f4ea2
Compare
03f4ea2 to
6da793c
Compare
src/GridFS/CollectionWrapper.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is zero in case of WriteConcern=0 or not file to update.
src/GridFS/WritableStream.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the bulk delete
- get all file ids by filename
- delete all files by id
- delete all chunks by file ids
The limitation is if there is more revisions that supported by a single command body.
docs/reference/method/MongoDBGridFSBucket-registerGlobalStreamWrapperAlias.txt
Outdated
Show resolved
Hide resolved
src/GridFS/WritableStream.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this is an internal class, so we're not actually introducing a public API here, but a delete() method that affects multiple revisions seems odd, which is otherwise an abstraction for writing a single revision.
In the PR description, you said:
The context resolver mechanism introduced in #1138 doesn't give direct access to the CollectionWrapper which is required for this feature.
If we look at stream_open, it accesses a CollectionWrapper after resolving the path to an array of context options. Couldn't the same approach be used here to access a CollectionWrapper directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt deep down that using ReadableStream and WritableStream wasn't correct. I refactored by adding a private getContext method to use CollectionWrapper directly for rename and unlink.
2d9c795 to
1d8e3c4
Compare
1d8e3c4 to
f8231a7
Compare
src/GridFS/StreamWrapper.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bucket::resolveStreamContext() executes findFileByFilenameAndRevision() for read modes in order to fetch the file document. Would it make sense to use 'w' here and avoid the unnecessary query?
I understand rename() needs to use 'r' since it does read the filename from the file document; however, there may be a refactoring opportunity there if you'd just like to trust the $fromPath string. I suppose it's a matter of intentionally NOP-ing on a FileNotFoundException or executing an update query that may not match anything -- so I'd be find leaving rename() as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of using r mode was to check that the file exists before doing operations, to throw a FileNotFoundException. This could be optimized by checking the number of updated documents after the update, but that's not feasible if Write Concern = 0 (edge case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted to throw a FileNotFoundException in rename and unlink, instead of silently returning false. This behavior is consistent with S3 stream wrapper.
9f582b7 to
70c71e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmikola removing or renaming a single revision is not supported. Even if the option gridfs[revision] is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge case: if $fromPath and $toPath are identical, the file exists but you'll throw FileNotFoundException. That may warrant reverting to an "r" mode when selecting the context to throw eagerly.
Alternatively, you could have updateFilenameForFilename() return the matched count and assume that the update was successful or a NOP. The only normal scenario where the matched and modified counts wouldn't be equals is if the paths were the same, in which case you could probably NOP earlier and avoid calling updateFilenameForFilename() altogether (FWIW, PHP's rename() returns true in this case, but I dunno if there's some internal NOP behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I updated the code to return the matched count, and added a test for renaming with the same filename (existing and not existing).
src/GridFS/StreamWrapper.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is relevant, but the ternary here only checks for "r" and stream_open() may call this with "rb". Can you invoke functions here, or are you restricted on syntax here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logical or works. I also added the @return annotation so PHPStorm understands it.
01663eb to
1fa8976
Compare
* master: PHPLIB-1323 Implement `unlink` for GridFS stream wrapper (mongodb#1206) PHPLIB-1330: Sync tests for failCommand errorLabels reqs (mongodb#1214) PHPLIB-1246: Test PHP 8.3 on Evergreen (mongodb#1213) PHPLIB-1324 Implement `rename` for GridFS stream wrapper (mongodb#1207) PHPLIB-1248 Add examples on GridFS (mongodb#1196) Deprecate setting GridFS disableMD5 to false explicitly (mongodb#1205) PHPLIB-1326: Use more permissive top-level runOnRequirements (mongodb#1210) PHPLIB-1206 Add bucket alises for context resolver using GridFS StreamWrapper (mongodb#1138) Bump actions/upload-artifact from 3 to 4 (mongodb#1208) PHPLIB-1275: Replace apiargs usage in docs with extracts (mongodb#1203) Fix title formatting in Client::removeSubscriber() docs (mongodb#1204) PHPLIB-1304: Pull mongohouse image from ECR repo (mongodb#1202) Fix evergreen failures (mongodb#1200) Enable workflows to run for GitHub Merge Queue (mongodb#1199) PHPLIB-1313 Ensure the GridFS stream is saved when the script ends (mongodb#1197) PHPLIB-1309 Add addSubscriber/removeSubscriber to Client class to ease configuration (mongodb#1195) Master is now 1.18-dev
Fix PHPLIB-1323
This is not optimized as it creates a
WritableStreaminstance that is not really used.The context resolver mechanism introduces in #1138 doesn't give direct access to the
CollectionWrapperwhich is required for this feature.