-
Notifications
You must be signed in to change notification settings - Fork 267
ENH: Accept pathlib.Path objects where filenames are accepted #610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For python >= 3.6, objects representing paths now have the option to have a dunder `__fspath__` (See PEP 519 for more info) to return the string (or bytes) representation of the path. Pathlib on 3.6 includes this method so we should check for it first. Then, for pathlib objects from the py 3.4-3.5x, `__fspath__` is not available, so we dip into the py3k module to get the current python version's (2 or 3) string method (`unicode` in this case). So overall we're allowing filename to be a path object, but ensuring that it will be converted to the string/bytes representation of the path before passing it on to `image_klass.from_filename` My only reservations is that `nib.load` lowest entry point for this type of handling, because if some one really wants to call `from_filename` from an image_klass directly, they won't have this pathlib compatibility- however I think that `nib.load` is a very common entry point, which justifies the placement.
☔ The latest upstream changes (presumably #611) made this pull request unmergeable. Please resolve the merge conflicts.
Why is the PR unmergeable?
It cannot be merged because there are conflicts with the master branch - see the web interface for this PR for links.
@CRiddler, do you think you can fix the conflicts?
Codecov Report
@@ Coverage Diff @@ ## master #610 +/- ## ========================================= + Coverage 90.1% 90.31% +0.2% ========================================= Files 96 96 Lines 11914 12171 +257 Branches 2125 2127 +2 ========================================= + Hits 10735 10992 +257 Misses 834 834 Partials 345 345
Continue to review full report at Codecov.
|
The conflicts have been resolved, however as stated earlier- this approach to resolving pathlib objects is not final nor foolproof. Essentially, we are checking if the filename
is a Pathlike
object which is only available in python 3.6+ and coercing that properly. If its not a Pathlike
then we're coercing the object to six.unicode type. What happens when the user passes in the filename as None
? Should we be coercing None to a string/unicode type? While the example is a corner case, it is still technically a case of an improperly handled input which can be fixed by a more in-depth filename handling procedure as discussed in the earlier comments.
I'm setting this milestone to 3.0. Once we're only dealing with Python 3.5+, that reduces the range of behaviors we need to accommodate. If you want to try for 2.5 and support Python 2's pathlib
, we can do that.
Hi @CRiddler, @fepegar. By pushing off to Python3-only, I think we can be a lot dumber about this. I would say the logic can simply be:
filename = pathlib.Path(filename)
If we place this at the head of load
, save
, to_filename
and from_filename
, then we can just work with Path
s internally, converting to str
only when we hit functions that don't support them.
Sound like a reasonable approach? We would also want to add some tests where we pass in valid and invalid strings, paths, path-like objects, and non-path-likes.
Sounds reasonable to me.
Hello @CRiddler, Thank you for updating!
Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8
and then run flake8 nibabel
.
Comment last updated at 2019年10月23日 15:31:04 UTC
Hi @CRiddler, we're now requiring Python 3.5.1+, so we can resume work on this with simpler assumptions about the sort of objects we're going to be passed. I merged in master to resolve conflicts and get you going, but let me know how you want to proceed here.
Hi @effigies, if we do want to convert everything into pathlib.Path
objects and deal with those internally then a good amount of filename_parser.py
will have to be rewritten. Do we still want to try to move forward with pathlib internally? Or should we add in the method that pandas
uses to coerce pathlike objects to python3 string types?
Hmm. I haven't really scoped out where all we will need to make modifications to use Path
s internally. I figured we'd find the entry-points, and follow the logic to see where things break.
But if we want to take the Pandas approach, the entry-points are still the first step, so I don't see these as competing approaches. If anything the Pandas-like one would be a good first pass, and then we could evaluate if we want to take the plunge with internal pathlib.Path
s. There are a number of places where we take either filenames or open filehandles, so this could end up getting messy, and might not be worth it.
As a note, it looks like Pandas has simplified their check: https://github.com/pandas-dev/pandas/blob/325dd686de1589c17731cf93b649ed5ccb5a99b4/pandas/io/common.py#L131-L160
Their minimum Python is now 3.5.3, so it seems pretty safe to follow suit. I don't think we need to copy the expand_user
bit, though.
I agree in that coercing everything to strings at the entry points will be the easiest way to go. The updated pandas method looks like it'll work perfectly for us since __fspath__
is supported in 3.6+ and coercing pathlib.Path
types to str
will support any 3.5 <= 3.6 versions.
I will update the branch with a function similar to pandas
approach (not sure how to give them credit for that though) and then add that function in the entry points you listed. We can see where to go from there? I can probably get around to this by the end of the week.
Cool. We have external licenses in COPYING, IIRC, so that and a link to the original code (with specific revision) should suffice. Our licenses are compatible.
@CRiddler Just a bump on this one. Let me know if there's anything I can do to help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. Some suggestions on paths to guard.
And we should also verify that we can pass pathlib.Path
s to each method that's guarded. One good place would be here:
nibabel/nibabel/tests/test_image_api.py
Lines 146 to 159 in 1d91b72
I would just make it loop, e.g.:
fname = 'an_image' + self.standard_extension for path in (fname, Path(fname)): ...
And here, as well:
nibabel/nibabel/tests/test_image_load_save.py
Lines 256 to 257 in 1d91b72
And also in https://github.com/nipy/nibabel/blob/master/nibabel/tests/test_loadsave.py.
Finally, all functions/methods that accept os.PathLike
objects should have their docstrings updated.
@CRiddler Just a bump on this. Let me know if there's anything I can help with. I'm going to aim to feature freeze 3.0 on October 28 (three weeks from Monday) so we can have a decent RC period.
Yep! This is still on my radar, classes started for me this past week so I've been a little busy. This is the first on my to-do list once I have a bit of free time, so I should have it done over the weekend.
I mean... also enjoy your weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Some minor fixes and cleanups.
Also from one of the tests:
======================================================================
ERROR: autogenerated test from validate_filenames
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\site-packages\nose\case.py", line 198, in runTest
self.test(*self.arg)
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\site-packages\nibabel\tests\test_api_validators.py", line 20, in meth
validator(self, imaker, params)
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\site-packages\nibabel\tests\test_image_api.py", line 146, in validate_filenames
img.set_filename(path)
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\site-packages\nibabel\filebasedimages.py", line 255, in set_filename
self.file_map = self.__class__.filespec_to_file_map(filename)
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\site-packages\nibabel\freesurfer\mghformat.py", line 533, in filespec_to_file_map
if splitext(filespec)[1].lower() == '.mgz':
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\ntpath.py", line 224, in splitext
return genericpath._splitext(p, '\\', '/', '.')
File "c:\hostedtoolcache\windows\python3円.5.4\x64\lib\genericpath.py", line 118, in _splitext
sepIndex = p.rfind(sep)
AttributeError: 'WindowsPath' object has no attribute 'rfind'
It looks like MGHImage
has its own implementation of filespec_to_file_map
that needs to be updated.
Hmm. Looks like your history got a bit messed up there.
Screen Shot 2019年10月16日 at 16 51 58
I'd try rebasing onto fbaeacc. Should be pretty clean, as it's the merge commit that seems to be causing damage.
Hopefully that fixed, I always struggle when it comes to wrestling with git history. If it needs some more reworking, I may need some hand-holding to resolve the history issue.
Nope.
Screen Shot 2019年10月16日 at 17 37 34
So the easy way to do this is just:
git rebase -i fbaeacc
That will open an editor, probably vim
or nano
:
pick 6056014b update _stringify_path doc
pick b54926e3 update docstrings to accept str or os.PathLike
pick 8a34a7a0 mghformat accept pathlib for filespec_to_file_map
pick d6c96c02 tests pathlib compatible
pick cf631067 fix _stringify_path doc
That should be fine, so save and exit. Now you'll see:
Auto-merging nibabel/loadsave.py
Auto-merging nibabel/filename_parser.py
CONFLICT (content): Merge conflict in nibabel/filename_parser.py
Auto-merging nibabel/filebasedimages.py
error: could not apply b54926e3... update docstrings to accept str or os.PathLike
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply b54926e3... update docstrings to accept str or os.PathLike
If you run git status
, you should see:
interactive rebase in progress; onto fbaeacc9
Last commands done (2 commands done):
pick 6056014b update _stringify_path doc
pick b54926e3 update docstrings to accept str or os.PathLike
Next commands to do (3 remaining commands):
pick 8a34a7a0 mghformat accept pathlib for filespec_to_file_map
pick d6c96c02 tests pathlib compatible
(use "git rebase --edit-todo" to view and edit)
You are currently rebasing branch 'loadsave-pathlib_compat' on 'fbaeacc9'.
(fix conflicts and then run "git rebase --continue")
(use "git rebase --skip" to skip this patch)
(use "git rebase --abort" to check out the original branch)
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: nibabel/loadsave.py
Unmerged paths:
(use "git reset HEAD <file>..." to unstage)
(use "git add <file>..." to mark resolution)
both modified: nibabel/filename_parser.py
So edit nibabel/filename_parser.py
as you would during a merge conflict (search for <<<<
and resolve), then:
git add nibabel/filename_parser.py
git rebase --continue
You may be dropped into an editor again, giving you a chance to modify the commit message. You can just save and exit, to leave it unchanged. Then you might see:
[detached HEAD 2d37fb71] update docstrings to accept str or os.PathLike
Author: Cameron Riddell <cameronriddell1@gmail.com>
1 file changed, 2 insertions(+), 2 deletions(-)
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:
git commit --allow-empty
Otherwise, please use 'git reset'
interactive rebase in progress; onto fbaeacc9
Last commands done (5 commands done):
pick d6c96c02 tests pathlib compatible
pick cf631067 fix _stringify_path doc
No commands remaining.
You are currently rebasing branch 'loadsave-pathlib_compat' on 'fbaeacc9'.
nothing to commit, working tree clean
Could not apply cf631067... fix _stringify_path doc
In this case, there's a commit that had no effect. So you can just
git rebase --continue
When you've iterated through all of the necessary changes:
> git status
On branch loadsave-pathlib_compat
Your branch and 'origin/loadsave-pathlib_compat' have diverged,
and have 3 and 6 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
nothing to commit, working tree clean
Since your GitHub branch has a lot of old stuff in it, instead of:
git diff origin/loadsave-pathlib_compat
I would suggest just comparing to the commit you rebased against:
git diff fbaeacc9
If that looks reasonable, go ahead and force-push:
git push -f
6374ede
to
e98dbfc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Two remaining comments that may have gotten lost in the mix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple style issues (see #610 (comment)). And one small fix:
Great! Thanks for this.
Yay! Thank you guys for having worked on this.
For python >= 3.6, objects representing paths now have the option to have a dunder
__fspath__
(See PEP 519 for more info) to return the string (or bytes) representation of the path. Pathlib on 3.6 includes this method so we should check for it first. Then, for pathlib objects from the py 3.4-3.5x,__fspath__
is not available, so we dip into the py3k module to get the current python version's (2 or 3) string method (unicode
in this case). So overall we're allowing filename to be a path object, but ensuring that it will be converted to the string/bytes representation of the path before passing it on toimage_klass.from_filename
My only reservations is that
nib.load
lowest entry point for this type of handling, because if some one really wants to callfrom_filename
from an image_klass directly, they won't have this pathlib compatibility- however I think thatnib.load
is a very common entry point, which justifies the placement.