-
Notifications
You must be signed in to change notification settings - Fork 536
ENH: Disable symlinks on CIFS filesystems #1941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@ ## master #1941 +/- ## ========================================= + Coverage 72.49% 72.5% +0.01% ========================================= Files 1063 1063 Lines 54159 54189 +30 Branches 7811 7820 +9 ========================================= + Hits 39260 39288 +28 - Misses 13680 13682 +2 Partials 1219 1219
Continue to review full report at Codecov.
|
Determining the filesystem might be easier than we thought http://stackoverflow.com/a/25287521/616300
That requires psutil, but if it seems worth it to add the dependency, we can check for CIFS. Could you install psutil on your docker container and get a working is_cifs(path) function?
9840795 to
cbf2661
Compare
Does not require psutil.
def is_cifs(path):
exit_code, mount_str = subprocess.getstatusoutput("mount")
if not exit_code:
mount_points = [(line.split(" ")[2], line.split(" ")[4]) for line in mount_str.split("\n")]
sorted_mount_points = sorted(mount_points, key=lambda x: len(x[0]), reverse=True)
for k, v in sorted_mount_points:
print(k,v)
if path.startswith(k):
if v == "cifs":
return True
else:
return False
else:
return False
I also run the workflow with working directory inside the container (not mounted) and I did not get an error.
This means we should avoid symlinks only if the destination is on cifs.
9d176f2 to
0d5359d
Compare
@chrisfilo I refactored your test a little. Want to grab 0d5359d from me and verify that this works?
from nipype.utils.filemanip import _on_cifs assert (_on_cifs('/'), _on_cifs('/scratch')) == (False, True)
I think creating a CIFS mount in the docker image for testing might be excessive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works as advertised! Thanks for cleaning up my napkin quality code!
nipype/utils/filemanip.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it public - could be useful in other places
nipype/utils/filemanip.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance reasons, I would only run this check if we are about to make a symlink.
The performance issue is a good point. If this function's going to get called a lot, it might make sense to do something a little more persistent. We could do something like:
- At program start (or first call), call
mount, and get(mountpoint, fstype)tuples. - Prune all but superstrings of CIFS mounts
- Check target files against pruned list
d0f7f25 to
59c6ceb
Compare
Only check if symlinks might be tried
59c6ceb to
dafb5ee
Compare
Latest commit may be breaking some style guidelines. Let me know if I should reorganize.
@chrisfilo could you verify this still works? Updated test:
from nipype.utils.filemanip import on_cifs assert (on_cifs('/'), on_cifs('/scratch')) == (False, True)
Remove unnecessary optimization
8de8b79 to
a9cb28c
Compare
I'm seeing huge performance hit when using MultiProc. It seems that mount is called each time a Node is being run (or checked in it needs to be rerun) - even if no files are being copied.
On the most recent commit? The latest changes should make the mount call occur only once per run. Unless I misunderstand how things are being run...
I'm doing more debugging. It seems that rechecking EPIToT1Transform (nothing gets rewritten) takes longer than usual (it's really slow) however the mri_expand is happening in the background so I don't know if the total running time will differ.
But you are right - the table is generated only once. I'll do more debugging.
Re: EPIToT1, have you recently switched to content hashing, or using an EPI with more volumes? Because I've found that's always been slow when I used content hashing...
Unable to replicate the slowdown on mine.
I think the slowdown is unrelated to the change in this PR. Everything looks good to me, but since this is a change at the core of nipype it would be nice to have @satra hava a look at it as well.
Do you have remove_unnecessary_outputs disabled? If not, that might be your slowdown.
John Rogers-Lee and I spent some time looking at the cifs parsing because it ended up blocking the installation on my local Mac. I'll create a pull request with my suggested changes in filemanip.py.
Uh oh!
There was an error while loading. Please reload this page.
There appear to be cases where hacks to make symlinks work on filesystems that don't directly support them can cause
FileNotFoundErrors. In particular, the CIFS network filesystem driver for Linux supports themfsymlinksoption, which enables symlinks at the VFS level, having the effect that symlinking can be subject to network timeouts or temporary inconsistencies.CIFS+
mfsymlinksis the Docker-supported way to mount filesystems into a container from a Windows host, which means this may be increasingly a problem for Nipype users, as containers are becoming a more popular environment for running workflows. An error reported on nipreps/fmriprep#398 ran into this issue when running an interface that requires 4 symlinks, which split into twomapflowdirectories (per dataset being analyzed), using theMultiProcplugin.This PR permits users to disable symlinking in the config file with the
disable_symbolic_linksoption.Closes #1938.