-
-
Notifications
You must be signed in to change notification settings - Fork 8k
Fix macosx segfault #17084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix macosx segfault #17084
Conversation
If you delete the figure, and run a gc, does it also crash? As you noted, the code in FigureManager_dealloc
is the same and I would expect it to be called either on exit, or on a garbage collection run.
@QuLogic I'm not sure how to do this. I tried with the following code (which works on master
):
import matplotlib.pyplot as plt fig, ax = plt.subplots() plt.show(block=False) plt.close(fig) del fig
However, del fig
only deletes the name and not the object, after which GC may kick in and eventually remove the fig
object (I also tried appending some code to the end to avoid it being the last line). I don't think that this triggers any of the two functions.
My thought is either add gc.collect()
after deleting it, or never close or delete the figure (then it'd get collected on exit?)
No, this doesn't trigger the segfault.
I think plt.close(fig)
followed by del fig
and gc.collect()
is not necessarily enough to get it to go away (at least in Linux testing). Adding this to the Figure
class in matplotlib/figure.py
:
def __del__(self):
print('deleting %s' % (self,))
And running:
import gc
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plt.show(block=False)
plt.close(fig)
del fig
gc.collect()
Nothing prints until exit()
is actually called (and then it prints the "deleting" line). Starting to look at why, I tried:
import gc
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plt.show(block=False)
plt.close(fig)
print(gc.collect())
ref = [x for x in gc.get_referrers(fig) if not isinstance(x, dict)]
print(ref)
del fig
print(gc.collect())
We see:
0
[<bound method Figure.delaxes of <Figure size 640x480 with 1 Axes>>]
0
Commenting this line:
ax._remove_method = self.delaxes
Makes the list empty, but it's still not GC'ed. Not sure why that would be, maybe someone wants to look deeper. But at least this suggests to me that the del
and gc.collect()
are not enough to trigger the segfault possibly because the figure is not actually getting GC'ed.
Also perhaps when exiting (which does indeed cause it to get GC'ed and print) the order of cleanup is different in a way that does not cause a segfault -- you could try printing messages for __del__
methods, and/or adding print messages to close statements to figure this out. It might be informative to see if the order matters.
Figures should be GC'able once closed and unreferenced, but I guess that's a separate issue from this one. Maybe adding a lambda
around delaxes
like @anntzer often does will be enough to break the ref.
I don't quite understand why there are so many NSAutoreleasePool
objects in the code. I've never worked with Cocoa, but the documentation clearly states that this is typically not necessary: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html#//apple_ref/doc/uid/20000047
There are three situations where this might be necessary, but none of them apply to matplotlib AFAICT.
Actually, I removed all NSAutoreleasePool
objects in the code, and everything I tried still works (and the segfault is also resolved).
Reading those docs you linked, I'm inclined to agree with your conclusions. I'd feel more confident if we could do a stress test of the memory usage, though.
Ah, there's one small bug in that test; you need to delete ax
too, since it holds a ref to its figure:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
del ax
plt.show(block=False)
plt.close(fig)
del fig
gc.collect()
or never create it:
import matplotlib.pyplot as plt
fig = plt.figure()
plt.show(block=False)
plt.close(fig)
del fig
gc.collect()
I'm not sure how to properly stress test this, but I've compared a memory profile using the following code snippet for both this branch and master:
import tracemalloc from functools import partial import matplotlib.pyplot as plt import matplotlib matplotlib.use("macosx") print(matplotlib.__version__) def close_event(event, fig): plt.close(fig) snapshot1 = tracemalloc.take_snapshot() for n in range(1000): fig = plt.figure() close_callback = partial(close_event, fig=fig) fig.canvas.mpl_connect("close_event", close_callback) plt.show(block=False) # block=True works plt.close(fig) snapshot2 = tracemalloc.take_snapshot() diff = snapshot2.compare_to(snapshot1, 'lineno') for stat in diff[:10]: print(stat)
Both branches show the same difference (+2156 KiB) when comparing before and after creating and closing 1000 figures. Here's the output for this branch:
3.2.1.post1906+gb96da85cf
/Users/clemens/Repositories/matplotlib/lib/matplotlib/figure.py:474: size=2156 KiB (+2156 KiB), count=1000 (+1000), average=2208 B
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/weakref.py:51: size=938 KiB (+938 KiB), count=10000 (+10000), average=96 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:115: size=937 KiB (+937 KiB), count=15000 (+14997), average=64 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:199: size=867 KiB (+867 KiB), count=12000 (+12000), average=74 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:181: size=820 KiB (+820 KiB), count=5000 (+5000), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/artist.py:74: size=788 KiB (+788 KiB), count=2999 (+2999), average=269 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:172: size=704 KiB (+704 KiB), count=15001 (+15001), average=48 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:179: size=656 KiB (+656 KiB), count=3998 (+3998), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:180: size=539 KiB (+539 KiB), count=7000 (+7000), average=79 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:170: size=539 KiB (+539 KiB), count=6999 (+6999), average=79 B
And this is master:
3.2.1.post1906+gb96da85cf
/Users/clemens/Repositories/matplotlib/lib/matplotlib/figure.py:474: size=2156 KiB (+2156 KiB), count=1000 (+1000), average=2208 B
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/weakref.py:51: size=938 KiB (+938 KiB), count=10000 (+10000), average=96 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:115: size=937 KiB (+937 KiB), count=15000 (+14997), average=64 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:199: size=867 KiB (+867 KiB), count=12000 (+12000), average=74 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:181: size=820 KiB (+820 KiB), count=5000 (+5000), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/artist.py:74: size=788 KiB (+788 KiB), count=2999 (+2999), average=269 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:172: size=704 KiB (+704 KiB), count=15001 (+15001), average=48 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:179: size=656 KiB (+656 KiB), count=3998 (+3998), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:180: size=539 KiB (+539 KiB), count=7000 (+7000), average=79 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:170: size=539 KiB (+539 KiB), count=6999 (+6999), average=79 B
However, I'm not sure if (1) I'm comparing memory usage correctly, and (2) I'm correctly creating and closing figures in order to test what I want to test.
b96da85
to
af58544
Compare
I am inclined to merge this conditional on one of our OSX based developers testing it locally
The autorelease pools may be left over from porting the previous mac backend (which had it's own renderer) to one the current version that uses Agg to render the plot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditional on a committer with OSX verifying it works locally.
I think that's a good strategy. Even if it turns out that these changes do not work as expected in all scenarios, this can be quickly reverted, and users can always switch to the Qt5 backend. MNE devs recommend the PyQt5 backend even on macOS anyway, so in the long run it might be another option to change the default backend to PyQt5/PySide2 (QtPy) as is already the case on Linux and Windows.
This seems to have broken the save dialog. The error below results from clicking on the save-file icon in the toolbar after making a trivial plot.
In [5]: 2020年04月29日 09:14:25.097 python[551:23058739] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'com.apple.view-bridge': Connection interrupted
I can't reproduce the original crash, so I don't understand why this fixes it. But looking over the code it does appear to me that @tacaswell is right -- the autorelease pools were leftover from a previous implementation and are no longer serving a purpose.
Also, by the way, the best way to create an autorelease pool is @autoreleasepool
rather than by allocating it directly. See: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html
https://bugs.openjdk.java.net/browse/JDK-8227836
It seems an identical error arises in a completely different context (java), but again connected with trying to bring up a file dialog.
@lawrence-danna-apple after checking out cbrnr:fix-maxosx-segfault
, did you compile the Objective-C file (i.e. did you pip install -ve .
)? I just tried again, and the segfault still occurs with master. For reference, here's the example script which you need to run in script mode (the issue isn't raised when run in interactive mode), i.e. run python example.py
:
from functools import partial import matplotlib.pyplot as plt import matplotlib matplotlib.use("macosx") def close_event(event, fig): plt.close(fig) fig, ax = plt.subplots() close_callback = partial(close_event, fig=fig) fig.canvas.mpl_connect("close_event", close_callback) plt.show(block=False)
@efiring I cannot reproduce the error you encounter with the save dialog. I tried plotting a figure, clicking on save, and saving the figure, and everything worked (on master as well as on this branch). I guess this is a different issue not related to these changes here.
I couldn't reproduce the original crash, either. Maybe there is some subtle timing involved, such that different hardware and/or OS versions behave differently. Two tests, A and B: for you, A fails and B passes; for me, A passes and B fails. Very strange. I'm on a Macbook pro 16", 6-core I7 at 2.6 GHz, Catalina 10.15.3.
@efiring that's really strange, but with macOS you never know what they changed. I'm on 10.15.4 on an older MacBook Pro 15" mid-2014. You did try the example in script mode (#17084 (comment)), right?
Yes, that's the one. But now I'm baffled. Repeating the tests, I still can't reproduce the original failure on master...but neither can I reproduce the save-dialog failure on your branch. I know it failed for me twice before. I'm stumped.
Oh wow, did I fix this by accident too 😛? On a serious note, does #17263 address the dialog issue?
This keeps happening to me, but most of the times I am on the wrong branch, forget to compile with pip install -ve .
or use the wrong matplotlib package (not the dev version). I find that it helps to completely start from scratch in these situations. But given that some people can reproduce the issue whereas others cannot, there seems to be something else (like different hardware, OS versions, and what not) involved. As long as after the fix this example works for everyone, it is not that important that everyone can reproduce the issue in the first place (but at least some people can confirm).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still can't reproduce my earlier problem with this, so let's assume it wasn't caused by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to reproduce the crash from #17061, and this PR fixes the crash for me. I also tested with a variety of static, animation, and event handling examples; everything seems to be working fine.
Owee, I'm MrMeeseeks, Look at me.
There seem to be a conflict, please backport manually. Here are approximate instructions:
- Checkout backport branch and update it.
$ git checkout v3.2.x
$ git pull
- Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 cb34934c6f9688571508151068c72e535e40162a
- You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #17084: Fix macosx segfault'
- Push to a named branch :
git push YOURFORK v3.2.x:auto-backport-of-pr-17084-on-v3.2.x
- Create a PR against branch v3.2.x, I would have named this PR:
"Backport PR #17084 on branch v3.2.x"
And apply the correct labels and milestones.
Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!
If these instruction are inaccurate, feel free to suggest an improvement.
Fix macosx segfault
Fixes #17061.
Please note that I don't know if these changes have adverse effects; they fix the segfault described in #17061 though. It would be great if someone familiar with the
macosx
backend source took a quick look.