GNOME Bugzilla – Bug 701571
1ad25dfb causes NSApp to not notice NSWindow destruction
Last modified: 2018-04-15 00:27:59 UTC
This change, a fix for bug 647108, moved the per-main-loop-iteration autoreleasepool release from gtk_event_check() to gtk_event_dispatch(). This has the (rather odd) side effect of causing the NSApplication singleton to fail to notice that a top level window has been destroyed, which leaves it in the Window menu. Of course, selecting a destroyed window from the Window menu crashes.
I think you meant bug 674108 instead? Strange side effect, we should be figuring out what is causing this. I have stumbled upon some other run loop weirdness too in the past few months.
Yes, 674108. Twisted fingers, sorry. Weirdness indeed. Might it be better to refresh the base autoreleasePool *after* dispatch() so that allocations from polling remain available throughout the iteration?
I can confirm that John is right regarding cause. Compiling bluefish on Mac OSX 10.8.2, using stable moduleset (gtk+ 3.6.4, gtk-mac-integration 2.0.1). When I replace original gdkeventloop-quartz.c file with one from gtk+ 3.5.16 tarball (this one does not have patch for 674108 applied), then Window menu issues disappear. Moreover, reverting solves also problem with memory leak (initially I thought that this is bluefish issue, but it seems it is not). It seems that top level window is not only not removed from Window menu list, but also from the memory. This causes memory usage to grow continuously with each window opened/closed. When using bluefish, the memory usage quickly grows from ~57 MB to hundreds of MB, and closing any of the opened windows/projects/documents does not reduces the memory in use. Compiling with 3.5.16 version gdkeventloop-quartz.c file solves this issue. I think we have to put more effort in fixing this, since all gtk+ build starting from 2.24.13 or 3.5.18 are affected by this.
Yes, I've confirmed the same by moving the autoreleasepool block back into gdk_event_check(). What I didn't notice (because I didn't look) is the leak. Do you see the leak if you disable gtkosx_application?
It was pain to revert back when everything started to work... Well, without gtkosx the memory leak is still present. It is not completely consistent, for example if I just open and close some windows there is no leak. If I open and do something, then close it, then there is leak. It seems related to losing focus. Transient toplevels always produce leak.
Interesting. I found some leaks in gtkosxapplication's menu code, but when I disabled it I found no leaks. If you could run with MallocStackLogging=1 and then (in a separate shell) run leaks(1), that will tell us (and you) what exactly is leaking.
Created attachment 246668 [details] leaks output log
Probably I will need some more instructions regarding leak test. Below is what I did, what I got see in attached file. $ env MallocStackLogging=1 sh /Applications/Bluefish.app/Contents/MacOS/bluefish Then from new terminal $ leaks bluefish-bin The leaks was executed memory usage was ~60 to 70 MB higher than it should be.
What's libxpc do? Seems at first blush to be the source of a bunch of your leaked objects. If you're not a bluefish developer, I'd say it's time to get one involved.
I have no idea what libxpc do. Googling also did not provided much clues. Looking at the log file I might guess that it is involved with dealing with file system. Anyway, this is Apple library, and when looking to Activity Monitor I do not see it beeing opened by bluefish. I did some extra comparison, hopefully this will provide some clues. Please find attached two leaks dumps. Both are with gtkosx disabled. The first one is compiled with 3.6.4 version gdkeventloop-quartz.c file, the second one - with 3.5.16. leaks is run after approximately the same number of actions with bluefish (open window, type something, close it and so on). Memory usage (by Activity Monitor) after actions: 3.6.4 version = 166 MB 3.5.16 version = 113 MB I also checked 3.5.16 version with gtkosx enabled. It has huge number of leaks (~3000, log file size 5MB). Most of them related to gtkosx integration. I think a lot of them are bluefish bugs (there are g_warnings produced), and we will try to fix them. From the other side, memory usage is just 115 MB (just 2 MB larger in comparison to version without gtkosx) and Window menu works correctly.
Created attachment 246681 [details] leaks output log no gtk-osx integration, 3.5.16 version
Created attachment 246682 [details] leaks output log no gtk-osx integration, 3.6.4 version
Hmm, the output from leaks claims to only see nodes for a total of ~15MB. So then a large part of the leaks would be missing? Or the "hundreds of MB" of extra memory usage is still referenced somehow? Most of the leaks seem to be in the internal implementation of the getfsent() system call, which uses libxpc (an interprocess communication library as far as I know) internally. libxpc is objective-c code, so it could be affected by the autorelease pools, but on the other hand it is "hidden" behind a generic C system call. I would need to look into that in more detail. Also, we need to figure out where the "hundreds of MB" of extra memory usage is coming from. Is this due to gtk-osx integration or something else?
> Also, we need to figure out where the "hundreds of MB" of extra memory usage is > coming from. Is this due to gtk-osx integration or something else? No, he's got gtk-osx disabled for these tests. When I set a breakpoint on gdk_window_impl_quartz_finalize I see that impl->toplevel (its NSWindow*) still has a retainCount of 4 or more. Since leaks doesn't show any leaked NSWindows, *something* is holding references to them in a way that's affected by where in the mainloop the pool is refreshed.
Kristian, about missing MB: Large part of MBs comes from MallocStackLogging environment. If I do not use it, then the memory usage after the same sequence of actions is 66MB instead of 113 MB (for 3.5.16 version). There is also large number of files opened by bluefish together with bluefish-bin executable (.so, .dylib and etc.). Maybe these also consumes some memory? If there is any way to check something more I will be glad to help.
Thanks to https://developer.apple.com/library/mac/#technotes/tn2124/_index.html I found call (void)_CFAutoreleasePoolPrintPools(), which when run from my breakpoint in gdk_window_impl_quartz_finalize() starts with objc[38238]: ############## objc[38238]: AUTORELEASE POOLS for thread 0xac2a7a28 objc[38238]: 3708 releases pending. objc[38238]: [0x117e000] ................ PAGE (full) (cold) By inspection most of the lines that follow look like objc[38238]: [0x117e614] 0x1a58e20 NSEvent There are 3 pools. With the autoreleasepool reset moved back to gdk_event_check(), the pools report looks like: objc[21576]: ############## objc[21576]: AUTORELEASE POOLS for thread 0xac2a7a28 objc[21576]: 100 releases pending. objc[21576]: [0x13d3000] ................ PAGE (hot) (cold) with only 2 NSEvents, and only one pool. If I split the creation and draining so that the pool is created in gdk_event_prepare and drained at the *end* of gdk_event_dispatch (which is what I intuitively think is correct, because then one pool is in effect for a full iteration), it's back to 1802 releases, but with over 120 undrained pools, most of which have 2 NSEvents, but a few of which are empty. Since I protected both the allocation and drain with if (current_loop_level == 0 && g_main_depth() == 0) ISTM either gdk_event_dispatch() isn't always getting called or that condition isn't always true even at the base level. Returning to a single block at the beginning of gdk_even_prepare produces 5 releases with 1 pool: (gdb) call (void)_CFAutoreleasePoolPrintPools() objc[55734]: ############## objc[55734]: AUTORELEASE POOLS for thread 0xac2a7a28 objc[55734]: 5 releases pending. objc[55734]: [0x139c000] ................ PAGE (hot) (cold) objc[55734]: [0x139c028] 0xf450e0 NSUserDefaults objc[55734]: [0x139c02c] 0xf450e0 NSUserDefaults objc[55734]: [0x139c030] 0xf450e0 NSUserDefaults objc[55734]: [0x139c034] 0xf450e0 NSUserDefaults objc[55734]: [0x139c038] ################ POOL 0x139c038 objc[55734]: ############## This is clearly the result that we want, as long as it doesn't cause trouble with dnd. Initial testing with testdnd seems OK. To make testing easy for everyone, I'll attach a patch.
Created attachment 246896 [details] [review] Move autoreleasepool reset to gdk_event_prepare()
Do we really want autorelease_pool initialized the first time outside of the event loop? That's going to pick up anything added to a pool between the application calling gtk_init() and gtk_main() and trash it at the beginning of the first iteration. Maybe we should have a second NSAutoreleasePool for that, though without an event loop finalize function I'm not sure where we'd drain it. Indeed, when I add the second pool I find that it has a little over 300 objects in it at the point that the mainloop one is drained in gdk_event_prepare.
I have tested John's patch with the same bf source code, and it produces approximately the same result as "3.5.16" version file. Just to remind, no gtk-osx integration. The memory usage and number of leaks approximately the same as for "3.5.16". Since bf does not use dnd, no chance to test it. When I put back gtk-osx, Window menu works correctly.
Okay, I can reproduce this now. The broken window menu is actually not a strange side effect, as windows are removed from the menu with the following backtrace:
+ Trace 232131
#9: widget unrealize/destroy As John has already found, the NSWindows are simply never released due to the retain count larger than 1. And because of this they are never removed from the windows menu. It turns out that with the current pool refresh in the dispatch function, the drain method is simply *never* called. I think we need to fully rethink how we handle the autorelease polls. Moving to prepare seems viable, but I really want to understand the full problem now to get it right :) I will give it some thought.
(In reply to comment #16) > If I split the creation and draining so that the pool is created in > gdk_event_prepare and drained at the *end* of gdk_event_dispatch (which is what > I intuitively think is correct, because then one pool is in effect for a full > iteration), it's back to 1802 releases, but with over 120 undrained pools, most > of which have 2 NSEvents, but a few of which are empty. Since I protected both > the allocation and drain with > if (current_loop_level == 0 && g_main_depth() == 0) > ISTM either gdk_event_dispatch() isn't always getting called or that condition > isn't always true even at the base level. From what I saw, this condition never holds in the dispatch function, which likely explains the problem... I need to re-read the main loop code in order to get this right.
It is clear to me now why the patch in 674108 fixed the autorelease problems, it simply stopped draining the pool. dispatch callbacks are always called with a g_main_depth() of at least 1 from what I can see, so the if condition was never true. My current problem that is holding me from further progressing on this is that _CFAutoreleasePoolPrintPools() is never outputting anything for me on my 10.7 machine. I tried enabling the usual environment variables (NSDebugEnabled et al), but that didn't help. I hope to get that function going soon, because it seems a very useful tool for getting this right.
Are you trying to actually print with it, or to use it in the debugger like the TN2124 example?
Calling the function using the "call" command in the debugger, like in the example. I just tried on my Snow Leopard machine, and on there it just works. So something must be broken on my Lion machine.
Strange. It "just works" for me in 10.8, too.
Created attachment 247787 [details] [review] Patch for Gtk2: Moves autoreleasepool reset, adds outer pool Here's a patch for gtk2. I've included the outer autoreleasepool in this. It doesn't get released when the display is taken down, but that's at program termination so it probably doesn't do too much harm.
Apparently the print pools does not work on Mitch' 10.7 machine either. So I will hash out some things on 10.6 then. John: I will play with your patch from comment 26 as well.
Cross posting from bug 674108: The patch does release all objects properly contrary to the current code in master, but it does not solve the main problem we were seeing in gimp (this is only visible when reverting gimp commit 668891e7455ff9107901b004e7ce9d422041ef71). The patch we committed did fix this problem, but that happened because the body of the if-statement was never executed -- the autorelease pool was never drained. This can be easily fixed by comparing g_main_depth() to 1 instead of 0, so that the pool is drained correctly. However, also with this change, it does no longer solve the problem we saw in gimp. It turns out that the real problem is that the pool is currently drained when poll_func is active (and with NSApplication nextEventMatchingMask in the current stack trace). This indicates that something must be wrong with how recursive main loops are handled currently. I am thinking about a fix for both of these issues. Hopefully these problems are then gone forever.
The following trace is from GIMP with the mentioned commit reverted. The trace has been edited for legibility:
+ Trace 232187
Note the two instances of g_main_context_iterate() in the stack trace. At the bottom of the stack, poll_func() is polling for a Cocoa event. While this poll_func() is "active", the main loop is run again. Neither the current_loop_level or the g_main_depth() capture this. So all subroutines of the main loop are executed (including prepare) under the assumption the main loop is at depth 0, that is, no other main loop invocation is ongoing. As a result, the autorelease pool may be drained, potentially releasing data that is in use at the bottom of the stack. One way to remedy this is to also check for getting_events == 0 before draining the pool. This was also suggested by Daniel Sabo in bug 674108. Also with this in place, there are still deadlocks in GIMP. I suspect this is due to the "last_ufds" handling in poll_func() not being fully reentrant yet, even though it was a step in the right direction. I will play some more with poll_func() to make it fully reentrant. From that should follow how we should correct the autorelease pool handling.
Interesting. Poll_func() shouldn't really be making anything happen, but [NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] appears to be dispatching events. That's the bug: g_main_depth counts invocations of g_main_dispatch, and dispatching events when not in g_main_dispatch prevents it being incremented. So unless we can find a way to make poll_func() work correctly, we need to go with the stack of pools suggestion, pushing one at the start of gdk_event_prepare and popping it and draining it unconditionally at the end of gdk_event_dispatch; that would at least ensure that each g_main_iteration messes only with its own pool.
Solving the pools is not the problem anymore, I am pretty confident I know how we should do it. The problems now is to get polling to work correctly on re-entrant invocations of poll_func(). I wrote down some ideas that I will be exploring.
Well, let's get the pools fixed so we stop leaking NSEvents. How do you want to do it?
I think I have solved the problems. I will attach a glib and GTK+2 patch that correct both the autorelease pool handling and pool function. For the autorelease pool handling, I agree moving this to prepare is the right thing to do. The pool is only refreshed if all three possible recursion levels are at the base level. (See the comments in the patch for more details). I have decided against the outer autorelease pool approach, because I do not see the use of keeping this data around for the lifetime of the process. There is no main loop recursion involved between the event loop initialization and start of the event loop. So the dangers we are addressing within the event loop, should not occur during the initialization phase. All memory that was autoreleased in this phase can be safely released at the first event loop iteration.
Created attachment 248598 [details] [review] glib patch to make things work with recursion in poll Attaching here for a first round of testing. If everything seems proper I will submit this through glib bugzilla for review.
Created attachment 248599 [details] [review] GTK+ 2 patch to fix issues in the event loop handling Also see the extensive comments within the patch. This patch has been tested with GIMP 2.9 with commit 668891e7455ff9107901b004e7ce9d422041ef71 reverted. Everything seems to work properly (open recent & rounded rectangle selection script-fu). Also verified that the gtk-mac-integration test case works properly.
Is there any way to test the patches with stable moduleset for gtk+3 (which currently uses glib 2.34.2 and gtk 3.6.4)? Probably, this is question for John...
The glib patch should apply directly and the gtk patch will probably need some minor edits. This is all pretty stable code, and we've kept the two mainlines (2.24 and master) mostly in sync.
FYI: Mitch has tested the patches with GIMP. It seems to work well for him, except for a small issue with the Open Recent menu. I could not reproduce that issue yet, but I have a stab at that later this week.
Created attachment 248973 [details] [review] gdkeventloop-quartz.c patch for gtk+3.6.4 Did some testing for gtk+3 stable moduleset (3.6.4). glib file indeed can be applied without any editing (just ofest is ~160 lines). patch for gtk+-3.6.4 is attached. Everything works normally, as it should, at least I did not detected any issues yet (when using bluefish, obviously).
(In reply to comment #38) > FYI: Mitch has tested the patches with GIMP. It seems to work well for him, > except for a small issue with the Open Recent menu. I could not reproduce that > issue yet, but I have a stab at that later this week. Also on my Lion machine I have not been able to reproduce this problem. GIMP seems to behave well with the new patches applied.
OK, sounds good. How do you want to proceed with the glib patch?
Created attachment 249234 [details] [review] GdkEventLoop patch for Gtk3 master.
(In reply to comment #41) > OK, sounds good. How do you want to proceed with the glib patch? I just filed https://bugzilla.gnome.org/show_bug.cgi?id=704374 for this. Adding dependency.
(In reply to comment #43) > (In reply to comment #41) > > OK, sounds good. How do you want to proceed with the glib patch? > > I just filed https://bugzilla.gnome.org/show_bug.cgi?id=704374 for this. Adding > dependency. Quite a bit of push-back on that. ;-) I'd like to move the autorelease pool reset to gdk_event_prepare now so that we don't ship 3.10.0 with the NSEvent leaks. Any objections?
Yea, not as smooth as I had hoped ;) Should be fine to commit the move of the autorelease pool release to gdk_event_prepare() without the poll_func() modifications.
Comment on attachment 246896 [details] [review] Move autoreleasepool reset to gdk_event_prepare() (In reply to comment #45) > Yea, not as smooth as I had hoped ;) Should be fine to commit the move of the > autorelease pool release to gdk_event_prepare() without the poll_func() > modifications. Committed Kristian's version with longer comment and getting_events condition to gtk-2-24, gtk-3-8, and master.
Hello, It looks like the problem was fixed for windows that have been destroyed but it persists when the window is merely hidden. In this case, activating a previously hidden window from the list will put it on top but it will be non-responsive. An example of this behavior can be reproduced with gtk_show_about_dialog. Tom
Indeed, this behavior can be seen also with another widgets, like fontchooser, for example. The reason of this behavior is not related to bug discussed in this thread, though. There is practice of executing gtk_widget_hide() when pushing "Close" button instead of destroying it. Then, since widget is still present, it is shown in Window menu of gtk-osx. The workaround would be for gtk-osx to check whether the widget is hidden with gtk_widget_get_visible() and remove from Window list. But I am not sure if this possible. Maybe John can comment on this?
It's doable, but not the way you suggest. Quartz controls the contents of the windows menu and NSWindow offers setExcludedFromWindowsMenu: to control that. One approach might be to hook the map and unmap signals from gtkwidget, check if it's a toplevel and has an NSWindow, and call setExcludedFromWindowsMenu with the appropriate argument. Anyone see a better way to do it?
We're moving to gitlab! As part of this move, we are moving bugs to NEEDINFO if they haven't seen activity in more than a year. If this issue is still important to you and still relevant with GTK+ 3.22 or master, please reopen it and we will migrate it to gitlab.
As announced a while ago, we are migrating to gitlab, and bugs that haven't seen activity in the last year or so will be not be migrated, but closed out in bugzilla. If this bug is still relevant to you, you can open a new issue describing the symptoms and how to reproduce it with gtk 3.22.x or master in gitlab: https://gitlab.gnome.org/GNOME/gtk/issues/new