GNOME Bugzilla – Bug 683473
Wait for box saving completion before exiting
Last modified: 2016-03-31 14:03:08 UTC
The way we save running boxes when exiting Boxes is racy as we start an async save of the VMs and then we complete the exit process without making sure the various saves have completed. It has happened a few times to be that after exiting Boxes with running VMs, some of them are still up instead of being saved after I restart Boxes.
Created attachment 223617 [details] [review] Wait for box saving completion before exiting
Review of attachment 223617 [details] [review]: what happens when you start gnome-boxes again while it is saving VMs? does it reopen the window/application?
Review of attachment 223617 [details] [review]: Hmm, segfault, I guess that's not the answer you want ;)
Maybe we should let the GApplication go and hold on to the process some other way, rather than deal with the complexities of reviving the GApp.
Created attachment 223769 [details] [review] Wait until all machines are suspended We need to wait until all suspend() calls are finished to be sure that the call went through. We shouldn't hold the app though, so it needs to be done in a seperate loop when the program exits.
Review of attachment 223769 [details] [review]: Ah great, thanks for taking over this patch, looks good! ::: src/app.vala @@ +566,3 @@ + public async void suspend_machines () { + var waiting_counter = 0; Can you add a debug ("suspending all machines"); message? @@ +578,3 @@ + while (waiting_counter > 0) { + yield; Are we 100% sure yield; will only wait for the async methods started by this method, ie the .suspend.begin () calls? ::: src/main.vala @@ +138,3 @@ + return false; + }); + loop.run (); maybe it would be slightly better to hook that clean-up on the Application::shutdown signal which seems to be meant for that purpose?
Review of attachment 223769 [details] [review]: ::: src/app.vala @@ +566,3 @@ + public async void suspend_machines () { + var waiting_counter = 0; sure @@ +578,3 @@ + while (waiting_counter > 0) { + yield; there is only one place calling suspend_machines.callback, so yes. ::: src/main.vala @@ +138,3 @@ + return false; + }); + loop.run (); after app.run () returns, the application is already ::shutdown, and the patch doesn't attempt to recycle it, since that would lead to the problem we had with the first approach.
Review of attachment 223769 [details] [review]: ok to commit with the debug() ?
Review of attachment 223769 [details] [review]: ::: src/app.vala @@ +578,3 @@ + while (waiting_counter > 0) { + yield; But there's nothing linking this yield; call with the machine.suspend.begin(); call, is there? This is probably just me being confused by vala ;) ::: src/main.vala @@ +138,3 @@ + return false; + }); + loop.run (); What I mean is to have app.shutdown.connect (() => { // your code doing the suspend }); return app.run (); The way it is done means we'll create a mainloop and run suspend_machines in secondary instances I think, which is useless, and could even cause bugs when we do more cleanup at shutdown time.
Review of attachment 223769 [details] [review]: ::: src/app.vala @@ +578,3 @@ + while (waiting_counter > 0) { + yield; the suspend_machines.callback () is the link, resuming where the function suspend_machines left with the yield; ::: src/main.vala @@ +138,3 @@ + return false; + }); + loop.run (); The ::shutdown signal is emitted only on the registered primary instance immediately after the main loop terminates. We need the loop to keep running, but the application to not hold. The simplest way is to create our own loop for the final cleanup. I don't see why it would cause bugs, even when we do more cleanup, either at App.run() time or after.
Review of attachment 223769 [details] [review]: ::: src/app.vala @@ +578,3 @@ + while (waiting_counter > 0) { + yield; Ah right, thanks, now I'm understanding your first comment, thanks for explaining again :) ::: src/main.vala @@ +138,3 @@ + return false; + }); + loop.run (); Oh my suggestion is not about trying to reuse the App mainloop or anything like that, I'm fine with creating our own mainloop. It just feels slightly cleaner to do cleanup that is specific to the primary instance only in the primary instance, ie from 'shutdown'. What could cause bugs is secondary instances trying to access members from App that have not been initialized. But if you don't think that is worth it, it's not really important.
Created attachment 224033 [details] [review] proposed changes They go on top of the existing patch, and implement my suggestions. Feel free to squash in, or to drop ;)
Review of attachment 224033 [details] [review]: ::: src/app.vala @@ +162,3 @@ + }); + loop.run (); + }); I don't think this is right. Shutdown is run immediately after the primary mainloop in GApplication is exited, while dbus deregistration happens in GApplication finalizer. So, this mainloop will serve the main context and thus the GApplication, meaning that late arriving dbus requests still get handled. In fact, this means that the original patch is not quite right either. Can we finalize the app before suspending the machines, or does the suspend code reference App.app? Maybe we can use a different main context in the final loop in order to avoid serving the dbus calls for the GApplication? Does that work with e.g. the async libvirt calls during shutdown?
Attachment 223769 [details] pushed as 524265e - Wait until all machines are suspended
Arg, missed this review and pushed the patch already
I've reverted the patch for now.
I've tried to fix this patch, but I haven't had much success so far. (In reply to comment #13) > Can we finalize the app before suspending the machines, or does the suspend > code reference App.app? There is no need for anything from App.app except for GenericArray<Boxes.CollectionItem> machines = app.collection.items; which we can get a reference on before finalizing the app. However, I didn't manage to finalize the app cleanly as various piece of code seems to still have a ref on it even after it shutdown (even after cleaning the App.app ref, and the one held by main()). After recreating the mainloop, unwanted callbacks keep running... > Maybe we can use a different main context in the final loop in order to avoid > serving the dbus calls for the GApplication? Does that work with e.g. the async > libvirt calls during shutdown? The async libvirt calls are using a regular g_simple_async_result_run_in_thread ( http://libvirt.org/git/?p=libvirt-glib.git;a=blob;f=libvirt-gobject/libvirt-gobject-domain.c;h=bcfad2a62e856a1bf5374824461522f0ef59e77f;hb=HEAD#l1183 ). I don't know if the libvirt/glib mainloop integration code is used there, hopefully not.. ( http://libvirt.org/git/?p=libvirt-glib.git;a=blob;f=libvirt-glib/libvirt-glib-event.c;h=2a9ee23521dcd3f9c04b70926f42306e8e5bdce5;hb=HEAD )
Created attachment 224315 [details] [review] Add Machine::suspend_at_exit This indicates whether we must try to suspend the box when exiting Boxes or not. For now this is only set for libvirt machines using the default box connection.
Created attachment 224316 [details] [review] Wait until all machines are suspended We need to wait until all suspend() calls are finished to be sure that the call went through. We shouldn't hold the app though, so it needs to be done in a seperate loop when the program exits. Based on a patch from Marc-André Lureau <marcandre.lureau@gmail.com>
Review of attachment 224316 [details] [review]: ::: src/app.vala @@ +580,3 @@ + + var idle = new IdleSource (); + idle.set_callback (() => { I don't understand why this has to be in an idle? It should be able to be called normally, as long as its is inside the push_thread_default(), as then the async suspend calls will pick up this maincontext.
Review of attachment 224316 [details] [review]: ::: src/app.vala @@ +580,3 @@ + + var idle = new IdleSource (); + idle.set_callback (() => { No idea either, this was initially this way so I kept it, but it's probably unneeded now, I'll try without.
Created attachment 224318 [details] [review] Wait until all machines are suspended We need to wait until all suspend() calls are finished to be sure that the call went through. We shouldn't hold the app though, so it needs to be done in a seperate loop when the program exits. Based on a patch from Marc-André Lureau <marcandre.lureau@gmail.com>
Looks good to me.
Created attachment 224332 [details] [review] Wait until all machines are suspended We need to wait until all suspend() calls are finished to be sure that the call went through. We shouldn't hold the app though, so it needs to be done in a seperate loop when the program exits. Based on a patch from Marc-André Lureau <marcandre.lureau@gmail.com>
Attachment 224315 [details] pushed as 55c03cb - Add Machine::suspend_at_exit Attachment 224332 [details] pushed as 0c46399 - Wait until all machines are suspended
*** Bug 674826 has been marked as a duplicate of this bug. ***