After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 674352 - Getting statistics slow down VMs a lot
Getting statistics slow down VMs a lot
Status: RESOLVED FIXED
Product: gnome-boxes
Classification: Applications
Component: general
3.4.x (unsupported)
Other Linux
: Normal normal
: --
Assigned To: GNOME Boxes maintainer(s)
GNOME Boxes maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2012-04-18 20:49 UTC by Christophe Fergeau
Modified: 2016-03-31 13:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Asynchronously get information on libvirt domain (3.03 KB, patch)
2012-04-19 01:01 UTC, Zeeshan Ali
reviewed Details | Review
Asynchronously get information on libvirt domain (2.99 KB, patch)
2012-04-19 16:40 UTC, Zeeshan Ali
needs-work Details | Review
Asynchronously get information on libvirt domain (2.67 KB, patch)
2012-04-19 17:39 UTC, Zeeshan Ali
reviewed Details | Review
Asynchronously get information on libvirt domain (3.50 KB, patch)
2012-04-19 19:17 UTC, Zeeshan Ali
reviewed Details | Review
Asynchronously get information on libvirt domain (3.33 KB, patch)
2012-04-19 19:59 UTC, Zeeshan Ali
none Details | Review
Asynchronously get information on libvirt domain (3.87 KB, patch)
2012-04-19 23:44 UTC, Zeeshan Ali
committed Details | Review
Remove some more sync calls to GVir.Domain.get_info() (2.56 KB, patch)
2012-05-07 22:23 UTC, Zeeshan Ali
committed Details | Review

Description Christophe Fergeau 2012-04-18 20:49:32 UTC
When using a f17 livecd on a f17 system, interactions with the VM are not really nice: scrolling down a webpage is choppy, keys sometimes get stuck while typing, ... Behaviour is much nicer when connecting to exactly the same VM with virt-viewer (NB: to reproduce this, it's probably better to have a few VMs listed in boxes).
For testing, I added a return right a the beginning of LibvirtMachine::set_stats_enable to make sure the timeout callback gathering VM stats is never called, and boxes started to behave like virt-viewer (smoother animations, no keyboard keys getting stuck, ...).

Looking some more into it, the timeout added by set_stats_enable calls LibvirtMachine::update_stats which does var info = domain.get_info (); which results in a call to virDomainGetInfo. This function currently has some performance issues:
http://www.redhat.com/archives/libvir-list/2011-December/msg00321.html
(unsolved as of today I think given that the last mail in the thread is http://www.redhat.com/archives/libvir-list/2012-April/msg00306.html ) 

One way to avoid this issue would be to move the stat gathering in its own thread, and propagate the stat info through an idle. Given that getting stat is costly, it's probably better to do it less often, say every 5 seconds or so
Comment 1 Zeeshan Ali 2012-04-18 21:04:47 UTC
(In reply to comment #0)
> One way to avoid this issue would be to move the stat gathering in its own
> thread, and propagate the stat info through an idle. Given that getting stat is
> costly, it's probably better to do it less often, say every 5 seconds or so

Or add an async variant of GVirDomain.get_info() that makes use of g_simple_async_result_run_in_thread() like the other async methods in that class. Actually I think we should someday sit down and ensure we have an async wrapper for every libvirt call that could block or take a very long time.
Comment 2 Marc-Andre Lureau 2012-04-18 21:49:53 UTC
I agree with Zeeshan, but is libvirt multithread-aware? Can it have multiple outstanding calls from various threads? If not, all the operations will have to be queued. It's a shame libvirt is not async, it makes me drop tears so often.
Comment 3 Marc-Andre Lureau 2012-04-18 21:53:27 UTC
Christophe, the message you point to is perhaps not related. It would be worth to try to revert/remove the "save to XML" and verify if it changes anything.

I might never have noticed this issue because I use an SSD.. Even if it that, I am still surprise that it slows down Boxes, libvirt could learn to also save/sync in background and not delay its reply...
Comment 4 Zeeshan Ali 2012-04-18 23:37:23 UTC
(In reply to comment #2)
> I agree with Zeeshan, but is libvirt multithread-aware? Can it have multiple
> outstanding calls from various threads?

Good point! We should make sure not to just queue-up async calls to virDomainGetInfo() but wait (again asynchronously) for previous call to return first.

One issue that might be caused by this change is boxes getting more of these timeouts that I've seen when qemu is very busy. When that happens, connection to libvirt is lost and boxes need to be restarted.

> If not, all the operations will have to
> be queued. It's a shame libvirt is not async, it makes me drop tears so often.

That is true. Any API that does blocking I/O and is not async is fundamentally flawed IMO (this is why i sometimes dream of writing my own libvirt that sits on top of KVM and is completely async but then i hope libvirt can be changed..any ways back to planet earth).
Comment 5 Marc-Andre Lureau 2012-04-18 23:55:29 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > I agree with Zeeshan, but is libvirt multithread-aware? Can it have multiple
> > outstanding calls from various threads?
> 
> Good point! We should make sure not to just queue-up async calls to
> virDomainGetInfo() but wait (again asynchronously) for previous call to return
> first.

http://wiki.libvirt.org/page/FAQ#Is_libvirt_thread_safe.3F

So that should be good enough.

> That is true. Any API that does blocking I/O and is not async is fundamentally
> flawed IMO (this is why i sometimes dream of writing my own libvirt that sits
> on top of KVM and is completely async but then i hope libvirt can be
> changed..any ways back to planet earth).

It might not be necessary then. Properly wrapped in libvirt-glib, it should be good. (even if libvirt is queuing it's own request internally, which I guess it does for some function, since qemu monitor operations are synchronous, afaik)
Comment 6 Zeeshan Ali 2012-04-19 00:14:40 UTC
Just FWY, A patch adding async variant of gvir_domain_get_info() sent to libvirt list..
Comment 7 Zeeshan Ali 2012-04-19 01:01:03 UTC
Created attachment 212327 [details] [review]
Asynchronously get information on libvirt domain

When a libvirt machine is selected, information is fetched for it every
second and the call to fetch the information can take from about 2.5s to
2 *minutes* 45s (currently thats the case due to a regression in libvirt).
Comment 8 Zeeshan Ali 2012-04-19 01:04:31 UTC
(In reply to comment #7)
> Created an attachment (id=212327) [details] [review]
> Asynchronously get information on libvirt domain

Here is a Boxes patch based on the libvirt-glib patch I mentioned in comment#6. It didn't break anything in my preliminary tests but I'm sure there is issues in it that hopefully Marc-Andre can spot instantly. :) There is something funny going on with the I/O stats but I see the same without this patch.
Comment 9 Marc-Andre Lureau 2012-04-19 02:32:51 UTC
Review of attachment 212327 [details] [review]:

::: src/libvirt-machine.vala
@@ +239,3 @@
+                update_stats.begin ();
+
+                return false;

I don't think we should have 2 timers. Throw away the first call here without timeout, and piggy back with a timer. the function itself shouldn't know it might be wrapped in a timer.
Comment 10 Christophe Fergeau 2012-04-19 10:17:31 UTC
As a datapoint (ie this is not a review nor an ACK), I tested these patches and they help
Comment 11 Daniel P. Berrange 2012-04-19 13:13:46 UTC
> I agree with Zeeshan, but is libvirt multithread-aware?

libvirt is fully multi-threaded - you can have multiple APIs calls in progress at once.


> When a libvirt machine is selected, information is fetched for it every
> second and the call to fetch the information can take from about 2.5s to
> 2 *minutes* 45s (currently thats the case due to a regression in libvirt)

Both of those times are outrageously slow. If operating correctly, we expect virDomainGetInfo() call to take 100 ms or less - it is intended to be very low overhead
Comment 12 Christophe Fergeau 2012-04-19 15:30:25 UTC
Dunno if other places using get_info should be audited too. LibvirtMachine.state is using it, and it's in turn used by is_running, take_screenshot, ... No idea whether it's worth looking at these too.
Comment 13 Zeeshan Ali 2012-04-19 16:40:19 UTC
Created attachment 212371 [details] [review]
Asynchronously get information on libvirt domain

OK, this one has only one timeout call. You'll notice that I don't remove the timeout but rather not do anything in the timeout callback once stats are disabled. This will mean an extra wake-up at the end but IMHO its worth it to keep the code simple.
Comment 14 Marc-Andre Lureau 2012-04-19 16:57:56 UTC
Review of attachment 212371 [details] [review]:

::: src/libvirt-machine.vala
@@ +224,3 @@
+        Timeout.add_seconds (1, () => {
+            if (stats_enabled)
+                update_stats.begin ();

I think it's cleaner if update_stats() doesn't embed the timer. It should be external, and enabled/disable should remove it, all from set_stats_enable()
Comment 15 Zeeshan Ali 2012-04-19 17:05:49 UTC
(In reply to comment #14)
> Review of attachment 212371 [details] [review]:
> 
> ::: src/libvirt-machine.vala
> @@ +224,3 @@
> +        Timeout.add_seconds (1, () => {
> +            if (stats_enabled)
> +                update_stats.begin ();
> 
> I think it's cleaner if update_stats() doesn't embed the timer. It should be
> external, and enabled/disable should remove it, all from set_stats_enable()

Now I'm really confused, I thought you asked me to remove the Timeout.add() call from set_stats_enable().
Comment 16 Marc-Andre Lureau 2012-04-19 17:23:40 UTC
(In reply to comment #15)

No, "the function itself shouldn't know it might be wrapped in a timer."
Comment 17 Zeeshan Ali 2012-04-19 17:39:26 UTC
Created attachment 212372 [details] [review]
Asynchronously get information on libvirt domain

How about this version?
Comment 18 Marc-Andre Lureau 2012-04-19 18:35:03 UTC
Review of attachment 212372 [details] [review]:

ack
Comment 19 Marc-Andre Lureau 2012-04-19 18:36:22 UTC
Review of attachment 212372 [details] [review]:

::: src/libvirt-machine.vala
@@ +231,2 @@
             stats_id = Timeout.add_seconds (1, () => {
+                update_stats.begin ();

although we should somehow prevent multiple outstading call. Just track this call with a local variable to see if it finished?
Comment 20 Zeeshan Ali 2012-04-19 19:17:37 UTC
Created attachment 212379 [details] [review]
Asynchronously get information on libvirt domain

Something like this?
Comment 21 Marc-Andre Lureau 2012-04-19 19:41:33 UTC
Review of attachment 212379 [details] [review]:

::: src/libvirt-machine.vala
@@ +236,3 @@
             stats_id = Timeout.add_seconds (1, () => {
+                if (stats_updating)
+                    return true; // Lets wait for previous async call to finish first

rather something local like:

var stats_updating = true;
update_stats.begin(() => {
  stats_updating = false;
})
Comment 22 Zeeshan Ali 2012-04-19 19:59:25 UTC
Created attachment 212384 [details] [review]
Asynchronously get information on libvirt domain

This uses local variable.
Comment 23 Marc-Andre Lureau 2012-04-19 20:08:48 UTC
Review of attachment 212384 [details] [review]:

::: src/libvirt-machine.vala
@@ +91,3 @@
     construct {
         stats = new MachineStat[STATS_SIZE];
+        stats_cancellable = new Cancellable ();

Hmm, I didn't notice it first, but I think for the same reason, it would be preferable that the cancellable would be contained in set_stats_enable(). This avoid having lot of property members.
Comment 24 Zeeshan Ali 2012-04-19 20:18:23 UTC
(In reply to comment #23)
> Review of attachment 212384 [details] [review]:
> 
> ::: src/libvirt-machine.vala
> @@ +91,3 @@
>      construct {
>          stats = new MachineStat[STATS_SIZE];
> +        stats_cancellable = new Cancellable ();
> 
> Hmm, I didn't notice it first, but I think for the same reason, it would be
> preferable that the cancellable would be contained in set_stats_enable(). This
> avoid having lot of property members.

The object is the outer context of set_stats_enable() and we need to keep the same cancellable across multiple calls to this method.
Comment 25 Marc-Andre Lureau 2012-04-19 22:12:51 UTC
(In reply to comment #24)

> The object is the outer context of set_stats_enable() and we need to keep the
> same cancellable across multiple calls to this method.

Correct. Can you group stats_id with the rest of the members?

Can you change the comment you left to warning, for the case where update_stats() is too slow, like:

if (stats_updating) {
  warning ("update_stats() is too slow, waiting for last call to finish...");
  return true;
}

With that 2 changes, ack.
Comment 26 Marc-Andre Lureau 2012-04-19 22:14:54 UTC
also stats_id would probably be better named stats_update_timeout, feel free to rename it.
Comment 27 Zeeshan Ali 2012-04-19 23:03:11 UTC
(In reply to comment #25)
> (In reply to comment #24)
> 
> > The object is the outer context of set_stats_enable() and we need to keep the
> > same cancellable across multiple calls to this method.
> 
> Correct. Can you group stats_id with the rest of the members?

Sure!

> Can you change the comment you left to warning, for the case where
> update_stats() is too slow, like:
> 
> if (stats_updating) {
>   warning ("update_stats() is too slow, waiting for last call to finish...");

Are you sure? that'll spam the console a lot I'm affraid and doesn't tell anything to users other than devs. How about a debug?
Comment 28 Zeeshan Ali 2012-04-19 23:44:57 UTC
Created attachment 212397 [details] [review]
Asynchronously get information on libvirt domain
Comment 29 Marc-Andre Lureau 2012-04-20 00:11:43 UTC
Review of attachment 212397 [details] [review]:

::: src/libvirt-machine.vala
@@ +232,3 @@
+            stats_update_timeout = Timeout.add_seconds (1, () => {
+                if (stats_updating) {
+                    warning ("Fetching of stats for '%s' is taking too long. Probably a libvirt bug." + name);

/+ name/, name
Comment 30 Zeeshan Ali 2012-04-20 00:23:19 UTC
Attachment 212397 [details] pushed as f68fe17 - Asynchronously get information on libvirt domain
Comment 31 Zeeshan Ali 2012-05-07 22:23:45 UTC
Created attachment 213625 [details] [review]
Remove some more sync calls to GVir.Domain.get_info()

This is a blocking call so remove calls where redundant or make it
async in hot (enough) paths.
Comment 32 Zeeshan Ali 2012-05-07 22:28:31 UTC
(In reply to comment #31)
> Created an attachment (id=213625) [details] [review]
> Remove some more sync calls to GVir.Domain.get_info()
> 
> This is a blocking call so remove calls where redundant or make it
> async in hot (enough) paths.

Not very sure about this since I'd rather like to listen to libvirt's state change events and keep the last known state in a prop/field but I wouldn't recommend doing that until this bug is fixed: 

https://bugzilla.redhat.com/show_bug.cgi?id=819617

I was actually cooking a patch that uses events but because of this issue, I totally failed to make it work. :(
Comment 33 Marc-Andre Lureau 2012-05-07 22:34:41 UTC
(In reply to comment #32)
> https://bugzilla.redhat.com/show_bug.cgi?id=819617
> 
> I was actually cooking a patch that uses events but because of this issue, I
> totally failed to make it work. :(

 ack
Comment 34 Marc-Andre Lureau 2012-05-07 22:35:30 UTC
Review of attachment 213625 [details] [review]:

ack
Comment 35 Zeeshan Ali 2012-05-08 03:04:29 UTC
Attachment 213625 [details] pushed as 358c2e9 - Remove some more sync calls to GVir.Domain.get_info()