Bug 701145 – Serious performance degradation

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 701145 - Serious performance degradation


Summary:	Serious performance degradation


Status:	RESOLVED FIXED

Product:	at-spi
Classification:	Platform
Component:	at-spi2-core
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	At-spi maintainer(s)
QA Contact:	At-spi maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-05-28 16:48 UTC by MDE
Modified:	2013-09-09 08:09 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Plot of increasing time (18.45 KB, image/png) 2013-05-28 16:50 UTC, MDE	Details
Testcase for reproducing the bug (1.44 KB, text/x-python) 2013-05-28 16:50 UTC, MDE	Details
More complex testcase (1.34 KB, text/x-python) 2013-05-28 16:51 UTC, MDE	Details
Plot of more complex case (17.32 KB, image/png) 2013-05-28 16:52 UTC, MDE	Details
Gnuplot script (87 bytes, text/plain) 2013-05-28 16:52 UTC, MDE	Details
Plot of constant time after fix (24.22 KB, image/png) 2013-08-23 10:02 UTC, MDE	Details
Plot of constant time after fix (complex case) (30.08 KB, image/png) 2013-08-23 10:03 UTC, MDE	Details

Description MDE 2013-05-28 16:48:51 UTC

I've started reusing a semi-abandoned test system based on dogtail, but ran into problems. I could reproduce the problem with a minimal testcase, which is attached to this bugreport. I also attached a graph showing the time required for each widget lookup during the test. The testcase and the graph have a second version, which is basically the same, but with a bit more complex gtk-demo app.

The performance degradation is also a major problem in even more complex windows, but notice the testcases aborting after 2585 (or 467) iterations (dogtail doesn't find the widgets and bails out).

The versions and the graphs are from an Ubuntu Raring system, default install inside a VM (upgraded from Precise, all releases having the same behavior). Dogtail is newest from https://fedorahosted.org/dogtail/

(commit b78e4af091f14a710b596553237afa6df9cec530)

While running the testcase, the python interpreter will slowly start to eat the CPU, mostly in libatspi and libdbus calls.

To plot the result of the tests for yourself, run:
./testcase.py > out
gnuplot plot


Versions:
ii  at-spi2-core                             2.8.0-1                   amd64                     Assistive Technology Service Provider Interface (dbus core)
ii  gir1.2-atspi-2.0                         2.8.0-1                   amd64                     Assistive Technology Service Provider (GObject introspection)
ii  libatspi2.0-0:amd64                      2.8.0-1                   amd64                     Assistive Technology Service Provider Interface - shared library
ii  python-pyatspi                           2.7.91+dfsg-0ubuntu1      all                       Assistive Technology Service Provider Interface - Python bindings

ii  libdbus-1-3:amd64                         1.6.8-1ubuntu6                         amd64        simple interprocess messaging system (library)
ii  libdbus-glib-1-2:amd64                    0.100.2-1                              amd64        simple interprocess messaging system (GLib-based shared library)

ii  python2.7                                2.7.3-0ubuntu3.1                         Interactive high-level object-oriented language (version 2.7)
ii  gtk2.0-examples                          2.24.17-0ubuntu2          amd64                     example files for GTK+ 2.0

Comment 1 MDE 2013-05-28 16:50:03 UTC

Created attachment 245462 [details]
Plot of increasing time

Comment 2 MDE 2013-05-28 16:50:48 UTC

Created attachment 245463 [details]
Testcase for reproducing the bug

Comment 3 MDE 2013-05-28 16:51:34 UTC

Created attachment 245464 [details]
More complex testcase

Comment 4 MDE 2013-05-28 16:52:11 UTC

Created attachment 245465 [details]
Plot of more complex case

Comment 5 MDE 2013-05-28 16:52:50 UTC

Created attachment 245466 [details]
Gnuplot script

Comment 6 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-05-28 19:02:00 UTC

Just a question: this is something new? Started to happen, lets say, with 2.8.0, or probably was happening before but nobody realized that?

Just wondering if it makes sense to try to search when this started to happen, of if that would be useless as was happening since a long time.

Comment 7 MDE 2013-05-29 08:04:08 UTC

(In reply to comment #6)
> Just a question: this is something new? Started to happen, lets say, with
> 2.8.0, or probably was happening before but nobody realized that?

It's been the the same from Ubuntu Precise (python-pyatspi2: 2.4.0+dfsg-0ubuntu3, at-spi2-core: 2.4.2-0ubuntu0.1)

The last known working system was on Lucid (python-pyatspi: 1.30.1-0ubuntu1, at-spi: 1.30.1-0ubuntu1), but I don't have access to that now.

> Just wondering if it makes sense to try to search when this started to happen,
> of if that would be useless as was happening since a long time.

I'd say that's a long time, with too big changes between Lucid and Precise.

Comment 8 André Klapper 2013-08-14 10:05:30 UTC

[Mass-resetting default assignee, see bug 705890. Please reclaim this bug report by setting the assignee to yourself if you still plan to work on this. Thanks!]

Comment 9 Mike Gorse 2013-08-21 00:53:52 UTC

Thanks for the report.

Yeah, AT-SPI 1 was CORBA-based and thus needed to be largely reworked because it depended on ORBit and Bonobo which were both deprecated.

Your test case starts to spend a lot of time in check_for_reply_unlocked() in libdbus. I think that the problem has to do with AT-SPI calling dbus_connection_send_with_reply_and_block and the connection's messages never being dispatched. When an app with a main loop (Orca, for instance) runs, D-Bus messages are dispatched in the main loop, but dogtail, for instance, is not always running a main loop. dbus_connection_send_with_reply_and_block searches the connection's message queue, starting at the beginning, to try to find a message with a reply serial that matches the serial of the DBusPendingCall, to determine whether there is a reply to the method call on which it is blocking. Since the queue is never flushed, it grows longer and longer, so searching it takes more and more time.

When AT-SPI is running inside of a main loop, such as in Orca, I don't think that we want D-Bus messages to be delivered during method calls--we probably don't, say, want Orca's handler to get called for a keyboard event while it is responding to something unrelated. So I think that the best thing to do would be to behave differently when a main loop is not running--perhaps start one, or adding a call to dbus_connection_flush() would probably be sufficient, although this should not happen if we are being called from within a main loop.

Comment 10 Mike Gorse 2013-08-21 19:38:22 UTC

Fix pushed to at-spi2-core as commit ec424c.

Comment 11 MDE 2013-08-23 10:02:44 UTC

Created attachment 252826 [details]
Plot of constant time after fix

Thanks for the quick fix!

I've applied your patch to the ubuntu source package of at-spi2-core-2.8.0 and re-run the tests. The fix clearly works, as the time required for finding the widgets seems almost constant with minor variations caused only by other parts of the system.

Comment 12 MDE 2013-08-23 10:03:36 UTC

Created attachment 252827 [details]
Plot of constant time after fix (complex case)

Comment 13 Magdalen Berns (irc magpie) 2013-09-01 10:52:37 UTC

It would be useful to see the data that produced this graph and learn more about your method. Warning: I am going to sound pedantic but it is important. I still have to apologise in advance... But:

Time is not constant in this graph (or in the physical world) it is increasing at a constant rate and because of this, time needs to be plotted onto the x axis. The iterations are what you are measuring, therefore they should be plotted on the y axis.

Comment 14 MDE 2013-09-09 08:09:27 UTC

(In reply to comment #13)
> It would be useful to see the data that produced this graph and learn more
> about your method. Warning: I am going to sound pedantic but it is important. I

Every script is attached to the bug, and using them is documented in comment 0.
The only difference since then with the fix applied is the need to manually abort the testcase, as the final timeout would not be reached in a useful time period. This is, however, not even needed, as plotting can be done parallel, even "realtime" when python stdout buffering is turned off.

> still have to apologise in advance... But:
> 
> Time is not constant in this graph (or in the physical world) it is increasing
> at a constant rate and because of this, time needs to be plotted onto the x
> axis. The iterations are what you are measuring, therefore they should be
> plotted on the y axis.

I'm sorry to use misleading labels for the axes (I'm not a native English speaker). The x axis shows how many complete tests were run (nth iteration), and the y is actually the duration of each. It's true, that in this case the duration was related to the number of iterations needed to perform the test function inside the library, maybe this was confusing.