GNOME Bugzilla – Bug 701145
Serious performance degradation
Last modified: 2013-09-09 08:09:27 UTC
I've started reusing a semi-abandoned test system based on dogtail, but ran into problems. I could reproduce the problem with a minimal testcase, which is attached to this bugreport. I also attached a graph showing the time required for each widget lookup during the test. The testcase and the graph have a second version, which is basically the same, but with a bit more complex gtk-demo app. The performance degradation is also a major problem in even more complex windows, but notice the testcases aborting after 2585 (or 467) iterations (dogtail doesn't find the widgets and bails out). The versions and the graphs are from an Ubuntu Raring system, default install inside a VM (upgraded from Precise, all releases having the same behavior). Dogtail is newest from https://fedorahosted.org/dogtail/ (commit b78e4af091f14a710b596553237afa6df9cec530) While running the testcase, the python interpreter will slowly start to eat the CPU, mostly in libatspi and libdbus calls. To plot the result of the tests for yourself, run: ./testcase.py > out gnuplot plot Versions: ii at-spi2-core 2.8.0-1 amd64 Assistive Technology Service Provider Interface (dbus core) ii gir1.2-atspi-2.0 2.8.0-1 amd64 Assistive Technology Service Provider (GObject introspection) ii libatspi2.0-0:amd64 2.8.0-1 amd64 Assistive Technology Service Provider Interface - shared library ii python-pyatspi 2.7.91+dfsg-0ubuntu1 all Assistive Technology Service Provider Interface - Python bindings ii libdbus-1-3:amd64 1.6.8-1ubuntu6 amd64 simple interprocess messaging system (library) ii libdbus-glib-1-2:amd64 0.100.2-1 amd64 simple interprocess messaging system (GLib-based shared library) ii python2.7 2.7.3-0ubuntu3.1 Interactive high-level object-oriented language (version 2.7) ii gtk2.0-examples 2.24.17-0ubuntu2 amd64 example files for GTK+ 2.0
Created attachment 245462 [details] Plot of increasing time
Created attachment 245463 [details] Testcase for reproducing the bug
Created attachment 245464 [details] More complex testcase
Created attachment 245465 [details] Plot of more complex case
Created attachment 245466 [details] Gnuplot script
Just a question: this is something new? Started to happen, lets say, with 2.8.0, or probably was happening before but nobody realized that? Just wondering if it makes sense to try to search when this started to happen, of if that would be useless as was happening since a long time.
(In reply to comment #6) > Just a question: this is something new? Started to happen, lets say, with > 2.8.0, or probably was happening before but nobody realized that? It's been the the same from Ubuntu Precise (python-pyatspi2: 2.4.0+dfsg-0ubuntu3, at-spi2-core: 2.4.2-0ubuntu0.1) The last known working system was on Lucid (python-pyatspi: 1.30.1-0ubuntu1, at-spi: 1.30.1-0ubuntu1), but I don't have access to that now. > Just wondering if it makes sense to try to search when this started to happen, > of if that would be useless as was happening since a long time. I'd say that's a long time, with too big changes between Lucid and Precise.
[Mass-resetting default assignee, see bug 705890. Please reclaim this bug report by setting the assignee to yourself if you still plan to work on this. Thanks!]
Thanks for the report. Yeah, AT-SPI 1 was CORBA-based and thus needed to be largely reworked because it depended on ORBit and Bonobo which were both deprecated. Your test case starts to spend a lot of time in check_for_reply_unlocked() in libdbus. I think that the problem has to do with AT-SPI calling dbus_connection_send_with_reply_and_block and the connection's messages never being dispatched. When an app with a main loop (Orca, for instance) runs, D-Bus messages are dispatched in the main loop, but dogtail, for instance, is not always running a main loop. dbus_connection_send_with_reply_and_block searches the connection's message queue, starting at the beginning, to try to find a message with a reply serial that matches the serial of the DBusPendingCall, to determine whether there is a reply to the method call on which it is blocking. Since the queue is never flushed, it grows longer and longer, so searching it takes more and more time. When AT-SPI is running inside of a main loop, such as in Orca, I don't think that we want D-Bus messages to be delivered during method calls--we probably don't, say, want Orca's handler to get called for a keyboard event while it is responding to something unrelated. So I think that the best thing to do would be to behave differently when a main loop is not running--perhaps start one, or adding a call to dbus_connection_flush() would probably be sufficient, although this should not happen if we are being called from within a main loop.
Fix pushed to at-spi2-core as commit ec424c.
Created attachment 252826 [details] Plot of constant time after fix Thanks for the quick fix! I've applied your patch to the ubuntu source package of at-spi2-core-2.8.0 and re-run the tests. The fix clearly works, as the time required for finding the widgets seems almost constant with minor variations caused only by other parts of the system.
Created attachment 252827 [details] Plot of constant time after fix (complex case)
It would be useful to see the data that produced this graph and learn more about your method. Warning: I am going to sound pedantic but it is important. I still have to apologise in advance... But: Time is not constant in this graph (or in the physical world) it is increasing at a constant rate and because of this, time needs to be plotted onto the x axis. The iterations are what you are measuring, therefore they should be plotted on the y axis.
(In reply to comment #13) > It would be useful to see the data that produced this graph and learn more > about your method. Warning: I am going to sound pedantic but it is important. I Every script is attached to the bug, and using them is documented in comment 0. The only difference since then with the fix applied is the need to manually abort the testcase, as the final timeout would not be reached in a useful time period. This is, however, not even needed, as plotting can be done parallel, even "realtime" when python stdout buffering is turned off. > still have to apologise in advance... But: > > Time is not constant in this graph (or in the physical world) it is increasing > at a constant rate and because of this, time needs to be plotted onto the x > axis. The iterations are what you are measuring, therefore they should be > plotted on the y axis. I'm sorry to use misleading labels for the axes (I'm not a native English speaker). The x axis shows how many complete tests were run (nth iteration), and the y is actually the duration of each. It's true, that in this case the duration was related to the number of iterations needed to perform the test function inside the library, maybe this was confusing.