GNOME Bugzilla – Bug 585626
Setting widget tooltip hammers X11 server on any TCP/IP X11 connection
Last modified: 2009-06-22 14:12:02 UTC
Please describe the problem:
The gtkhtml editor control takes a very long time (at least 16 seconds) to appear when it's being displayed over any TCP/IP X11 connection. This happens even on a loopback interface connection to the local X server, but does not happen on a UNIX socket. I found this bug while tracking down an issue with Evolution's compose/reply window taking a long time to appear on my LTSP thin clients (see bug 585624, and referenced Launchpad bug). I've just found that it occurs with gtkhtml-editor-test from current gtkhtml svn (r9202, 2009-04-10 18:17:33 +0800).

Steps to reproduce:
1. Enable TCP/IP on your X server by editing gdm.conf and setting DisableTCP=false, then restarting X.
2. Run "DISPLAY=:0.0 /opt/gnome2/bin/gtkhtml-editor-test". Note that the window appears promptly.
3. Run "DISPLAY=127.0.0.1:0.0 /opt/gnome2/bin/gtkhtml-editor-test". Note that the window takes a LONG time to appear, but then behaves normally.

Actual results:
The first time, on a UNIX domain socket, the window appears promptly. The second time, over TCP/IP, it takes a long time to appear.

Expected results:
The window should appear very promptly both times.

Does this happen every time?
Yes, on several different hosts and with at least two versions of gtkhtml (svn trunk head, and 2.11.1-2ubuntu1).

Other information:
Makes Evolution unusable on remote X11 connections such as when used on thin clients.
Stack trace attached. I launched a Xephyr session:

    /usr/bin/Xephyr :11

Then I started gtkhtml-editor-test from a terminal in my main X server as:

    DISPLAY=127.0.0.1:11 gdb --args /opt/gnome2/bin/gtkhtml-editor-test

I set up logging and "start"ed the app. Every few seconds I interrupted execution with control-C, ran "thread apply all bt" to get stack traces of all threads, and "cont"inued execution. The window appeared shortly after the final backtrace was taken.
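The manual interrupt-and-backtrace loop can also be approximated from outside the process. This is a minimal sketch, not the procedure actually used for the attachment: `sample_backtraces` is a hypothetical helper name, and it assumes you are allowed to attach gdb to an already running process by PID. It relies only on gdb's `-p`, `-batch` and `-ex` options.

```shell
#!/bin/sh
# Sketch only: sample_backtraces is a hypothetical helper, not part of gdb.
# Usage: sample_backtraces PID COUNT INTERVAL_SECONDS
# Each iteration attaches gdb in batch mode, dumps backtraces for all
# threads, and detaches again, which lets the process keep running.
sample_backtraces() {
    pid=$1
    count=$2
    interval=$3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
        sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, `sample_backtraces "$(pidof gtkhtml-editor-test)" 5 3 > backtraces.log` would take five samples three seconds apart. If most samples land in the same call chain, that chain is a likely culprit.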
Created attachment 136486 [details] Backtraces taken during long startup pause on tcp/ip connection
Created attachment 136487 [details]
Connection and startup of gtkhtml as captured by tcpdump on loopback iface

Note that TCP checksums are incorrect in this trace because the kernel implements TSO on the loopback interface, as shown by:

    sudo ethtool --show-offload lo | grep 'tcp segmentation offload'

When opening the packet trace in Wireshark, ignore the checksum errors. Alternatively, disable the option "Check the validity of the TCP checksum when possible" in the TCP dissector preferences so Wireshark stops checking them.

Once you've turned off checksum validation, tell Wireshark to analyze the traffic as X11 by right-clicking on the first packet, selecting "Decode as...", and in the dialog that appears selecting "X11" from the list. Select the second packet and do the same thing so the replies are decoded the same way. You can now see what gtkhtml is chatting to the server about.

If you scroll to any random place in the trace you'll see the same endlessly repeating sequence of X11 operations:

    Client -> Server: Request: GrabServer, QueryPointer
    Server -> Client: Reply: QueryPointer
    Client -> Server: Request: UngrabServer
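For anyone wanting to capture a comparable trace, here is a minimal sketch. It assumes the usual X11 convention that display :N listens on TCP port 6000+N. The helper names and the default output filename are made up for illustration. The exact command used for the attachment above is not recorded in this bug.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names, not part of any tool.

# Print the TCP port for an X display number (X11 convention: 6000 + N).
x11_tcp_port() {
    echo $((6000 + $1))
}

# Print (but do not run) a tcpdump command line that would capture
# loopback X11 traffic for the given display number into a pcap file.
x11_capture_cmd() {
    display_num=$1
    outfile=${2:-x11-trace.pcap}
    echo "tcpdump -i lo -s 0 -w $outfile tcp port $(x11_tcp_port "$display_num")"
}
```

For a display such as :11, `sudo $(x11_capture_cmd 11)` would write x11-trace.pcap, which can then be opened in Wireshark and decoded as X11 as described above.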
One direct culprit is the GtkHTML color combo widget. It looks like color_combo_new_swatch_button(...) calls gtk_widget_set_tooltip_text(...) which for some bizarre reason grabs the server. If I comment out the call to gtk_widget_set_tooltip_text in components/editor/gtkhtml-color-combo.c, startup falls to about 5 seconds - we save *TEN* *SECONDS* off the widget startup time.
Created attachment 136488 [details] [review] This patch dramatically reduces startup time
Note that the attached patch is NOT the right fix. It's a test, and confirms where the problem is.
The remaining calls to gtk_widget_set_tooltip_text(...) seem to be causing the rest of the delay. When all the direct calls are commented out it's still a couple of full seconds slower over TCP than over a UNIX socket, but a couple of seconds is a lot better than 15. Breaking into startup at random points shows that the remaining delay is almost entirely other calls to gtk_widget_set_tooltip_text(...) within glade, via GObject properties, or the like.
Great analysis! I'm going to move this over to GTK+ to see if they have any ideas on this. Also adjusting summary.
There are a couple of avenues for improvement here:

- We should probably not call gtk_widget_trigger_tooltip_query when setting the tooltip on a widget that is not visible.

- It is worth investigating if we can do better than just calling gtk_tooltip_trigger_tooltip_query. We are throwing away some information here, since we have a concrete widget (and thus window) where we want to trigger tooltip changes, and gtk_tooltip_trigger_tooltip_query then goes and recomputes a window from the pointer position...

- Finally, the X implementation of _gdk_windowing_window_at_pointer looks like it could really do with an XFixes request to make it less expensive.
Some comments from Owen:

- Can also move the trigger tooltip query to an idle

- Client-side windows may allow a 0-cost implementation of _gdk_windowing_window_at_pointer since it already needs to keep track of pointer windows, presumably.
I've committed some of these ideas now. Let me know if it doesn't help.

commit 0f00d3fdb084eac236072361df19e030d390ea9b
Author: Matthias Clasen <mclasen@redhat.com>
Date:   Sat Jun 20 13:54:33 2009 -0400

    Reduce roundtrips

    Setting a tooltip on a widget unfortunately triggers several
    roundtrips to the X server. We reduce this overhead by only
    doing it if the widget is visible, and by deferring to an idle.
    See bug 585626.
I've reverted to stock libgtkhtml, built gtk master/HEAD, and tested with that. There's a huge improvement, to the point where I'm not sure opening the compose window is even any slower over TCP/IP than over a UNIX socket now. I'll grab that diff and see if I can adapt it to 2.16.1 and rebuild the ubuntu gtk package with it so the users of the thin client systems I run can benefit. I really appreciate your taking a look at this. It looks like a huge performance win, and maybe not just in the particular area I first noticed it in.
Created attachment 137150 [details] [review]
gtk+ git patch 0f00d3fdb084eac236072361df19e030d390ea9b, see comments

For Ubuntu/Debian users who may run into this and who might want to fix it with minimal impact to their systems, here's how to rebuild your gtk+ package to include the patch:

    mkdir $HOME/gtk-build
    cd $HOME/gtk-build
    apt-get source libgtk2.0-0
    sudo apt-get build-dep libgtk2.0-0
    sudo apt-get install fakeroot
    cd gtk+2.0-*
    patch -p1 < /path/to/mclasen-0f00d3fdb084eac236072361df19e030d390ea9b.diff
    fakeroot debian/rules binary

Now install the gtk+ debs created in $HOME/gtk-build.
BTW, does it still seem like it's worth investigating an XFixes addition for finding the window at the pointer?
Better rebuild instructions:

    mkdir $HOME/gtk-build
    cd $HOME/gtk-build
    apt-get source libgtk2.0-0
    sudo apt-get build-dep libgtk2.0-0
    sudo apt-get install fakeroot devscripts
    cd gtk+2.0-*
    wget -O - 'http://bugzilla.gnome.org/attachment.cgi?id=137150' | patch -p1
    debuild -i -j2 -tc

Once the packages have built and you've installed them, you might want to pin them in place in /etc/apt/preferences. I've written a little script to generate pinning entries. Run this in the directory the .deb files were generated in to pin them all to your custom versions.

    for f in *.deb; do
        dpkg-deb -e "$f"
        sed -e '/^Package: / p' \
            -e '/^Version: / s/Version: \(.*\)$/Pin: version \1 origin=""/ p' \
            -e '/^Pin: / aPin-Priority: 1001' \
            -e 'D' \
            DEBIAN/control
        echo
    done | sudo tee -a /etc/apt/preferences

Just remember to delete the pinning entries when the upstream packages are updated or when you want to upgrade! You can use "apt-cache policy packagename" to find out why a package is held back.
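As a variation on the pinning script, here is a minimal sketch that builds each stanza from control data read on stdin, so nothing is extracted into a DEBIAN/ directory. `pin_stanza` is a hypothetical helper name. For simplicity it pins on version alone and leaves out the origin="" qualifier used above, so check the result against apt_preferences(5) before relying on it.

```shell
#!/bin/sh
# Sketch only: pin_stanza is a hypothetical helper.
# Reads a Debian control file on stdin and prints an apt preferences
# stanza pinning that exact package version at priority 1001,
# followed by a blank separator line.
pin_stanza() {
    awk -F': ' '
        $1 == "Package" { pkg = $2 }
        $1 == "Version" { ver = $2 }
        END {
            if (pkg != "" && ver != "") {
                printf "Package: %s\nPin: version %s\nPin-Priority: 1001\n\n", pkg, ver
            }
        }
    '
}
```

It could be used as `for f in *.deb; do dpkg-deb -f "$f" | pin_stanza; done | sudo tee -a /etc/apt/preferences`, since `dpkg-deb -f` with no field names prints the whole control file.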