GNOME Bugzilla – Bug 143668
performance problems with GtkUIManager
Last modified: 2011-02-04 16:17:18 UTC
Gimp 2.1.0 takes considerably longer to create new images than Gimp 2.0, regardless of the method: New, Open, or Duplicate. The delay is unpleasant even on a pretty fast machine. Investigation shows that the slowness comes mainly from two places. One, strangely enough, is the call to gtk_ui_manager_get_widget() in the code for gimp_ui_manager_ui_get() in gimpuimanager.c. The other, which is worse, is from plug_in_menus_setup(), called via entry->setup_func from gimp_ui_manager_ui_get().
I asked Mitch about this, and apparrently the biggest slowdown is in the parsing of the image menu XML file. This same file is parsed for every display, and there is no way to do this differently with the current design. All of these functions delve into that file, I think. I can confirm that on my machine here (a 350 MHz K6-2) the delay is very large, in the region of 9 seconds between the plug-in handing back control and the display coming up. Cheers, Dave.
Strange, the places you mention seem to indicate that it's *not* XML parsing which is slow (which was my first assumption). Instead, it looks like the actual creating of the GUI (by GtkUIManager) is *really* slow. How did you measure the slowness?
I "measured" by inserting printf's into the code and watching how long it took to get from one to another. It's crude, but the results are pretty obvious and consistent. A little more detail: the longest delay is in executing this code from plug_in_menus_setup: g_tree_foreach (menu_entries, (GTraverseFunc) plug_in_menus_tree_traverse, manager);
Sorry, but why has this report Priority set to "High" and Severity "major"? Only crashes are major bugs and the is certainly not a high priority for optimisations since we are in the middle of a development cycle.
I've done a trivial optimization but it doesn't address the major bottleneck in this code so the speedup is hardly noticeable. 2004-06-04 Sven Neumann <sven@gimp.org> * app/menus/plug-in-menus.c (plug_in_menus_setup): populate the tree with collation keys and use strcmp() instead of g_utf8_collate() as the tree's sort function.
Re comment #4: I did not know that only crashes are "major". I set the priority to "normal" when I filed it; presumably Dave raised it to "high". IMHO this is a rather serious issue, in the sense that if 2.2 is released without improving this, many users will feel that Gimp has gotten worse rather than better regardless of what other improvements are made.
Luckily thanks to SuperBugzilla and its ability to say who did what when we click on "View bug activity", we can confirm that it was mitch who raised the priority when he confirmed the bug. Just to get me off the hook :) Cheers, Dave.
Since we will most probably have to reassign this report to GTK+, it's up to the GTK+ developers to decide about the severity and priority of this issue. But first of all we need some serious profiling data.
Okay, I'm rebuilding 2.1.0 with --enable-profile, and assuming everything works I'll go ahead and generate some data.
I was unable to get useful information from gprof, possibly because gtk+ is not compiled for profiling. However I inserted a number of lines of the form fprintf (stderr, "Enter XXX: %lg\n", clock () / (gdouble)CLOCKS_PER_SEC); and was able to get CPU times that way. Here is what happens when I duplicate a 256x256 white image: Enter image_duplicate_cmd_callback: 3.66 Enter gimp_image_duplicate: 3.66 Leave gimp_image_duplicate: 3.67 Enter gimp_create_display: 3.67 Enter gui_create_display: 3.67 Enter gimp_display_new: 3.67 loading menu: /usr/local/gimp-2.1/share/gimp/2.1/menus/image-menu.xml for /image-menubar Enter gtk_ui_manager_get_widget: 3.72 Leave gtk_ui_manager_get_widget: 4.07 Enter image_menu_setup: 4.07 Enter g_tree_foreach: 4.09 Leave g_tree_foreach: 4.68 Leave image_menu_setup: 4.68 Leave gimp_display_new: 4.71 Leave gui_create_display: 4.73 Leave gimp_create_display: 4.73 Leave image_duplicate_cmd_callback: 4.73 If I comment out the call to gtk_ui_manager_add_ui in plug_in_menus_add_proc, then the result instead looks like: Enter image_duplicate_cmd_callback: 2.42 Enter gimp_image_duplicate: 2.42 Leave gimp_image_duplicate: 2.43 Enter gimp_create_display: 2.43 Enter gui_create_display: 2.43 Enter gimp_display_new: 2.43 loading menu: /usr/local/gimp-2.1/share/gimp/2.1/menus/image-menu.xml for /image-menubar Enter gtk_ui_manager_get_widget: 2.48 Leave gtk_ui_manager_get_widget: 2.79 Enter image_menu_setup: 2.79 Enter g_tree_foreach: 2.8 Leave g_tree_foreach: 2.85 Leave image_menu_setup: 2.85 Leave gimp_display_new: 2.87 Leave gui_create_display: 2.89 Leave gimp_create_display: 2.89 Leave image_duplicate_cmd_callback: 2.89 The conclusion is that the bulk of the delay is attributable to two functions, gtk_ui_manager_get_widget and gtk_ui_manager_add_ui. Note: all this testing has been done in vanilla Fedora Core 2.
These numbers seem to give enough evidence that the performance problems are in GtkUIManager so I will reassign the bug report accordingly. Of course profiling data for GTK+ would be very helpful and we should assist the GTK+ developers with this.
*** Bug 143841 has been marked as a duplicate of this bug. ***
Created attachment 29051 [details] profiling log (brief) after recompiling gtk+ with CFLAGS=-pg, and recompiling GIMP with --enable-profile, I opened Gimp, created a new image with default 256x256 size, and quitted. i ran gprof with the -p -q options, and cut off the info for functions with < 5% time. it looks like a lot of time is spent in gtk_hwrap_box_size_request (this accounts for all the get_child_requisition and get_layout_size calls.) i'm not sure how to interpret the results beyond that.
These results don't seem to be usable at all. You will have to create a number of image window in order to get reasonable profiling data.
Created attachment 29069 [details] updated profiling log this is even more strange, it makes very little sense to me.
oops.. those results are from creating 10 256x256 images. no swapping occurred (this machine has 576mb of memory).
This doesn't look like you linked in gtk+ and glib statically. If you want to get the profiling data we need here you will have compile glib and gtk+ with -pg and link them statically into the gimp-2.1 executable.
I don't know how. I've rebuilt and reinstalled glib and gtk+ with --enable-static, and rebuilt and reinstalled gimp with --enable-static, but it is obviously still using dynamic linking, and i don't know how to tell it otherwise.
If you specify --enable-static, all you ask for is to have .a files built that you can use for static linking. You will have to link gimp manually. Basically just a matter of using the generated linker command, changing-lglib to <path>/libglib.a and the like.
*** Bug 153071 has been marked as a duplicate of this bug. ***
I found the following remark from Soeren Sandmann on this bug. I hope he doesn't mind that I add it here since I think it might be useful for people trying to figure out what exactly is causing the bad performance: A lot of time is spend in handler_list_insert(), which is linear in the number of handlers. What's going on here is that a particular object, a GtkSettings object, is getting 'popular' in that it has a lot of signal handlers connected to it.
Eeek! But at least it's trivial to make it faster by using a better data structure.
Nathan, would you mind to explain which data structure you think needs to be improved and why this is a trivial problem? Actually I was under the impression that it would be that simple before I found Soeren's remark. His remark makes me think that the problem is not as trivial to fix as I hoped it would be.
Well, the problem is in handler_insert() where a new connection is inserted into a list sorted by when the handlers should run (ie. first "!after" handlers, then "after" handlers). What happens in this case is that a lot of !after handlers are inserted, and searching for the end of the !after handlers is what is taking the time. Just caching the point where we go from !after to after handlers should be enough to fix this. Not totally trivial to do, but also not rocket science. Note that this would most likely not be a complete fix. handler_insert() only accounts for 11% of the time. It's worth doing, but likely more fixes are needed.
Sounds to me like what you need is two lists, one for !after and one for after handlers. You would also store a pointer to the end of the lists. That way you have O(1) inserts and still have O(n) traversal.
Created attachment 31918 [details] [review] patch to optimize g_signal_connect() This patch introduces a pointer to the last !after signal in the list of signal handlers. It makes g_signal_connect() for !after signals O(1) and also speeds up connecting after signals because the list traversal can start after the !after signals. The patch does however introduce some overhead to handler_unref_R() so it should probably not be applied until this part has been improved.
Søren, since it seems you've done some profiling. Could you tell us what other places would be worth to look at for possible optimizations? Thanks.
The time is spent inside update_node() in gtkuimanager.c. It looks like each call to gtk_ui_manager_get_widget() through gtk_ui_manager_ensure(), do_updates() and update_node() traverses the entire UI tree searching for dirty nodes. One way to fix this might be to add an extra guint has_dirty_descendant : 1; to the Node structure, and then when a node is marked dirty, also set has_dirty_descendant on all its ancestors. That way update_node() can be fast in the common case where no nodes are dirty. Disclaimer: I am shooting from the hip; the above could be completely wrong.
Actually it looks like something is making some of the nodes dirty, because half the time inside update_node() is spent creating and inserting menu items into menus. That is on a test where I created several new images in gimp 2.1. So that's another thing worth figuring out: what causes the UI to become dirty between two New Image?
There is a UI manager created for each image's menubar.
After speaking to Tim Janik on #gimp, I have opened a glib bug-report for the performance problems in g_signal_connect_data(). Further discussion on this particular detail should be happening in bug #153727. Let's focus on the improvements to the GTK+ code (and perhaps also the GIMP code) here.
Comment on attachment 31918 [details] [review] patch to optimize g_signal_connect() patch is obsolete, a newer version is attached to bug #153727
Created attachment 32652 [details] GIMP profiling data with GTK+ linked statically (part 1) This is a profiling log which results from starting the GIMP, opening 12 images, and closing the GIMP. As Soeren says, most fo the time is spent in update_node. The key passage of code is in update_smart_separators, which traverses the tree after each insertion , making the creation of a menu with N entries O(N^2) I think. Would it be possible to call this function once, after all menu items have been inserted? That is, in some way have a way to indicate that we are finished, and update smart separators afterwards? The profiling file in its entirity is 3.3M, so I'm attaching it in 500K chunks.
Created attachment 32654 [details] GIMP profiling data with GTK+ linked statically (part 2)
Created attachment 32655 [details] GIMP profiling data with GTK+ linked statically (part c)
Created attachment 32656 [details] GIMP profiling data with GTK+ linked statically (part 4)
Created attachment 32657 [details] GIMP profiling data with GTK+ linked statically (part V)
Created attachment 32658 [details] GIMP profiling data with GTK+ linked statically (part 6)
Created attachment 32660 [details] GIMP profiling data with GTK+ linked statically (part g) That's it, folks - these were generated with split, so cat should be enough to get them all back to the original. Save the chunks with a common prefix, followed by [a-g] or [1-7] and cat prefix* > gimp_prof.txt Hope this is useful.
A few things stand out: 25.34 14.27 14.27 243697 0.06 0.09 update_smart_separators (A quarter of the time!) 0.99 32.38 0.56 10386890 0.00 0.00 gtk_separator_menu_item_get_type 0.61 39.31 0.35 10386440 0.00 0.00 gtk_container_children_callback 0.73 37.43 0.41 8463330 0.00 0.00 gtk_separator_tool_item_get_type 0.28 44.54 0.16 6316823 0.00 0.00 gtk_tearoff_menu_item_get_type (Too many calls, i.e., a sign of work that probably need not be done) Note, that gtk_container_children_callback involves memory allocation so this could be a real killer.
This stack trace from Dave Neary shows the problem nicely:
+ Trace 51161
update_smart_separators() calls _gtk_action_sync_menu_visible() on a submenu's attach widget, which is the menu item on the parent menu. This subsequently sets the visibility of the menu item to which the submenu is attached, which causes update_smart_separators() to be called on the parent menu again.
Stack frames #6 through #8 do not look believable. Could you add debug information for glib, please? _gtk_action_sync_menu_visible always calls g_object_set. Does it always need to? (I.e., is there significant fraction of those calls that do nothing because "visible" isn't changing?)
I have installed glib2.0-dbg, so there should be debugging symbols in libglib... but you're right, from the looks of things they're not there. I will try to find time to compile a glib from source this week, but I can't promise anything.
Mitch patched _gtk_action_sync_menu_visible() not to call g_object_set() if the item is already shown/hidden, and he said there was not much difference. But indeed, not calling g_object_set() if the widget is already in the desired visibility state would be a small win.
Can we commit that change to _gtk_action_sync_menu_visible() then? Any change that improves performance and doesn't add complexity to the code should IMO be applied.
I have committed them now.
Created attachment 33655 [details] [review] mark subtrees dirty Here is a patch which does what federico proposed. The dirty flag is always propagated up the tree, and update_node () doesn't descent into clean subtrees. Please test. I didn't feel a huge difference on my machine, but the patch considerably reduced the number of calls to update_nodes and update_smart_separators: update_node calls (calls on dirty nodes) unpatched patched until Gimp main window 200646 (1123) 9890 (2603) until first image window 357544 (2057) 17467 (4376) update_smart_separators calls unpatched patched until Gimp main window 23907 994 until first image window 42979 1565
Here are some numbers from a hacked GimpDisplayShell that creates 20 UI managers instead of one: 3 runs without the patch applied: gimp_display_shell_new: 20 ui managers took 11.116945 seconds gimp_display_shell_new: 20 ui managers took 11.087124 seconds gimp_display_shell_new: 20 ui managers took 11.040364 seconds and with the patch applied: gimp_display_shell_new: 20 ui managers took 5.270503 seconds gimp_display_shell_new: 20 ui managers took 5.268469 seconds gimp_display_shell_new: 20 ui managers took 5.286084 seconds I'd say that's a *major* speedup :) Great hack Matthias! (note that I also had the gsignal.c patch applied in both runs so the relative speedup of this one is more visible).
Tests without timings here, and the result is indeed impressive. Windows are opening slower than in 2.0, but within 2-3 seconds of the plug-in finishing, instead of the 10-15 seconds previously. Plus, performance is not degrading the more images I open any more - the 10th seemed to open in roughly the same time as the 1st. Perhaps it's not perfect yet, but this is certainly worth committing IMHO. Remember to decllare variables at the start of a block though ;)
I took another look at the problem with the huge number of GtkSettings connections, and found that GtkImageMenuItem and GtkButton connect one signal handler per instance to GtkSettings::notify. Here is a patch which takes a different approach by keeping all instances in a hashtable, and only connect one handler which loops over the instances. This should not only reduce the connect/disconnect overhead, it should also save space, since we allocate a Handler struct (3 longs + 3 pointers) and a Closure (1 long + 4 pointer) per signal connection, while a GHashNode is only 3 pointers.
Created attachment 33753 [details] [review] the patch
Can you check if this gives you an additional speedup ?
A slight one. Here's a run of the GIMP (which depends on my reactions too, unless the idle loop registers as 0 time - and these are completely useless figures if you're looking for the time on this machine that it takes to construct the menu): Unpatched: dave@bolsh:~/test$ time /usr/local/gimp-dev/bin/gimp-2.2 *.jpg real 5m12.801s user 4m26.510s sys 0m3.420s Patched: dave@bolsh:~/test$ time /usr/local/gimp-dev/bin/gimp-2.2 *.jpg real 4m51.217s user 3m32.070s sys 0m4.050s That's a saving here of a second per image (there were 20 in the sample). All I did was start the gimp, wait for the last image to open its window, then close the main toolbox and wait for everything to clean itself up. Another run gave these results: dave@bolsh:~/test$ time /usr/local/gimp-dev/bin/gimp-2.2 *.jpg real 4m32.395s user 3m18.730s sys 0m4.070s I don't really know what the difference might be, but that showed another speedup, showing just how unreliable these timings are ;) Dave.
2004-11-15 Matthias Clasen <mclasen@redhat.com> * gtk/gtkbutton.c: * gtk/gtkimagemenuitem.c: Don't use one signal handler per instance to track changes to the gtk-menu-images and gtk-button-images settings. Instead use a single handler which iterates over the widget hierarchy. (#143668)
*** Bug 159154 has been marked as a duplicate of this bug. ***
Performance seems to be acceptable now.
*** Bug 160566 has been marked as a duplicate of this bug. ***