GNOME Bugzilla – Bug 397439
Performance enhancement patch series
Last modified: 2007-01-17 18:50:34 UTC
Attached are a series of patches aimed at fine tuning the implementation. No major surgery, just the result of lots of profiling and some janitorial work. time hexdump 16M.wvm Before: local 0m27.228s remote 3m25.101s After: local 0m23.257s remote 2m54.964s The really odd thing is that very rarely the hexdump runs at ~4x the speed on the slow system eg remote 0m56.524s. Of course when that happens I'm neither tracing nor profiling, so I have no idea what's going on.
Created attachment 80441 [details] [review] replace fprintf(stderr) with g_printerr
Created attachment 80442 [details] [review] Avoid the run-time checked type casts when possible.
Created attachment 80443 [details] [review] Switch to G_DEFINE_TYPE for the simple cases.
Created attachment 80444 [details] [review] glib zeroes memory on free if G_DEBUG=gc-friendly, so we dont have to.
Created attachment 80445 [details] [review] s/g_array_new(TRUE, TRUE/g_array_new(FALSE, FALSE/ and verify the transformation is safe.
Created attachment 80446 [details] [review] We are rate-limited by the display update timeout, so the input processing timer is redundant
Created attachment 80447 [details] [review] Modify the cell locally before pushing it back to memory.
Created attachment 80448 [details] [review] Introduce a common find_row_data to simplify some code.
Created attachment 80449 [details] [review] s/(VteConv)-1/VTE_INVALID_CONV/
Created attachment 80450 [details] [review] Avoid the double glyph lookup
Created attachment 80451 [details] [review] Micro-optimise the loops inside vteconv
Created attachment 80452 [details] [review] FT2: reuse the transformation of glyph->rgb inside a run
Created attachment 80453 [details] [review] Micro-optimise the loops inside vteseq
Created attachment 80455 [details] [review] Don't load fonts whilst unrealized.
Created attachment 80456 [details] [review] Avoid the switch on matcher type by using a function table (and cache the match_func)
Created attachment 80457 [details] [review] Make the debugging checks inside the ring macros conditional on VTE_DEBUG
Created attachment 80458 [details] [review] Make the debugging checks inside the ring macros conditional on VTE_DEBUG
Created attachment 80459 [details] [review] Some more janitorial work and minor tweaks.
Created attachment 80461 [details] [review] Janitorial work and minor tweaks (refreshed).
Doh! That's a heck of a lot of large patches! Ok, just one thing I spotted. This doesn't look trivially right to me: - vte_reaper_child_watch_destroyed); + (GDestroyNotify)g_object_unref); but it probably does to you. Since I would not have time to go over these bit by bit in this life time, I suggest you commit them all as soon as you feel comfortable doing that, and (after committing them other optimization of yours) get a release out soon and wait for bugs to come in! I also like to see the VTE_BACKEND patch land. Good job!
Darn, if you want to put in a good word with accounts@gnome.org... [2 days and waiting] s/vte_reaper_child_watch_destroyed/g_object_unref/: in switching over to the exemplar GObject singleton code we acquire a new reference for every vte_reaper_get() [which is better for language bindings and using g_object_new(VTE_REAPER_TYPE) directly] hence the new requirement to unref. Which of course points out the missed unref in vte.c
Created attachment 80465 [details] [review] Switch to G_DEFINE_TYPE for the simple cases (updated for missing unref on VteReaper)
Commited: r1429-45.
(In reply to comment #15) > Created an attachment (id=80456) [edit] > Avoid the switch on matcher type by using a function table (and cache the > match_func) Note that this can be a little tricky. A couple years ago I profile pango and saw a switch showing in the profiles and replaced it with a function table, but later I found that the reason the switch was showing is because the functions it calls were being inlined, while now it's impossible to avoid the function call overhead, plus function tables introduce more relocations and .data storage.
Hmm, tricky indeed - many different tradeoffs. In the tests I ran I am reasonably confident (having to judge from time&sysprof&callgrind relative timings) that this represented a performance improvement, and not just from moving inlined code. I'd say a nice thing to have would be something akin to cairo-perf...