GNOME Bugzilla – Bug 159001
half-finished gmarkup optimizations/cleanups
Last modified: 2011-02-18 16:09:33 UTC
The attached patches are cumulative; each one contains the changes of the previous. The first one has mostly small micro-optimizations of gmarkup.c that should be self-explanatory. Overall, these changes make a big difference in callgrind but are lost in the noise if you profile with wall clock time.

There's one small possible bugfix:

  - if (*p == '\n')
  + if (p != text_end && *p == '\n')

The change to remove char counting from advance_char() breaks error messages, so it probably shouldn't go in. advance_char() is called over a million times in "gconftool-2 -R /", which adds up according to callgrind, so I was trying to make it cheaper somehow. The utf8_next_char() and return_if_fail() are really the expensive parts per callgrind, though. Removing the return_if_fail() is, I think, appropriate.

I'm not sure my find_current_text_end() change is correct, but it looks plausible to me and saves a scan over the entire text to be parsed.

The second patch includes the first set of changes, and also refactors unescape_text() into multiple functions. I did this to simplify profiling, but then the compiler re-inlined everything, so the patch has __attribute__((noinline)) for now; that would have to come out. I think this refactor makes the code nicer anyhow.

I think the last patch just adds a minor rearrangement of unescape_text_state_inside_text().

The bottlenecks in the callgrind profile at this point are:

- 60% g_markup_parse_context_parse(), breaking down into:
  - 30% (of total gconftool run) g_utf8_validate()
  - 15% "" unescape_text()

For gconf, the only obvious hack to dramatically cheapen g_utf8_validate() (special-casing ASCII) doesn't work, since most of the data is non-ASCII. The core problem with unescape_text() seems to be building the GString, so again pretty unavoidable.

So anyhow, for gconf this is all more or less a waste of time; we need to just not parse all these xml files. But maybe some of this is worthwhile to keep around.
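To make the one-line bugfix concrete, here is a minimal self-contained sketch of the pattern being guarded, assuming a newline-counting loop like the one in advance_char(); names and structure are hypothetical, not the actual gmarkup code. Without the `p != text_end` test, the loop would read one byte past the end of a buffer that doesn't happen to be NUL-terminated.

```c
#include <stddef.h>

/* Hypothetical sketch of the bounds-check fix: never dereference p
   once it has reached text_end. */
static int
count_newlines (const char *text, size_t len)
{
  const char *p = text;
  const char *text_end = text + len;
  int lines = 0;

  while (p != text_end)
    {
      p++;
      /* the fix: check the bound before looking at *p */
      if (p != text_end && *p == '\n')
        lines++;
    }
  return lines;
}
```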
Created attachment 34004 [details] [review] microoptimization patch
Created attachment 34005 [details] [review] adds rearranging unescape_text
Created attachment 34006 [details] [review] adds some other minor change or two
Oops, the profile data there: I was looking at the wrong thing. The optimized version brings the total "Instruction Fetch" count down from 304 million to 212 million for the whole program. The big drops in instructions are from:

- cutting down advance_char()
- avoiding unichar_isalnum()
- avoiding the find_next_char() scan
- the sum of the other small changes

But again, there's no real wall clock time difference.
Of the 212 million, 58 million are UTF-8 validation and 30 million are unescape_text(); 184 million are some part of markup_parse_context(). Memory allocation seems to be the largest part of parse_context().
As in any performance patch, it would be useful to have a benchmark and simple input.

30% of time in g_utf8_validate() certainly makes me wonder if you are validating the same text many times, since g_utf8_validate() is reasonably speedy last time I checked ... it should be faster than actually parsing XML. But I guess

  $ du /etc/gconf/schemas/
  15216   /etc/gconf/schemas/

is a lot of text to validate. (And a lot of IO.) If we are actually reading that much.

g_utf8_validate() probably could be optimized quite a bit by noting that many of the checks can be dropped for length == 1 or length == 2:

- UNICODE_VALID() is always true when len == 1 or len == 2.
- For length == 1, the overlong check never triggers.
- For length == 2, the overlong check ends up being (off the top of my head, might need checking)

so you could add a

  if (!(len == 1 || (len == 2 && (*p & 0x1c) != 0)))
    {
    }

around the bulk of the inner loop. This would slow down CJK a bit, of course, but even CJK text has some ASCII in it.
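As a self-contained illustration of that fast-path idea, here is a sketch assuming a plain decode-and-check UTF-8 loop. This is not GLib's actual g_utf8_validate(), the names are hypothetical, and the 2-byte overlong test uses `(c & 0x1e)` (lead byte 0xC0 or 0xC1) rather than the 0x1c from the comment above, which was offered as unchecked. The point is that the range, surrogate, and overlong checks only run for sequences of length 3 or more.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator with the short-sequence fast path:
   for 1-byte sequences no extra checks are needed, and for
   2-byte sequences only the overlong check can fail. */
static bool
validate_utf8 (const unsigned char *p, size_t len)
{
  const unsigned char *end = p + len;

  while (p < end)
    {
      unsigned char c = *p;
      int seq_len;
      unsigned long ch;

      if (c < 0x80)                               /* ASCII */
        {
          p++;
          continue;
        }
      else if ((c & 0xe0) == 0xc0) { seq_len = 2; ch = c & 0x1f; }
      else if ((c & 0xf0) == 0xe0) { seq_len = 3; ch = c & 0x0f; }
      else if ((c & 0xf8) == 0xf0) { seq_len = 4; ch = c & 0x07; }
      else
        return false;                             /* bad lead byte */

      if ((size_t) (end - p) < (size_t) seq_len)
        return false;                             /* truncated */

      for (int i = 1; i < seq_len; i++)
        {
          if ((p[i] & 0xc0) != 0x80)
            return false;                         /* bad continuation */
          ch = (ch << 6) | (unsigned long) (p[i] & 0x3f);
        }

      if (seq_len == 2)
        {
          /* the only short-sequence check that can fail:
             overlong, i.e. lead byte 0xC0 or 0xC1 */
          if ((c & 0x1e) == 0)
            return false;
        }
      else
        {
          if (ch < (seq_len == 3 ? 0x800UL : 0x10000UL))
            return false;                         /* overlong */
          if (ch > 0x10FFFFUL || (ch >= 0xD800UL && ch <= 0xDFFFUL))
            return false;                         /* range / surrogate */
        }

      p += seq_len;
    }
  return true;
}
```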
Note that it isn't 30% of *time* in utf8_validate, it's 58 of 212 million instruction fetches... there are just over a million calls to advance_char(), so I'm guessing about a million chars, and an average of 58 instructions in utf8_validate sounds plausible to me. I bet it's well under 30% of time, since utf8_validate is a relatively focused loop rather than code sprawl, but who knows.

I experimented with special-casing ASCII, first in dbus and then for this gconf thing. In _dbus_string_validate_utf8() (a cut and paste of g_utf8_validate) I just put

  if (c < 128) { ++p; continue; }

at the top of the loop, as it looked to me like all the other checks were irrelevant to ascii. This helped quite a bit for dbus, where most of the utf8 is just method names. I also tweaked the UTF8_ macros slightly in dbus, but that could be trickier in glib, plus it probably doesn't help. Special-casing ASCII doesn't do anything useful for gconf, though, because most of the data isn't ASCII.

I also tried one slightly different approach for the ASCII special case, which was to add a tight ASCII-only loop in front of the current loop, leaving the current loop unmodified. Something like:

  while (*p < 128 && *p) ++p;

(don't remember the details). When that ends, it just drops to the current loop. So the idea is that a 100% ASCII file is superfast, but as soon as you see a non-ASCII char it bails out. This could be slightly faster than adding the ASCII check *inside* the current loop for both 100% ASCII files and 100% non-ASCII files, but slower for mixed files.

Note that I'm not really lobbying for any of these changes; I just spent some time on it, so I thought I'd put the info in bugzilla. For the most part the conclusion is "no big wins available here," though I think some of the gmarkup changes are nice cleanups.
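The "tight ASCII loop in front" idea can be sketched as a small helper that consumes a leading run of ASCII bytes and reports where the existing, unmodified validation loop should take over. This is a hypothetical stand-alone version, not code from the patch or from dbus:

```c
#include <stddef.h>

/* Consume a leading run of non-NUL ASCII bytes cheaply; the caller
   hands whatever remains to the full validation loop.  A 100% ASCII
   buffer never leaves this loop. */
static const unsigned char *
skip_ascii_run (const unsigned char *p, const unsigned char *end)
{
  while (p < end && *p != 0 && *p < 0x80)
    p++;
  return p;   /* first non-ASCII or NUL byte, or end */
}
```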
Created attachment 34036 [details] quick benchmark hack

Here is a quick benchmark, created by doing violence to markup-test.c. Now for some good test data... I'll look into creating sets of files with the following features being present/absent:

- non-ascii text
- attributes
- entity references
- deep nesting
Created attachment 34040 [details] a new version

Here is a new version of the benchmark. It also times g_utf8_validate, memcpy and strlen.
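The attachment itself isn't shown here, but a stand-alone sketch of that kind of baseline timing, using clock() instead of GLib's timers, could look like the following. All names are hypothetical; the real benchmark also drove g_markup_parse_context_parse() over the same data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical micro-benchmark baseline: time repeated strlen() and
   memcpy() passes over one buffer, as a cheap floor to compare the
   parser's cost against.  Returns elapsed seconds, or -1.0 on error. */
static double
time_passes (size_t bufsize, int passes)
{
  char *buf = malloc (bufsize + 1);
  char *copy = malloc (bufsize + 1);
  if (buf == NULL || copy == NULL)
    return -1.0;

  memset (buf, 'x', bufsize);
  buf[bufsize] = '\0';

  clock_t start = clock ();
  size_t total = 0;
  for (int i = 0; i < passes; i++)
    {
      total += strlen (buf);          /* baseline scan */
      memcpy (copy, buf, bufsize);    /* baseline copy */
    }
  clock_t stop = clock ();

  free (buf);
  free (copy);

  /* use total so the loop isn't optimized away */
  if (total != (size_t) passes * bufsize)
    return -1.0;
  return (double) (stop - start) / CLOCKS_PER_SEC;
}
```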
Created attachment 34042 [details] some input files

Here are a number of inputs:

bench1-*: simple ascii text, no attributes, no deep hierarchy
bench2-*: simple ascii text, attributes, no deep hierarchy
bench3-*: simple ascii text, no attributes, deep hierarchy
bench4-*: non-ascii text, no attributes, no deep hierarchy

*-1.xml: 100 elements containing text
*-2.xml: 1000 elements containing text
*-3.xml: 10000 elements containing text
*-4.xml: 100000 elements containing text
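The generator for these files isn't attached. Purely to illustrate the shape of the flat "bench1" style data, a hypothetical in-memory generator might look like this (function name and element names are made up):

```c
#include <stdio.h>
#include <stdlib.h>

/* Build a flat document: one root element containing n children with
   short ASCII text.  Caller frees the returned buffer. */
static char *
make_bench1 (int n)
{
  size_t cap = 32 + (size_t) n * 32;   /* generous per-item budget */
  char *buf = malloc (cap);
  size_t off = 0;

  if (buf == NULL)
    return NULL;

  off += (size_t) snprintf (buf + off, cap - off, "<root>\n");
  for (int i = 0; i < n; i++)
    off += (size_t) snprintf (buf + off, cap - off,
                              "<item>text %d</item>\n", i);
  snprintf (buf + off, cap - off, "</root>\n");
  return buf;
}
```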
To figure out whether g_utf8_validate() is the right target here at all, I commented out the g_utf8_validate() call and compared the timings. The result is that g_utf8_validate() takes up between 3 and 21 percent of the parsing time, depending on the data being parsed (the 21% was with bench4-4.xml from above). So it doesn't look like optimizing g_utf8_validate() into a mere nothing will give us spectacular savings for g_markup...
2004-11-28  Matthias Clasen  <mclasen@redhat.com>

	* glib/gmarkup.c: Optimizations; don't scan the entire text
	in find_current_text_end(), split unescape_text() into
	multiple functions.  (#159001, Havoc Pennington)
/me does not like. We have...

  #define g_ascii_isalnum(c) \
    ((g_ascii_table[(guchar) (c)] & G_ASCII_ALNUM) != 0)

...so that should probably not be called for non-ascii (unless you happen not to depend on the result in that case). The real[tm] way to do this is something like...

  static gboolean
  is_name_start_char (gunichar c)
  {
    return (c <= 0x7f)
      ? g_ascii_isalpha (c) || c == '_' || c == ':'
      : g_unichar_isalpha (c);
  }

...or with a special-cased version of g_ascii_isalpha. While we're poking there, make char_str take an 8-byte buffer (which is faster to clear). Wanna bet that __attribute__ ((noinline)) is going to cause problems for non-gcc compilers?
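For reference, a compilable version of that sketch, with libc isalpha() standing in for g_ascii_isalpha() and the non-ASCII branch reduced to "accept" so it builds without GLib (the real function would call g_unichar_isalpha() there):

```c
#include <stdbool.h>
#include <ctype.h>

typedef unsigned int gunichar;   /* stand-in for GLib's gunichar */

/* Dispatch on ASCII vs non-ASCII so the ASCII table is never indexed
   with a non-ASCII value. */
static bool
is_name_start_char (gunichar c)
{
  if (c <= 0x7f)
    return isalpha ((int) c) || c == '_' || c == ':';
  else
    return true;   /* stand-in for g_unichar_isalpha (c) */
}
```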
Regarding "p = g_utf8_next_char (p);": if you know that *p is ascii, then p++ is a bit faster.

- Thrice in unescape_text_state_inside_text.
- Once in unescape_text_state_after_ampersand.
- Once in unescape_text_state_inside_entity_name.
- Twice in unescape_text_state_after_charref_hash.
Urg, I removed one instance of __attribute__((noinline)) and overlooked the remaining ones...
2004-11-29  Matthias Clasen  <mclasen@redhat.com>

	* glib/gmarkup.c: Remove leftover noinline attributes.
	(is_name_start_char, is_name_char): Avoid possible reads
	beyond the end of g_ascii_table.

Not worth optimizing char_str, which is only used in error paths.