GNOME Bugzilla – Bug 159001
half-finished gmarkup optimizations/cleanups
Last modified: 2011-02-18 16:09:33 UTC
The attached patches are cumulative; each one contains the changes of the previous. The first one has mostly small micro-optimizations of gmarkup.c that should be self-explanatory. Overall, these changes make a big difference in callgrind but are lost in the noise if you profile with wall clock time.

There's one small possible bugfix:

  - if (*p == '\n')
  + if (p != text_end && *p == '\n')

The change to remove char counting from advance_char() breaks error messages, so it probably shouldn't go in. advance_char() is called over a million times in "gconftool-2 -R /", which adds up according to callgrind, so I was trying to make it cheaper somehow. The utf8_next_char() and return_if_fail() are really the expensive parts per callgrind, though. Removing the return_if_fail() is, I think, appropriate.

I'm not sure my find_current_text_end() change is correct, but it looks plausible to me and saves a scan over the entire text to be parsed.

The second patch includes the first set of changes, and also refactors unescape_text() into multiple functions. I did this to simplify profiling, but then the compiler re-inlined everything, so the patch has __attribute__((noinline)) for now; that would have to come out. I think this refactor makes the code nicer anyhow.

I think the last patch just adds a minor rearrangement of unescape_text_state_inside_text().

The bottlenecks in the callgrind profile at this point are:

- 60% g_markup_parse_context_parse(), breaking down into:
  - 30% (of total gconftool run) g_utf8_validate()
  - 15% "" unescape_text()

For gconf, the only obvious hack to dramatically cheapen g_utf8_validate() (special-casing ASCII) doesn't work, since most of the data is non-ASCII. The core problem with unescape_text() seems to be building the GString, so again pretty unavoidable.

So anyhow, for gconf this is all more or less a waste of time; we need to just not parse all these xml files. But maybe some of this is worthwhile to keep around.
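To make the one-line bugfix concrete, here is a minimal self-contained sketch of the pattern being guarded, assuming a newline-counting loop like the one in advance_char(); names and structure are hypothetical, not the actual gmarkup code. Without the `p != text_end` test, the loop would read one byte past the end of a buffer that doesn't happen to be NUL-terminated.

```c
#include <stddef.h>

/* Hypothetical sketch of the bounds-check fix: never dereference p
   once it has reached text_end. */
static int
count_newlines (const char *text, size_t len)
{
  const char *p = text;
  const char *text_end = text + len;
  int lines = 0;

  while (p != text_end)
    {
      p++;
      /* the fix: check the bound before looking at *p */
      if (p != text_end && *p == '\n')
        lines++;
    }
  return lines;
}
```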
Created attachment 34004 [details] [review] microoptimization patch
Created attachment 34005 [details] [review] adds rearranging unescape_text
Created attachment 34006 [details] [review] adds some other minor change or two
Oops, the profile data there: I was looking at the wrong thing. The optimized version brings the total "Instruction Fetch" count down from 304 million to 212 million for the whole program. The big drops in instructions are from:

- cutting down advance_char()
- avoiding unichar_isalnum()
- avoiding the find_next_char() scan
- the sum of the other small changes

But again, there's no real wall clock time difference.
Of the 212 million, 58 million are UTF-8 validation and 30 million are unescape_text(); 184 million are some part of markup_parse_context(). Memory allocation seems to be the largest part of parse_context().
As in any performance patch, it would be useful to have a benchmark and simple input.

30% of time in g_utf8_validate() certainly makes me wonder if you are validating the same text many times, since g_utf8_validate() is reasonably speedy last time I checked ... it should be faster than actually parsing XML. But I guess

  $ du /etc/gconf/schemas/
  15216   /etc/gconf/schemas/

is a lot of text to validate. (And a lot of IO.) If we are actually reading that much.

g_utf8_validate() probably could be optimized quite a bit by noting that many of the checks can be dropped for length == 1 or length == 2:

- UNICODE_VALID() is always true when len == 1 or len == 2.
- For length == 1, the overlong check never triggers.
- For length == 2, the overlong check ends up being (off the top of my head, might need checking)

so you could add a

  if (!(len == 1 || (len == 2 && (*p & 0x1c) != 0)))
    {
    }

around the bulk of the inner loop. This would slow down CJK a bit, of course, but even CJK text has some ASCII in it.
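As a self-contained illustration of that fast-path idea, here is a sketch assuming a plain decode-and-check UTF-8 loop. This is not GLib's actual g_utf8_validate(), the names are hypothetical, and the 2-byte overlong test uses `(c & 0x1e)` (lead byte 0xC0 or 0xC1) rather than the 0x1c from the comment above, which was offered as unchecked. The point is that the range, surrogate, and overlong checks only run for sequences of length 3 or more.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator with the short-sequence fast path:
   for 1-byte sequences no extra checks are needed, and for
   2-byte sequences only the overlong check can fail. */
static bool
validate_utf8 (const unsigned char *p, size_t len)
{
  const unsigned char *end = p + len;

  while (p < end)
    {
      unsigned char c = *p;
      int seq_len;
      unsigned long ch;

      if (c < 0x80)                               /* ASCII */
        {
          p++;
          continue;
        }
      else if ((c & 0xe0) == 0xc0) { seq_len = 2; ch = c & 0x1f; }
      else if ((c & 0xf0) == 0xe0) { seq_len = 3; ch = c & 0x0f; }
      else if ((c & 0xf8) == 0xf0) { seq_len = 4; ch = c & 0x07; }
      else
        return false;                             /* bad lead byte */

      if ((size_t) (end - p) < (size_t) seq_len)
        return false;                             /* truncated */

      for (int i = 1; i < seq_len; i++)
        {
          if ((p[i] & 0xc0) != 0x80)
            return false;                         /* bad continuation */
          ch = (ch << 6) | (unsigned long) (p[i] & 0x3f);
        }

      if (seq_len == 2)
        {
          /* the only short-sequence check that can fail:
             overlong, i.e. lead byte 0xC0 or 0xC1 */
          if ((c & 0x1e) == 0)
            return false;
        }
      else
        {
          if (ch < (seq_len == 3 ? 0x800UL : 0x10000UL))
            return false;                         /* overlong */
          if (ch > 0x10FFFFUL || (ch >= 0xD800UL && ch <= 0xDFFFUL))
            return false;                         /* range / surrogate */
        }

      p += seq_len;
    }
  return true;
}
```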
Note that it isn't 30% of *time* in utf8_validate, it's 58 of 212 million instruction fetches... there are just over a million calls to advance_char(), so I'm guessing about a million chars, and an average of 58 instructions in utf8_validate sounds plausible to me. I bet it's well under 30% of time, since utf8_validate is a relatively focused loop rather than code sprawl, but who knows.

I experimented with special-casing ASCII, first in dbus and then for this gconf thing. In _dbus_string_validate_utf8() (a cut and paste of g_utf8_validate) I just put

  if (c < 128) { ++p; continue; }

at the top of the loop, as it looked to me like all the other checks were irrelevant to ascii. This helped quite a bit for dbus, where most of the utf8 is just method names. I also tweaked the UTF8_ macros slightly in dbus, but that could be trickier in glib, plus it probably doesn't help. Special-casing ASCII doesn't do anything useful for gconf, though, because most of the data isn't ASCII.

I also tried one slightly different approach for the ASCII special case, which was to add a tight ASCII-only loop in front of the current loop, leaving the current loop unmodified. Something like:

  while (*p < 128 && *p) ++p;

(don't remember the details). When that ends, it just drops to the current loop. So the idea is that a 100% ASCII file is superfast, but as soon as you see a non-ASCII char it bails out. This could be slightly faster than adding the ASCII check *inside* the current loop for both 100% ASCII files and 100% non-ASCII files, but slower for mixed files.

Note that I'm not really lobbying for any of these changes; I just spent some time on it, so I thought I'd put the info in bugzilla. For the most part the conclusion is "no big wins available here," though I think some of the gmarkup changes are nice cleanups.
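The "tight ASCII loop in front" idea can be sketched as a small helper that consumes a leading run of ASCII bytes and reports where the existing, unmodified validation loop should take over. This is a hypothetical stand-alone version, not code from the patch or from dbus:

```c
#include <stddef.h>

/* Consume a leading run of non-NUL ASCII bytes cheaply; the caller
   hands whatever remains to the full validation loop.  A 100% ASCII
   buffer never leaves this loop. */
static const unsigned char *
skip_ascii_run (const unsigned char *p, const unsigned char *end)
{
  while (p < end && *p != 0 && *p < 0x80)
    p++;
  return p;   /* first non-ASCII or NUL byte, or end */
}
```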
Created attachment 34036 [details] quick benchmark hack

Here is a quick benchmark, created by doing violence to markup-test.c. Now for some good test data... I'll look into creating sets of files with the following features being present/absent:

- non-ascii text
- attributes
- entity references
- deep nesting
Created attachment 34040 [details] a new version

Here is a new version of the benchmark. It also times g_utf8_validate, memcpy and strlen.
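The attachment itself isn't shown here, but a stand-alone sketch of that kind of baseline timing, using clock() instead of GLib's timers, could look like the following. All names are hypothetical; the real benchmark also drove g_markup_parse_context_parse() over the same data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical micro-benchmark baseline: time repeated strlen() and
   memcpy() passes over one buffer, as a cheap floor to compare the
   parser's cost against.  Returns elapsed seconds, or -1.0 on error. */
static double
time_passes (size_t bufsize, int passes)
{
  char *buf = malloc (bufsize + 1);
  char *copy = malloc (bufsize + 1);
  if (buf == NULL || copy == NULL)
    return -1.0;

  memset (buf, 'x', bufsize);
  buf[bufsize] = '\0';

  clock_t start = clock ();
  size_t total = 0;
  for (int i = 0; i < passes; i++)
    {
      total += strlen (buf);          /* baseline scan */
      memcpy (copy, buf, bufsize);    /* baseline copy */
    }
  clock_t stop = clock ();

  free (buf);
  free (copy);

  /* use total so the loop isn't optimized away */
  if (total != (size_t) passes * bufsize)
    return -1.0;
  return (double) (stop - start) / CLOCKS_PER_SEC;
}
```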
Created attachment 34042 [details] some input files

Here are a number of inputs:

bench1-*: simple ascii text, no attributes, no deep hierarchy
bench2-*: simple ascii text, attributes, no deep hierarchy
bench3-*: simple ascii text, no attributes, deep hierarchy
bench4-*: non-ascii text, no attributes, no deep hierarchy

*-1.xml: 100 elements containing text
*-2.xml: 1000 elements containing text
*-3.xml: 10000 elements containing text
*-4.xml: 100000 elements containing text
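The generator for these files isn't attached. Purely to illustrate the shape of the flat "bench1" style data, a hypothetical in-memory generator might look like this (function name and element names are made up):

```c
#include <stdio.h>
#include <stdlib.h>

/* Build a flat document: one root element containing n children with
   short ASCII text.  Caller frees the returned buffer. */
static char *
make_bench1 (int n)
{
  size_t cap = 32 + (size_t) n * 32;   /* generous per-item budget */
  char *buf = malloc (cap);
  size_t off = 0;

  if (buf == NULL)
    return NULL;

  off += (size_t) snprintf (buf + off, cap - off, "<root>\n");
  for (int i = 0; i < n; i++)
    off += (size_t) snprintf (buf + off, cap - off,
                              "<item>text %d</item>\n", i);
  snprintf (buf + off, cap - off, "</root>\n");
  return buf;
}
```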
To figure out whether g_utf8_validate() is the right target here at all, I commented out the g_utf8_validate() call and compared the timings. The result is that g_utf8_validate() takes up between 3 and 21 percent of the parsing time, depending on the data being parsed (the 21% was with bench4-4.xml from above). So it doesn't look like optimizing g_utf8_validate() into a mere nothing will give us spectacular savings for g_markup...
2004-11-28  Matthias Clasen  <mclasen@redhat.com>

	* glib/gmarkup.c: Optimizations; don't scan the entire text
	in find_current_text_end(), split unescape_text() into
	multiple functions.  (#159001, Havoc Pennington)
/me does not like. We have...

  #define g_ascii_isalnum(c) \
    ((g_ascii_table[(guchar) (c)] & G_ASCII_ALNUM) != 0)

...so that should probably not be called for non-ascii (unless you happen not to depend on the result in that case). The real[tm] way to do this is something like...

  static gboolean
  is_name_start_char (gunichar c)
  {
    return (c <= 0x7f)
      ? g_ascii_isalpha (c) || c == '_' || c == ':'
      : g_unichar_isalpha (c);
  }

...or with a special-cased version of g_ascii_isalpha. While we're poking there, make char_str take an 8-byte buffer (which is faster to clear). Wanna bet that __attribute__ ((noinline)) is going to cause problems for non-gcc compilers?
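For reference, a compilable version of that sketch, with libc isalpha() standing in for g_ascii_isalpha() and the non-ASCII branch reduced to "accept" so it builds without GLib (the real function would call g_unichar_isalpha() there):

```c
#include <stdbool.h>
#include <ctype.h>

typedef unsigned int gunichar;   /* stand-in for GLib's gunichar */

/* Dispatch on ASCII vs non-ASCII so the ASCII table is never indexed
   with a non-ASCII value. */
static bool
is_name_start_char (gunichar c)
{
  if (c <= 0x7f)
    return isalpha ((int) c) || c == '_' || c == ':';
  else
    return true;   /* stand-in for g_unichar_isalpha (c) */
}
```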
Regarding "p = g_utf8_next_char (p);": if you know that *p is ascii, then p++ is a bit faster.

- Thrice in unescape_text_state_inside_text.
- Once in unescape_text_state_after_ampersand.
- Once in unescape_text_state_inside_entity_name.
- Twice in unescape_text_state_after_charref_hash.
Urg, I removed one instance of __attribute__((noinline)) and overlooked the remaining ones...
2004-11-29  Matthias Clasen  <mclasen@redhat.com>

	* glib/gmarkup.c: Remove leftover noinline attributes.
	(is_name_start_char, is_name_char): Avoid possible reads
	beyond the end of g_ascii_table.

Not worth optimizing char_str, which is only used in error paths.