GNOME Bugzilla – Bug 650211
[PATCH] Optimization in key file parsing
Last modified: 2011-05-17 03:54:15 UTC
Created attachment 187832 [details] [review]
Patch

The current key file parsing code makes a copy of each key and value string and then frees the originals. The attached patch uses the original strings instead, which shows a 13% reduction in CPU cycles in my testing.
Created attachment 187833 [details] [review]
Patch #2

Here is an additional patch that replaces a character-by-character search in g_key_file_parse_data() with a memchr()-based search. This saves an additional 5% of CPU time.
Can you explain a bit about how you are doing your profiling, what your test case is etc?
Sure, the test case is GTK's icon theme initialization, which loads /usr/share/icons/gnome/index.theme. The procedure is:

valgrind --tool=callgrind --collect-atstart=no --toggle-collect=g_key_file_load_from_file audacious

Audacious doesn't call g_key_file_* functions itself, so this is GTK's own usage of the code.
I think it's fairly clear from the code why the optimizations help, though:

1. Dynamic memory allocation (i.e. malloc() and free()) is relatively slow if you're doing it very often, e.g. several times on each line of a key file, and should be avoided where possible.

2. Byte-by-byte processing should be done using C library functions (e.g. strlen(), memchr(), etc.) where possible, since these functions are generally optimized in assembler for particular machines.
Review of attachment 187832 [details] [review]:

Patch is straightforward, does a reasonable optimization and also does a small code cleanup. Looks great, thanks.

In the future, git format-patch output would be preferred so the commit message and authorship information are taken care of automatically. A tool that makes it easy to post patches straight from git to bugzilla in the ideal format is git-bz.

I went with this commit message:

    keyfile: avoid needless allocations on file load

    When loading a key file, the keys and values of individual lines are
    allocated once when copied and trimmed from the parse buffer, and
    allocated/copied again when added to the lookup map. This commit
    avoids the second pair of allocations by introducing a new function
    g_key_file_add_key_value_pair that gives the lookup map direct
    ownership of the key and value copied from the parse buffer.

    https://bugzilla.gnome.org/show_bug.cgi?id=650211
Review of attachment 187833 [details] [review]:

The idea makes a lot of sense, especially in the face of glibc's optimized memchr and memcpy routines. There are a few things I changed for clarity which I will discuss below.

This is the commit message I went with:

http://git.gnome.org/browse/glib/commit/?id=aeac5de2f8eeaadcdc7e021694d92470834fece8

    keyfile: fill parse buffer in line sized chunks

    When loading a keyfile the incoming bytes are fed to a line buffer
    to get parsed each time a new line is encountered. The code that
    fills the line buffer does it inefficiently, one byte at a time.
    This commit changes that code to look ahead at the incoming bytes
    for the next '\n' character and then fill the line buffer all at
    once.

    https://bugzilla.gnome.org/show_bug.cgi?id=650211

Thanks for the patch!

::: gkeyfile.c.1
@@ +992,3 @@
         else
+          {
+            /* g_string_append_c (key_file->parse_buffer, data[i]); */

This line should just get erased instead of commented out.

@@ +994,3 @@
+            /* g_string_append_c (key_file->parse_buffer, data[i]); */
+            const gchar *c = memchr (data + i, '\n', length - i);
+            gsize n = (c != NULL) ? c - (data + i) : length - i;

This is all a little too scrunched up. For readability, it's probably better to split things up into multiple lines and use more descriptive variable names.

@@ +996,3 @@
+            gsize n = (c != NULL) ? c - (data + i) : length - i;
+            g_string_append_len (key_file->parse_buffer, data + i, n);
+            i += n - 1;

Adding n - 1 doesn't really make logical sense. We processed n bytes, not n - 1; the - 1 is just compensating for the fact that the loop increments i at the end of each iteration. That increment made sense when both sides of the loop body processed one byte at a time. Now that one side processes more than one byte per iteration, it's probably better to change the loop to not increment i automatically and instead add the actual number of bytes processed to i all in one go.