GNOME Bugzilla – Bug 312581
Optimise string manipulation in vcard parsing
Last modified: 2018-12-11 12:52:31 UTC
It turns out that vCard parsing is quite a bottleneck in the addressbook. Attaching a patch to: * Reduce the number of g_string_append_unichar() calls (which turn into a memcpy) * Replace _append_ with _insert_ (-1) to save the overhead of another function call and type check * Stop using the return value of g_string_* (as its only returned for convenience)
Created attachment 50230 [details] [review] Optimise string handling
marking new, and setting priority to high because of the patch.
Created attachment 50568 [details] [review] Improved patch This patch rewrites the vCard parser, and doesn't duplicate the string several times before parsing. It needs testing and many eyes. Shame there isn't a decent unit test suite for vCard parsing.
Forgot to attach the benchmark. Parsing a vCard 100000 times with the old code takes 8.4 seconds, and 4.4s with this patch.
I wonder, given that there is no vcard regression test right now, if one can't be created? It would be a good opportunity I think. I would certainly feel more comfortable with a parser change if there was a good verification suite for it.
adding perf keyword
Devashish/Sushma : ?
harish: please don't use NEEDINFO state if input from developers is needed. please use it only for reporters. use ASSIGNED state instead. thanks in advance. :-) devashish/sushma: ping
ross: is that patch already in use at openedhand?
Yes. A derivitive of this patch is on the Nokia 770.
ross, i just asked because you wrote "it needs testing and many eyes". thanks. :-) devashish/sushma: ping! please review the patch...
definitely a 1.8 or 1.9 target.
Can we please get this reviewed/commited?
Latest svn, once #433782 lands, contains a partial fix for this which is very nice. I need to re-sync and refactor this patch.
Any updates? Is the issue resolved? TIA.
Lowering severity since this has been partially fixed already. Are there still relevant bits in this patch?
I tried to do similar changes as Ross in his latest patch (which has at least one bug, a call of g_string_assign (str, ""); after freeing it with g_string_free()), though it didn't do as significant improvement as the Ross' patch in the past. It shrunk parsing of a 2.6KB vCard 100K times from 13.7s to 9.8s (28%), which is not as that much as Ross' attempt (47.6%). Maybe it's because I used a different vCard data, I do not know. The code also changed meanwhile, especially that about unfolding, thus that could have its impact as well. It's possible there's still some room for improvements, though I do not see it at the moment. Created commit f2b976db9 in eds master (3.31.4+)