GNOME Bugzilla – Bug 702545
tags: xmp: adding tags is very slow
Last modified: 2013-07-05 18:53:48 UTC
We have came across MP4 files exported from Adobe authoring tools which contain a lot of XMP tags (~50k to be more precise). Tested version of GStreamer was very slow to process them taking minutes on a modern i5 CPU. It looks time is spent in the loop calling read_one_tag from gst_tag_list_from_xmp_buffer (gstxmptags.c). That in turn calls gst_tag_list_add where and below I think the time is being spent. Might be gst_value_list_merge from core in fact but it is just a guess. I think this will be testable with the test_xmp_parsing unit test (tests/check/libs/tag.c) by pasting the attached example XML in there. (But I have not tried whether it reproduces.)
Created attachment 247128 [details] XMP data extracted from the problem file
There are two culprits: 1) gst_tag_list_add does copies of everything it receives, GValue initialization/copy/destroy, .... 2) initial expensive usage of g_slist_append (trivial patch to follow) Regarding 1) it would be great if we had a method to "take" a provided GValue instead of copying it. In the same way gst_structure_take_value and gst_value_{array|list}_take_value work.
My bad, it's actually gst_value_list_merge which is the most expensive part (called by gst_value_list_add(TAG_MERGE_MORE_APPEND|PREPEND) :( It actually creates ... a new valuelist from the existing one and the new one ... which becomes exponentially more expensive as you add items.
Created attachment 247232 [details] [review] taglist: Avoid combinatorial explosion when merging tags When appending/prepending tags, avoid re-creating (and copying) lists if we already have one and instead just append/prepend the GValue to the list.
Created attachment 247233 [details] [review] xmptag: More efficient GSList usage Instead of constantly appending (which gets more and more expensive), just prepend to the list (O(1)) and reverse the list before usage.
Patches are of course for git master. Rebasing against 0.10 is left as an exercise to the reader :) Those two patches make xmp reading almost instantaneous here.
Current usage of gst_tag_list_from_xmp_buffer is now: 26% gst_tag_list_add 48% _gst_xmp_tag_get_mapping_reverse
That mapping function looks *really* suboptimal. 60% of it is ... calling g_hash_table_iter_next ... If it's just going to be used for iteration, it might be better to just use a GList :D
Might make sense to add a reverse hash table instead g_hash_table_iter_next() should be similar to iterating over a list in complexity
(In reply to comment #9) > Might make sense to add a reverse hash table instead > > g_hash_table_iter_next() should be similar to iterating over a list in > complexity .... except not :) tmp = tmp->next <== 1 op g_hash_table_iter_next() <== >50 ops
Still O(1) ;) I think just using a reverse hash table there instead of iterating will make things much faster
Comment on attachment 247233 [details] [review] xmptag: More efficient GSList usage How about using a GQueue on the stack instead? Which is basically a list that keeps a pointer to the last member for efficient appending. Then you don't have to reverse later.
For what it is worth keep in mind that the first patch makes gst_tag_list_from_xmp_buffer go from five minutes to five seconds elapsed time and the second one to under half a second, possibly much less. I did not bother measuring accurately. But in the context of further optimisations, even getting rid of that 48% in _gst_xmp_tag_get_mapping_reverse completely might only win further 50ms. And this is with the pathological case of 50k tags. It probably would matter much more on slow CPUs though so personally I do like the idea of getting rid of poor algorithms.
Both patches commited to master. Closing bug (since adding tags is no longer very slow).
Comment on attachment 247232 [details] [review] taglist: Avoid combinatorial explosion when merging tags This patch was pushed: commit 0d57bd3198225e6ab713ba1e4e24c45b7bd8c502 Author: Edward Hervey <edward@collabora.com> Date: Wed Jun 19 09:19:53 2013 +0200 taglist: Avoid combinatorial explosion when merging tags When appending/prepending tags, avoid re-creating (and copying) lists if we already have one and instead just append/prepend the GValue to the list. https://bugzilla.gnome.org/show_bug.cgi?id=702545