After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 702545 - tags: xmp: adding tags is very slow
tags: xmp: adding tags is very slow
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: Normal normal
: 1.1.2
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-06-18 11:34 UTC by Tvrtko Ursulin
Modified: 2013-07-05 18:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
XMP data extracted from the problem file (430.86 KB, application/x-gzip)
2013-06-18 11:35 UTC, Tvrtko Ursulin
  Details
taglist: Avoid combinatorial explosion when merging tags (1.78 KB, patch)
2013-06-19 07:24 UTC, Edward Hervey
committed Details | Review
xmptag: More efficient GSList usage (1.65 KB, patch)
2013-06-19 07:26 UTC, Edward Hervey
reviewed Details | Review

Description Tvrtko Ursulin 2013-06-18 11:34:29 UTC
We have came across MP4 files exported from Adobe authoring tools which contain a lot of XMP tags (~50k to be more precise). Tested version of GStreamer was very slow to process them taking minutes on a modern i5 CPU.

It looks time is spent in the loop calling read_one_tag from gst_tag_list_from_xmp_buffer (gstxmptags.c). That in turn calls gst_tag_list_add where and below I think the time is being spent. Might be gst_value_list_merge from core in fact but it is just a guess.

I think this will be testable with the test_xmp_parsing unit test (tests/check/libs/tag.c) by pasting the attached example XML in there. (But I have not tried whether it reproduces.)
Comment 1 Tvrtko Ursulin 2013-06-18 11:35:26 UTC
Created attachment 247128 [details]
XMP data extracted from the problem file
Comment 2 Edward Hervey 2013-06-19 06:49:12 UTC
There are two culprits:
1) gst_tag_list_add does copies of everything it receives, GValue initialization/copy/destroy, ....
2) initial expensive usage of g_slist_append (trivial patch to follow)

Regarding 1) it would be great if we had a method to "take" a provided GValue instead of copying it.

In the same way gst_structure_take_value and gst_value_{array|list}_take_value work.
Comment 3 Edward Hervey 2013-06-19 07:00:57 UTC
My bad, it's actually gst_value_list_merge which is the most expensive part (called by gst_value_list_add(TAG_MERGE_MORE_APPEND|PREPEND) :(

It actually creates ... a new valuelist from the existing one and the new one ... which becomes exponentially more expensive as you add items.
Comment 4 Edward Hervey 2013-06-19 07:24:02 UTC
Created attachment 247232 [details] [review]
taglist: Avoid combinatorial explosion when merging tags

When appending/prepending tags, avoid re-creating (and copying) lists if we already
have one and instead just append/prepend the GValue to the list.
Comment 5 Edward Hervey 2013-06-19 07:26:38 UTC
Created attachment 247233 [details] [review]
xmptag: More efficient GSList usage

Instead of constantly appending (which gets more and more expensive), just
prepend to the list (O(1)) and reverse the list before usage.
Comment 6 Edward Hervey 2013-06-19 07:27:32 UTC
Patches are of course for git master. Rebasing against 0.10 is left as an exercise to the reader :)

Those two patches make xmp reading almost instantaneous here.
Comment 7 Edward Hervey 2013-06-19 07:29:04 UTC
Current usage of gst_tag_list_from_xmp_buffer is now:
26% gst_tag_list_add
48% _gst_xmp_tag_get_mapping_reverse
Comment 8 Edward Hervey 2013-06-19 07:42:25 UTC
That mapping function looks *really* suboptimal. 60% of it is ... calling g_hash_table_iter_next ... If it's just going to be used for iteration, it might be better to just use a GList :D
Comment 9 Sebastian Dröge (slomo) 2013-06-19 09:00:23 UTC
Might make sense to add a reverse hash table instead

g_hash_table_iter_next() should be similar to iterating over a list in complexity
Comment 10 Edward Hervey 2013-06-19 09:20:11 UTC
(In reply to comment #9)
> Might make sense to add a reverse hash table instead
> 
> g_hash_table_iter_next() should be similar to iterating over a list in
> complexity

.... except not :)

tmp = tmp->next  <== 1 op

g_hash_table_iter_next() <== >50 ops
Comment 11 Sebastian Dröge (slomo) 2013-06-19 09:24:06 UTC
Still O(1) ;) I think just using a reverse hash table there instead of iterating will make things much faster
Comment 12 Tim-Philipp Müller 2013-06-19 09:31:45 UTC
Comment on attachment 247233 [details] [review]
xmptag: More efficient GSList usage

How about using a GQueue on the stack instead? Which is basically a list that keeps a pointer to the last member for efficient appending. Then you don't have to reverse later.
Comment 13 Tvrtko Ursulin 2013-06-19 09:35:02 UTC
For what it is worth keep in mind that the first patch makes gst_tag_list_from_xmp_buffer go from five minutes to five seconds elapsed time and the second one to under half a second, possibly much less. I did not bother measuring accurately.

But in the context of further optimisations, even getting rid of that 48% in _gst_xmp_tag_get_mapping_reverse completely might only win further 50ms. And this is with the pathological case of 50k tags.

It probably would matter much more on slow CPUs though so personally I do like the idea of getting rid of poor algorithms.
Comment 14 Edward Hervey 2013-06-19 10:32:03 UTC
Both patches commited to master.

Closing bug (since adding tags is no longer very slow).
Comment 15 Tim-Philipp Müller 2013-07-05 18:53:48 UTC
Comment on attachment 247232 [details] [review]
taglist: Avoid combinatorial explosion when merging tags

This patch was pushed:


commit 0d57bd3198225e6ab713ba1e4e24c45b7bd8c502
Author: Edward Hervey <edward@collabora.com>
Date:   Wed Jun 19 09:19:53 2013 +0200

    taglist: Avoid combinatorial explosion when merging tags
    
    When appending/prepending tags, avoid re-creating (and copying) lists if we already
    have one and instead just append/prepend the GValue to the list.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702545