GNOME Bugzilla – Bug 459862
typefinding coud be more efficient
Last modified: 2008-05-16 21:09:30 UTC
typefinding is not really efficient: * the buffer list is not sorted and always searched completely * keeping tiny buffers iwth 1 byte overlap in there is inefficient
Created attachment 92265 [details] [review] cleanup and performance improvment or typefinding At first the patch unifies the code of buffer and stream typefinding. Then it tries to handle common scenarios a bit more intelligently. I left some defugging code commented out in helper_find_peek() - will remove for final commit. cd ~/projects/gstreamer.portfwd/gst-media-test/nokia/bugdata time gst-typefind >/dev/null 2>&1 * */* 1773 files = before = real 0m4.529s user 0m3.940s sys 0m0.528s = after = real 0m3.495s user 0m2.884s sys 0m0.168s = using file = time file >/dev/null 2>&1 * */* real 0m12.793s user 0m0.544s sys 0m0.276s
Cool stuff, but it would be really great if you could split out the 'cleanups' and commit them and then make a new patch with the functional changes (since this is code that is run by almost every playback pipeline, I think it's important to keep the functional change in cvs history separately, and it also makes it more likely to get reviewed by more people).
Created attachment 92274 [details] [review] performance improvment for typefinding The cleanup part has been committed.
If leaking the typefinder list is acceptable, we could also add: static GList *type_list = NULL; static GList * type_find_factory_get_list (void) { if (G_UNLIKELY(!type_list)) { GstRegistry *registry = gst_registry_get_default (); type_list = gst_registry_get_feature_list (registry, GST_TYPE_TYPE_FIND_FACTORY); type_list = g_list_sort (type_list, type_find_factory_rank_cmp); } return type_list; } which brings the time down to: real 0m2.958s user 0m2.680s sys 0m0.168s
Tim, any change you could review the patch once again?
Created attachment 96164 [details] [review] same as above, but a bit more terse and with more comments (some temporary) I actually did look at this again a while ago, but then couldn't comment because bugzilla was down. Attached is what I came up with. It's basically the same as your patch, just a bit more terse and with a few extra comments here and there. My main question is basically: wouldn't it be better/more efficient to sort the list by offset+size rather than just offset? Otherwise we might stop looking for cached content in the list too early, no? As for caching the list of typefinders: it would probably be best to file a separate enhancement bug for this. It looks like an okay thing to do in general, but it probably needs some more work to take into account registry rescans (and locking ?) and freeing resources on gst_deinit() IMO.
Yes sorting by offset+size makes sense. Can we somehow incrementaly commit this? I don't care based on which patch we continue.
Hmm, instead of sorting by offset and then size, shouldn't we just keep the largest and kick out smaller ones?
> Hmm, instead of sorting by offset and then size, shouldn't we just keep the > largest and kick out smaller ones? I meant sorting by endoffset=(offset+size).
Any news on this? :)
Created attachment 110929 [details] [review] typefind.diff What about this patch? It sorts the list by offset+size and uses 4k as minimal buffer size that is peeked. We use 4k at other places for caching like this too so it should be fine ;) Also it's the page size on x86.
Created attachment 110930 [details] [review] typefind.diff erm... more like this patch.
2008-05-16 Sebastian Dröge <slomo@circular-chaos.org> * libs/gst/base/gsttypefindhelper.c: (helper_find_peek): Sort buffer cache list by end offsets. This makes sure that we don't stop to search for a cached buffer that contains the requested data too early. Also read a minimum of 4k bytes instead of 512 bytes as this is a bit more efficient. Fixes bug #459862.