After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 459862 - typefinding coud be more efficient
typefinding coud be more efficient
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
git master
Other Linux
: Normal normal
: 0.10.20
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2007-07-24 11:08 UTC by Stefan Sauer (gstreamer, gtkdoc dev)
Modified: 2008-05-16 21:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
cleanup and performance improvment or typefinding (11.74 KB, patch)
2007-07-24 11:12 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
none Details | Review
performance improvment for typefinding (3.33 KB, patch)
2007-07-24 13:50 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
none Details | Review
same as above, but a bit more terse and with more comments (some temporary) (3.98 KB, patch)
2007-09-25 12:26 UTC, Tim-Philipp Müller
committed Details | Review
typefind.diff (2.19 KB, patch)
2008-05-14 20:20 UTC, Sebastian Dröge (slomo)
none Details | Review
typefind.diff (2.19 KB, patch)
2008-05-14 20:27 UTC, Sebastian Dröge (slomo)
committed Details | Review

Description Stefan Sauer (gstreamer, gtkdoc dev) 2007-07-24 11:08:37 UTC
typefinding is not really efficient:
* the buffer list is not sorted and always searched completely
* keeping tiny buffers iwth 1 byte overlap in there is inefficient
Comment 1 Stefan Sauer (gstreamer, gtkdoc dev) 2007-07-24 11:12:41 UTC
Created attachment 92265 [details] [review]
cleanup and performance improvment or typefinding

At first the patch unifies the code of buffer and stream typefinding. Then it tries to handle common scenarios a bit more intelligently. I left some defugging code commented out in helper_find_peek() - will remove for final commit.

cd ~/projects/gstreamer.portfwd/gst-media-test/nokia/bugdata
time gst-typefind >/dev/null 2>&1 * */*
1773 files

= before =

real    0m4.529s
user    0m3.940s
sys     0m0.528s

= after =

real    0m3.495s
user    0m2.884s
sys     0m0.168s

= using file =
time file >/dev/null 2>&1 * */*

real    0m12.793s
user    0m0.544s
sys     0m0.276s
Comment 2 Tim-Philipp Müller 2007-07-24 11:53:28 UTC
Cool stuff, but it would be really great if you could split out the 'cleanups' and commit them and then make a new patch with the functional changes (since this is code that is run by almost every playback pipeline, I think it's important to keep the functional change in cvs history separately, and it also makes it more likely to get reviewed by more people).
Comment 3 Stefan Sauer (gstreamer, gtkdoc dev) 2007-07-24 13:50:34 UTC
Created attachment 92274 [details] [review]
performance improvment for typefinding

The cleanup part has been committed.
Comment 4 Stefan Sauer (gstreamer, gtkdoc dev) 2007-07-24 14:28:09 UTC
If leaking the typefinder list is acceptable, we could also add:
static GList *type_list = NULL;

static GList *
type_find_factory_get_list (void)
{
  if (G_UNLIKELY(!type_list)) {
    GstRegistry *registry = gst_registry_get_default ();
  
    type_list = gst_registry_get_feature_list (registry,
      GST_TYPE_TYPE_FIND_FACTORY);
    type_list = g_list_sort (type_list, type_find_factory_rank_cmp);
  }  
  return type_list;
}

which brings the time down to:

real    0m2.958s
user    0m2.680s
sys     0m0.168s
Comment 5 Stefan Sauer (gstreamer, gtkdoc dev) 2007-09-25 07:36:04 UTC
Tim, any change you could review the patch once again?
Comment 6 Tim-Philipp Müller 2007-09-25 12:26:26 UTC
Created attachment 96164 [details] [review]
same as above, but a bit more terse and with more comments (some temporary)

I actually did look at this again a while ago, but then couldn't comment because bugzilla was down.  Attached is what I came up with.  It's basically the same as your patch, just a bit more terse and with a few extra comments here and there.

My main question is basically: wouldn't it be better/more efficient to sort the list by offset+size rather than just offset?  Otherwise we might stop looking for cached content in the list too early, no?

As for caching the list of typefinders: it would probably be best to file a separate enhancement bug for this.  It looks like an okay thing to do in general, but it probably needs some more work to take into account registry rescans (and locking ?) and freeing resources on gst_deinit() IMO.
Comment 7 Stefan Sauer (gstreamer, gtkdoc dev) 2007-10-02 11:17:23 UTC
Yes sorting by offset+size makes sense. Can we somehow incrementaly commit this? I don't care based on which patch we continue.
Comment 8 Stefan Sauer (gstreamer, gtkdoc dev) 2007-10-03 12:26:57 UTC
Hmm, instead of sorting by offset and then size, shouldn't we just keep the largest and kick out smaller ones?
Comment 9 Tim-Philipp Müller 2007-10-03 12:37:36 UTC
> Hmm, instead of sorting by offset and then size, shouldn't we just keep the
> largest and kick out smaller ones?

I meant sorting by endoffset=(offset+size).


Comment 10 Sebastian Dröge (slomo) 2008-03-22 10:29:13 UTC
Any news on this? :)
Comment 11 Sebastian Dröge (slomo) 2008-05-14 20:20:50 UTC
Created attachment 110929 [details] [review]
typefind.diff

What about this patch? It sorts the list by offset+size and uses 4k as minimal buffer size that is peeked. We use 4k at other places for caching like this too so it should be fine ;) Also it's the page size on x86.
Comment 12 Sebastian Dröge (slomo) 2008-05-14 20:27:07 UTC
Created attachment 110930 [details] [review]
typefind.diff

erm... more like this patch.
Comment 13 Sebastian Dröge (slomo) 2008-05-16 21:09:30 UTC
2008-05-16  Sebastian Dröge  <slomo@circular-chaos.org>

        * libs/gst/base/gsttypefindhelper.c: (helper_find_peek):
        Sort buffer cache list by end offsets. This makes sure that we don't
        stop to search for a cached buffer that contains the requested data
        too early.
        Also read a minimum of 4k bytes instead of 512 bytes as this is a bit
        more efficient. Fixes bug #459862.