GNOME Bugzilla – Bug 300932
[PATCH] add a GST_ELEMENT_WORK_IN_PLACE flag to gstelement
Last modified: 2005-04-22 15:19:06 UTC
ok, let me explain: My goal is to make a pipeline like .. ! decodebin ! identity ! some_passthrough_element ! xvimagesink as fast as .. ! decodebin ! xvimagesink. The problem is this: decodebin ask for a buffer with _pad_alloc_buffer on identity sink pad. Identity doesn't have a _pad_alloc_buffer function so buffers are allocated with gst_buffer_new_and_alloc. Instead in the decodebin ! xvimagesink pipeline buffers are allocated (a lot faster) by xvimagesink. What this patch does: 1) adds a GST_ELEMENT_WORK_IN_PLACE flag to gstelement.h 2) Elements that are acting as passthrough (like identity, a not active videoscale, a not active ffmpegcolorspace, image filters that writes directly on the buffer and so on) should set this flag (on this patch i have only changed identity) 3) gst_pad_alloc_buffer (in gstpad.c) now works like this: a) Try to see if the peer pad exists and support bufferallocfunc. If yes then use it. b) If the element has GST_ELEMENT_WORK_IN_PLACE setted try to proxy the request on the peer element. Status of the patch: It's just a proof of concept right now.. I want only to know if this is a good solution and which are main problems of this approach (i'm just trying to learn gst core). I'll work better on this if this solution will be accepted. Please tell me your ideas regarding this patch :)
Created attachment 45351 [details] [review] first patch (just a proof of concept, contains debugs g_print, can destroy your pc, don't apply!) To test this patch: launch filesrc ! decodebin ! xvimagesink see cpu usage launch filesrc ! decodebin ! identity ! xvimagesink see cpu usage Apply the patch and repeat :)
oh right, you have to apply patch at #300923 to see an actual improvement!
I think right now, we want elements to forward the pad_alloc_buffer internally. This patch saves some code, but doesn't gain anything otherwise... It may be easier to just continue requiring manual forwarding (I think videoscale already does that).
iirc videoscale doesn't do it.. It has a passthrough mode that just forwards the buffer but this isn't what this patch does. Manual forwarding should work fine (will try this evening) but it has to be done in a lot of plugins... to have some effects in playbin you'll need to change(iirc): - identity - stream selector - queue - videoscale - ffmpegcolorspace I'll be happy to do such changes in these plugins but it's some duplicated code.. (/me hates duplicated code) Maybe we can think about it for 0.9?
Point taken. I hate duplicated code, too... I'm OK with this, as long as it's well-implemented. Nag: please don't change GST_ELEMENT_FLAG_LAST, that's part of our API and is intentionally +16 (it's supposed to be the first integer representation of the next derived class).
Created attachment 45419 [details] [review] a better patch (includes change to queue and identity) This patch includes changes to core, queue and identity. I have done different patch for gst-plugins elements, you can find them here: http://skaboy.no-ip.org/~luogni/patch/ I'll put them on bugzilla when this bug will be closed. I think that these are all the patch needed to make playbin faster (it's a bit late here so i can be wrong..). other thing: i've only tested it with raw playbin, some pipelines and aldegonde.. should i install totem? :) here aldegonde is as fast as playbin, is totem much different than aldegonde?
I don't like very much the line: + GstPad *src = gst_element_get_pad (parent, "src"); isn't there a better way to get an element's src pad? (will try to fix this tonigh)
GList *pads = gst_element_get_pad_list (element); GstPad *pad = NULL; for (pads = gst_element_get_pad_list (element); pads != NULL; pads = pads->next) { GstPad *t = GST_PAD (pads->data); if (GST_PAD_DIRECTION (t) == GST_PAD_SRC) { pad = t; break; } }
Created attachment 45454 [details] [review] other patch, includes fix for finding the source pad
Some profiling, i have done them using oprofile while playing a xvid/mp3/avi file with totem. With patch applied (cpu usage 38%): TIMER:0| samples| %| ------------------ 12220 53.8302 processor 4740 20.8801 libgstffmpeg.so 1662 7.3213 libc-2.3.2.so 1275 5.6165 libmad.so.0.2.1 654 2.8809 vmlinux-2.6.10 407 1.7929 libgstreamer-0.8.so.1.4.0 330 1.4537 libgobject-2.0.so.0.600.3 313 1.3788 libgstaudioconvert.so Without patch applied (cpu usage 52%): TIMER:0| samples| %| ------------------ 10512 43.6219 processor 5091 19.4424 libgstffmpeg.so 3291 12.5683 libc-2.3.2.so 2183 8.3368 vmlinux-2.6.10 1738 6.6374 libmad.so.0.2.1 461 1.7605 libgstreamer-0.8.so.1.4.0 457 1.7453 libgobject-2.0.so.0.600.3 385 1.4703 libgstaudioconvert.so 286 1.0922 libglib-2.0.so.0.600.3 The symbol that is using all cpu time in libc-2.3.2.so is "_wordcopy_bwd_aligned". Another thing to improve performance would be to support _get_buffers and _release_buffers functions in ffdec_* elements but this is a different bug..
As for get/release-buffer: as long as you explicitely but out some formats (e.g. SVQ1, see libavcodec/utils.c avcodec_default_get_buffer() and avcodec_align_dimensions()), it should be as easy as the use-pad-allocation patch. this patch looks good, apart from some minor style-issues (comments, indenting, and such), but I'll take care of those myself.
Last patch applied; please attach the gst-plugins patch too when you have time.
Created attachment 45544 [details] [review] add the flag to ffmpegcolorspace
Created attachment 45545 [details] [review] add the flag to playbin
Created attachment 45546 [details] [review] add the flag to videoscale These are the patch against elements used in playbin but this flag should be used also in other elements like gdkpixbufscale, overlay elements (?), videofilter elements (?), tee(?)
We'll see about others later. All applied, playbin is a hell-of-a-lot faster now, thanks.