After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 646566 - Protect against Pad-Parent disappearing
Protect against Pad-Parent disappearing
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
git master
Other All
: Normal blocker
: 0.10.33
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2011-04-02 21:05 UTC by Håvard Graff (hgr)
Modified: 2011-05-09 09:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (4.06 KB, patch)
2011-04-02 21:05 UTC, Håvard Graff (hgr)
none Details | Review
queue and queue2 (1.77 KB, patch)
2011-04-02 21:06 UTC, Håvard Graff (hgr)
none Details | Review
upstream events (2.82 KB, patch)
2011-04-02 21:07 UTC, Håvard Graff (hgr)
none Details | Review
querys (2.71 KB, patch)
2011-04-02 21:07 UTC, Håvard Graff (hgr)
none Details | Review
vorbisdec (1.02 KB, patch)
2011-04-02 21:08 UTC, Håvard Graff (hgr)
none Details | Review
more query (1.04 KB, patch)
2011-04-02 21:09 UTC, Håvard Graff (hgr)
none Details | Review
rtp-callbacks (2.17 KB, patch)
2011-04-02 21:09 UTC, Håvard Graff (hgr)
none Details | Review
more rtp pad-callbacks (5.77 KB, patch)
2011-04-02 21:11 UTC, Håvard Graff (hgr)
none Details | Review
more... (901 bytes, patch)
2011-04-02 21:11 UTC, Håvard Graff (hgr)
none Details | Review
more upstream events (1.33 KB, patch)
2011-04-02 21:12 UTC, Håvard Graff (hgr)
none Details | Review

Description Håvard Graff (hgr) 2011-04-02 21:05:21 UTC
We have done a lot of work on stress-testing element for different scenarios where the pad-parent might disappear. Upstream Events / Querys and pad-allocs arriving while the element goes down.

Instead of making lots of bugs for this, I thought I could post all the patches here instead. Pretty straight-forward self-explanatory stuff.
Comment 1 Håvard Graff (hgr) 2011-04-02 21:05:54 UTC
Created attachment 184974 [details] [review]
patch
Comment 2 Håvard Graff (hgr) 2011-04-02 21:06:32 UTC
Created attachment 184975 [details] [review]
queue and queue2
Comment 3 Håvard Graff (hgr) 2011-04-02 21:07:14 UTC
Created attachment 184976 [details] [review]
upstream events
Comment 4 Håvard Graff (hgr) 2011-04-02 21:07:56 UTC
Created attachment 184977 [details] [review]
querys
Comment 5 Håvard Graff (hgr) 2011-04-02 21:08:41 UTC
Created attachment 184978 [details] [review]
vorbisdec
Comment 6 Håvard Graff (hgr) 2011-04-02 21:09:05 UTC
Created attachment 184979 [details] [review]
more query
Comment 7 Håvard Graff (hgr) 2011-04-02 21:09:31 UTC
Created attachment 184980 [details] [review]
rtp-callbacks
Comment 8 Håvard Graff (hgr) 2011-04-02 21:11:06 UTC
Created attachment 184981 [details] [review]
more rtp pad-callbacks
Comment 9 Håvard Graff (hgr) 2011-04-02 21:11:39 UTC
Created attachment 184982 [details] [review]
more...
Comment 10 Håvard Graff (hgr) 2011-04-02 21:12:12 UTC
Created attachment 184983 [details] [review]
more upstream events
Comment 11 Olivier Crête 2011-04-03 02:50:35 UTC
Do we really want to go this way ? I still think GstPad should hold a ref on it's parent any time it calls one of those functions.
Comment 12 Sebastian Dröge (slomo) 2011-04-04 07:47:28 UTC
Isn't there another bug about this already, with patches by you Olivier?
Comment 13 Olivier Crête 2011-04-04 14:11:12 UTC
There is bug #631580, it has a test case, but no patches. I could never make a patch that would work in all case.
I think one solution for 0.11 is to have the pad functions ref the parent and pass it to the callback (so getcaps() for example could be blah_getcaps(Pad *pad, GstObject *parent);
Also, we currently don't require pads to have a parent, it may be much easier if we did.
Comment 14 Håvard Graff (hgr) 2011-04-04 14:18:00 UTC
so how about committing these patches now, since they all help make GStreamer more stable, and then revisit the whole thing for .11?
Comment 15 Wim Taymans 2011-04-04 14:21:08 UTC
I don't particularly think the checking is weird, just something you need to be aware of and do.

Another option is to have a new pad flag (PAD_REF_PARENT) and when this is set, the core automatically keeps the pad alive while doing a callback. The flag could automatically be set when you do gst_element_add_pad().

If you want to do something for 0.11 (pass the pad as an argument) you still need to be able to know when the parent is really gone (after a dispose) or when the pad simply has no parent. I would also like to avoid adding the requirement that pads always need a parent to do dataprocessing.
Comment 16 Olivier Crête 2011-04-04 16:20:17 UTC
I guess we could have pads that require a parent and pads that dont... with a flag or something. The current system has clearly failed (almost none of the elements in the tree do it correctly). That said, I'm not sure there is anywhere outside of the unit tests where there are pads that really have no parent.
Comment 17 Håvard Graff (hgr) 2011-04-04 16:26:27 UTC
(In reply to comment #16)
> That said, I'm not sure there is anywhere outside of the unit tests where there > are pads that really have no parent.

Au contraire! This has been a common cause of crashes for us. We take out an element, and a pad-alloc, query or event arrives just at that time. You can never completely protect yourself against this unless you take down the whole pipeline at the same time. Crash-dumps from customers often have one variant of this, with the error-msg G_IS_OBJECT () as the most common one.
Comment 18 Olivier Crête 2011-04-04 17:41:19 UTC
(In reply to comment #17)

I meant, I don't know any case outside of the unit tests where the callback from the pads should be called without holding the parent (or without having a parent set). My proposal was to never call the functions from the pads unless the parent is set and "hold" the parent until the callback returns.

I'm also getting convinced that we should have something at the GstObject level, so we can "lock" the gstobject parent as returned by gst_object_get_parent(), so that it always succeeds during the callbacks (so we don't have to do those checks). Or maybe just not allow changing the parent, only setting it once and then only allow to unset it once?
Comment 19 Håvard Graff (hgr) 2011-04-04 21:35:51 UTC
(In reply to comment #12)
> Isn't there another bug about this already, with patches by you Olivier?

Bug 638844, that Olivier marked as a duplicate of his, has a few patches with the same theme, yes.
Comment 20 Tim-Philipp Müller 2011-04-08 13:08:38 UTC
Not really in favour of this approach at all, there must be a better way to do this that doesn't involve adding such checks to every single element.

Also, they're not really correct either, e.g.

  queue = GST_QUEUE (gst_pad_get_parent (pad));

might still throw a warning and possibly abort if get_parent() returns NULL, which is arguably better than crashing, but still broken.
Comment 21 Sebastian Dröge (slomo) 2011-04-08 13:27:56 UTC
I've committed all these patches now until we find a better solution. This way the races are at least fixed in the next release and we can find a perfect solution without any pressure and possibly using a suboptimal solution.

IMHO the best solution for this would be to
a) add a GstObject *parent parameter to all pad callbacks and
b) add a flag to GstPad that you don't want to have your callback called if there's no parent (and by default that flag is disabled)

For this we have to wait until 0.11 though because it's an API change.

b) might be possible to do in 0.10 by taking a reference of the parent before calling the callbacks
Comment 22 Olivier Crête 2011-04-08 14:29:14 UTC
Since all these patches are committed, lets follow this at bug #631580
Comment 23 Håvard Graff (hgr) 2011-04-08 19:57:06 UTC
(In reply to comment #20)
> Not really in favour of this approach at all, there must be a better way to do
> this that doesn't involve adding such checks to every single element.
> 
> Also, they're not really correct either, e.g.
> 
>   queue = GST_QUEUE (gst_pad_get_parent (pad));
> 
> might still throw a warning and possibly abort if get_parent() returns NULL,
> which is arguably better than crashing, but still broken.

I thought queue = GST_QUEUE (NULL); was legal?

Anyway, I made these stress-tests you might want? Posting them inline here, since this bug is now closed... :)


static gboolean keep_going;

static gpointer
query_func (gpointer data)
{
  GstElement * element = GST_ELEMENT (data);
  GstPad * src_pad = gst_element_get_static_pad (element, "src");
  GstQuery * query = gst_query_new_latency ();
  while (keep_going)
    gst_pad_query (src_pad, query);

  gst_query_unref (query);
  gst_object_unref (src_pad);
  return NULL;
}

static gpointer
event_func (gpointer data)
{
  GstElement * element = GST_ELEMENT (data);
  GstPad * src_pad = gst_element_get_static_pad (element, "src");
  GstStructure * event_structure = gst_structure_new ("test-event", NULL);
  GstEvent * event = gst_event_new_custom (GST_EVENT_CUSTOM_UPSTREAM, event_structure);
  GstPadEventFunction eventfunc = GST_PAD_EVENTFUNC (src_pad);
  g_assert (eventfunc);

  while (keep_going)
    eventfunc (src_pad, gst_event_ref (event));

  gst_event_unref (event);
  gst_object_unref (src_pad);
  return NULL;
}

static gpointer
state_change_and_unparent_func (gpointer data)
{
  GstElement * element = GST_ELEMENT (data);
  GstPad * src_pad = gst_element_get_static_pad (element, "src");
  while (keep_going)
  {
    gst_element_set_state (element, GST_STATE_PLAYING);
    g_usleep (G_USEC_PER_SEC / 10000);
    gst_element_set_state (element, GST_STATE_NULL);
    g_usleep (G_USEC_PER_SEC / 10000);

    gst_object_unparent (GST_OBJECT (src_pad));
    g_usleep (G_USEC_PER_SEC / 10000);
    gst_object_set_parent (GST_OBJECT (src_pad), GST_OBJECT (element));
    g_usleep (G_USEC_PER_SEC / 10000);
  }

  gst_object_unref (src_pad);

  return NULL;
}

static void
test_element_against_racy_upstream_querys (GstElement * element)
{
  GThread * query_thread;
  GThread * state_change_and_unparent_thread;

  keep_going = TRUE;

  query_thread =
    g_thread_create_full (query_func, element, 0, TRUE, TRUE,
      G_THREAD_PRIORITY_NORMAL, NULL);

  state_change_and_unparent_thread =
    g_thread_create_full (state_change_and_unparent_func, element, 0, TRUE, TRUE,
      G_THREAD_PRIORITY_NORMAL, NULL);

  g_usleep (G_USEC_PER_SEC / 4);

  keep_going = FALSE;
  g_thread_join (query_thread);
  g_thread_join (state_change_and_unparent_thread);
}

static void
test_element_against_racy_upstream_events (GstElement * element)
{
  GThread * event_thread;
  GThread * state_change_and_unparent_thread;

  keep_going = TRUE;

  event_thread =
    g_thread_create_full (event_func, element, 0, TRUE, TRUE,
      G_THREAD_PRIORITY_NORMAL, NULL);

  state_change_and_unparent_thread =
    g_thread_create_full (state_change_and_unparent_func, element, 0, TRUE, TRUE,
      G_THREAD_PRIORITY_NORMAL, NULL);

  g_usleep (G_USEC_PER_SEC / 4);

  keep_going = FALSE;
  g_thread_join (event_thread);
  g_thread_join (state_change_and_unparent_thread);
}
Comment 24 Tim-Philipp Müller 2011-04-09 17:42:22 UTC
> I thought queue = GST_QUEUE (NULL); was legal?

You are right, my apologies. I could've sworn this was not allowed, because I remember warnings about invalid casts of pointer 'NULL" to type xyz. But according to the code in gtype.[ch] this is just fine. Wonder if it changed at some point. In any case, sorry for the noise.
Comment 25 Stefan Sauer (gstreamer, gtkdoc dev) 2011-05-02 13:43:45 UTC
(In reply to comment #21)
> I've committed all these patches now until we find a better solution. This way
> the races are at least fixed in the next release and we can find a perfect
> solution without any pressure and possibly using a suboptimal solution.
> 
> IMHO the best solution for this would be to
> a) add a GstObject *parent parameter to all pad callbacks and
> b) add a flag to GstPad that you don't want to have your callback called if
> there's no parent (and by default that flag is disabled)
> 
> For this we have to wait until 0.11 though because it's an API change.
> 
> b) might be possible to do in 0.10 by taking a reference of the parent before
> calling the callbacks

Wanna file a 0.11 bug so that we remember?
Comment 26 Sebastian Dröge (slomo) 2011-05-02 13:58:27 UTC
There's bug #631580 already