After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 794290 - rtponviftimestamp: race condition in buffer handling leading to memory corruption
rtponviftimestamp: race condition in buffer handling leading to memory corrup...
Status: RESOLVED DUPLICATE of bug 794353
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
git master
Other Linux
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2018-03-13 13:40 UTC by Pavel Modilaynen
Modified: 2018-04-06 12:49 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test patch (1.91 KB, patch)
2018-03-13 13:43 UTC, Pavel Modilaynen
none Details | Review
gdb backtrace (10.23 KB, text/plain)
2018-03-13 13:44 UTC, Pavel Modilaynen
  Details
proposed fix (3.12 KB, patch)
2018-03-13 13:45 UTC, Pavel Modilaynen
none Details | Review
proposed fix (updated) (3.72 KB, patch)
2018-03-15 10:11 UTC, Pavel Modilaynen
none Details | Review
proposed fix (updated 2) (4.85 KB, patch)
2018-03-16 08:32 UTC, Pavel Modilaynen
none Details | Review

Description Pavel Modilaynen 2018-03-13 13:40:19 UTC
There is a race condition in handling buffer and buffer_list in rtponviftimestamp.

According to code it is possible that one of threads executes the following chain:
send_cached_buffer_and_events() -> handle_and_push_buffer () -> gst_pad_push ()

and before self->buffer or self->list will be nullified in 
the same buffer or buffer_list can be attempted to destroy by another thread in 
purge_cached_buffer_and_events () -> gst_buffer_list_unref ()

This is race condition that leads to memory corruption and random crashes.
Comment 1 Pavel Modilaynen 2018-03-13 13:43:28 UTC
Created attachment 369617 [details] [review]
test patch
Comment 2 Pavel Modilaynen 2018-03-13 13:44:02 UTC
Created attachment 369618 [details]
gdb backtrace
Comment 3 Pavel Modilaynen 2018-03-13 13:45:38 UTC
Created attachment 369619 [details] [review]
proposed fix

Proposed patch to fix the issue
Comment 4 Sebastian Dröge (slomo) 2018-03-13 14:01:26 UTC
Comment on attachment 369619 [details] [review]
proposed fix

While pushing a buffer or buffer list (or event or anything really) downstream/upstream, no local mutex should be locked. This can easily lead to deadlocks.
Comment 5 Pavel Modilaynen 2018-03-13 14:06:29 UTC
@slomo: so, what should we do to prevent usage of buffer/buffer_list while it is/they are  being consumed by gst_pad_push ?
Comment 6 Sebastian Dröge (slomo) 2018-03-13 14:57:24 UTC
Take another reference, unlock, push and unref
Comment 7 Pavel Modilaynen 2018-03-15 10:11:13 UTC
Created attachment 369720 [details] [review]
proposed fix (updated)

I tried to take another ref but it lead to another problems with writability  and mapping of buffers. So, I just simply copy ref under lock and then proceed with pushing without locks. Please check updated patch.
Comment 8 Linus Svensson 2018-03-15 10:53:48 UTC
How about adding:
  g_mutex_lock (&lock);
  list = self->list;
  buffer = self->buffer;
  event_queue = self->event_queue;
  self->list = NULL;
  self->buffer = NULL;
  self->event_queue = NULL;
  g_mutex_unlock (&lock);

to both send_cached_buffers_and_events() and purge_cached_buffers_and_events() and use the local variables in the rest of those functions. The cached items are expected to be removed once any of those functions has been called.

The patch should also solve:
* purge all events in event_queue in the case pad_push fails in send_cached_buffers_and_events()
* Perhaps move purge_cached_buffers_and_events() to be called after change_state has chained up to the parent (Otherwise a new buffer or list might be cached and never free'd).
Comment 9 Pavel Modilaynen 2018-03-16 08:32:01 UTC
Created attachment 369758 [details] [review]
proposed fix (updated 2)

I have updated my patch adding protection for event queue as well.

> How about adding:
> ... to both send_cached_buffers_and_events() and 
> purge_cached_buffers_and_events() and use the local variables in the rest of 
> those functions. The cached items are expected to be removed once any of 
> those functions has been called.

I am not sure what would we gain from it? Purge now is fully under lock, is it some performance penalty that you are concerned of - then I don't think we'd win much here - purge is called only on events.

>  event_queue = self->event_queue;
>  self->event_queue = NULL;

I think we cannot just NULL-ify, since it is used without NULL checks in other parts of code. What I did is adding temporary queue, pushing events to that queue and clearing event_queue so that it becomes empty. All above under lock. I hope this is appropriate. 

> The patch should also solve:
> * purge all events in event_queue in the case pad_push fails in
> send_cached_buffers_and_events()

Do you mean when handle_and_push_buffer/handle_and_push_buffer_list fails? If yes, I addressed this situation in this patch, good point! 

> * Perhaps move purge_cached_buffers_and_events() to be called after
> change_state has chained up to the parent (Otherwise a new buffer or list
> might be cached and never free'd).

Yes, I noticed we have huge leaks here. I will address that in my new patch for https://bugzilla.gnome.org/show_bug.cgi?id=794353
Comment 11 Pavel Modilaynen 2018-04-06 12:38:23 UTC
@Tim-Philipp I couldn't reproduce corruptions with your patch.

*** This bug has been marked as a duplicate of bug 794353 ***
Comment 12 Tim-Philipp Müller 2018-04-06 12:49:17 UTC
Thanks for testing and confirming it's been fixed!