Bug 794290 – rtponviftimestamp: race condition in buffer handling leading to memory corruption

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 794290 - rtponviftimestamp: race condition in buffer handling leading to memory corruption


Summary:	rtponviftimestamp: race condition in buffer handling leading to memory corrup...


Status:	RESOLVED DUPLICATE of bug 794353

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2018-03-13 13:40 UTC by Pavel Modilaynen
Modified:	2018-04-06 12:49 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
test patch (1.91 KB, patch) 2018-03-13 13:43 UTC, Pavel Modilaynen	none	Details \| Review
gdb backtrace (10.23 KB, text/plain) 2018-03-13 13:44 UTC, Pavel Modilaynen		Details
proposed fix (3.12 KB, patch) 2018-03-13 13:45 UTC, Pavel Modilaynen	none	Details \| Review
proposed fix (updated) (3.72 KB, patch) 2018-03-15 10:11 UTC, Pavel Modilaynen	none	Details \| Review
proposed fix (updated 2) (4.85 KB, patch) 2018-03-16 08:32 UTC, Pavel Modilaynen	none	Details \| Review

Description Pavel Modilaynen 2018-03-13 13:40:19 UTC

There is a race condition in handling buffer and buffer_list in rtponviftimestamp.

According to code it is possible that one of threads executes the following chain:
send_cached_buffer_and_events() -> handle_and_push_buffer () -> gst_pad_push ()

and before self->buffer or self->list will be nullified in 
the same buffer or buffer_list can be attempted to destroy by another thread in 
purge_cached_buffer_and_events () -> gst_buffer_list_unref ()

This is race condition that leads to memory corruption and random crashes.

Comment 1 Pavel Modilaynen 2018-03-13 13:43:28 UTC

Created attachment 369617 [details] [review]
test patch

Comment 2 Pavel Modilaynen 2018-03-13 13:44:02 UTC

Created attachment 369618 [details]
gdb backtrace

Comment 3 Pavel Modilaynen 2018-03-13 13:45:38 UTC

Created attachment 369619 [details] [review]
proposed fix

Proposed patch to fix the issue

Comment 4 Sebastian Dröge (slomo) 2018-03-13 14:01:26 UTC

Comment on attachment 369619 [details] [review]
proposed fix

While pushing a buffer or buffer list (or event or anything really) downstream/upstream, no local mutex should be locked. This can easily lead to deadlocks.

Comment 5 Pavel Modilaynen 2018-03-13 14:06:29 UTC

@slomo: so, what should we do to prevent usage of buffer/buffer_list while it is/they are  being consumed by gst_pad_push ?

Comment 6 Sebastian Dröge (slomo) 2018-03-13 14:57:24 UTC

Take another reference, unlock, push and unref

Comment 7 Pavel Modilaynen 2018-03-15 10:11:13 UTC

Created attachment 369720 [details] [review]
proposed fix (updated)

I tried to take another ref but it lead to another problems with writability  and mapping of buffers. So, I just simply copy ref under lock and then proceed with pushing without locks. Please check updated patch.

Comment 8 Linus Svensson 2018-03-15 10:53:48 UTC

How about adding:
  g_mutex_lock (&lock);
  list = self->list;
  buffer = self->buffer;
  event_queue = self->event_queue;
  self->list = NULL;
  self->buffer = NULL;
  self->event_queue = NULL;
  g_mutex_unlock (&lock);

to both send_cached_buffers_and_events() and purge_cached_buffers_and_events() and use the local variables in the rest of those functions. The cached items are expected to be removed once any of those functions has been called.

The patch should also solve:
* purge all events in event_queue in the case pad_push fails in send_cached_buffers_and_events()
* Perhaps move purge_cached_buffers_and_events() to be called after change_state has chained up to the parent (Otherwise a new buffer or list might be cached and never free'd).

Comment 9 Pavel Modilaynen 2018-03-16 08:32:01 UTC

Created attachment 369758 [details] [review]
proposed fix (updated 2)

I have updated my patch adding protection for event queue as well.

> How about adding:
> ... to both send_cached_buffers_and_events() and 
> purge_cached_buffers_and_events() and use the local variables in the rest of 
> those functions. The cached items are expected to be removed once any of 
> those functions has been called.

I am not sure what would we gain from it? Purge now is fully under lock, is it some performance penalty that you are concerned of - then I don't think we'd win much here - purge is called only on events.

>  event_queue = self->event_queue;
>  self->event_queue = NULL;

I think we cannot just NULL-ify, since it is used without NULL checks in other parts of code. What I did is adding temporary queue, pushing events to that queue and clearing event_queue so that it becomes empty. All above under lock. I hope this is appropriate. 

> The patch should also solve:
> * purge all events in event_queue in the case pad_push fails in
> send_cached_buffers_and_events()

Do you mean when handle_and_push_buffer/handle_and_push_buffer_list fails? If yes, I addressed this situation in this patch, good point! 

> * Perhaps move purge_cached_buffers_and_events() to be called after
> change_state has chained up to the parent (Otherwise a new buffer or list
> might be cached and never free'd).

Yes, I noticed we have huge leaks here. I will address that in my new patch for https://bugzilla.gnome.org/show_bug.cgi?id=794353

Comment 10 Tim-Philipp Müller 2018-03-27 16:02:02 UTC

Could you check if this fixes it?

https://cgit.freedesktop.org/gstreamer/gst-plugins-bad/commit/gst/onvif?id=4d76070f4ec97c6aecbc55ec27997c523aa2b922

Comment 11 Pavel Modilaynen 2018-04-06 12:38:23 UTC

@Tim-Philipp I couldn't reproduce corruptions with your patch.

*** This bug has been marked as a duplicate of bug 794353 ***

Comment 12 Tim-Philipp Müller 2018-04-06 12:49:17 UTC

Thanks for testing and confirming it's been fixed!