Bug 780442 – vaapi: wayland: multiple sinks blocks the pipeline

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 780442 - vaapi: wayland: multiple sinks blocks the pipeline


Summary:	vaapi: wayland: multiple sinks blocks the pipeline


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gstreamer-vaapi
Version:	1.8.3
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	1.11.91
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:	773689 774029
Blocks:	748634

Reported:	2017-03-23 11:49 UTC by Lim Siew Hoon
Modified:	2017-04-24 11:35 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
*output_log with GST_DEBUG=vaapi:5,GST_REFCOUNTING:7** (1.98 MB, application/zip) 2017-03-23 11:51 UTC, Lim Siew Hoon		Details
libs: window: wayland: refactor event-handling (9.64 KB, patch) 2017-04-06 08:15 UTC, Hyunjun Ko	none	Details \| Review
libs: window: wayland: refactor event-handling (10.75 KB, patch) 2017-04-17 08:02 UTC, Hyunjun Ko	none	Details \| Review
libs: window: wayland: refactor event-handling (10.84 KB, patch) 2017-04-18 03:00 UTC, Hyunjun Ko	none	Details \| Review
libs: window: wayland: refactor event-handling (9.59 KB, patch) 2017-04-19 02:14 UTC, Hyunjun Ko	rejected	Details \| Review
libs: window: wayland: sending null buffer when destruction (2.38 KB, patch) 2017-04-19 02:14 UTC, Hyunjun Ko	rejected	Details \| Review

Description Lim Siew Hoon 2017-03-23 11:49:27 UTC

Gstreamer framework: 1.8.3
Gstreamer vaapi: 1.8 branch
OS: Yocto Linux with wayland weston compositor 
Wayland and weston version: 1.10.0

Able to reproduce the issue with vaapisink (1 video didn't get killed): 
gst-launch-1.0 filesrc location=/Videos/test.mp4 ! qtdemux ! vaapidecode ! vaapisink filesrc location=/Videos/test.mp4 ! qtdemux ! vaapisink

Able to reproduce the issue with vaapisink (2 video didn't get killed): 
gst-launch-1.0 filesrc location=/Videos/test.mp4 ! qtdemux ! vaapidecode ! vaapisink filesrc location=/Videos/test.mp4 ! qtdemux ! vaapisink filesrc location=/Videos/test.mp4 ! qtdemux ! vaapisink

Not able to reproduce issue if vaapisink replace to waylandsink. 
This issue does not happen in X11 if using vaapisink.

Attached the output_log.txt running with command:
GST_DEBUG=vaapi*:5,GST_REFCOUNTING:7 gst-launch-1.0 -v filesrc location=/home/root/HoneyBee.mp4 ! qtdemux ! vaapidecode ! vaapisink filesrc location=/home/root/HoneyBee.mp4 ! qtdemux ! vaapidecode ! vaapisink &> output_log.txt

Comment 1 Lim Siew Hoon 2017-03-23 11:51:40 UTC

Created attachment 348564 [details]
output_log with GST_DEBUG=vaapi*:5,GST_REFCOUNTING:7

Comment 2 Víctor Manuel Jáquez Leal 2017-03-23 17:21:03 UTC

Hi Lim,

Thanks for filling this bug.

What do you mean with "get killed"?

I tried this pipeline in weston:

gst-launch-1.0 uridecodebin uri= file:///home/vjaquez/patterns/300\ -\ Rise\ of\ an\ Empire\ -\ Trailer\ 2.mp4 ! vaapisink uridecodebin uri= file:///home/vjaquez/patterns/big_buck_bunny_1080p_h264.mov ! vaapisink

And got a crash. Is that what you mean?

If I run it under gdb, no crash is shown, but the first stream to hit EOS never stops.

Comment 3 Lim Siew Hoon 2017-03-24 07:51:20 UTC

(In reply to Víctor Manuel Jáquez Leal from comment #2)
> Hi Lim,
> 
> Thanks for filling this bug.
> 
> What do you mean with "get killed"?
> 
> I tried this pipeline in weston:
> 
> gst-launch-1.0 uridecodebin uri= file:///home/vjaquez/patterns/300\ -\ Rise\
> of\ an\ Empire\ -\ Trailer\ 2.mp4 ! vaapisink uridecodebin uri=
> file:///home/vjaquez/patterns/big_buck_bunny_1080p_h264.mov ! vaapisink
> 
> And got a crash. Is that what you mean?
> 
> If I run it under gdb, no crash is shown, but the first stream to hit EOS
> never stops.


Sorry. I need to rephrase on my sentence. What I mean the video still stuck on the desktop. It like video output surface didn't get clean up. Command pipeline still not exit, ever I keep press the space button. I can see the process id in "ps-aux | grep gst-launch-1.0".

When crashed happen, the video output will be stuck on the screen forever. If didn't crash, I press ctrl-c after video stream finish the video output will disappear. But I can't confirm the gst_vaapi_surface_destroy got call or not if I did that. I need to dump some print message to check on it. 

Right behaviour should be same as waylandsink after videos finish playback, it should be exit.

Any information that miss out you need, i will provide. Glab you able to reproduce the issue on your side.

Comment 4 Lim Siew Hoon 2017-03-24 07:52:34 UTC

Please let me know what are information you needed.

Comment 5 Hyunjun Ko 2017-03-27 06:45:57 UTC

Seems that this issue is similar to bug #772455.
Blocked on poll in one of streams.

Comment 6 Hyunjun Ko 2017-03-27 06:56:52 UTC

(In reply to Hyunjun Ko from comment #5)
> Seems that this issue is similar to bug #772455.
> Blocked on poll in one of streams.
Since multiple vaapisink in one process share same wayland display, this issue happens.

Comment 7 Hyunjun Ko 2017-04-06 08:15:16 UTC

Created attachment 349344 [details] [review]
libs: window: wayland: refactor event-handling

Seperates handling-event loop to new thread, which includes:
 - Using mutex to protect each frame information.
 - Dropping frame if there's pending frame.
 - Fixing counting pending frames.

Also, to avoid leakage, it sends null buffer while destruction.
Otherwise, usually the last buffer's callback is not called,
which leads to leak of GstVaapiDisplay. This fixes #774029.

Comment 8 Hyunjun Ko 2017-04-06 08:18:16 UTC

@Lim, Could you test with this patch?
I confirmed this fixes example pipeline above, but still needs to verify your real problem.

@Victor, do you agree with this approach? including sending null buffer when destruction.

Comment 9 Víctor Manuel Jáquez Leal 2017-04-07 11:52:37 UTC

Review of attachment 349344 [details] [review]:

just a nitpick :)

::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c
@@ +520,3 @@
+  /* Just drop if there's pending frames */
+  g_mutex_lock (&priv->frame_lock);
+  if (g_atomic_int_get (&priv->num_frames_pending) > 0) {

g_atomic_* is, well, atomic. In this case, since it is a single instruction, there's no need to guard it behind a lock

Comment 10 Víctor Manuel Jáquez Leal 2017-04-07 12:06:59 UTC

Review of attachment 349344 [details] [review]:

::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c
@@ +421,3 @@
+  if (!frame->done) {
+    g_atomic_pointer_compare_and_exchange (&priv->last_frame, frame, NULL);
+    g_atomic_int_dec_and_test (&priv->num_frames_pending);

what about create an inlined function that does this 

 g_atomic_pointer_compare_and_exchange (&priv->last_frame, frame, NULL);
 g_atomic_int_dec_and_test (&priv->num_frames_pending);

so we could reuse it in frame_done_callback()

Comment 11 Víctor Manuel Jáquez Leal 2017-04-07 12:08:24 UTC

(In reply to Hyunjun Ko from comment #7)
> Created attachment 349344 [details] [review] [review]
> libs: window: wayland: refactor event-handling
> 
> Seperates handling-event loop to new thread, which includes:
>  - Using mutex to protect each frame information.
>  - Dropping frame if there's pending frame.
>  - Fixing counting pending frames.
> 
> Also, to avoid leakage, it sends null buffer while destruction.
> Otherwise, usually the last buffer's callback is not called,
> which leads to leak of GstVaapiDisplay. This fixes #774029.

As I already mentioned you, I would like a more verbose commit log, explaining why this change needed, and detailing the solution and if it was inspired from the waylandimagesink.

Comment 12 Hyunjun Ko 2017-04-17 08:02:44 UTC

Created attachment 349933 [details] [review]
libs: window: wayland: refactor event-handling

Seperates handling-event loop to new thread, which includes:
 - Using mutex to protect each frame information.
 - Dropping frame if there's pending frame.
 - Fixing counting pending frames.

Problem description as the following:
1\ Racy condition in case that same wl_display/fd is being used in different threads.
In this case, if a thread swallows other's events, another thread could be
stuck on poll forever. The problem usually happens during destruction of
GstVaapiWindow in case that multiple piplelines running on a process.
At this time, it calls gst_vaapi_window_wayland_sync sequentially when destruction
from one vaapisink to another vaapisink, which leads that the first
gst_vaapi_window_wayland_sync is likely to swallow other's events.
If it happens, another vaapisink got stuck and can't be finished.

2\ Wrong counting pending frames and leads to leakage.
This is another main problem on GstVaapiWindowWayland, which is described on #774029.
It needs to send null buffer while destruction, so that it assures all wl buffers are 
released. Otherwise, usually the last buffer's callback is not called,
which leads to leak of GstVaapiDisplay. This is motivated by gstwaylandsink.
Note that, when the last buffer's callback is called,
relevant frame's callback is usually not called.

Comment 13 Víctor Manuel Jáquez Leal 2017-04-17 11:20:14 UTC

Review of attachment 349933 [details] [review]:

::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c
@@ +320,3 @@
+      gst_vaapi_window_wayland_thread_run, window, &err);
+  if (err) {
+    GST_WARNING ("Failed to start wayland thread: %s", err->message);

here's a memleak :)

g_error_free (err);

@@ +456,3 @@
 
+  /* Just drop if there's pending frames */
+  if (g_atomic_int_get (&priv->num_frames_pending) > 0)

IMO, we should send, at least, a log warning here. But what we really should do is to send a QoS event, if it is handled by the pipeline.

Comment 14 Víctor Manuel Jáquez Leal 2017-04-17 13:12:24 UTC

I'm testing the patch but I got this crash when finishing a video:

GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument.  Aborting.
[Thread 0x7fffa6613700 (LWP 20844) exited]

Thread 1 "gst-play-1.0" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt

+ Trace 237370

#0 __GI_raise
at ../sysdeps/unix/sysv/linux/raise.c line 58
#1 __GI_abort
at abort.c line 89
#2 g_thread_abort
at gthread-posix.c line 78
#3 g_mutex_lock
at gthread-posix.c line 215
#4 frame_release_callback
at gstvaapiwindow_wayland.c line 429
#5 ffi_call_unix64
from /usr/lib/x86_64-linux-gnu/libffi.so.6
#6 ffi_call
from /usr/lib/x86_64-linux-gnu/libffi.so.6
#7 wl_closure_invoke
at src/connection.c line 935
#8 dispatch_event
at src/wayland-client.c line 1310
#9 dispatch_queue
at src/wayland-client.c line 1456
#10 wl_display_dispatch_queue_pending
at src/wayland-client.c line 1698
#11 wl_display_dispatch_queue
at src/wayland-client.c line 1674
#12 wl_display_roundtrip_queue
at src/wayland-client.c line 1121
#13 gst_vaapi_window_wayland_destroy
at gstvaapiwindow_wayland.c line 350
#14 gst_vaapi_object_finalize
at gstvaapiobject.c line 50
#15 gst_vaapi_mini_object_free
at gstvaapiminiobject.c line 39
#16 gst_vaapi_mini_object_unref_internal
at ./gstvaapiminiobject.h line 203
#17 gst_vaapi_mini_object_replace
at gstvaapiminiobject.c line 173
#18 gst_vaapi_object_replace_internal
at ./gstvaapiobject_priv.h line 215
#19 gst_vaapi_window_replace
at gstvaapiwindow.c line 291
#20 gst_vaapisink_stop
at gstvaapisink.c line 1240
#21 gst_base_sink_change_state
at gstbasesink.c line 5242
#22 gst_element_change_state
at gstelement.c line 2743
#23 gst_element_set_state_func
at gstelement.c line 2697

Comment 15 Hyunjun Ko 2017-04-18 02:41:16 UTC

(In reply to Víctor Manuel Jáquez Leal from comment #13)
> Review of attachment 349933 [details] [review] [review]:
> 
> ::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c
> @@ +456,3 @@
>  
> +  /* Just drop if there's pending frames */
> +  if (g_atomic_int_get (&priv->num_frames_pending) > 0)
> 
> IMO, we should send, at least, a log warning here. But what we really should
> do is to send a QoS event, if it is handled by the pipeline.

I'm not sure about the event.
A case that this happens is when wayland/weston is locked, for example.
When user unlock the screen, video can be displayed on the right time.
I don't think this is the case which should be handled by qos event.

Comment 16 Hyunjun Ko 2017-04-18 02:59:22 UTC

(In reply to Víctor Manuel Jáquez Leal from comment #14)
> I'm testing the patch but I got this crash when finishing a video:
> 
> GLib (gthread-posix.c): Unexpected error from C library during
> 'pthread_mutex_lock': Invalid argument.  Aborting.
> [Thread 0x7fffa6613700 (LWP 20844) exited]
> 
> Thread 1 "gst-play-1.0" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
> 58      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt

Oops, I haven't seen this when I'm testing, thanks for finding.
BTW, did you test on weston?

Comment 17 Hyunjun Ko 2017-04-18 03:00:07 UTC

Created attachment 349965 [details] [review]
libs: window: wayland: refactor event-handling

Seperates handling-event loop to new thread, which includes:
 - Using mutex to protect each frame information.
 - Dropping frame if there's pending frame.
 - Fixing counter of pending frames.

Problem description as the following:
1\ Racy condition in case that same wl_display/fd is being used in different threads.
In this case, if a thread swallows other's events, another thread could be
stuck on poll forever. The problem usually happens during destruction of
GstVaapiWindow in case that multiple piplelines running on a process.
At this time, it calls gst_vaapi_window_wayland_sync sequentially when destruction
from one vaapisink to another vaapisink, which leads that the first
gst_vaapi_window_wayland_sync is likely to swallow other's events.
If it happens, another vaapisink got stuck and can't be finished.

2\ Wrong counter of pending frames and buffer leakage.
This is another main problem on GstVaapiWindowWayland, which is described on #774029.
It needs to send null buffer while destruction, so that it assures all wl buffers are 
released. Otherwise, usually the last buffer's callback is not called,
which leads to leak of GstVaapiDisplay. This is motivated by gstwaylandsink.
Note that, when the last buffer's callback is called,
relevant frame's callback is usually not called.

https://bugzilla.gnome.org/show_bug.cgi?id=772455
https://bugzilla.gnome.org/show_bug.cgi?id=774029

Comment 18 Hyunjun Ko 2017-04-19 02:14:10 UTC

Created attachment 350036 [details] [review]
libs: window: wayland: refactor event-handling

Seperates handling-event loop to new thread, which includes:
 - Using mutex to protect each frame information.
 - Dropping frame if there's pending frame.
 - Fixing counter of pending frames.

Problem description:
There is racy condition in case that same wl_display/fd is being used in different threads.
In this case, if a thread swallows other's events, another thread could be
stuck on poll forever. The problem usually happens during destruction of
GstVaapiWindow in case that multiple piplelines running on a process.
At this time, it calls gst_vaapi_window_wayland_sync sequentially when destruction
from one vaapisink to another vaapisink, which leads that the first
gst_vaapi_window_wayland_sync is likely to swallow other's events.
If it happens, another vaapisink got stuck and can't be finished.

https://bugzilla.gnome.org/show_bug.cgi?id=772455

Comment 19 Hyunjun Ko 2017-04-19 02:14:45 UTC

Created attachment 350037 [details] [review]
libs: window: wayland: sending null buffer when  destruction

Fix leakage of the last wl buffer.

This is another main problem on GstVaapiWindowWayland, which is described on #774029.
It needs to send null buffer while destruction, so that it assures all wl buffers are 
released. Otherwise, usually the last buffer's callback is not called,
which leads to leak of GstVaapiDisplay. This is motivated by gstwaylandsink.

Comment 20 Víctor Manuel Jáquez Leal 2017-04-21 14:16:10 UTC

@Hyunjun

I'm not a big fan of threads. IMO we should avoid them as much as possible. And I think the solution for these issues can be done without a thread (though, threads will be required when we handle foreign windows - bug 705821 -).

Thus, I have sketched a solution, based on your work, removing the thread, and it seems to work. You can pull the branch from my repo:

https://github.com/ceyusa/gstreamer-vaapi/commits/780442

commit 896a836f reopens and fixes bug 773689 by pushing a modified version of the patch proposed in that bug
commit ca314a25 fixes bug 774029
commit 33282bb0 fixes this bug 780442

And two patches more to enhance a couple thinks that your patches here proposed.

Please try them and let me know how they go.

Comment 21 Víctor Manuel Jáquez Leal 2017-04-21 14:21:12 UTC

Also, I guess we could backport these patches to 1.10 easily.

Comment 22 Hyunjun Ko 2017-04-24 02:31:43 UTC

(In reply to Víctor Manuel Jáquez Leal from comment #20)
> @Hyunjun
> 
> I'm not a big fan of threads. IMO we should avoid them as much as possible.
> And I think the solution for these issues can be done without a thread
> (though, threads will be required when we handle foreign windows - bug
> 705821 -).
> 
> Thus, I have sketched a solution, based on your work, removing the thread,
> and it seems to work. You can pull the branch from my repo:
> 
> https://github.com/ceyusa/gstreamer-vaapi/commits/780442
> 
> commit 896a836f reopens and fixes bug 773689 by pushing a modified version
> of the patch proposed in that bug
> commit ca314a25 fixes bug 774029
> commit 33282bb0 fixes this bug 780442
> 
> And two patches more to enhance a couple thinks that your patches here
> proposed.
> 
> Please try them and let me know how they go.

I've tried and confirmed it's working fine, but there are 2 things to mention.
what do you think about adding to remove those codes below in the 4th commit?

if (priv->last_frame) {
  frame_state_free (priv->last_frame);
  priv->last_frame = NULL;
}

It causes segfault(only after 4th commit). Even it's fixed by the 5th commit, they're not necessary IMO.

Another thing is about skipping frame when there is pending frame.
I think it's neccessary, but not critical.

BTW, I'm not a fan of thread either :)

Comment 23 Víctor Manuel Jáquez Leal 2017-04-24 11:19:16 UTC

(In reply to Hyunjun Ko from comment #22)
> I've tried and confirmed it's working fine, but there are 2 things to
> mention.
> what do you think about adding to remove those codes below in the 4th commit?
> 
> if (priv->last_frame) {
>   frame_state_free (priv->last_frame);
>   priv->last_frame = NULL;
> }
> 
> It causes segfault(only after 4th commit). Even it's fixed by the 5th
> commit, they're not necessary IMO.

Good catch! I'll do that.

> 
> Another thing is about skipping frame when there is pending frame.
> I think it's neccessary, but not critical.

I don't think it is necessary, since we wait for each frame to be released.

> 
> BTW, I'm not a fan of thread either :)

:D

I'm going to push the branch with your modification and close this bug. Any other issue, please reopen it.

Comment 24 Víctor Manuel Jáquez Leal 2017-04-24 11:22:33 UTC

Review of attachment 350036 [details] [review]:

this patch was reworked and pushed differently.

Comment 25 Víctor Manuel Jáquez Leal 2017-04-24 11:23:07 UTC

Review of attachment 350037 [details] [review]:

this patch was reworked and pushed differently

Comment 26 Víctor Manuel Jáquez Leal 2017-04-24 11:35:51 UTC

commit 824974e657445401d9d7b56c556c0a1a9f4c22c7
Author: Hyunjun Ko <zzoon@igalia.com>
Date:   Fri Apr 21 15:30:09 2017 +0200

    libs: window: wayland: mark frames as done

    When the frame listener callbacks 'done', the number of pending
    frames are decreased. Nonetheless, there might be occasions where
    the buffer listener callbacks 'release', without calling previously
    frame's 'done'. This leads to problem with
    gst_vaapi_window_wayland_sync() operation.

    This patch marks as done those frames which were callbacked, but if
    the buffer callbacks 'release' and associated frame is not marked
    as 'done' it is so, thus the number of pending frames keeps correct.

    https://bugzilla.gnome.org/show_bug.cgi?id=780442

    Signed-off-by: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>


commit 3b314ba93e61bad3d8decbe45bcb12717b595a55
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Fri Apr 21 14:07:44 2017 +0200

    libs: window: wayland: don't sync at destroy()

    Don't call gst_vaapi_window_wayland_sync() when destroying the
    wayland window instance, since it might lead to a lock at
    gst_poll_wait() when more than one instances of vaapisink are
    rendering in the same pipeline, this is because they share the
    same window.

    Since now all the frames are freed we don't need to freed the
    private last_frame, since its address is invalid now.

    https://bugzilla.gnome.org/show_bug.cgi?id=780442

    Signed-off-by: Hyunjun Ko <zzoon@igalia.com>


commit 9c3a4edf05ea325934c688622c8b1a1597620d0e
Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Date:   Thu Apr 20 20:30:52 2017 +0200

    libs: window: wayland: cancel read at poll message

    Always call wl_display_cancel_read() when an errno is set, but
    different to EAGAIN or EINTR.

    https://bugzilla.gnome.org/show_bug.cgi?id=780442