GNOME Bugzilla – Bug 780442
vaapi: wayland: multiple sinks blocks the pipeline
Last modified: 2017-04-24 11:35:51 UTC
Gstreamer framework: 1.8.3 Gstreamer vaapi: 1.8 branch OS: Yocto Linux with wayland weston compositor Wayland and weston version: 1.10.0 Able to reproduce the issue with vaapisink (1 video didn't get killed): gst-launch-1.0 filesrc location=/Videos/test.mp4 ! qtdemux ! vaapidecode ! vaapisink filesrc location=/Videos/test.mp4 ! qtdemux ! vaapisink Able to reproduce the issue with vaapisink (2 video didn't get killed): gst-launch-1.0 filesrc location=/Videos/test.mp4 ! qtdemux ! vaapidecode ! vaapisink filesrc location=/Videos/test.mp4 ! qtdemux ! vaapisink filesrc location=/Videos/test.mp4 ! qtdemux ! vaapisink Not able to reproduce issue if vaapisink replace to waylandsink. This issue does not happen in X11 if using vaapisink. Attached the output_log.txt running with command: GST_DEBUG=vaapi*:5,GST_REFCOUNTING:7 gst-launch-1.0 -v filesrc location=/home/root/HoneyBee.mp4 ! qtdemux ! vaapidecode ! vaapisink filesrc location=/home/root/HoneyBee.mp4 ! qtdemux ! vaapidecode ! vaapisink &> output_log.txt
Created attachment 348564 [details] output_log with GST_DEBUG=vaapi*:5,GST_REFCOUNTING:7
Hi Lim, Thanks for filling this bug. What do you mean with "get killed"? I tried this pipeline in weston: gst-launch-1.0 uridecodebin uri= file:///home/vjaquez/patterns/300\ -\ Rise\ of\ an\ Empire\ -\ Trailer\ 2.mp4 ! vaapisink uridecodebin uri= file:///home/vjaquez/patterns/big_buck_bunny_1080p_h264.mov ! vaapisink And got a crash. Is that what you mean? If I run it under gdb, no crash is shown, but the first stream to hit EOS never stops.
(In reply to Víctor Manuel Jáquez Leal from comment #2) > Hi Lim, > > Thanks for filling this bug. > > What do you mean with "get killed"? > > I tried this pipeline in weston: > > gst-launch-1.0 uridecodebin uri= file:///home/vjaquez/patterns/300\ -\ Rise\ > of\ an\ Empire\ -\ Trailer\ 2.mp4 ! vaapisink uridecodebin uri= > file:///home/vjaquez/patterns/big_buck_bunny_1080p_h264.mov ! vaapisink > > And got a crash. Is that what you mean? > > If I run it under gdb, no crash is shown, but the first stream to hit EOS > never stops. Sorry. I need to rephrase on my sentence. What I mean the video still stuck on the desktop. It like video output surface didn't get clean up. Command pipeline still not exit, ever I keep press the space button. I can see the process id in "ps-aux | grep gst-launch-1.0". When crashed happen, the video output will be stuck on the screen forever. If didn't crash, I press ctrl-c after video stream finish the video output will disappear. But I can't confirm the gst_vaapi_surface_destroy got call or not if I did that. I need to dump some print message to check on it. Right behaviour should be same as waylandsink after videos finish playback, it should be exit. Any information that miss out you need, i will provide. Glab you able to reproduce the issue on your side.
Please let me know what are information you needed.
Seems that this issue is similar to bug #772455. Blocked on poll in one of streams.
(In reply to Hyunjun Ko from comment #5) > Seems that this issue is similar to bug #772455. > Blocked on poll in one of streams. Since multiple vaapisink in one process share same wayland display, this issue happens.
Created attachment 349344 [details] [review] libs: window: wayland: refactor event-handling Seperates handling-event loop to new thread, which includes: - Using mutex to protect each frame information. - Dropping frame if there's pending frame. - Fixing counting pending frames. Also, to avoid leakage, it sends null buffer while destruction. Otherwise, usually the last buffer's callback is not called, which leads to leak of GstVaapiDisplay. This fixes #774029.
@Lim, Could you test with this patch? I confirmed this fixes example pipeline above, but still needs to verify your real problem. @Victor, do you agree with this approach? including sending null buffer when destruction.
Review of attachment 349344 [details] [review]: just a nitpick :) ::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c @@ +520,3 @@ + /* Just drop if there's pending frames */ + g_mutex_lock (&priv->frame_lock); + if (g_atomic_int_get (&priv->num_frames_pending) > 0) { g_atomic_* is, well, atomic. In this case, since it is a single instruction, there's no need to guard it behind a lock
Review of attachment 349344 [details] [review]: ::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c @@ +421,3 @@ + if (!frame->done) { + g_atomic_pointer_compare_and_exchange (&priv->last_frame, frame, NULL); + g_atomic_int_dec_and_test (&priv->num_frames_pending); what about create an inlined function that does this g_atomic_pointer_compare_and_exchange (&priv->last_frame, frame, NULL); g_atomic_int_dec_and_test (&priv->num_frames_pending); so we could reuse it in frame_done_callback()
(In reply to Hyunjun Ko from comment #7) > Created attachment 349344 [details] [review] [review] > libs: window: wayland: refactor event-handling > > Seperates handling-event loop to new thread, which includes: > - Using mutex to protect each frame information. > - Dropping frame if there's pending frame. > - Fixing counting pending frames. > > Also, to avoid leakage, it sends null buffer while destruction. > Otherwise, usually the last buffer's callback is not called, > which leads to leak of GstVaapiDisplay. This fixes #774029. As I already mentioned you, I would like a more verbose commit log, explaining why this change needed, and detailing the solution and if it was inspired from the waylandimagesink.
Created attachment 349933 [details] [review] libs: window: wayland: refactor event-handling Seperates handling-event loop to new thread, which includes: - Using mutex to protect each frame information. - Dropping frame if there's pending frame. - Fixing counting pending frames. Problem description as the following: 1\ Racy condition in case that same wl_display/fd is being used in different threads. In this case, if a thread swallows other's events, another thread could be stuck on poll forever. The problem usually happens during destruction of GstVaapiWindow in case that multiple piplelines running on a process. At this time, it calls gst_vaapi_window_wayland_sync sequentially when destruction from one vaapisink to another vaapisink, which leads that the first gst_vaapi_window_wayland_sync is likely to swallow other's events. If it happens, another vaapisink got stuck and can't be finished. 2\ Wrong counting pending frames and leads to leakage. This is another main problem on GstVaapiWindowWayland, which is described on #774029. It needs to send null buffer while destruction, so that it assures all wl buffers are released. Otherwise, usually the last buffer's callback is not called, which leads to leak of GstVaapiDisplay. This is motivated by gstwaylandsink. Note that, when the last buffer's callback is called, relevant frame's callback is usually not called.
Review of attachment 349933 [details] [review]: ::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c @@ +320,3 @@ + gst_vaapi_window_wayland_thread_run, window, &err); + if (err) { + GST_WARNING ("Failed to start wayland thread: %s", err->message); here's a memleak :) g_error_free (err); @@ +456,3 @@ + /* Just drop if there's pending frames */ + if (g_atomic_int_get (&priv->num_frames_pending) > 0) IMO, we should send, at least, a log warning here. But what we really should do is to send a QoS event, if it is handled by the pipeline.
I'm testing the patch but I got this crash when finishing a video: GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument. Aborting. [Thread 0x7fffa6613700 (LWP 20844) exited] Thread 1 "gst-play-1.0" received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt
+ Trace 237370
(In reply to Víctor Manuel Jáquez Leal from comment #13) > Review of attachment 349933 [details] [review] [review]: > > ::: gst-libs/gst/vaapi/gstvaapiwindow_wayland.c > @@ +456,3 @@ > > + /* Just drop if there's pending frames */ > + if (g_atomic_int_get (&priv->num_frames_pending) > 0) > > IMO, we should send, at least, a log warning here. But what we really should > do is to send a QoS event, if it is handled by the pipeline. I'm not sure about the event. A case that this happens is when wayland/weston is locked, for example. When user unlock the screen, video can be displayed on the right time. I don't think this is the case which should be handled by qos event.
(In reply to Víctor Manuel Jáquez Leal from comment #14) > I'm testing the patch but I got this crash when finishing a video: > > GLib (gthread-posix.c): Unexpected error from C library during > 'pthread_mutex_lock': Invalid argument. Aborting. > [Thread 0x7fffa6613700 (LWP 20844) exited] > > Thread 1 "gst-play-1.0" received signal SIGABRT, Aborted. > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 > 58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) bt Oops, I haven't seen this when I'm testing, thanks for finding. BTW, did you test on weston?
Created attachment 349965 [details] [review] libs: window: wayland: refactor event-handling Seperates handling-event loop to new thread, which includes: - Using mutex to protect each frame information. - Dropping frame if there's pending frame. - Fixing counter of pending frames. Problem description as the following: 1\ Racy condition in case that same wl_display/fd is being used in different threads. In this case, if a thread swallows other's events, another thread could be stuck on poll forever. The problem usually happens during destruction of GstVaapiWindow in case that multiple piplelines running on a process. At this time, it calls gst_vaapi_window_wayland_sync sequentially when destruction from one vaapisink to another vaapisink, which leads that the first gst_vaapi_window_wayland_sync is likely to swallow other's events. If it happens, another vaapisink got stuck and can't be finished. 2\ Wrong counter of pending frames and buffer leakage. This is another main problem on GstVaapiWindowWayland, which is described on #774029. It needs to send null buffer while destruction, so that it assures all wl buffers are released. Otherwise, usually the last buffer's callback is not called, which leads to leak of GstVaapiDisplay. This is motivated by gstwaylandsink. Note that, when the last buffer's callback is called, relevant frame's callback is usually not called. https://bugzilla.gnome.org/show_bug.cgi?id=772455 https://bugzilla.gnome.org/show_bug.cgi?id=774029
Created attachment 350036 [details] [review] libs: window: wayland: refactor event-handling Seperates handling-event loop to new thread, which includes: - Using mutex to protect each frame information. - Dropping frame if there's pending frame. - Fixing counter of pending frames. Problem description: There is racy condition in case that same wl_display/fd is being used in different threads. In this case, if a thread swallows other's events, another thread could be stuck on poll forever. The problem usually happens during destruction of GstVaapiWindow in case that multiple piplelines running on a process. At this time, it calls gst_vaapi_window_wayland_sync sequentially when destruction from one vaapisink to another vaapisink, which leads that the first gst_vaapi_window_wayland_sync is likely to swallow other's events. If it happens, another vaapisink got stuck and can't be finished. https://bugzilla.gnome.org/show_bug.cgi?id=772455
Created attachment 350037 [details] [review] libs: window: wayland: sending null buffer when destruction Fix leakage of the last wl buffer. This is another main problem on GstVaapiWindowWayland, which is described on #774029. It needs to send null buffer while destruction, so that it assures all wl buffers are released. Otherwise, usually the last buffer's callback is not called, which leads to leak of GstVaapiDisplay. This is motivated by gstwaylandsink.
@Hyunjun I'm not a big fan of threads. IMO we should avoid them as much as possible. And I think the solution for these issues can be done without a thread (though, threads will be required when we handle foreign windows - bug 705821 -). Thus, I have sketched a solution, based on your work, removing the thread, and it seems to work. You can pull the branch from my repo: https://github.com/ceyusa/gstreamer-vaapi/commits/780442 commit 896a836f reopens and fixes bug 773689 by pushing a modified version of the patch proposed in that bug commit ca314a25 fixes bug 774029 commit 33282bb0 fixes this bug 780442 And two patches more to enhance a couple thinks that your patches here proposed. Please try them and let me know how they go.
Also, I guess we could backport these patches to 1.10 easily.
(In reply to Víctor Manuel Jáquez Leal from comment #20) > @Hyunjun > > I'm not a big fan of threads. IMO we should avoid them as much as possible. > And I think the solution for these issues can be done without a thread > (though, threads will be required when we handle foreign windows - bug > 705821 -). > > Thus, I have sketched a solution, based on your work, removing the thread, > and it seems to work. You can pull the branch from my repo: > > https://github.com/ceyusa/gstreamer-vaapi/commits/780442 > > commit 896a836f reopens and fixes bug 773689 by pushing a modified version > of the patch proposed in that bug > commit ca314a25 fixes bug 774029 > commit 33282bb0 fixes this bug 780442 > > And two patches more to enhance a couple thinks that your patches here > proposed. > > Please try them and let me know how they go. I've tried and confirmed it's working fine, but there are 2 things to mention. what do you think about adding to remove those codes below in the 4th commit? if (priv->last_frame) { frame_state_free (priv->last_frame); priv->last_frame = NULL; } It causes segfault(only after 4th commit). Even it's fixed by the 5th commit, they're not necessary IMO. Another thing is about skipping frame when there is pending frame. I think it's neccessary, but not critical. BTW, I'm not a fan of thread either :)
(In reply to Hyunjun Ko from comment #22) > I've tried and confirmed it's working fine, but there are 2 things to > mention. > what do you think about adding to remove those codes below in the 4th commit? > > if (priv->last_frame) { > frame_state_free (priv->last_frame); > priv->last_frame = NULL; > } > > It causes segfault(only after 4th commit). Even it's fixed by the 5th > commit, they're not necessary IMO. Good catch! I'll do that. > > Another thing is about skipping frame when there is pending frame. > I think it's neccessary, but not critical. I don't think it is necessary, since we wait for each frame to be released. > > BTW, I'm not a fan of thread either :) :D I'm going to push the branch with your modification and close this bug. Any other issue, please reopen it.
Review of attachment 350036 [details] [review]: this patch was reworked and pushed differently.
Review of attachment 350037 [details] [review]: this patch was reworked and pushed differently
commit 824974e657445401d9d7b56c556c0a1a9f4c22c7 Author: Hyunjun Ko <zzoon@igalia.com> Date: Fri Apr 21 15:30:09 2017 +0200 libs: window: wayland: mark frames as done When the frame listener callbacks 'done', the number of pending frames are decreased. Nonetheless, there might be occasions where the buffer listener callbacks 'release', without calling previously frame's 'done'. This leads to problem with gst_vaapi_window_wayland_sync() operation. This patch marks as done those frames which were callbacked, but if the buffer callbacks 'release' and associated frame is not marked as 'done' it is so, thus the number of pending frames keeps correct. https://bugzilla.gnome.org/show_bug.cgi?id=780442 Signed-off-by: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> commit 3b314ba93e61bad3d8decbe45bcb12717b595a55 Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Fri Apr 21 14:07:44 2017 +0200 libs: window: wayland: don't sync at destroy() Don't call gst_vaapi_window_wayland_sync() when destroying the wayland window instance, since it might lead to a lock at gst_poll_wait() when more than one instances of vaapisink are rendering in the same pipeline, this is because they share the same window. Since now all the frames are freed we don't need to freed the private last_frame, since its address is invalid now. https://bugzilla.gnome.org/show_bug.cgi?id=780442 Signed-off-by: Hyunjun Ko <zzoon@igalia.com> commit 9c3a4edf05ea325934c688622c8b1a1597620d0e Author: Víctor Manuel Jáquez Leal <vjaquez@igalia.com> Date: Thu Apr 20 20:30:52 2017 +0200 libs: window: wayland: cancel read at poll message Always call wl_display_cancel_read() when an errno is set, but different to EAGAIN or EINTR. https://bugzilla.gnome.org/show_bug.cgi?id=780442