GNOME Bugzilla – Bug 761586
systemclock: regression causing audiomixer to stall
Last modified: 2018-03-27 08:20:03 UTC
commit f0e94b4cdf70955dc2b002148b1636f0dc18f459 for https://bugzilla.gnome.org/show_bug.cgi?id=760757 caused a regression and is making audiomixer tests timeout. It seems like some clock entry ends up waiting forever even after trying to be unscheduled.
What seems to be the relevant and last log lines for the clock: 0:00:00.383852455 16039 0x14ae450 DEBUG GST_CLOCK gstclock.c:1056:gst_clock_get_internal_time:<GstAudioSrcClock> internal time 0:00:00.253696144 0:00:00.383865314 16039 0x14ae450 DEBUG GST_CLOCK gstclock.c:1101:gst_clock_get_time:<GstAudioSrcClock> adjusted time 0:00:00.253696144 0:00:00.383878472 16039 0x14ae450 DEBUG GST_CLOCK gstclock.c:253:gst_clock_entry_new:<GstAudioSrcClock> created entry 0x7f8e100042b0, time 0:00:00.273061224 0:00:00.383895595 16039 0x14ae450 DEBUG GST_CLOCK gstclock.c:542:gst_clock_id_wait:<GstAudioSrcClock> waiting on clock entry 0x7f8e100042b0 0:00:00.383912717 16039 0x14ae450 DEBUG audioclock gstaudioclock.c:172:gst_audio_clock_get_internal_time:<GstAudioSrcClock> result 0:00:00.253786847, last_time 0:00:00.253786847 0:00:00.383929110 16039 0x14ae450 DEBUG GST_CLOCK gstclock.c:1056:gst_clock_get_internal_time:<GstAudioSrcClock> internal time 0:00:00.253786847 0:00:00.383942963 16039 0x14ae450 DEBUG GST_CLOCK gstclock.c:1101:gst_clock_get_time:<GstAudioSrcClock> adjusted time 0:00:00.253786847 0:00:00.383955129 16039 0x14ae450 DEBUG GST_CLOCK gstsystemclock.c:707:gst_system_clock_id_wait_jitter_unlocked: entry 0x7f8e100042b0 time 0:00:00.273061224 now 0:00:00.253786847 diff (time-now) 19274377 0:00:00.383980898 16039 0x14ae450 DEBUG GST_CLOCK gstsystemclock.c:735:gst_system_clock_id_wait_jitter_unlocked: entry 0x7f8e100042b0 unlocked, status 3, ret 1 0:00:00.646847957 16039 0x14ae320 DEBUG audioclock gstaudioclock.c:172:gst_audio_clock_get_internal_time:<GstAudioSrcClock> result 0:00:00.507188208, last_time 0:00:00.507188208 0:00:00.646899899 16039 0x14ae320 DEBUG GST_CLOCK gstclock.c:1056:gst_clock_get_internal_time:<GstAudioSrcClock> internal time 0:00:00.507188208 0:00:00.646916823 16039 0x14ae320 DEBUG GST_CLOCK gstclock.c:1101:gst_clock_get_time:<GstAudioSrcClock> adjusted time 0:00:00.507188208 0:00:00.646980370 16039 0x14ae320 DEBUG GST_CLOCK gstsystemclock.c:1000:gst_system_clock_id_unschedule: unscheduling entry 0x7f8e100042b0 0:00:00.887998162 16039 0x14ae320 DEBUG GST_CLOCK gstsystemclock.c:1000:gst_system_clock_id_unschedule: unscheduling entry 0x7f8e100042b0
I also notice the gst/gstsystemclock test in core timing out often these days, in the new test.
This seems to be the root cause for the issue later, the wakeup_count ends up being messed: 0:00:01.040926341 26812 0x1cc0850 DEBUG GST_CLOCK gstclock.c:253:gst_clock_entry_new:<GstAudioSrcClock> created entry 0x7f86c0002e70, time 0:00:00.779297051 0:00:01.040943389 26812 0x1cc0850 DEBUG GST_CLOCK gstclock.c:542:gst_clock_id_wait:<GstAudioSrcClock> waiting on clock entry 0x7f86c0002e70 0:00:01.040957159 26812 0x1cc0850 DEBUG audioclock gstaudioclock.c:172:gst_audio_clock_get_internal_time:<GstAudioSrcClock> result 0:00:00.780136054, last_time 0:00:00.780136054 0:00:01.040954297 26812 0x1cc0720 DEBUG GST_CLOCK gstsystemclock.c:1019:gst_system_clock_id_unschedule: unscheduling entry 0x7f86c0002e70 0:00:01.040971786 26812 0x1cc0850 DEBUG GST_CLOCK gstclock.c:1056:gst_clock_get_internal_time:<GstAudioSrcClock> internal time 0:00:00.780136054 0:00:01.040997865 26812 0x1cc0720 DEBUG GST_CLOCK gstsystemclock.c:1028:gst_system_clock_id_unschedule: unscheduled entry 0x7f86c0002e70 2 0:00:01.041014502 26812 0x1cc0850 DEBUG GST_CLOCK gstclock.c:1101:gst_clock_get_time:<GstAudioSrcClock> adjusted time 0:00:00.780136054 0:00:01.041025315 26812 0x1cc0720 DEBUG GST_CLOCK gstsystemclock.c:1035:gst_system_clock_id_unschedule: entry was BUSY, doing wakeup 0:00:01.041030800 26812 0x1cc0850 DEBUG GST_CLOCK gstsystemclock.c:710:gst_system_clock_id_wait_jitter_unlocked: entry 0x7f86c0002e70 time 0:00:00.779297051 now 0:00:00.780136054 diff (time-now 0:00:01.041042960 26812 0x1cc0720 DEBUG GST_CLOCK gstsystemclock.c:394:gst_system_clock_add_wakeup: writing control 0:00:01.041060645 26812 0x1cc0850 DEBUG GST_CLOCK gstclock.c:552:gst_clock_id_wait:<GstAudioSrcClock> done waiting entry 0x7f86c0002e70, res: 2 (unscheduled) 0:00:01.041079436 26812 0x1cc0720 DEBUG GST_CLOCK gstsystemclock.c:410:gst_system_clock_add_wakeup: wakeup count 1 0:00:01.041136890 26812 0x1cc0850 DEBUG GST_CLOCK gstclock.c:356:_gst_clock_id_free: freed entry 0x7f86c0002e70 T1 and T2 are the 2 threads 1) T1: The entry is created and a wait for it is done. 2) T2: Tries to unschedule the entry 3) T2: unschedule receives a BUSY return 4) T1: wait finishes and it is too late 5) T2: writes at the poll and increases wakeup count 6) T1: entry is done. and freed. The wakeup count is never decremented again and subsequent uses will stall
Created attachment 320511 [details] [review] systemclock: handle unschedule of late entries If the clockentry is too late and is unscheduled before it gets a change to detect its lateness the wakeup count and the poll are used but never properly cleaned up. This leaves it in a dirty state that is going to mess with the next clock entry waiting requests.
Comment on attachment 320511 [details] [review] systemclock: handle unschedule of late entries Nice work tracking that down. Mostly commenting on the code style of the refactoring, not the fix itself. I find the way you have factored out gst_system_clock_check_and_cleanup_unscheduled() a bit confusing, because the 'check' part (status argument and boolean return value) is only used in one case, in the other two cases you ignore the return value and check status outside of the function. I think in the first chunk you should keep the status checking there and just turn the function into a cleanup function without passing status or returning a return value, unless I missed something.
Created attachment 320513 [details] [review] systemclock: handle unschedule of late entries Indeed, the many refactorings I did after trying different approaches led it to that state. Thanks for spotting it, this version should be cleaner
Patch makes the audiomixer test pass reliably for me, systemclock test still passes reliably.
Created attachment 320783 [details] [review] The previously proposed patch improves the situation, but is not complete. This patch fixes a possible race on top of that patch and adds a couple of tests that triggers the issues fixed by both the previous and this patch. They should probably be squashed before applied? --- systemclock: Fix wait/unschedule race Fixes a race where an entry is set to BUSY in gst_system_clock_id_wait_jitter() and is UNSCHEDULED before gst_system_clock_id_wait_jitter_unlocked() starts processing it. The wakeup added by gst_system_clock_id_unschedule() must be cleaned up. Two stress tests are added. One test that triggers the specific issue described above. The second stresses the code path where a wait is rescheduled because the poll returned early.
Thanks for spotting the missing piece of the patch and, specially, for the tests! commit 2b0ff13d070f370286078c5531b2e13006e5ce98 Author: Stian Selnes <stian@pexip.com> Date: Wed Feb 10 09:09:29 2016 +0100 systemclock: Fix wait/unschedule race Fixes a race where an entry is set to BUSY in gst_system_clock_id_wait_jitter() and is UNSCHEDULED before gst_system_clock_id_wait_jitter_unlocked() starts processing it. The wakeup added by gst_system_clock_id_unschedule() must be cleaned up. Two stress tests are added. One test that triggers the specific issue described above. The second stresses the code path where a wait is rescheduled because the poll returned early. https://bugzilla.gnome.org/show_bug.cgi?id=761586 commit 77a98d98cbed15e0bfdd8fe982b7ac83b1826d13 Author: Thiago Santos <thiagoss@osg.samsung.com> Date: Fri Feb 5 15:34:47 2016 -0300 systemclock: handle unschedule of late entries If the clockentry is too late and is unscheduled before it gets a change to detect its lateness the wakeup count and the poll are used but never properly cleaned up. This leaves it in a dirty state that is going to mess with the next clock entry waiting requests. https://bugzilla.gnome.org/show_bug.cgi?id=761586
@ Thiago Sousa Santos Hello, Currently we are using gstreamer version 1.6.3 on imx6x platform. We came across same issue. Applied above two patches on 1.6.3. But still issue is reproducing. Please check logs attached. gst_clock_id_wait() will never return. One thread is waiting on the clock and another thread is disposing the clock. Please suggest if there is any wrong handling done from application side. Below is the issue logs section : 0:00:11.336737668 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00m basesink gstbasesink.c:2539:gst_base_sink_do_sync:<a_fakesink>[00m possibly waiting for clock to reach 0:00:10.727573696, adjusted 0:00:10.727573696 0:00:11.336880335 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00;01;33m GST_CLOCK gstclock.c:542:gst_clock_id_wait:<GstAudioSinkClock>[00m waiting on clock entry 0x72505ad8 0:00:11.336975668 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00;01;33m GST_CLOCK gstsystemclock.c:857:gst_system_clock_id_wait_jitter:[00m gst_system_clock_id_wait_jitter 0:00:11.337070335 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00;01;33m GST_CLOCK gstclock.c:1019:gst_clock_get_internal_time:<GstAudioSinkClock>[00m internal time 0:00:10.720000000 0:00:11.337216335 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00;01;33m GST_CLOCK gstclock.c:1064:gst_clock_get_time:<GstAudioSinkClock>[00m adjusted time 0:00:10.720000000 0:00:11.337325001 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00;01;33m GST_CLOCK gstsystemclock.c:703:gst_system_clock_id_wait_jitter_unlocked:[00m entry heere asd0x72505ad8 time 0:00:10.727573696 now 0:00:10.720000000 diff (time-now) 7573696 0:00:11.344644001 [336m 755[00m 0x724fdc60 [37mDEBUG [00m [00;01;33m GST_CLOCK gstsystemclock.c:383:gst_system_clock_add_wakeup:[00m writing control 0:00:11.344801668 [336m 755[00m 0x72e022f0 [37mDEBUG [00m [00;01;33m GST_CLOCK gstsystemclock.c:731:gst_system_clock_id_wait_jitter_unlocked:[00m entry 0x72505ad8 unlocked, status 3, ret 1 0:00:11.344959668 [336m 755[00m 0x724fdc60 [37mDEBUG [00m [00;01;33m GST_CLOCK gstsystemclock.c:399:gst_system_clock_add_wakeup:[00m wakeup count 1 0:00:11.345121335 [336m 755[00m 0x724fdc60 [37mDEBUG [00m [00;01;33m GST_CLOCK gstsystemclock.c:231:gst_system_clock_dispose:[00m joined thread
Created attachment 370175 [details] gst_system_clock_id_wait_jitter not returning On top of gstreamer 1.6.3, applied changes from commit 77a98d98cbed15e0bfdd8fe982b7ac83b1826d13 and 2b0ff13d070f370286078c5531b2e13006e5ce98