GNOME Bugzilla – Bug 752421
matroskademux: SRT subtitles with markup are displayed with a trailing asterisk
Last modified: 2015-08-16 13:38:39 UTC
Created attachment 307477 [details] testcase (must be merged into a Matroska file) The issue can be seen with the attached example, when merged into a Matroska file and played with totem or gst-launch. The asterisk is added by the pango element because a call to g_utf8_validate() fails. This happens because the subtitle buffer size also counts the null terminator. I have prepared a patch that seems to address the issue based on the subparse element.
Created attachment 307478 [details] [review] possible fix based on subparse.c
Thanks, this looks right to me. The buffers we push out should not contain a trailing NUL character in the (visible) buffer data, it's just added for safety. commit 45892ec8becdc6b5d1d72a1b90b42f90b699cf7e Author: Dimitrios Christidis <dchristidis@mykolab.com> Date: Wed Jul 15 13:44:52 2015 +0300 matroskademux: fix trailing '*' displayed with some text subtitles The subtitle buffer we push out should not include a NUL terminator as part of the data, we just add such a terminator for safety, but it should not be included in the buffer size. A NUL terminator is not valid UTF-8, so checks will fail if it's included in the size, and the NUL will be replaced by the fallback character specified when converting, i.e. '*'. https://bugzilla.gnome.org/show_bug.cgi?id=752421
I think this needs more fixing, for the case where there is a 0-terminator in the data already (since we now adjusted it from map.size-1 to map.size everywhere in the conversion/check functions).
Created attachment 307571 [details] [review] always copy buffer and add terminator
Created attachment 307572 [details] [review] trim buffer only if it contains NUL terminator
Indeed, the first patch causes a regression (see bug #707933). There are two ways to address this. With the second patch, a copy is always made to ensure a NUL terminator is added but is not part of the data. With the third patch, the buffer is trimmed if it contains a terminator. Otherwise, no action is taken.
What's left here, do we still need these patch now that bug 707933 is fixed ?
Created attachment 307651 [details] [review] trim buffer only if it contains NUL terminator Removed modification of line 1.
With the first patch merged, the function will not work correctly on subtitle buffers with NUL terminators as part of the data. I do not have a test case for this, but according to bug #707933 it could happen. Thus, that bug would be reopened. I think the third patch is the better approach. Per our discussion on IRC, when shrinking a buffer, one should not rely on the trimmed data being there. Furthermore, no part of the codebase that handles subtitle buffers should depend on the existence of a NUL terminator. Finally, I think it looks cleaner, but this is subjective. The commit message certainly needs some work from a developer with more experience.
Third time lucky, hopefully: commit c1382e97fa48f299895dd641df75b4e2ca588f31 Author: Tim-Philipp Müller <tim@centricular.com> Date: Thu Jul 16 18:09:30 2015 +0100 tests: add minmal matroskademux test for subtitle output Some of the subtitle chunks will have embedded NUL-terminators (last three), some don't (first three), some will have markup, some won't, some will be valid UTF-8 (all but last), some won't (last stanza). https://bugzilla.gnome.org/show_bug.cgi?id=752421 commit 744167056cea37fd6b225416624ff84bbb130b8f Author: Dimitrios Christidis <dchristidis@mykolab.com> Date: Thu Jul 16 18:49:26 2015 +0300 matroskademux: fix for subtitle buffers with NUL terminators Commit 45892ec8 created a regression where g_utf8_validate() would fail if the subtitle buffer had a NUL terminator as part of the data. https://bugzilla.gnome.org/show_bug.cgi?id=752421 I agree that jumping through hoops to NUL-terminate the data just in case some other element is buggy is a bit over the top, so let's not do that and just trim off an existing terminator.