GNOME Bugzilla – Bug 790686
matroskamux: re-activate TOC support
Last modified: 2017-12-15 14:22:30 UTC
Created attachment 364153 [details] [review] [PATCH] matroskamux: re-activate TOC support TOC support in mastroskamux has been deactivated for a couple of years. This patch updates it to recent GstToc evolutions and introduces a unit test.
Did you confirm that the output is correct? IIRC it was disabled not only because it was not updated for the GstTOC API changes, but also because the output was broken.
These are the checks I did so far: - Run the unit test (which w) to build a toc - chapters and sub-chapters -, mux a testsrc to a file, demux it and compare the result with the original. I also checked the temporary file with `gst-discoverer --toc`. - Use GStreamer based [media-toc](https://github.com/fengalin/media-toc) to build tocs on real life files - including audio only - and check the results with media-toc, ffmpeg and VLC. Just a side note to share my observations with you: when I compared the resulting binary to media originating from other frameworks, I noticed a couple of potential improvements, but they don't break compatibility, so I decided not to touch theses in the patch: - The size field for the nested structures is first written to a default value with the potential for the largest data size (the representation is 8 bytes long), then updated when all the inner values are known. Whatever the actual size of the resulting buffer, the field is kept 8 bytes long even for structures that could use less. This is different from other files I inspected. - The chapters titles end with a `\0` (this is explicit in `gst_ebml_write_ascii`). While the specification is a little vague about this ("zero-padded when needed"), files originating from other frameworks dodn't use this. IMHO, it's not required as the value is in a TLV.
Ok that sounds good enough for now. Do you want to provide another patch on top that fixes the other parts above?
I'd rather do it as a separate enhancement as I don't know when I can do that and I already know this has consequences: - Unit test for `matroskademux` fails when writing titles (tags) without the '\0'. It's probably easy to fix, but I didn't check yet. - The size field part will require more buffer copies than current solution. As long as it only applies to small structures such as toc or tags, I guess it is acceptable, but I need to make sure it won't extend to stream segments.
Ok, then let's go with this one. I'll review later :)
Review of attachment 364153 [details] [review]: ::: gst/matroska/matroska-mux.c @@ +2658,3 @@ gst_ebml_write_master_start (ebml, GST_MATROSKA_ID_CHAPTERATOM); + + uid = gst_matroska_mux_create_uid (mux); We're not using the UIDs from the GstTocEntry but always create our own here. Why? We're also not even updating the entry's UID, which is what the previous code seemed to have done @@ +2933,3 @@ + toc_entries = gst_toc_get_entries (toc); + if (toc_entries != NULL) { + cur = g_list_first (toc_entries); "cur = g_list_first(toc_entries)" is the same as "cur = toc_entries" btw @@ +2964,3 @@ + gst_matroska_mux_write_chapter_edition (mux, cur->data, ebml, + &master_chapters); + } The to_write list seems to be never freed ::: tests/check/pipelines/toc.c @@ +183,3 @@ + + launch_str = g_strdup_printf ("filesrc location=%s ! %s name=demux ! " + "fakesink", file, demuxer); Thanks for also providing a test! But instead of a full pipeline based test, maybe just add something to the existing matroskademux / matroskamux tests that only checks the TOC functionality? E.g. for matroskamux configure the TOC, and check that the output is bit-equivalent with what you would expect to get for the TOC. For matroskademux, push some minimal Matroska beginning into the demuxer and check that you get the expected TOC out. This here is testing too many things at once and if something fails it's harder to debug
For the UID generation, I kept the behaviour of previous implementation which seemed reasonable to me: considering that the UIDs have to be unique within the container, I'm not sure it would be a good idea to rely on the user's input. The function gst_matroska_mux_create_uid generates a UID in the expected range and checks that it is not already used. The UIDs generation is not only used by TOC entries, but also by Pads AFAICT. If you agree with this approach, I can add a comment. The reason why I didn't set the entry's UID back is that there is no setter for UIDs on GstTocEntry. Do you want me to add it? You pointed to a bug in my implementation as gst_matroska_mux_write_toc_entry_tags uses gst_toc_entry_get_uid when writing tags. Either we add the setters, or I will pass the generated UID to gst_matroska_mux_write_toc_entry_tags. > "cur = g_list_first(toc_entries)" is the same as "cur = toc_entries" btw Yes, you're right, I'll rationalize this. > The to_write list seems to be never freed Yes, you're right, I though I did the job by unrefing toc_edition_entry, but that's not enough. I'll change the unit test. Can you give me your opinion about the UID setters? Should we add them, thus changing the user provided UID on GstTocEntry?
(In reply to fengalin from comment #7) > > The reason why I didn't set the entry's UID back is that there is no setter > for UIDs on GstTocEntry. The user is supposed to provide a unique UID when creating the TOC entry Maybe we could re-use those and only fall-back to generating our own if there are conflicts or the format of the UIDs is wrong? Add a FIXME comment for that, for now it's fine as-is I guess.
I was busy with other stuff recently, but I'm still progressing. Here is an update on my understanding. It would be helpful if someone could confirm or correct my statements and proposal. - My reference for the Matroska's Specifications comes from (1). - There are two UIDs that can be assigned while defining Chapters (2): * ChapterUID: a mandatory unsigned integer which uniquely refers to a chapter. Except for the title & language which use dedicated fields, this UID can also be used to add tags to the Chapter. See (3). The tags come in a separate section of the container as pictured in (4). * ChapterStringUID: an optional UTF-8 string which also uniquely refers to a chapter. The specifications links the definition of the "WebVTT cue identifier" (5) which "can be used to reference a specific cue, for example from script or CSS". My understanding is that the ChapterUID is some kind of pointer which internally links items in the container while ChapterStringUID can be used for external purposes. The GstToc and GstTocEntry "classes" provide a single UID as a string. This UID is user defined upon creation of the objects and can't be modified (no setter). My proposal is to use the GstTocEntry user provided UID for ChapterStringUID and to generate the ChapterUID internally in matrsoka-mux. In order to keep track of the ChapterUID for tags, a separate GstToc / GstTocEntry tree is maintained which replicates the user provided GstToc but with ChapterUIDs in place of the user provided UIDs (and also the title tags removed once they are assigned in the chapters section). --- (1) https://www.matroska.org/technical/index.html (2) https://www.matroska.org/technical/specs/chapters/index.html (3) https://www.matroska.org/technical/specs/tagging/index.html (4) https://www.matroska.org/technical/diagram/index.html (5) https://w3c.github.io/webvtt/#webvtt-cue-identifier
Yes, to all parts :)
Created attachment 364775 [details] [review] Patch rework further to the review and comments
Created attachment 365202 [details] [review] matroska-mux: use gst_harness in unit tests This is an update on previous patch to use gst_harness in unit tests. What do you think about these?
Review of attachment 364775 [details] [review]: Generally looks good to me, thanks! ::: gst/matroska/matroska-mux.c @@ +3355,2 @@ gst_ebml_replace_uint (ebml, mux->seekhead_pos + 144, mux->tags_pos - mux->segment_master); Should we write another thing here for the tocs? ::: tests/check/elements/matroskademux.c @@ +137,3 @@ GST_END_TEST; +/* Recusively compare 2 toc entries */ typo :)
Comment on attachment 365202 [details] [review] matroska-mux: use gst_harness in unit tests This is great and a good start, but you might want to watch https://gstconf.ubicast.tv/videos/moar-better-tests/ for some further ideas :) E.g. currently the test is checking a mode that is not even used by default by matroskamux (version=1 in the test, version=2 default in matroskamux).
Created attachment 365552 [details] [review] Rework unit tests after watching suggested talk Thanks for the link to the talk! The position for the TOC in the header is replaced in matroska-mux.c @ 3330 AFAIU. This position is used in the unit tests @ 881 and 893. Should it be incorrectly set, the TOC test would fail as the rest of the checks starts at this position.
Created attachment 365554 [details] [review] Rework unit tests after watching suggested talk (more cleanups) Sorry for the spam, I just found more things that could be improved in the unit tests. Lines in the comment from my previous message have shifted. It should now read: The position for the TOC in the header is replaced in matroska-mux.c @ 3330 AFAIU. This position is used in the unit tests @ 860 and 872. Should it be incorrectly set, the TOC test would fail as the rest of the checks starts from this position. Won't touch this no more, unless you find something else to improve.
Comment on attachment 365554 [details] [review] Rework unit tests after watching suggested talk (more cleanups) The new test looks good, thanks for updating. Please also mention shortly Håvard in your commit message as inspiration, many of the changes now look very similar to his :) Otherwise looks good to go
Created attachment 365577 [details] [review] Rework unit tests after watching suggested talk (add author's name in commit message) > The new test looks good, thanks for updating. Please also mention shortly > Håvard in your commit message as inspiration, many of the changes now look > very similar to his :) Otherwise looks good to go Yes indeed, that's why I included the link to the talk in the commit message ;). I'll add his name too.
Created attachment 365583 [details] [review] matroskamux: Fix various memory leaks in the unit test
After this there are a few more leaks but I don't have time to look into that now. Some GstTocs are leaking. Run with "make elements/matroskamux.valgrind" in tests/check to get this. Running suite(s): matroskamux ==27460== 2,992 (88 direct, 2,904 indirect) bytes in 1 blocks are definitely lost in loss record 1,793 of 1,806 ==27460== at 0x4C2ABEF: malloc (vg_replace_malloc.c:299) ==27460== by 0x58AB568: g_malloc (gmem.c:94) ==27460== by 0x58C3035: g_slice_alloc (gslice.c:1025) ==27460== by 0x58C34C8: g_slice_alloc0 (gslice.c:1051) ==27460== by 0x53787C3: gst_toc_new (gsttoc.c:143) ==27460== by 0x10A15F: new_reference_toc (matroskamux.c:408) ==27460== by 0x10B64F: test_toc (matroskamux.c:927) ==27460== by 0x10C7D7: test_toc_with_edition (matroskamux.c:983) ==27460== by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465) ==27460== by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237) ==27460== by 0x50C1A70: srunner_run_tcase (check_run.c:377) ==27460== by 0x50C1A70: srunner_iterate_suites (check_run.c:205) ==27460== by 0x50C1A70: srunner_run_tagged (check_run.c:740) ==27460== by 0x50B676C: gst_check_run_suite (gstcheck.c:1059) ==27460== by 0x109DC7: main (matroskamux.c:1018) ==27460== ==27460== 3,026 (88 direct, 2,938 indirect) bytes in 1 blocks are definitely lost in loss record 1,794 of 1,806 ==27460== at 0x4C2ABEF: malloc (vg_replace_malloc.c:299) ==27460== by 0x58AB568: g_malloc (gmem.c:94) ==27460== by 0x58C3035: g_slice_alloc (gslice.c:1025) ==27460== by 0x58C34C8: g_slice_alloc0 (gslice.c:1051) ==27460== by 0x53787C3: gst_toc_new (gsttoc.c:143) ==27460== by 0x91E00FB: gst_matroska_mux_start (matroska-mux.c:3002) ==27460== by 0x91E00FB: gst_matroska_mux_handle_buffer (matroska-mux.c:3871) ==27460== by 0x4E862FC: gst_collect_pads_default_collected (gstcollectpads.c:1561) ==27460== by 0x4E83427: gst_collect_pads_check_collected (gstcollectpads.c:1362) ==27460== by 0x4E84E26: gst_collect_pads_chain (gstcollectpads.c:2217) ==27460== by 0x5343182: gst_pad_chain_data_unchecked (gstpad.c:4270) ==27460== by 0x5343182: gst_pad_push_data (gstpad.c:4526) ==27460== by 0x534B6B2: gst_pad_push (gstpad.c:4645) ==27460== by 0x10B6A6: test_toc (matroskamux.c:938) ==27460== by 0x10C7D7: test_toc_with_edition (matroskamux.c:983) ==27460== by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465) ==27460== by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237) ==27460== by 0x50C1A70: srunner_run_tcase (check_run.c:377) ==27460== by 0x50C1A70: srunner_iterate_suites (check_run.c:205) ==27460== by 0x50C1A70: srunner_run_tagged (check_run.c:740) ==27460== by 0x50B676C: gst_check_run_suite (gstcheck.c:1059) ==27460== by 0x109DC7: main (matroskamux.c:1018) ==27460== ==27461== 2,642 (88 direct, 2,554 indirect) bytes in 1 blocks are definitely lost in loss record 1,780 of 1,793 ==27461== at 0x4C2ABEF: malloc (vg_replace_malloc.c:299) ==27461== by 0x58AB568: g_malloc (gmem.c:94) ==27461== by 0x58C3035: g_slice_alloc (gslice.c:1025) ==27461== by 0x58C34C8: g_slice_alloc0 (gslice.c:1051) ==27461== by 0x53787C3: gst_toc_new (gsttoc.c:143) ==27461== by 0x10C0B6: new_no_edition_toc (matroskamux.c:432) ==27461== by 0x10C0B6: test_toc (matroskamux.c:929) ==27461== by 0x10C6E4: test_toc_without_edition (matroskamux.c:990) ==27461== by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465) ==27461== by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237) ==27461== by 0x50C1A70: srunner_run_tcase (check_run.c:377) ==27461== by 0x50C1A70: srunner_iterate_suites (check_run.c:205) ==27461== by 0x50C1A70: srunner_run_tagged (check_run.c:740) ==27461== by 0x50B676C: gst_check_run_suite (gstcheck.c:1059) ==27461== by 0x109DC7: main (matroskamux.c:1018) ==27461== ==27461== 2,841 (88 direct, 2,753 indirect) bytes in 1 blocks are definitely lost in loss record 1,781 of 1,793 ==27461== at 0x4C2ABEF: malloc (vg_replace_malloc.c:299) ==27461== by 0x58AB568: g_malloc (gmem.c:94) ==27461== by 0x58C3035: g_slice_alloc (gslice.c:1025) ==27461== by 0x58C34C8: g_slice_alloc0 (gslice.c:1051) ==27461== by 0x53787C3: gst_toc_new (gsttoc.c:143) ==27461== by 0x91E00FB: gst_matroska_mux_start (matroska-mux.c:3002) ==27461== by 0x91E00FB: gst_matroska_mux_handle_buffer (matroska-mux.c:3871) ==27461== by 0x4E862FC: gst_collect_pads_default_collected (gstcollectpads.c:1561) ==27461== by 0x4E83427: gst_collect_pads_check_collected (gstcollectpads.c:1362) ==27461== by 0x4E84E26: gst_collect_pads_chain (gstcollectpads.c:2217) ==27461== by 0x5343182: gst_pad_chain_data_unchecked (gstpad.c:4270) ==27461== by 0x5343182: gst_pad_push_data (gstpad.c:4526) ==27461== by 0x534B6B2: gst_pad_push (gstpad.c:4645) ==27461== by 0x10B6A6: test_toc (matroskamux.c:938) ==27461== by 0x10C6E4: test_toc_without_edition (matroskamux.c:990) ==27461== by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465) ==27461== by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237) ==27461== by 0x50C1A70: srunner_run_tcase (check_run.c:377) ==27461== by 0x50C1A70: srunner_iterate_suites (check_run.c:205) ==27461== by 0x50C1A70: srunner_run_tagged (check_run.c:740) ==27461== by 0x50B676C: gst_check_run_suite (gstcheck.c:1059) ==27461== by 0x109DC7: main (matroskamux.c:1018) ==27461== 100%: Checks: 14, Failures: 0, Errors: 0
Created attachment 365587 [details] [review] Fix remaining memory leaks Thank you for the fix and sorry about the leaks. I think I solved the others (and I believe I have a complete environment for GStreamer C development now ;)).
Review of attachment 365587 [details] [review]: ::: gst/matroska/matroska-mux.c @@ +499,3 @@ mux->num_t_streams = 0; mux->num_v_streams = 0; + mux->internal_toc = NULL; This is unneeded btw, like the lines above it. The memory is all zero-initialized. Good practice to do this nonetheless for explicitness
commit 3464aac3c9802eff32ce86b29c1f02ab316c5d99 (HEAD -> master, origin/master, origin/HEAD) Author: fengalin <fengalin@free.fr> Date: Fri Dec 15 14:48:09 2017 +0100 matroska: fix memory leaks due to toc related updates https://bugzilla.gnome.org/show_bug.cgi?id=790686 commit c55824e4fab835f20d1caa463d166e40e5d4d545 Author: Sebastian Dröge <sebastian@centricular.com> Date: Fri Dec 15 11:40:13 2017 +0200 matroskamux: Fix various memory leaks in the unit test https://bugzilla.gnome.org/show_bug.cgi?id=790686 commit 694c07fe63ed47138f8c1d80dadb96c63ffa7f6c Author: fengalin <fengalin@free.fr> Date: Thu Dec 14 19:05:36 2017 +0100 matroska-mux: migrate test to gst_harness ... following the guide lines from Håvard Graff (see https://gstconf.ubicast.tv/videos/moar-better-tests/). https://bugzilla.gnome.org/show_bug.cgi?id=790686 commit a6702a76d5b4f1f5da99f8ab6a0bae0941e82b6b Author: fengalin <fengalin@free.fr> Date: Fri Dec 1 18:17:06 2017 +0100 matroska: re-activate and update TOC support TOC support in mastroskamux has been deactivated for a couple of years. This commit updates it to recent GstToc evolutions and introduces toc unit tests for both matroska-mux and matroska-demux. There are two UIDs for Chapters in Matroska's specifications: - The ChapterUID is a mandatory unsigned integer which internally refers to a given chapter. Except for title & language which use dedicated fields, this UID can also be used to add tags to the Chapter. The tags come in a separate section of the container. - The ChapterStringUID is an optional UTF-8 string which also uniquely refers to a chapter but from an external perspective. It can act as a "WebVTT cue identifier" which "can be used to reference a specific cue, for example from script or CSS". During muxing, the ChapterUID is generated and checked for unicity, while the ChapterStringUID receives the user defined UID. In order to be able to refer to chapters from the tags section, we maintain an internal Toc tree with the generated ChapterUID. When demuxing, the ChapterStringUIDs (if available) are assigned to the GstTocEntries UIDs and an internal toc mimicking the toc is used to keep track of the ChapterUIDs and match the tags with the appropriate GstTocEntries. https://bugzilla.gnome.org/show_bug.cgi?id=790686
(In reply to fengalin from comment #21) > I think I solved the others You did, thanks :) And thanks for the whole patch and reviving the TOC support > (and I believe I have a complete environment for > GStreamer C development now ;)). That's always good to have anyway, also if you need to debug any problems in the future :)