After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 790686 - matroskamux: re-activate TOC support
matroskamux: re-activate TOC support
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other All
: Normal enhancement
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-11-21 22:39 UTC by fengalin
Modified: 2017-12-15 14:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] matroskamux: re-activate TOC support (22.97 KB, patch)
2017-11-21 22:39 UTC, fengalin
none Details | Review
Patch rework further to the review and comments (60.92 KB, patch)
2017-12-01 17:26 UTC, fengalin
committed Details | Review
matroska-mux: use gst_harness in unit tests (20.13 KB, patch)
2017-12-07 15:27 UTC, fengalin
none Details | Review
Rework unit tests after watching suggested talk (31.23 KB, patch)
2017-12-14 18:12 UTC, fengalin
none Details | Review
Rework unit tests after watching suggested talk (more cleanups) (31.03 KB, patch)
2017-12-14 18:27 UTC, fengalin
none Details | Review
Rework unit tests after watching suggested talk (add author's name in commit message) (31.10 KB, patch)
2017-12-15 08:49 UTC, fengalin
committed Details | Review
matroskamux: Fix various memory leaks in the unit test (1.71 KB, patch)
2017-12-15 09:40 UTC, Sebastian Dröge (slomo)
committed Details | Review
Fix remaining memory leaks (6.13 KB, patch)
2017-12-15 13:57 UTC, fengalin
committed Details | Review

Description fengalin 2017-11-21 22:39:48 UTC
Created attachment 364153 [details] [review]
[PATCH] matroskamux: re-activate TOC support

TOC support in mastroskamux has been deactivated for a couple of years. This patch updates it to recent GstToc evolutions and introduces a unit test.
Comment 1 Sebastian Dröge (slomo) 2017-11-22 07:55:10 UTC
Did you confirm that the output is correct? IIRC it was disabled not only because it was not updated for the GstTOC API changes, but also because the output was broken.
Comment 2 fengalin 2017-11-22 11:24:45 UTC
These are the checks I did so far:

- Run the unit test (which w) to build a toc - chapters and sub-chapters -, mux a testsrc to a file, demux it and compare the result with the original. I also checked the temporary file with `gst-discoverer --toc`. 
- Use GStreamer based [media-toc](https://github.com/fengalin/media-toc) to build tocs on real life files - including audio only - and check the results with media-toc, ffmpeg and VLC.

Just a side note to share my observations with you: when I compared the resulting binary to media originating from other frameworks, I noticed a couple of potential improvements, but they don't break compatibility, so I decided not to touch theses in the patch:

- The size field for the nested structures is first written to a default value with the potential for the largest data size (the representation is 8 bytes long), then updated when all the inner values are known. Whatever the actual size of the resulting buffer, the field is kept 8 bytes long even for structures that could use less. This is different from other files I inspected.
- The chapters titles end with a `\0` (this is explicit in `gst_ebml_write_ascii`). While the specification is a little vague about this ("zero-padded when needed"), files originating from other frameworks dodn't use this. IMHO, it's not required as the value is in a TLV.
Comment 3 Sebastian Dröge (slomo) 2017-11-22 15:43:50 UTC
Ok that sounds good enough for now. Do you want to provide another patch on top that fixes the other parts above?
Comment 4 fengalin 2017-11-22 16:21:23 UTC
I'd rather do it as a separate enhancement as I don't know when I can do that and I already know this has consequences:

- Unit test for `matroskademux` fails when writing titles (tags) without the '\0'. It's probably easy to fix, but I didn't check yet.
- The size field part will require more buffer copies than current solution. As long as it only applies to small structures such as toc or tags, I guess it is acceptable, but I need to make sure it won't extend to stream segments.
Comment 5 Sebastian Dröge (slomo) 2017-11-23 08:41:14 UTC
Ok, then let's go with this one. I'll review later :)
Comment 6 Sebastian Dröge (slomo) 2017-11-24 09:27:06 UTC
Review of attachment 364153 [details] [review]:

::: gst/matroska/matroska-mux.c
@@ +2658,3 @@
       gst_ebml_write_master_start (ebml, GST_MATROSKA_ID_CHAPTERATOM);
+
+  uid = gst_matroska_mux_create_uid (mux);

We're not using the UIDs from the GstTocEntry but always create our own here. Why? We're also not even updating the entry's UID, which is what the previous code seemed to have done

@@ +2933,3 @@
+    toc_entries = gst_toc_get_entries (toc);
+    if (toc_entries != NULL) {
+      cur = g_list_first (toc_entries);

"cur = g_list_first(toc_entries)" is the same as "cur = toc_entries" btw

@@ +2964,3 @@
+        gst_matroska_mux_write_chapter_edition (mux, cur->data, ebml,
+            &master_chapters);
+      }

The to_write list seems to be never freed

::: tests/check/pipelines/toc.c
@@ +183,3 @@
+
+  launch_str = g_strdup_printf ("filesrc location=%s ! %s name=demux ! "
+      "fakesink", file, demuxer);

Thanks for also providing a test! But instead of a full pipeline based test, maybe just add something to the existing matroskademux / matroskamux tests that only checks the TOC functionality? E.g. for matroskamux configure the TOC, and check that the output is bit-equivalent with what you would expect to get for the TOC. For matroskademux, push some minimal Matroska beginning into the demuxer and check that you get the expected TOC out.

This here is testing too many things at once and if something fails it's harder to debug
Comment 7 fengalin 2017-11-24 12:07:43 UTC
For the UID generation, I kept the behaviour of previous implementation which seemed reasonable to me: considering that the UIDs have to be unique within the container, I'm not sure it would be a good idea to rely on the user's input. The function gst_matroska_mux_create_uid generates a UID in the expected range and checks that it is not already used. The UIDs generation is not only used by TOC entries, but also by Pads AFAICT. If you agree with this approach, I can add a comment.

The reason why I didn't set the entry's UID back is that there is no setter for UIDs on GstTocEntry. Do you want me to add it? You pointed to a bug in my implementation as gst_matroska_mux_write_toc_entry_tags uses gst_toc_entry_get_uid when writing tags. Either we add the setters, or I will pass the generated UID to gst_matroska_mux_write_toc_entry_tags.

> "cur = g_list_first(toc_entries)" is the same as "cur = toc_entries" btw
Yes, you're right, I'll rationalize this.

> The to_write list seems to be never freed
Yes, you're right, I though I did the job by unrefing toc_edition_entry, but that's not enough.

I'll change the unit test.

Can you give me your opinion about the UID setters? Should we add them, thus changing the user provided UID on GstTocEntry?
Comment 8 Sebastian Dröge (slomo) 2017-11-24 13:39:36 UTC
(In reply to fengalin from comment #7)
> 
> The reason why I didn't set the entry's UID back is that there is no setter
> for UIDs on GstTocEntry.

The user is supposed to provide a unique UID when creating the TOC entry


Maybe we could re-use those and only fall-back to generating our own if there are conflicts or the format of the UIDs is wrong? Add a FIXME comment for that, for now it's fine as-is I guess.
Comment 9 fengalin 2017-11-29 09:49:54 UTC
I was busy with other stuff recently, but I'm still progressing. Here is an update on my understanding. It would be helpful if someone could confirm or correct my statements and proposal.

- My reference for the Matroska's Specifications comes from (1).
- There are two UIDs that can be assigned while defining Chapters (2):
  * ChapterUID: a mandatory unsigned integer which uniquely refers to a chapter. Except for the title & language which use dedicated fields, this UID can also be used to add tags to the Chapter. See (3). The tags come in a separate section of the container as pictured in (4).
  * ChapterStringUID: an optional UTF-8 string which also uniquely refers to a chapter. The specifications links the definition of the "WebVTT cue identifier" (5) which "can be used to reference a specific cue, for example from script or CSS".

My understanding is that the ChapterUID is some kind of pointer which internally links items in the container while ChapterStringUID can be used for external purposes.

The GstToc and GstTocEntry "classes" provide a single UID as a string. This UID is user defined upon creation of the objects and can't be modified (no setter).

My proposal is to use the GstTocEntry user provided UID for ChapterStringUID and to generate the ChapterUID internally in matrsoka-mux. In order to keep track of the ChapterUID for tags, a separate GstToc / GstTocEntry tree is maintained which replicates the user provided GstToc but with ChapterUIDs in place of the user provided UIDs (and also the title tags removed once they are assigned in the chapters section).

---

(1) https://www.matroska.org/technical/index.html
(2) https://www.matroska.org/technical/specs/chapters/index.html
(3) https://www.matroska.org/technical/specs/tagging/index.html
(4) https://www.matroska.org/technical/diagram/index.html
(5) https://w3c.github.io/webvtt/#webvtt-cue-identifier
Comment 10 Sebastian Dröge (slomo) 2017-11-29 10:40:57 UTC
Yes, to all parts :)
Comment 11 fengalin 2017-12-01 17:26:44 UTC
Created attachment 364775 [details] [review]
Patch rework further to the review and comments
Comment 12 fengalin 2017-12-07 15:27:40 UTC
Created attachment 365202 [details] [review]
matroska-mux: use gst_harness in unit tests

This is an update on previous patch to use gst_harness in unit tests.
What do you think about these?
Comment 13 Sebastian Dröge (slomo) 2017-12-14 08:51:53 UTC
Review of attachment 364775 [details] [review]:

Generally looks good to me, thanks!

::: gst/matroska/matroska-mux.c
@@ +3355,2 @@
     gst_ebml_replace_uint (ebml, mux->seekhead_pos + 144,
         mux->tags_pos - mux->segment_master);

Should we write another thing here for the tocs?

::: tests/check/elements/matroskademux.c
@@ +137,3 @@
 GST_END_TEST;
 
+/* Recusively compare 2 toc entries */

typo :)
Comment 14 Sebastian Dröge (slomo) 2017-12-14 08:56:41 UTC
Comment on attachment 365202 [details] [review]
matroska-mux: use gst_harness in unit tests

This is great and a good start, but you might want to watch https://gstconf.ubicast.tv/videos/moar-better-tests/ for some further ideas :) E.g. currently the test is checking a mode that is not even used by default by matroskamux (version=1 in the test, version=2 default in matroskamux).
Comment 15 fengalin 2017-12-14 18:12:05 UTC
Created attachment 365552 [details] [review]
Rework unit tests after watching suggested talk

Thanks for the link to the talk!

The position for the TOC in the header is replaced in matroska-mux.c @ 3330 AFAIU. This position is used in the unit tests @ 881 and 893. Should it be incorrectly set, the TOC test would fail as the rest of the checks starts at this position.
Comment 16 fengalin 2017-12-14 18:27:28 UTC
Created attachment 365554 [details] [review]
Rework unit tests after watching suggested talk (more cleanups)

Sorry for the spam, I just found more things that could be improved in the unit tests.

Lines in the comment from my previous message have shifted. It should now read:

The position for the TOC in the header is replaced in matroska-mux.c @ 3330 AFAIU. This position is used in the unit tests @ 860 and 872. Should it be incorrectly set, the TOC test would fail as the rest of the checks starts from this position.

Won't touch this no more, unless you find something else to improve.
Comment 17 Sebastian Dröge (slomo) 2017-12-15 08:22:48 UTC
Comment on attachment 365554 [details] [review]
Rework unit tests after watching suggested talk (more cleanups)

The new test looks good, thanks for updating. Please also mention shortly Håvard in your commit message as inspiration, many of the changes now look very similar to his :) Otherwise looks good to go
Comment 18 fengalin 2017-12-15 08:49:06 UTC
Created attachment 365577 [details] [review]
Rework unit tests after watching suggested talk (add author's name in commit message)

> The new test looks good, thanks for updating. Please also mention shortly
> Håvard in your commit message as inspiration, many of the changes now look
> very similar to his :) Otherwise looks good to go

Yes indeed, that's why I included the link to the talk in the commit message ;). I'll add his name too.
Comment 19 Sebastian Dröge (slomo) 2017-12-15 09:40:44 UTC
Created attachment 365583 [details] [review]
matroskamux: Fix various memory leaks in the unit test
Comment 20 Sebastian Dröge (slomo) 2017-12-15 09:41:24 UTC
After this there are a few more leaks but I don't have time to look into that now. Some GstTocs are leaking. Run with "make elements/matroskamux.valgrind" in tests/check to get this.

Running suite(s): matroskamux
==27460== 2,992 (88 direct, 2,904 indirect) bytes in 1 blocks are definitely lost in loss record 1,793 of 1,806
==27460==    at 0x4C2ABEF: malloc (vg_replace_malloc.c:299)
==27460==    by 0x58AB568: g_malloc (gmem.c:94)
==27460==    by 0x58C3035: g_slice_alloc (gslice.c:1025)
==27460==    by 0x58C34C8: g_slice_alloc0 (gslice.c:1051)
==27460==    by 0x53787C3: gst_toc_new (gsttoc.c:143)
==27460==    by 0x10A15F: new_reference_toc (matroskamux.c:408)
==27460==    by 0x10B64F: test_toc (matroskamux.c:927)
==27460==    by 0x10C7D7: test_toc_with_edition (matroskamux.c:983)
==27460==    by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465)
==27460==    by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237)
==27460==    by 0x50C1A70: srunner_run_tcase (check_run.c:377)
==27460==    by 0x50C1A70: srunner_iterate_suites (check_run.c:205)
==27460==    by 0x50C1A70: srunner_run_tagged (check_run.c:740)
==27460==    by 0x50B676C: gst_check_run_suite (gstcheck.c:1059)
==27460==    by 0x109DC7: main (matroskamux.c:1018)
==27460== 
==27460== 3,026 (88 direct, 2,938 indirect) bytes in 1 blocks are definitely lost in loss record 1,794 of 1,806
==27460==    at 0x4C2ABEF: malloc (vg_replace_malloc.c:299)
==27460==    by 0x58AB568: g_malloc (gmem.c:94)
==27460==    by 0x58C3035: g_slice_alloc (gslice.c:1025)
==27460==    by 0x58C34C8: g_slice_alloc0 (gslice.c:1051)
==27460==    by 0x53787C3: gst_toc_new (gsttoc.c:143)
==27460==    by 0x91E00FB: gst_matroska_mux_start (matroska-mux.c:3002)
==27460==    by 0x91E00FB: gst_matroska_mux_handle_buffer (matroska-mux.c:3871)
==27460==    by 0x4E862FC: gst_collect_pads_default_collected (gstcollectpads.c:1561)
==27460==    by 0x4E83427: gst_collect_pads_check_collected (gstcollectpads.c:1362)
==27460==    by 0x4E84E26: gst_collect_pads_chain (gstcollectpads.c:2217)
==27460==    by 0x5343182: gst_pad_chain_data_unchecked (gstpad.c:4270)
==27460==    by 0x5343182: gst_pad_push_data (gstpad.c:4526)
==27460==    by 0x534B6B2: gst_pad_push (gstpad.c:4645)
==27460==    by 0x10B6A6: test_toc (matroskamux.c:938)
==27460==    by 0x10C7D7: test_toc_with_edition (matroskamux.c:983)
==27460==    by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465)
==27460==    by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237)
==27460==    by 0x50C1A70: srunner_run_tcase (check_run.c:377)
==27460==    by 0x50C1A70: srunner_iterate_suites (check_run.c:205)
==27460==    by 0x50C1A70: srunner_run_tagged (check_run.c:740)
==27460==    by 0x50B676C: gst_check_run_suite (gstcheck.c:1059)
==27460==    by 0x109DC7: main (matroskamux.c:1018)
==27460== 
==27461== 2,642 (88 direct, 2,554 indirect) bytes in 1 blocks are definitely lost in loss record 1,780 of 1,793
==27461==    at 0x4C2ABEF: malloc (vg_replace_malloc.c:299)
==27461==    by 0x58AB568: g_malloc (gmem.c:94)
==27461==    by 0x58C3035: g_slice_alloc (gslice.c:1025)
==27461==    by 0x58C34C8: g_slice_alloc0 (gslice.c:1051)
==27461==    by 0x53787C3: gst_toc_new (gsttoc.c:143)
==27461==    by 0x10C0B6: new_no_edition_toc (matroskamux.c:432)
==27461==    by 0x10C0B6: test_toc (matroskamux.c:929)
==27461==    by 0x10C6E4: test_toc_without_edition (matroskamux.c:990)
==27461==    by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465)
==27461==    by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237)
==27461==    by 0x50C1A70: srunner_run_tcase (check_run.c:377)
==27461==    by 0x50C1A70: srunner_iterate_suites (check_run.c:205)
==27461==    by 0x50C1A70: srunner_run_tagged (check_run.c:740)
==27461==    by 0x50B676C: gst_check_run_suite (gstcheck.c:1059)
==27461==    by 0x109DC7: main (matroskamux.c:1018)
==27461== 
==27461== 2,841 (88 direct, 2,753 indirect) bytes in 1 blocks are definitely lost in loss record 1,781 of 1,793
==27461==    at 0x4C2ABEF: malloc (vg_replace_malloc.c:299)
==27461==    by 0x58AB568: g_malloc (gmem.c:94)
==27461==    by 0x58C3035: g_slice_alloc (gslice.c:1025)
==27461==    by 0x58C34C8: g_slice_alloc0 (gslice.c:1051)
==27461==    by 0x53787C3: gst_toc_new (gsttoc.c:143)
==27461==    by 0x91E00FB: gst_matroska_mux_start (matroska-mux.c:3002)
==27461==    by 0x91E00FB: gst_matroska_mux_handle_buffer (matroska-mux.c:3871)
==27461==    by 0x4E862FC: gst_collect_pads_default_collected (gstcollectpads.c:1561)
==27461==    by 0x4E83427: gst_collect_pads_check_collected (gstcollectpads.c:1362)
==27461==    by 0x4E84E26: gst_collect_pads_chain (gstcollectpads.c:2217)
==27461==    by 0x5343182: gst_pad_chain_data_unchecked (gstpad.c:4270)
==27461==    by 0x5343182: gst_pad_push_data (gstpad.c:4526)
==27461==    by 0x534B6B2: gst_pad_push (gstpad.c:4645)
==27461==    by 0x10B6A6: test_toc (matroskamux.c:938)
==27461==    by 0x10C6E4: test_toc_without_edition (matroskamux.c:990)
==27461==    by 0x50C1A70: tcase_run_tfun_fork (check_run.c:465)
==27461==    by 0x50C1A70: srunner_iterate_tcase_tfuns (check_run.c:237)
==27461==    by 0x50C1A70: srunner_run_tcase (check_run.c:377)
==27461==    by 0x50C1A70: srunner_iterate_suites (check_run.c:205)
==27461==    by 0x50C1A70: srunner_run_tagged (check_run.c:740)
==27461==    by 0x50B676C: gst_check_run_suite (gstcheck.c:1059)
==27461==    by 0x109DC7: main (matroskamux.c:1018)
==27461== 
100%: Checks: 14, Failures: 0, Errors: 0
Comment 21 fengalin 2017-12-15 13:57:01 UTC
Created attachment 365587 [details] [review]
Fix remaining memory leaks

Thank you for the fix and sorry about the leaks.

I think I solved the others (and I believe I have a complete environment for GStreamer C development now ;)).
Comment 22 Sebastian Dröge (slomo) 2017-12-15 14:15:48 UTC
Review of attachment 365587 [details] [review]:

::: gst/matroska/matroska-mux.c
@@ +499,3 @@
   mux->num_t_streams = 0;
   mux->num_v_streams = 0;
+  mux->internal_toc = NULL;

This is unneeded btw, like the lines above it. The memory is all zero-initialized. Good practice to do this nonetheless for explicitness
Comment 23 Sebastian Dröge (slomo) 2017-12-15 14:17:22 UTC
commit 3464aac3c9802eff32ce86b29c1f02ab316c5d99 (HEAD -> master, origin/master, origin/HEAD)
Author: fengalin <fengalin@free.fr>
Date:   Fri Dec 15 14:48:09 2017 +0100

    matroska: fix memory leaks due to toc related updates
    
    https://bugzilla.gnome.org/show_bug.cgi?id=790686

commit c55824e4fab835f20d1caa463d166e40e5d4d545
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Dec 15 11:40:13 2017 +0200

    matroskamux: Fix various memory leaks in the unit test
    
    https://bugzilla.gnome.org/show_bug.cgi?id=790686

commit 694c07fe63ed47138f8c1d80dadb96c63ffa7f6c
Author: fengalin <fengalin@free.fr>
Date:   Thu Dec 14 19:05:36 2017 +0100

    matroska-mux: migrate test to gst_harness
    
    ... following the guide lines from Håvard Graff (see https://gstconf.ubicast.tv/videos/moar-better-tests/).
    
    https://bugzilla.gnome.org/show_bug.cgi?id=790686

commit a6702a76d5b4f1f5da99f8ab6a0bae0941e82b6b
Author: fengalin <fengalin@free.fr>
Date:   Fri Dec 1 18:17:06 2017 +0100

    matroska: re-activate and update TOC support
    
    TOC support in mastroskamux has been deactivated for a couple of years. This commit updates it to recent GstToc evolutions and introduces toc unit tests for both matroska-mux and matroska-demux.
    
    There are two UIDs for Chapters in Matroska's specifications:
    - The ChapterUID is a mandatory unsigned integer which internally refers to a given chapter. Except for title & language which use dedicated fields, this UID can also be used to add tags to the Chapter. The tags come in a separate section of the container.
    - The ChapterStringUID is an optional UTF-8 string which also uniquely refers to a chapter but from an external perspective. It can act as a "WebVTT cue identifier" which "can be used to reference a specific cue, for example from script or CSS".
    
    During muxing, the ChapterUID is generated and checked for unicity, while the ChapterStringUID receives the user defined UID. In order to be able to refer to chapters from the tags section, we maintain an internal Toc tree with the generated ChapterUID.
    
    When demuxing, the ChapterStringUIDs (if available) are assigned to the GstTocEntries UIDs and an internal toc mimicking the toc is used to keep track of the ChapterUIDs and match the tags with the appropriate GstTocEntries.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=790686
Comment 24 Sebastian Dröge (slomo) 2017-12-15 14:22:14 UTC
(In reply to fengalin from comment #21)

> I think I solved the others

You did, thanks :) And thanks for the whole patch and reviving the TOC support

> (and I believe I have a complete environment for
> GStreamer C development now ;)).

That's always good to have anyway, also if you need to debug any problems in the future :)