Bug 683010 – gst_tag_list_add_value_internal() behaviour for tags with no merge function

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 683010 - gst_tag_list_add_value_internal() behaviour for tags with no merge function


Summary:	gst_tag_list_add_value_internal() behaviour for tags with no merge function


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	git master
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-08-30 00:33 UTC by LRN
Modified:	2018-11-03 12:16 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Non-NULL merge function to everyone! (18.36 KB, patch) 2012-08-30 00:34 UTC, LRN	needs-work	Details \| Review
Non-NULL merge function to everyone! (against current master HEAD) (18.34 KB, patch) 2013-07-24 17:10 UTC, LRN	none	Details \| Review

Description LRN 2012-08-30 00:33:34 UTC

Looking at the source code of gst_tag_list_add_value_internal(), it seems to be working like this:

If tag has a merge function and there already is a value for this tag:
GST_TAG_MERGE_REPLACE_ALL: replace
GST_TAG_MERGE_REPLACE    : replace
GST_TAG_MERGE_PREPEND    : concatenate new + old into a list of values
GST_TAG_MERGE_APPEND     : concatenate old + new into a list of values
GST_TAG_MERGE_KEEP       : do nothing
GST_TAG_MERGE_KEEP_ALL   : do nothing
Note that merge function is not actually used here (it will be used later, when merge is required, to turn concatenated list into a single value).

HOWEVER. If there is no merge function for this tag, OR there is a merge function, but no value for this tag in the list, it does this:
GST_TAG_MERGE_REPLACE_ALL: replace unconditionally
GST_TAG_MERGE_REPLACE    : replace unconditionally
GST_TAG_MERGE_PREPEND    : replace unconditionally!!!
GST_TAG_MERGE_APPEND     : if there already is a value for this tag - do nothing, replace otherwise!!!
GST_TAG_MERGE_KEEP       : if there already is a value for this tag - do nothing, replace otherwise
GST_TAG_MERGE_KEEP_ALL   : do nothing

Note that behaviour for PREPEND and APPEND is unexpected, if you consider the case where merge function is NULL, and there is a value for this tag already.
It means that it's impossible for non-mergeable tags to make a list of values for a tag, even if only one will be shown later, when merge is required.
And indeed, gst_tag_list_copy_value() fails when merge_func is NULL.

The reason for this seems to go like this:
If we can't keep both, when asked to append or prepend (because non-mergeable tags can't be stored as lists of values), do imaginary append/prepend operation, then take the head of the list (which means "keep existing item" for append, and "take new item" for prepend).
This seems to be a good idea, however:
1) It makes handling of a tag dependent on the merge mode, and not on tag's contents.
2) It is completely undocumented.

Proposed fix:
1) Give gst_tag_merge_use_first() merge function to types that have no merge function, do actual appending/prepending in a normal way. The end result will be the same (gst_tag_merge_use_first() will pick the head of the list, which will depend on whether the list was made by appending or prepending), except that user code that is list-aware will be able to fetch the whole list, and values won't be lost.
2) Also document this (unchanged, when merge function is NULL) behaviour (for users who register their own tags).

Comment 1 LRN 2012-08-30 00:34:29 UTC

Created attachment 222900 [details] [review]
Non-NULL merge function to everyone!

Comment 2 Sebastian Dröge (slomo) 2013-07-24 09:18:47 UTC

Could you update that to latest git master, also add the documentation about this and do the same with the tags in gst-plugins-base/gst-libs/gst/tag? Otherwise looks like a good idea to me

Comment 3 Sebastian Dröge (slomo) 2013-07-24 09:19:44 UTC

Maybe there should also be a GST_FIXME() if a tag is still registered with a NULL merge function after this.

Comment 4 Tim-Philipp Müller 2013-07-24 11:09:29 UTC

I'd just do

 if (merge_func == NULL)
   merge_func = gst_tag_merge_use_first;

in gst_tag_register*() ...

Comment 5 Sebastian Dröge (slomo) 2013-07-24 11:16:52 UTC

Me too but that breaks backwards compatibility as it changes the behaviour slightly. I'm fine with that if everybody else is :)

Comment 6 LRN 2013-07-24 17:10:24 UTC

Created attachment 250055 [details] [review]
Non-NULL merge function to everyone! (against current master HEAD)

Well, i don't really care how it's fixed, whether you just change the behaviour of NULL merge function, silently substituting it internally, or you use my patch to explicitly give non-NULL merge function to all tags you define - it matters not to me.

The silent approach may be backwards-compatible (since observable behaviour of functions that DO work on current tags with NULL merge functions will not change, AFAIU the code).

Comment 7 Tim-Philipp Müller 2013-07-25 11:43:55 UTC

Thinking about this again, I think the NULL merge function does represent a special case after all. It expresses that there should only be at most one tag of a certain kind in a taglist. This seems to make sense to me too, there just shouldn't be multiple values for some tags.

Comment 8 LRN 2013-07-25 20:37:07 UTC

That sounds like trying to shoehorn the reality into your idea of what the reality should be.

I.e. you think that there should only be one instance of tag X, but, say, Matroska spec doesn't, and an mkv file does have multiple X tags. Or it has A, B and C, and all three can only be mapped into X in GStreamer, thus X will have multiple values.

Should extra info be just _dropped_? That is something i don't want to see happening. My original goal, when i started messing with the Matroska demuxer and tag code, was to squeeze as much info from a file as possible. Dropping any info is contrary to that goal.

Again, AFAIU applications that expect to fetch one value from a tag, and do use the API that fetches only one value from a tag - they will actually fetch one value from a tag, even if it's a list of values. That's what merge functions are for.

Only apps that do not follow the spec (i.e. they read tags as they are, and don't make assumptions about tags having only one or multiple values) will see the difference.
If you are worried about these apps, you can make an API that enables/disables the new behaviour, and leave it disabled by default.

Or just commit this into 1.1 or whatever is the next major release, as you do with any non-backward-compatible change.

Comment 9 Tim-Philipp Müller 2013-07-25 21:12:53 UTC

Perhaps you could provide some examples of specific tags you were thinking of?

Comment 10 LRN 2013-07-25 21:39:58 UTC

GST_MATROSKA_TAG_ID_COPYRIGHT            -> GST_TAG_COPYRIGHT
GST_MATROSKA_TAG_ID_PRODUCTION_COPYRIGHT -> GST_TAG_COPYRIGHT

GST_MATROSKA_TAG_ID_LICENSE       -> GST_TAG_LICENSE
GST_MATROSKA_TAG_ID_TERMS_OF_USE -> GST_TAG_LICENSE

GST_MATROSKA_TAG_ID_DATE_RELEASED  -> GST_TAG_DATE
GST_MATROSKA_TAG_ID_DATE_RECORDED  -> GST_TAG_DATE
GST_MATROSKA_TAG_ID_DATE_ENCODED   -> GST_TAG_DATE
GST_MATROSKA_TAG_ID_DATE_TAGGED    -> GST_TAG_DATE
GST_MATROSKA_TAG_ID_DATE_DIGITIZED -> GST_TAG_DATE
GST_MATROSKA_TAG_ID_DATE_WRITTEN   -> GST_TAG_DATE
GST_MATROSKA_TAG_ID_DATE_PURCHASED -> GST_TAG_DATE

GST_MATROSKA_TAG_ID_LICENSE / GST_MATROSKA_TAG_ID_URL -> GST_TAG_LICENSE_URI
GST_MATROSKA_TAG_ID_LICENSE / GST_MATROSKA_TAG_ID_URL -> GST_TAG_LICENSE_URI

Comment 11 Sebastian Dröge (slomo) 2013-07-26 07:35:50 UTC

I think I have to agree with comment 8 here... so let's just go with merge_first() by default for biggest flexibility and least surprise.

Comment 12 Tim-Philipp Müller 2013-07-26 09:42:35 UTC

Even ignoring the tone of comment #8 for a moment, I'm not sure whether I agree with it.

For one, of course we provide an abstraction, and not even a particularly clever one. It will always be leaky and imperfect, and not be able to express everything. The same is true for the matroska tagging system (which isn't "reality" btw, it's just another abstraction).

Secondly, I think it's an illusion that we actually "keep information" if we just keep all those values around. We have lost the proper context for those then. What's the point of keeping 10 dates if you don't know what they mean anymore?

Thirdly, if something gets dropped because of an imperfect abstraction, it makes sense to have the element that has the most context do the dropping, and not some other function (though admittedly it would be sufficient if the element with the most context simply ordered things in a suitable way, knowing which one would be selected by merge_use_first).

Lastly, it is always possible to put extra information into "extended comments" in the form key=value if there's no good mapping in the tagging system. This way you can even preserve them if you remux to the same format.

I think we should look at the examples listed in more detail and decide on a case-by-case basis whether the best thing is to add more tags (which definitely makes sense for the dates IMHO), or to express it differently, or to allow multiple tags in those cases (I'm not saying the NULL merge function is correct everywhere where it's now, I'm just not sure it makes sense to replace it with merge_use_first everywhere).

Comment 13 LRN 2013-07-26 09:53:47 UTC

> Secondly, I think it's an illusion that we actually "keep information" if we
> just keep all those values around. We have lost the proper context for those
> then. What's the point of keeping 10 dates if you don't know what they mean
> anymore?

Good point, but for my purposes the lack of context is not a bug, just a lack of a feature.

> Lastly, it is always possible to put extra information into "extended comments"
> in the form key=value if there's no good mapping in the tagging system. This
> way you can even preserve them if you remux to the same format.

Since you've mentioned that, i have a question: why couldn't there be a generic tag class/whatever for holding tags that are not the part of the abstraction? I mean, putting things into "extended comments" like that sounds...hacky...

> I think we should look at the examples listed in more detail

OK

> whether the best thing is to add more tags (which definitely
> makes sense for the dates IMHO)

I think this possibility was mentioned previously, in the bug where i submitted matroska demuxer patches for tagging.

Comment 14 GStreamer system administrator 2018-11-03 12:16:04 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/29.