GNOME Bugzilla – Bug 683010
gst_tag_list_add_value_internal() behaviour for tags with no merge function
Last modified: 2018-11-03 12:16:04 UTC
Looking at the source code of gst_tag_list_add_value_internal(), it seems to be working like this: If tag has a merge function and there already is a value for this tag: GST_TAG_MERGE_REPLACE_ALL: replace GST_TAG_MERGE_REPLACE : replace GST_TAG_MERGE_PREPEND : concatenate new + old into a list of values GST_TAG_MERGE_APPEND : concatenate old + new into a list of values GST_TAG_MERGE_KEEP : do nothing GST_TAG_MERGE_KEEP_ALL : do nothing Note that merge function is not actually used here (it will be used later, when merge is required, to turn concatenated list into a single value). HOWEVER. If there is no merge function for this tag, OR there is a merge function, but no value for this tag in the list, it does this: GST_TAG_MERGE_REPLACE_ALL: replace unconditionally GST_TAG_MERGE_REPLACE : replace unconditionally GST_TAG_MERGE_PREPEND : replace unconditionally!!! GST_TAG_MERGE_APPEND : if there already is a value for this tag - do nothing, replace otherwise!!! GST_TAG_MERGE_KEEP : if there already is a value for this tag - do nothing, replace otherwise GST_TAG_MERGE_KEEP_ALL : do nothing Note that behaviour for PREPEND and APPEND is unexpected, if you consider the case where merge function is NULL, and there is a value for this tag already. It means that it's impossible for non-mergeable tags to make a list of values for a tag, even if only one will be shown later, when merge is required. And indeed, gst_tag_list_copy_value() fails when merge_func is NULL. The reason for this seems to go like this: If we can't keep both, when asked to append or prepend (because non-mergeable tags can't be stored as lists of values), do imaginary append/prepend operation, then take the head of the list (which means "keep existing item" for append, and "take new item" for prepend). This seems to be a good idea, however: 1) It makes handling of a tag dependent on the merge mode, and not on tag's contents. 2) It is completely undocumented. Proposed fix: 1) Give gst_tag_merge_use_first() merge function to types that have no merge function, do actual appending/prepending in a normal way. The end result will be the same (gst_tag_merge_use_first() will pick the head of the list, which will depend on whether the list was made by appending or prepending), except that user code that is list-aware will be able to fetch the whole list, and values won't be lost. 2) Also document this (unchanged, when merge function is NULL) behaviour (for users who register their own tags).
Created attachment 222900 [details] [review] Non-NULL merge function to everyone!
Could you update that to latest git master, also add the documentation about this and do the same with the tags in gst-plugins-base/gst-libs/gst/tag? Otherwise looks like a good idea to me
Maybe there should also be a GST_FIXME() if a tag is still registered with a NULL merge function after this.
I'd just do if (merge_func == NULL) merge_func = gst_tag_merge_use_first; in gst_tag_register*() ...
Me too but that breaks backwards compatibility as it changes the behaviour slightly. I'm fine with that if everybody else is :)
Created attachment 250055 [details] [review] Non-NULL merge function to everyone! (against current master HEAD) Well, i don't really care how it's fixed, whether you just change the behaviour of NULL merge function, silently substituting it internally, or you use my patch to explicitly give non-NULL merge function to all tags you define - it matters not to me. The silent approach may be backwards-compatible (since observable behaviour of functions that DO work on current tags with NULL merge functions will not change, AFAIU the code).
Thinking about this again, I think the NULL merge function does represent a special case after all. It expresses that there should only be at most one tag of a certain kind in a taglist. This seems to make sense to me too, there just shouldn't be multiple values for some tags.
That sounds like trying to shoehorn the reality into your idea of what the reality should be. I.e. you think that there should only be one instance of tag X, but, say, Matroska spec doesn't, and an mkv file does have multiple X tags. Or it has A, B and C, and all three can only be mapped into X in GStreamer, thus X will have multiple values. Should extra info be just _dropped_? That is something i don't want to see happening. My original goal, when i started messing with the Matroska demuxer and tag code, was to squeeze as much info from a file as possible. Dropping any info is contrary to that goal. Again, AFAIU applications that expect to fetch one value from a tag, and do use the API that fetches only one value from a tag - they will actually fetch one value from a tag, even if it's a list of values. That's what merge functions are for. Only apps that do not follow the spec (i.e. they read tags as they are, and don't make assumptions about tags having only one or multiple values) will see the difference. If you are worried about these apps, you can make an API that enables/disables the new behaviour, and leave it disabled by default. Or just commit this into 1.1 or whatever is the next major release, as you do with any non-backward-compatible change.
Perhaps you could provide some examples of specific tags you were thinking of?
GST_MATROSKA_TAG_ID_COPYRIGHT -> GST_TAG_COPYRIGHT GST_MATROSKA_TAG_ID_PRODUCTION_COPYRIGHT -> GST_TAG_COPYRIGHT GST_MATROSKA_TAG_ID_LICENSE -> GST_TAG_LICENSE GST_MATROSKA_TAG_ID_TERMS_OF_USE -> GST_TAG_LICENSE GST_MATROSKA_TAG_ID_DATE_RELEASED -> GST_TAG_DATE GST_MATROSKA_TAG_ID_DATE_RECORDED -> GST_TAG_DATE GST_MATROSKA_TAG_ID_DATE_ENCODED -> GST_TAG_DATE GST_MATROSKA_TAG_ID_DATE_TAGGED -> GST_TAG_DATE GST_MATROSKA_TAG_ID_DATE_DIGITIZED -> GST_TAG_DATE GST_MATROSKA_TAG_ID_DATE_WRITTEN -> GST_TAG_DATE GST_MATROSKA_TAG_ID_DATE_PURCHASED -> GST_TAG_DATE GST_MATROSKA_TAG_ID_LICENSE / GST_MATROSKA_TAG_ID_URL -> GST_TAG_LICENSE_URI GST_MATROSKA_TAG_ID_LICENSE / GST_MATROSKA_TAG_ID_URL -> GST_TAG_LICENSE_URI
I think I have to agree with comment 8 here... so let's just go with merge_first() by default for biggest flexibility and least surprise.
Even ignoring the tone of comment #8 for a moment, I'm not sure whether I agree with it. For one, of course we provide an abstraction, and not even a particularly clever one. It will always be leaky and imperfect, and not be able to express everything. The same is true for the matroska tagging system (which isn't "reality" btw, it's just another abstraction). Secondly, I think it's an illusion that we actually "keep information" if we just keep all those values around. We have lost the proper context for those then. What's the point of keeping 10 dates if you don't know what they mean anymore? Thirdly, if something gets dropped because of an imperfect abstraction, it makes sense to have the element that has the most context do the dropping, and not some other function (though admittedly it would be sufficient if the element with the most context simply ordered things in a suitable way, knowing which one would be selected by merge_use_first). Lastly, it is always possible to put extra information into "extended comments" in the form key=value if there's no good mapping in the tagging system. This way you can even preserve them if you remux to the same format. I think we should look at the examples listed in more detail and decide on a case-by-case basis whether the best thing is to add more tags (which definitely makes sense for the dates IMHO), or to express it differently, or to allow multiple tags in those cases (I'm not saying the NULL merge function is correct everywhere where it's now, I'm just not sure it makes sense to replace it with merge_use_first everywhere).
> Secondly, I think it's an illusion that we actually "keep information" if we > just keep all those values around. We have lost the proper context for those > then. What's the point of keeping 10 dates if you don't know what they mean > anymore? Good point, but for my purposes the lack of context is not a bug, just a lack of a feature. > Lastly, it is always possible to put extra information into "extended comments" > in the form key=value if there's no good mapping in the tagging system. This > way you can even preserve them if you remux to the same format. Since you've mentioned that, i have a question: why couldn't there be a generic tag class/whatever for holding tags that are not the part of the abstraction? I mean, putting things into "extended comments" like that sounds...hacky... > I think we should look at the examples listed in more detail OK > whether the best thing is to add more tags (which definitely > makes sense for the dates IMHO) I think this possibility was mentioned previously, in the bug where i submitted matroska demuxer patches for tagging.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/29.