GNOME Bugzilla – Bug 486659
xmp/exif metadata handling
Last modified: 2010-09-15 15:30:13 UTC
Some info first.

- - - Info related to Exif - - -
EXIF data can be embedded in JPEG or TIFF 6.0 images.
JPEG starts with 0xFFD8 (SOI - Start Of Image).
TIFF starts with 0x49492A00 or 0x4D4D002A, depending on the byte order.
JPEG images can be found in the JFIF file format or the EXIF file format:
JFIF starts with 0xFFD8 (JPEG SOI) 0xFFE0 (APP0 marker) 0xXXXX (size) 0x4A464946 ('JFIF')
EXIF starts with 0xFFD8 (JPEG SOI) 0xFFE1 (APP1 marker) 0xXXXX (size) 0x45786966 ('Exif')
...so JFIF and EXIF are not compatible.
JPEG files are divided into segments, and sometimes an EXIF segment (APP1 marker + size + 'Exif') can be found somewhere after the JFIF segment. Such a file is a JFIF file, but some EXIF libraries search for these segments anyway. In our implementation we could decide not to extract metadata from those files if we don't want to.
jpegdec can render both EXIF and JFIF files.
Once we find the EXIF data inside the TIFF or JPEG file, we can extract the info in the same way.

- - - Info related to IPTC - - -
IPTC metadata can be embedded in JFIF (Photoshop segment 0xFFED, the APP13 marker), EXIF (Exif segment) and TIFF files.
* It is already possible to have EXIF and IPTC in PSD files.

- - - XMP - - -
XMP can be inside PDF, JPEG, JPEG 2000, GIF, PNG, HTML, TIFF, Adobe Illustrator, PSD, SVG/XML, DNG, PostScript and Encapsulated PostScript files.
In PDF documents, XMP can be used not only to describe the document as a whole, but can also be attached to parts of the document, such as pages, included images, and tags defining structural divisions of the document.
In the case of JPEG it is inside an APP1 marker (so it is compatible with JFIF, and also with EXIF if it is in a second APP1 marker).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
So it seems to be too generic...
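To make the byte signatures above concrete, here is a minimal plain-C sketch (no GStreamer or libexif dependency) that classifies a buffer from its first bytes. The names ImageContainer and detect_image_container are hypothetical, and the sketch only inspects the first APP segment, so a JFIF file that carries an Exif APP1 further on is still reported as JFIF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification based only on the leading bytes. */
typedef enum {
  IMAGE_UNKNOWN,
  IMAGE_TIFF_LE,    /* "II*\0" = 0x49492A00, little-endian TIFF */
  IMAGE_TIFF_BE,    /* "MM\0*" = 0x4D4D002A, big-endian TIFF    */
  IMAGE_JPEG_JFIF,  /* SOI, then APP0 (0xFFE0) with "JFIF\0"    */
  IMAGE_JPEG_EXIF,  /* SOI, then APP1 (0xFFE1) with "Exif\0\0"  */
  IMAGE_JPEG_OTHER  /* SOI, but the first segment is neither    */
} ImageContainer;

ImageContainer
detect_image_container (const unsigned char *data, size_t size)
{
  assert (data != NULL || size == 0);

  if (size >= 4 && memcmp (data, "II\x2A\x00", 4) == 0)
    return IMAGE_TIFF_LE;
  if (size >= 4 && memcmp (data, "MM\x00\x2A", 4) == 0)
    return IMAGE_TIFF_BE;

  /* JPEG always starts with the SOI marker 0xFFD8. */
  if (size < 2 || data[0] != 0xFF || data[1] != 0xD8)
    return IMAGE_UNKNOWN;

  /* Layout: SOI(2) marker(2) length(2) identifier... -> identifier at 6. */
  if (size >= 11 && data[2] == 0xFF) {
    const unsigned char *id = data + 6;

    if (data[3] == 0xE0 && memcmp (id, "JFIF\0", 5) == 0)
      return IMAGE_JPEG_JFIF;
    if (data[3] == 0xE1 && memcmp (id, "Exif\0", 5) == 0)
      return IMAGE_JPEG_EXIF;
  }
  return IMAGE_JPEG_OTHER;
}
```

Finding an Exif segment that follows the JFIF one would mean walking the whole segment list, which is exactly the policy decision mentioned above.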
My idea is to have the following design: we should create, in -base, helper functions to generate a tag list:
GstTagList *gst_tag_list_from_metadata_chunk(const GstBuffer *buffer);
Each image demuxer/decoder has to find the metadata chunk and then call this method. On the other side, the muxer/encoder should call
GstBuffer *gst_exif_chunk_from_tag_list(GstTagList *);
GstBuffer *gst_iptc_chunk_from_tag_list(GstTagList *);
GstBuffer *gst_xmp_chunk_from_tag_list(GstTagList *);
and then the muxer/encoder has to decide where to write this chunk in the file.
...so this would need modifications to each decoder/demuxer/encoder/muxer that wants to extract/inject metadata info.
PS: we could implement it step by step, starting for example with jpegdec and EXIF.
BR, Edgard
Right, so there should be a helper library under e.g. gst-plugins-base/gst-libs/gst/imagetags/ that provides support for each of the formats a la:
GstTagList *gst_tag_list_from_exif_chunk(const GstBuffer *buffer);
GstBuffer *gst_exif_chunk_from_tag_list(GstTagList *tag_list);
Then jpeg{enc,dec}, tiff{enc,dec} and png{enc,dec} can make use of those. gst_XXX_from_tag_list() should return NULL if the taglist does not contain suitable tags. For the future we should think about a useful interface that allows applications to select which metadata formats should be produced.
> Right, so there should be a helper-library under e.g. > gst-plugins-base/gst-libs/gst/imagetags/ > that provides support for each of the formats a la: Umm, why not just put it into the existing libgsttag?
I think it should be in gst-plugins-base/gst-libs/ext/tags/ ("tags" instead of "imagetags", and "ext" instead of "gst"), because I would like to use the following libs:
Exif: - http://libexif.sourceforge.net/
IPTC: - http://libiptcdata.sourceforge.net/
XMP: - http://libopenraw.freedesktop.org/wiki/Exempi
BR, Edgard
Do you think we should create a new tag lib like libgsttagext or move the libgsttag from gst to ext? BR,
I didn't realise you were planning on using external libraries (is that really needed? Are those tag formats so complicated?). In that case libgsttag is not an option.
EXIF, XMP and IPTC are different from id3 or vorbiscomments. The metadata standards we are talking about here describe how to format the metadata and how to embed it in various formats. In most cases it will be stored inside the container (and that is the level of support we would like to address). Basically, every format in which we would like to support it would need to parse the chunk and emit tags when reading, and format the tags into a chunk when writing. How the chunk is embedded in the container format is format-specific, but the content of the chunk is not.
So basically, when reading, the app will do:
tags = gst_tag_list_from_exif_chunk (buffer)
when it finds an exif chunk. If the proposed utility library has exif support, gst_tag_list_from_exif_chunk will parse the block and generate single tags; if not, it could emit GST_TAG_EXIF with the exif binary blob.
When writing, the app will do:
buffer = gst_exif_chunk_from_tag_list (tags)
If there is exif support in the lib, it will check whether there are suitable tags in the taglist; if so it will return a buffer, else NULL. If there is no exif support, it will check whether there is a GST_TAG_EXIF and return that.
This way we don't clutter several elements with #ifdef HAVE_LIBEXIF, and we preserve the metadata for e.g. filesrc ! pngdec ! jpegenc ! filesink. If the libs are available, one can even change the metadata.
1. Use helper libraries to create tag lists from metadata chunks
 * let's think just about the design, no matter if it is ext or gst
GstTagList *gst_tag_list_from_exifimage_chunk(const GstBuffer * buffer);
GstTagList *gst_tag_list_from_iptc_chunk(const GstBuffer * buffer);
GstTagList *gst_tag_list_from_xmp_chunk(const GstBuffer * buffer);
GstBuffer *gst_exifimage_chunk_from_tag_list(GstTagList *tag_list);
GstBuffer *gst_iptc_chunk_from_tag_list(GstTagList *tag_list);
GstBuffer *gst_xmp_chunk_from_tag_list(GstTagList *tag_list);
1.1 Those helper libraries would be called by decoders (tag_list_from_metadata_chunk) and the result sent as a message. On the other side, encoders receive tag messages and write them as chunks inside the file being converted to (metadata_chunk_from_tag_list).
1.1.1 - Advantages
 i - autopluggable
 ii - there is only one central piece of code (library) for all file formats. The only thing the file format has to do is find the metadata chunk inside it.
1.1.2 - Disadvantages
 i - The encoder doesn't have any idea what kind of metadata it is, so it will have to write duplicate information to all the metadata chunks: exif, iptc, xmp and future ones.
1.1.3 - Open issues
 i - Applications would need some extra Gst interface to encoders if they want to decide whether to write exif and/or iptc and/or xmp.
Examples:
-----------    -----------    ------------
| v4l2src | -> | jpegenc | -> | filesink |
-----------    -----------    ------------
The application could send metadata (as any other element in the pipeline). The resulting file will have 3 metadata chunks (exif, iptc, xmp) with duplicated info.
From the application's point of view there is no problem, because the metadata is merged. Unless the application has 4 sidebars: one for general tags, one for exif tags, one for iptc tags and one for xmp tags. In this case the application can't identify which metadata comes from exif, iptc, and so on.
-----------    -----------    ----------    ------------
| filesrc | -> | jpegdec | -> | pngenc | -> | filesink |
-----------    -----------    ----------    ------------
The same problem: the application doesn't have control over what metadata will be created.
-----------    -----------    ---------------
| filesrc | -> | jpegdec | -> | xvimagesink |
-----------    -----------    ---------------
The same problem: if the application wants to show tags separately, it can't (let's just think of an app like Eye of Gnome, for example)!!!
So, to solve these limitations (i - the app can't know what kind of metadata it is; ii - encoders don't know what kind of metadata to write), I propose the following changes to GstTagList:
1 - A tag list has one id, name and description, e.g. GST_TAG_CATEGORY_EXIF, "Exif" and "Exif metadata for images"
1.1 - Ids starting from some value, for example 4000, are reserved for application-specific categories
1.2 - There is a GST_TAG_CATEGORY_GENERAL
2 - Tag values can contain not only single values, like INT, STRING and so on, but can also be of type GROUP
2.1 - GROUP tags have a name and a description, e.g. "Rights", "Information regarding the legal restrictions", and also a list of other tags
With such a thing, the application could show metadata all together or present it in a friendly way like this:
Exif
  Camera
    Make
    Model
    XResolution
    YResolution
  Image Data
    Orientation
    DateTime
    Compression
    MakerNote
  Object Distance
  Time Zone
XMP
  Basic
    Advisory
    BaseURL
    CreateDate
    CreatorTool
  Rights
    Certificate
    Marked
    Owner
    UsageTerms
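A minimal sketch of the category part of this proposal (item 1), assuming plain integer IDs: built-in categories get small values and everything from 4000 upwards is left to applications. The enum values, TagCategory and all descriptions other than the Exif one are hypothetical illustrations, not existing GStreamer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical category IDs; 4000 and above are application-specific. */
enum {
  TAG_CATEGORY_GENERAL = 0,
  TAG_CATEGORY_EXIF,
  TAG_CATEGORY_IPTC,
  TAG_CATEGORY_XMP,
  TAG_CATEGORY_APPLICATION_FIRST = 4000
};

typedef struct {
  int id;
  const char *name;
  const char *description;
} TagCategory;

static const TagCategory builtin_categories[] = {
  { TAG_CATEGORY_GENERAL, "General", "Tags not tied to a metadata format" },
  { TAG_CATEGORY_EXIF,    "Exif",    "Exif metadata for images" },
  { TAG_CATEGORY_IPTC,    "IPTC",    "IPTC metadata" },
  { TAG_CATEGORY_XMP,     "XMP",     "XMP metadata" },
};

bool
tag_category_is_application_specific (int id)
{
  return id >= TAG_CATEGORY_APPLICATION_FIRST;
}

/* Return the built-in category with this ID, or NULL if there is none. */
const TagCategory *
tag_category_lookup (int id)
{
  size_t i;

  for (i = 0; i < sizeof builtin_categories / sizeof builtin_categories[0]; i++)
    if (builtin_categories[i].id == id)
      return &builtin_categories[i];
  return NULL;
}
```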
Please comment on it... I would like to start implementing ASAP (maybe tomorrow).
We have discussed this on IRC and decided not to create a list-of-tags (group) type for a tag. Grouping is indeed important and will be done in the following way (easier implementation): strings with a separator will create the concept of groups, like below:
"Exif"
"Exif:Camera"
"Exif:Camera:Make"
"Exif:Camera:Model"
What is a good separator? I think ':' is a good one.
...to help applications, a helper function should be created:
GstTagList * gst_new_sorted_by_group_tag_list_from_tag_list(const GstTagList *);
Also, a define must be exposed to applications:
#define GST_TAG_GROUP_CHAR ':'
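If groups are encoded as ':'-separated names, a sorted-by-group helper needs to compare names one component at a time: a plain strcmp() puts "Exif:Camera-X" between "Exif:Camera" and "Exif:Camera:Make", because '-' sorts before ':' in ASCII. A minimal sketch with hypothetical names (TAG_GROUP_CHAR mirrors the proposed GST_TAG_GROUP_CHAR), working on an array of names rather than a GstTagList:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TAG_GROUP_CHAR ':'   /* hypothetical, mirrors GST_TAG_GROUP_CHAR */

/* Compare two group-qualified tag names one component at a time, so a
 * group and all of its members sort together and a parent group sorts
 * before its children. */
int
tag_name_group_compare (const char *a, const char *b)
{
  static const char sep[] = { TAG_GROUP_CHAR, '\0' };

  assert (a != NULL && b != NULL);
  for (;;) {
    size_t la = strcspn (a, sep);
    size_t lb = strcspn (b, sep);
    int c = strncmp (a, b, la < lb ? la : lb);

    if (c != 0)
      return c;
    if (la != lb)
      return la < lb ? -1 : 1;        /* "Camera" before "Camera-X" */
    if (a[la] == '\0' || b[lb] == '\0') {
      if (a[la] == b[lb])
        return 0;                     /* identical names */
      return a[la] == '\0' ? -1 : 1;  /* parent group first */
    }
    a += la + 1;                      /* skip the separator */
    b += lb + 1;
  }
}

static int
compare_name_ptrs (const void *pa, const void *pb)
{
  return tag_name_group_compare (*(const char *const *) pa,
      *(const char *const *) pb);
}

/* Sort an array of tag names in place, grouped by hierarchy. */
void
sort_tag_names_by_group (const char **names, size_t n)
{
  qsort (names, n, sizeof *names, compare_name_ptrs);
}
```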
Uhm. that's completely the opposite of what we discussed on IRC. We were talking about making a field of type GstStructure inside GstStructures to get a real hierarchy. We don't need a separator then.
...so, OK, fine, I'm feeling better with a real hierarchy (GstStructure)... tomorrow I will start implementing it and then attach the patch here (hopefully by Friday) before committing. BR, Edgard
Wim, the idea is to have a convenience API that hides the nested structures. So one can set "Video:Encoder" and it would automatically create the Video sub-structure if it is not there and create an Encoder element inside. Too bad that GstStructure has no flags. If it had, a flag could signal the existence of substructures. This way gst_structure_{set|get} could avoid scanning for ":" in the name. If GstStructure had been a GObject I would have suggested using the ChildProxy interface. Of course we could add something like gst_structure_deep_{set|get} instead.
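GstStructure has no deep-set API, so the following is only a plain-C model of the proposed gst_structure_deep_{set|get}: a small tree where node_deep_set (root, "Video:Encoder", ...) creates the missing "Video" level on the way down. All names are hypothetical, and values are plain strings rather than GValues to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical nested structure: name, optional value, sub-structures. */
typedef struct Node {
  char *name;
  char *value;              /* NULL for a pure container */
  struct Node *children;
  struct Node *next;        /* next sibling */
} Node;

static char *
dup_range (const char *s, size_t n)
{
  char *r = malloc (n + 1);
  if (r != NULL) {
    memcpy (r, s, n);
    r[n] = '\0';
  }
  return r;
}

/* Return the child of `parent` named s[0..len), creating it if missing. */
static Node *
get_or_create_child (Node *parent, const char *s, size_t len)
{
  Node *c;
  for (c = parent->children; c != NULL; c = c->next)
    if (strlen (c->name) == len && strncmp (c->name, s, len) == 0)
      return c;
  if ((c = calloc (1, sizeof *c)) == NULL)
    return NULL;
  if ((c->name = dup_range (s, len)) == NULL) {
    free (c);
    return NULL;
  }
  c->next = parent->children;
  parent->children = c;
  return c;
}

/* Set "A:B:C" = value, creating intermediate levels. 0 on success. */
int
node_deep_set (Node *root, const char *path, const char *value)
{
  Node *cur = root;
  char *copy;

  assert (root != NULL && path != NULL && value != NULL);
  for (;;) {
    size_t len = strcspn (path, ":");
    if ((cur = get_or_create_child (cur, path, len)) == NULL)
      return -1;
    if (path[len] == '\0')
      break;
    path += len + 1;
  }
  if ((copy = dup_range (value, strlen (value))) == NULL)
    return -1;
  free (cur->value);
  cur->value = copy;
  return 0;
}

/* Get the value at "A:B:C", or NULL if the path or value does not exist. */
const char *
node_deep_get (const Node *root, const char *path)
{
  const Node *cur = root;

  assert (root != NULL && path != NULL);
  for (;;) {
    size_t len = strcspn (path, ":");
    const Node *c = cur->children;
    while (c != NULL && !(strlen (c->name) == len &&
            strncmp (c->name, path, len) == 0))
      c = c->next;
    if (c == NULL)
      return NULL;
    if (path[len] == '\0')
      return c->value;
    cur = c;
    path += len + 1;
  }
}

/* Free a list of sibling nodes and everything below them. */
void
node_free (Node *n)
{
  while (n != NULL) {
    Node *next = n->next;
    node_free (n->children);
    free (n->name);
    free (n->value);
    free (n);
    n = next;
  }
}
```

Only the deep_* helpers parse ':', so ordinary field access never has to scan names for it, which is the cost the comment wants to avoid.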
I will no longer implement it in base/gst-libs/ext/tags. I will follow MikeS's suggestion on #gstreamer and implement it in the following way:
Create a new element, called 'metadataparser', that accepts image/jpeg, image/tiff, etc. as input. The 'metadataparser' element has higher priority than jpegdec, tiffdec and so on. Then an auto-plugged pipeline would look like this:
filesrc -(image/jpeg)-> metadataparser -(image/jpeg-metadata)-> jpegdec -(video/x-raw-yuv)-> xvimagesink
The 'jpegdec' element still handles 'image/jpeg' but has lower priority than 'metadataparser'. In addition, 'jpegdec' also handles 'image/jpeg-metadata'.
So the 'metadataparser' element has knowledge about each metadata type (Exif, IPTC, XMP) and also about how the metadata is embedded into files. I will try to make the code as modular as possible. The 'metadataparser' element doesn't change the stream; it just looks into the stream and extracts metadata, sending tags.
For encoding, the pipeline would look like this:
videotestsrc -> jpegenc -> metadataenc -> filesink
Unlike 'metadataparser', 'metadataenc' changes the stream, embedding the metadata in it. So, in the same way, 'metadataenc' has knowledge about all the metadata types and file formats involved. The embedded metadata are tags sent as events, by the application or upstream elements, and mapped to metadata. The 'metadataenc' element will have three properties, 'exif', 'iptc' and 'xmp'. By default only exif will be 'on'; the application can decide which of those options to turn 'on' or 'off'.
btw: I hope to commit a first version to plugins-bad by Friday (only metadataparser with jpeg).
Just committed the first version in gst-plugins-bad/ext/metadata. To run it, try:
$ GST_DEBUG=*metadata*:5 gst-launch-0.10 filesrc location=BlueSquare57.jpg ! metadataparse ! fakesink silent=true -v
This version still doesn't send metadata tags, but logs them with GST_LOG. This version only handles jpeg (exif, iptc).
...my plan for this week is to:
1- also find the xmp chunk inside jpeg files.
2- send the tag messages.
*** If you want to test it like this (filesrc ! jpegdec ! xvimagesink), just add "image/jpeg-metadata" to the jpegdec sink pad.
BR
Edgard
*** Please, let's discuss bug #482947
What about the plugin name? Someone suggested:
plugin: imagetag
element parser: imagetag
element writer: ??????
Today I committed the following changes to the element: it sends the whole IPTC (Exif or else XMP) chunk in just one tag,
#define GST_TAG_IPTC "iptc"
This way, a pipeline like this works fine:
filesrc ! metadataparse ! jpegdec ! image-processing ! jpegenc ! metadatamux ! filesink
because the metadatamux element will receive the tag event and write it to the image file.
...now it would be good to create some default tags related to images (bug #482947). Those tags could be mapped to/from exif, iptc and xmp metadata. For example:
v4l2src ! jpegenc ! metadatamux ! filesink
The v4l2src element wants to send just an "EXPOSURE_TIME" tag (no matter if it is iptc, exif or whatever else)... then metadatamux-exif could just map this general image tag into one of its own.
...if we don't have such new default tags to be mapped, the only thing we can do is: the application receives the tag message, i.e. "EXPOSURE_TIME", and then sends it back to the pipeline as "Exif::ExposureTime" (or something like this, using nested structures or whatever else). So, in this second case, the mapping is up to the application.
Comments please!!
BR
Edgard
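The mapping idea can be sketched as a lookup table from format-neutral tag names to their Exif tag IDs and XMP properties. "EXPOSURE_TIME" comes from the comment above; the other general names and the ImageTagMapping/image_tag_lookup names are hypothetical. The Exif IDs are the ones defined by the TIFF/Exif specs (Make 0x010F, Model 0x0110, ExposureTime 0x829A, FNumber 0x829D); IPTC is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a format-neutral image tag to the places it
 * would be written in each metadata format. */
typedef struct {
  const char *general_tag;     /* name an element like v4l2src would send */
  unsigned short exif_tag_id;  /* tag number in the Exif/TIFF IFDs       */
  const char *xmp_property;    /* qualified XMP property name            */
} ImageTagMapping;

static const ImageTagMapping image_tag_map[] = {
  { "EXPOSURE_TIME", 0x829A, "exif:ExposureTime" },
  { "F_NUMBER",      0x829D, "exif:FNumber" },
  { "CAMERA_MAKE",   0x010F, "tiff:Make" },
  { "CAMERA_MODEL",  0x0110, "tiff:Model" },
};

/* Return the mapping for `general_tag`, or NULL if it has none. */
const ImageTagMapping *
image_tag_lookup (const char *general_tag)
{
  size_t i;

  assert (general_tag != NULL);
  for (i = 0; i < sizeof image_tag_map / sizeof image_tag_map[0]; i++)
    if (strcmp (image_tag_map[i].general_tag, general_tag) == 0)
      return &image_tag_map[i];
  return NULL;
}
```

A metadatamux-exif element would look up each incoming tag this way; tags without an entry would simply not be written to Exif.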
Hi, I have just created a new doc describing how parse and mux should operate. Please comment on it: http://webcvs.freedesktop.org/gstreamer/gst-plugins-bad/ext/metadata/README?revision=1.1&view=markup thanks, Edgard
Apart from emitting tags, it would also be useful to let an image metadata parser change the caps. Specifically, width and height are always present in the JPEG headers. A use case for this is multiplexing of motion JPEG. The source of the JPEG data may not provide width and height properties in the caps, but multiplexers like avimux and matroskamux require these properties on their sink pads:
souphttpsrc location="http://webcam/mjpeg" do-timestamp=true ! multipartdemux ! metadataparse ! matroskamux ! filesink location="webcam.mkv"
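Width and height come from the frame header (SOFn) that every JPEG image carries, so a parser can read them without decoding. A minimal sketch; jpeg_get_dimensions is a hypothetical name, and the element would then put the values into its output caps (e.g. with gst_caps_set_simple()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walk the JPEG marker segments up to the first
 * frame header (SOF0..SOF15, excluding DHT 0xC4, JPG 0xC8 and DAC 0xCC)
 * and read the image size from it. Returns 0 on success, -1 otherwise. */
int
jpeg_get_dimensions (const unsigned char *d, size_t size,
    unsigned *width, unsigned *height)
{
  size_t pos = 2;

  assert (d != NULL && width != NULL && height != NULL);
  if (size < 2 || d[0] != 0xFF || d[1] != 0xD8)
    return -1;

  while (pos + 4 <= size) {
    unsigned char marker;
    size_t len;

    if (d[pos] != 0xFF)
      return -1;
    marker = d[pos + 1];
    if (marker == 0xFF) {       /* fill byte before a marker */
      pos++;
      continue;
    }
    len = ((size_t) d[pos + 2] << 8) | d[pos + 3];  /* includes itself */
    if (len < 2)
      return -1;

    if (marker >= 0xC0 && marker <= 0xCF &&
        marker != 0xC4 && marker != 0xC8 && marker != 0xCC) {
      /* SOFn payload: precision(1) height(2) width(2) ... */
      if (pos + 9 > size)
        return -1;
      *height = ((unsigned) d[pos + 5] << 8) | d[pos + 6];
      *width = ((unsigned) d[pos + 7] << 8) | d[pos + 8];
      return 0;
    }
    if (marker == 0xDA)         /* start of scan reached without a SOF */
      return -1;
    pos += 2 + len;
  }
  return -1;
}
```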
Some more thinking (and discussion with tim) about the structure:
EXIF: can be in JFIF (jpeg) and in TIFF
 JFIF: APP1 (segment marker 0xFFE1) holds an entire TIFF file within it
 TIFF: private tag 0x8769 holds the Exif-specified TIFF tags; private tag 0x8825 holds the GPS sub-IFD
XMP: can be in many file types. XMP is most commonly serialized and stored using a subset of RDF, which is in turn expressed in XML
IPTC: ignore for now
What we should do is to have, in gst-plugins-base/gst-libs/gst/tag:
gstexiftag.{c,h}
gstxmptag.{c,h}
Those would, if feasible, implement the standard without external dependencies.
Two new elements:
jfifparse: takes jfif, parses exif and xmp, outputs jpeg
jfifformat: takes jpeg, adds jfif framing including exif and xmp, outputs jfif
pngdec/enc, asfmux/demux, avimux/demux, flvmux/demux, qtmux/demux and wavparse/wavenc could use libgsttag to add xmp support.
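To illustrate the TIFF side: the Exif and GPS data are reached through pointer tags in IFD0 (0x8769 and 0x8825), whose LONG value is an offset relative to the start of the TIFF header. A minimal sketch that honours the II/MM byte order; tiff_ifd0_pointer is a hypothetical name and it only looks at IFD0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 16/32-bit values in the byte order given by the TIFF header. */
static uint16_t
rd16 (const unsigned char *p, int le)
{
  return le ? (uint16_t) (p[0] | (p[1] << 8))
            : (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t
rd32 (const unsigned char *p, int le)
{
  if (le)
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
        ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
      ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical helper: find `tag` (e.g. 0x8769 Exif IFD pointer or 0x8825
 * GPS IFD pointer) in IFD0 and return its 4-byte value, i.e. an offset
 * from the start of the TIFF header. Returns 0 if absent or malformed;
 * 0 is never a valid IFD offset since the header occupies bytes 0..7. */
uint32_t
tiff_ifd0_pointer (const unsigned char *t, size_t size, uint16_t tag)
{
  int le;
  uint32_t ifd;
  uint16_t count, i;

  assert (t != NULL);
  if (size < 8)
    return 0;
  if (t[0] == 'I' && t[1] == 'I')
    le = 1;
  else if (t[0] == 'M' && t[1] == 'M')
    le = 0;
  else
    return 0;
  if (rd16 (t + 2, le) != 42)   /* TIFF magic number */
    return 0;

  ifd = rd32 (t + 4, le);
  if (ifd > size || size - ifd < 2)
    return 0;
  count = rd16 (t + ifd, le);

  /* Each entry: tag(2) type(2) count(4) value-or-offset(4). */
  for (i = 0; i < count; i++) {
    size_t e = (size_t) ifd + 2 + (size_t) i * 12;

    if (e + 12 > size)
      return 0;
    if (rd16 (t + e, le) == tag)
      return rd32 (t + e + 8, le);
  }
  return 0;
}
```

For a JPEG Exif APP1 segment, the TIFF header starts right after the 6-byte "Exif\0\0" identifier, so the same function applies to both containers mentioned above.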
XMP Links: http://www.adobe.com/devnet/xmp/ http://en.wikipedia.org/wiki/Extensible_Metadata_Platform EXIF Links: http://www.exif.org/specifications.html http://en.wikipedia.org/wiki/Exif
We really need 'proper' support for exif/xmp at least. The problem with the current metadata implementation, based on the last comments and the discussion on IRC, is that the metadata(de)mux elements are trying to do something which other elements can do (much better), and are failing abysmally at it.
It would be better to provide a convenience library to give the exif/xmp blobs to and get back tags/structures (and vice versa), and leave it to the existing elements that know how to parse/mux those blobs in a given format to handle that. Example: add it to jpegdec/jpegenc.
It would then also be much more trivial to make (if needed) parsers for specific formats. Example: within the jpeg plugin, you could share the code from gstjpeg{enc|dec} and make a jpegparse element which can extract/insert those tags without having to encode/decode anything.
Some links for JPEG in JFIF or EXIF format: http://www.fileformat.info/format/jpeg/egff.htm http://en.wikipedia.org/wiki/JPEG http://en.wikipedia.org/wiki/JFIF
Regarding my comment #18, especially the jfif elements: I was reading the spec wrong; the format is called jpeg too, and jfif is just an app marker like exif and xmp. It would still be nice to have the app-marker handling as separate elements, so that it's reusable together with basic 3rd-party jpeg codecs (e.g. a DSP-based jpeg encoder/decoder should not be bothered with exif parsing).
jpegparse: parses jfif, exif and xmp app markers, outputs jpeg with the app markers stripped
jpegformat: takes jpeg, adds app markers like jfif, exif and xmp
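A minimal sketch of the "strip" half of the proposed jpegparse, in plain C: it copies a JPEG buffer while dropping APP0 to APP15 segments (0xFFE0 to 0xFFEF) that appear before the start of scan. The function name is hypothetical, and it assumes a well-formed header with no fill bytes between segments; a real element would hand the dropped segments to the Exif/XMP helpers instead of discarding them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, dropping APPn segments before the SOS marker.
 * `out` must hold at least `size` bytes. Returns the number of bytes
 * written, or 0 if the input header is not well-formed. */
size_t
jpeg_strip_app_segments (const unsigned char *in, size_t size,
    unsigned char *out)
{
  size_t pos = 2, n = 2;

  assert (in != NULL && out != NULL);
  if (size < 2 || in[0] != 0xFF || in[1] != 0xD8)
    return 0;
  out[0] = 0xFF;                /* keep SOI */
  out[1] = 0xD8;

  while (pos + 4 <= size) {
    unsigned char marker;
    size_t seg;

    if (in[pos] != 0xFF)
      return 0;
    marker = in[pos + 1];
    /* marker(2) + length field value (which counts its own 2 bytes) */
    seg = 2 + (((size_t) in[pos + 2] << 8) | in[pos + 3]);
    if (marker == 0xDA)         /* SOS: copy the rest verbatim */
      break;
    if (seg < 4 || pos + seg > size)
      return 0;
    if (marker < 0xE0 || marker > 0xEF) {   /* not APPn: keep it */
      memcpy (out + n, in + pos, seg);
      n += seg;
    }
    pos += seg;
  }
  memcpy (out + n, in + pos, size - pos);
  return n + (size - pos);
}
```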
For xmp, we probably don't want to rewrite exempi (http://libopenraw.freedesktop.org/wiki/Exempi):
SLOC    Directory  SLOC-by-Language (Sorted)
34656   source     cpp=34656
1757    exempi     cpp=1737,sh=20
exempi is BSD. It's a similar story for libexif (http://libexif.sourceforge.net/), and that one is LGPL. (Removed some irrelevant parts.)
So this would be the first case of gst-plugins-base/gst-libs/ext/tag/gst{xmp,exif}tag.{c,h}.
What are the opinions on tag support libraries in base that have external deps? Basically, plugins would either need some code like
if (gst_tag_is_exif_supported ())
or we could maybe even handle that in the tag library by having stubs if the dependency is missing. Any preference here?
xmp is now implemented in gst-plugins-base/gst-libs/gst/tag/. Exif should probably be done the same way, as besides jpeg, exif can also be in wav and avi files (http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/RIFF.html#Exif).
FYI, I'm working on exif implementation.
Exif code has been committed. See Bug #614872. So we could close this. Wonder if we want to also kill the metadata plugin at the same time?
+1 on killing metadata plugin.
Does anyone see benefits in rescuing tests/icles/metadata_editor ? I think having a generic metadata editor would be nice, but is maybe a bit out of scope for tests/icles/.