GNOME Bugzilla – Bug 704796
Parsing problem with XMP language alternative fields leads to data loss
Last modified: 2013-10-27 13:17:22 UTC
The metadata plugin doesn't seem to read correctly XMP fields of type language alternative. After Gimp edition and image saving, initial fields values are lost and replaced by the default language specifier. Seen on Gimp versions 2.x including (my) latest 2.8.6 Untested with 2.9 Reproducible : always Steps to reproduce : 1. Have a test image, say IMG1.JPG (test image means an image without XMP metadata or metadata that can be sacrificed (i.e. an image copy).) 2. Add a XMP field which is of lang-alt type, e.g. the Dublin Core title. Can be done with a command line metadata editor such as exiftool: exiftool -xmp-dc:title='Example Title' IMG1.JPG 3. Dump the XMP section of image (for visual control) exiftool -xmp -b IMG1.JPG Output should show a correct implementation : <dc:title> <rdf:Alt> <rdf:li xml:lang='x-default'>Example title</rdf:li> </rdf:Alt> </dc:title> 4. Edit/save the image with Gimp. No need to modify anything, a simple open + export to IMG2.JPG will do. 5. Dump the XMP section of the new image exiftool -xmp -b IMG2.JPG <dc:title> <rdf:Alt> <rdf:li xml:lang='x-default'>x-default</rdf:li> </rdf:Alt> </dc:title> The text value of the title property is lost and replaced by the language specifier. :-( Also, when a XMP field contains several language variants (and the lang-alt aims to that), e.g. : <dc:title> <rdf:Alt> <rdf:li xml:lang='x-default'>Example title</rdf:li> <rdf:li xml:lang='fr-FR'>Titre exemple</rdf:li> <rdf:li xml:lang='es-ES'>Ejemplo titulo'</rdf:li> <rdf:li xml:lang='de-DE'>Beispieltitel</rdf:li> </rdf:Alt> </dc:title> after Gimp edit/save the default value is overwritten as above and all other variants are lost : <dc:title> <rdf:Alt> <rdf:li xml:lang='x-default'>x-default</rdf:li> </rdf:Alt> </dc:title> The problem is a XMP parsing problem, not XMP generation upon save, because after opening the image file under Gimp, the Image properties Advanced folder does show "x-default" for the DC Title value, not the correct text. NB: about severity : a bug leading to user data losses should probably be always flagged as severe or major. In that case, it's not because this XMP parser has a bunch of other problems leading to data losses, cf. bug #667038 and its duplicates. So probably all Gimp users handling XMP metadata have workaround strategies to protect their data, e.g. saving XMP sections in sidecar files before Gimp edition, and restoring after. Probably this report should be grouped with bug 667038 under a more general future goal « redesign of the XMP parser » ? NB2: is this on the way for future releases ? If not, is help wished and how ? Regards, Jean-François
Yes, help is very much wanted :) GIMP's metadata handling is pretty much aging home-brewn stuff and we'd like to switch the entire stuff to gexiv2 http://redmine.yorba.org/projects/gexiv2/wiki which is a GObject wrapper around Exiv2. But we are short on manpower. If you want to help, could you come to #gimp on irc.gimp.org? That's where most of the development coordination happens.
Hello Michael, thanks for the reply. Hem, "switch the entire stuff" is no longer a bugfix but becomes a regular subproject.:-) I'll reply, in a more detailed way, stating points on whihc I possibly could help and points on which I'll surely couldn't. By the way, searching a bit the bugzilla DB, I found my report is a duplicate. That problem has already been mentionned, bug #690089, Comment #7, 3rd paragraph. Regards, Jean-François
Created attachment 253839 [details] [review] gimp-metadata_master_2013-09-02.patch In my vacation I tried to make a new approach. 1. Exif changes from libexif to libgexiv2. 2. Metadata handling changes from plugin to core. 3. Metadata handling is not an editor (except some IPTC-tags) There is a lot of stuff to be done, but I would like to know, if it is worth coding on, or if I am running the wrong way. It works with master from 2013/09/02. Master should be configured with the "--without-libexif" parameter. The newest git-version of gexiv2 has to be used, because some functions have been added after releasing gexiv2-0.6.1. The "image-menu.patch" has to be installed, so the "Show Metadata" menu is shown.
Created attachment 253840 [details] image-menu.patch Patch for menus/image-menu.xml
You are officially hero of the month for pushing forward the gexiv2 stuff. Some quick comments (not that I have not looked closely yet): - let's depend on gexiv2 unconditionally, no #ifdefs - what about all the plugins and stuff? do you plan on porting this stuff too? - i'm not really fond of builder xml in the core, but let's see finally, please come to IRC so we can discuss that more profoundly, i'm not really the metadata expert around here. bottom line: we absolutely want to use gexiv2.
s/not/note/
I work with this conditional until its safe. Gexiv2 is not ready yet. the version to be used is 0.7.0 pre1... Plugins? Which plugins? :-) I will make the libexif stuff in file-jpeg obsolete, by porting it. The tiff stuff will be unchaged, because it is a functionality of libtiff. I was shocked about the file-psd. I don't know, if I can port. I'm not an expert of metadata, too. The parasite is not stored as xml :-| It is an "easy to use format", based on Strings... I have to work, so I am visiting IRC not very often.
Hello, @Hartmut, good new, and welcome. Well, I did trigger this thread to signal a problem (some other exists, e.g. the exif:Flash broken parsing) and offered help. Seems that my (long) e-mail ti Michael ended up in a spambox :-) Not very important as I'm not the right guy to help, very little knowledge of Gimp and C++ devolpper, so all my existing stuff is of little use. However, I do know metadata issues, I have some personal software around that, and some that use the exiv2 library. This leads to a few comments about questions raised by Hartmut. 1. About files specific stuff related to exif and xmp : the exiv2 library comes with ALL files related stuff, extract metadata sections from images files, write back metadata to images files. And the library supports a reasonnable large set of images formats, including JPEG, TIFF, PSD, and many others cf. http://dev.exiv2.org/wiki/exiv2/Supported_image_formats So, IMHO, file specific stuff, file-jpeg, file-psd, file-anything, would probably become obsolete. If the Gimp application is linked with libexiv2, supported files formats become exiv2 supported formats. 2. The libexiv2 acess data at image file level. Using it would probably require to rething the logic. Current Gimp read image data from files, and store « parasite » data for parsing. Using exiv2 would transform the logic in : - Gimp read only image data from file. For a JPEG, this would be the DHT, DQT, SOFn, SOI sections. - Gimp calls libexiv2, the API is something like readmetadatafromfile(filename) Nothing else to do, no parasite data to store, all metadata is stored internally by libexiv2. Similar for writing, this would require to, first, save all image data and close the file, then call libexiv2 which will reopen and write metadata sections. 3. Internal storage. All metadata is read, parsed, stored internally. This is very different from the current implementation where parsing is done in stream mode, and metadata stored in specifics one by one. The current tree model of the Gimp metadata plugin would become obsolete too. 4. What does mean, cf. #5, « builder xml in the core » ? If libexiv2 is used, all XML, XMP, RDF stuff is inside the library is fully transparent for the calling application. No need to add XML stuff. 5. About #3, « 3. Metadata handling is not an editor » Totally agree. Gimp is an excellent images editor and doesn't need at all to become a metadata editor. Metadata edition is a huge issue, many users have many different needs, and in most cases, there are only partial solutions. Also, users concerned by metadata edition have specific tools and can build their own scripts, using command line exiv2 or exiftool. 6. A question, for my information : how is stored metadata with the Gimp native format, .xcf files ? So what ? My feeling is that all that is an important work, because it will impact many parts of Gimp and perhaps lead to partial local redesign. Are they some intermediate steps ? Perhaps... Probably the most unconvenient problem is that Gimp has a broken XMP parser, leading to ignore XMP metadata due to parsing errors. And without metadata, upon image save or export, data is lost. The fix, for users, (thta's what I do), is to extract XMP data and save into a sidecar file before using Gimp. Later, after image edition, it's possible to write back XMP section into the file (command line exiv2 or exiftool do that). - Maybe a first step could be to disable XMP parsing and just keep the raw parasite section, ready to be writtent back later ? This way, nothing would be lost for those uses that use metadata. - A second step could to rewrite a XMP parser (broken module xmp-parse.c) - And final and great enhancement, a Gimp + exiv2 library, as proposed... (But available in a ... future) Regards, Jean-François
1. Exiv2 is a C++ library. Gimp is C based, and, furtheron, Gnome based, so Gimp uses the GLib very intensively. That would mean, that GLib C calls C++ library functions. That works, of course, but there is a library, that has done that before: GExiv2. 2. I don't understand, what you mean by "internal storage". The problem is, that Gimp is not an jpg editor, so dealing with metadata means to get metadata, change it (maybe) while editing and then write back to (maybe) a complete different file-format. That was the reason I started this project: I use Gimp for retouching Photos, so I use the lossless TIFF format from Lightroom, but all exif data is lost after saving the image. The metadata is persistently stored with the image, that means, if the file is saved as xcf, the persistent parasites are stored, too. 3. Parasites are most often texts. The representation of metadata is often xmp. So the metadata have to be read, converted in an xmp structure an stored as a parasite. To handle xmp data entries, the xmp structure must be parsed. I don't use xmp in my approach, I use an other text representation of the metadata. That sounds a little creepy, but I try to make it fast. The user shall not get aware of the things running in background. 4. I put it in core, because from core, any file can make use of metadata. I let gexiv2 decide, wheather a file format is supported or not. If not, no metadata. If, then metadata. I'm still working on it, I'm no expert in C or GLib C. It takes time. What works: Getting all exif data from a supported file. Storing all exif data in a supported file. Changing IPTC data. What does still not work: Deleting unused tags, like "Compression" for JPEG Rotate image by exif tag (working on it right now)
Hi Hartmut, 1. Oh yes, I was talking about libexiv2, not libgexiv2, it was just a shorthand notation for me. Gexiv2 is nothing but a C wrapping layer over libexiv2. But both are equivalent, depending on the calling code being written in C or C++. It doesn't change the design and logic. 2. When I wrote « internal storage », I meant that libexiv2 (or overlaid libgexiv2) features a data handler after parsing. The current Gimp metadata plugin has a XMP parser, and feature a separate data handler for storage and further access. So, an application using xxxx-exiv2 doesn't need to implement that. Once metadata has been read, the lib provides properties access, get/set. 3. I agree on the fact that implemented parsers are sloooow. But if Gimp is expected to be linked with libexiv2/libgexiv2, XMP encoding and decoding comes with the library, so it's « on the shelf », no ? 4. I probably misunderstood, I've a very little knowledge of Gimp's internals. Ok, thanks for your explanations. Regards, Jean-François
The separation of file loading/saving from the core into plug-ins exists for a reason. Therefore, the loading/saving of metadata from/to these files should happen in the respective plugins. Just ignore the current metadata parasites, there should either be a proper API in libgimp that can set/get the metadata encapsulated in GExiv2Metadata, or we come up with a new parasite. While an image is being edited, the metadata should be a property of the internal GimpImage object, the association to files is only relevant at load/save time. I suppose gexiv2 has some way to serialize/deserialize metadata into/from a blob of memory/string that can easily be exchanged between the core and plug-ins. There is also the issue of GIMP moving away from using files to URIs and GIO to deal with image files, maybe we have to talk the gexiv2 folks into adding a GFile or GIO stream based API. Jean-François, your mail wasn't lost, but sending me multi page mails is unfortunately a safe way to never get a response, I'm not a mail person, sorry about that.
Hi Michael, > Therefore, the loading/saving of metadata from/to > these files should happen in the respective plugins. With (G)Exiv2, metadata loading/parsing and saving use an API call which is independent of the file format. This should make things simple as the code to read or to save is exactly the same. > I suppose gexiv2 has some way to serialize/deserialize metadata into/from > a blob of memory/string that can easily be exchanged between the core > and plug-ins. I confirm, for XMP data only, not Exif. There are two calls, Xmpencode, Xmpdecode, to transform metadata into a text xpacket, or parse a xpacket. (The base exiv2 interface is a C++ string, and I guesse Gexiv2 remaps to a g_string.) Not for Exif or Iptc data, but the lib features functions to reencode into standard XMP schemes, for Tiff, Exif, Iptc, and extra schemes. So it's possible to produce a text stream with all pertinent metadata under XMP format. > Jean-François, your mail wasn't lost, but sending me multi page mails > is unfortunately a safe way to never get a response, I'm not a mail > person, sorry about that. Not a problem and sorry for the pollution.:-) Regards, Jean-François
This is not a discussion forum. I would enjoy talking with you in a forum like gimpchat.com
Managed to respond in the meantime, and thought about the basic prerequisites of storing the data in the GimpImage object, while dealing with IO on the plug-in side. I think for starters a very basic API that removes the need for parasite fiddling would be the bare minimum, something like: void gimp_image_set_metadata (gint32 image_id, GExiv2Metadata *metadata); GExiv2Metadata * gimp_image_get_metdata (gint32 image_id); This would be a libgimp API for plugins to use, and it would always be available (gexiv2 required unconditionally). Is my assumption correct that the opaque GExiv2Metadata object contains really *everything*?
gimpchat? What's wrong with IRC?
(In reply to comment #14) > Managed to respond in the meantime, and thought about the basic > prerequisites of storing the data in the GimpImage object, while > dealing with IO on the plug-in side. > > I think for starters a very basic API that removes the need for > parasite fiddling would be the bare minimum, something like: > > void gimp_image_set_metadata (gint32 image_id, > GExiv2Metadata *metadata); > GExiv2Metadata * gimp_image_get_metdata (gint32 image_id); > > This would be a libgimp API for plugins to use, and it would always > be available (gexiv2 required unconditionally). > > Is my assumption correct that the opaque GExiv2Metadata object > contains really *everything*? What about 'open as layer'? It is possible that an xcf file contains two or more layers each with its metadata/color profile etc. Isn't it better to use a gimp_item/drawable_set/get_metadata?
Created attachment 254612 [details] [review] gimp-core_metadata-2.9.1.patch This is the complete patch for gimp-master for metadata handling. This is not a metadata editor, except for important IPTC-Tags. 1. libgexiv2 0.6.1 is mandatory - configure.ac checks it 2. libexif is no more used, ./configure --without-libexif should be used, so plug-in metadata is not compiled. 3. metadata reading is done while opening a file, so every supported file format is supported automatically. 4. according to exif-orientation, the image is being rotated automatically. 5. A metadata dialog is shown in <image>/file/show metadata 6. metadata writing is done while saving a file, so every supported file format is supported automatically. Some tags are checked for validity, as far as I knew them. Other tags are written without change. 7. there are only few image actions that updates/writes metadata: scaling image and changing precision. 8. plug-ins can make use of metadata via: GExiv2Metadata *gimp_image_metadata_get_metadata (gint32 image_ID); void gimp_image_metadata_set_metadata (gint32 image_ID, GExiv2Metadata *metadata); including gexiv2 headers and linking to gexiv2 library is mandatory. What is still to be done (in my own opinion): a. changing export dialogs of supported file formats for the choice, not to save metadata (maybe for web export) (prepared via parasite, done already with file-jpeg) b. more image actions to update/write metadata c. maybe: make .xcf support metadata
Comment on attachment 254612 [details] [review] gimp-core_metadata-2.9.1.patch Very (very!) impressive, but can't go in like that. Too tired to fully review now, but most importantly, you diffed your branch against an old version of master, effectively reverting to that. Also, what are the changes to the rcm plugin doing in the patch? And many more comments, but later :)
Created attachment 254695 [details] [review] gimp-core_metadata-2.9.1.patch Ups, something went completely wrong. Sorry. Again.
Created attachment 255060 [details] [review] gexiv2-metadata-support.patch The next step is ready. I know, there is always another Kartoffel... This patch makes the former patch obsolete, it has the complete support for gexiv2-metadata 'onboard'. It works with master 2013-09-15 18:00:00 (GMT) 1. libexif is completely out of the code. 2. libgexiv2 is mandatory, without it, configure will fail. 3. metadata handling is in core, so supported formats formats will be supported automatically (read/write). I let exiv2 (via gexiv2) decide. this is: JPEG, PNG, TIFF, PSD (PSD after recompiling exiv2) 4. according to exif-orientation, the image is being rotated automatically. 5. A metadata dialog is shown in <image>/file/show metadata 6. the only image actions that updates/writes metadata: - scaling image, - rotating image - changing precision. 7. plug-ins can make use of metadata via: GExiv2Metadata *gimp_image_get_metadata (gint32 image_ID); void gimp_image_set_metadata (gint32 image_ID, GExiv2Metadata *metadata); including gexiv2 headers and linking to gexiv2 library is mandatory. 8. Some plug-ins are 'enhanced': JPEG: checkboxes for saving exif, xmp, thumbnail PNG: checkboxes for saving exif, xmp, thumbnail TIFF: checkboxes for saving exif, xmp, thumbnail PSD - no dialog, full saving is activated, I don't see, why someone should refuse to save a photoshop file without exif/xmp 9. IPTC tags are always saved. 10. XCF can save/load metadata, too, so they don't get lost, if a photo is opened, saved as xcf and later exported to, maybe, jpeg. For PSD support, the exiv2 library has to be recompiled. At least in my version 0.23, there is maybe a bug, that makes metadata from psd files read-only, although the cli exiv2 can read and write metadata. So I changed line 130 in image.cpp, where all supported formats are listed. it says: { ImageType::psd, newPsdInstance, isPsdType, amRead, amRead, amRead, amNone}, all 'amRead' must be changed to 'amReadWrite'. Be sure, not to change 'amNone' to 'amReadWrite', this makes an error... After that, metadata are saved to psd, too. I had to patch plug-ins/color-rotate also, because libgexiv defines a structure called 'Current', so compiling gimp stops with a double definition error.
Created attachment 256542 [details] [review] 2.9.1-gexiv2-metadata_13-10-05.patch According to the hints of mitch, I changed the code again: The serialization/deserialzation have moved to libgimpbase. The .am files are with tabs instead of spaces. The file-save functions are renewed. This patch works with gimp-master from 2013/10/05.
Created attachment 256962 [details] [review] Updated patch Here is the result of some hours of hacking on the patch: - introduce an opaque GimpMetadata typedef so gexiv2 only needs to be included in .c files when we actually use its api - move a lot of stuff to libgimpbase/gimpmetadata - simplify/fix the PDB wrappers - properly memory manage image->metadata - clean up formatting and all sorts of stuff a lot I stopped where I didn't know how exactly to proceed without talking to Hartmut, this is WIP.
Created attachment 257039 [details] [review] More cleanup Same patch with more cleanup and memory leaks fixed.
Comment on attachment 257039 [details] [review] More cleanup I moved the code to a git branch: metadata-wip-rebased, this patch is obsolete.
Fixed in master: commit 21bed1e2fb438fa5721bddb0573a724ae0024455 Author: Hartmut Kuhse <onkelhatti@gimp.org> Date: Sat Oct 19 18:38:01 2013 +0200 Completely rewrite metadata handling using gexiv2 Based on original patches from Hartmut Kuhse and modified by Michael Natterer. Changes include: - remove libexif dependency and add a hard dependency on gexiv2 - typedef GExiv2Metadata to GimpMetadata to avoid having to include gexiv2 globally - add basic GimpMetadata handling functions to libgimpbase - add image and image file specific metadata functions to libgimp, including the exif orientation image rotate dialog - port plug-ins to use the new APIs - port file-tiff-save's UI to GtkBuilder - add new plug-in "metadata" to view the image's metadata - keep metadata around as GimpImage member in the core - update the image's metadata on image size, resolution and precision changes - obsolete the old metadata parasites - migrate the old parasites to new GimpMetadata object on XCF load