After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 704796 - Parsing problem with XMP language alternative fields leads to data loss
Parsing problem with XMP language alternative fields leads to data loss
Status: RESOLVED FIXED
Product: GIMP
Classification: Other
Component: Plugins
2.8.6
Other All
: Normal normal
: 2.10
Assigned To: GIMP Bugs
GIMP Bugs
Depends on:
Blocks:
 
 
Reported: 2013-07-24 10:22 UTC by Jean-François Rabasse
Modified: 2013-10-27 13:17 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gimp-metadata_master_2013-09-02.patch (87.03 KB, patch)
2013-09-02 12:51 UTC, Hartmut Kuhse
none Details | Review
image-menu.patch (373 bytes, application/octet-stream)
2013-09-02 12:52 UTC, Hartmut Kuhse
  Details
gimp-core_metadata-2.9.1.patch (187.30 KB, patch)
2013-09-10 16:47 UTC, Hartmut Kuhse
needs-work Details | Review
gimp-core_metadata-2.9.1.patch (177.06 KB, patch)
2013-09-11 15:20 UTC, Hartmut Kuhse
none Details | Review
gexiv2-metadata-support.patch (300.56 KB, patch)
2013-09-16 18:35 UTC, Hartmut Kuhse
none Details | Review
2.9.1-gexiv2-metadata_13-10-05.patch (266.65 KB, patch)
2013-10-05 14:50 UTC, Hartmut Kuhse
none Details | Review
Updated patch (261.26 KB, patch)
2013-10-10 23:51 UTC, Michael Natterer
none Details | Review
More cleanup (261.37 KB, patch)
2013-10-11 18:03 UTC, Michael Natterer
none Details | Review

Description Jean-François Rabasse 2013-07-24 10:22:28 UTC
The metadata plugin doesn't seem to read correctly XMP fields of type language alternative. After Gimp edition and image saving, initial fields values are lost and replaced by the default language specifier.

Seen on Gimp versions 2.x including (my) latest 2.8.6
Untested with 2.9

Reproducible : always

Steps to reproduce :

1. Have a test image, say IMG1.JPG (test image means an image without XMP
   metadata or metadata that can be sacrificed (i.e. an image copy).)

2. Add a XMP field which is of lang-alt type, e.g. the Dublin Core title.
   Can be done with a command line metadata editor such as exiftool:
   exiftool -xmp-dc:title='Example Title' IMG1.JPG

3. Dump the XMP section of image (for visual control)
   exiftool -xmp -b IMG1.JPG

   Output should show a correct implementation :
   <dc:title>
     <rdf:Alt>
       <rdf:li xml:lang='x-default'>Example title</rdf:li>
     </rdf:Alt>
   </dc:title>

4. Edit/save the image with Gimp. No need to modify anything, a simple open
   + export to IMG2.JPG will do.

5. Dump the XMP section of the new image
   exiftool -xmp -b IMG2.JPG

   <dc:title>
     <rdf:Alt>
       <rdf:li xml:lang='x-default'>x-default</rdf:li>
     </rdf:Alt>
   </dc:title>

   The text value of the title property is lost and replaced by the language
   specifier. :-(


Also, when a XMP field contains several language variants (and the lang-alt
aims to that), e.g. :
  <dc:title>
   <rdf:Alt>
    <rdf:li xml:lang='x-default'>Example title</rdf:li>
    <rdf:li xml:lang='fr-FR'>Titre exemple</rdf:li>
    <rdf:li xml:lang='es-ES'>Ejemplo titulo'</rdf:li>
    <rdf:li xml:lang='de-DE'>Beispieltitel</rdf:li>
   </rdf:Alt>
  </dc:title>

after Gimp edit/save the default value is overwritten as above and all other variants are lost :
  <dc:title>
   <rdf:Alt>
    <rdf:li xml:lang='x-default'>x-default</rdf:li>
   </rdf:Alt>
  </dc:title>

The problem is a XMP parsing problem, not XMP generation upon save, because after opening the image file under Gimp, the Image properties Advanced folder does show "x-default" for the DC Title value, not the correct text.

NB: about severity : a bug leading to user data losses should probably be always flagged as severe or major.
In that case, it's not because this XMP parser has a bunch of other problems leading to data losses, cf. bug #667038 and its duplicates.
So probably all Gimp users handling XMP metadata have workaround strategies to protect their data, e.g. saving XMP sections in sidecar files before Gimp edition, and restoring after.
Probably this report should be grouped with bug 667038 under a more general future goal « redesign of the XMP parser » ?

NB2: is this on the way for future releases ? If not, is help wished and how ?


Regards,
Jean-François
Comment 1 Michael Natterer 2013-07-24 10:41:20 UTC
Yes, help is very much wanted :)

GIMP's metadata handling is pretty much aging home-brewn stuff and we'd
like to switch the entire stuff to gexiv2

http://redmine.yorba.org/projects/gexiv2/wiki

which is a GObject wrapper around Exiv2. But we are short on manpower.

If you want to help, could you come to #gimp on irc.gimp.org? That's
where most of the development coordination happens.
Comment 2 Jean-François Rabasse 2013-07-24 19:11:18 UTC
Hello Michael,
thanks for the reply.

Hem, "switch the entire stuff" is no longer a bugfix but becomes a regular subproject.:-)
I'll reply, in a more detailed way, stating points on whihc I possibly could help and points on which I'll surely couldn't.

By the way, searching a bit the bugzilla DB, I found my report is a duplicate.
That problem has already been mentionned, bug #690089, Comment #7, 3rd paragraph.

Regards,
Jean-François
Comment 3 Hartmut Kuhse 2013-09-02 12:51:12 UTC
Created attachment 253839 [details] [review]
gimp-metadata_master_2013-09-02.patch

In my vacation I tried to make a new approach.
1. Exif changes from libexif to libgexiv2.
2. Metadata handling changes from plugin to core.
3. Metadata handling is not an editor (except some IPTC-tags)

There is a lot of stuff to be done, but I would like to know, if it is worth coding on, or if I am running the wrong way.

It works with master from 2013/09/02. Master should be configured with the "--without-libexif" parameter.
The newest git-version of gexiv2 has to be used, because some functions have been added after releasing gexiv2-0.6.1.
The "image-menu.patch" has to be installed, so the "Show Metadata" menu is shown.
Comment 4 Hartmut Kuhse 2013-09-02 12:52:02 UTC
Created attachment 253840 [details]
image-menu.patch

Patch for menus/image-menu.xml
Comment 5 Michael Natterer 2013-09-02 20:04:47 UTC
You are officially hero of the month for pushing forward the gexiv2
stuff. Some quick comments (not that I have not looked closely yet):

- let's depend on gexiv2 unconditionally, no #ifdefs
- what about all the plugins and stuff? do you plan on porting this stuff too?
- i'm not really fond of builder xml in the core, but let's see

finally, please come to IRC so we can discuss that more profoundly,
i'm not really the metadata expert around here.

bottom line: we absolutely want to use gexiv2.
Comment 6 Michael Natterer 2013-09-02 20:06:09 UTC
s/not/note/
Comment 7 Hartmut Kuhse 2013-09-03 12:35:49 UTC
I work with this conditional until its safe.
Gexiv2 is not ready yet. the version to be used is 0.7.0 pre1...
Plugins? Which plugins? :-)
I will make the libexif stuff in file-jpeg obsolete, by porting it.
The tiff stuff will be unchaged, because it is a functionality of libtiff.
I was shocked about the file-psd. I don't know, if I can port.
I'm not an expert of metadata, too.
The parasite is not stored as xml :-| It is an "easy to use format", based on Strings...

I have to work, so I am visiting IRC not very often.
Comment 8 Jean-François Rabasse 2013-09-03 17:18:55 UTC
Hello,

@Hartmut, good new, and welcome.

Well, I did trigger this thread to signal a problem (some other exists, e.g. the
exif:Flash broken parsing) and offered help. Seems that my (long) e-mail ti Michael ended up in a spambox :-)
Not very important as I'm not the right guy to help, very little knowledge of Gimp and C++ devolpper, so all my existing stuff is of little use.

However, I do know metadata issues, I have some personal software around that, and some that use the exiv2 library. This leads to a few comments about questions raised by Hartmut.

1. About files specific stuff related to exif and xmp : the exiv2 library comes with ALL files related stuff, extract metadata sections from images files, write back metadata to images files. And the library supports a reasonnable large set of images formats, including JPEG, TIFF, PSD, and many others
 cf.  http://dev.exiv2.org/wiki/exiv2/Supported_image_formats

So, IMHO, file specific stuff, file-jpeg, file-psd, file-anything, would probably become obsolete. If the Gimp application is linked with libexiv2, supported files formats become exiv2 supported formats.

2. The libexiv2 acess data at image file level. Using it would probably require to rething the logic. Current Gimp read image data from files, and store « parasite » data for parsing. Using exiv2 would transform the logic in :
  - Gimp read only image data from file. For a JPEG, this would be the DHT,
    DQT, SOFn, SOI sections.
  - Gimp calls libexiv2, the API is something like
    readmetadatafromfile(filename)

Nothing else to do, no parasite data to store, all metadata is stored internally by libexiv2. 
Similar for writing, this would require to, first, save all image data and close the file, then call libexiv2 which will reopen and write metadata sections.

3. Internal storage. All metadata is read, parsed, stored internally. This is very different from the current implementation where parsing is done in stream mode, and metadata stored in specifics one by one. The current tree model of the Gimp metadata plugin would become obsolete too.

4. What does mean, cf. #5, « builder xml in the core » ?
If libexiv2 is used, all XML, XMP, RDF stuff is inside the library is fully transparent for the calling application. No need to add XML stuff.

5. About #3, « 3. Metadata handling is not an editor »
Totally agree. Gimp is an excellent images editor and doesn't need at all to become a metadata editor. Metadata edition is a huge issue, many users have many different needs, and in most cases, there are only partial solutions.
Also, users concerned by metadata edition have specific tools and can build their own scripts, using command line exiv2 or exiftool.

6. A question, for my information : how is stored metadata with the Gimp native format, .xcf files ?


So what ? My feeling is that all that is an important work, because it will impact many parts of Gimp and perhaps lead to partial local redesign.
Are they some intermediate steps ? Perhaps...

Probably the most unconvenient problem is that Gimp has a broken XMP parser, leading to ignore XMP metadata due to parsing errors. And without metadata, upon image save or export, data is lost.
The fix, for users, (thta's what I do), is to extract XMP data and save into a sidecar file before using Gimp. Later, after image edition, it's possible to write back XMP section into the file (command line exiv2 or exiftool do that).

- Maybe a first step could be to disable XMP parsing and just keep the raw parasite section, ready to be writtent back later ? This way, nothing would be lost for those uses that use metadata.

- A second step could to rewrite a XMP parser (broken module xmp-parse.c)

- And final and great enhancement, a Gimp + exiv2 library, as proposed...
(But available in a ... future)

Regards,
Jean-François
Comment 9 Hartmut Kuhse 2013-09-03 17:50:55 UTC
1. Exiv2 is a C++ library. Gimp is C based, and, furtheron, Gnome based, so Gimp uses the GLib very intensively.
That would mean, that GLib C calls C++ library functions. That works, of course, but there is a library, that has done that before: GExiv2.
2. I don't understand, what you mean by "internal storage". The problem is, that Gimp is not an jpg editor, so dealing with metadata means to get metadata, change it (maybe) while editing and then write back to (maybe) a complete different file-format. That was the reason I started this project: I use Gimp for retouching Photos, so I use the lossless TIFF format from Lightroom, but all exif data is lost after saving the image. The metadata is persistently stored with the image, that means, if the file is saved as xcf, the persistent parasites are stored, too.
3. Parasites are most often texts. The representation of metadata is often xmp. So the metadata have to be read, converted in an xmp structure an stored as a parasite. To handle xmp data entries, the xmp structure must be parsed. I don't use xmp in my approach, I use an other text representation of the metadata. That sounds a little creepy, but I try to make it fast. The user shall not get aware of the things running in background.
4. I put it in core, because from core, any file can make use of metadata. I let gexiv2 decide, wheather a file format is supported or not. If not, no metadata. If, then metadata.

I'm still working on it, I'm no expert in C or GLib C. It takes time.
What works:
Getting all exif data from a supported file.
Storing all exif data in a supported file.
Changing IPTC data.
What does still not work:
Deleting unused tags, like "Compression" for JPEG
Rotate image by exif tag (working on it right now)
Comment 10 Jean-François Rabasse 2013-09-03 18:37:16 UTC
Hi Hartmut,

1. Oh yes, I was talking about libexiv2, not libgexiv2, it was just a shorthand notation for me. Gexiv2 is nothing but a C wrapping layer over libexiv2. But both are equivalent, depending on the calling code being written in C or C++.
It doesn't change the design and logic.

2. When I wrote « internal storage », I meant that libexiv2 (or overlaid libgexiv2) features a data handler after parsing. The current Gimp metadata plugin has a XMP parser, and feature a separate data handler for storage and further access. So, an application using xxxx-exiv2 doesn't need to implement that. Once metadata has been read, the lib provides properties access, get/set.

3. I agree on the fact that implemented parsers are sloooow. But if Gimp is expected to be linked with libexiv2/libgexiv2, XMP encoding and decoding comes with the library, so it's « on the shelf », no ?

4. I probably misunderstood, I've a very little knowledge of Gimp's internals.

Ok, thanks for your explanations. 

Regards,
Jean-François
Comment 11 Michael Natterer 2013-09-03 19:16:33 UTC
The separation of file loading/saving from the core into plug-ins
exists for a reason. Therefore, the loading/saving of metadata from/to
these files should happen in the respective plugins. Just ignore the
current metadata parasites, there should either be a proper API in
libgimp that can set/get the metadata encapsulated in GExiv2Metadata,
or we come up with a new parasite.

While an image is being edited, the metadata should be a property of
the internal GimpImage object, the association to files is only
relevant at load/save time.

I suppose gexiv2 has some way to serialize/deserialize metadata into/from
a blob of memory/string that can easily be exchanged between the core
and plug-ins.

There is also the issue of GIMP moving away from using files to URIs
and GIO to deal with image files, maybe we have to talk the gexiv2
folks into adding a GFile or GIO stream based API.

Jean-François, your mail wasn't lost, but sending me multi page mails
is unfortunately a safe way to never get a response, I'm not a mail
person, sorry about that.
Comment 12 Jean-François Rabasse 2013-09-03 19:43:53 UTC
Hi Michael,

> Therefore, the loading/saving of metadata from/to
> these files should happen in the respective plugins.

With (G)Exiv2, metadata loading/parsing and saving use an API call which is independent of the file format. This should make things simple as the code to read or to save is exactly the same. 

> I suppose gexiv2 has some way to serialize/deserialize metadata into/from
> a blob of memory/string that can easily be exchanged between the core
> and plug-ins.

I confirm, for XMP data only, not Exif. There are two calls, Xmpencode, Xmpdecode, to transform metadata into a text xpacket, or parse a xpacket.
(The base exiv2 interface is a C++ string, and I guesse Gexiv2 remaps to a g_string.)
Not for Exif or Iptc data, but the lib features functions to reencode into standard XMP schemes, for Tiff, Exif, Iptc, and extra schemes. So it's possible to produce a text stream with all pertinent metadata under XMP format.

> Jean-François, your mail wasn't lost, but sending me multi page mails
> is unfortunately a safe way to never get a response, I'm not a mail
> person, sorry about that.

Not a problem and sorry for the pollution.:-)

Regards,
Jean-François
Comment 13 Hartmut Kuhse 2013-09-03 20:06:05 UTC
This is not a discussion forum. I would enjoy talking with you in a forum like gimpchat.com
Comment 14 Michael Natterer 2013-09-03 20:08:49 UTC
Managed to respond in the meantime, and thought about the basic
prerequisites of storing the data in the GimpImage object, while
dealing with IO on the plug-in side.

I think for starters a very basic API that removes the need for
parasite fiddling would be the bare minimum, something like:

void             gimp_image_set_metadata (gint32          image_id,
                                          GExiv2Metadata *metadata);
GExiv2Metadata * gimp_image_get_metdata  (gint32          image_id);

This would be a libgimp API for plugins to use, and it would always
be available (gexiv2 required unconditionally).

Is my assumption correct that the opaque GExiv2Metadata object
contains really *everything*?
Comment 15 Michael Natterer 2013-09-03 20:09:30 UTC
gimpchat? What's wrong with IRC?
Comment 16 Massimo 2013-09-04 11:21:17 UTC
(In reply to comment #14)
> Managed to respond in the meantime, and thought about the basic
> prerequisites of storing the data in the GimpImage object, while
> dealing with IO on the plug-in side.
> 
> I think for starters a very basic API that removes the need for
> parasite fiddling would be the bare minimum, something like:
> 
> void             gimp_image_set_metadata (gint32          image_id,
>                                           GExiv2Metadata *metadata);
> GExiv2Metadata * gimp_image_get_metdata  (gint32          image_id);
> 
> This would be a libgimp API for plugins to use, and it would always
> be available (gexiv2 required unconditionally).
> 
> Is my assumption correct that the opaque GExiv2Metadata object
> contains really *everything*?

What about 'open as layer'?

It is possible that an xcf file contains two or more layers 
each with its metadata/color profile etc.

Isn't it better to use a gimp_item/drawable_set/get_metadata?
Comment 17 Hartmut Kuhse 2013-09-10 16:47:11 UTC
Created attachment 254612 [details] [review]
gimp-core_metadata-2.9.1.patch

This is the complete patch for gimp-master for metadata handling.
This is not a metadata editor, except for important IPTC-Tags.
1. libgexiv2 0.6.1 is mandatory - configure.ac checks it
2. libexif is no more used, ./configure --without-libexif should be used, so plug-in metadata is not compiled.
3. metadata reading is done while opening a file, so every supported file format is supported automatically.
4. according to exif-orientation, the image is being rotated automatically.
5. A metadata dialog is shown in <image>/file/show metadata
6. metadata writing is done while saving a file, so every supported file format is supported automatically. Some tags are checked for validity, as far as I knew them. Other tags are written without change.
7. there are only few image actions that updates/writes metadata: scaling image and changing precision.
8. plug-ins can make use of metadata via:
GExiv2Metadata *gimp_image_metadata_get_metadata (gint32          image_ID);

void            gimp_image_metadata_set_metadata (gint32          image_ID,
                                                  GExiv2Metadata *metadata); 
including gexiv2 headers and linking to gexiv2 library is mandatory.
What is still to be done (in my own opinion):
a. changing export dialogs of supported file formats for the choice, not to save metadata (maybe for web export) (prepared via parasite, done already with file-jpeg)
b. more image actions to update/write metadata
c. maybe: make .xcf support metadata
Comment 18 Michael Natterer 2013-09-10 20:19:05 UTC
Comment on attachment 254612 [details] [review]
gimp-core_metadata-2.9.1.patch

Very (very!) impressive, but can't go in like that. Too tired to fully
review now, but most importantly, you diffed your branch against an
old version of master, effectively reverting to that. Also, what are
the changes to the rcm plugin doing in the patch?

And many more comments, but later :)
Comment 19 Hartmut Kuhse 2013-09-11 15:20:36 UTC
Created attachment 254695 [details] [review]
gimp-core_metadata-2.9.1.patch

Ups, something went completely wrong. Sorry.
Again.
Comment 20 Hartmut Kuhse 2013-09-16 18:35:40 UTC
Created attachment 255060 [details] [review]
gexiv2-metadata-support.patch

The next step is ready. I know, there is always another Kartoffel...
This patch makes the former patch obsolete, it has the complete
support for gexiv2-metadata 'onboard'. It works with master
2013-09-15 18:00:00 (GMT)

1. libexif is completely out of the code.
2. libgexiv2 is mandatory, without it, configure will fail.
3. metadata handling is in core, so supported formats formats will
be supported automatically (read/write). I let exiv2 (via gexiv2) decide.
this is: JPEG, PNG, TIFF, PSD (PSD after recompiling exiv2)
4. according to exif-orientation, the image is being rotated automatically.
5. A metadata dialog is shown in <image>/file/show metadata
6. the only image actions that updates/writes metadata:
- scaling image,
- rotating image
- changing precision.
7. plug-ins can make use of metadata via:
GExiv2Metadata *gimp_image_get_metadata (gint32          image_ID);

void            gimp_image_set_metadata (gint32          image_ID,
                                         GExiv2Metadata *metadata); 
including gexiv2 headers and linking to gexiv2 library is mandatory.
8. Some plug-ins are 'enhanced':
JPEG: checkboxes for saving exif, xmp, thumbnail
PNG: checkboxes for saving exif, xmp, thumbnail
TIFF: checkboxes for saving exif, xmp, thumbnail
PSD - no dialog, full saving is activated, I don't see, why someone
should refuse to save a photoshop file without exif/xmp
9. IPTC tags are always saved.
10. XCF can save/load metadata, too, so they don't get lost, if a photo
is opened, saved as xcf and later exported to, maybe, jpeg.

For PSD support, the exiv2 library has to be recompiled. At least in my
version 0.23, there is maybe a bug, that makes metadata from psd files 
read-only, although the cli exiv2 can read and write metadata.
So I changed line 130 in image.cpp, where all supported formats are listed.
it says:
{ ImageType::psd,  newPsdInstance,  isPsdType,  amRead, amRead, amRead, amNone},
all 'amRead' must be changed to 'amReadWrite'.
Be sure, not to change 'amNone' to 'amReadWrite', this makes an error...

After that, metadata are saved to psd, too.

I had to patch plug-ins/color-rotate also, because libgexiv defines a 
structure called 'Current', so compiling gimp stops with a double definition
error.
Comment 21 Hartmut Kuhse 2013-10-05 14:50:34 UTC
Created attachment 256542 [details] [review]
2.9.1-gexiv2-metadata_13-10-05.patch

According to the hints of mitch, I changed the code again:

The serialization/deserialzation have moved to libgimpbase.
The .am files are with tabs instead of spaces.
The file-save functions are renewed.

This patch works with gimp-master from 2013/10/05.
Comment 22 Michael Natterer 2013-10-10 23:51:52 UTC
Created attachment 256962 [details] [review]
Updated patch

Here is the result of some hours of hacking on the patch:

- introduce an opaque GimpMetadata typedef so gexiv2 only needs to
  be included in .c files when we actually use its api
- move a lot of stuff to libgimpbase/gimpmetadata
- simplify/fix the PDB wrappers
- properly memory manage image->metadata
- clean up formatting and all sorts of stuff a lot

I stopped where I didn't know how exactly to proceed without talking
to Hartmut, this is WIP.
Comment 23 Michael Natterer 2013-10-11 18:03:08 UTC
Created attachment 257039 [details] [review]
More cleanup

Same patch with more cleanup and memory leaks fixed.
Comment 24 Michael Natterer 2013-10-15 21:04:11 UTC
Comment on attachment 257039 [details] [review]
More cleanup

I moved the code to a git branch: metadata-wip-rebased, this patch is obsolete.
Comment 25 Michael Natterer 2013-10-27 13:17:22 UTC
Fixed in master:

commit 21bed1e2fb438fa5721bddb0573a724ae0024455
Author: Hartmut Kuhse <onkelhatti@gimp.org>
Date:   Sat Oct 19 18:38:01 2013 +0200

    Completely rewrite metadata handling using gexiv2
    
    Based on original patches from Hartmut Kuhse and modified
    by Michael Natterer. Changes include:
    
    - remove libexif dependency and add a hard dependency on gexiv2
    - typedef GExiv2Metadata to GimpMetadata to avoid having to
      include gexiv2 globally
    - add basic GimpMetadata handling functions to libgimpbase
    - add image and image file specific metadata functions to libgimp,
      including the exif orientation image rotate dialog
    - port plug-ins to use the new APIs
    - port file-tiff-save's UI to GtkBuilder
    - add new plug-in "metadata" to view the image's metadata
    - keep metadata around as GimpImage member in the core
    - update the image's metadata on image size, resolution and precision
      changes
    - obsolete the old metadata parasites
    - migrate the old parasites to new GimpMetadata object on XCF load