GNOME Bugzilla – Bug 711241
Broken or unknown metadata tag should not cancel the whole metadata loading
Last modified: 2013-11-11 00:02:12 UTC
In bug 710937, a user provided a jpeg straight out a Canon camera. When I open it in master, I get the following popup error: ----------------------------------------------------------------- Calling error for procedure 'gimp-image-set-metadata': Procedure 'gimp-image-set-metadata' has been called with value '<?xml version='1.0' encoding='UTF-8'?> <metadata> <tag name="Exif.Canon.0x0003">0 0 0 0</tag> <tag name="Exif.Canon.0x0019">1</tag> <tag name="Exif.Canon.0x0035">0 0 0 0</tag> <tag name="Exif.Canon.0x0098">1551 0 1 0</tag> <tag name="Exif.Canon.0x009a">23790592 67109417 33162240 67109255 46924800</tag> <tag name="Exif.Canon.0x4008">0 0 0</tag> <tag name="Exif.Canon.0x4009">0 0 0</tag> <tag name="Exif.Canon.0x4010"></tag> <tag name="Exif.Canon.0x4011">0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 255 255 255 255 0 0 0 0 10 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 10 0 0 0 0 32 196 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 204 16 64 20</tag> <tag name="Exif.Canon.0x4012"> (invalid UTF-8 string) ----------------------------------------------------------------- The image still loads fine though but clicking the "Image Metadata" item fails saying there is no metadata attached to this image. I've tried to search, and don't really know what this Exif.Canon.0x4012 tag is. And anyway the XML encoding is advertised UTF-8, so there is definitely an issue in the source (is this XML taken from the image or is it generated by (G)Exiv2?). But in any case, even if the broken metadata is in the image, GIMP should still at least load the other tags somehow. 
A call to `exiftool -a -u -g1 IMG_4779.JPG` works fine with a whole bunch of information (Canon 0x4012 is also there, just empty). So it is definitely possible to save the rest of the information.
Damn, the attachment is too big to upload here. The user provided it at https://app.box.com/s/3j2u9nu7hcr9fgpz1two instead. Please download IMG_4779.JPG from there.
That can only mean that gexiv2_metadata_get_tag_string() does not always return UTF-8. We feed its result into g_markup_escape_text(), which expects UTF-8 input.
This makes metadata serialization robust no matter what comes out of gexiv2, or how we misinterpret it. Leaving open until we figure out the root of the problem.

commit 798c62a54486916c69141463980a4497aea14b98
Author: Michael Natterer <mitch@gimp.org>
Date:   Fri Nov 1 14:15:15 2013 +0100

    Bug 711241 - Broken or unknown metadata tag should not cancel...

    ...the whole metadata loading

    Don't serialize a value that does not UTF-8-validate to XML. This is
    not a real fix, but no matter what we do here in the future, UTF-8
    validation should always be part of the serialization, in order to
    avoid passing broken data into the core.

 libgimpbase/gimpmetadata.c | 61 ++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 21 deletions(-)
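For readers following along, here is a minimal standalone sketch of the "validate, then skip" idea from the commit message. It is not the code from the commit: it uses plain C with a hand-written validator instead of GLib (which provides g_utf8_validate() for this), and the `struct tag` type, the `serialize_tags()` helper and the sample bytes for Exif.Canon.0x4012 are all hypothetical. XML escaping is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0.
 * Rejects overlong forms, surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    assert(s != NULL);
    while (*p) {
        unsigned int cp;
        int extra, i;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return 0;   /* stray continuation byte or invalid lead byte */
        p++;
        for (i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return 0;   /* truncated sequence (also catches an early NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

/* Hypothetical in-memory form of one metadata tag. */
struct tag {
    const char *name;
    const char *value;
};

/* Write every tag whose value validates as UTF-8 to out; warn about the
 * others on stderr and skip them. Returns the number of tags written.
 * NOTE: a real serializer must also XML-escape names and values. */
static int serialize_tags(const struct tag *tags, size_t n, FILE *out)
{
    int written = 0;
    size_t i;

    fputs("<metadata>\n", out);
    for (i = 0; i < n; i++) {
        if (!is_valid_utf8(tags[i].value)) {
            fprintf(stderr, "skipping tag '%s': invalid UTF-8\n", tags[i].name);
            continue;
        }
        fprintf(out, "  <tag name=\"%s\">%s</tag>\n", tags[i].name, tags[i].value);
        written++;
    }
    fputs("</metadata>\n", out);
    return written;
}
```

The point of this layout is that one bad value costs only its own tag: the loop continues, so the rest of the metadata still reaches the core.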
Cool. I tested and can confirm that the image now returns its metadata (with a nice warning in the console for the invalid field). Now, about gexiv2_metadata_get_tag_string() not always returning UTF-8: is that expected? I can't see any kind of encoding step in the GExiv2 code, nor in the Exiv2 documentation. Does it mean that we should know the meaning of each key and, depending on it, that it's our role to do the appropriate conversion?
Don't drop the tags, instead encode them as base64. This should be able to handle whatever comes out of gexiv2, closing as FIXED.

commit 33a8d68117a1ade59279102935ca128a25ec04d3
Author: Michael Natterer <mitch@gimp.org>
Date:   Mon Nov 11 00:11:43 2013 +0100

    Bug 711241 - Broken or unknown metadata tag should not cancel...

    ...the whole metadata loading

    Don't drop non-utf8 values from gexiv2 when serializing to XML,
    instead, base64 encode them. This should be robust against whatever
    garbage data is in tags.

 libgimpbase/gimpmetadata.c | 98 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 71 insertions(+), 27 deletions(-)
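As an illustration of the base64 fallback described in this commit message, here is a rough standalone sketch in plain C. It is not the committed code (GLib has g_base64_encode() for the encoding step). The `format_base64_tag()` helper, the `encoding="base64"` attribute used to mark such values, and the sample bytes are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode len bytes of data as standard base64 (RFC 4648, '=' padded).
 * Returns a malloc()ed NUL-terminated string, or NULL if malloc fails. */
static char *base64_encode(const unsigned char *data, size_t len)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    char *out, *o;
    size_t i;

    assert(data != NULL || len == 0);
    out = malloc(4 * ((len + 2) / 3) + 1);
    if (out == NULL)
        return NULL;
    o = out;
    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        *o++ = alphabet[data[i] >> 2];
        *o++ = alphabet[((data[i] & 0x03) << 4) | (data[i + 1] >> 4)];
        *o++ = alphabet[((data[i + 1] & 0x0F) << 2) | (data[i + 2] >> 6)];
        *o++ = alphabet[data[i + 2] & 0x3F];
    }
    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        *o++ = alphabet[data[i] >> 2];
        if (i + 1 < len) {
            *o++ = alphabet[((data[i] & 0x03) << 4) | (data[i + 1] >> 4)];
            *o++ = alphabet[(data[i + 1] & 0x0F) << 2];
        } else {
            *o++ = alphabet[(data[i] & 0x03) << 4];
            *o++ = '=';
        }
        *o++ = '=';
    }
    *o = '\0';
    return out;
}

/* Format one tag whose raw value is not valid UTF-8. The encoding
 * attribute is an assumed marker telling the reader to decode the value.
 * Returns a malloc()ed string, or NULL if malloc fails.
 * NOTE: name is assumed to need no XML escaping here. */
static char *format_base64_tag(const char *name,
                               const unsigned char *data, size_t len)
{
    char *b64 = base64_encode(data, len);
    char *line;
    size_t size;

    if (b64 == NULL)
        return NULL;
    size = strlen(name) + strlen(b64) + 64;   /* 64 covers the fixed markup */
    line = malloc(size);
    if (line != NULL)
        snprintf(line, size, "<tag name=\"%s\" encoding=\"base64\">%s</tag>",
                 name, b64);
    free(b64);
    return line;
}
```

Because base64 output is pure ASCII, the result is always valid UTF-8 and needs no XML escaping, so the serializer can carry arbitrary tag bytes without losing them.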