After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 787589 - Add GExiv2 based extractor for RAW files
Add GExiv2 based extractor for RAW files
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: Extractor
unspecified
Other All
: Normal enhancement
: ---
Assigned To: tracker-extractor
tracker-extractor
Depends on:
Blocks: 751212
 
 
Reported: 2017-09-12 15:19 UTC by Debarshi Ray
Modified: 2018-02-07 00:12 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tracker-extract: Add GExiv2 based extractor for RAW files (21.50 KB, patch)
2017-10-20 11:18 UTC, Debarshi Ray
none Details | Review
tracker-extract: Add GExiv2 based extractor for RAW files (20.98 KB, patch)
2017-11-03 07:43 UTC, Debarshi Ray
none Details | Review
tracker-extract: Add GExiv2 based extractor for RAW files (21.02 KB, patch)
2017-12-05 12:46 UTC, Debarshi Ray
none Details | Review
functional-tests: Add a test for the RAW extractor (13.89 KB, patch)
2017-12-05 12:47 UTC, Debarshi Ray
committed Details | Review
gexiv2 test case to play with missing FNumber tags (752 bytes, text/plain)
2018-01-05 10:57 UTC, Debarshi Ray
  Details
tracker-extract: Add GExiv2 based extractor for RAW files (21.83 KB, patch)
2018-01-05 14:23 UTC, Debarshi Ray
accepted-commit_now Details | Review
tracker-extract: Add GExiv2 based extractor for RAW files (21.83 KB, patch)
2018-01-27 22:26 UTC, Debarshi Ray
none Details | Review
Update COPYING (963 bytes, patch)
2018-01-29 21:47 UTC, Debarshi Ray
committed Details | Review
tracker-extract: Add GExiv2 based extractor for RAW files (22.43 KB, patch)
2018-01-29 21:47 UTC, Debarshi Ray
committed Details | Review

Description Debarshi Ray 2017-09-12 15:19:24 UTC
It will be good to have a LibRaw based extractor for RAW files generated by various digital SLR cameras. Without such an extractor, gnome-photos is unable to  orient such images that have any other orientation value than 'orientation-top'.
Comment 1 Debarshi Ray 2017-09-12 15:21:01 UTC
I have pushed an initial very rough skeleton to:
tracker-miners:wip/rishi/libraw
Comment 2 Debarshi Ray 2017-09-17 00:38:34 UTC
I might have spoken too soon about using LibRaw. I took a quick look at its API and it wasn't clear how to extract the buffer containing the EXIF, XMP and IPTC data with it. On the other hand, gexiv2/exiv2 might be a better choice. It takes care of decoding the image file, so all one has to do is provide the path.

In fact, we can carry on and have a generic gexiv2/exiv2 based decoder for most image file formats, and drop the dependencies on libjpeg, libpng, etc.. It might be good to stay away from those long/setjmping libraries. The only exceptions are GIFs and PDFs - it doesn't support extracting XMP from them.

Here is the support matrix for gexiv2/exiv2:
http://dev.exiv2.org/projects/exiv2/wiki/Supported_image_formats

I was convinced by mitch that exiv2 is the gold standard in image metadata handling. It is used by darktable, gimp and shotwell, which is why gnome-photos uses it too for those rare cases when it needs to directly touch the metadata in the image files. It'll be nice to consolidate the number of image metadata libraries in the GNOME platform.
Comment 3 Debarshi Ray 2017-09-17 00:39:12 UTC
Renamed the WIP branch to tracker-miners:wip/rishi/exiv2.
Comment 4 Debarshi Ray 2017-10-20 11:18:24 UTC
Created attachment 361931 [details] [review]
tracker-extract: Add GExiv2 based extractor for RAW files

This parses only Exif out of RAW files using GExiv2. No XMP and IPTC parsing at the moment, and needs lots more testing.
Comment 5 Debarshi Ray 2017-11-03 07:43:19 UTC
Created attachment 362872 [details] [review]
tracker-extract: Add GExiv2 based extractor for RAW files
Comment 6 Debarshi Ray 2017-11-03 07:51:10 UTC
I'd like some feedback on how to structure the extractor with respect to writing a test case.

I see that tracker-exif-test.c uses the infrastructure inside src/libtracker-extract. Since the RAW extractor uses (g)exiv2, not libexif, it doesn't use the same infrastructure. Even though we may want to move to (g)exiv2, I didn't want to change everything in one go - a series of incremental changes seemed like a better option. In the meantime, how should I construct a test for the RAW extractor? Should we add parallel (g)exiv2 based code paths and APIs to libtracker-extract/tracker-extract? That would be a bit ugly, maybe? :)
Comment 7 Carlos Garnacho 2017-11-06 15:42:52 UTC
Hmmm. I think it would be good to move all these helpers to src/tracker-extract, and leave src/libtracker-extract for the stuff that's actually shared (TrackerModuleManager, basically), all the rest is only used by tracker-extract.

But that doesn't really answer the question :p. Given the patch implements a tracker-extract module of its own, this probably belongs to tests/functional-tests/test-extraction-data, but if the final intent is using this instead of libexif, probably a parallel API might not be a too far intermediate step.
Comment 8 Debarshi Ray 2017-12-05 12:45:40 UTC
(In reply to Carlos Garnacho from comment #7)
> Given the patch implements a
> tracker-extract module of its own, this probably belongs to
> tests/functional-tests/test-extraction-data, but if the final intent is
> using this instead of libexif, probably a parallel API might not be a too
> far intermediate step.

Ok, thanks Carlos. I have added a functional-test for the immediate short-term, but it currently doesn't work due to bug 791259.
Comment 9 Debarshi Ray 2017-12-05 12:46:56 UTC
Created attachment 365023 [details] [review]
tracker-extract: Add GExiv2 based extractor for RAW files

Updated the license blurb to be GPLv2+, instead of LGPLv2+, because (g)exiv2 is GPLv2+.
Comment 10 Debarshi Ray 2017-12-05 12:47:25 UTC
Created attachment 365024 [details] [review]
functional-tests: Add a test for the RAW extractor
Comment 11 Carlos Garnacho 2017-12-16 12:36:45 UTC
Review of attachment 365023 [details] [review]:

Looks mostly good! This is certainly not the only GPL piece, might be good to add some mention on the COPYING file though.

::: src/tracker-extract/tracker-extract-raw.c
@@ +149,3 @@
+	case 0x0010: /* Flash in compulsory mode, did not fire */
+	case 0x0018: /* Flash in auto mode, did not fire */
+	case 0x0058: /* Only red-eye reduction mode */

Doesn't gexiv2 provide any enum for this? Maybe the extractor can have its internal one so code is more self explanatory without help of comments.

@@ +166,3 @@
+
+	switch (metering_mode_value) {
+	case 1:

Same here

@@ +357,3 @@
+		tracker_resource_set_uri (resource, "nmm:whiteBalance", ed->white_balance);
+
+	if (ed->fnumber != -1.0)

Hmm, here you apparently try to catch the default set in raw_exif_data_new(), but it seems unconditionally overwritten in parse_exif_data(), is "!= -1.0" a safe comparison to catch broken values from the library?

@@ +363,3 @@
+		tracker_resource_set_uri (resource, "nmm:flash", ed->flash);
+
+	if (ed->focal_length != -1.0)

Same here.

@@ +392,3 @@
+		tracker_guarantee_resource_utf8_string (resource, "nie:comment", ed->user_comment);
+
+	if (ed->gps_altitude > -G_MAXDOUBLE || ed->gps_latitude > -G_MAXDOUBLE || ed->gps_longitude > -G_MAXDOUBLE) {

I'm sure the range of sane values is a lot more constrained than that :). But nevermind, would be nice to get some data validation layer.

@@ +417,3 @@
+		gdouble value;
+
+		if (ed->resolution_unit != 3)

Assuming 3 means "inches". Is anything else guaranteed to be cm? Is there a single other possible value? Maybe another case for a define/enum...
Comment 12 Carlos Garnacho 2017-12-16 12:38:26 UTC
Review of attachment 365024 [details] [review]:

Very nice :). Marked as a-c-n but should probably wait for the other patch.
Comment 13 Debarshi Ray 2018-01-03 17:35:12 UTC
(In reply to Carlos Garnacho from comment #11)

Thanks for the review!

> ::: src/tracker-extract/tracker-extract-raw.c
> @@ +149,3 @@
> +	case 0x0010: /* Flash in compulsory mode, did not fire */
> +	case 0x0018: /* Flash in auto mode, did not fire */
> +	case 0x0058: /* Only red-eye reduction mode */
> 
> Doesn't gexiv2 provide any enum for this? Maybe the extractor can have its
> internal one so code is more self explanatory without help of comments.
> 
> @@ +166,3 @@
> +
> +	switch (metering_mode_value) {
> +	case 1:
> 
> Same here

I borrowed these constants from src/libtracker-extract/tracker-exif.c. I couldn't spot anything in the header files of libexif and (g)exiv2, so, I guess, we need to carry our own private enum.

I attached patches to bug 792178 to fix src/libtracker-extract/tracker-exif.c.
Comment 14 Debarshi Ray 2018-01-03 19:54:03 UTC
(In reply to Carlos Garnacho from comment #11)

> @@ +357,3 @@
> +		tracker_resource_set_uri (resource, "nmm:whiteBalance",
> ed->white_balance);
> +
> +	if (ed->fnumber != -1.0)
> 
> Hmm, here you apparently try to catch the default set in
> raw_exif_data_new(), but it seems unconditionally overwritten in
> parse_exif_data(), is "!= -1.0" a safe comparison to catch broken values
> from the library?

Yes, gexiv2_metadata_get_fnumber is documented to return -1.0 if the tag is absent or invalid. I checked the wrapper code and it matches the documentation.

Assuming that absent means the tag is missing and invalid is when the value can't be converted to a x/y rational form, I see that exiv2 can throw an exception, in which case a WARNING is logged. However, it's not clear when that can happen though. Skimming through the exiv2 headers, I failed to see which of the involved methods are marked with "throws".

I'll see if I can replicate a RAW file without FNumber, and then try it out.

(I don't think the gexiv2 documentation is available via developer.gnome.org, but it does show up in devhelp these days.)

> @@ +363,3 @@
> +		tracker_resource_set_uri (resource, "nmm:flash", ed->flash);
> +
> +	if (ed->focal_length != -1.0)
> 
> Same here.

It's the same as gexiv2_metadata_get_fnumber.
Comment 15 Debarshi Ray 2018-01-05 10:56:38 UTC
(In reply to Debarshi Ray from comment #14)
> (In reply to Carlos Garnacho from comment #11)
> 
> > @@ +357,3 @@
> > +		tracker_resource_set_uri (resource, "nmm:whiteBalance",
> > ed->white_balance);
> > +
> > +	if (ed->fnumber != -1.0)
> > 
> > Hmm, here you apparently try to catch the default set in
> > raw_exif_data_new(), but it seems unconditionally overwritten in
> > parse_exif_data(), is "!= -1.0" a safe comparison to catch broken values
> > from the library?
> 
> Yes, gexiv2_metadata_get_fnumber is documented to return -1.0 if the tag is
> absent or invalid. I checked the wrapper code and it matches the
> documentation.
> 
> Assuming that absent means the tag is missing and invalid is when the value
> can't be converted to a x/y rational form, I see that exiv2 can throw an
> exception, in which case a WARNING is logged. However, it's not clear when
> that can happen though. Skimming through the exiv2 headers, I failed to see
> which of the involved methods are marked with "throws".
> 
> I'll see if I can replicate a RAW file without FNumber, and then try it out.

Since RAW files are tricky to edit, I tried this with a JPEG file since the higher level (g)exiv2 code paths would be the same.

I used gexiv2_metadata_clear_tag to remove the following tags:
  * Exif.Image.FNumber
  * Exif.Photo.FNumber
  * Exif.Photo.MaxApertureValue

(I actually stuck those in photos_base_item_save_metadata_in_thread_func in gnome-photos, where gexiv2 is already used while exporting any file.)

Then I used the attached test case to call gexiv2_metadata_get_fnumber, and I didn't get a WARNING. So, my assumption is that the WARNING is only for cases where a tag is present but the value is not a rational number.
Comment 16 Debarshi Ray 2018-01-05 10:57:41 UTC
Created attachment 366371 [details]
gexiv2 test case to play with missing FNumber tags
Comment 17 Debarshi Ray 2018-01-05 14:23:16 UTC
Created attachment 366380 [details] [review]
tracker-extract: Add GExiv2 based extractor for RAW files

Added enumerated constants, and a minor build-fix to keep up with master.

I haven't updated COPYING, yet. Did you want me to restrict the ranges of the GPS values?
Comment 18 Debarshi Ray 2018-01-27 22:26:57 UTC
Created attachment 367540 [details] [review]
tracker-extract: Add GExiv2 based extractor for RAW files
Comment 19 Carlos Garnacho 2018-01-28 21:03:19 UTC
Review of attachment 366380 [details] [review]:

Looks nice! Found a couple of nits, but feel free to fix in place before pushing.

::: src/tracker-extract/tracker-extract-raw.c
@@ +30,3 @@
+#include "tracker-main.h"
+
+#define CM_TO_INCH              2.54

would be nice to get this updated to CMS_PER_INCH before pushing :)

@@ +418,3 @@
+
+		location = tracker_resource_new (NULL);
+		tracker_resource_set_uri (location, "rdf:type", "slo:GeoLocation");

Converting doubles to strings is a bit of a pain, but maybe would be nicer to use tracker_extract_new_location() from libtracker-extract?

@@ +439,3 @@
+		gdouble value;
+
+		if (ed->resolution_unit != 3)

I now miss the shiny resolution enums here :)
Comment 20 Carlos Garnacho 2018-01-28 21:07:22 UTC
brrr, I think I commented on the wrong patch. Except the CMS_PER_INCH comment, the rest applies.
Comment 21 Carlos Garnacho 2018-01-28 22:01:52 UTC
(In reply to Debarshi Ray from comment #15)
> > Yes, gexiv2_metadata_get_fnumber is documented to return -1.0 if the tag is
> > absent or invalid. I checked the wrapper code and it matches the
> > documentation.
> > 
> > Assuming that absent means the tag is missing and invalid is when the value
> > can't be converted to a x/y rational form, I see that exiv2 can throw an
> > exception, in which case a WARNING is logged. However, it's not clear when
> > that can happen though. Skimming through the exiv2 headers, I failed to see
> > which of the involved methods are marked with "throws".
> > 
> > I'll see if I can replicate a RAW file without FNumber, and then try it out.
> 
> Since RAW files are tricky to edit, I tried this with a JPEG file since the
> higher level (g)exiv2 code paths would be the same.
> 
> I used gexiv2_metadata_clear_tag to remove the following tags:
>   * Exif.Image.FNumber
>   * Exif.Photo.FNumber
>   * Exif.Photo.MaxApertureValue
> 
> (I actually stuck those in photos_base_item_save_metadata_in_thread_func in
> gnome-photos, where gexiv2 is already used while exporting any file.)
> 
> Then I used the attached test case to call gexiv2_metadata_get_fnumber, and
> I didn't get a WARNING. So, my assumption is that the WARNING is only for
> cases where a tag is present but the value is not a rational number.

Nice job :). I was mostly concerned about checking for equality on doubles to catch invalid values, but I find it reassuring that -1.0 verbatim is used, and that it's not that easy to trigger.

(In reply to Debarshi Ray from comment #17)
> Created attachment 366380 [details] [review] [review]
> tracker-extract: Add GExiv2 based extractor for RAW files
> 
> Added enumerated constants, and a minor build-fix to keep up with master.
> 
> I haven't updated COPYING, yet. Did you want me to restrict the ranges of
> the GPS values?

Let's do the GPS checks on a future commit. If you move the patch to using libtracker-extract's helper, I think it could make sense to have gdouble arguments and do g_return_val_if_fail() range validation there. But I don't think this is a showstopper for RAW images.

But a minor blurb mentioning GExiv extractor as an exception to tracker-extract in COPYING is still welcome :).
Comment 22 Debarshi Ray 2018-01-29 21:47:08 UTC
Created attachment 367609 [details] [review]
Update COPYING
Comment 23 Debarshi Ray 2018-01-29 21:47:37 UTC
Created attachment 367610 [details] [review]
tracker-extract: Add GExiv2 based extractor for RAW files
Comment 24 Carlos Garnacho 2018-02-07 00:12:11 UTC
All looks good, and is now pushed prior to release :)

Attachment 365024 [details] pushed as f471f06 - functional-tests: Add a test for the RAW extractor
Attachment 367609 [details] pushed as c8bd980 - Update COPYING
Attachment 367610 [details] pushed as 48c3369 - tracker-extract: Add GExiv2 based extractor for RAW files