GNOME Bugzilla – Bug 719253
Duplicate detection should detect previous versions of an image (md5 of image content only)
Last modified: 2021-05-19 13:56:39 UTC
---- Reported by shotwell-maint@gnome.bugs 2013-05-19 14:27:00 -0700 ----

Original Redmine bug id: 6970
Original URL: http://redmine.yorba.org/issues/6970
Searchable id: yorba-bug-6970
Original author: Aaron W
Original description:

I am running Shotwell 0.14.1 on Ubuntu Raring (13.04). If I do the following:

1) change preferences to write changes to tags/metadata to the file;
2) export a copy of a picture as "picture_notags.jpg";
3) add a tag to the picture of "test1";
4) export a copy of the picture as "picture_tag_test1.jpg";
5) add a tag to the picture of "test2";
6) import the two photos from the folder;

then the "notags" and "test1" versions of the file will be imported, giving three versions of the file in total.

If one thinks about this, the user has made each of these changes (adding/removing tags, changing dates, etc.) chronologically and has therefore deliberately changed the file from the one that is being imported. A good starting assumption is that the user does not want an additional previous version of a changed file.

In my view, the hashes (or however you are currently checking for duplicates) of previous versions of files should also be kept, so that these can be identified in addition to current versions.

I accept that there are situations where you may want to re-add files as they were, rather than as they are. I therefore suggest that, if duplicates of previous versions are detected, a dialogue along the following lines is displayed:

"50 photos were successfully imported. 35 duplicates were not imported. 20 photos are earlier versions of photos that are in your library (e.g. before you made changes to metadata/tags). Do you wish to import these earlier versions of photos?"

The lack of this feature catches me out a lot, as I am paranoid about my photos and often re-import my photos from various sources to ensure I don't lose any. The result is a mess of duplicate photos that don't incorporate tags/date fixes.
Related issues:
related to shotwell - Feature #3487: More sophisticated, configurable handling of duplicates (Open)
related to shotwell - 7027: Tagged pictures are duplicated upon reimportation (Open)

--- Bug imported by chaz@yorba.org 2013-11-25 22:11 UTC ---

This bug was previously known as _bug_ 6970 at http://redmine.yorba.org/show_bug.cgi?id=6970
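The proposal in the original description (keep hashes of previous versions, then report imported, duplicate and earlier-version counts separately) could be sketched roughly as follows. This is only an illustration, not Shotwell code: the `Library` class, its methods and the three category names are hypothetical, and plain MD5 over the whole file stands in for whatever duplicate check Shotwell actually uses.

```python
import hashlib
from collections import Counter


def file_hash(data: bytes) -> str:
    """Hash of the complete file contents (stand-in for the real duplicate check)."""
    return hashlib.md5(data).hexdigest()


class Library:
    """Hypothetical index: hashes of current files plus hashes of superseded versions."""

    def __init__(self):
        self.current = set()
        self.previous = set()

    def add(self, data: bytes) -> None:
        self.current.add(file_hash(data))

    def modify(self, old: bytes, new: bytes) -> None:
        """Record a change written to the file, e.g. a tag that was added."""
        old_hash = file_hash(old)
        self.current.discard(old_hash)
        self.previous.add(old_hash)  # remember the version that was replaced
        self.current.add(file_hash(new))

    def classify(self, data: bytes) -> str:
        digest = file_hash(data)
        if digest in self.current:
            return "duplicate"
        if digest in self.previous:
            return "earlier_version"
        return "new"


def import_files(library: Library, files) -> Counter:
    """Import only new files and count what happened to the rest."""
    counts = Counter()
    for data in files:
        kind = library.classify(data)
        counts[kind] += 1
        if kind == "new":
            library.add(data)
    return counts
```

The returned counts map directly onto the three numbers in the suggested dialogue; whether earlier versions are then imported anyway would be the user's choice.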
*** Bug 738222 has been marked as a duplicate of this bug. ***
*** Bug 738221 has been marked as a duplicate of this bug. ***
This bug has been valid since Shotwell version 0.14 and still is. Another dupe is: https://bugzilla.gnome.org/show_bug.cgi?id=719233
*** Bug 719233 has been marked as a duplicate of this bug. ***
This bug gets me every time I do an import. I have a saved search for untagged files ending in '_1.JPG' and can manually remove them, but this really shouldn't be necessary. Shotwell ought to be smart enough to recognise that the picture is the same even with the addition of tags. Current version: shotwell-0.20.2-2.fc21.x86_64
I'm not sure if my comment is redundant, but this is still a problem in Shotwell 0.22.0 when importing from a folder, at least. Pictures I tagged earlier (writing metadata to files) result in the same pictures being imported again and not detected as duplicates.
This is infuriating, mostly because doing a simple hash of files to compare them should be a no-brainer. If metadata makes a photo "different", then offer the user both a default duplicate action and the option to review each duplicate found to pick which to keep. This feature should be core to photo-organising software like Shotwell. Do we have to put a bounty or something on this bug to get it fixed? I'd pay.
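For illustration, here is a minimal sketch of the "md5 of image content only" idea from the bug title, restricted to JPEG. It hashes every segment except the metadata carriers (APP0 to APP15, where EXIF, XMP and IPTC live, and COM comments), so adding a tag should not change the result while a plain whole-file hash does change. The function name is made up, this is not how Shotwell computes its hashes, and it is a simplification: fill bytes before markers are not handled and everything from the first SOS marker onward is hashed as-is.

```python
import hashlib
import struct

# APP0..APP15 carry EXIF/XMP/IPTC and similar metadata; 0xFE is a COM (comment) segment.
METADATA_MARKERS = set(range(0xE0, 0xF0)) | {0xFE}
SOS = 0xDA  # start of scan: compressed image data follows


def jpeg_content_md5(data: bytes) -> str:
    """MD5 over a JPEG stream with metadata segments left out (sketch only)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    digest = hashlib.md5()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == SOS:
            # Hash the scan header and all entropy-coded data up to the end.
            digest.update(data[pos:])
            return digest.hexdigest()
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            digest.update(data[pos:end])
        pos = end
    raise ValueError("no SOS marker found")
```

Note that this only helps when the metadata write leaves the compressed image data untouched; a re-encode, a rotation applied to the pixels, or a different file format would still produce a different hash.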
Still happens in 0.26.3. I moved a 40GB library to another drive, and activated "Watch library directory for new files". Now Shotwell is eating up all my CPU and drive. @Peter: If you want to pay then join me on Bountysource. I have put 5 bucks on this bug. https://www.bountysource.com/issues/52635043-duplicate-detection-should-detect-previous-versions-of-an-image-md5-of-image-content-only
It is creating additional JPEGs for all raw files.

Original file: DSC_0644.NEF
Already existed: DSC_0644_NEF_embedded.jpg
Shotwell now creates another one: DSC_0644_NEF_embedded_1.jpg
What exactly did you move?
(In reply to Lubosz Sarnecki from comment #8)
> Still happens in 0.26.3.
> I moved a 40GB library to another drive, and activated "Watch library
> directory for new files".
> Now Shotwell is eating up all my CPU and drive.
>
> @Peter: If you want to pay then join me on Bountysource. I have put 5 bucks
> on this bug.
>
> https://www.bountysource.com/issues/52635043-duplicate-detection-should-detect-previous-versions-of-an-image-md5-of-image-content-only

I chipped in $15, let's get this fixed.
(In reply to Jens Georg from comment #10)
> What exactly did you move?

I moved the 2017 folder (I only had images from this year) from ~/Pictures to a destination on an external drive and changed the target library folder in Shotwell, enabling the auto-discover function in the hope that Shotwell would find the new paths. I didn't touch any dot folders or config files.

Is there another procedure for doing this? In the end it turned out OK, except for the several gigabytes of JPEG dupes produced and the CPU time required for that.
Yes, you need to adapt the paths in the database manually.

What happens is: Shotwell (currently) needs a JPEG for each RAW. This is called a backing photo (the _shotwell.jpg files). Those are associated with the RAW file in the database. If you only copy the library and re-import it (by automatic scanning), the connection is lost, since the database still points to the old file location. Shotwell will then create a new JPEG backing file (_shotwell_1.jpg) because the original name is already taken. The old backing photo will be treated as a new file.

While comparing the image contents (which isn't exactly trivial) would help in this situation, IMHO it would only be curing the symptoms. What Shotwell should provide is:

- Short-term: Proper documentation on how to relocate the library
- Mid-term: Support the user when he has to/wants to move the library around with some kind of tool
- Long-term: Get rid of the need for a backing photo for RAWs (bug 718242). There is a trade-off to be made between speed and disk space, so this has to be chosen carefully.
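As a rough idea of what adapting the paths could look like, the sketch below rewrites a path prefix in an SQLite table. The table and column names are passed in as parameters because this comment does not name them; they are placeholders, not Shotwell's actual schema, which would have to be inspected first. Work on a backup copy of the database with Shotwell closed.

```python
import sqlite3


def relocate_paths(conn, table, column, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in every path that starts with it.

    table and column are hypothetical placeholders and must come from a
    trusted source, since they are interpolated into the SQL text.
    Returns the number of rows that were changed.
    """
    sql = (
        f"UPDATE {table} "
        f"SET {column} = ? || substr({column}, ?) "
        f"WHERE substr({column}, 1, ?) = ?"
    )
    # substr() is 1-based, so the remainder starts at len(old_prefix) + 1.
    params = (new_prefix, len(old_prefix) + 1, len(old_prefix), old_prefix)
    cursor = conn.execute(sql, params)
    conn.commit()
    return cursor.rowcount
```

Passing prefixes with a trailing slash (for example "/home/user/Pictures/") avoids accidentally matching sibling directories such as "/home/user/Pictures2". Every table that stores file paths, including the one that links RAW files to their backing photos, would need the same treatment for the associations to survive the move.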
Hi, I am using shotwell 0.28.4 (Fedora 28 system) and I still see this behaviour. Thanks, RM
(In reply to R Mercado from comment #14)
> Hi,
> I am using shotwell 0.28.4 (Fedora 28 system) and I still see this behaviour.
> Thanks,
> RM

Which of the many behaviors described in this bug do you see exactly?
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/shotwell/-/issues/4380.