GNOME Bugzilla – Bug 169646
need to handle duplicates
Last modified: 2009-10-22 15:26:53 UTC
Version details: 0.0.10
Distribution/Version: SUSE LINUX 9.3 Beta 2

I somehow duplicated a whole directory of images, and I now have to go through and manually remove each duplicate. In gthumb there is an option to automatically remove dups. If the functionality is there in f-spot, I couldn't find it.
hmmmm, it shouldn't be possible to create duplicates. Do you know how you did it? It might be possible to do via drag and drop, does that sound likely?
No, it doesn't. I was doing a lot of testing with 0.0.9-3 and 0.0.10. When I first started testing I imported individual directories, then I figured out how to import all the directories under /common/Picutures/. I assumed that was how it happened. I could try to reproduce it for you.
s/could try/will try/
Ok. It is because in that directory I have all the original pictures, a scaled version of all the pictures, and a subdirectory called "original" that _also_ contains all the original pictures. Not sure why :-/ However, given this situation, should there be a way to detect that all those files are duplicates of the others? Not necessarily during import (though that would be nice...), but afterward, based on EXIF data or something.
*** Bug 305734 has been marked as a duplicate of this bug. ***
*** Bug 308796 has been marked as a duplicate of this bug. ***
F-Spot should have duplicate detection support, implemented by checksumming all images, either on import or in the background, and raising an alert if duplicates are found. Due to the CPU-intensive nature of checksumming, it should however be an option that can be turned off.
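For illustration, the checksumming could be as simple as the following C# sketch; the class and method names are made up for this example, not taken from the F-Spot source.

using System;
using System.IO;
using System.Security.Cryptography;

public static class DuplicateChecker {
    // Hash the raw bytes of an image file; two bit-identical files
    // always produce the same digest.
    public static string ComputeMD5 (string path)
    {
        using (FileStream stream = File.OpenRead (path))
        using (MD5 md5 = MD5.Create ()) {
            byte [] hash = md5.ComputeHash (stream);
            return BitConverter.ToString (hash).Replace ("-", "");
        }
    }
}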
Just adding Gabriel's comment from bug 308796#c2:
-------------
Duplicate detection that would solve bug 169646 will solve this too. Instead of using EXIF matching or MD5 hash matching to detect, it would be faster to, as Haran said, check if it's a link and whether the path it points to is already in the db.
Did we not have a patch for this one? I seem to remember that someone did a patch for detecting duplicates, and verified it did not slow down the loading of pictures much. Perhaps it could be extended with checking for existing links first?
Yeah, most of this bug is solved in a patch in acs' repo, but it requires some changes to the db schema and a bit more polish. Now that we have the db update code, it should be possible to integrate the patch with some work. It is a high priority, but I might do it in stages.
massis is cooking this patch now, and progress seems excellent.
Created attachment 62237 [details] [review]
Duplicates patch

In the attachment you will find the duplicates patch acs once created, but updated to current cvs head, and at the same time fixing a lot of the issues originally raised. You can find the original discussion here: http://mail.gnome.org/archives/f-spot-list/2005-October/msg00044.html and the comments on the first version of the patch here: http://mail.gnome.org/archives/f-spot-list/2006-March/msg00018.html

Mind you that your database will be upgraded to a new version, so please back up before you test this.
Thomas, I plan to invest some time in F-Spot in the next weeks, so I can help you test the final version of the duplicates patch I sent to the list some months ago. I will try your patches as a first step and then do some testing in order to finally include it in F-Spot.

sid@delito:~/devel/f-spot-devel$ patch -p0 < DuplicatesPatch.patch
patching file src/ImportCommand.cs
patching file src/MainWindow.cs
patching file src/PhotoQuery.cs
patching file src/PhotoStore.cs
Hunk #3 succeeded at 564 (offset 7 lines).
....

Ok, the patch seems to work with current CVS; it compiles and installs ok. I can see the Find Duplicates entry in the search menu, so it is time to try to import some photos with duplicates. In the import dialog you can select with the checkbox whether or not to include duplicates, and it works as expected. If I select to import duplicates, I get some photo duplicates in my collection. I can then use the search for duplicates in the search menu, and it works nicely. I can remove the duplicates, and after that the search for duplicates comes up empty. Ok, it seems that everything is working as expected. Cheers
How about:
- first find candidates for duplicates based on the file sizes of the images
- then, for colliding file sizes, calculate md5sums (see the sketch below)

From my set of about 2000 photos in jpg format, only two had the same file size. Therefore I conclude that for the typical case, where the photos are stored in a compressed format rather than a raw format, there would be no performance penalty.
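A rough sketch of that size-first pass; photoPaths and CompareChecksums are hypothetical placeholders, and only the grouping logic is the point here.

using System.Collections.Generic;
using System.IO;

static void FindDuplicateCandidates (IEnumerable<string> photoPaths)
{
    // Group files by size; a file with a unique size cannot be a duplicate.
    Dictionary<long, List<string>> by_size = new Dictionary<long, List<string>> ();
    foreach (string path in photoPaths) {
        long size = new FileInfo (path).Length;
        if (!by_size.ContainsKey (size))
            by_size [size] = new List<string> ();
        by_size [size].Add (path);
    }

    // Only groups with colliding sizes pay the md5 cost.
    foreach (List<string> group in by_size.Values)
        if (group.Count > 1)
            CompareChecksums (group);
}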
Then we have the question of what a picture is: the JPG file, or only the image embedded in the JPG file (that is, excluding the tags)? You could modify one of the tags, and the file size would then differ...
Jaakko, comparing the file sizes _could_ speed up the duplicate detection, but I don't think it is really necessary, because the MD5 cost is paid upfront:
* the sums for already existing photos are calculated in a database upgrade
* the sums for new photos are calculated at the time they are created

This means that after creation the md5sums just have to be read from the database (unless the photo changes), and also that the detection of duplicate photos is quite speedy, because we maintain a hashtable that contains all the md5sums (which allows us to do fast lookups). If you want to see whether comparing file sizes might speed up duplicate detection, please don't hesitate to create your own patch...

Bengt, for the moment the md5sums are calculated against the complete image file, which indeed ignores the fact that two photos might be the same with different exif/xmp data. This point was already raised by Ruben Vermeersch, and Larry commented that he "should probably add some methods to the image classes that allow them to refine the checksum so that it has a better chance of matching. It's important to also do a full checksum to point out that the file is not an exact duplicate though."

Regards, Thomas
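The lookup path described above amounts to something like this sketch (names are illustrative, not the patch's actual code): the cache is filled once from the database, after which each duplicate check is a single dictionary probe.

using System.Collections.Generic;

// Maps an md5sum to the id of the photo that owns it; filled from the
// database at startup and kept up to date as photos are created.
Dictionary<string, uint> md5_cache = new Dictionary<string, uint> ();

void AddToCache (uint photo_id, string md5sum)
{
    md5_cache [md5sum] = photo_id;
}

// O(1) duplicate test at import time.
bool IsDuplicate (string md5sum)
{
    return md5_cache.ContainsKey (md5sum);
}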
Hello! What is the current state of this bug/patch? This seems like a fantastic feature which could be added and then optimised for speed. I've only just started using f-spot seriously, but this feature would make my life significantly easier. I'm loving finally having a decent way of organising my photos, BTW. Nice work. Cheers, Tim
Created attachment 67096 [details] [review]
Updated version

Updated the duplicates patch. The previous version crashed f-spot when the updater encountered photos in the database that were no longer present on disk; it also fixes the "Find Duplicates" query by omitting files that are not on disk.
This is what I was looking for! I often use F-Spot to get new photos from my mobile phone. Naturally I don't delete all the photos... so I often end up importing photos more than once, which results in those ...-1.jpg files and so on. Tested the patch from #18; it seems to work really well for me! Would be great to see this upstream. The only thing I would change is the current menu entry, from "Find->Find Duplicates" to "Find->Duplicate Photos", to better match the rest of the menu entries.
Using MD5 hashes to calculate dupes sounds like a pretty horrible idea. Why don't we compare histograms? In case you're thinking "This guy sounds like he has no idea what he's talking about," you are right. But I was looking into seeing how other open-source picture programs do dupe detection, and gallery2 is having such a feature implemented (for Google Summer of Code 2006). I'm going to look into what kind of algorithm they're using, and I'll see about trying to put a patch up here.
*** Bug 352300 has been marked as a duplicate of this bug. ***
After using this patch for a short while and then using plain HEAD again, I found f-spot was not able to write to the db again. I had to manually remove the new "md5sum" column added by this patch. As a reference, here's the sqlite session I used to drop the column (I wish sqlite had proper alter table support...):

$ sqlite photos.db
sqlite> .tables
sqlite> .schema photos
sqlite> BEGIN TRANSACTION;
sqlite> CREATE TEMPORARY TABLE photos_backup(
          id INTEGER PRIMARY KEY NOT NULL,
          time INTEGER NOT NULL,
          directory_path STRING NOT NULL,
          name STRING NOT NULL,
          description TEXT NOT NULL,
          default_version_id INTEGER NOT NULL);
sqlite> INSERT INTO photos_backup SELECT id,time,directory_path,name,description,default_version_id FROM photos;
sqlite> DROP TABLE photos;
sqlite> CREATE TABLE photos(
          id INTEGER PRIMARY KEY NOT NULL,
          time INTEGER NOT NULL,
          directory_path STRING NOT NULL,
          name STRING NOT NULL,
          description TEXT NOT NULL,
          default_version_id INTEGER NOT NULL);
sqlite> INSERT INTO photos SELECT id,time,directory_path,name,description,default_version_id FROM photos_backup;
sqlite> DROP TABLE photos_backup;
sqlite> COMMIT;
sqlite> .exit
*** Bug 365573 has been marked as a duplicate of this bug. ***
*** Bug 382843 has been marked as a duplicate of this bug. ***
Why all the md5sum stuff? It seems to me that duplicates can be found efficiently without adding fields to the database (see the sketch below):
1. File size must match.
2. Photo metadata (camera, shutter speed, date?, ...) must match.
3. File contents must match.
Most non-dupes should be identified at steps 1 or 2 here.
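A sketch of those three stages; ReadExifSignature is a hypothetical helper that would combine camera model, shutter speed, date and so on into a comparable string.

using System.IO;

static bool AreDuplicates (string a, string b)
{
    // Step 1: file size must match.
    if (new FileInfo (a).Length != new FileInfo (b).Length)
        return false;

    // Step 2: photo metadata must match.
    if (ReadExifSignature (a) != ReadExifSignature (b))
        return false;

    // Step 3: file contents must match, byte for byte.
    byte [] ba = File.ReadAllBytes (a);
    byte [] bb = File.ReadAllBytes (b);
    for (int i = 0; i < ba.Length; i++)
        if (ba [i] != bb [i])
            return false;
    return true;
}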
Extra use case to support the need for a 'find duplicates' tool. I had to recover my f-spot photo collection. I recovered the Photos dir and the fspot-gconf settings, but was not entirely sure that it had all my photos in it. So I added all my photos from a separate backup and gave them the tag 'new'. I figured that I could quickly work out which photos didn't survive, remove the 'new' tag from those, and then delete all the 'new' photos from my HDD.

This was seriously a bad idea! I got loads of duplicates in f-spot, since most of the photos had survived. A bunch didn't survive the recovery, so they were unique; I removed the 'new' tag from those and deleted all the 'new' photos. But when I wanted to look at my pictures again, I noticed that all the duplicates were empty. They were still listed in f-spot (name, date, file location etc.) but the file was gone. This is seriously distressing: f-spot shows 2 photos, but deleting one deletes the file, and the other instance is then broken.

btw: using a fresh ubuntu edgy with f-spot 0.2.1 on a normal x86 system
Fixing the metadata a bit, and noting that this is still really irritating and the biggest thing preventing me from importing the rest of my pictures into f-spot.
*** Bug 408541 has been marked as a duplicate of this bug. ***
Created attachment 83079 [details] [review]
Patch for current subversion

I reworked the patch against current subversion, fixing a small bug in the process (I don't even remember what right now). Is this being considered for a future version? If not, what's needed to get similar functionality accepted in f-spot?
Created attachment 83151 [details] [review]
Small improvements

Small update:
* close filestream after creating md5sum
* make addtocache in photostore also add to md5cache
* updated ChangeLog
Oh yeah, and it still seems to work properly; !! mind that your db will be upgraded though !!
Created attachment 84149 [details] [review]
Updated for current SVN head

I've just applied this patch almost cleanly against HEAD as of 10 minutes ago. This patch rules. I seriously hope you apply it straight away.
Hello devs, what's up with this? This is a pretty infuriating problem (coupled with the fact that f-spot starts importing before the user clicks the button). There are a bunch of patches ready; what is holding them back?
*** Bug 448519 has been marked as a duplicate of this bug. ***
I don't see why this very basic feature is still not in. Could a reason be given, so that we can have a dialogue about what needs to be done to address this situation? Would it at least be possible to add this code into Updater.cs, so that those using patches for duplicates do not end up with a database incompatible with the other versions?
Created attachment 93309 [details] [review]
Minimal patch for the database to have an md5sum field
Hello, has this feature slipped under the radar? IMHO it's one of the most important features for photo management software (right after the timeline, but very clearly more important than "arty" retouching tools). Is there still anyone working on this? Any schedule estimates?
Hello, I'm also eager to see this feature upstream. As was pointed out before (http://bugzilla.gnome.org/show_bug.cgi?id=169646#c16), I also think an md5sum of the full file won't fill the need. For instance, in the following scenario:
1. I import some picture
2. I tag it, with "write metadata on file"
3. I re-import the initial picture by mistake
After step 2 the file on disk has changed, therefore comparing the checksums at step 3 won't detect the duplicate. To circumvent this, a checksum of the image part itself should be stored (see the sketch below), or we could decide to always keep the original picture unmodified (any modification should end up in a new version or be stored in the db).

Other tools to detect duplicates could be useful too:
- using metadata (2 pictures taken at the exact same time with the same camera are good duplicate candidates)
- an image similarity measure, independent of the file (using histograms, wavelets or...); GQview has such a feature
These two features could be useful to detect non-strict duplicates, such as different resolutions or rotated versions of a picture.
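To illustrate the "checksum of the image part itself" idea: a JPEG stores EXIF/XMP in marker segments before the start-of-scan marker, so one rough approach is to hash only the bytes from start-of-scan onward. This is a simplification (multi-scan files and appended data need more care), not a complete implementation.

using System;
using System.IO;
using System.Security.Cryptography;

static string HashImageDataOnly (string path)
{
    byte [] bytes = File.ReadAllBytes (path);

    // Walk the marker segments (EXIF, XMP, comments...) that precede the
    // start-of-scan marker (0xFFDA); each segment carries a 2-byte length.
    int pos = 2; // skip the 0xFFD8 start-of-image marker
    while (pos + 3 < bytes.Length && bytes [pos] == 0xFF && bytes [pos + 1] != 0xDA)
        pos += 2 + (bytes [pos + 2] << 8) + bytes [pos + 3];

    // Hash only the compressed image data, so tag edits don't change the sum.
    using (MD5 md5 = MD5.Create ()) {
        byte [] hash = md5.ComputeHash (bytes, pos, bytes.Length - pos);
        return BitConverter.ToString (hash).Replace ("-", "");
    }
}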
Maybe this is related, maybe it isn't. I keep all my folders in a directory of my own choosing organized in my own way. Ideally, I'd like f-spot to just "watch" that directory and have it update when I drag new pictures in. Failing that, I'd like to be able to click "import" and import new pictures into f-spot -- either from the main directory or from a subdirectory. The problem is, I often end up wanting to add pictures to a directory which I've already imported and then import them -- f-spot creates duplicates of all the pictures. For the very simple case of importing the *exact same filename* into f-spot, and *not* copying it to the Photos directory, it seems like a no brainer to simply not create the duplicate. A patch that made that simple fix would make f-spot much more usable to me!
(In reply to comment #39)
> Maybe this is related, maybe it isn't. I keep all my folders in a directory
> of my own choosing organized in my own way. Ideally, I'd like f-spot to just
> "watch" that directory and have it update when I drag new pictures in.

then you might be interested in inotify support (bug #312613)
*** Bug 508011 has been marked as a duplicate of this bug. ***
*** Bug 517234 has been marked as a duplicate of this bug. ***
(In reply to comment #39)
> Maybe this is related, maybe it isn't. I keep all my folders in a directory
> of my own choosing organized in my own way. Ideally, I'd like f-spot to just
> "watch" that directory and have it update when I drag new pictures in.
> Failing that, I'd like to be able to click "import" and import new pictures
> into f-spot -- either from the main directory or from a subdirectory. The
> problem is, I often end up wanting to add pictures to a directory which I've
> already imported and then import them -- f-spot creates duplicates of all
> the pictures.

FWIW, that's also my "usecase" for using f-spot. Adding this comment so that devs can see that Thomas (c#39) is not alone.

> For the very simple case of importing the *exact same filename* into f-spot,
> and *not* copying it to the Photos directory, it seems like a no brainer to
> simply not create the duplicate. A patch that made that simple fix would
> make f-spot much more usable to me!

Metoo!
*** Bug 520725 has been marked as a duplicate of this bug. ***
That makes ten duplicates for a bug that has had at least a tentative patch for 2 years. This is a highly irritating issue. It clearly hits lots of people. Note that bug-buddy is not involved in the reports -- 11 reports has got to be something of a record. Yet there hasn't been a peep from the f-spot team for years about this.
Indeed... I remember that when installing Ubuntu 5.10 at a friend's, I wondered why this feature wasn't in, and thought that surely it would be available in the next major Ubuntu release (due July 2006) :-/ By now the friend's photo collection has ~1500 unique photos (amazing how many pictures she got together when using a digicam), but there are around 2500 photos in F-spot, from the constant import-but-dont-delete-on-camera procedure, and manually weeding out the duplicates has become hopeless :-(
I have been evaluating F-Spot version 0.4.0 under Ubuntu 7.10 for 3 months now and have been very IMPRESSED by the overall design and implementation of F-Spot. The sole remaining concern that holds me back from using F-Spot to manage my 6 years worth of digital photos (which appear to be growing exponentially) is this duplication issue.

Although I have not examined the C# code, it appears that the central issue revolves around importing a file with the same name and date. Currently, when that condition exists, the code appears to identify the error condition and follows the procedure of creating a unique filename (-1.jpg, -2.jpg, etc).

I would suggest that at this point the code should not automatically produce a unique filename. Instead, present the user with a dialogue window. The dialogue window could simply tell the user that the photo appears to be a duplicate and ask if the user would like to import this photo anyway. Perhaps a check box could be included in the dialogue window to indicate that the same answer should be used whenever this condition occurs during this import session (i.e. when the filename and EXIF photo date appear to indicate a duplicate).

Another approach that might work even better would be to include within the "Preferences" configuration an option for not importing any new photos when an identical filename and date are already contained within the current photo collection. I suspect that providing users such an option would be VERY welcome and would cover 99% of the most common use cases. And hopefully this approach would not require significant new code or database changes.

If any of the F-Spot maintainers feel that this approach might hold promise, I would be happy to spend some more time documenting the suggestion more fully. Thanks again for the donation of your time and energy producing such a great application for the world to use! F-Spot sets a new standard in photo management.
Confirmed on Hardy (f-spot 0.4.2-1ubuntu1). This is typically the kind of bug that makes me sad. And so I absolutely don't use f-spot.
*** Bug 526274 has been marked as a duplicate of this bug. ***
It has been a while now and this bug is still present (4.2). It is a showstopper for many people, I believe. For instance, I like to keep photos on the memory card, but I also like to download them to the computer just in case. With F-Spot not detecting duplicates I have to be very careful while importing photos, and even now I have 4 copies of the same photo, which takes up space and my time while I have to hunt for duplicates... Maybe it would be enough to add some sort of basic duplicate detection based on file name; that would be enough for me. Maybe when importing, if an image with the same name (and the same date) already exists in the destination folder, ask what to do.
I have just done some tests and I found out one thing. I don't know what the md5 is calculated from, but I would bet that it's calculated from the imported file. If so, the md5 sums of the imported image and the one on the camera will never match, because f-spot adds a "tag" with information about when the picture was imported, and that changes the image.

I have done some tests to confirm this: I imported a photo from the camera two times and I got a duplicate. Then I took the imported image (from where f-spot keeps images), copied it to a different folder and tried to import it; this time f-spot detected that the same picture was already in the database (it was essentially the same file, including the "time of import" tag, so even the md5 matched).

So, in order to successfully detect duplicates, f-spot needs to either:
A) not store the import date/time inside the file. This is probably not preferred.
B) calculate the md5 before it writes the import time/date and store it somewhere (see the sketch below). This seems like the way to go, so now all that's needed is somebody to put it into code. I would do it, but I really suck at C# :(
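A minimal sketch of option B; the store methods are hypothetical (ComputeMD5 as in the earlier sketch), and the only point is the ordering: hash first, write the import date afterwards.

// Sketch only: LookupByMD5 and StoreMD5 are invented method names.
static void ImportPhoto (PhotoStore store, string camera_file)
{
    string sum = ComputeMD5 (camera_file);     // 1. hash the untouched original
    if (store.LookupByMD5 (sum) != null)
        return;                                // 2. duplicate: skip the import
    Photo photo = store.Create (camera_file);  // 3. only now write the import date
    store.StoreMD5 (photo, sum);               // 4. remember the pre-import sum
}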
This feature *still* isn't in F-Spot yet? WTF? I just made the switch from Picasa for Linux (proprietary, running under WINE) to F-Spot, because I like Open Source and wanted a native GNOME app. But then I discover there's no duplicate detection in this application, which I find absolutely amazing, especially since this topic has been discussed... and discussed... and discussed for *years* now, and there is still no peep from the developers. I, for one, sadly declare proprietary software the winner in this round. Open Source F-Spot is not a viable application. Back to Picasa, I guess. *sigh*
Created attachment 111542 [details] [review]
New duplicates detection patch

To finally put an end to all the whining and no code, I decided to give this one another shot. It's a more or less from-scratch implementation to detect duplicates at import time.

WARNING: it modifies your db schema, so use with the needed care.

Things it does:
1. add an md5 field to the photos table to store the photo hashes
2. in the import-from-folder dialog you can now toggle whether to include duplicates or not (the other import paths default to no duplicates, but I didn't test this)
3. duplicate detection occurs ad hoc with db queries, which makes it rather fast; the previous patches kept an md5 dict, but there's no need for that, querying is fast
4. the md5 is created against a smaller version of the image, which should be enough and makes things fast (see the sketch below)

So if you're eager to get this into f-spot, show some guts and try it out.
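Point 4 could look roughly like the following; the 256x256 size and the method name are illustrative, and this assumes the gtk-sharp Gdk.Pixbuf API rather than quoting the patch's actual code.

using System;
using System.Security.Cryptography;

static string ComputeScaledMD5 (string path)
{
    // Hash a small rendering of the image instead of the raw file bytes:
    // cheaper, and insensitive to metadata-only changes in the file.
    using (Gdk.Pixbuf full = new Gdk.Pixbuf (path))
    using (Gdk.Pixbuf small = full.ScaleSimple (256, 256, Gdk.InterpType.Bilinear))
    using (MD5 md5 = MD5.Create ()) {
        byte [] data = small.SaveToBuffer ("png"); // serialize the scaled pixels
        byte [] hash = md5.ComputeHash (data);
        return BitConverter.ToString (hash).Replace ("-", "");
    }
}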
Created attachment 111705 [details] [review]
Slightly improved duplicates patch
A quick test raised a problem: it prevents importing jpg/raw! When importing both raw and jpeg versions of a picture, only one of the two is really added to the f-spot database.
Thanks for testing. As I don't have any raw pictures, I didn't hit any problems. But sure, the way the duplicates check works right now (calculating against the thumbnail), that scenario won't work. The CheckForDuplicate method in the PhotoStore is where we can go wild on how to calculate duplicates. Expect an update soon.
Created attachment 112371 [details] [review]
Duplicates patch that should make import of jpg/raw combinations work
Created attachment 112392 [details] [review]
Duplicates patch, with smarter comparison

Ignore the previous patch; it completely ignores the embedding of tags in the xmp data. This one should work better.
Thanks for the update, this patch handles r+j correctly! Great! I did just some simple tests:
* importing the same picture
* importing the same pic with a different filename
* importing the same pic without exif data (jhead -purejpg)
* importing some pics, a few dupes and a few not
All these tests went fine. The only _trivial_ bug is that the last roll is always shown, even if no pic was imported (all being dups). You probably forgot to remove the skipped pics from the import count.
I also tried to import a picture, make a copy with a small modification (say luminosity or such), and import again. The pic was correctly imported as new.
(In reply to comment #59)
> All these tests went fine.
> The only _trivial_ bug is that the last roll is always shown, even if no pic
> was imported (all being dups). You probably forgot to remove the skipped
> pics from the import count.

Hmmm, what does f-spot do when you import from an empty folder? Does it create a roll as well?
Created attachment 113543 [details] [review]
Duplicates patch update N

This patch fixes some issues with the previous version:
1. calculation of md5s in the updater happens async
2. no dummy roll is created when all photos on import are dups
3. versions also have md5 sums and are checked during duplicates detection
4. no crash on file not found

Remaining: after modification of a picture (b&w, red-eye) the md5 sum should be updated as well.
Created attachment 113581 [details] [review]
Almost there duplicates detection patch

Same as the previous patch, but with improved version handling and md5 updating on photo edits.
Just applied this against head and it applies with a few offsets :-) Would it be possible to add detection and filtering of duplicates in the existing database, as opposed to just on import? Thanks for the work!
(In reply to comment #64)
> Just applied this against head and it applies with a few offsets :-)
>
> Would it be possible to add detection and filtering of duplicates in the
> existing database, as opposed to just on import?

I discussed this with Stephane, and we both agree that this kind of functionality can be added as an extension. Once the datamodel changes are in place, it should be easy to build an extension on top of them that does exactly this.
I'll look forward to that then :-) I have 45k photos, of which I'm fairly certain 5k are duplicates; reimporting everything just to benefit from the dedupe is not a process I particularly look forward to, and retagging would take weeks.

Two issues that came up while trying out the patch:
1) The initial DB upgrade and initialisation took 18 minutes with the UI frozen and no progress indication (100% CPU + 90% disk).
2) f-spot consumed 3GB of ram during the background md5 hashing, then failed due to not enough memory (I have 3GB ram + 8GB swap, but the swap was ignored). Restarting f-spot allowed the process to continue where it left off, finishing again at 2.5GB of used ram.

Not related to this patch, but no progress indication for background tasks is a bit problematic for large photo collections :-)
Created attachment 113959 [details] [review]
A bit closer again

Same as previous, but the 18 minutes of waiting at job creation time shouldn't happen anymore, as it now goes in one insert. Also I added some explicit gc; could you check if it makes any difference?
That's infinitely better on the startup time:

[Info 15:52:18.685] Starting new FSpot server
Updating F-Spot Database
photos_temp - photo_versions_temp
Updated database from version 14 to 15
Database updates completed successfully.
[Debug 15:52:24.384] Db Initialization took 5.477514s
[Debug 15:52:24.867] Query: SELECT photos.id, photos.time, photos.uri, photos.description, photos.roll_id, photos.default_version_id, photos.rating, photos.md5_sum FROM photos WHERE photos.id NOT IN (SELECT photo_id FROM photo_tags WHERE tag_id = 2) ORDER BY photos.time
[Debug 15:52:26.371] Query took 1.50316s
[Debug 15:52:26.410] Query: SELECT photos.id, photos.time, photos.uri, photos.description, photos.roll_id, photos.default_version_id, photos.rating, photos.md5_sum FROM photos WHERE photos.id NOT IN (SELECT photo_id FROM photo_tags WHERE tag_id = 2) ORDER BY photos.time
[Debug 15:52:27.355] Query took 0.945111s
[Info 15:52:28.516] Starting BeagleService
[Debug 15:52:28.517] BeagleService startup took 2.4E-05s
[Debug 15:52:28.716] Calculating Hash 1...

Memory usage doesn't seem to have been affected by the latest patch:
1500 hashes, 197MB
2000 hashes, 239MB
2500 hashes, 304MB
3000 hashes, 368MB
3500 hashes, 435MB

That's roughly 238MB per 2000 hashes, ~119kB for each hash. Is there a pixbuf dispose missing somewhere?
Created attachment 114320 [details] [review]
Duplicates patch, this time without memory increase

Same as before, only this time it should no longer increase memory. It appeared that the Pixdata in the PixbufSerializer wasn't being freed (the api is a bit strange). On my machine, memory stays under 40mb with 500+ photos and it doesn't seem to increase over time.
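For reference, the leak-free pattern for Pixdata looks roughly like this: gtk-sharp's FromPixbuf hands back an unmanaged buffer that the caller must free explicitly (the exact code in the patch may differ).

// Gdk.Pixdata.FromPixbuf allocates native memory and returns the pointer;
// forgetting to free it is what leaked ~119kB per hashed photo.
Gdk.Pixdata pixdata = new Gdk.Pixdata ();
IntPtr raw = pixdata.FromPixbuf (pixbuf, true);
byte [] data = pixdata.Serialize ();
GLib.Marshaller.Free (raw); // the previously missing free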
Works as advertised! Total memory usage only went up 20MB (153MB->173MB) after 45k files. Thanks again for the work :)
Stephane, do you have any objections to committing this patch? Regards //Johan
No objections at all - can't wait for the plugin version of it either :)
Committed a slightly modified version of the patch in r4313. Keeping the bug open until some more testing has happened and the next version of f-spot is released.
Importing 14 CRW files with this patch just took more than an hour, as mentioned on IRC on 2008/09/08. I am reading from a CF card in a USB card reader; my photo dir resides on a NAS mounted over NFS via a gigabit link. It is definitely not network trouble or a slow USB connection, as both worked fast before. Editing pictures after the import is also as fast as before. At the first try some other processes were also occupying the CPU, but I re-tested with an idle CPU. F-Spot takes 100% CPU for a long time while calculating the md5 sums... I ran f-spot (without importing) for a couple of hours before, so all the md5 hashing in the background for the existing database should be done. As f-spot only uses very little CPU when not importing, I assume that the background hashing is already finished. Running sqlite3 ~/.gnome2/f-spot/photos.db "select count(*) from jobs" as suggested by sde on IRC returns 0. Running import tasks with the svn version from before the md5 patch also worked as usual and did not take that long.
And... it does not work. I do get duplicates: in addition to the file CRW_XXXX.CRW I now also get CRW_XXXX-1.CRW in my photo dir. It's not showing up in f-spot, but it's on disk... I deleted all but one picture on my CF card now. Selecting import, it immediately shows "1 of 1" at the progress bar and the file is copied over, but then it takes a long time (~10 mins) to do anything. It does not show a preview of the file (as it is a dupe) and does not include it in the DB, but it is on disk.
Created attachment 118346 [details] [review]
Fix for files remaining on disk

This patch should fix files being copied but not removed when they are detected as being a dupe.
I'm trying to upgrade my main db (~34k photos) and it has been running for 30 minutes straight, having done ~4000 jobs so far. Given that, shouldn't f-spot justify using the cpu for such a long time? I'm already guessing the first bug report for the next release: "f-spot taking 100% cpu". Maybe just show a dialog saying that the db needs an upgrade which may take cpu time for a while, depending on db size. What do you think?

Now two bugs:
1. I ran f-spot while my photo archive was disconnected (it's on usb). The jobs executed between f-spot startup and the usb disk connect obviously went wrong; nothing bad so far. The bad thing is that those pictures will never get their md5 hash. So shouldn't a query be run at startup resubmitting jobs for the photos without an md5? (A sketch of such a query follows below.)
2. I think you lose photo version info in src/PhotoStore:631. You use "version" but don't update that back to "photo" before the commit. In fact I have no picture in photo_versions with an md5.
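For issue 1, a startup pass could find the photos whose hashing failed with a query along these lines (md5_sum being the column this patch adds, as seen in the logs above) and resubmit a hash job for each returned id. A sketch in the same sqlite style used earlier in this bug, not the committed behaviour:

sqlite> SELECT id FROM photos WHERE md5_sum IS NULL OR md5_sum = '';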
(In reply to comment #75)
> And... it does not work. I do get duplicates: in addition to the file
> CRW_XXXX.CRW I now also get CRW_XXXX-1.CRW in my photo dir. It's not showing
> up in f-spot, but it's on disk...

I committed a fix for this in svn; can you please test?

Also I added some debug output that should give more info on the md5 summing of the pictures; could you please run the svn version with the --debug flag and paste the output here?
> I committed a fix for this in svn; can you please test?
>
> Also I added some debug output that should give more info on the md5 summing
> of the pictures; could you please run the svn version with the --debug flag
> and paste the output here?

I will attach the debug output. Your patch seemed to work: I ran f-spot without --debug and everything was OK (besides the import being painfully slow); no duplicate on disk. When I ran f-spot with --debug, I got an error message "System.NullReferenceException: Object reference not set to an instance of an object" after importing a duplicate.
Created attachment 118444 [details]
Debug output from Nils - painfully slow import + error at duplicate
Nils, could you send me and stephane your db? We promise we won't do anything with it, except fix your problem. :-) thomas dot vanmachelen at gmail dot com
Will this find existing duplicates in your db and allow their removal?
(In reply to comment #77)
> 1. I ran f-spot while my photo archive was disconnected (it's on usb). The
> jobs executed between f-spot startup and the usb disk connect obviously went
> wrong; nothing bad so far. The bad thing is that those pictures will never
> get their md5 hash. So shouldn't a query be run at startup resubmitting jobs
> for the photos without an md5?

That indeed is a problem, but it's hard to tell whether the picture was really deleted (in which case the md5 hashing would keep going on forever) or whether the photo archive was just disconnected. Need to think about it...

> 2. I think you lose photo version info in src/PhotoStore:631. You use
> "version" but don't update that back to "photo" before the commit. In fact
> I have no picture in photo_versions with an md5.

I've got a fix ready for that; it should land in svn soon.
the slow import issue is fixed
thomas, it doesn't detect dupes on import from card or camera (the CameraFileSelectionDialog)
Created attachment 118788 [details] [review]
Add duplicate detection support to camera import

Stephane, here is a version that allows you to skip duplicates when importing from camera. Can you test and report? It should not copy over any duplicate files from the camera to the photos directory, and it should properly detect photos that were imported before. The checking happens explicitly in the CameraFileSelectionDialog class, as otherwise the photos are copied first and detected as duplicates afterwards, causing the files to remain on disk.
Created attachment 118811 [details] [review]
Slightly better version of the camera import
works, commit
Could someone please confirm whether comment 88 from Stephane means that this issue has been fixed and will be in the next f-spot release? If so, should we open new bugs for any issues that are not fixed by this patch?
(In reply to comment #89)
> Could someone please confirm whether comment 88 from Stephane means that
> this issue has been fixed and will be in the next f-spot release?

Committed in r4385.

> If so, should we open new bugs for any issues that are not fixed by this
> patch?

Before filing new bugs, please test with SVN or STABLE SVN (which is 0.5.0.3 with a few other patches). Some fixes have been committed there, but there is no release yet. http://svn.gnome.org/viewvc/f-spot/branches/FSPOT_0_5_0_STABLE/
Shouldn't this bug be set to FIXED?
Done.