GNOME Bugzilla – Bug 543366
Replacement of modified items could be more efficient
Last modified: 2008-08-03 01:10:31 UTC
+++ This bug was initially created as a clone of Bug #521196 +++ > I've applied Matt's patch, it correctly detectes changes in size, caption and > tags. > > At this time no image provider implements _replace_photo, so I've made a quick > and dirty implementation in Image.py that should work for most of them. It > simply deletes the photo and uploads the modified version. Was this missing from the patch? > > I've tested this with f-spot as source and Picasa and flickr as sinks and it > seems to work well, although given the _replace_photo implementation, for every > modified photo in the source appears a deletion conflict in the sink. So a > better _replace_photo implementation should be done for every provider. > > I've found a extrange behavior of sync manager. In one way syncs, sync manager > treats added and modified items the same way, both being sent to the sink with > the overwrite flag set to False, so the photo is uploaded every time there's > some change in its tags or caption. IMHO Synchronization.py should be modified > to treat differently added and modified items. I don't know how it might affect > the providers of other data types. You mean because it may be more efficient to just update the metadata (if that makes sense), instead of just deleting the old image and adding a new one?
>> At this time no image provider implements _replace_photo, so I've made a quick >> and dirty implementation in Image.py that should work for most of them. It >> simply deletes the photo and uploads the modified version. > > Was this missing from the patch? Yes, Matt's patch just calculates the hash, based on tags, caption and the photo's size. > You mean because it may be more efficient to just update the metadata (if that > makes sense), instead of just deleting the old image and adding a new one? In my opinion the main problems regarding the replacement of modified items are these: 1. In one way sync, the sync manager shoud execute put_data with the overwrite flag set to true for the modified items, so the right replacement method would be executed. Now, in a typical scenario F-spot --> Picasa, if we want Picasa to mirror the modifications made in f-spot we are forced to set the conduit as "two ways", which is much less efficient. 2. In two ways syncs, the fact that the sink item's hash is not recorded in the mapping db, makes DeltaProvider detect non existent modifications. I explained this point in the mail I sent to the mailing list the 15 of july. 3. The DataProvider's replace method should be able to tell if the modification has been made in the photo or in the metadata. It's very inefficient to update everything (photo and metadata) just because a tag has been added. I don't know if this is possible. I'm not very experienced with the conduit architecture, so if I'm wrong please correct me.
Your Email: I've been working on bug 521196 (Conduit does not sync F-Spot photos with changed tags/captions) and I've found some problems related with the sync manager. First, in one way syncs the modified items are treated as new ones, and put to the sink with the overwrite flag set to False. Therefore in the sink, _upload_photo is executed instead of _replace_photo. So, I've been testing with a two-way sync. The patch I made implementing _replace_photo for Picasa works fine, it applies in Picasa the changes made in f-spot. But I've noticed that DeltaProvider sometimes finds non existent modifications. Consider the following two-ways sync case: F-spot (1 photo) <--> Picasa (empty) 1.- Execute Synchronize group, and the photo from f-spot is correctly uploaded to picasa. 2.- Right after the first sync I execute again Synchronize group. No change has been made in any end of the conduit but, surprisingly, a modification is detected in Picasa. The reason is that after the first sync the mapping db has something like this: sinkUID: PicasaTwoWay... sinkDataLUID: 5223... sinkDataMtime: <empty> sinkDataHash: <empty>, and the second sync prints this log: Modified: Actual:UID:5223268002402587890 mtime:2008-07-15 15:47:33 hash:1487227422 v DB:UID:5223268002402587890 mtime:None hash: mtime and hash are different so, DeltaProvider thinks there's been a modification in the photo and sent to f-spot. I think these two problems can afect the update operations of other data types.
(In reply to comment #1) > >> At this time no image provider implements _replace_photo, so I've made a quick > >> and dirty implementation in Image.py that should work for most of them. It > >> simply deletes the photo and uploads the modified version. > > > > Was this missing from the patch? > > Yes, Matt's patch just calculates the hash, based on tags, caption and the > photo's size. > > > You mean because it may be more efficient to just update the metadata (if that > > makes sense), instead of just deleting the old image and adding a new one? > > In my opinion the main problems regarding the replacement of modified items are > these: > > 1. In one way sync, the sync manager shoud execute put_data with the overwrite > flag set to true for the modified items, so the right replacement method would > be executed. Now, in a typical scenario F-spot --> Picasa, if we want Picasa to > mirror the modifications made in f-spot we are forced to set the conduit as > "two ways", which is much less efficient. I dont agree that this is the role of overwrite - it has a very specific use case in two way sync. In the one way case, once you get told to put() a photo, there is nothing stopping the data sink from inspecting the photo that has been asked to be put, and the photo that presumably already exists at the given != None LUID. put(photo, luid=123) if photo.only_metadata_different(self.get_photo(LUID)): update_photo_metadata(LUID, photo.get_metadata()) Or something. Basically, in the one way case, if the got data differs (any way, metadata or data) from the mapping DB that is all that is required to get it put() into the sink. The sink should take the most appropriate action, as the concept of *seperate* metadata and data is not general enough to go into the core i think. > > 2. In two ways syncs, the fact that the sink item's hash is not recorded in the > mapping db, I think this is a bug. makes DeltaProvider detect non existent modifications. I explained > this point in the mail I sent to the mailing list the 15 of july. > > 3. The DataProvider's replace method should be able to tell if the modification > has been made in the photo or in the metadata. It's very inefficient to update > everything (photo and metadata) just because a tag has been added. I don't know > if this is possible. This capability is partly a function of Photo.py and partly depends on the capabilities of the datasink (answer above) > > I'm not very experienced with the conduit architecture, so if I'm wrong please > correct me. It sounds about right. If you could help in tracking down the bug that results in the hash not getting stored in the DB then this would be great.
(In reply to comment #2) > Your Email: > > I've been working on bug 521196 (Conduit does not sync F-Spot photos > with changed tags/captions) and I've found some problems related with > the sync manager. > > First, in one way syncs the modified items are treated as new ones, > and put to the sink with the overwrite flag set to False. Therefore in > the sink, _upload_photo is executed instead of _replace_photo. > > So, I've been testing with a two-way sync. The patch I made > implementing _replace_photo for Picasa works fine, it applies in > Picasa the changes made in f-spot. > > But I've noticed that DeltaProvider sometimes finds non existent > modifications. Consider the following two-ways sync case: > > F-spot (1 photo) <--> Picasa (empty) > > 1.- Execute Synchronize group, and the photo from f-spot is correctly > uploaded to picasa. > > 2.- Right after the first sync I execute again Synchronize group. No > change has been made in any end of the conduit but, surprisingly, a > modification is detected in Picasa. The reason is that after the first > sync the mapping db has something like this: > > sinkUID: PicasaTwoWay... sinkDataLUID: 5223... sinkDataMtime: <empty> > sinkDataHash: <empty>, > > and the second sync prints this log: > > Modified: Actual:UID:5223268002402587890 mtime:2008-07-15 15:47:33 > hash:1487227422 v > DB:UID:5223268002402587890 mtime:None hash: > > mtime and hash are different so, DeltaProvider thinks there's been a > modification in the photo and sent to f-spot. Do you think this is a bug in the photo/image class, or in the DB code. I believe the DB code is quite well tested, so I am inclined to think it might be somewhere else. > > I think these two problems can afect the update operations of other data types. >
> Do you think this is a bug in the photo/image class, or in the DB code. I > believe the DB code is quite well tested, so I am inclined to think it might be > somewhere else. The problem is in the data providers. When they upload an item, the "put" method returns something like this: Rid(uid=photoId) So, when uploading, "uid" is the only info stored in the mapping database for that item. But when DeltaProvider invokes the dataprovider's "get" for the same item, a Photo object is returned, whose Rid have: uid, hash, and in some dataproviders, mtime. So the Photo's Rid and the Rid stored in the database are different and a inexistent modification is detected. I think this is a general problem in the Photos dataproviders. I don't know if this may happen with other data types.
(In reply to comment #5) > > Do you think this is a bug in the photo/image class, or in the DB code. I > > believe the DB code is quite well tested, so I am inclined to think it might be > > somewhere else. > > The problem is in the data providers. When they upload an item, the "put" > method returns something like this: Rid(uid=photoId) > > So, when uploading, "uid" is the only info stored in the mapping database for > that item. > > But when DeltaProvider invokes the dataprovider's "get" for the same item, a > Photo object is returned, whose Rid have: uid, hash, and in some dataproviders, > mtime. So the Photo's Rid and the Rid stored in the database are different and > a inexistent modification is detected. > > I think this is a general problem in the Photos dataproviders. I don't know if > this may happen with other data types. This is clearly the problem then. Other dataproviders certainly return more than just the uid in the Rid. According to http://www.conduit-project.org/wiki/WritingADataProvider One should be very careful to return Rid()s with correct and consistant information between get and put. Because of how the photo dataproviders are structured, it looks like a we hardcode returning just the Rid in at least one case. The other derived image dataproviders should just return more information in the rid from _upload_photot and _replace_photo.
> According to http://www.conduit-project.org/wiki/WritingADataProvider One > should be very careful to return Rid()s with correct and consistant >information between get and put. I think for some dataproviders can be difficult to figure out the photo's mtime or hash, because the photo service (picasa, flickr...) might transform the original info (not allowing blanks or special characters in the tags, resizing the picture, setting a new mtime, etc.). So the safest (but inefficient) way of returning the right Rid in _upload_photo would be something like this: def _upload_photo(self, uploadInfo): ..... upload the photo LUID=uploaded id self.get_all() return self.get(LUID).get_rid() This way, get and put always return the same Rid. And nothing had to be changed even if we later decided to modify the hashing function. Any suggestions for a better solution, or for a more inefficient way of implementing this one?
(In reply to comment #7) > > According to http://www.conduit-project.org/wiki/WritingADataProvider One > > should be very careful to return Rid()s with correct and consistant >information between get and put. > > I think for some dataproviders can be difficult to figure out the photo's mtime > or hash, because the photo service (picasa, flickr...) might transform the > original info (not allowing blanks or special characters in the tags, resizing > the picture, setting a new mtime, etc.). So the safest (but inefficient) way of > returning the right Rid in _upload_photo would be something like this: I agree. > > def _upload_photo(self, uploadInfo): > ..... > upload the photo > LUID=uploaded id > self.get_all() > return self.get(LUID).get_rid() > > This way, get and put always return the same Rid. And nothing had to be changed > even if we later decided to modify the hashing function. > > Any suggestions for a better solution, or for a more inefficient way of > implementing this one? The only problem with that implementation is that it requires a two-way photo dataprovider (to implement get). Im not sure if the solution to this is to just do the work to make this happen, or to hide it behind another level of abstraction def _upload_photo(self, uploadInfo): ..... #upload the photo LUID=uploaded id return self._get_rig(LUID) def _get_rid(LUID): #default inefficient implementation return self.get(LUID).get_rid() We might also run into trouble if there is a delay between a photo being uploaded to a service, and it being available for get()
> > 1. In one way sync, the sync manager shoud execute put_data with the overwrite > > flag set to true for the modified items, so the right replacement method would > > be executed. > > I dont agree that this is the role of overwrite - it has a very specific use > case in two way sync. In the one way case, once you get told to put() a photo, > there is nothing stopping the data sink from inspecting the photo that has been > asked to be put, and the photo that presumably already exists at the given != > None LUID. > OK, you're right. I was confused because in one-way syncs I didn't got the photos in the sink updated when I made modifications in the source's metadata. I've found the reason is the Photo's comparison are based on the mtime and size (compare is inherited from File). But at this moment I think Picasa is the only provider that sets mtime, and if in the source we've only modified metadata, the photo's size in source and sink are the same. So,in ImageTwoWays.put, compare() returns COMPARISON_EQUAL and the photo in the sink is not replaced. So I think we should implement a compare() method for Photo, that should take into account the photo's metadata.
(In reply to comment #9) > > > 1. In one way sync, the sync manager shoud execute put_data with the overwrite > > > flag set to true for the modified items, so the right replacement method would > > > be executed. > > > > I dont agree that this is the role of overwrite - it has a very specific use > > case in two way sync. In the one way case, once you get told to put() a photo, > > there is nothing stopping the data sink from inspecting the photo that has been > > asked to be put, and the photo that presumably already exists at the given != > > None LUID. > > > > OK, you're right. I was confused because in one-way syncs I didn't got the > photos in the sink updated when I made modifications in the source's metadata. > I've found the reason is the Photo's comparison are based on the mtime and size > (compare is inherited from File). But at this moment I think Picasa is the only > provider that sets mtime, and if in the source we've only modified metadata, > the photo's size in source and sink are the same. So,in ImageTwoWays.put, > compare() returns COMPARISON_EQUAL and the photo in the sink is not replaced. > > So I think we should implement a compare() method for Photo, that should take > into account the photo's metadata. I agree. This sounds sensible
Created attachment 115626 [details] [review] Implementation for Phoro.compare() and ImageTwoWay.put This patch does: * implement the "compare" method for Photos, comparison is based on mtime and hash * implement the "put" method for ImageTwoWay. It uses Photo.compare to compare photos (ImageSink uses File.compare because it doesn't have a "get" method) * modify ImageSink._replace_photo() in order to return a Rid, (as _upload_photo does) rather than id.
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.