GNOME Bugzilla – Bug 727759
tracker:modified gets updated even if the file has not been changed
Last modified: 2014-11-03 13:41:10 UTC
See:

rishi@kolache ~$ ls -l /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
-rw-r--r--. 1 root root 85125 Mar 11 10:21 /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
rishi@kolache ~$ tracker-info /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
Querying information for entity:'/usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf'
  'urn:uuid:1e0b4a61-8224-964e-1eb5-d00b929c48dd'
Results:
  ...
  'tracker:modified' = '65832'
  ...
rishi@kolache ~$ tracker-control -f /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
(Re)indexing file was successful
rishi@kolache ~$ ls -l /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
-rw-r--r--. 1 root root 85125 Mar 11 10:21 /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
rishi@kolache ~$ tracker-info /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
Querying information for entity:'/usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf'
  'urn:uuid:1e0b4a61-8224-964e-1eb5-d00b929c48dd'
Results:
  ...
  'tracker:modified' = '65836'
  ...
rishi@kolache ~$
Hi Rishi, the tracker:modified property is an internal property in the database. Are you sure you don't mean to use nfo:fileLastModified?
The background here is that gnome-documents calls manager.index_file_async on the getting-started PDF during startup. Whenever someone does a gnome-shell search, gnome-documents is spawned to retrieve search results, which results in tracker:modified being updated. This leads grilo's tracker plugin to believe something changed, which is not necessarily true, and that affects applications using tracker via grilo, such as gnome-music and totem.
Any progress on this? I haven't been able to use totem for a month. https://bugzilla.gnome.org/show_bug.cgi?id=730028
The tracker:modified property is like a MODSEQ in IMAP's CONDSTORE. Technically it works like this: in each transaction, we implicitly store the modseq on the resources involved in the transaction, and then we globally increment it. So imagine:

INSERT { <a1> <b1> '1' . <a2> <b1> '2' . }
INSERT { <x1> <b1> '1' . <x2> <b1> '2' . }

It works behind the scenes like this:

modseq=some value
<a1> <b1> '1' ; tracker:modified '$modseq' .
<a2> <b1> '2' ; tracker:modified '$modseq' .
modseq++
<x1> <b1> '1' ; tracker:modified '$modseq' .
<x2> <b1> '2' ; tracker:modified '$modseq' .

This means that after the first transaction you can copy data from our tracker store into a local database, and then allow more transactions to happen on our tracker store. You know what the last modseq was because you can add max(tracker:updated(?subject)) to the query you use to get all the resources you wanted to copy locally. Later you can query just the resources whose modseq is greater than that max, and use the results to bring your local database up to date with a minimal delta.

You can also read how IMAP's CONDSTORE works, where it is used to update the flags of the E-mail headers in your E-mail client without having to download all E-mail headers of the mailbox again each time you select it.
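As an illustration of the stamping scheme described in this comment, here is a minimal Python sketch. It is a toy in-memory model, not Tracker's actual implementation: the ToyStore class, its method names, and the starting modseq of 1 are all assumptions made up for the example. In this toy, any write stamps the resource even when the value is unchanged, which is similar in spirit to the reindexing behaviour reported in this bug.

```python
class ToyStore:
    """Hypothetical stand-in for tracker-store; not Tracker's real code."""

    def __init__(self):
        self.modseq = 1       # global modification sequence (start value is arbitrary)
        self.resources = {}   # subject -> {"props": {...}, "modified": int}

    def transaction(self, triples):
        """Apply (subject, predicate, value) triples as one transaction."""
        for subject, predicate, value in triples:
            res = self.resources.setdefault(subject, {"props": {}, "modified": 0})
            res["props"][predicate] = value
            res["modified"] = self.modseq   # implicit stamp, like tracker:modified
        self.modseq += 1                    # incremented once per transaction

    def changed_since(self, last_modseq):
        """Subjects whose stamp is newer than last_modseq."""
        return sorted(s for s, r in self.resources.items()
                      if r["modified"] > last_modseq)


store = ToyStore()
store.transaction([("a1", "b1", "1"), ("a2", "b1", "2")])

# A client copies the data now and remembers the highest stamp it saw.
snapshot_modseq = max(r["modified"] for r in store.resources.values())

store.transaction([("x1", "b1", "1"), ("x2", "b1", "2")])
# Re-writing a1 with the same value still bumps its stamp in this toy model.
store.transaction([("a1", "b1", "1")])

delta = store.changed_since(snapshot_modseq)
print(delta)
```

The client only needs to re-fetch the subjects in delta; a2 is left out because nothing touched it after the snapshot.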
Philip is right; from what I can tell, Grilo is not using this Tracker property correctly. I am waiting for feedback from the Grilo team to be sure their usage here is correct. If I hear nothing, I will reassign or mark as not a bug. Thanks Philip, all.
13:25 <pvanhoof> rishi, thing is that you are also not certain what we consider a change to the resource
13:25 <pvanhoof> You might not have changed the file at all, and still we might consider certain things a change to the resource
Thanks for the information. I need to re-read it carefully to understand how to use it correctly.
(In reply to comment #7)
> Thanks for the information.
>
> I need to re-read it carefully to understand how to use it correctly.

That would be like this:

Say you wanted to store a hashmap<subject, title> on your filesystem as title-cache.txt. You'd do this:

select ?s ?title ?modseq { ?s nie:title ?title ; tracker:modified ?modseq }

You write title-cache.txt with ?s and ?title, and you write a file title-cache-modseq.txt that contains the max value of all the ?modseq you saw.

Time passes and titles get changed in tracker's RDF store. Your application wants a new title-cache.txt, but it doesn't want to fetch all titles and subjects; it just wants to know the ones that changed. So it reads title-cache-modseq.txt and takes the max value from last time, let's call it $last_modseq. Now it does this:

select ?s ?title ?modseq { ?s nie:title ?title ; tracker:modified ?modseq . FILTER (?modseq > $last_modseq) }

It updates title-cache.txt with the ?s and ?title it received (it received a delta, not the complete list, due to the FILTER). And it again writes the max value of all the ?modseq it received to title-cache-modseq.txt.

This can go on indefinitely to keep title-cache.txt up to date. I don't think tracker:modified should be used for any purpose other than synchronization as explained above.
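The cache update loop described in this comment could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: sync_titles is a hypothetical helper, the rows are hard-coded tuples standing in for SPARQL results (no Tracker call is made), and the cache and the remembered modseq are kept in memory rather than written to title-cache.txt and title-cache-modseq.txt.

```python
def sync_titles(rows, cache, last_modseq):
    """Merge rows newer than last_modseq into cache and return the new maximum.

    rows is an iterable of (subject, title, modseq) tuples, a stand-in for
    the result of the select query (hypothetical data, not a Tracker API).
    """
    new_max = last_modseq
    for subject, title, modseq in rows:
        if modseq > last_modseq:          # same condition as the FILTER
            cache[subject] = title
            new_max = max(new_max, modseq)
    return new_max


# First run: nothing remembered yet, so every row counts as new.
store_rows = [("urn:a", "Alpha", 10), ("urn:b", "Beta", 11)]
cache = {}
last = sync_titles(store_rows, cache, last_modseq=0)

# Later: urn:b was retitled and urn:c appeared; urn:a is untouched.
store_rows = [
    ("urn:a", "Alpha", 10),
    ("urn:b", "Beta v2", 14),
    ("urn:c", "Gamma", 15),
]
last = sync_titles(store_rows, cache, last)
print(cache, last)
```

In a real client the comparison would be done by the store through the FILTER, so only the delta crosses the wire; the toy filters locally just to stay self-contained. Removed resources are not handled by this sketch.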
That is one seriously awful API.
But it's also not a bug, as this is intended behaviour, and we definitely don't plan to change it, because the way tracker:modified operates cannot be correlated to individual triples. It's also relatively central to how tracker-store works.

Compare it with the MODSEQ in CONDSTORE and QRESYNC in IMAP. It works in a similar way: the MODSEQ is also not pinned to the individual E-mail envelopes, and you also don't get one per individual E-mail envelope, but you still use it to tell the server to give you a diff that brings your current set of envelopes up to date using minimal bandwidth and roundtrips. SELECT INBOX using CONDSTORE 12345 will give you a diff of ENVELOPES and a new MODSEQ (larger than 12345) for you to use next time you do SELECT on INBOX again.

In the same way, tracker:modified is a modification sequence too: it tells you where the modification sequence was when the last change to the resource happened. You can use it to get the changes since that modification sequence.

ps. I would have expected a bit more constructive criticism from a professional like you, Bastien.
(In reply to comment #10)
> ps. I would have expected a bit more constructive criticism from a professional
> like you, Bastien.

It is my professional opinion that the Tracker API is horrible to use for application developers. This case isn't the only one where the API is causing problems.
This is the only way this 'API' was ever documented, and it has been like this since the beginning, so if this 'API' is causing problems for those users then they should probably be using nfo:fileLastModified instead.

We don't plan to adapt an API (or ontology) that is being used correctly by existing software to the misuse of it. The people who misuse it should themselves adapt by using the right methods and ontology instead (it's called ontology and not API in this case, as it's not a SPARQL function but an rdfs:Property that tracker-store manages internally).

Just as with tracker:id(), tracker:url(), tracker:added, tracker:writeback, tracker:indexed, tracker:notify, tracker:available, etc., the 'tracker:' prefix on this rdfs:Property indicates a Tracker-specific ontology. So if you use this, you are doing Tracker-specific things.
(In reply to comment #11)
> (In reply to comment #10)
> > ps. I would have expected a bit more constructive criticism from a professional
> > like you, Bastien.
>
> It is my professional opinion that the Tracker API is horrible to use for
> application developers. This case isn't the only one where the API is causing
> problems.

Just because it's an exposed functionality doesn't mean users should be using it. There are plenty of examples of APIs around GNOME that are meant for advanced or specific purposes and not to be used unless understood properly.

Also, if you feel we could improve this in some way, we would love to see patches, documentation or something to help that ;)
Discussion and recent clarification here: https://mail.gnome.org/archives/tracker-list/2014-September/msg00023.html
Original blog post documenting the feature here: http://pvanhoof.be/blog/index.php/2011/01/31/synchronizing-your-applications-data-with-trackers-rdf-store
Ah, and Martyn added that explanation to the official documentation, too: https://wiki.gnome.org/Projects/Tracker/Documentation/SparqlInternals