Bug 727759 - tracker:modified gets updated even if the file has not been changed
Status: RESOLVED NOTABUG
Product: tracker
Classification: Core
Component: General
Version: git master
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
Depends on:
Blocks:
 
 
Reported: 2014-04-07 15:00 UTC by Debarshi Ray
Modified: 2014-11-03 13:41 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Debarshi Ray 2014-04-07 15:00:23 UTC
See:

rishi@kolache ~$ ls -l /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf 
-rw-r--r--. 1 root root 85125 Mar 11 10:21 /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
rishi@kolache ~$ tracker-info /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf 
Querying information for entity:'/usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf'
  'urn:uuid:1e0b4a61-8224-964e-1eb5-d00b929c48dd'
Results:
...
  'tracker:modified' = '65832'
...

rishi@kolache ~$ tracker-control -f /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf 
(Re)indexing file was successful
rishi@kolache ~$ ls -l /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
-rw-r--r--. 1 root root 85125 Mar 11 10:21 /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
rishi@kolache ~$ tracker-info /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf 
Querying information for entity:'/usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf'
  'urn:uuid:1e0b4a61-8224-964e-1eb5-d00b929c48dd'
Results:
...
  'tracker:modified' = '65836'
...

rishi@kolache ~$
Comment 1 Martyn Russell 2014-05-28 08:52:11 UTC
Hi Rishi,

The tracker:modified property is an internal property in the database, are you sure you don't mean to use nfo:fileLastModified ?
Comment 2 Debarshi Ray 2014-05-28 09:25:29 UTC
The background here is that gnome-documents does a manager.index_file_async on the getting-started PDF during startup. Whenever someone does a gnome-shell search, gnome-documents is spawned to retrieve search results, which results in tracker:modified being updated. This leads grilo's tracker plugin to believe something changed, which is not necessarily true, and that affects applications that use tracker via grilo, such as gnome-music and totem.
Comment 3 Maël Lavault 2014-07-15 16:05:39 UTC
Any progress on this? I haven't been able to use totem for months. https://bugzilla.gnome.org/show_bug.cgi?id=730028
Comment 4 Philip Van Hoof 2014-09-12 13:16:24 UTC
The tracker:modified property is like a MODSEQ in IMAP's CONDSTORE.

Technically it works like this:

For each transaction, we implicitly store the current modseq on every resource involved in the transaction, and then we globally increment it.

So imagine:

INSERT {
<a1> <b1> '1' .
<a2> <b1> '2' .
}

INSERT {
<x1> <b1> '1' .
<x2> <b1> '2' .
}


It works behind the scenes like this:

modseq=some value

<a1> <b1> '1' ; tracker:modified '$modseq' .
<a2> <b1> '2' ; tracker:modified '$modseq' . 

modseq++

<x1> <b1> '1' ; tracker:modified '$modseq' .
<x2> <b1> '2' ; tracker:modified '$modseq' . 


This means that, after the first transaction, you can make a local copy of the data from our tracker store.

More transactions can then happen on our tracker store.

You know what the last modseq was because you can add max(?modseq), with ?subject tracker:modified ?modseq, to the query that fetches all the resources you want to copy locally.


You can now query just the resources whose modseq is greater than that max.

You can use the results to bring your local database up to date with a minimal delta.


You can also read up on how IMAP's CONDSTORE works: an E-mail client uses it to update the flags of the E-mail headers without downloading all E-mail headers of the mailbox again each time it selects that mailbox.
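The modseq bookkeeping described in this comment can be modeled in a few lines of Python. `ToyStore` and its `transaction` method are hypothetical; this is not tracker-store's implementation, only a sketch assuming the behaviour described above (one modseq per transaction, stamped on every resource the transaction touches, then incremented globally). The last call shows that re-inserting identical data still moves a resource's modseq, which is consistent with what the original report observed after re-indexing an unchanged file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory model of the modseq behaviour described above.

    Not Tracker's code: it only mimics what a client can observe through
    tracker:modified.
    """

    modseq: int = 1  # "modseq=some value": the starting value is arbitrary
    triples: dict = field(default_factory=dict)   # (subject, predicate) -> object
    modified: dict = field(default_factory=dict)  # subject -> tracker:modified

    def transaction(self, inserts):
        """Apply (subject, predicate, object) triples as one transaction."""
        for subject, predicate, obj in inserts:
            self.triples[(subject, predicate)] = obj
            # Every resource involved gets the current modseq, even if the
            # stored value did not actually change.
            self.modified[subject] = self.modseq
        # One global increment per transaction, not per triple.
        self.modseq += 1


store = ToyStore(modseq=100)
store.transaction([("a1", "b1", "1"), ("a2", "b1", "2")])
store.transaction([("x1", "b1", "1"), ("x2", "b1", "2")])
print(store.modified)  # {'a1': 100, 'a2': 100, 'x1': 101, 'x2': 101}

# Re-inserting identical data is still a new transaction, so a1 moves on.
store.transaction([("a1", "b1", "1")])
print(store.modified["a1"])  # 102
```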
Comment 5 Martyn Russell 2014-09-12 13:21:20 UTC
Philip is right; from what I can tell, it sounds like Grilo is not using this property in Tracker correctly.

Pending feedback from the Grilo team to be sure the usage here from them is correct.

If I hear nothing, I will reassign or mark this as not a bug.

Thanks Philip, all.
Comment 6 Debarshi Ray 2014-09-12 13:36:24 UTC
13:25 <pvanhoof> rishi, thing is that you are also not certain what we consider a change to the resource
13:25 <pvanhoof> You might not have changed the file at all, and still we might consider certain things a change to the resource
Comment 7 Juan A. Suarez Romero 2014-09-12 13:41:10 UTC
Thanks for the information.


I need to re-read it carefully to understand how to use it correctly.
Comment 8 Philip Van Hoof 2014-09-12 14:17:17 UTC
(In reply to comment #7)
> Thanks for the information.
> 
> 
> I need to re-read it carefully to understand how to use it correctly.

That would be like this:

Say you wanted to store a hashmap<subject, title> on your filesystem as title-cache.txt. You'd do this:

select ?s ?title ?modseq { ?s nie:title ?title ; tracker:modified ?modseq }

You write title-cache.txt with ?s and ?title and you write a file title-cache-modseq.txt that contains the max value of all ?modseq you saw.


Time passes and titles get changed in tracker's RDF store. Your application wants a new title-cache.txt, but it doesn't want to fetch all titles and subjects again; it only wants the ones that changed.

So it reads title-cache-modseq.txt and takes that max value of last time, let's call it $last_modseq. Now it does this:

select ?s ?title ?modseq { ?s nie:title ?title ; tracker:modified ?modseq . FILTER (?modseq > $last_modseq) }

It updates title-cache.txt with the ?s and ?title it received (it received a delta, not the complete list, due to the FILTER). And it again writes the max value of all ?modseq it received in title-cache-modseq.txt

This can go on to keep title-cache.txt up to date.

I don't think tracker:modified should be used for any other purpose than synchronization like explained above.
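The title-cache procedure above can be sketched in Python. Assumptions, all illustrative: the SPARQL query is replaced by a hypothetical `query_titles` function over an in-memory list of `(subject, title, modseq)` rows, `sync_title_cache` is a hypothetical helper, and title-cache.txt is stored as JSON. The sketch only handles added and changed titles; it does not try to detect deleted resources.

```python
import json
import os
import tempfile


def query_titles(rows, last_modseq):
    """Hypothetical stand-in for running, against tracker-store:
        select ?s ?title ?modseq { ?s nie:title ?title ; tracker:modified ?modseq .
                                   FILTER (?modseq > $last_modseq) }
    rows holds (subject, title, modseq) tuples; None means "no filter"."""
    if last_modseq is None:
        return list(rows)
    return [row for row in rows if row[2] > last_modseq]


def sync_title_cache(rows, cache_dir):
    """Bring title-cache.txt up to date and return the delta that was applied."""
    cache_path = os.path.join(cache_dir, "title-cache.txt")
    modseq_path = os.path.join(cache_dir, "title-cache-modseq.txt")

    # Load the previous state; missing files mean this is the first run.
    cache, last_modseq = {}, None
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if os.path.exists(modseq_path):
        with open(modseq_path) as f:
            last_modseq = int(f.read())

    delta = query_titles(rows, last_modseq)
    for subject, title, _ in delta:
        cache[subject] = title
    if delta:
        # Remember the highest modseq seen; keep the old one if nothing changed.
        last_modseq = max(modseq for _, _, modseq in delta)

    with open(cache_path, "w") as f:
        json.dump(cache, f)
    if last_modseq is not None:
        with open(modseq_path, "w") as f:
            f.write(str(last_modseq))
    return delta


with tempfile.TemporaryDirectory() as d:
    rows = [("urn:a", "Draft", 10), ("urn:b", "Notes", 11)]
    print(sync_title_cache(rows, d))  # first run: both rows
    rows = [("urn:a", "Final", 12), ("urn:b", "Notes", 11)]
    print(sync_title_cache(rows, d))  # [('urn:a', 'Final', 12)]
```

Note that only the stored maximum is compared, so the client never needs to know which individual triple changed, which matches how the comment above says tracker:modified is meant to be used.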
Comment 9 Bastien Nocera 2014-11-03 07:36:24 UTC
That is one seriously awful API.
Comment 10 Philip Van Hoof 2014-11-03 09:06:41 UTC
But it's also not a bug: this is intended behaviour, and we definitely don't plan to change it, because the way tracker:modified operates cannot be correlated to individual triples. It's also fairly central to how tracker-store works.

Compare it with the MODSEQ in IMAP's CONDSTORE and QRESYNC. It works in a similar way: the MODSEQ is also not pinned to the individual E-mail envelopes, and you also don't get one per individual E-mail envelope, yet you still use it to ask the server for a diff that brings your local set of envelopes up to date with minimal bandwidth and round trips.

SELECT INBOX using CONDSTORE 12345 will give you a diff of ENVELOPES and a new MODSEQ (larger than 12345) for you to use next time you do SELECT on INBOX again.

In the same way, tracker:modified is a modification sequence too: it tells you where the modification sequence was when the last change to the resource happened. You can use it to get the changes made since that modification sequence.

ps. I would have expected a bit more constructive criticism from a professional like you, Bastien.
Comment 11 Bastien Nocera 2014-11-03 09:28:21 UTC
(In reply to comment #10)
> ps. I would have expected a bit more constructive criticism from a professional
> like you, Bastien.

It is my professional opinion that the Tracker API is horrible to use for application developers. This case isn't the only one where the API is causing problems.
Comment 12 Philip Van Hoof 2014-11-03 09:36:31 UTC
This is the only way this 'API' has ever been documented, and it has worked like this since the beginning, so if this 'API' is causing problems for those users, they should probably be using nfo:fileLastModified instead. We don't plan to adapt an API (or ontology) that existing software uses correctly to accommodate its misuse. The people who misuse it should adapt by using the right methods and ontology instead (it's called ontology and not API in this case, as it's not a SPARQL function but an rdfs:Property that tracker-store manages internally).

Just as with tracker:id(), tracker:url(), tracker:added, tracker:writeback, tracker:indexed, tracker:notify, tracker:available, etc., the 'tracker:' prefix on this rdfs:Property indicates Tracker-specific ontology. So if you use it, you are doing Tracker-specific things.
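For a consumer that wants to know whether a file actually changed, the suggestion in this thread is to look at nfo:fileLastModified rather than tracker:modified. A minimal sketch of that idea follows. `changed_files` is a hypothetical helper, and `rows` stands in for the result of some query returning a file URL, its nfo:fileLastModified value and its tracker:modified value; the exact query is an assumption. The tracker:modified values reuse the numbers from the original report, while the timestamp string is illustrative.

```python
def changed_files(seen, rows):
    """Return the URLs whose nfo:fileLastModified differs from the last run.

    seen: dict of url -> last observed nfo:fileLastModified, updated in place.
    rows: iterable of (url, file_last_modified, tracker_modified) tuples,
          a hypothetical stand-in for a SPARQL query result.
    The tracker:modified column is deliberately ignored.
    """
    changed = []
    for url, file_last_modified, _tracker_modified in rows:
        if seen.get(url) != file_last_modified:
            changed.append(url)
        seen[url] = file_last_modified
    return changed


url = "file:///usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf"
seen = {}
# First sight of the file: reported as changed.
print(changed_files(seen, [(url, "2014-03-11T10:21:00", 65832)]))
# After re-indexing, tracker:modified moved but the file did not: no change.
print(changed_files(seen, [(url, "2014-03-11T10:21:00", 65836)]))
```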
Comment 13 Martyn Russell 2014-11-03 09:37:42 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > ps. I would have expected a bit more constructive criticism from a professional
> > like you, Bastien.
> 
> It is my professional opinion that the Tracker API is horrible to use for
> application developers. This case isn't the only one where the API is causing
> problems.

Just because it's an exposed functionality doesn't mean users should be using
it. There are plenty of examples of APIs around GNOME that are meant for
advanced or specific purposes and not to be used unless understood properly.

Also, if you feel we could improve this in some way, we would love to see
patches, documentation or something to help that ;)
Comment 14 Philip Van Hoof 2014-11-03 09:38:55 UTC
Discussion and recent clarification here:
https://mail.gnome.org/archives/tracker-list/2014-September/msg00023.html

Original blog post documenting the feature here:
http://pvanhoof.be/blog/index.php/2011/01/31/synchronizing-your-applications-data-with-trackers-rdf-store
Comment 15 Philip Van Hoof 2014-11-03 09:41:35 UTC
Ah, and Martyn added that explanation to the official documentation, too:
https://wiki.gnome.org/Projects/Tracker/Documentation/SparqlInternals