GNOME Bugzilla – Bug 652643
Add PersonaStore cache
Last modified: 2011-08-01 18:22:30 UTC
As discussed at the hackfest, we need to add a cache layer in libfolks which can be used by backends which may go offline. There should be a cache per PersonaStore, which just caches Personas from that store. Each cache should be a single file with atomic commits. It can then just be updated right before going offline, or similar. If the PersonaStore is then used again while the account is offline, it can read from the cache instead of from Tp.
Created attachment 190140 [details] [review] First attempt at a caching layer in Tpf.PersonaStore http://cgit.collabora.com/git/user/pwith/folks/log/?h=652643-tp-contacts-caching Here's a first attempt, pushed to Bugzilla really fast because Bastien wants to go to the pub. I've tested it lightly, and it doesn't crash.
Does this cache solves bug #648922 by any chance?
(In reply to comment #2) > Does this cache solves bug #648922 by any chance? As it stands, the branch caches the URI of each Persona's cached avatar in the Telepathy avatar cache, so I think it would solve bug #648922 if Telepathy has cached the Persona's avatar and notified folks of the cache file URI by the time the user restarts Empathy. I haven't tested this though.
*** Bug 648922 has been marked as a duplicate of this bug. ***
(mass changing milestones)
Review of attachment 190140 [details] [review]: general ======= * needed to update the (nullability of) Trf.avatar and Swf.avatar. Fixed in my branch. * there was a fairly serious typo in ObjectCache.load_objects() where it was checking the length of 'variant' but iterating over 'objects_variant'. Fixed in my branch * the docs need to note that Tpf.Persona is no longer guaranteed to have a valid TpContact (since the cached personas don't have them) * have you considered the performance characteristics of going through GVariant/GFile for the cache? Considering we're only caching several-strings'-worth of data per Persona, I imagine it should be OK, but we obviously don't want to have to change the format in 6 months if we discover a problem we can't fix in-place. I think a reasonable scope would be several accounts with about a thousand contacts each. This could make a good test. * Folks clients will need to no longer assume that all Tpf.Personas have a valid TpContact at all times. I've created branches for gnome-contacts and empathy to fix this, but we should clarify it in the docs (and note it in the NEWS entry for this change). object-cache.vala ================= + protected abstract VariantType get_serialised_object_type (); Shouldn't this be a property? notify::serialised-object-type won't be used, but it seems preferable to explicitly creating a get_*() function. + // Unpack the stored data + var version = variant.get_child_value (0).get_byte (); Before this section, we should validate that variant has the number of children we expect in the same way we check is_normal_form(). trf-persona-store.vala ====================== Why aren't these async? * _store_cache() * _load_cache() They're dependent upon I/O, so they could take a Very Long Time. We would have to drop or serialize multiple instances to prevent them from modifying the cache file at the same time, but that should be workable. tpf-persona-store-cache.vala ============================ + * Each {@link Tpf.Persona} is stored as a serialised {@link Variant} which is + * a tuple containing the following fields: + * # UID (`s`) + * # IID (`s`) + * # IM address (`s`) + * # Protocol (`s`) + * # Set of group names (`as`) + * # Favourite? (`b`) + * # Alias (`s`) + * # In contact list? (`b`) + * # Avatar file URI (`s`) It seems like this class should have a version number of its own (like ObjectCache._FILE_FORMAT_VERSION, but distinct), since it clearly could change as we add more fields to the cache. We might want to do this through an abstract int in ObjectCache (implemented in each derived class), so OC itself could contain the logic for validating and auto-deleting old caches. + foreach (var protocol in persona.im_addresses.get_keys ()) + { + im_protocol = protocol; + break; + } This is a little odd - why not just get the first key explicitly? + new Variant.string (persona.display_id), + new Variant.string (im_protocol), ... + var id = variant.get_child_value (2).get_string (); + var protocol = variant.get_child_value (3).get_string (); Please use the same identifiers in both places for consistency.
Created attachment 190892 [details] [review] Tpf.PersonaStore cache (with some fixes) Patches for branch: http://cgit.collabora.com/git/user/treitter/folks.git/log/?h=652643-tp-contacts-caching-3
We should also add a handful of tests before merging this. Caching can be a nightmare if it's buggy. In particular, I think we should add these cases: * storing and loading thousands of contacts within a reasonable amount of time (eg, pick some maximum time value and make that a hard requirement). * successfully identifying and removing hand-corrupted cache files * identifying and removing version mismatched cache files, including cases: * PersonaStore cache version < supported version * PersonaStore cache version > supported version * ObjectCache version < supported version * ObjectCache version > supported version
Review of attachment 190892 [details] [review]: We should either put the cache files under .cache/folks/<backend name> (my preference) or use the full PersonaStore uid for the file path, not just the iid.
(In reply to comment #6) > Review of attachment 190140 [details] [review]: > > general > ======= > > * needed to update the (nullability of) Trf.avatar and Swf.avatar. Fixed in my > branch. > > * there was a fairly serious typo in ObjectCache.load_objects() where it was > checking the length of 'variant' but iterating over 'objects_variant'. Fixed > in my branch > > * the docs need to note that Tpf.Persona is no longer guaranteed to have a > valid > TpContact (since the cached personas don't have them) All good catches. Thanks. > * have you considered the performance characteristics of going through > GVariant/GFile for the cache? Considering we're only caching > several-strings'-worth of data per Persona, I imagine it should be OK, but we > obviously don't want to have to change the format in 6 months if we discover > a > problem we can't fix in-place. This is the first time I've used GVariant files for storing data. The documentation for GVariant boasts about how fast it is, and I didn't notice it being slow, but that was with only a small data set. IIRC my cache file was ~10KiB in size, and I guess most of that was avatars. Changing the format should be fairly painless, since it's versioned. That was the intention, anyway. Is there some way this won't work? > I think a reasonable scope would be several accounts with about a thousand > contacts each. This could make a good test. Agreed. > * Folks clients will need to no longer assume that all Tpf.Personas have a > valid > TpContact at all times. I've created branches for gnome-contacts and empathy > to fix this, but we should clarify it in the docs (and note it in the NEWS > entry for this change). Nice work. Also agreed about the docs changes. > object-cache.vala > ================= > > + protected abstract VariantType get_serialised_object_type (); > > Shouldn't this be a property? notify::serialised-object-type won't be used, but > it seems preferable to explicitly creating a get_*() function. I thought about this briefly, but decided to use a method instead because it should be returning a constant, static value. I've always felt icky about using properties for stuff like that, since as you say, none of the property-like properties of the properties (such as notification and a setter) would ever be useful. > + // Unpack the stored data > + var version = variant.get_child_value (0).get_byte (); > > Before this section, we should validate that variant has the number of children > we expect in the same way we check is_normal_form(). Thinking about this a little more, I mucked up the version checking completely. The version number needs to apply to the format of the serialised objects as well as to the wrapper format, which it doesn't currently do. The code which deserialises the wrapper needs to check the version field before attempting to deserialise the rest of the wrapper (let alone the objects), as otherwise it'll use the wrong GVariantType to attempt to deserialise the wrapper. > trf-persona-store.vala > ====================== > > Why aren't these async? > > * _store_cache() > * _load_cache() > > They're dependent upon I/O, so they could take a Very Long Time. We would have > to drop or serialize multiple instances to prevent them from modifying the > cache file at the same time, but that should be workable. They're currently pseudo-async, in that they .begin() an async function and then return. However, looking at the places they're called from again I'm surprised that everything didn't explode spectacularly when I ran it. There are race conditions all over the place because of this. They should be made properly async. I think I didn't make them properly async in the first place because they were called from signal handlers or something equally brain-dead. > tpf-persona-store-cache.vala > ============================ > > + * Each {@link Tpf.Persona} is stored as a serialised {@link Variant} which is > + * a tuple containing the following fields: > + * # UID (`s`) > + * # IID (`s`) > + * # IM address (`s`) > + * # Protocol (`s`) > + * # Set of group names (`as`) > + * # Favourite? (`b`) > + * # Alias (`s`) > + * # In contact list? (`b`) > + * # Avatar file URI (`s`) > > It seems like this class should have a version number of its own (like > ObjectCache._FILE_FORMAT_VERSION, but distinct), since it clearly could change > as we add more fields to the cache. We might want to do this through an > abstract int in ObjectCache (implemented in each derived class), so OC itself > could contain the logic for validating and auto-deleting old caches. Yeah, as I said above. You're ahead of me. > + foreach (var protocol in persona.im_addresses.get_keys ()) > + { > + im_protocol = protocol; > + break; > + } > > This is a little odd - why not just get the first key explicitly? I couldn't see a way to do this with a MultiMap. > + new Variant.string (persona.display_id), > + new Variant.string (im_protocol), > ... > + var id = variant.get_child_value (2).get_string (); > + var protocol = variant.get_child_value (3).get_string (); > > Please use the same identifiers in both places for consistency. Good point. This code got a bit confused because I changed the object file format a few times.
Created attachment 191127 [details] [review] Tpf.PersonaStore cache (updated again) https://www.gitorious.org/folks/folks/commits/652643-tp-contacts-caching-3 Here's a branch which has been reworked quite a bit to fix the versioning issues, as well as all the other problems listed above. I've also added a test suite for the ObjectCache, but there aren't any tests for Tpf.PersonaStoreCache or its behaviour yet. Similarly, I haven't had a chance to test the updated code against anything except the new test suite.
(Also, I've just found out about `git commit --fixup=…` and started using it, which is why my style of fixup commit messages has changed.)
Any chance of this getting reviewed soon?
(In reply to comment #13) > Any chance of this getting reviewed soon? Sorry, I think I was waiting on this: > I've also added a test suite for the ObjectCache, but there aren't any tests > for Tpf.PersonaStoreCache or its behaviour yet. Similarly, I haven't had a > chance to test the updated code against anything except the new test suite. ...then lost track of it.
Review of attachment 191127 [details] [review]: Looks good. I especially like the test cases that are included in this.
I was doing some final testing when getting ready to commit this, and found that it introduces some refcounting problems. I'm still working through solving these, and I'll commit once they're fixed.
Comment on attachment 191127 [details] [review] Tpf.PersonaStore cache (updated again) I think I fixed the problems, but it'll need some more thorough real-life testing, since I've gone and touched the aggregator again. (Always a bad plan.) Ignore the “needs testing” note in the commit message for 67cf88527d96fcc345c18d5c3bf974f7fb11ae48; I forgot to remove it before pushing. :-( commit 7af3da6ca4342e1f951e29962bf63ccebcb69213 Author: Philip Withnall <philip@tecnocode.co.uk> Date: Wed Jul 27 23:44:43 2011 +0100 inspect: Add a linking command commit e532982f321ff623a476d6c211eba2895dc14199 Author: Philip Withnall <philip@tecnocode.co.uk> Date: Wed Jul 27 23:43:14 2011 +0100 individual: Fix linking when personas are moved between individuals weirdly It's possible (likely, even) for a set of personas to move between two individuals during linking. Previously, bad things would happen to persona.individual if the personas weren't correctly removed from the first individual before being added to the second (e.g. if they were added to the second individual before being removed from the first). This fixes that. commit f1e44427cfcda0a61dd0024deaf66abf0e8d98cf Author: Philip Withnall <philip@tecnocode.co.uk> Date: Tue Jul 26 22:56:44 2011 +0100 tests: Add a basic refcounting test for linked individuals commit 512bf8aff81106e3be92cb532079a3ef5f56c23a Author: Philip Withnall <philip@tecnocode.co.uk> Date: Tue Jul 26 22:32:51 2011 +0100 individual: Add more debug output to Individual commit 67cf88527d96fcc345c18d5c3bf974f7fb11ae48 Author: Philip Withnall <philip@tecnocode.co.uk> Date: Mon Jul 25 22:58:15 2011 +0100 aggregator: Ensure we don't remove Personas we're adding If a two Personas with the same UID are in the added and removed sets for a single call to personas_changed_cb() in the individual aggregator, make sure that we remove the old instance and replace it with the new instance, instead of just removing both instances. This covers the case where the real instance of a (Tpf.) Persona disappears and is immediately replaced by its cached copy. NOTE: This needs more testing. commit 5b90ebe8fde8229f30f56ea78feff17cea70ed34 Author: Philip Withnall <philip@tecnocode.co.uk> Date: Thu Jun 16 18:35:52 2011 +0100 Bug 652643 — Add PersonaStore cache Use ObjectCache in Tpf.PersonaStore. Closes: bgo#652643 commit 776adae7b0e61ef5d542d992d5819bcacadce70d Author: Philip Withnall <philip@tecnocode.co.uk> Date: Thu Jun 16 18:33:26 2011 +0100 Add a generic caching object to the core of folks This adds the ObjectCache API. Helps: bgo#652643 commit f84d14b88ea1aceee2f3586593a967fcb87ef181 Author: Philip Withnall <philip@tecnocode.co.uk> Date: Thu Jun 16 18:32:06 2011 +0100 Unset the self-contact on Tpf.PersonaStore when resetting the store
It would be good to have this in a release soon to have as much testing as possible.
(In reply to comment #18) > It would be good to have this in a release soon to have as much testing as > possible. I believe Travis is planning to make a 0.6.0 release in the next day or two.