GNOME Bugzilla – Bug 697278
sqlite vfs + sync operation
Last modified: 2019-11-13 09:24:31 UTC
I ran into e-sqlite3-vfs.c while debugging a mysterious sqlite "database disk image is malformed" problem: when using an sqlite compiled with --enable-debug, sqlite aborts because it is called incorrectly by EDS. This was reported before and fixed in similar code in Camel (bug #688926). After looking at e-sqlite3-vfs.c I've come to the conclusion that it has more serious problems. From the commit message of a patch addressing these problems:

When closing an sqlite file, EDS unconditionally called the original sqlite sync operation, even if no sync had been requested. This triggers a fatal assert when sqlite is compiled with --enable-debug. At first glance it seems to have no negative effect otherwise. It was reported before and fixed in Camel (BGO #688926).

More worrying is that the EDS code for syncing while closing the file also drops the flags if there was a previous, pending sync request:
- sync_push_request copies the flags to SyncRequestData and clears them in ESqlite3File.
- e_sqlite3_file_xClose cancels that pending request and calls sync_push_request, which then runs with empty flags.
This happens in practice when writing and then quickly closing the address book.

Furthermore, the intended logic of accumulating sync flags with a simple binary "or" is dangerous. Suppose xSync is called with SQLITE_SYNC_NORMAL, followed by a second xSync with SQLITE_SYNC_NORMAL|SQLITE_SYNC_DATAONLY. Simply or-ing the flags leads to a single SQLITE_SYNC_NORMAL|SQLITE_SYNC_DATAONLY, which is weaker than the original sequence of two syncs (see the sketch at the end of this comment).

If g_thread_pool_push fails, the code incorrectly assumes that the operation was not pushed, whereas the GThreadPool documentation says that the error merely indicates a failure to create more threads, while the data is still added to the queue. If this error happens, the sqlite file may get closed while there is still a thread pool operation pending, which will then probably crash once it runs, due to stale pointers.

Finally, there is a race condition:
- sync_push_request_timeout is called and pushes the sync operation into the thread pool.
- xClose is called, skips cancelling the timeout because there is none, pushes a second sync and waits for that one to complete. Then it returns and the file gets freed.
- The older sync operation runs (or continues to run) with stale pointers.
This can happen regardless of whether GThreadPool is first-in/first-out (which is not guaranteed explicitly, only implied by "list of tasks"), because the initial sync might run longer than the second one.
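To make the flag-accumulation problem concrete, here is a minimal standalone sketch (not EDS code; only the SQLITE_SYNC_* constants from sqlite3.h are real):

#include <sqlite3.h>
#include <stdio.h>

int
main (void)
{
    int first  = SQLITE_SYNC_NORMAL;                        /* sync data + metadata */
    int second = SQLITE_SYNC_NORMAL | SQLITE_SYNC_DATAONLY; /* sync data only */

    /* EDS-style accumulation: a plain binary "or". */
    int accumulated = first | second;

    /* DATAONLY is now set, so a single deferred sync with these flags
     * would skip the metadata sync that the first request asked for. */
    printf ("DATAONLY set in accumulated flags: %s\n",
            (accumulated & SQLITE_SYNC_DATAONLY) ? "yes" : "no");

    return 0;
}

A correct merge would have to keep SQLITE_SYNC_DATAONLY only when every pending request had it set.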
Created attachment 240620 [details] [review]
rewrite logic for syncing data to disk

At this time I am only looking for confirmation that these problems exist and whether the patch makes sense in general. It passes several of the SyncEvolution tests, but I have not run all of them yet (still in progress).
mbarnes asked whether the whole vfs approach is still needed. I'm not sure; we should test. As far as I understand it, the code has two purposes:
1. Run the sync operation in a thread, outside of the main event loop.
2. Rate-limit the number of syncs.
The first point may be moot if the writing itself already runs in a separate thread; right now, using the thread is necessary because the delayed sync would otherwise run in the main thread.

The sqlite way of getting better performance at the expense of less safety is PRAGMA synchronous = NORMAL:
http://www.sqlite.org/pragma.html#pragma_synchronous
The sqlite backend does not seem to use that. It definitely should be added, because we are not going to execute all the extra sync calls that the default triggers anyway.

With PRAGMA synchronous = NORMAL added, a performance comparison should be done between EDS with and with vfs (or perhaps easier, a vfs that doesn't wrap xSync). That'll give us some data to decide whether the vfs really serves a useful purpose.
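For reference, enabling that pragma from C is a one-liner; a minimal sketch against a plain sqlite3 handle (the function name is made up, error handling elided):

#include <sqlite3.h>

static int
set_synchronous_normal (sqlite3 *db)
{
    /* NORMAL syncs only at the most critical moments; the sqlite
     * default, FULL, syncs after every transaction. */
    return sqlite3_exec (db, "PRAGMA synchronous = NORMAL;",
                         NULL, NULL, NULL);
}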
(In reply to comment #2)
> With PRAGMA synchronous = NORMAL added, a performance comparison should be done
> between EDS with and with vfs

"with and without"
Note that in the context of Evolution, GIO itself insists on syncing every time we write something to disk through the local file backend ("file:///"). As I understand it, syncing is usually per-device, not per-file. Which leads me to wonder if GIO's behavior is negating whatever performance boost we may think we attained (or perhaps actually did attain) through these sqlite hacks.
(In reply to comment #4) > Note that in the context of Evolution, GIO itself insists on syncing every time > we write something to disk through the local file backend ("file:///"). But does reading/writing sqlite use GIO? I don't think so. The way I understand the def_subclassed/use_subclassed macros, only xOpen/Close/Sync do something special, and that doesn't involve GIO. All of the actual file access uses the default Unix support in sqlite.
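For readers unfamiliar with the mechanism, the wrapping pattern looks roughly like this (a simplified sketch, not the actual EDS struct or macros; only xSync is shown):

#include <sqlite3.h>

typedef struct {
    sqlite3_file parent;        /* must come first; this is what sqlite sees */
    sqlite3_file *old_vfs_file; /* file handle of the wrapped default VFS */
} ESqlite3FileSketch;

static int
sketch_xSync (sqlite3_file *pFile,
              int flags)
{
    ESqlite3FileSketch *cFile = (ESqlite3FileSketch *) pFile;

    /* EDS records 'flags' and schedules a delayed sync here; this
     * sketch just forwards to the default Unix implementation. */
    return cFile->old_vfs_file->pMethods->xSync (cFile->old_vfs_file, flags);
}

Methods that are not overridden delegate in exactly the same way, which, as I read it, is what the def_subclassed/use_subclassed macros generate.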
Correct, sqlite does not use GIO. My point was that GIO ends up syncing the filesystem fairly frequently, so I'm not sure how much good it does to limit how frequently sqlite does it.
Review of attachment 240620 [details] [review]:

OK, I'd say the patch looks good, but I cannot apply it to git master; none of the 6 hunks apply there. Could you check the 'wait_for_finish' thing and update the patch so it'll be applicable to the git master and gnome-3-8 branches of eds, please? Then I'll reread your changes in the code, even though I'm fine with the change as such.

::: libebackend/e-sqlite3-vfs.c
@@ +91,3 @@
+sync_push_request_locked (ESqlite3File *cFile,
+                          gint flags,
+                          gboolean wait_for_finish)

The 'wait_for_finish' argument is always FALSE, thus it can be removed, or used differently.

@@ +129,3 @@
+    if (error)
+        g_error_free (error);

I found g_clear_error() convenient.
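On the g_clear_error() suggestion, the pattern would look like this (a sketch; the function name is illustrative):

#include <glib.h>

static void
push_sync_request (GThreadPool *pool,
                   gpointer data)
{
    GError *error = NULL;

    /* Per the GThreadPool docs, an error here only means no new
     * thread could be created; the task is still queued. */
    g_thread_pool_push (pool, data, &error);

    if (error)
        g_warning ("%s: %s", G_STRFUNC, error->message);

    /* Frees the error (if set) and resets the pointer to NULL in one
     * call, replacing the "if (error) g_error_free (error);" pair
     * from the patch. */
    g_clear_error (&error);
}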
Created attachment 243380 [details] [review]
patch for 3.8

This patch does *not* yet include the fixes requested in comment #7. I'm just recording it here to avoid losing it. Before pursuing this further, I want to convince myself that controlling the rate of syncs is worthwhile - in other words, compare against a run with sqlite in PRAGMA synchronous = NORMAL mode.
(In reply to comment #8)
> This patch does *not* yet include the fixes requested in comment #7. I'm just
> recording it here to avoid losing it.

Due to this I mark it just as reviewed, expecting a newer version in the future :)
Tristan, you asked about the sqlite vfs on IRC. I have this issue open about that.
I've made a small test to compare the performance of sqlite with and without the vfs. During the tests I created 1000 contacts and measured the time required to execute the COMMIT command. Additionally, I counted the xSync calls made by sqlite when using the vfs and checked what actually triggered each sync operation (whether it was done by a request from the thread pool or forced by an xClose call).

Results showed that with the vfs, performance is about 6 times better than without it, even when using PRAGMA synchronous = NORMAL; it is actually closer to PRAGMA synchronous = OFF. I'm not giving exact numbers as this may vary from machine to machine.

During execution of COMMIT, sqlite uses 3 files - contacts.db, contacts.db-journal and a temporary db - and tries to sync all of them at the end of the COMMIT. The journal and the temporary db are closed at the end of the COMMIT, so their syncs are forced by the xClose call when using the vfs, but the sync of contacts.db is deferred and the COMMIT finishes quicker - this is why the performance boost is so big.

When contacts.db gets synced depends on the use case: if an EBookClient was created just to add or modify contacts and is closed right after that, the sync is forced by xClose, as contacts.db is closed; if the EBookClient isn't closed, the sync is processed by a thread from the thread pool. (Currently the sync is deferred by 5 seconds, which may be dangerous: if there is e.g. a power failure in that time, data may be lost.)
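For context, the measurement had roughly this shape (a standalone sketch with an illustrative schema, not the actual test code):

#include <sqlite3.h>
#include <glib.h>
#include <stdio.h>

int
main (void)
{
    sqlite3 *db;
    gint64 start;
    gint i;

    sqlite3_open ("contacts.db", &db);
    sqlite3_exec (db,
        "CREATE TABLE IF NOT EXISTS contacts (uid TEXT, vcard TEXT);",
        NULL, NULL, NULL);

    sqlite3_exec (db, "BEGIN;", NULL, NULL, NULL);
    for (i = 0; i < 1000; i++) {
        gchar *sql = g_strdup_printf (
            "INSERT INTO contacts VALUES ('uid-%d', 'vcard-%d');", i, i);
        sqlite3_exec (db, sql, NULL, NULL, NULL);
        g_free (sql);
    }

    /* COMMIT is where sqlite syncs the db, the journal and any
     * temporary file, so this is the interesting interval. */
    start = g_get_monotonic_time ();
    sqlite3_exec (db, "COMMIT;", NULL, NULL, NULL);
    printf ("COMMIT took %" G_GINT64_FORMAT " us\n",
            g_get_monotonic_time () - start);

    sqlite3_close (db);
    return 0;
}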
When used as storage for cached phone address books, SyncEvolution will add contacts retrieved via PBAP to EDS in batches. When you say "creating 1000 contacts", was that one contact at a time or in batches? If it was one at a time, then please redo the experiment with batches.

Regarding data loss, can you estimate, or determine experimentally, what the impact of power loss will be in the period between adding a contact and syncing? Is merely the new data lost, or is the entire sqlite DB corrupted?

My (preliminary) conclusion is that the delayed sync is useful as a workaround for some pathological cases (importing many contacts one-by-one); a better solution would be to fix the apps doing that, or accept that it'll be slow. If that's not acceptable upstream, then I would prefer to have a source configuration option: sync = off/delayed/normal/full (or something like this). Then system builders can choose the optimal behavior for their system.
(In reply to comment #12)
> When used as storage for cached phone address books, SyncEvolution will add
> contacts retrieved via PBAP to EDS in batches. When you say "creating 1000
> contacts", was that one contact at a time or in batches?

I was adding in batches (e_book_client_add_contacts_sync).

> Regarding data loss, can you estimate, or determine experimentally, what the
> impact of power loss will be in the period between adding a contact and
> syncing? Is merely the new data lost, or is the entire sqlite DB corrupted?

I will try to reproduce such a case and see what happens.
(In reply to comment #12)
> My (preliminary) conclusion is that the delayed sync is useful as a workaround
> for some pathological cases (importing many contacts one-by-one); a better
> solution would be to fix the apps doing that, or accept that it'll be slow.

One of the reasons was that the sync also calls fsync, which on some filesystems (I think it was ext3) could flush the pending in-memory data from all applications to disk, stalling the whole system, sometimes for several seconds. If I recall correctly, it was also flushing firefox caches to disk.
Is Patrick’s patch still relevant 5 years later?
I guess no. The code itself also changed; the patch doesn't apply. And as there have been no other reports, it seems to be "fine" in real-life usage. Let's assume it is. Feel free to reopen if I'm wrong. Thanks in advance.