GNOME Bugzilla – Bug 778649
Multiple reliability issues with history service, causing massive loss of history
Last modified: 2017-02-21 16:12:21 UTC
I noticed that snapshots aren't being saved properly. Tracked it down to a history DB problem. The thumbnail mtimes are saved in the history DB in the URLs table, and not being updated properly. Why? Well, because nothing's being updated properly. The history service is pretty much totally broken, and has been for a long time. It seems to work sometimes -- I don't know why -- for a short while, then it gets locked and stays locked and all SQLite operations fail. Most of the history you see in the history dialog is stored in memory. This sucks! I pushed a commit that causes the history DB to actually print warnings when an error occurs. You will now see a bunch of errors like this: ** (epiphany:15240): WARNING **: Could not commit idle history database transaction: database is locked ** (epiphany:15240): WARNING **: Could not start long-running history database transaction: cannot start a transaction within a transaction Since the transaction can never be committed, DB changes pile up in ephy-history.db-journal and never get moved to the DB itself. The journal gets deleted when ephy quits and all the changes disappear. Problem: **I don't know why it's getting locked.** There seems to be a bunch of different reasons why this can happen and it requires some expertise in SQLite to figure out what's going on.
(In reply to Michael Catanzaro from comment #0) > Problem: **I don't know why it's getting locked.** There seems to be a bunch > of different reasons why this can happen and it requires some expertise in > SQLite to figure out what's going on. I figured it out. It's the search provider.
The search provider sets the history service to read-only mode, but the history service does not use SQLite in read-only mode. It's therefore taking unnecessary write locks. All database transactions fail for one minute after searching for anything in the shell overview (the timeout for our search provider).
Adding some related fixes... Unfortunately the test I added is racy, and my attempt to fix that is broken for some reason, which is why I haven't pushed this yet.
Created attachment 346173 [details] [review] Fix search provider horribly breaking history service If the search provider is running, all database transactions will fail because the search provider will take a write lock on the database. Ouch! This is worth a good string of profanities.... Notably, this causes opening the database to fail if you searched for anything in the shell overview in the minute prior to starting Epiphany. (One minute is our search provider timeout.) Then no history will ever get saved, ever. I think. Something like that. So, although our history service has read-only mode, it's enforced at the history service level, not the SQLite connection level. SQLite actually has a read-only mode, which we are not using, and which we need to use in the search provider if we want to have any chance of reliably saving history. Accordingly, give EphySQLiteConnection a mode property, to indicate whether it is in writable mode or read-only mode. Teach all callers to set it properly. Use it, rather than a boolean, when creating the EphyHistoryService, since boolean parameters are hard to read at call sites. And actually put the underlying SQLite connection in read-only mode when set. Don't open transactions or ever attempt to rollback in read-only mode, because that doesn't make any sense. This should never have been happening due to the history service level read-only checks, but it should be enforced at the SQLite connection level now, too. Avoid initializing tables when opening the database in read-only mode. This is obviously writing to the database, and now that we really have a read-only SQLite connection it fails. As it should. SQLite connection creation will now fail in case the connection is read-only and the database does not yet exist; it will no longer be created anyway. So handle this case gracefully. It's fine for the history service to return nothing in this case. This has the small advantage that the history thread will quit immediately after it's created in this case, so it's not constantly running if there's no history in incognito mode anymore. To check for this condition, we expose the underlying SQLite error; previously, only the error message was exposed outside of EphySQLiteConnection. Exposing the error isn't really necessary or sufficient, though, since it's super generic and we have to check if the file actually exists on disk anyway. Test it. Ensure that a read/write history service functions properly if it's running at the same time as a read-only history service. Using two read/write services here fails very badly, but when one of the services is read-only it works fine. Also, remove the original read-only service test. It only ever tested that creating a read-only history service with an empty history database would succeed. And, as I just explained, that fails now. Lastly, stop running a second history service for the search provider. It needed its own once upon a time when the search provider did not run an EphyShell instance. That changed when we stopped using the default web context, because nothing works without EphyEmbedShell now, as all sorts of places need it to get the embed's web context. And since EphyEmbedShell runs its own history service, the search provider can just use that now instead of running its own separate one.
Created attachment 346174 [details] [review] history-service: Ensure thread member is initialized before use We have assertions to ensure that several functions are only ever called on the history thread. But the first such assertion, at the top of run_history_service_thread, sometimes fails when running the tests. It is racy. Use a mutex to fix this. These assertions are actually executed at runtime for end users, so it's surprising that nobody has ever reported a bug about this. We also need to be sure to initialize the async queue before running the history service thread. The mutex is needed as a memory barrier here, so it's not possible to remove the mutex by removing the assertions except in debug mode, which is something I considered.
Created attachment 346175 [details] [review] history-service: Fix leak when clearing all history Closing the connection is great, but not enough. We're leaking our wrapper object.
Created attachment 346176 [details] [review] history-service: Remove Yoda conditions There's no excuse for this....
Created attachment 346192 [details] [review] history-service: Don't schedule commit after clearing history Now that clear all is implemented by deleting the database file, there's no longer any need to schedule a commit here.
Created attachment 346193 [details] [review] history-service: Remove incorrect comment The code does something different, and it's not complex enough to merit a comment anyway.
Created attachment 346194 [details] [review] history-service: Fix multiple initialization race conditions This started out as a project to fix the read-only service test I just added. Initializing two history service objects in a row was racy, because I needed the first history service to be initialized before creating the second one, but there was no way to ensure that. This was only an issue for this one test, though; real Epiphany browser mode of course only creates one history service, so I assumed it was not a big problem. Fix this first issue using a condition variable to ensure the GObject initialization doesn't complete until after the history service has actually created the SQLite database. In doing this, I discovered a second bug. The use of the condition variable altered the timing slightly, and caused the history filename property to not be set in time when entering the history service thread. In fact, it's kind of amazing that the history service ever worked at all, because there is absolutely nothing here to guarantee that the filename and read-only properties have been initialized prior to starting the history service thread. So the database filename could be NULL when opening the database, which is a great way to lose all your history. Also, it could also be in read-only mode here even if it is supposed to be read/write mode, which is going to cause failures after today's commits. Fix this by adding a constructed function and starting the history thread from there, instead of doing it in init. This means that the history thread will not be started until after properties have been set. Note that, while I could not reproduce this bug on my machine until after adding the condition variable to fix the first bug, that was just due to timing and luck; it was already broken before.
Attachment 346173 [details] pushed as 9771c52 - Fix search provider horribly breaking history service Attachment 346174 [details] pushed as 594e489 - history-service: Ensure thread member is initialized before use Attachment 346175 [details] pushed as d75b14b - history-service: Fix leak when clearing all history Attachment 346176 [details] pushed as cfd9e13 - history-service: Remove Yoda conditions Attachment 346192 [details] pushed as 492b0fc - history-service: Don't schedule commit after clearing history Attachment 346193 [details] pushed as 132e08c - history-service: Remove incorrect comment Attachment 346194 [details] pushed as 2cb839c - history-service: Fix multiple initialization race conditions
The following fix has been pushed: fd19259 history-service: Fix write to database in read-only mode
Created attachment 346198 [details] [review] history-service: Fix write to database in read-only mode Now that SQLite enforces read-only mode for us, bugs like this will be uncovered....
The following fix has been pushed: 2237adc Fix theoretical race condition in ephy_history_service_add_visit_row
Created attachment 346295 [details] [review] Fix theoretical race condition in ephy_history_service_add_visit_row The design of the history service feels like one big footgun. I'm really not sure why a history thread is necessary at all, or why we have longstanding transactions (defeating the entire purpose of transactions) instead of just using autocommit, which I think would be sufficient for everything we do. This commit doesn't fix any of that. That's just a rant. This commit just fixes one theoretical race condition. Prepared statements lock the database and need to be finalized BEFORE commit. The current code only works if the prepared statement is finalized on the UI thread before the scheduled commit occurs on the history thread. Which is probably always, but let's not leave it to luck. I could see this leading to a small loss of the last bit of history when closing the browser.
(In reply to Michael Catanzaro from comment #2) > The search provider sets the history service to read-only mode, but the > history service does not use SQLite in read-only mode. It's therefore taking > unnecessary write locks. > > All database transactions fail for one minute after searching for anything > in the shell overview (the timeout for our search provider). This was seriously exacerbated by bug #753101, which explains why I have no history at all for many days. :)