After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 778649 - Multiple reliability issues with history service, causing massive loss of history
Multiple reliability issues with history service, causing massive loss of his...
Status: RESOLVED FIXED
Product: epiphany
Classification: Core
Component: History
3.23.x
Other Linux
: Normal critical
: ---
Assigned To: Epiphany Maintainers
Epiphany Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-02-15 05:59 UTC by Michael Catanzaro
Modified: 2017-02-21 16:12 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix search provider horribly breaking history service (26.50 KB, patch)
2017-02-19 05:16 UTC, Michael Catanzaro
committed Details | Review
history-service: Ensure thread member is initialized before use (3.14 KB, patch)
2017-02-19 05:16 UTC, Michael Catanzaro
committed Details | Review
history-service: Fix leak when clearing all history (1.18 KB, patch)
2017-02-19 05:16 UTC, Michael Catanzaro
committed Details | Review
history-service: Remove Yoda conditions (3.79 KB, patch)
2017-02-19 05:16 UTC, Michael Catanzaro
committed Details | Review
history-service: Don't schedule commit after clearing history (906 bytes, patch)
2017-02-19 15:55 UTC, Michael Catanzaro
committed Details | Review
history-service: Remove incorrect comment (946 bytes, patch)
2017-02-19 15:55 UTC, Michael Catanzaro
committed Details | Review
history-service: Fix multiple initialization race conditions (7.23 KB, patch)
2017-02-19 15:55 UTC, Michael Catanzaro
committed Details | Review
history-service: Fix write to database in read-only mode (1.06 KB, patch)
2017-02-19 17:06 UTC, Michael Catanzaro
committed Details | Review
Fix theoretical race condition in ephy_history_service_add_visit_row (1.89 KB, patch)
2017-02-21 02:30 UTC, Michael Catanzaro
committed Details | Review

Description Michael Catanzaro 2017-02-15 05:59:48 UTC
I noticed that snapshots aren't being saved properly. Tracked it down to a history DB problem. The thumbnail mtimes are saved in the history DB in the URLs table, and not being updated properly. Why? Well, because nothing's being updated properly. The history service is pretty much totally broken, and has been for a long time. It seems to work sometimes -- I don't know why -- for a short while, then it gets locked and stays locked and all SQLite operations fail. Most of the history you see in the history dialog is stored in memory. This sucks!

I pushed a commit that causes the history DB to actually print warnings when an error occurs. You will now see a bunch of errors like this:

** (epiphany:15240): WARNING **: Could not commit idle history database transaction: database is locked
** (epiphany:15240): WARNING **: Could not start long-running history database transaction: cannot start a transaction within a transaction

Since the transaction can never be committed, DB changes pile up in ephy-history.db-journal and never get moved to the DB itself. The journal gets deleted when ephy quits and all the changes disappear.

Problem: **I don't know why it's getting locked.** There seems to be a bunch of different reasons why this can happen and it requires some expertise in SQLite to figure out what's going on.
Comment 1 Michael Catanzaro 2017-02-18 23:10:39 UTC
(In reply to Michael Catanzaro from comment #0)
> Problem: **I don't know why it's getting locked.** There seems to be a bunch
> of different reasons why this can happen and it requires some expertise in
> SQLite to figure out what's going on.

I figured it out.

It's the search provider.
Comment 2 Michael Catanzaro 2017-02-18 23:20:20 UTC
The search provider sets the history service to read-only mode, but the history service does not use SQLite in read-only mode. It's therefore taking unnecessary write locks.

All database transactions fail for one minute after searching for anything in the shell overview (the timeout for our search provider).
Comment 3 Michael Catanzaro 2017-02-19 05:15:38 UTC
Adding some related fixes...

Unfortunately the test I added is racy, and my attempt to fix that is broken for some reason, which is why I haven't pushed this yet.
Comment 4 Michael Catanzaro 2017-02-19 05:16:15 UTC
Created attachment 346173 [details] [review]
Fix search provider horribly breaking history service

If the search provider is running, all database transactions will fail
because the search provider will take a write lock on the database.
Ouch! This is worth a good string of profanities....

Notably, this causes opening the database to fail if you searched for
anything in the shell overview in the minute prior to starting Epiphany.
(One minute is our search provider timeout.) Then no history will ever
get saved, ever. I think. Something like that.

So, although our history service has read-only mode, it's enforced at
the history service level, not the SQLite connection level. SQLite
actually has a read-only mode, which we are not using, and which we need
to use in the search provider if we want to have any chance of reliably
saving history.

Accordingly, give EphySQLiteConnection a mode property, to indicate
whether it is in writable mode or read-only mode. Teach all callers to
set it properly. Use it, rather than a boolean, when creating the
EphyHistoryService, since boolean parameters are hard to read at call
sites. And actually put the underlying SQLite connection in read-only
mode when set.

Don't open transactions or ever attempt to rollback in read-only mode,
because that doesn't make any sense. This should never have been
happening due to the history service level read-only checks, but it
should be enforced at the SQLite connection level now, too.

Avoid initializing tables when opening the database in read-only mode.
This is obviously writing to the database, and now that we really have a
read-only SQLite connection it fails. As it should.

SQLite connection creation will now fail in case the connection is
read-only and the database does not yet exist; it will no longer be
created anyway. So handle this case gracefully. It's fine for the
history service to return nothing in this case. This has the small
advantage that the history thread will quit immediately after it's
created in this case, so it's not constantly running if there's no
history in incognito mode anymore. To check for this condition, we
expose the underlying SQLite error; previously, only the error message
was exposed outside of EphySQLiteConnection. Exposing the error isn't
really necessary or sufficient, though, since it's super generic and we
have to check if the file actually exists on disk anyway.

Test it. Ensure that a read/write history service functions properly if
it's running at the same time as a read-only history service. Using two
read/write services here fails very badly, but when one of the services
is read-only it works fine.

Also, remove the original read-only service test. It only ever tested
that creating a read-only history service with an empty history database
would succeed. And, as I just explained, that fails now.

Lastly, stop running a second history service for the search provider.
It needed its own once upon a time when the search provider did not run
an EphyShell instance. That changed when we stopped using the default
web context, because nothing works without EphyEmbedShell now, as all
sorts of places need it to get the embed's web context. And since
EphyEmbedShell runs its own history service, the search provider can
just use that now instead of running its own separate one.
Comment 5 Michael Catanzaro 2017-02-19 05:16:20 UTC
Created attachment 346174 [details] [review]
history-service: Ensure thread member is initialized before use

We have assertions to ensure that several functions are only ever called
on the history thread. But the first such assertion, at the top of
run_history_service_thread, sometimes fails when running the tests. It
is racy. Use a mutex to fix this.

These assertions are actually executed at runtime for end users, so it's
surprising that nobody has ever reported a bug about this.

We also need to be sure to initialize the async queue before running the
history service thread. The mutex is needed as a memory barrier here, so
it's not possible to remove the mutex by removing the assertions except
in debug mode, which is something I considered.
Comment 6 Michael Catanzaro 2017-02-19 05:16:24 UTC
Created attachment 346175 [details] [review]
history-service: Fix leak when clearing all history

Closing the connection is great, but not enough. We're leaking our
wrapper object.
Comment 7 Michael Catanzaro 2017-02-19 05:16:27 UTC
Created attachment 346176 [details] [review]
history-service: Remove Yoda conditions

There's no excuse for this....
Comment 8 Michael Catanzaro 2017-02-19 15:55:45 UTC
Created attachment 346192 [details] [review]
history-service: Don't schedule commit after clearing history

Now that clear all is implemented by deleting the database file, there's
no longer any need to schedule a commit here.
Comment 9 Michael Catanzaro 2017-02-19 15:55:49 UTC
Created attachment 346193 [details] [review]
history-service: Remove incorrect comment

The code does something different, and it's not complex enough to merit
a comment anyway.
Comment 10 Michael Catanzaro 2017-02-19 15:55:53 UTC
Created attachment 346194 [details] [review]
history-service: Fix multiple initialization race conditions

This started out as a project to fix the read-only service test I just
added. Initializing two history service objects in a row was racy,
because I needed the first history service to be initialized before
creating the second one, but there was no way to ensure that. This was
only an issue for this one test, though; real Epiphany browser mode of
course only creates one history service, so I assumed it was not a big
problem.

Fix this first issue using a condition variable to ensure the GObject
initialization doesn't complete until after the history service has
actually created the SQLite database.

In doing this, I discovered a second bug. The use of the condition
variable altered the timing slightly, and caused the history filename
property to not be set in time when entering the history service thread.
In fact, it's kind of amazing that the history service ever worked at
all, because there is absolutely nothing here to guarantee that the
filename and read-only properties have been initialized prior to
starting the history service thread. So the database filename could be
NULL when opening the database, which is a great way to lose all your
history. Also, it could also be in read-only mode here even if it is
supposed to be read/write mode, which is going to cause failures after
today's commits. Fix this by adding a constructed function and starting
the history thread from there, instead of doing it in init. This means
that the history thread will not be started until after properties have
been set. Note that, while I could not reproduce this bug on my machine
until after adding the condition variable to fix the first bug, that was
just due to timing and luck; it was already broken before.
Comment 11 Michael Catanzaro 2017-02-19 15:56:41 UTC
Attachment 346173 [details] pushed as 9771c52 - Fix search provider horribly breaking history service
Attachment 346174 [details] pushed as 594e489 - history-service: Ensure thread member is initialized before use
Attachment 346175 [details] pushed as d75b14b - history-service: Fix leak when clearing all history
Attachment 346176 [details] pushed as cfd9e13 - history-service: Remove Yoda conditions
Attachment 346192 [details] pushed as 492b0fc - history-service: Don't schedule commit after clearing history
Attachment 346193 [details] pushed as 132e08c - history-service: Remove incorrect comment
Attachment 346194 [details] pushed as 2cb839c - history-service: Fix multiple initialization race conditions
Comment 12 Michael Catanzaro 2017-02-19 17:06:39 UTC
The following fix has been pushed:
fd19259 history-service: Fix write to database in read-only mode
Comment 13 Michael Catanzaro 2017-02-19 17:06:43 UTC
Created attachment 346198 [details] [review]
history-service: Fix write to database in read-only mode

Now that SQLite enforces read-only mode for us, bugs like this will be
uncovered....
Comment 14 Michael Catanzaro 2017-02-21 02:30:16 UTC
The following fix has been pushed:
2237adc Fix theoretical race condition in ephy_history_service_add_visit_row
Comment 15 Michael Catanzaro 2017-02-21 02:30:23 UTC
Created attachment 346295 [details] [review]
Fix theoretical race condition in ephy_history_service_add_visit_row

The design of the history service feels like one big footgun. I'm really
not sure why a history thread is necessary at all, or why we have
longstanding transactions (defeating the entire purpose of transactions)
instead of just using autocommit, which I think would be sufficient for
everything we do.

This commit doesn't fix any of that. That's just a rant. This commit
just fixes one theoretical race condition. Prepared statements lock the
database and need to be finalized BEFORE commit. The current code only
works if the prepared statement is finalized on the UI thread before the
scheduled commit occurs on the history thread. Which is probably always,
but let's not leave it to luck.

I could see this leading to a small loss of the last bit of history when
closing the browser.
Comment 16 Michael Catanzaro 2017-02-21 16:12:21 UTC
(In reply to Michael Catanzaro from comment #2)
> The search provider sets the history service to read-only mode, but the
> history service does not use SQLite in read-only mode. It's therefore taking
> unnecessary write locks.
> 
> All database transactions fail for one minute after searching for anything
> in the shell overview (the timeout for our search provider).

This was seriously exacerbated by bug #753101, which explains why I have no history at all for many days. :)