Bug 738965 - [SQLite VFS] Crash due to missing xFetch definition
Status: RESOLVED FIXED
Product: evolution
Classification: Applications
Component: general
Version: 3.12.x (obsolete)
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Evolution Shell Maintainers Team
QA Contact: Evolution QA team
Duplicates: 739881
Depends on:
Blocks:
Reported: 2014-10-21 21:33 UTC by Paul Menzel
Modified: 2014-11-11 16:58 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Paul Menzel 2014-10-21 21:33:02 UTC
With Evolution 3.12.7.1 on Debian Sid/unstable Evolution crashes reliably:

[ 1126.970356] pool[6371]: segfault at 0 ip   (null) sp a67d26ec error 14
[ 1394.760544] pool[10798]: segfault at 0 ip   (null) sp 9fbdf6ac error 14
[  582.541377] pool[3845]: segfault at 0 ip   (null) sp a16476ec error 14
[ 1126.832550] pool[6744]: segfault at 0 ip   (null) sp a2d176ec error 14
[  673.607899] pool[3538]: segfault at 0 ip   (null) sp a56736ec error 14
[ 2775.313991] pool[12034]: segfault at 0 ip   (null) sp a61ff6ec error 14
[ 3039.843309] pool[14207]: segfault at 0 ip   (null) sp a20ed6ec error 14
[ 7947.811380] pool[14300]: segfault at 0 ip   (null) sp a1c1135c error 14
[ 8178.838744] pool[15437]: segfault at 0 ip   (null) sp a173635c error 14
[ 8366.427523] pool[16504]: segfault at 0 ip   (null) sp a343c6ec error 14

It is not a memory (hardware) error, as this happens on two different systems to which I attach the same disk.
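
For readers unfamiliar with these kernel lines: the trailing number is the x86 page-fault error code. Below is a minimal sketch of decoding it, assuming the kernel prints the code in hexadecimal (so `error 14` means 0x14) and using the architectural bit layout. The names `fault_info` and `decode_fault` are made up for this illustration and are not part of any kernel or Evolution API.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page-fault error code bits (architectural layout). */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access,      1: write access         */
    PF_USER  = 1 << 2, /* 0: kernel mode,      1: user mode            */
    PF_RSVD  = 1 << 3, /* reserved bit set in a page-table entry       */
    PF_INSTR = 1 << 4  /* fault happened during an instruction fetch   */
};

/* Hypothetical helper type: one flag per error-code bit. */
struct fault_info {
    bool protection;
    bool write;
    bool user;
    bool reserved;
    bool instr_fetch;
};

/* Split a raw error code into its individual flags. */
static struct fault_info decode_fault(unsigned long code)
{
    struct fault_info f;
    f.protection  = (code & PF_PROT)  != 0;
    f.write       = (code & PF_WRITE) != 0;
    f.user        = (code & PF_USER)  != 0;
    f.reserved    = (code & PF_RSVD)  != 0;
    f.instr_fetch = (code & PF_INSTR) != 0;
    return f;
}
```

Under that assumption, `decode_fault(0x14)` reports a user-mode instruction fetch from a page that is not present. Together with `segfault at 0` and `ip (null)`, that is the pattern one would expect from a call through a NULL function pointer.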
Comment 1 Paul Menzel 2014-10-21 21:50:35 UTC
Here is the backtrace.

Core was generated by `evolution'.
Program terminated with signal SIGSEGV, Segmentation fault.

Comment 2 Paul Menzel 2014-10-22 06:16:51 UTC
Starting in the calendar view, I do not experience any crashes. My guess is that it has to do with WebKit.
Comment 3 Paul Menzel 2014-10-22 06:29:01 UTC
(In reply to comment #2)
> Starting in the calendar view, I do not experience any crashes. My guess is
> that it has to do with WebKit.

Scratch that. It crashed again, although I only used the calendar view.
Comment 4 Paul Menzel 2014-10-22 07:27:45 UTC
This is a 32-bit user space and it happens with a 32-bit Linux kernel (see above) and also with a 64-bit Linux kernel.

    [  676.728983] pool[3732]: segfault at 0 ip           (null) sp 00000000e10f635c error 14
    [  888.944668] pool[5374]: segfault at 0 ip           (null) sp 00000000e010835c error 14
Comment 5 Milan Crha 2014-10-22 08:15:23 UTC
Thanks for the bug report. This looks like a crash in SQLite; I would try downgrading it (that helped someone else).
Comment 6 Paul Menzel 2014-10-22 08:35:42 UTC
(In reply to comment #5)
> Thanks for the bug report. This looks like a crash in SQLite; I would try
> downgrading it (that helped someone else).

Milan, thanks a lot for the fast reply. I’ll test it tonight and hopefully downgrading is going to fix this problem.

There is also Debian bug report #765812 [1].

[1] https://bugs.debian.org/765812
Comment 7 Paul Menzel 2014-10-22 08:40:48 UTC
(In reply to comment #0)
> With Evolution 3.12.7.1 on Debian Sid/unstable Evolution crashes reliably:

[…]

Correction to my original report: the versions are, of course, Evolution 3.12.7 and Evolution-Data-Server 3.12.7.1.
Comment 8 Paul Menzel 2014-10-22 08:47:47 UTC
There is a thread on the mailing list evolution-list [1] and the downstream report in Arch Linux is #42455 [2].


[1] https://mail.gnome.org/archives/evolution-list/2014-October/thread.html#00123
[2] https://bugs.archlinux.org/task/42455
Comment 9 Paul Menzel 2014-10-22 22:32:03 UTC
I sent a message to the list sqlite-user@sqlite.org, but it was not delivered for some reason.
Comment 10 D. Richard Hipp 2014-10-23 01:41:57 UTC
In the stack trace linked in Comment 1 above, in Thread 45, I see that the SQLite routine sqlite3OsRead() invokes an external routine named camel_sqlite3_file_xRead().  From this I presume that evolution is using a custom VFS for SQLite that is implemented in the file named "camel-db.c".  Is that correct?

Meanwhile, on Thread 20, the sqlite3OsFetch() routine is called, which then calls the xFetch method in the VFS object.  (See https://www.sqlite.org/c3ref/io_methods.html for more information on the sqlite3_io_methods object that holds the xFetch method.)  But, if I'm reading the stack trace correctly, the xFetch pointer is NULL.  If that is correct, it suggests that the "camel-db" VFS is not set up correctly.  The xFetch pointer of the sqlite3_io_methods object needs to point to a valid implementation of that function if the iVersion value is greater than 2.  (In fairness to the implementers of the camel-db VFS, this requirement needs to be more clearly stated in the SQLite documentation.)

All of the above is conjecture based on a sketchy stack trace.  But it seems like a plausible conjecture.  Can somebody who knows how to locate the "camel-db.c" source file please investigate?  

If I'm right, then the likely solution is to just initialize the sqlite3_io_methods.iVersion field of the camel-db VFS to 2 instead of 3.

This problem would never have come up prior to SQLite 3.8.7 because in earlier versions the xFetch method was only used if memory-mapped I/O was enabled using the "PRAGMA mmap_size=N" statement.  But beginning in 3.8.7, the xFetch method might also be used for large sort operations.
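
The failure mode described in this comment can be modeled in a few lines. This is only a sketch under stated assumptions: `model_io_methods` is a cut-down stand-in for `sqlite3_io_methods` (the real structure has many more members), and every function name below is hypothetical. It shows a caller that consults only `iVersion` before using `xFetch`, a consistency check that would have flagged the bad table, and the suggested workaround of advertising version 2.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for sqlite3_io_methods; only the fields needed here. */
typedef struct model_io_methods {
    int iVersion;
    int (*xRead)(void *buf, int amount);
    /* Only meaningful when iVersion >= 3. */
    int (*xFetch)(int offset, int amount, void **out);
} model_io_methods;

/* Models the caller-side gate: only the version number is consulted,
 * the xFetch pointer itself is not checked for NULL. */
static int caller_would_use_fetch(const model_io_methods *m)
{
    return m->iVersion >= 3;
}

/* Returns 1 when the table is safe to hand to such a caller. */
static int methods_are_consistent(const model_io_methods *m)
{
    if (m->xRead == NULL)
        return 0;
    if (m->iVersion >= 3 && m->xFetch == NULL)
        return 0; /* version promises xFetch, but the pointer is missing */
    return 1;
}

/* Workaround from the comment above: advertise version 2 when the
 * table does not actually provide xFetch. */
static void downgrade_if_no_fetch(model_io_methods *m)
{
    if (m->iVersion >= 3 && m->xFetch == NULL)
        m->iVersion = 2;
}

/* Trivial read implementation so the demo table has a valid xRead. */
static int model_read(void *buf, int amount)
{
    (void)buf;
    return amount;
}
```

With a table initialized as `{ 3, model_read, NULL }`, the caller-side gate passes while the consistency check fails, which is the situation the stack trace suggests. After `downgrade_if_no_fetch()` the gate no longer passes and the table is consistent.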
Comment 11 Milan Crha 2014-10-23 09:17:36 UTC
(In reply to comment #10)
> In the stack trace linked in Comment 1 above, in Thread 45, I see that the
> SQLite routine sqlite3OsRead() invokes an external routine named
> camel_sqlite3_file_xRead().  From this I presume that evolution is using a
> custom VFS for SQLite that is implemented in the file named "camel-db.c".  Is
> that correct?

Thanks for the investigation. You are absolutely right, camel-db provides its own SQLite VFS in order to delay writes to disk.

> If I'm right, then the likely solution is to just initialize the
> sqlite3_io_methods.iVersion field of the camel-db VFS to 2 instead of 3.

We inherit that version from the default VFS:
https://git.gnome.org/browse/evolution-data-server/tree/camel/camel-db.c#n324

> This problem would never have come up prior to SQLite 3.8.7 because in earlier
> versions the xFetch method was only used if memory-mapped I/O was enabled using
> the "PRAGMA mmap_size=N" statement.  But beginning in 3.8.7, the xFetch method
> might also be used for large sort operations.

I see, that explains why it didn't strike earlier. We override only a few methods and pass the rest through to the old VFS, as can be seen here:
https://git.gnome.org/browse/evolution-data-server/tree/camel/camel-db.c#n175

It also shows that xFetch is missing from the list. I'm wondering how to make this more robust, so that it is not tied to an exact SQLite version in case a new I/O method is added or an existing one is removed. The simple copy:
   io_methods = *cFile->old_vfs_file->pMethods;
probably didn't work well, as far as I recall.
Comment 12 Milan Crha 2014-10-23 10:04:41 UTC
(In reply to comment #11)
> The simple copy:
>    io_methods = *cFile->old_vfs_file->pMethods;
> probably didn't work well, as far as I recall.

Right, it just crashes when done that simply. I changed the code so that it can "adapt" to future additions; API changes are detected at build time.
Thanks a lot, Richard, for the pointers.

Created commit 01cd4a6 in eds master (3.13.7+) [1]
Created commit 6e5e4fd in eds evolution-data-server-3-12 (3.12.8+)

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=01cd4a6
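
For illustration only, here is one way a wrapping VFS can forward methods without advertising more than it implements. This is a sketch, not the code from the commits above: the `model_*` and `wrapper_*` names and `WRAPPER_MAX_IO_VERSION` are hypothetical, the structures are cut-down stand-ins for the SQLite ones, and the build-time check mentioned in this comment is not modeled. Each forwarding function unwraps the file first, because the inner methods expect the inner file object; presumably that is why copying the inner method table verbatim crashes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: highest method-table version this wrapper forwards. */
#define WRAPPER_MAX_IO_VERSION 3

typedef struct model_io_methods model_io_methods;
typedef struct model_file { const model_io_methods *pMethods; } model_file;

/* Cut-down stand-in for sqlite3_io_methods. */
struct model_io_methods {
    int iVersion;
    int (*xRead)(model_file *f, void *buf, int amount);
    int (*xFetch)(model_file *f, int offset, int amount, void **out); /* v3 */
};

typedef struct wrapper_file {
    model_file base;   /* must be first so a model_file* can be cast back */
    model_file *inner; /* the wrapped file from the underlying VFS */
} wrapper_file;

/* Forwarders: unwrap, then call the inner method with the inner file. */
static int wrapper_xRead(model_file *f, void *buf, int amount)
{
    wrapper_file *w = (wrapper_file *)f;
    return w->inner->pMethods->xRead(w->inner, buf, amount);
}

static int wrapper_xFetch(model_file *f, int offset, int amount, void **out)
{
    wrapper_file *w = (wrapper_file *)f;
    return w->inner->pMethods->xFetch(w->inner, offset, amount, out);
}

/* Advertise the lower of the inner version and what the wrapper knows,
 * and expose xFetch only when that version includes it. */
static void wrapper_init_methods(model_io_methods *out,
                                 const model_io_methods *inner)
{
    out->iVersion = inner->iVersion < WRAPPER_MAX_IO_VERSION
                        ? inner->iVersion
                        : WRAPPER_MAX_IO_VERSION;
    out->xRead = wrapper_xRead;
    out->xFetch = out->iVersion >= 3 ? wrapper_xFetch : NULL;
}

/* Trivial inner implementations used only to exercise the sketch. */
static int inner_read(model_file *f, void *buf, int amount)
{
    (void)f; (void)buf;
    return amount;
}

static int inner_fetch(model_file *f, int offset, int amount, void **out)
{
    (void)f; (void)offset; (void)amount;
    *out = NULL;
    return 0;
}
```

Clamping the version means that an inner VFS which later grows a version 4 table is still presented as version 3, so callers never reach for a method the wrapper does not forward.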
Comment 13 Milan Crha 2014-11-11 16:58:50 UTC
*** Bug 739881 has been marked as a duplicate of this bug. ***