GNOME Bugzilla – Bug 612082
Do not expose off_t in public API, use goffset instead
Last modified: 2010-10-01 13:07:32 UTC
Version: 2.30.x

What were you doing when the application crashed?
I just changed from one local mail folder to another.

Distribution: Debian squeeze/sid
Gnome Release: 2.28.2 2009-12-18 (Debian)
BugBuddy Version: 2.28.0
System: Linux 2.6.32 #5 PREEMPT Fri Feb 19 14:20:50 CET 2010 i686
X Vendor: The X.Org Foundation
X Vendor Release: 10604000
Selinux: No
Accessibility: Disabled
GTK+ Theme: QtCurve
Icon Theme: Mist
GTK+ Modules: globalmenu-plugin, globalmenu-gnome, gnomebreakpad, canberra-gtk-module

Memory status: size: 183586816 vsize: 183586816 resident: 35954688 share: 23240704 rss: 35954688 rss_rlim: 18446744073709551615
CPU usage: start_time: 1267966142 rtime: 502 utime: 455 stime: 47 cutime:39 cstime: 8 timeout: 0 it_real_value: 0 frequency: 100

Backtrace was generated from '/usr/bin/evolution'

[Thread debugging using libthread_db enabled]
[New Thread 0xadc9bb70 (LWP 4653)]
[New Thread 0xb17feb70 (LWP 4652)]
[New Thread 0xae49cb70 (LWP 4559)]
[New Thread 0xaec9db70 (LWP 4558)]
[New Thread 0xaf7fab70 (LWP 4530)]
[New Thread 0xb0ffdb70 (LWP 4529)]
[New Thread 0xafffbb70 (LWP 4528)]
[New Thread 0xb07fcb70 (LWP 4527)]
[New Thread 0xb29d6b70 (LWP 4523)]
[New Thread 0xb31d7b70 (LWP 4522)]
[New Thread 0xb39fcb70 (LWP 4521)]
[New Thread 0xb41fdb70 (LWP 4520)]
0xffffe424 in __kernel_vsyscall ()
Trace 220846
Thread 2 (Thread 0xadc9bb70 (LWP 4653))
---- Critical and fatal warnings logged during execution ----
** evolution **: categories_icon_theme_hack: assertion `filename != NULL && *filename != '\0'' failed
----------- .xsession-errors (9 sec old) ---------------------
(epiphany:3063): GLib-GObject-CRITICAL **: g_object_ref: assertion `object->ref_count > 0' failed
(epiphany:3063): GLib-GObject-CRITICAL **: g_object_ref: assertion `object->ref_count > 0' failed
(gnome-panel:3029): GLib-GObject-CRITICAL **: g_object_ref: assertion `object->ref_count > 0' failed
empathy: /usr/lib/libxslt.so.1: no version information available (required by /usr/lib/libwebkit-1.0.so.2)
empathy: /usr/lib/libxslt.so.1: no version information available (required by /usr/lib/libwebkit-1.0.so.2)
empathy: /usr/lib/libxslt.so.1: no version information available (required by /usr/lib/libwebkit-1.0.so.2)
empathy: /usr/lib/libxslt.so.1: no version information available (required by /usr/lib/libwebkit-1.0.so.2)
Gtk-Message: Failed to load module "globalmenu-gnome": libglobalmenu-gnome.so: cannot open shared object file: No such file or directory
** (empathy:4549): WARNING **: _nm_object_get_property: Error getting 'WwanHardwareEnabled' for /org/freedesktop/NetworkManager: (16) No such property WwanHardwareEnabled
kdeinit4: preparing to launch /usr/lib/kde4/kio_pop3.so
Gtk-Message: Failed to load module "globalmenu-gnome": libglobalmenu-gnome.so: cannot open shared object file: No such file or directory
--------------------------------------------------
I realized that it happens when I have selected the "Unwanted" or "Spam" folder and then try to change to another one.
Created attachment 156995: Fix NULL pointer dereference

This patch fixes these crashes for me. Can you confirm?
I can confirm that the patch works. Also, bug #612174 appears to be a dupe.
*** Bug 614600 has been marked as a duplicate of this bug. ***
*** Bug 615200 has been marked as a duplicate of this bug. ***
(In reply to comment #3)
> I can confirm that the patch works.

Peter, are you sure that this patch is a reason for the fix? What I see in the source code is that there is pretty much no difference between before and after the patch, because the mem->buffer shouldn't be NULL for all the life time of the memory stream. Note that:

> CamelStream *
> camel_stream_mem_new (void)
> {
>         return camel_stream_mem_new_with_byte_array (g_byte_array_new ());
> }
> ...
> CamelStream *
> camel_stream_mem_new_with_byte_array (GByteArray *buffer)
> {
>         CamelStreamMem *stream_mem;
>
>         stream_mem = CAMEL_STREAM_MEM (camel_object_new (CAMEL_STREAM_MEM_TYPE));
>         stream_mem->buffer = buffer;
>         stream_mem->owner = TRUE;
>
>         return CAMEL_STREAM (stream_mem);
> }
(In reply to comment #6)
> Peter, are you sure that this patch is a reason for the fix? What I see in the
> source code is that there is pretty much no difference between before and after
> the patch, because the mem->buffer shouldn't be NULL for all the life time of
> the memory stream.

With the patch, mem->buffer isn't used in this case. I strongly suspect that this merely papers over the real issue, but it definitely fixes these crashes.
I can confirm that the patch works, too. It resolves the issue described in Bug 616755 (where you'll find a backtrace with debugging symbols).
*** Bug 616755 has been marked as a duplicate of this bug. ***
*** Bug 612174 has been marked as a duplicate of this bug. ***
Could one of you give me the exact message structure with which this is reproducible, please? A whole test message would be best, but since it can contain confidential information, I do not want to ask for it. I would like to know what's wrong here, because it doesn't make much sense to me. Also, what account type are you seeing this on? (IMAP/Local On This Computer/...)

I tried IMAP with this message structure and it works just fine for me; no issues so far, not even from valgrind:

To: xxx
...
Content-Type: multipart/mixed;
	boundary="_005_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_"

--_005_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_
Content-Type: multipart/alternative;
	boundary="_000_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_"

--_000_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

--_000_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-micr=
...
</html>

--_000_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_--

--_005_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_
Content-Type: application/pdf; name="DefaultID.pdf"
Content-Description: DefaultID.pdf
Content-Disposition: attachment; filename="DefaultID.pdf"; size=84253;
	creation-date="Wed, 16 Aug 2006 10:08:50 GMT";
	modification-date="Wed, 16 Aug 2006 10:08:50 GMT"
Content-Transfer-Encoding: base64

JVBERi0xLjYNJeLjz9MNCjEzIDAgb2JqDTw8L0xpbmVhcml6ZWQgMS9MIDgwNjUxL08gMTYvRSA5
...
MDAwIG4NCnRyYWlsZXINCjw8L1NpemUgMTM+Pg0Kc3RhcnR4cmVmDQoxMTYNCiUlRU9GDQo=

--_005_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_
Content-Type: application/pdf; name="AdobeID.pdf"
Content-Description: AdobeID.pdf
Content-Disposition: attachment; filename="AdobeID.pdf"; size=85672;
	creation-date="Wed, 16 Aug 2006 10:09:20 GMT";
	modification-date="Wed, 16 Aug 2006 10:09:20 GMT"
Content-Transfer-Encoding: base64

JVBERi0xLjYNJeLjz9MNCjEzIDAgb2JqDTw8L0xpbmVhcml6ZWQgMS9MIDgyMDcwL08gMTYvRSAx
...
CnRyYWlsZXINCjw8L1NpemUgMTM+Pg0Kc3RhcnR4cmVmDQoxMTYNCiUlRU9GDQo=

--_005_19A951631061BE428ED3215371653D6A7000windows2003r2exchan_--
Created attachment 159803: Two mails that cause a segfault

Here are two mails that cause a segfault/crash. These are not the only ones, but they are the only ones I can post for privacy reasons. The account type does not matter; it always segfaults/crashes when opening such mails.
Thanks for the test messages. I just tried with 2.30.1 and I do not see any crash with them, no matter what I have set in Edit->Preferences->Mail Preferences->tab "HTML Messages", section "Plain Text Mode". Either I'm doing something wrong, or it got fixed by something else in the meantime.
I can always reproduce it, running evo 2.30.1.2. Only after applying the patch am I able to open the above-mentioned messages. For example, "Dienstagsbrief Nr. 16" causes the following:

[Thread debugging using libthread_db enabled]
[New Thread 0xae2acb70 (LWP 12097)]
[New Thread 0xafcfeb70 (LWP 12090)]
[New Thread 0xb0e27b70 (LWP 12088)]
[New Thread 0xb1628b70 (LWP 12087)]
[New Thread 0xb1e4eb70 (LWP 12086)]
[New Thread 0xb264fb70 (LWP 12085)]
0xb7829424 in __kernel_vsyscall ()
Trace 221596
Thread 2 (Thread 0xae2acb70 (LWP 12097))
Created attachment 159815: screenshot when opening one of the mails

Sometimes evo does not segfault; it simply freezes...
Created attachment 159824: Valgrind log

Here is a valgrind log from opening the mail mentioned; maybe it helps you track the bug down. It's strange that you can't reproduce the segfault/crash with the attached mails, because I'm testing evo compiled from upstream tarballs without any third-party patches applied, and for me it is always reproducible.
Thanks for the update. It's really accessing invalid memory, but the memory wasn't allocated for some reason (see the last quoted line):

> Thread 7:
> Invalid read of size 4
>    at 0x63183C0: em_format_snoop_type (em-format.c:2021)
>    by 0x631A21A: em_format_part_as (em-format.c:659)
>    by 0x631A354: em_format_part (em-format.c:704)
>    by 0x631AD98: emf_multipart_mixed (em-format.c:1435)
>    by 0x631A20A: em_format_part_as (em-format.c:675)
>    by 0x631A354: em_format_part (em-format.c:704)
>    by 0x64FE238: efh_format_message (em-format-html.c:2782)
>    by 0x64FC623: efh_format_exec (em-format-html.c:216)
>    by 0x6510EC7: mail_msg_proxy (mail-mt.c:471)
>    by 0x517F62B: g_thread_pool_thread_proxy (gthreadpool.c:315)
>    by 0x517D71E: g_thread_create_proxy (gthread.c:1893)
>    by 0x47CE584: start_thread (pthread_create.c:300)
>    by 0x52B029D: clone (clone.S:130)
>  Address 0x3 is not stack'd, malloc'd or (recently) free'd

Do you think your distribution is using any patches to the official release? This is for evolution-data-server and I suppose you do not compile it yourself, do you? Maybe some level of compiler optimization involved here? I'll try and report back tomorrow or so.
> Do you think your distribution is using any patches to the official release?

The distro (Debian) does, and I use the Debian config as a base for compiling. The patches to eds are not worth mentioning; the only one I'm using from Debian is the one that relocates camel-provider-dir to be consistent with the rest of the distro.

> This is for evolution-data-server and I suppose you do not compile it yourself,
> do you?

As I said, I do.

> Maybe some level of compiler optimization involved here?

No, only defaults.

But after you've pointed me to eds, I've started to play around with the configure switches. Debian passes "--enable-largefile" to evolution-data-server.

After building an eds package without that switch, evo stopped segfaulting. So there seems to be something wrong with largefile-support.
(In reply to comment #18)
> But after you've pointed me to eds, I've started to play around with the
> configure switches. Debian passes "--enable-largefile" to
> evolution-data-server.
>
> After building an eds package without that switch, evo stopped segfaulting. So
> there seems to be something wrong with largefile-support.

Good catch; same situation here with Gentoo. After rebuilding without largefile support, my crashes when sending Exchange messages are gone, too. See: https://bugzilla.gnome.org/show_bug.cgi?id=612178#c9
Well, that answers the question of whether turning on large file support breaks existing installs. Apparently it does. Damn.

For the record, large file support itself is not broken. As I understand it, one of our binary cache files (I still haven't figured out which) has one or more fields whose byte size depends on sizeof(off_t). Toggling large file support changes the result of sizeof(off_t). If my understanding is correct, it boils down to one of the binary files being misread and not having sufficient input validation.

Which binary file has this issue is the million-dollar question. That's what we need to hunt down.
Hmm, it probably has something to do with optimization too, because I enabled large file support and nothing went wrong; everything works as well as before. I use -O0. This is Fedora 12, i686.
I've done a jhbuild of evolution (GNOME 2.30.1 tarball release moduleset) with CFLAGS set to '-O0 -g': only upstream tarballs, no patches, no optimization, nothing skipped in favour of system libs. The only change is enabling largefile support in eds. The result: evo segfaults for me (yes, "Dienstagsbrief" now opens, but I still have a lot of mails that don't). As I said, it does not happen when I either apply Michel Dänzer's patch or omit "--enable-largefile" from the eds build.
*** Bug 617817 has been marked as a duplicate of this bug. ***
*** Bug 617305 has been marked as a duplicate of this bug. ***
*** Bug 617886 has been marked as a duplicate of this bug. ***
*** Bug 614824 has been marked as a duplicate of this bug. ***
I'm hitting this in Debian since they upgraded to 2.30.1.2.
I got the same SEGV on Debian unstable today. What I was doing: clicking on an unread email in the message list. The crash is repeatable using the very same message. The specific message is available. More information and details are available in Debian bug #582087: http://bugs.debian.org/582087
It's worth noting that, as far as I know, all reports are on i386 installs. Nothing about x86_64 (or other arches where evolution might be installed, like ppc).
(In reply to comment #29)
> It's worth noting that, as far as I know, all reports are on i386 installs.
> Nothing about x86_64 (or other arches where evolution might be installed, like
> ppc).

I'm on powerpc.
(In reply to comment #30)
> I'm on powerpc.

Thanks for letting us know :)
*** Bug 619059 has been marked as a duplicate of this bug. ***
I'm applying Michel's patch to gnome-2-30 because with Camel's API being sealed up for 3.0, the patch is essentially what the code looks like now in master. You can't say "mem->buffer" anymore because the "buffer" member is now private. I think this only fixes a symptom of a deeper problem though, so leaving the bug open until we get to the bottom of this. http://git.gnome.org/browse/evolution/commit/?h=gnome-2-30&id=cacfd2114e7dd56cc12613d625bac450cc69b4ae
*** Bug 619108 has been marked as a duplicate of this bug. ***
*** Bug 619125 has been marked as a duplicate of this bug. ***
*** Bug 619166 has been marked as a duplicate of this bug. ***
*** Bug 619159 has been marked as a duplicate of this bug. ***
I found a couple of spots in Camel where we might be getting into trouble with large file support. Again, I don't think the problem is with large file support itself, but with the fact that on a 32-bit system, sizeof(off_t) changes from 4 bytes to 8 bytes.

The first thing I spotted is probably insignificant, but I'll mention it anyway. The mbox backend was converting an off_t value to a string improperly:
http://git.gnome.org/browse/evolution-data-server/commit/?id=777c55b67ea450834e53faf72fa6b325c9347071

The second is probably more significant. Camel has a set of functions for encoding and decoding values for use when saving to and loading from binary files:
http://library.gnome.org/devel/camel/stable/camel-camel-file-utils.html

We leaned on these functions much more heavily prior to the introduction of SQLite message summary databases. Of particular note are camel_file_util_encode_off_t() and camel_file_util_decode_off_t(), which use sizeof(off_t) to write to or read from a binary file. The problem here is that if you encode an off_t value with large file support disabled on a 32-bit system, it writes 4 bytes to the file. Then if you rebuild Camel with large file support enabled, decoding an off_t value reads 8 bytes from the file.

However, after grepping for and examining the places where these functions are called, it turns out most of the call sites are in dead code: code for the old disk-based message summaries that's still present but appears to be disabled. After Evolution 2.31.2 ships on Monday, I'll be ripping out all this dead code so I can get a better look at the situation.

If we still have binary files that hold off_t values, and (hopefully) if those files have a file format version identifier embedded in them, then the solution will likely be to bump the file format version and rewrite the off_t encode/decode functions to always convert off_t values to 64 bits. All of this is theoretical, however.
I still don't have any direct evidence that links the crashes reported here to the bugs I just described.
*** Bug 619375 has been marked as a duplicate of this bug. ***
*** Bug 619178 has been marked as a duplicate of this bug. ***
(In reply to comment #38)
> http://git.gnome.org/browse/evolution-data-server/commit/?id=777c55b67ea450834e53faf72fa6b325c9347071

Did the above commit introduce this warning? I see it when compiling current master. Maybe some of my libraries are old?

camel-mbox-summary.c: In function ‘message_info_to_db’:
camel-mbox-summary.c:436: warning: format ‘%lli’ expects type ‘long long int’, but argument 2 has type ‘off_t’
Yeah, looks like it did. The value needs to be cast to a goffset. I guess you're seeing it because you're on a 64-bit machine, and I'm on 32-bit.
Following up on my earlier analysis, I've now removed all the unused methods from CamelFolderSummary and the various providers. The only remaining call to camel_file_util_decode_off_t() is in some mbox migration method, so it looks like that function is not the source of these crashes. So either it has something to do with the printf thing that I didn't think was relevant or I'm back to the drawing board. It would be helpful if Debian and any other distro that's enabled large file support could apply the commit in comment #41 (along with a (goffset) type cast) and see if it makes any difference.
Hey, I plan to upload to debian an evolution patched with bc054c94cb46e4f8f8881c2a1b0268e2f05b307b and 4a2343cb34498c701e71679e3c50c9fc81dd5b80 to fix the segfault on 32 bits. Is there anything else I should apply? 777c55b67ea450834e53faf72fa6b325c9347071 I guess, and then?
(In reply to comment #44)
> Hey,
>
> I plan to upload to debian an evolution patched with
> bc054c94cb46e4f8f8881c2a1b0268e2f05b307b and
> 4a2343cb34498c701e71679e3c50c9fc81dd5b80 to fix the segfault on 32 bits. Is
> there anything else I should apply? 777c55b67ea450834e53faf72fa6b325c9347071 I
> guess, and then?

Sorry, the first two are unrelated; I meant cacfd2114e7dd56cc12613d625bac450cc69b4ae.
(And cacfd211 is against evolution while 777c55b6 is against eds, too)
*** Bug 619582 has been marked as a duplicate of this bug. ***
(In reply to comment #46)
> (And cacfd211 is against evolution while 777c55b6 is against eds, too)

Yes, those two would be the ones to try. I think I may also enable large file support in Fedora for 2.31.x and see if we can get some more statistics from early adopters.
*** Bug 619427 has been marked as a duplicate of this bug. ***
(In reply to comment #42)
> Yeah, looks like it did. The value needs to be cast to a goffset.
>
> I guess you're seeing it because you're on a 64-bit machine, and I'm on 32-bit.

No, no, I'm on 32-bit as well, but I do not compile eds with --enable-largefile (for now). It's a big issue, though: regenerating the folders.db summary stores incorrect offsets in it (some negative number, too large to write here), and although you can see the messages on the first run, on the second run they are not viewable. Thus I created commit 4f700fb in eds master (2.31.3+).
*** Bug 620597 has been marked as a duplicate of this bug. ***
Hey, I just wanted to say that I notified the maintainer of https://launchpad.net/~jacob/+archive/evo230 about this problem (segfaults on startup with evolution 2.30.x with IMAP+). He then released a patched version of evolution (2.30.1.2-2ubuntu1~ppa2 *), and this immediately fixed my problem (evo started normally without segfaulting). Since then, everything has been smooth sailing on my end. For the record, I'm running 32-bit PAE. I hope this additional info helps.

*: http://launchpadlibrarian.net/49179092/evolution_2.30.1.2-2ubuntu1~ppa1_2.30.1.2-2ubuntu1~ppa2.diff.gz
*** Bug 621101 has been marked as a duplicate of this bug. ***
*** Bug 621104 has been marked as a duplicate of this bug. ***
I am having a similar problem on an x86_64 system that used to be 32-bit but was converted to 64-bit. Now evolution crashes constantly, whether or not largefile support is compiled in.
Is there a utility somewhere that will create clean files for evolution and migrate the old ones into the new ones?
Robin: see Help->FAQ.

OK, after all the similar bugs, some investigation, and some chatting, it turned out that the real problem is large file support being enabled on the eds side but not on the consumer side, which is Evolution in our case. Because off_t has a different size on each side, and it is used in the public API where it influences structure sizes, the compiler calculates wrong member offsets within those structures and the application crashes (I suppose).

There are two ways to fix this:
a) when eds has large file support enabled, every consumer of it should enable it too (which is pretty unlikely to happen)
b) do not expose off_t in the public API and use goffset instead

I have a little test application which demonstrates the issue in both directions: either eds has large file support enabled and the consumer does not, or eds is built without it and the consumer has it enabled. The crash usually happens on lines like:
> ((CamelStreamMem *)mem)->buffer->...

This mainly affects 2.30.x and does not show up in git master (2.31.x) because of the API changes and the GObject-ification of Camel there, at least for CamelStreamMem, which is used quite extensively. Changing the API will be good to have anyway.
Created attachment 165791: test app

A test application demonstrating the issue outside of Evolution itself. On a 32-bit system, whether it crashes depends on whether -D_FILE_OFFSET_BITS=64 is used when building eds and when compiling this program.
Created attachment 165795: eds patch

Patch for evolution-data-server: do not expose off_t in the public API, use goffset instead. Two functions still use off_t, but they shouldn't be used outside of eds anyway.
Created commit 4b28fdd in eds master (2.31.6+)
*** Bug 500591 has been marked as a duplicate of this bug. ***
*** Bug 619259 has been marked as a duplicate of this bug. ***