Bug 630293 – SEGV in magazine_cache_trim()


Summary:	SEGV in magazine_cache_trim()


Status:	RESOLVED OBSOLETE

Product:	evolution
Classification:	Applications
Component:	general
Version:	3.0.x (obsolete)
Hardware:	Other Linux

Importance:	Normal critical
Target Milestone:	---
Assigned To:	Evolution Shell Maintainers Team
QA Contact:	Evolution QA team

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2010-09-21 20:44 UTC by David Woodhouse
Modified:	2014-03-07 10:04 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
valgrind annotations for gslice (2.76 KB, patch) 2010-10-12 20:57 UTC, David Woodhouse	needs-work

Description David Woodhouse 2010-09-21 20:44:44 UTC

Program received signal SIGSEGV, Segmentation fault.

+ Trace 223848

Thread 140735164679952 (LWP 3485)

#0 magazine_cache_trim
at gslice.c line 596
#1 magazine_cache_push_magazine
at gslice.c line 657
#2 private_thread_memory_cleanup
at gslice.c line 724
#3 __nptl_deallocate_tsd
at pthread_create.c line 154
#4 start_thread
at pthread_create.c line 308
#5 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 115

Comment 1 David Woodhouse 2010-09-21 20:46:29 UTC

(gdb) t a a bt

+ Trace 223849

Thread 2559 (Thread 0x7fffa5dd5710 (LWP 3879))

#0 __lll_lock_wait
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 136
#1 _L_lock_868
from /lib64/libpthread.so.0
#2 __pthread_mutex_lock
at pthread_mutex_lock.c line 61
#3 magazine_cache_push_magazine
at gslice.c line 640
#4 private_thread_memory_cleanup
at gslice.c line 724
#5 __nptl_deallocate_tsd
at pthread_create.c line 154
#6 start_thread
at pthread_create.c line 308
#7 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 115

Thread 2500 (Thread 0x7fff757f8710 (LWP 3485))

#0 magazine_cache_trim
at gslice.c line 596
#1 magazine_cache_push_magazine
at gslice.c line 657
#2 private_thread_memory_cleanup
at gslice.c line 724
#3 __nptl_deallocate_tsd
at pthread_create.c line 154
#4 start_thread
at pthread_create.c line 308
#5 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 115

Thread 1 (Thread 0x7ffff4d96940 (LWP 28720))

#0 g_str_hash
at gstring.c line 135
#1 g_hash_table_lookup_node
at ghash.c line 308
#2 g_hash_table_lookup
at ghash.c line 897
#3 g_quark_from_string_internal
at gdataset.c line 1047
#4 g_quark_from_string
at gdataset.c line 1077
#5 g_object_dispatch_properties_changed
at gobject.c line 799
#6 g_object_notify_queue_thaw
at gobjectnotifyqueue.c line 132
#7 g_object_newv
at gobject.c line 1386
#8 g_object_new
at gobject.c line 1178
#9 mail_msg_new
at mail-mt.c line 95
#10 ping_store
at mail-folder-cache.c line 830
#11 g_hash_table_foreach
at ghash.c line 1324
#12 ping_cb
at mail-folder-cache.c line 841
#13 g_timeout_dispatch
at gmain.c line 3555
#14 g_main_dispatch
at gmain.c line 2119
#15 g_main_context_dispatch
at gmain.c line 2672
#16 g_main_context_iterate
at gmain.c line 2750
#17 g_main_loop_run
at gmain.c line 2958
#18 IA__gtk_main
at gtkmain.c line 1219
#19 main
at main.c line 671

Comment 2 David Woodhouse 2010-09-22 15:45:10 UTC

I suspect this one is probably related...

Program received signal SIGSEGV, Segmentation fault.

+ Trace 223856

Thread 140734917236496 (LWP 22528)

#0 magazine_chain_pop_head
at gslice.c line 486
#1 thread_memory_magazine1_alloc
at gslice.c line 789
#2 g_slice_alloc
at gslice.c line 827
#3 g_slist_prepend
at gslist.c line 272
#4 pool_depth_list
at gparam.c line 1218
#5 g_hash_table_foreach
at ghash.c line 1324
#6 g_param_spec_pool_list
at gparam.c line 1279
#7 g_object_class_list_properties
at gobject.c line 659
#8 object_state_write
at camel-object.c line 269
#9 camel_object_state_write
at camel-object.c line 453
#10 vee_folder_sync
at camel-vee-folder.c line 1087
#11 camel_folder_sync
at camel-folder.c line 1124
#12 refresh_folders_exec
at mail-send-recv.c line 910
#13 mail_msg_proxy
at mail-mt.c line 487
#14 g_thread_pool_thread_proxy
at gthreadpool.c line 314
#15 g_thread_create_proxy
at gthread.c line 1897
#16 start_thread
at pthread_create.c line 301
#17 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 115

(gdb) p magazine_chunks
$1 = (ChunkLink **) 0x7fffcbe62810
(gdb) p *magazine_chunks
$2 = (ChunkLink *) 0x9350f0
(gdb) p chunk
$3 = (ChunkLink *) 0x7470697263736564
(gdb) p (char *)&chunk
$4 = 0x9350f8 "descript\a"
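
Read as bytes rather than as an address, the value in $3 is printable ASCII. That matches the "descript" string gdb prints for $4 and suggests that string data was written over the chunk link. A minimal C sketch of that decoding follows. The helper name pointer_bits_to_text is made up for illustration, and the byte order assumed is the little-endian layout of the x86_64 machine in the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: render the 8 bytes of a 64-bit value in the order
 * they sit in memory on a little-endian machine (lowest byte first).
 * Non-printable bytes are shown as '.'. `out` must have room for 9 chars. */
static void pointer_bits_to_text(uint64_t value, char *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char) ((value >> (8 * i)) & 0xff);
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char) byte : '.';
    }
    out[8] = '\0';
}
```

Called with 0x7470697263736564 this fills the buffer with "descript", so the bogus ChunkLink pointer is most likely the first eight characters of some string.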

Comment 3 Milan Crha 2010-09-24 14:16:43 UTC

I recall a similar issue which, if I remember correctly, was caused by freeing GSlice-allocated memory with g_free(). The above looks like just a random crash to me, so do you have any steps to reproduce it, please? Running under valgrind with G_SLICE=always-malloc may help here.

Comment 4 David Woodhouse 2010-09-24 15:14:32 UTC

I've no idea how to reproduce, I'm afraid -- apart from the fact that in the Red Hat abrt bug, it seems it happened when I hit Ctrl-R to reply to a message.

It seems very random. I've been running in valgrind a lot and haven't seen anything that seems related. And wouldn't valgrind complain about using g_free on GSlice allocations? Or does using G_SLICE=always-malloc make that work out OK from valgrind's point of view?

Comment 5 Milan Crha 2010-09-27 08:55:29 UTC

To be honest, I do not know. Valgrind usually complains when you try to free something "in the middle" of an allocated memory block (and I suppose a GSlice slab is one large memory block), so you might be right that using G_SLICE=always-malloc can be counterproductive for such issues, though it fits most others perfectly.

Comment 6 David Woodhouse 2010-10-08 09:08:24 UTC

Just seen it again with current master. Had just started evolution and was reading mail.

Program received signal SIGSEGV, Segmentation fault.

+ Trace 224057

Thread 140736181815056 (LWP 14337)

#0 magazine_chain_pop_head
at gslice.c line 492
#1 thread_memory_magazine1_alloc
at gslice.c line 795
#2 g_slice_alloc
at gslice.c line 833
#3 g_slist_prepend
at gslist.c line 273
#4 pool_depth_list
at gparam.c line 1224
#5 g_hash_table_foreach
at ghash.c line 1328
#6 g_param_spec_pool_list
at gparam.c line 1285
#7 g_object_class_list_properties
at gobject.c line 774
#8 object_state_write
at camel-object.c line 269
#9 camel_object_state_write
at camel-object.c line 453
#10 vee_folder_synchronize_sync
at camel-vee-folder.c line 1366
#11 camel_folder_synchronize_sync
at camel-folder.c line 3324
#12 refresh_folders_exec
at mail-send-recv.c line 914
#13 mail_msg_proxy
at mail-mt.c line 473

Comment 7 David Woodhouse 2010-10-08 09:27:01 UTC

This looks like the same issue as bugs 624081, 618128, 623822, 623246, and maybe 621314.

Comment 8 David Woodhouse 2010-10-12 08:43:41 UTC

Came down this morning and found evolution sitting at the same crash again. I don't think there's any special trick to reproducing it.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdddf5710 (LWP 31367)]
magazine_chain_pop_head (mem_size=16) at gslice.c:492
492	      (*magazine_chunks)->data = chunk->next;
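
The faulting line appears to follow a link that lives inside a previously freed chunk, so any write to such a chunk after it was freed can surface as a wild pointer on a later allocation. The sketch below is a simplified intrusive free list, not GLib's magazine code; Slot, FreeList and all function names are hypothetical. It includes a range check, added here purely to show the point at which the corruption becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SLOT_SIZE = 16, N_SLOTS = 4 };

typedef union Slot Slot;
union Slot {
    Slot *next;                        /* meaningful only while the slot is free */
    unsigned char payload[SLOT_SIZE];  /* meaningful only while it is allocated */
};

typedef struct {
    Slot  slots[N_SLOTS];
    Slot *free_head;
} FreeList;

/* Thread every slot onto the free list, lowest address first. */
static void free_list_init(FreeList *fl)
{
    fl->free_head = NULL;
    for (size_t i = N_SLOTS; i-- > 0;) {
        fl->slots[i].next = fl->free_head;
        fl->free_head = &fl->slots[i];
    }
}

/* A link is sane if it is NULL or points exactly at one of our slots. */
static int free_list_link_is_sane(const FreeList *fl, const Slot *p)
{
    if (p == NULL)
        return 1;
    uintptr_t addr = (uintptr_t) p;
    uintptr_t base = (uintptr_t) fl->slots;
    return addr >= base && addr < base + sizeof fl->slots &&
           (addr - base) % sizeof(Slot) == 0;
}

/* Pop the head slot. Returns NULL when the list is empty or when the
 * link stored in the head slot is not a valid slot address. */
static void *free_list_pop(FreeList *fl)
{
    Slot *slot = fl->free_head;
    if (slot == NULL)
        return NULL;
    Slot *next = slot->next;  /* loads whatever bytes now sit in the freed slot */
    if (!free_list_link_is_sane(fl, next))
        return NULL;          /* unchecked code would keep this pointer and crash later */
    fl->free_head = next;
    return slot;
}

/* Return a slot to the list; its first bytes become the link. */
static void free_list_push(FreeList *fl, void *mem)
{
    Slot *slot = mem;
    assert(slot != NULL && free_list_link_is_sane(fl, slot));
    slot->next = fl->free_head;
    fl->free_head = slot;
}

/* Simulate a use-after-free: copy text into a slot the caller has already freed. */
static void stale_write(void *freed_slot, const char *text)
{
    size_t n = strlen(text);
    memcpy(freed_slot, text, n < SLOT_SIZE ? n : SLOT_SIZE);
}
```

Pushing a slot and then calling stale_write() on it makes the next free_list_pop() see string bytes where the link should be. Without the check, the bad pointer would be stored as the new head and dereferenced by a later, unrelated allocation, which would explain why the crash looks random.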

Comment 9 David Woodhouse 2010-10-12 10:13:00 UTC

What we should really do is put the appropriate valgrind client calls into gslice.c -- see http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools

Then we wouldn't need to use G_SLICE=always-malloc to get sane results out of valgrind, and our use of valgrind wouldn't hide this problem.

But I'm far too lazy to do that today for both allocators, so I'm using a dirty hack to ensure that using g_free() on GSlice memory will upset valgrind even with G_SLICE=always-malloc:

--- gslice.c~	2010-09-13 16:57:51.000000000 +0100
+++ gslice.c	2010-10-12 11:08:21.000000000 +0100
@@ -839,7 +839,7 @@ g_slice_alloc (gsize mem_size)
       g_mutex_unlock (allocator->slab_mutex);
     }
   else                          /* delegate to system malloc */
-    mem = g_malloc (mem_size);
+    mem = g_malloc (mem_size + 8) + 8;
   if (G_UNLIKELY (allocator->config.debug_blocks))
     smc_notify_alloc (mem, mem_size);
 
@@ -904,7 +904,7 @@ g_slice_free1 (gsize    mem_size,
     {
       if (G_UNLIKELY (g_mem_gc_friendly))
         memset (mem_block, 0, mem_size);
-      g_free (mem_block);
+      g_free (mem_block - 8);
     }
   TRACE (GLIB_SLICE_FREE((void*)mem_block, mem_size));
 }
@@ -980,7 +980,7 @@ g_slice_free_chain_with_offset (gsize   
           abort();
         if (G_UNLIKELY (g_mem_gc_friendly))
           memset (current, 0, mem_size);
-        g_free (current);
+        g_free (current - 8);
       }
 }
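
Outside GLib, the same trick can be sketched in plain C. The names offset_alloc, offset_free, offset_base and DEBUG_OFFSET are invented for this illustration. The sketch casts to char * before adjusting the pointer, because arithmetic on void * as written in the diff relies on a GCC extension. It is a debugging aid only: the shifted pointer may be less strictly aligned than what malloc() returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { DEBUG_OFFSET = 8 };  /* same shift as in the diff above */

/* Allocate mem_size usable bytes, but hand out a pointer that is
 * DEBUG_OFFSET bytes past the start of the underlying malloc() block. */
static void *offset_alloc(size_t mem_size)
{
    assert(mem_size <= SIZE_MAX - DEBUG_OFFSET);
    char *base = malloc(mem_size + DEBUG_OFFSET);
    return base != NULL ? base + DEBUG_OFFSET : NULL;
}

/* Recover the real start of the block from a pointer returned by offset_alloc(). */
static void *offset_base(void *mem)
{
    return (char *) mem - DEBUG_OFFSET;
}

/* Matching release function: undo the shift before calling free(). */
static void offset_free(void *mem)
{
    if (mem != NULL)
        free(offset_base(mem));
}
```

Memory obtained from offset_alloc() must go back through offset_free(). Passing it straight to free() hands the allocator an address that is not the start of a block, which is the mismatch valgrind then reports as an invalid free.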

Comment 10 David Woodhouse 2010-10-12 20:57:23 UTC

Created attachment 172216
valgrind annotations for gslice

This adds annotations to gslice, although it's not quite right because it still keeps track of the pages allocated with memalign(). So although it mostly seems to give the right complaints, it gives the wrong allocation trace:

==2251== 1 errors in context 1 of 2:
==2251== Invalid free() / delete / delete[]
==2251==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==2251==    by 0x400588: main (in /home/dwmw2/slice)
==2251==  Address 0x4f38020 is 32 bytes inside a block of size 496 alloc'd
==2251==    at 0x4A04360: memalign (vg_replace_malloc.c:532)
==2251==    by 0x4A043B9: posix_memalign (vg_replace_malloc.c:660)
==2251==    by 0x4C683EB: slab_allocator_alloc_chunk (gslice.c:1164)
==2251==    by 0x4C69970: g_slice_alloc (gslice.c:682)
==2251==    by 0x400565: main (in /home/dwmw2/slice)

In this case, 0x4f38020 was *also* allocated by a call to g_slice_alloc()...
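
For comparison, here is a rough sketch of how the Memcheck mempool client requests from the manual section linked in comment 9 can be applied to a toy fixed-size pool. This is not the attached patch. ToyPool and the toy_pool_* functions are hypothetical, and the fallback macros exist only so the file still builds where valgrind/memcheck.h is not installed. The page still comes from malloc(), so Memcheck keeps tracking it as a heap block as well, which is the same situation as the memalign() pages above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
/* No-op stand-ins so the sketch compiles without the valgrind headers. */
#  define VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed) ((void) 0)
#  define VALGRIND_DESTROY_MEMPOOL(pool)                ((void) 0)
#  define VALGRIND_MEMPOOL_ALLOC(pool, addr, size)      ((void) 0)
#  define VALGRIND_MEMPOOL_FREE(pool, addr)             ((void) 0)
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)         (0)
#endif

enum { TOY_CHUNK_SIZE = 32, TOY_CHUNKS = 16 };

typedef struct {
    char  *page;                    /* one block carved into equal chunks */
    void  *free_stack[TOY_CHUNKS];  /* free list kept outside the chunks */
    size_t n_free;
} ToyPool;

/* Returns 0 on success, -1 if the backing page cannot be allocated. */
static int toy_pool_init(ToyPool *pool)
{
    pool->page = malloc((size_t) TOY_CHUNK_SIZE * TOY_CHUNKS);
    if (pool->page == NULL)
        return -1;
    for (size_t i = 0; i < TOY_CHUNKS; i++)
        pool->free_stack[i] = pool->page + i * TOY_CHUNK_SIZE;
    pool->n_free = TOY_CHUNKS;
    /* Nothing in the page is valid until a chunk is handed out. */
    (void) VALGRIND_MAKE_MEM_NOACCESS(pool->page, (size_t) TOY_CHUNK_SIZE * TOY_CHUNKS);
    VALGRIND_CREATE_MEMPOOL(pool, 0, 0);
    return 0;
}

static void *toy_pool_alloc(ToyPool *pool)
{
    if (pool->n_free == 0)
        return NULL;
    void *chunk = pool->free_stack[--pool->n_free];
    VALGRIND_MEMPOOL_ALLOC(pool, chunk, TOY_CHUNK_SIZE);  /* chunk becomes addressable */
    return chunk;
}

static void toy_pool_free(ToyPool *pool, void *chunk)
{
    assert(pool->n_free < TOY_CHUNKS);
    VALGRIND_MEMPOOL_FREE(pool, chunk);  /* chunk becomes inaccessible again */
    pool->free_stack[pool->n_free++] = chunk;
}

static void toy_pool_destroy(ToyPool *pool)
{
    VALGRIND_DESTROY_MEMPOOL(pool);
    free(pool->page);
    pool->page = NULL;
    pool->n_free = 0;
}
```

The free list is kept in a side array rather than inside the freed chunks, because a chunk released with VALGRIND_MEMPOOL_FREE is marked inaccessible and the allocator's own bookkeeping writes would otherwise be reported too. An allocator that stores links in freed chunks, as gslice does, would presumably need extra requests around those accesses.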

Comment 11 Milan Crha 2010-10-13 07:07:47 UTC

(In reply to comment #10)
> valgrind annotations for gslice

Well, you should rather open a bug against glib and offer them this change. They will certainly not look for glib patches in evolution bugs. On the other hand, if this patch fixes the initial issue, then this bug can safely be moved to glib instead of filing a new bug.

Comment 12 David Woodhouse 2010-10-13 10:11:55 UTC

The patch doesn't fix anything -- it just makes valgrind work a little better with gslice, so we don't have to use G_SLICE=always-malloc, and running in valgrind should actually catch this bug.

But it isn't finished yet; I'll submit it when I've got a response to http://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg02045.html and fixed the remaining issues.

Comment 13 David Woodhouse 2010-10-19 21:59:26 UTC

See bug 335126 for gslice/valgrind discussion. Since implementing the offset allocation as described in comment 9 above, I still haven't managed to trigger this.

Comment 14 David Woodhouse 2010-10-26 11:04:26 UTC

I still haven't managed to reproduce this... until last night, when I tried to connect with 2.32 when I wasn't on the VPN. (Yes, it should *know* that it needs a VPN connection, and it should ask NetworkManager to make one before that particular account comes online, but that's about three separate RFEs for another day).

This wasn't the box with my debugging version of glib, but it does make me strongly suspect that this is a manifestation of bug 631290 and bug 632212.

Comment 15 Milan Crha 2010-10-27 06:47:59 UTC

(In reply to comment #14)
> This wasn't the box with my debugging version of glib, but it does make me
> strongly suspect that this is a manifestation of bug 631290 and bug 632212.

Do you propose to close this one in favour of one of these bugs? Maybe it is also part of bug #631804? I saw a strange crash in mail_msg_free, but after a workaround for it there was no crash for a day (on another user's machine).

Comment 16 David Woodhouse 2010-10-27 13:06:31 UTC

Dunno. In comment 7 I said this looked very similar to a number of other bugs, and I'm not sure that *all* of those are going to be the same as the imapx connect failure one. I suspect there may be a few memory corruptors, and we'll really need the gslice/valgrind stuff working to properly get to the bottom of all of them.

Comment 17 André Klapper 2013-03-27 08:46:46 UTC

Comment on attachment 172216
valgrind annotations for gslice

Patch "isn't finished yet" according to comment 12 --> needs-work

Comment 18 Milan Crha 2014-03-07 10:04:44 UTC

I'm closing this as obsolete.